CAIDA CoralReef Exercises
Introduction to CoralReef Applications
CoralReef is a set of programs for working with OCxMON network monitors
and with trace files that have been collected from OCxMON monitors.
OCxMON monitors collect data from ATM links carrying IP traffic at
either OC3 (155 Mbps, OC3MON) or OC12 (622 Mbps, OC12MON). (OC48MON is
under development and plans for OC192MON are being made.)
A large number of "packet header traces", containing the first 48
bytes of each IP packet, are available at
http://moat.nlanr.net/Traces/
Typically the traces include data from two interfaces, one collecting
data from each direction.
The CoralReef package is available from
https://www.caida.org/tools/measurement/coralreef/.
CoralReef is intended as a development environment that allows people
to write code that works with traces and live monitors. A number of
sample programs are included with CoralReef; brief notes about some of
them follow.
Applications
crl_encode
crl_encode is used to protect the privacy of the users of the link
being monitored. It encodes the source and destination IP addresses
and removes any cell payload that might have been captured. Each IP
address is mapped to a constant encoded form for this particular
trace. The mapping varies from trace to trace.
All the traces available for these exercises are encoded traces.
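The property described above can be illustrated with a short Python sketch: one fixed encoded form per address within a trace, and a different mapping for each trace. This is not crl_encode's actual algorithm, which is not described here. The class name TraceAddressEncoder and the use of one seeded random generator per trace are hypothetical choices made only for illustration.

```python
import ipaddress
import random


class TraceAddressEncoder:
    """Hypothetical per-trace address encoder (illustration only).

    Each distinct real address is given one encoded address that stays
    fixed for the lifetime of the encoder, i.e. for one trace. A new
    encoder with a different seed yields a different mapping.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._mapping = {}
        self._used = set()

    def encode(self, addr):
        addr = ipaddress.IPv4Address(addr)
        if addr not in self._mapping:
            # Draw random 32-bit values until an unused one is found, so the
            # mapping stays one-to-one within this trace.
            while True:
                candidate = ipaddress.IPv4Address(self._rng.getrandbits(32))
                if candidate not in self._used:
                    break
            self._used.add(candidate)
            self._mapping[addr] = candidate
        return self._mapping[addr]
```

Keeping the mapping one-to-one means per-host statistics (flow counts, address matrices) still work on encoded traces, even though the real addresses cannot be recovered.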
crl_flowscnt
This program counts the number of IP flows in the trace. An IP flow is
defined as a sequence of packets (possibly interleaved with packets
belonging to other flows) that share the same IP source and destination
addresses, the same IP protocol field and, if the protocol is TCP or
UDP, the same source and destination port numbers.
The program reports statistics about the number of flows it finds when
the trace is divided into time buckets of various sizes. That is, the
trace is first divided into intervals of, say, 0.01 seconds, and the
minimum, average, median and maximum number of flows seen in a bucket
are reported. The trace is then divided into buckets of a different
size, and the same statistics are reported for the new bucket size.
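The flow definition and bucketing described above can be sketched in Python. The sketch works on already-decoded packet tuples rather than on a CoralReef trace. The function names, the tuple layout, and the treatment of empty buckets (counted as zero flows) are assumptions; they do not describe crl_flowscnt's actual behaviour.

```python
import statistics
from collections import defaultdict

TCP, UDP = 6, 17  # IP protocol numbers (see /etc/protocols)


def flow_key(src, dst, proto, sport, dport):
    """Key identifying the flow a packet belongs to."""
    # Ports only distinguish flows for TCP and UDP; for other protocols the
    # key is the address pair plus the protocol number.
    if proto in (TCP, UDP):
        return (src, dst, proto, sport, dport)
    return (src, dst, proto)


def flows_per_bucket(packets, bucket_size):
    """Number of distinct flows in each bucket of bucket_size seconds.

    packets: list of (timestamp, src, dst, proto, sport, dport) tuples.
    """
    if not packets:
        return []
    start = min(p[0] for p in packets)
    buckets = defaultdict(set)
    for ts, src, dst, proto, sport, dport in packets:
        index = int((ts - start) // bucket_size)
        buckets[index].add(flow_key(src, dst, proto, sport, dport))
    # Assumption: buckets between the first and last packet that contain
    # no packets are reported as having zero flows.
    return [len(buckets.get(i, ())) for i in range(max(buckets) + 1)]


def bucket_stats(packets, bucket_sizes=(0.01, 0.1, 1.0)):
    """(min, mean, median, max) flows per bucket for each bucket size."""
    report = {}
    for size in bucket_sizes:
        counts = flows_per_bucket(packets, size)
        if counts:
            report[size] = (min(counts), statistics.mean(counts),
                            statistics.median(counts), max(counts))
    return report
```

Using a set per bucket means a long-lived flow is counted once in every bucket it touches. This is why the statistics depend so strongly on the bucket size.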
crl_hist
crl_hist produces a number of summaries of packet lengths suitable
for producing histograms.
The output contains a number of components:
1. The output begins with some summary statistics about the trace,
including the number of packets, the total bytes sent and the number of
non-IP packets.
2. A breakdown of the packets by length. For each length, from 1 to
the length of the longest packet, the following values are printed:
- the number of packets of that length
- the number of packets of that length or smaller
- the percentage of packets of that length or smaller
- the number of bytes in packets of that length
- the number of bytes in packets of that length or smaller
- the percentage of bytes in packets of that length or smaller
3. Run lengths for various ranges of packet sizes. For each of the
following size ranges, the number and percentage of packet trains
(runs of consecutive packets whose sizes all fall in that range) are
printed. The size ranges are:
- 0-44
- 45-90
- 91-180
- 181-260
- 261-576
- 577-1120
- 1121-
4. Traffic by protocol. For each IP protocol (TCP, UDP, ICMP, etc.) the
number and percentage of packets carrying that protocol, and the number
and percentage of bytes in those packets (including the IP header), are
printed.
The protocols are presented as numbers, e.g. 6 for TCP. The mapping
from IP protocol number to name can be found on most Unix systems in
the file /etc/protocols.
5. The next two sections of the output are similar breakdowns of just
the TCP traffic by TCP port: the first by source port and the second by
destination port.
Of the two ports, one usually identifies the type of application and
the other makes the connection uniquely identifiable. Ports are
presented as numbers, e.g. 80 for HTTP traffic. The mapping from port
number to service name can be found on most Unix systems in the file
/etc/services. Note that many values are used for both TCP and UDP.
6. The next section contains an entry for each pair of source and
destination ports. Before this table is generated, ports >= 1024 are
replaced with 0. For well-known applications (telnet and HTTP, for
example) the port number that identifies the application is < 1024, so
this heuristic, in most cases, classifies communication by application
type.
7. The next sections repeat the packet size distribution described in
item 2 above, but for each TCP protocol.
8. The final sections of the report repeat the TCP sections but for UDP.
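A rough Python sketch of two of the computations above: the cumulative length breakdown of item 2 and the port-pair heuristic of item 6. It operates on plain Python lists of packet lengths and (source port, destination port) pairs rather than on a trace. The function names are hypothetical, and so is the choice to emit a row for every length, including lengths with no packets. Neither describes crl_hist's exact output format.

```python
from collections import Counter


def length_breakdown(lengths):
    """Rows of (length, pkts, cum_pkts, cum_pkt_pct, bytes, cum_bytes,
    cum_byte_pct) for every length from 1 to the longest packet."""
    if not lengths:
        return []
    counts = Counter(lengths)
    total_pkts = len(lengths)
    total_bytes = sum(lengths)
    rows = []
    cum_pkts = cum_bytes = 0
    for length in range(1, max(counts) + 1):
        pkts = counts[length]          # Counter returns 0 for missing keys
        nbytes = pkts * length
        cum_pkts += pkts
        cum_bytes += nbytes
        rows.append((length, pkts, cum_pkts, 100.0 * cum_pkts / total_pkts,
                     nbytes, cum_bytes, 100.0 * cum_bytes / total_bytes))
    return rows


def well_known_pair(sport, dport):
    """Apply the item 6 heuristic: ports >= 1024 become 0."""
    return (sport if sport < 1024 else 0, dport if dport < 1024 else 0)


def port_pair_table(port_pairs):
    """Count packets per (source, destination) pair after the heuristic."""
    return Counter(well_known_pair(s, d) for s, d in port_pairs)
```

For example, a client port 3456 talking to port 80 and the reply from 80 to 3456 become (0, 80) and (80, 0), so both directions of web traffic are grouped under port 80.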
Note:
There is a bug in this program for release 3.1.1 of coral. To
correct it add the line:
use lib "/usr/local/Coral/lib"; #coral global libdir#
before:
use lib "../../lib";
at line 54 of the perl script in your
install/bin directory (/usr/local/Coral/bin on most systems).
crl_hostmatrix
This program performs a similar function to crl_ipmatrix except that it
resolves the IP addresses into domain names by doing an in-addr
(reverse) DNS lookup.
Notes: There is a bug in this program for release 3.1.1 of coral. To
correct it remove the -f from
open(INFILE, "${script_path}crl_ipmatrix -f $ARGV[0] |")
This program is not useful on traces that have the IP addresses
encoded.
crl_ipmatrix
This program produces a table of source and destination IP addresses
with a count of how many packets went in each direction between these
hosts. Before the main section the program prints the total number of
packets and the total number of bytes in the trace.
crl_info
Prints information about the trace and the system it was collected on.
crl_ipaddr
For each packet in the trace, prints the source and destination address
and the protocol being carried by the IP packet.
crl_ipmatrix
Scans the trace for each IP source and destination pair. At the end
of the trace it prints the IP source, IP destination, number of
packets and number of bytes transferred.
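The per-pair aggregation described above is a keyed count. A minimal Python sketch over hypothetical (src, dst, length) records, not crl_ipmatrix's code: it accumulates packets and bytes per direction, then prints the totals before the table. The function names and the busiest-first sort order are illustrative assumptions.

```python
from collections import defaultdict


def ip_matrix(packets):
    """Aggregate (src, dst, length) records into {(src, dst): [packets, bytes]}."""
    matrix = defaultdict(lambda: [0, 0])
    for src, dst, length in packets:
        entry = matrix[(src, dst)]
        entry[0] += 1
        entry[1] += length
    return dict(matrix)


def print_matrix(matrix):
    """Print totals first, then one line per pair, busiest (by bytes) first."""
    total_pkts = sum(pkts for pkts, _ in matrix.values())
    total_bytes = sum(nbytes for _, nbytes in matrix.values())
    print(f"total packets {total_pkts} total bytes {total_bytes}")
    for (src, dst), (pkts, nbytes) in sorted(matrix.items(),
                                             key=lambda kv: kv[1][1],
                                             reverse=True):
        print(f"{src}\t{dst}\t{pkts}\t{nbytes}")
```

Because (a, b) and (b, a) are separate keys, each direction of a conversation is reported separately, as in the description above.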
crl_llcsnap
Prints all of the combinations of ATM header and LLC/SNAP values in the
trace, separated by interface. The number of occurrences is shown first.
crl_locality
Calculates source and destination address histograms for a Coral-format
packet dump.
crl_nonip
Prints the LLC/SNAP header for each non-IP packet in the trace.
crl_portsummary
Scans the trace and, at the end of the trace, prints a line for every
unique (interface, protocol, source port, destination port, packet_size)
value. The line contains the interface, the IP protocol number, a
flag indicating whether the port fields are valid for this protocol (1
means valid, 0 means invalid), the source port, the destination port,
the packet length and the number of packets in the trace with these
parameters.
To find the total traffic for a given source and destination port
pair, find every line that matches the pair (there will be one for each
packet size), multiply the packet size by the number of packets on each
of those lines, and add the products together.
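The summation above can be automated. This is a minimal sketch, assuming the crl_portsummary output has been saved with one whitespace-separated record per line, in the field order listed above: interface, protocol, ports-valid flag, source port, destination port, packet length, packet count. The exact output layout, and whether it contains header lines, is an assumption, so any line that does not fit is skipped. The function name total_traffic is hypothetical.

```python
def total_traffic(lines, sport, dport):
    """Total bytes for one (source port, destination port) pair.

    Assumes whitespace-separated numeric fields in this order: interface,
    protocol, ports-valid flag, source port, destination port, packet
    length, packet count. Other lines are skipped.
    """
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 7 or not all(f.isdigit() for f in fields):
            continue  # header, comment or otherwise unexpected line
        _iface, _proto, valid, sp, dp, length, count = map(int, fields)
        # Only count records whose port fields are valid for the protocol.
        if valid == 1 and sp == sport and dp == dport:
            total += length * count
    return total
```

For example, total_traffic(open("summary.txt"), 80, 1234) would sum all port 80 to port 1234 traffic over every interface and packet size; summary.txt is a hypothetical file name.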
crl_print
Prints ATM cells in hex and ASCII. ATM headers are expanded to a more
readable form, unless the "-r" (raw) option is specified.
crl_sample
An example program. It has a few more comments than normal!
Scans the trace file and, at the end of the file, prints the number of
IP packets and the number of data bytes transferred in IP packets. It
then prints the number of TCP packets and the number of IP data bytes
in the IP packets that carried TCP. All byte counts are IP payload byte
counts: the IP header is not included, but the TCP (or other
transport-protocol) header is, because it is part of the IP payload.
It also prints the number of TCP acks, TCP pushes, number of packets
per protocol, and the number of bytes per protocol.
Finally the number of packets and the byte count is printed for each
protocol carried inside an IP packet (i.e. based on the IP protocol
field.) This, of course, includes TCP, UDP, ICMP etc.
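To make the byte-count convention concrete: the IP payload length is the IPv4 Total Length field minus the header length (the IHL field times 4), so a TCP header counts toward the payload. The Python sketch below extracts these values from the first bytes of an IPv4 header and accumulates per-protocol packet and payload-byte counts. The function names are hypothetical and this is not crl_sample's code.

```python
from collections import defaultdict


def ip_summary(header):
    """(protocol, total_length, header_length, payload_length) for an IPv4 header."""
    if len(header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    version = header[0] >> 4
    if version != 4:
        raise ValueError("not an IPv4 header")
    header_length = (header[0] & 0x0F) * 4             # IHL is in 32-bit words
    total_length = int.from_bytes(header[2:4], "big")  # includes the IP header
    protocol = header[9]                               # e.g. 6 = TCP, 17 = UDP
    return protocol, total_length, header_length, total_length - header_length


def per_protocol(headers):
    """{protocol: [packets, IP payload bytes]}; transport headers are payload."""
    stats = defaultdict(lambda: [0, 0])
    for header in headers:
        protocol, _total, _hlen, payload = ip_summary(header)
        stats[protocol][0] += 1
        stats[protocol][1] += payload
    return dict(stats)
```

A 60-byte TCP packet with a 20-byte IP header therefore contributes 40 payload bytes, of which at least 20 are the TCP header.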
crl_skel
This program is intended to form an outline for your own coral
programs. It is a generic C skeleton for libcoral-based ATM cell
analysis applications. As provided, it expects the standard coral
command line arguments and prints the cells it receives.
crl_timestamps
Prints one line per cell in the trace. The first column of each line is
the interface of the cell, followed by the absolute timestamp of that
cell and the time since the previous cell on the same interface.
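The per-interface delta in the third column can be computed by remembering the last timestamp seen on each interface. A minimal Python sketch over hypothetical (interface, timestamp) pairs. Returning None for the first cell on an interface is an assumption, since what crl_timestamps prints in that case is not described here.

```python
def timestamp_deltas(cells):
    """Yield (interface, timestamp, delta) for each (interface, timestamp).

    delta is the time since the previous cell on the same interface, or
    None for the first cell seen on that interface.
    """
    last_seen = {}
    for iface, ts in cells:
        prev = last_seen.get(iface)
        yield iface, ts, (None if prev is None else ts - prev)
        last_seen[iface] = ts
```

Keying on the interface matters because cells from the two directions are interleaved in the trace. A single "previous cell" would mix the two directions.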
crl_toascii
Converts a CoralReef format file (or any format that CoralReef
recognises) to ASCII.
The output contains: the interface, the total number of packets
processed so far, the time (in seconds), the time since the previous
packet, the IP source, the IP destination, the number of data bytes
carried in the IP packet, the source port and the destination port.
If the protocol is not TCP or UDP the source and destination ports are
(0, 0).
Note: Coral-3.0 has a bug in this perl script. The call to the
method config_get should be config_options.
crl_tofr
Converts the input file to fr trace format.
Note: Coral-3.0 has a bug in this perl script. The call to the
method config_get should be config_options.
crl_tofr+
Converts the input file to fr trace format.
Note: Coral-3.0 has a bug in this perl script. The call to the
method config_get should be config_options.
crl_tos
Prints the percentage of packets with each value of the IP TOS field,
for each interface and for both interfaces combined.
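A minimal Python sketch of the per-interface and combined percentages, over hypothetical (interface, tos) pairs. The function name and the use of the key "all" for the combined figures are illustrative assumptions, not crl_tos's output format.

```python
from collections import Counter


def tos_percentages(packets):
    """Percentage of packets per TOS value, per interface and combined."""
    per_iface = {}
    combined = Counter()
    for iface, tos in packets:
        per_iface.setdefault(iface, Counter())[tos] += 1
        combined[tos] += 1

    def to_pct(counter):
        total = sum(counter.values())
        return {tos: 100.0 * n / total for tos, n in sorted(counter.items())}

    result = {iface: to_pct(c) for iface, c in per_iface.items()}
    result["all"] = to_pct(combined)  # both interfaces together
    return result
```

The combined figures are computed from the pooled counts, not by averaging the two per-interface percentages. Averaging would weight a lightly loaded direction too heavily.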
vpvc
Collects information about the VP/VC (virtual path/virtual channel)
traffic in the trace. For each vp:vc it reports the number of packets,
number of bytes and number of non-IP packets.
The output concludes with summary information for the hash table,
which is probably not of much interest unless you have very large
traces that are slow to analyse.
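Aggregating by vp:vc is a keyed count, which maps naturally onto a Python dictionary (itself a hash table). A minimal sketch over hypothetical (vpi, vci, length, is_ip) records; the record layout and function names are assumptions, not the program's internals.

```python
from collections import defaultdict


def vpvc_summary(records):
    """Aggregate (vpi, vci, length, is_ip) records per vp:vc."""
    table = defaultdict(lambda: {"packets": 0, "bytes": 0, "non_ip": 0})
    for vpi, vci, length, is_ip in records:
        entry = table[(vpi, vci)]
        entry["packets"] += 1
        entry["bytes"] += length
        if not is_ip:
            entry["non_ip"] += 1
    return dict(table)


def print_vpvc(table):
    """One line per vp:vc: packets, bytes, non-IP packets."""
    for (vpi, vci), e in sorted(table.items()):
        print(f"{vpi}:{vci}\t{e['packets']}\t{e['bytes']}\t{e['non_ip']}")
```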
Notes:
1. Input data
While this assignment sheet introduces CoralReef from the perspective
of trace files, the programs also work in real time, directly on an
OCxMON monitor that is collecting data from a live network.
The programs described above can be used directly, or the data may be
captured to a trace file using crl_trace.
2. Producing Graphs
Several of the coral programs produce output in a form suitable for
producing graphs. For example, the following steps could be used to
produce a graph of packet size against cumulative bytes:
crl_hist ../../ODU-926843183.crl.enc.gz | awk '{ print $1 " " $5 }' > /tmp/p
edit /tmp/p so that it contains the data for a single interface
start gnuplot and, at its prompt, enter: plot "/tmp/p" with linespoints
3. IP Addresses
Many (probably all) of the traces that are publicly available,
including the NLANR traces, have the addresses encoded to protect
the anonymity of the network users. This restricts the usefulness of
some of the applications, such as crl_hostmatrix.