



An analysis by Maurizio Dusi (Università degli Studi di Brescia, Brescia, Italy) and Wolfgang John (Chalmers University of Technology, Gothenburg, Sweden). For more information, contact {mdusi,johnwolf}@caida.org.
The paper Estimating Routing Symmetry on Single Links by Passive Flow Measurements will be presented at the 1st International Workshop on TRaffic Analysis and Classification (TRAC), colocated with the 6th International Wireless Communications & Mobile Computing Conference (IWCMC 2010) in June 2010.
Introduction
Many share a naive assumption about the Internet that traffic on a given link is approximately symmetric, meaning that both directions of a conversation flow across the same physical link. Many developers even embed this assumption in their traffic classification tools [1, 2]. In fact, except at network edges, Internet traffic is often routed asymmetrically [3], which will impair or invalidate the results of tools and models that assume otherwise.
An important cause of this asymmetry is "hot-potato routing" [4], the business practice of configuring traffic crossing one's network to exit as soon as possible, minimizing resource consumption, and thus cost, of one's own infrastructure. Particularly common in commercial settlement-free peering agreements, hot-potato routing implies that the network on the receiving side of a packet will bear higher cost per received packet. The underlying assumption is that if both networks in a settlement-free peering agreement follow this practice, it will even out, and both sides will share evenly in carrying traffic exchanged by their customers.
Another cause of asymmetric traffic is link redundancy, or alternative paths within networks. Since routing decisions occur independently for each packet, load-balancing algorithms may cause packets destined to the same endpoint to follow different paths. Other traffic engineering techniques, e.g., policy-based SPF (Shortest Path First), may also induce asymmetry in internal routing state of large provider networks.
Passive measurements
To quantify traffic symmetry on some real Internet data, we consider traffic samples collected from different types of Internet links: a link inside a Tier2 network, the link between a Tier2 and a Tier1 network, and two Tier1 ISP backbone links. The first two are OC192 links in Swedish networks: GigaSUNET, operative until 2006, and OptoSUNET's connection to NORDUnet. The last two are OC192 backbone links from a single Tier1 ISP in the U.S. Table 1 and Figures 1a-1c list characteristics of each trace; we provide more details in the Data section.
For a given time interval, we say that the five-tuple defining a flow (source and destination IP, port numbers and protocol) generates symmetric traffic on the link if we observe packets in both directions of the IP address/port pair via the specific transport protocol. We quantify the fraction of symmetric traffic according to three different granularities: five-tuple (or "flow"), packets, and bytes. For example, the packet-level symmetry is the fraction of packets sent by tuples exchanging bidirectional traffic.
Table 1: Characteristics of each trace.
data | length | flows | pkt/s | bytes/s | network location | geogr. location |
gigasunet 2006-04 | 2x20mins | 3.8M | 154Kp/s | 104MB/s | Tier2 backbone | Sweden |
optosunet 2009-01 | 4x10mins | 36M | 368Kp/s | 217MB/s | Tier2-Tier1 connection | Sweden |
eq-chicago 2008-04 | 1x1hour | 119M | 717Kp/s | 496MB/s | Tier1 backbone | Illinois-Washington |
eq-chicago 2008-05 | 1x1hour | 134M | 936Kp/s | 762MB/s | Tier1 backbone | Illinois-Washington |
eq-chicago 2008-06 | 1x1hour | 173M | 1111Kp/s | 880MB/s | Tier1 backbone | Illinois-Washington |
eq-sanjose 2008-07 | 1x1hour | 145M | 680Kp/s | 376MB/s | Tier1 backbone | California |
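The symmetry definition given earlier in this section can be illustrated with a minimal Python sketch. It is not the authors' tooling: the packet record layout, the function names, and the choice to count a bidirectional conversation as a single tuple are assumptions made for illustration.

```python
from collections import defaultdict


def canonical(src, sport, dst, dport, proto):
    """Return a direction-independent key for a five-tuple, plus the
    direction (0 or 1) this particular packet travelled in."""
    a, b = (src, sport), (dst, dport)
    if a <= b:
        return (a, b, proto), 0
    return (b, a, proto), 1


def symmetry(packets):
    """packets: iterable of (src, sport, dst, dport, proto, nbytes),
    all observed within one time interval (hypothetical record layout).
    Returns the symmetric fraction of (tuples, packets, bytes)."""
    flows = defaultdict(lambda: {"dirs": set(), "pkts": 0, "bytes": 0})
    for src, sport, dst, dport, proto, nbytes in packets:
        key, direction = canonical(src, sport, dst, dport, proto)
        flow = flows[key]
        flow["dirs"].add(direction)
        flow["pkts"] += 1
        flow["bytes"] += nbytes
    if not flows:
        return 0.0, 0.0, 0.0  # nothing observed in this interval
    # A tuple is symmetric if packets were seen in both directions.
    sym = [f for f in flows.values() if len(f["dirs"]) == 2]
    total_pkts = sum(f["pkts"] for f in flows.values())
    total_bytes = sum(f["bytes"] for f in flows.values())
    return (
        len(sym) / len(flows),
        sum(f["pkts"] for f in sym) / total_pkts,
        sum(f["bytes"] for f in sym) / total_bytes,
    )


# Made-up example: one bidirectional TCP exchange, two one-way flows.
packets = [
    ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 100),
    ("10.0.0.2", 80, "10.0.0.1", 1234, "tcp", 1500),
    ("10.0.0.3", 5555, "10.0.0.4", 53, "udp", 60),
    ("10.0.0.5", 4444, "10.0.0.6", 22, "tcp", 40),
]
tuple_frac, pkt_frac, byte_frac = symmetry(packets)
print(tuple_frac, pkt_frac, byte_frac)
```

In this made-up input, one of three tuples, two of four packets, and 1600 of 1700 bytes are symmetric, which shows how the three granularities can diverge on the same traffic.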
Methodology
We split each traffic trace into one-minute chunks, then collate a list of the unique five-tuples observed in each direction. For each tuple, we track the packets and bytes exchanged across the link. A short window introduces border effects: long-lasting flows may span many intervals, and short symmetric flows may appear asymmetric if their packet exchange straddles a window boundary. To evaluate the impact of such border effects, we repeated the analysis with observation windows of five and ten minutes. Larger windows, however, increase the probability of observing multiple distinct flows with an identical tuple within one interval, although such overlap occurred only twice within 10 minutes in our data sets.
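The windowing step and its border effect can be sketched as follows. This is an illustrative sketch only: the timestamped record layout, the function name, and the use of the population standard deviation are assumptions, since the text does not specify them.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tuple_symmetry_per_window(packets, window_s=60):
    """packets: iterable of (timestamp_s, src, sport, dst, dport, proto)
    (hypothetical record layout). Returns a dict mapping each window
    index to the fraction of tuples seen in both directions."""
    windows = defaultdict(lambda: defaultdict(set))
    for ts, src, sport, dst, dport, proto in packets:
        a, b = (src, sport), (dst, dport)
        key = (min(a, b), max(a, b), proto)  # direction-independent key
        direction = 0 if a <= b else 1
        windows[int(ts // window_s)][key].add(direction)
    return {
        w: sum(len(dirs) == 2 for dirs in flows.values()) / len(flows)
        for w, flows in sorted(windows.items())
    }


# Made-up example: the second exchange straddles the 60-second boundary.
pkts = [
    (10, "10.0.0.1", 1111, "10.0.0.2", 80, "tcp"),
    (11, "10.0.0.2", 80, "10.0.0.1", 1111, "tcp"),
    (59, "10.0.0.3", 2222, "10.0.0.2", 80, "tcp"),
    (61, "10.0.0.2", 80, "10.0.0.3", 2222, "tcp"),
]
one_min = tuple_symmetry_per_window(pkts, window_s=60)
five_min = tuple_symmetry_per_window(pkts, window_s=300)
print(one_min, five_min)
# Average and spread over the one-minute windows.
print(mean(one_min.values()), pstdev(one_min.values()))
```

With one-minute windows, the exchange at 59 s and 61 s is split across two windows and counted as asymmetric in both; with a five-minute window both exchanges are counted as symmetric.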
Figures 2a-2c plot the percentage of symmetric IP traffic for each trace, in terms of tuples (left), packets (center) and bytes (right) exchanged on the link. Table 2 reports the corresponding values (average percentage and standard deviation).
On the GigaSUNET link, inside a Tier2 network, most traffic (around 55% of tuples, 67% of packets and 74% of bytes) is routed symmetrically. The asymmetric traffic is caused by hot-potato routing, due to peering with a regional access point. Another reason for asymmetry is the ring architecture, which does not guarantee that traffic always transits via the shortest path (which may include the tapped link).
On the OptoSUNET link, which connects a Tier2 to a Tier1 network, only about 7% of tuples generate symmetric traffic; these tuples are responsible for 26% of packets and 33% of bytes on the link. Asymmetry on this link can be explained by an alternative (redundant) route (40Gbit/s) between SUNET and NORDUnet. Some SUNET customers are also connected to regional exchange points, so hot-potato routing is also a likely cause.
On the two Tier1 backbone links, hot-potato routing with high aggregation leads to high asymmetry: only around 2-3% of tuples generate traffic routed symmetrically, representing 8-10% of packets and 9-10% of bytes on the observed links.
Table 2: Percentage of symmetric IP traffic (average ± standard deviation).
granularity | time window | gigasunet 2006-04 | optosunet 2009-01 | eq-chicago 2008-04 | eq-chicago 2008-05 | eq-chicago 2008-06 | eq-sanjose 2008-07 |
tuples | 1min | 57.23 ± 3.82 | 7.42 ± 0.69 | 2.39 ± 0.19 | 2.76 ± 0.11 | 2.22 ± 0.04 | 2.86 ± 0.06 |
tuples | 5min | 55.06 ± 3.57 | 7.17 ± 0.67 | 2.38 ± 0.05 | 2.73 ± 0.05 | 2.15 ± 0.02 | 2.89 ± 0.05 |
tuples | 10min | 54.33 ± 3.64 | 7.15 ± 0.70 | 2.40 ± 0.05 | 2.75 ± 0.05 | 2.15 ± 0.02 | 2.91 ± 0.03 |
packets | 1min | 67.17 ± 5.70 | 26.30 ± 2.25 | 8.13 ± 0.49 | 8.88 ± 0.45 | 10.06 ± 0.52 | 8.93 ± 0.57 |
packets | 5min | 67.23 ± 5.96 | 26.31 ± 2.30 | 8.19 ± 0.15 | 8.90 ± 0.28 | 10.07 ± 0.43 | 8.96 ± 0.49 |
packets | 10min | 67.24 ± 6.43 | 26.32 ± 2.47 | 8.20 ± 0.13 | 8.91 ± 0.26 | 10.07 ± 0.39 | 8.97 ± 0.42 |
bytes | 1min | 74.02 ± 2.25 | 33.83 ± 3.27 | 9.09 ± 0.52 | 8.68 ± 0.37 | 10.43 ± 0.65 | 10.30 ± 0.80 |
bytes | 5min | 74.06 ± 2.00 | 33.81 ± 3.17 | 9.13 ± 0.18 | 8.69 ± 0.21 | 10.42 ± 0.56 | 10.31 ± 0.67 |
bytes | 10min | 74.06 ± 1.97 | 33.81 ± 3.40 | 9.14 ± 0.16 | 8.69 ± 0.14 | 10.42 ± 0.52 | 10.33 ± 0.56 |
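Figures 2a-2c: Percentage of symmetric IP traffic for each trace, in terms of tuples (left), packets (center) and bytes (right).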
A more refined definition of asymmetry would filter out traffic that is inherently asymmetric, such as UDP and ICMP, protocols which do not always require packet recipients to reply. Another cause of traffic asymmetry overestimation is background radiation, such as network scanning and probing, which can be a substantial fraction of total flows on some links [5]. Would-be attackers typically execute scans using short unidirectional one-packet flows, asymmetric by definition, trying to identify vulnerable active hosts on a network. If the scanned endpoint does not exist, typically no response is sent, depending on host and firewall configurations.
To evaluate the impact of such activity on our traffic symmetry estimates, we repeated our analysis considering TCP traffic only, and then further limited the analysis to TCP data traffic. For the first pass, we filtered out ICMP traffic, which includes some scanning activity, together with UDP traffic (Figures 1a-1c show traffic percentages for the protocols we filtered). For the second pass, we also discarded TCP sessions for which we did not observe a successful three-way handshake, which filters out SYN flood attacks and other unsuccessful connection attempts. We implemented this filter by considering only TCP packets with the ACK bit set and none of the SYN, FIN or RST flags set. It removed around 20-30% of TCP flows, 2-10% of TCP packets and less than 1% of TCP bytes from the traffic traces. These numbers provide a reasonable estimate of the amount of TCP background radiation on the links measured (see Table 3 for more details).
Table 3: TCP traffic removed by the TCP data filter.
data | % of TCP flows | % of TCP packets | % of TCP bytes |
gigasunet 2006-04 | 29.85% | 9.83% | 0.58% |
optosunet 2009-01 | 31.14% | 2.06% | 0.22% |
eq-chicago 2008-04 | 19.19% | 5.60% | 0.51% |
eq-chicago 2008-05 | 23.62% | 4.31% | 0.34% |
eq-chicago 2008-06 | 21.77% | 3.85% | 0.32% |
eq-sanjose 2008-07 | 25.27% | 8.04% | 0.83% |
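The TCP data filter described above (ACK set, none of SYN, FIN or RST set) can be expressed as a small predicate over the TCP flags byte. The sketch below is a hedged illustration: the flag bit values are the standard TCP header bits, while the record layout, the function names, and the sample addresses are assumptions and not the authors' code.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def is_tcp_data_packet(flags):
    """True if the ACK bit is set and none of SYN, FIN, RST is set."""
    return bool(flags & ACK) and not (flags & (SYN | FIN | RST))


def filter_tcp_data(packets):
    """packets: iterable of (five_tuple, tcp_flags, nbytes)
    (hypothetical record layout). Keeps only TCP data packets, so a
    tuple that never got past connection setup disappears entirely."""
    return [p for p in packets if is_tcp_data_packet(p[1])]


# Made-up example: a scan probe plus one established connection.
scan = ("10.0.0.9", 40000, "10.0.0.2", 22, "tcp")
fwd = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
rev = ("10.0.0.2", 80, "10.0.0.1", 1234, "tcp")
packets = [
    (scan, SYN, 40),        # unanswered probe: dropped
    (fwd, SYN, 40),         # handshake: dropped
    (rev, SYN | ACK, 40),   # handshake: dropped
    (fwd, ACK, 40),         # kept
    (fwd, PSH | ACK, 500),  # kept
    (rev, FIN | ACK, 40),   # teardown: dropped
]
kept = filter_tcp_data(packets)
print(len(kept), sum(nbytes for _, _, nbytes in kept))
```

Because handshake and teardown segments carry little or no payload, a filter of this kind removes many tuples but few bytes, which is consistent with the proportions reported in Table 3.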
Figures 3 and 4 plot the percentage of symmetric traffic for each trace when considering only TCP traffic and only TCP data traffic, respectively. Tables 4 and 5 report the corresponding values (average percentage and standard deviation). Unsurprisingly, the fraction of tuples that generate symmetric traffic is greater than when considering all IP traffic, although the packet and byte values barely change, consistent with the fact that we primarily filtered out signaling traffic without significant payload.
Table 4: Percentage of symmetric TCP traffic (average ± standard deviation).
granularity | time window | gigasunet 2006-04 | optosunet 2009-01 | eq-chicago 2008-04 | eq-chicago 2008-05 | eq-chicago 2008-06 | eq-sanjose 2008-07 |
tuples | 1min | 59.70 ± 5.09 | 9.26 ± 0.74 | 3.39 ± 0.27 | 4.34 ± 0.21 | 3.91 ± 0.11 | 3.17 ± 0.09 |
tuples | 5min | 55.92 ± 5.08 | 8.48 ± 0.70 | 3.54 ± 0.09 | 4.49 ± 0.19 | 4.08 ± 0.07 | 3.23 ± 0.06 |
tuples | 10min | 54.91 ± 4.72 | 8.33 ± 0.71 | 3.58 ± 0.07 | 4.61 ± 0.12 | 4.11 ± 0.07 | 3.26 ± 0.03 |
packets | 1min | 67.19 ± 5.90 | 25.98 ± 1.85 | 8.90 ± 0.53 | 9.80 ± 0.41 | 11.28 ± 0.59 | 9.07 ± 0.66 |
packets | 5min | 67.22 ± 6.15 | 25.98 ± 1.84 | 8.96 ± 0.15 | 9.84 ± 0.21 | 11.29 ± 0.49 | 9.08 ± 0.57 |
packets | 10min | 67.23 ± 6.64 | 25.99 ± 1.97 | 8.97 ± 0.14 | 9.83 ± 0.14 | 11.30 ± 0.45 | 9.10 ± 0.50 |
bytes | 1min | 74.59 ± 2.57 | 34.51 ± 2.75 | 9.54 ± 0.54 | 9.02 ± 0.38 | 10.91 ± 0.68 | 10.96 ± 0.85 |
bytes | 5min | 74.62 ± 2.38 | 34.48 ± 2.54 | 9.58 ± 0.18 | 9.01 ± 0.24 | 10.90 ± 0.59 | 10.96 ± 0.70 |
bytes | 10min | 74.62 ± 2.40 | 34.47 ± 2.67 | 9.59 ± 0.16 | 9.03 ± 0.14 | 10.90 ± 0.55 | 10.98 ± 0.58 |
Figure 3: Percentage of symmetric TCP traffic for each trace.
Table 5: Percentage of symmetric TCP data traffic (average ± standard deviation).
granularity | time window | gigasunet 2006-04 | optosunet 2009-01 | eq-chicago 2008-04 | eq-chicago 2008-05 | eq-chicago 2008-06 | eq-sanjose 2008-07 |
tuples | 1min | 70.71 ± 2.86 | 10.94 ± 1.04 | 3.85 ± 0.29 | 4.99 ± 0.25 | 4.18 ± 0.10 | 3.87 ± 0.12 |
tuples | 5min | 69.01 ± 2.88 | 10.53 ± 1.08 | 4.00 ± 0.10 | 5.36 ± 0.09 | 4.41 ± 0.08 | 4.05 ± 0.09 |
tuples | 10min | 68.61 ± 2.97 | 10.46 ± 1.17 | 4.03 ± 0.08 | 5.48 ± 0.08 | 4.48 ± 0.06 | 4.12 ± 0.07 |
packets | 1min | 75.21 ± 2.77 | 26.21 ± 1.89 | 9.01 ± 0.54 | 9.85 ± 0.42 | 11.40 ± 0.62 | 9.31 ± 0.72 |
packets | 5min | 75.24 ± 2.76 | 26.21 ± 1.88 | 9.07 ± 0.16 | 9.88 ± 0.22 | 11.40 ± 0.52 | 9.32 ± 0.62 |
packets | 10min | 75.24 ± 2.90 | 26.21 ± 2.02 | 9.08 ± 0.15 | 9.89 ± 0.15 | 11.41 ± 0.47 | 9.34 ± 0.54 |
bytes | 1min | 75.11 ± 2.97 | 34.55 ± 2.76 | 9.55 ± 0.54 | 9.02 ± 0.39 | 10.92 ± 0.69 | 10.99 ± 0.86 |
bytes | 5min | 75.13 ± 2.86 | 34.52 ± 2.54 | 9.60 ± 0.18 | 9.03 ± 0.22 | 10.91 ± 0.59 | 11.00 ± 0.71 |
bytes | 10min | 75.13 ± 2.94 | 34.51 ± 2.68 | 9.60 ± 0.16 | 9.03 ± 0.14 | 10.90 ± 0.55 | 11.01 ± 0.59 |
Figure 4: Percentage of symmetric TCP data traffic for each trace.
Conclusions
We have provided some insight into asymmetric routing on a variety of links where we could obtain traffic samples: inside a Tier2 network, between a Tier2 and a Tier1 network, and on two Tier1 backbone links. As we move from the network edge to core Internet backbone links which heavily aggregate flows, traffic asymmetry due to hot-potato routing increases, reducing the likelihood of being able to observe and study complete, bidirectional flows on a single link. Our quantitative metrics of symmetry use the notion of flow tuples, defined as (src IP, dst IP, transport protocol, src port, dest port), which are unlikely to overlap especially for short observation windows. Longer observation windows do not significantly change the value of the symmetry metrics, as shown in Tables 2 and 4-5. Across time and space, traffic asymmetry is actually a stable equilibrium for many links on the modern Internet. Unless intended only for stub access links with no path diversity, traffic analysis tools and methods must thus assume little traffic symmetry.
We also showed that inherently asymmetric traffic, such as UDP, ICMP and TCP background radiation, affects symmetry assessments when defined in terms of tuples. Filtering out such traffic increases the fraction of symmetric tuples, although it has little effect on packet and byte symmetry factors. Such a normalized symmetry metric allows fair comparison of symmetry across different links with substantially different traffic composition. It is appropriate to judge symmetry considering only TCP data traffic, since TCP is not only an inherently symmetric protocol but also the dominant transport protocol on most modern Internet links (see Figures 1a-1c and [6, 7]), although this may be changing [8].
References
- [1] A. W. Moore and D. Zuev, Internet traffic classification using bayesian analysis techniques, SIGMETRICS Perform. Eval. Rev., vol. 33(1), 2005.
- [2] L. Bernaille, R. Teixeira, and K. Salamatian, Early Application Identification, in the 2nd ADETTI/ISCTE CoNEXT Conference, 2006.
- [3] W. John and S. Tafvelin, Differences between in- and outbound Internet Backbone Traffic, in Proceedings of Terena Networking Conference, 2007.
- [4] http://en.wikipedia.org/wiki/Hot_potato_routing
- [5] W. John, S. Tafvelin, and T. Olovsson, Trends and Differences in connection-behavior within classes of Internet Backbone Traffic, in Proceedings of Passive and Active Measurement Conference, 2008.
- [6] W. John and S. Tafvelin, Analysis of Internet Backbone Traffic and Header Anomalies observed, in Proceedings of Internet Measurement Conference, 2007.
- [7] C. Fraleigh et al., Packet-level traffic measurements from the sprint IP backbone, Network, IEEE, vol. 17(6), 2003.
- [8] https://www.caida.org/research/traffic-analysis/classification-overview
Appendix: Data
The data from GigaSUNET was collected on the outside point of an SDH ring running Packet over SONET (PoS), which is the primary link from the region of Gothenburg to the main Internet outside Sweden. Connected sites include two major universities, a large student residential network, several smaller University Colleges and research institutes, and a regional access point exchanging traffic with commercial, local ISPs. The GigaSUNET dataset analyzed includes two traces of 20 minutes, collected on April 13 and 17, 2006, at different times of day. The traces consist of 370 million IP packets, carrying 250 GBytes of data in 3.8 million flows.
The former ring architecture has been upgraded to OptoSUNET, a star structure over leased fibre. SUNET customers (all Swedish universities, many student networks and research institutes) are redundantly connected to a central Internet access point in Stockholm. Besides some local exchange traffic, the traffic routed to the international commodity Internet is carried on two links (40Gbit/s and 10Gbit/s) between SUNET and NORDUnet (the connection network for the Nordic NRENs, peering with Tier1 networks). The data set, collected on the 10Gbit/s PoS link, includes four traces of 10 minutes, collected between January 5 and 11, 2009, at different times of day. In total, the OptoSUNET traces consist of 884 million IP packets, carrying 521 GBytes of data in 36 million flows.
The two OC192 backbone links are operated by a commercial ISP in the United States. The first link connects Chicago and Seattle, monitored at an Equinix datacenter in Chicago. The dataset analyzed includes 3 hours of traffic from April to June 2008: April 30 (5-6pm), May 15 (1-2pm), and June 19 (1-2pm). In total, the traces consist of 9950 million IP packets, carrying 7699 GBytes of data in 426 million flows. The other link connects San Jose and Los Angeles, monitored at an Equinix datacenter in San Jose. The dataset analyzed includes 1 hour of traffic (1-2pm) on July 17, 2008, with 2450 million IP packets carrying 1354 GBytes of data in 145 million flows.