



Mission Statement: CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:
- are macroscopic in nature and provide enhanced insight into the function of Internet infrastructure worldwide,
- improve the integrity of the field of Internet science,
- improve the integrity of operational Internet measurement and management,
- inform science, technology, and communications public policies.
Executive Summary
This annual report covers CAIDA's activities in 2006, summarizing highlights from our research, infrastructure, and outreach activities. Our current research projects, primarily funded by the U.S. National Science Foundation (NSF), include several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming system. Our infrastructure activities, funded by NSF and DHS as well as other government and industry sources, include building a catalog of Internet measurement data sets, contributing to the (DHS-funded) PREDICT repository of datasets to support the (U.S.-based) network research community, and developing and deploying active and passive measurement infrastructure that cost-effectively supports the global Internet research community. We also lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Finally, we engage in a variety of outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. CAIDA's program plan for 2007-2010 will be available in July 2007.
Background and recent history
For just over ten years CAIDA has undertaken various approaches to narrowing a gap that now impedes the field of network research as well as telecommunications policy: a dearth of available empirical data on the public Internet since the infrastructure was privatized. We have managed to grow and maintain a research group through increasingly difficult science funding periods, yet over the last decade our ability to get data from the commercial Internet has gradually diminished, as has the quality of science in the field of Internet research.
As an Internet data analysis and research group largely supported with public funding to apply measurement and analysis toward understanding and solving globally relevant Internet engineering problems, we accept a responsibility to seek, analyze, and communicate the salient features of the best available data about the Internet. And of course, we want better answers to the question graduate students often ask: "What Internet research problems are important to work on?"
In 2003, frustrated with both the lack of access to data and the lack of progress in the Internet research community on a number of important problems in the last ten years, we began a survey to find answers to three questions: (1) What are the top operational and engineering problems in the Internet?; (2) Are we making measurable progress toward solving them?; (3) For problems that we're not making progress on, what is blocking progress?
These problems span four dimensions of the Internet as emerging critical infrastructure: safety, scalability, sustainability, and stewardship. Measurement made the list of problems, which was no surprise. At CAIDA we have experienced how difficult measurement of operational infrastructure has been, but the biggest problems with measurement had long been economic (cost of instrumentation and data management), ownership (legal access to data), and trust (privacy and security obstacles to measurement). More surprising was that measurement was not unique in this regard: all persistently unsolved operational problems of the Internet are similarly blocked on issues of economics, ownership, and trust (EOT).
Expansion of research agenda into policy and economics
This lesson meant a change in strategy for CAIDA. We recognize it is no longer appropriate to pursue solutions to the Internet's problems without tackling the EOT issues. CAIDA's activities have spanned the four S's (security, scalability, sustainability, stewardship) for years, but we have begun to re-focus current projects, and pursue new projects, that specifically recognize and openly navigate the economic and policy issues via activities that cross the boundaries between technology and policy. The table below lists examples of how we have expanded our horizons in the last couple of years. Most current projects will continue into 2007.
In addition to | we expanded our efforts to |
---|---|
collecting topology for use in modeling and simulation activities, | use the same data to provide a current ranking of Internet service providers according to sample observations of AS topology coverage (ASrank). |
launching DatCat, for use by researchers to index data sets into a common catalog, | reach out to policymakers to show them how such a catalog could support informed policy making. |
upgrading our measurement infrastructure and data curation tools to keep pace with technology changes and research community needs, | propose a project to provide otherwise non-existent incentive for operational networks to use these tools to contribute network measurements to the research community. |
simulating routing protocols that could scale to billions of nodes, | simulate IPv4 address consumption scenarios for ARIN to support immediate policy needs. |
providing the largest simultaneous collection of traffic to DNS root servers in support of the research community, | do macroscopic surveys of the health of the current DNS, and participate on two of ICANN's subcommittees (SSAC and RSSAC). |
If you have comments or questions, please do not hesitate to contact us at info at caida dot org.
Research Projects
CAIDA's research projects span numerous domains related to network science and engineering. We hope to make a tangible impact on the field of (Inter)networking research by providing researchers with increased access to real world data sets for validation of scientific models and the promotion of reproducible science.
Domain Name System (DNS)
Improving the Integrity of Domain Name System (DNS) Monitoring and Protection
Goals
The main goal of our DNS project is to supply the research community with the best available, operationally relevant and methodologically sound DNS measurement data. We also develop tools, models, and analysis methodologies for use by DNS operators and researchers.
Activities
The DNS project spent its 2nd year focused on acquiring operationally relevant DNS data and making that data available to the research community.
- DNS Root Servers Trace Collection
We conducted two large scale experiments collecting data from the DNS root servers. In September 2005, anycast instances of the C, E, and F-root servers participated in the first "Day in the Life of the DNS Root Servers" data collection event. The second collection took place in January 2006 and involved anycast instances of the C, F, and K root servers. The total volume of data (in compressed format) is 360 GB. This unique data set represents the most comprehensive measurements of the root servers and affords researchers an unprecedented insight into the specifics of root server operations.
Analysis of the collected DNS traffic showed that the current method for limiting the catchment areas of local instances appears to be successful. To determine whether clients actually use the instance closest to them, we examined client locations for each root instance, and the geographic distances between a server and its clients. We found that frequently the choice, entirely determined by BGP routing, is not the geographically closest one. Since additional distance between a client and a server causes higher latency, such clients might benefit from examination and optimization of routing for their local DNS root server instances.
Instance selection by BGP is highly stable. Over a two-day period, fewer than 2% of C-root and F-root clients and fewer than 5% of K-root clients experienced a change in the anycast instance they were using. Most DNS traffic still uses the stateless UDP protocol, so the vast majority of clients would not be harmed by such changes, apart from the unavoidable delay created by BGP convergence. However, emerging DNS technologies to support security and other extensions to the original DNS architecture will require the use of TCP (stateful) connections, which could encounter problems in the face of anycast instability.
Overall, anycasting the DNS root nameservers not only extended the architectural limit of thirteen (13) DNS root servers, but also appears to work well to improve the global robustness, resilience, and performance of the DNS.
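As an illustration of the stability measure quoted above, the share of clients that changed anycast instance can be computed from per-query observations. The following is a minimal sketch, not the analysis code used for the study; the record layout (client address, instance name) and the instance names are assumptions made for the example.

```python
from collections import defaultdict

def fraction_switching(observations):
    """Return the share of clients seen at more than one anycast instance.

    observations: iterable of (client_ip, instance_name) pairs.
    The pair layout is a hypothetical simplification of a packet trace.
    """
    instances_by_client = defaultdict(set)
    for client, instance in observations:
        instances_by_client[client].add(instance)
    if not instances_by_client:
        return 0.0
    switched = sum(1 for seen in instances_by_client.values() if len(seen) > 1)
    return switched / len(instances_by_client)

# Hypothetical sample: four clients, one of which moves between instances.
sample = [
    ("192.0.2.1", "instance-a"),
    ("192.0.2.1", "instance-a"),
    ("192.0.2.2", "instance-a"),
    ("192.0.2.2", "instance-b"),  # this client switched instances
    ("192.0.2.3", "instance-b"),
    ("192.0.2.4", "instance-b"),
]
print(fraction_switching(sample))
```

A client counts as having switched if it appears at two or more instances at any point in the trace, regardless of how many queries it sent.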
- Influence map of DNS root servers
Based on available data from anycast instances of the C, F, and K root servers, we developed a visualization that illustrates the geographical diversity of DNS root server clients and differences in deployment strategies of different root server nodes.
The visualization consists of two world maps for each root, both of which project the world as viewed from the North Pole. Each individual anycast instance is placed on the map at the 'center of influence' of its observed clients. To determine this location, we consider each of a server's clients as a point with mass=1 and use those client locations to compute the centroid. Thus, anycast root nodes that serve a primarily local clientele remain closer to their actual geographic locations on our map, while nodes that tend to serve a more geographically distant client set are displaced on our maps toward the regions where they have the most clients. In both maps, the location of each node reflects the centroid (center of influence) of its clients' geographic locations rather than the actual geographic location of the server.
Wedges fanning out from each server in the larger Location Map indicate the direction, distance, and number of clients observed within the bounding angle of the wedge. A smaller Displacement Map depicts servers whose actual location and centroid are markedly different.
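The "center of influence" computation can be sketched as follows. The report does not specify how the centroid is taken over geographic coordinates, so this example assumes one common approach: place each client on a unit sphere with mass 1, average the 3-D positions, and project the mean back to latitude and longitude. The function name is hypothetical.

```python
import math

def center_of_influence(clients):
    """Centroid of client locations, each weighted with mass 1.

    clients: iterable of (latitude_deg, longitude_deg) pairs.
    Assumption: points are averaged on a unit sphere and the mean vector
    is projected back to the surface. If the clients cancel out exactly
    (a zero mean vector), the result defaults to (0.0, 0.0).
    """
    x = y = z = 0.0
    count = 0
    for lat_deg, lon_deg in clients:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
        count += 1
    if count == 0:
        raise ValueError("at least one client location is required")
    x, y, z = x / count, y / count, z / count
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)

# Two clients on the equator, 90 degrees of longitude apart.
print(center_of_influence([(0.0, 0.0), (0.0, 90.0)]))
```

Averaging on the sphere avoids the distortion that a plain mean of latitude and longitude values would introduce near the poles or across the 180-degree meridian.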
- RFC1918 updates
We completed analysis of the properties and sources of spurious RFC1918 updates that the root servers deflect to a specially created protective system of name servers known as AS112. (The so-called RFC1918 or private addresses are intended strictly for use inside networks and should not leak to the outside world.) We conducted laboratory experiments to verify the behavior of the most recent Windows versions.
Analysis of update logs collected at two independent AS112 servers showed that leakage of DNS updates is a global network pollution problem, with update volume exceeding millions per hour. We applied three signature techniques to two short packet traces and found that over 97% of DNS updates come from various types of Windows systems and that over 99% of unique IP addresses in our traces have at least one Windows machine at or behind them. Windows 2000 accounts for the majority of the update packets, while Windows XP is more stringent in sending private DNS updates. Our long-term data did not reveal an obvious decreasing trend in RFC1918 update rates due to the evolution of operating systems. Users, software vendors, and system administrators should all take responsibility for the behavior of their systems and mitigate polluting behavior with appropriate actions.
- DNS Statistics Collector (dsc)
Under subcontract to CAIDA, Duane Wessels of The Measurement Factory developed the DNS Statistics Collector (dsc) tool during the first year of this project; dsc runs on selected instances of the C, E, and F root servers, and is used by some of the OARC members to measure their own DNS infrastructure. With support from this project, OARC is making dsc statistics from the F-root publicly available after a seven-day delay.
We also installed a dsc collector and presenter on our primary nameserver (ns1.caida.org), a single processor Pentium-4 class server running FreeBSD 5.4. It serves about 5 domains and about 40 local machines. Using the default collector settings, we monitor what types of clients our nameserver has, where those clients are located, and what types of queries the server processes.
- Open resolvers surveys
We initiated an ongoing survey that looks for open DNS resolvers. Open resolvers threaten normal DNS functioning since they allow resource squatting, are easy to poison, and can be used in widespread Distributed Denial of Service (DDoS) attacks. Through a subcontract, The Measurement Factory developed a web interface for immediate real-time testing of individually entered addresses. We probe each of our target IP addresses no more than once every three days. Since June 2006 we have archived daily reports showing the number of open resolvers for each autonomous system number. As an example of this data, the report from Friday, June 22, 2007 includes 10,948 autonomous systems with 300,511 open resolvers.
- Longitudinal DNS RTT data
Using NeTraMet traffic meters installed at various locations in the US, New Zealand, and Japan, we continued monitoring requests to, paired with responses from, root/gTLD servers generated by campus (i.e. large enterprise) networks. The resulting dataset is available for researchers. It contains information useful for evaluating performance conditions and trends on the global Internet. DNS RTTs are influenced by several factors, including remote server load, network congestion, route instability, and local effects such as link or equipment failures.
- Anycast Simulation
In collaboration with Prof. G. Riley (Georgia Tech University), we began laboratory simulations to study the effectiveness of anycast deployment for DNS root servers. The goal is to determine the impact of BGP (mainly BGP convergence) on anycast deployment as well as possible effects from scaling up anycast deployment on DNS performance.
In our experiments, we simulate a realistic AS level Internet topology using data from CAIDA and the RouteViews project. We apply BGP++'s configuration generator to convert the inferred AS relationships to configuration files for our simulation. Next we model two kinds of failures.
- Silent link failure: Silent link failures can be detected by triggering BGP holdtime timers. Although such failures could occur on any segment of the chosen best AS path, we restrict the current simulation to a single failure of the last hop AS link. In this failure mode, the client would still communicate with the same server (if it is up) through a different path.
- Explicit withdrawal of an anycast prefix by an advertising AS: This failure makes the server unreachable which would cause the client to switch to a different anycast server node in this scenario.
The Georgia Tech team has obtained initial results for the BGP-ANYCAST simulations on a small AS topology. In the link down case, the distribution of requests among server nodes changed insignificantly: the clients could still reach the same servers through other links. In the explicit prefix withdrawal case, the network quickly converged to a new state since the simulated graph was small and well-connected. The requests were re-distributed to other nodes, with only one flip for affected clients.
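The two failure modes can be illustrated with a toy model. The sketch below is not the BGP++ simulation described above: it assumes a hypothetical seven-node AS graph, ignores routing policy and convergence timing, and uses fewest AS hops (breadth-first search) as a stand-in for BGP best-path selection. All node names and helper functions are invented for the example.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) AS link pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hop_counts(graph, source):
    """Breadth-first search: AS hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def catchment(graph, servers, clients):
    """Map each client to the announcing server AS with the fewest hops.

    Ties are broken by server name; unreachable clients map to None.
    """
    dists = {s: hop_counts(graph, s) for s in servers}
    chosen = {}
    for client in clients:
        options = [(dists[s][client], s) for s in servers if client in dists[s]]
        chosen[client] = min(options)[1] if options else None
    return chosen

def without_link(graph, a, b):
    """Return a copy of the graph with the a-b link removed (link failure)."""
    return {n: {m for m in nbrs if {n, m} != {a, b}} for n, nbrs in graph.items()}

# Hypothetical topology: S1 and S2 announce the anycast prefix, T1-T3 are
# transit ASes, C1 and C2 are client ASes. S1 is multihomed to T1 and T2.
EDGES = [("S1", "T1"), ("S1", "T2"), ("S2", "T3"), ("T1", "T2"),
         ("T2", "T3"), ("C1", "T1"), ("C2", "T3")]
GRAPH = build_graph(EDGES)
CLIENTS = ["C1", "C2"]

baseline = catchment(GRAPH, ["S1", "S2"], CLIENTS)
# Silent failure of the last-hop link S1-T1: S1 stays reachable via T2.
link_down = catchment(without_link(GRAPH, "S1", "T1"), ["S1", "S2"], CLIENTS)
# Explicit withdrawal: S1 stops announcing the prefix, only S2 remains.
withdrawn = catchment(GRAPH, ["S2"], CLIENTS)

print(baseline, link_down, withdrawn)
```

In this toy graph the link failure leaves every client on its original server over a longer path, while the withdrawal moves the affected client to the remaining server exactly once, which mirrors the qualitative behavior reported for the small-topology simulations.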
Major Milestones
- DNS Root servers traces
- Presented results of our analysis at July 2006 IEPG meeting.
- Submitted a paper to PAM (accepted in 2007), "Two days in the life of the DNS anycast root servers" by Liu, Huffaker, Fomenkov, Brownlee and claffy.
- Indexed data analyzed in the paper into DatCat.
- Posted 2006 Influence map of DNS root servers.
- RFC1918 analysis
- Published in CCR the paper "The windows of private DNS updates" by Broido, Shang, Fomenkov, Hyun, and claffy.
- Published explanations and instructions to end users on how to disable dynamic DNS updates on Microsoft Windows systems.
- RTT data
- Released to the research community four years of data on RTTs from several campuses to root/gTLD servers.
- Surveys
- Outreach
- Made presentations at IEPG, RSSAC, WIDE, and NANOG.
- "Two days in the life of three DNS root servers" (RSSAC and WIDE)
- "DNS root traffic: how do anycast instances behave?" (IEPG)
- "Identifying and Reducing Private DNS Updates" (WIDE)
- "Traffic on TCP port 53 and DNS Response Sizes at U Auckland" (WIDE)
- "DNS Cache Poisoners Lazy, Stupid, or Evil?" (NANOG)
- Co-organized with ISC and OARC the 2nd DNS-OARC workshop.
Student Involvement
Ziqian Liu, a visiting scientist from China, collaborated with CAIDA researchers in analysis of 48-hour tcpdump traces from 53 anycast root servers (4 from c-root, 33 from f-root, 16 from k-root). He examined geographical and topological coverage from each instance and analyzed the stability of client-to-server affinity.
Two graduate students from Georgia Tech, Sunitha Beeram and Talal Jaafar, worked on laboratory simulations of the anycast deployment effectiveness.
Security
The Collaborative Center for Internet Epidemiology and Defenses (CCIED)
Goals
CAIDA participates in The Collaborative Center for Internet Epidemiology and Defenses (CCIED, "SeaSide"), a joint effort between researchers at UCSD and the International Computer Science Institute's Center for Internet Research. CCIED addresses critical challenges posed by large-scale Internet-based pathogens, such as worms and viruses. CCIED efforts focus on analyzing the behavior and limitations of Internet pathogens, developing early-warning and forensic capabilities, and developing technologies that can automatically defend against new outbreaks in real-time.
Activities
- Analysis of the Blackworm email virus
The Nyxem virus (also known as Blackworm) is a 95 KB Visual Basic executable that infects a computer when an unwary user runs an executable email attachment. On the 3rd day of every month, the virus searches for files with 11 common file extensions (.doc, .xls, .mdb, .mde, .ppt, .pps, .zip, .rar, .pdf, .psd, and .dmp) on all available drives and replaces them with the text string "DATA Error [47 0F 94 93 F4 K5]".
The spread of most email viruses is extremely difficult to track due to their spread mechanism -- using legitimate email addresses gathered from infected computers to spread to the people with whom virus victims normally interact. Unlike many email viruses, computers infected with Nyxem automatically generated a single HTTP request for the URL of an online statistics page. Each request for the statistics page was displayed on the page itself. Presumably this behavior was included in the virus so that the virus author could track its progress.
Logs from the online statistics page provided a rare opportunity to track the global spread of an email virus. Unfortunately, uninfected spectators and numerous distributed denial-of-service attacks made it difficult to determine which IP addresses represented legitimate infections. Also, use of web proxies and DHCP influenced the number of infected computers represented by each IP address in the logs.
We estimate that between 469,507 and 946,835 computers in more than 200 countries were infected by the Nyxem/Blackworm virus between January 15 23:40:54 UTC 2006 and February 1 05:00:12 UTC. Interestingly, the geographic distribution of computers infected by the Nyxem virus differs significantly from general estimates of Internet usage in countries around the world. The geographic distribution of Nyxem-infected hosts also differs from that of random-spread Internet worms that do not require human intervention such as opening and running an executable attachment from an email. The virus disproportionately affected the Middle East and some countries of South America, particularly Peru.
At least 45,401 of the infected computers were also compromised by other forms of spyware or bot software. These spyware/botnet infections were advertised in the browser string of Nyxem-infected computers and thus were visible in the web logs documenting the spread of the virus.
We created an animation of the global spread of the Blackworm virus with the Cuttlefish tool.
- Anomaly Detection
- While accurate, robust data collection and report generation for large events is largely solved, the ability to detect and report on novel or unusual events remains difficult. Current traffic volumes at both the enterprise and backbone level often prohibit inspection of every packet. Current sampling methods provide at best incomplete data, and at worst no data whatsoever, about low-volume yet significant events, such as botnet command and control activities. Current research in this area focuses on algorithms to preferentially sample and report on new and unusual traffic, while also providing provably accurate summaries of regularly occurring traffic patterns.
Major Milestones
- Provided Internet Service Providers with lists of Nyxem/Blackworm-infected computers so that customers could be warned before the virus destroyed their data.
- Released web report and animation of the Blackworm/Nyxem Email Virus
Trusted Computing: Quantitative Network Security Analysis
Goals
Our work in quantitative security analysis attempts to move beyond anecdotal assessment of malicious events and accidental detection of compromised infrastructure. Because comprehensive security analysis is difficult, much work thus far on attack prevalence relies on surveys of impressions rather than direct measurement. We aim to develop and apply novel techniques to rigorously describe the volume of spoofed-source denial-of-service attacks, random-scanning Internet worms, and opportunistic host scanning. We also collect data and conduct analyses over a period of years, rather than days, so as to get a comprehensive view of trends in malicious activity over time. We process collected data into datasets useful to those studying various security phenomena and distribute those datasets to researchers. We also worked to develop a real-time monitor of the UCSD Network Telescope that provides an up-to-date view of traffic behavior.
Many, if not most, successful attacks to compromise computational infrastructure are discovered accidentally rather than via established monitoring techniques. In some cases, technology to detect these attacks exists but is not economically viable to support and use. However, the ability to detect other types of attacks, including identifying compromised routers and user accounts, remains elusive. We made significant progress in developing techniques to detect these types of attacks in time to minimize the resulting damage.
Funding for this project ended in July 2006.
Major Milestones
- DDoS Longitudinal Study
- In May 2006 we published in the ACM Transactions on Computer Systems (TOCS) a longitudinal study of denial-of-service backscatter activity between January 2001 and November 2004. We identified number, duration and focus of distributed denial-of-service attacks, and quantified attack characteristics, including volume, frequency, duration, and protocol. We also assessed DDoS victims, including targeted services, locations, and attack impact. In total, we observed over 68,000 attacks directed at over 34,000 distinct victim IP addresses -- ranging from well-known e-commerce companies such as Amazon and Hotmail to small foreign ISPs and dial-up connections. We believe our technique is the first to provide quantitative estimates of Internet-wide denial-of-service activity and that this paper describes the most comprehensive public measurements of such activity to date.
- We also released the data necessary to reproduce this study. For more information, see The CAIDA Backscatter-TOCS Dataset.
- Real-time UCSD Network Telescope Monitor
- Toward our goal of providing a real-time view of anomalous traffic reaching the UCSD Network Telescope, we designed, implemented, and tested new monitor software to process and store data, generate graphs, and display traffic reports based on such criteria as application, topological and geographic source and destinations, and transport type. We also adapted our denial-of-service attack identification and classification methods to work in real-time so that we could display separate views of denial-of-service attacks and scanning activities.
UCSD Network Telescope
Goals
The UCSD network telescope acts as a passive data collection system. The network telescope consists of a globally routed /8 network that carries almost no legitimate traffic. Because the legitimate traffic is easy to separate from the incoming packets, the network telescope provides us with a monitoring point for anomalous traffic that represents almost 1/256th of all IPv4 destination addresses on the Internet.
Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, there is no reason that legitimate traffic would be monitored by the network telescope. The network telescope collects traffic as a result of a wide range of events, including misconfiguration (e.g. a human being mis-typing an IP address), malicious scanning of address space by hackers looking for vulnerable targets, backscatter from random source denial-of-service attacks, and the automated spread of malicious software (worms).
Please see the UCSD Network Telescope home page for details about the telescope, related information and data access.
Activities
We have used data from the UCSD Network Telescope to analyze Denial-of-Service attacks, Internet worms, and malicious network scans:
- Denial-of-Service Attacks
- We used the telescope to monitor the spread of random-source distributed denial-of-service attacks. When an attacker wants to make it difficult for the attack victim (and the victim's ISP(s)) to block an incoming attack, the attacker uses a fake source IP address (similar to a fake return address in postal mail) in each packet sent to the victim. Because the attack victim cannot distinguish between incoming requests from an attacker and legitimate inbound requests, the denial-of-service attack victim tries to respond to every request packet it receives. When the attacker spoofs a source address randomly, the telescope will observe responses destined for the fake source addresses that are in the telescope's address space (there is no host at the address). By monitoring these unsolicited responses, researchers can identify denial-of-service attack victims and infer information about the volume of the attack, the bandwidth of the victim, the location of the victim, and the types of services the attacker targets.
- Internet Worms
- Many Internet worms spread by randomly generating an IP address to be the target of an infection attempt and sending the worm off to that IP address in the hope that that address is in use by a vulnerable computer. Because the network telescope includes one out of every 256 IPv4 addresses, it receives approximately one out of every 256 probes from hosts infected with randomly scanning worms.
- Malicious Network Scans
- Malicious network scans include automated, semi-automated, and manual attempts to locate exploitable computers on the Internet. Scans often differ from other types of traffic visible on the network telescope because the scan traffic arriving at the telescope is not driven by chance. Rather, the attacker's motives in selecting scan targets appear arbitrary from the perspective of the recipient of the scan. The UCSD Network Telescope receives a multitude of scans continuously, including probes for the existence of a device at a given IP address, sequential scans of ports on a single IP address, methodical scans for a single or a small number of vulnerable ports in an IP address range, and even scans utilizing TCP resets.
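The 1-in-256 coverage described above supports simple extrapolation from what the telescope observes. The sketch below assumes that spoofed source addresses and worm targets are chosen uniformly at random across the IPv4 space, which is an idealization; the function names are hypothetical and this is not the code used for the published analyses.

```python
# A /8 network covers 2**24 of the 2**32 IPv4 addresses, i.e. 1/256.
TELESCOPE_FRACTION = 1 / 256

def estimate_total_packets(observed_at_telescope):
    """Scale a packet count seen at the telescope to an Internet-wide estimate.

    Assumption: targets (or spoofed sources) are uniformly random, so the
    telescope receives about 1/256 of the packets.
    """
    return observed_at_telescope / TELESCOPE_FRACTION

def probability_seen(probes_sent):
    """Chance that at least one of probes_sent uniformly random probes
    lands inside the telescope's address space."""
    return 1.0 - (1.0 - TELESCOPE_FRACTION) ** probes_sent

# 100 backscatter packets at the telescope suggest about 25,600 responses
# sent by the victim overall.
print(estimate_total_packets(100))
# A host sending 1,000 random probes is very likely to hit the telescope.
print(round(probability_seen(1000), 3))
```

Scans that pick targets deliberately rather than at random violate the uniformity assumption, so this scaling applies to backscatter and randomly scanning worms but not to targeted scanning.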
Major Milestones
- Data Collection
- Collected, processed, and archived 19.4 TB of Network Telescope data.
- Dataset Distribution
- We released the following datasets:
- Witty Worm Dataset (released January 11, 2006)
- Code-Red Worms Dataset (released February 1, 2006)
- Backscatter-2004-2005 Dataset (released February 15, 2006)
- Backscatter-2006 Dataset (released April 7, 2006)
- Realtime Report Generation
- As detailed above, we implemented and tested new software to generate public, real-time reports of denial-of-service, worm, and scanning activities on the UCSD Network Telescope.
Routing
Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies
Goals
The Internet's routing system is facing stresses due to its poor fundamental scaling properties. The goal of this research project is to apply key theoretical results in distributed computation to an extremely practical purpose: truly scalable interdomain network routing.
Compact routing is a theoretical research field that studies fundamental limits of routing scalability and designs algorithms that try to meet these limits. In particular, compact routing research shows that shortest-path routing, forming a core of traditional routing algorithms, cannot guarantee routing table (RT) sizes that grow slower than linearly with network size. However, there are plenty of compact routing schemes that relax the shortest-path requirement and allow for improved scaling on static network topologies.
Activities
This year we collaborated with A. Brady and L. Cowen from Tufts to analyze several published compact routing schemes in terms of their average performance parameters (local memory space, stretch, control message sizes, etc.) on realistic Internet topologies as measured by skitter and DIMES. We also began to experiment with new, fundamentally different approaches to scalable routing, in particular, exploring the possibility of connecting two currently disconnected areas of research: the Kleinberg model of greedy routing without updates and scale-free network topologies.
In the Internet modeling area we continued our analysis of the properties of the best currently available non-equilibrium network growth model: the Positive-Feedback Preference (PFP) model by Zhou and Mondragon. At the same time, in collaboration with S. Shakkottai from UIUC and T. Vest from USC, we developed a new model of Internet growth that attempts to derive preferential attachment behavior from empirically grounded economic realities of the Internet's AS topology.
Major Milestones
- Analysis of compact routing schemes
- Investigated two name-independent schemes, one by Arias and its more recent optimal improvement by Abraham, on synthetic power-law graphs and on realistic Internet topologies.
- Investigated three name-dependent schemes, by Cowen, Thorup and Zwick, and Brady and Cowen, on synthetic power-law graphs and on realistic Internet topologies.
- Found that these schemes yield remarkably low values of memory and stretch on real Internet topologies. For example, the BC routing algorithm guarantees a maximum routing table less than O(e log^2 n), where n is the total number of nodes in the network. This scaling means that even if IPv6 addresses were fully de-aggregated, the maximum routing table size in such an Internet would still contain not more than ~16,000 entries (~1,000 for the deaggregated IPv4 space).
- Found that none of the analyzed approaches leads to an acceptable scalable routing solution, since either the average stretch is too high (for the Arias scheme) or the number of updates remains unbounded (for the name-dependent schemes).
- Internet evolution models
- Explained the success of the PFP model by the fact that it produces 2K-random graphs. One of the central elements of this model is its attempt to reproduce the so-called rich club connectivity property which severely constrains the joint degree distribution (JDD). We showed that JDD, in turn, fully defines the AS topology.
- Developed a new model of the Internet's growth. To the best of our knowledge, this is the first Internet evolution model that is realistic, analytically tractable, and entirely based on physical, that is, measurable, parameters.
- Verified that when we substitute the measured values of the parameters into analytic expressions of our Internet evolution model, it predicts and explains the observed value of the exponent of the power-law node degree distribution in real AS topologies.
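As a rough check of the routing table figures quoted in the compact routing milestones above, the sketch below evaluates only the log² n factor of the bound, taking n as the size of a fully de-aggregated address space (2^32 for IPv4, 2^128 for IPv6). Reading the logarithm as base 2 and ignoring the factor e and any hidden constant are our simplifying assumptions, chosen because they reproduce the quoted numbers; the function name is illustrative.

```python
import math


def log2_squared(address_bits: int) -> int:
    """Return (log2 n)^2 for a network of n = 2**address_bits nodes.

    Assumption: the factor e and constant factors in the bound are ignored.
    """
    n = 2 ** address_bits
    return round(math.log2(n)) ** 2


for name, bits in (("IPv4", 32), ("IPv6", 128)):
    print(f"{name}: about {log2_squared(bits)} routing table entries")
```

Under these assumptions the IPv4 case gives 1,024 and the IPv6 case gives 16,384, which matches the rounded figures of ~1,000 and ~16,000.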
Student Involvement
The following graduate students participated in this project and carried out the research tasks listed:
S. Shakkottai (UIUC) and T. Vest (USC) developed a new model of the evolution of the Internet's AS topology.
A. Brady (Tufts) analyzed the performance of existing compact routing schemes on real Internet topologies.
Topology
Macroscopic Topology Measurement, Analysis, and Modeling
Goals
CAIDA's topology research remains focused on three areas: macroscopic topology measurement; analysis of the observable ISP hierarchy; and topology modeling in support of routing research.
Activities
In 2006, CAIDA made large strides with activities in all three focus areas.
-
We continued our macroscopic topology measurements while preparing for an upgrade of our existing active probing system. We regularly post an update of the adjacency matrix of the Internet AS-level graph computed daily from skitter measurements.
-
We investigated the problem of inferring AS business relationships from available routing data. By reformulating the AS relationship inference problem as a MAX2SAT-based multiobjective optimization problem, we fixed a number of serious flaws in the AS relationship inference techniques previously considered state-of-the-art. We also developed an approach to infer peering links with unprecedented accuracy. Finally, we validated our inference results by surveying network administrators of about 40 ASs to collect information on the actual connectivity and policies of their ASs. These survey results confirmed high levels of accuracy: we correctly inferred 96.5% customer-to-provider (c2p), 82.8% peer-to-peer (p2p), and 90.3% sibling-to-sibling (s2s) relationships. Comparison of the reported AS connectivity with the AS connectivity data contained in BGP tables showed that BGP tables miss up to 86.2% of the true adjacencies of the surveyed ASs. The majority of the missing links were of the p2p type, meaning that, in reality, peering links are likely to be more dominant than previous measurement methods have been able to capture.
Based on our new relationship inference methodology, we improved our procedure to rank Autonomous Systems (AS Rank) by their connectivity and inferred business relationships. The ranking of each AS is a function of the number of IP prefixes advertised by this AS, its customer ASes, their customer ASes, and so on.
We introduced a radically new approach that utilizes machine learning techniques to map all the ASes in the Internet into a natural AS taxonomy: large ISPs, small ISPs, customer networks, university networks, Internet exchange points, and network information centers. The input to the classification algorithm includes the inferred AS relationships, WHOIS records, and a few other AS attributes. We successfully classified 95.3% of ASes with estimated accuracy of 78.1%. The resulting dataset annotates in detail every node in the AS topology and reveals vastly different network characteristics, external connectivity patterns, network growth tendencies, and other properties intrinsic to various types of ASes.
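The ranking rule described above (an AS's own prefixes plus those of its customers, their customers, and so on) can be sketched as a traversal of inferred provider-to-customer links. This is a minimal illustration, not CAIDA's AS Rank implementation: the AS numbers, the prefix counts, the function name, and the choice to simply sum per-AS prefix counts are all assumptions.

```python
from collections import deque


def customer_cone_prefixes(asn, customers, prefix_count):
    """Sum prefixes advertised by `asn` and by every AS reachable via customer links."""
    seen = {asn}
    queue = deque([asn])
    total = 0
    while queue:
        current = queue.popleft()
        total += prefix_count.get(current, 0)
        for customer in customers.get(current, ()):
            if customer not in seen:  # count multihomed customers only once
                seen.add(customer)
                queue.append(customer)
    return total


# Hypothetical toy topology: provider AS -> list of direct customer ASes.
customers = {1: [2, 3], 2: [4], 3: [4]}
# Hypothetical number of prefixes advertised by each AS.
prefix_count = {1: 10, 2: 5, 3: 3, 4: 1}

ranking = sorted(
    prefix_count,
    key=lambda a: customer_cone_prefixes(a, customers, prefix_count),
    reverse=True,
)
print(ranking)
```

In this toy data AS 4 buys transit from both AS 2 and AS 3, so the `seen` set keeps its prefix from being counted twice in the total for AS 1.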
-
We introduced a new approach to analyzing and synthesizing random network topologies: the dK-series. We define the dK-series of probability distributions by specifying all degree correlations within d-sized subgraphs of a given graph G. Increasing values of d capture progressively more properties of G at the cost of a more complex representation. The dK-series allows us to quantitatively express the distance between two graphs and to construct random graphs that accurately reproduce virtually all metrics proposed in the literature. Using our approach, we constructed graphs for d = 0, 1, 2, 3 and demonstrated that these graphs reproduced, with increasing accuracy, important properties of measured and modeled Internet topologies. We found that the d = 2 case is sufficient for most practical purposes, while d = 3 essentially reconstructs the Internet AS- and router-level topologies exactly.
The nature of the dK-series method implies that it will also capture any future structurally-based metrics that may be proposed. Therefore, this method represents a significant improvement to the set of tools available to network topology and protocol researchers and is relevant far beyond its initial scope.
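To make the first members of the series concrete: in the dK-series paper, d = 0 corresponds to the average node degree, d = 1 to the degree distribution, and d = 2 to the joint degree distribution over edges. The sketch below computes these three summaries for a small undirected edge list. The function names and the toy graph are illustrative, and the code assumes a simple graph with no isolated nodes; it is not the dK-random graph generator itself.

```python
from collections import Counter


def node_degrees(edges):
    """Map each node to its degree in an undirected simple graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def dk_summaries(edges):
    """Return the 0K, 1K, and 2K summaries of an undirected edge list."""
    deg = node_degrees(edges)
    average_degree = 2 * len(edges) / len(deg)       # 0K: average degree
    degree_distribution = Counter(deg.values())      # 1K: degree -> number of nodes
    joint_degree_distribution = Counter(             # 2K: (k1, k2) -> number of edges
        tuple(sorted((deg[u], deg[v]))) for u, v in edges
    )
    return average_degree, degree_distribution, joint_degree_distribution


# Hypothetical toy graph: a triangle a-c-d with a pendant node b attached to a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
avg, one_k, two_k = dk_summaries(edges)
print(avg, dict(one_k), dict(two_k))
```

Two graphs with the same 2K summary necessarily share the same 1K and 0K summaries, which is the sense in which larger d captures progressively more of the graph's structure.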
Major Milestones
- AS relationship inference technique
- Published "AS Relationships: Inference and Validation" in the ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp. 29-40, 2007, by Dimitropoulos, Krioukov, Fomenkov, Huffaker, Hyun, Riley, and claffy.
- Updated the interactive AS-ranking web page and posted weekly updates of the data at https://asrank.caida.org/.
- Published and archived, on a weekly basis, the Internet AS-level topologies annotated with AS relationship information.
- Released the set of AS attributes we used to classify ASes, and the resulting empirical AS taxonomy.
- Topology modeling
- Published "Modeling Autonomous System Relationships", presented at the 20th Principles of Advanced and Distributed Simulation (PADS) by Dimitropoulos and Riley.
- Published "Systematic Topology Analysis and Generation Using Degree Correlations", presented at SIGCOMM 2006 by Priya Mahadevan.
- Developed a new topology generator for constructing random graphs with defined correlations at levels d=2 and d=3.
- Sketched an approach for extending the topology generator to arbitrary d.
- Conducted the 1st Workshop on the Internet Topology (WIT).
Student Involvement
The following graduate students participated in this project:
X. Dimitropoulos (Georgia Tech) developed new approaches to inferring AS business relationships, conducted an extensive survey of ISPs to verify his algorithms, and created the AS taxonomy based on machine learning algorithms.
P. Mahadevan (UCSD) analyzed AS topologies derived from three different data sources, simulated dK-distributions for d=1, 2, 3 and compared them with real Internet topologies, and developed the dK-random graph generator.
Economics and Policy
Cooperative Measurement and Modeling of Open Networked Systems (COMMONS)
Goals
With support from Cisco Systems and the National Science Foundation, the COMMONS project began to take root this year. The COMMONS Project hopes to conduct a scientific experiment that offers measurement technology and connectivity to a research and education backbone infrastructure in exchange for access to measurement of the resulting interdomain network.
Major Milestones
In its first year, the COMMONS project reported the following major milestones.
- 1st COMMONS Workshop
- On December 12-13, 2006, CAIDA hosted 40 participants from across North America at the San Diego Supercomputer Center on the UCSD campus to discuss technical, policy, and operational issues related to the COMMONS initiative. The diversity of workshop participants helped ensure that many categories of stakeholders were represented in the proceedings and provided invaluable insights into the potential pitfalls and opportunities the COMMONS Project faces. We published a COMMONS Workshop Final Report.
Infrastructure Projects
Several of CAIDA's infrastructure projects cut across scientific and funding domains as they strive to provide data, services, and measurement infrastructure over the long term. In addition to raw data collection, distribution, and archival efforts, the DatCat project offers improved metadata searching and indexing. Further, the PREDICT project develops legal and policy infrastructures for data sharing among data providers, data hosting sites, and researchers.
In 2006, CAIDA took over operational stewardship for all NLANR machines and data. CAIDA gracefully decommissioned obsolete hardware and data and attempted to apply the remaining resources to projects of similar intent and spirit.
DatCat: the Internet Measurement Data Catalog
The goal of this project is to build an Internet Measurement Data Catalog (IMDC) that will facilitate access, archiving, and long-term storage of Internet data as well as sharing the data among Internet researchers. After 3.5 years of development we finally launched the catalog in June 2006 at www.datcat.org. Funding for the project also ended in 2006, so we are currently seeking funding to maintain and extend it.
Goals
CAIDA had three goals for the creation of the IMDC database:
- provide a searchable index of Internet data available for research
- enhance documentation of datasets via a public annotation system
- enable reproducible research in network science
Activities
In 2006, the Trends project completed its final year.
- DatCat
-
Our main priority was the development of DatCat, the Internet Measurement Data Catalog. This searchable catalog archives meta-data for Internet measurement data sets and provides information on data locations and access. We opened DatCat for public viewing on June 12, 2006 and continued to index new data sets and to improve user interfaces throughout the year.
CAIDA hosted the team behind CRAWDAD (a Community Resource for Archiving Wireless Data At Dartmouth) for a small workshop aimed at inter-operation between these two NSF-funded catalogs. We hope to be able to automatically share data and meta-data between the DatCat and CRAWDAD catalogs in Fall of 2007.
- 1st DatCat Community Contribution (DCC 1) Workshop
-
We created this workshop series, and hosted the 1st DatCat Community Contribution (DCC 1) Workshop, to provide community members with the resources they need to contribute data to DatCat. At the time of writing, the First DatCat Community Contribution Workshop (DCC 1) Collection contains 1,239 files from 9 subcollections and 4 publications.
- Data Collection and Software Development
-
We collected and distributed data from passive monitors and 22 active monitors. We answered 310 requests for data, and more than 220 research groups accessed our data.
We developed a new report generator tool that produces easily configurable, comprehensive, real-time summaries of network traffic on monitored links. The following traffic characteristics can be reported: packet, byte, and flow volume, number of source and destination IP addresses and Autonomous Systems, the geographic regions exchanging traffic, and the types of applications used. This metadata is also ideal information to index in DatCat.
- Data Analysis
-
We continued ongoing efforts to characterize the applications used for peer-to-peer file-sharing exchange and their prevalence on the Internet. We expect to publish new results in this area in 2007.
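The traffic characteristics listed for the report generator above (packet, byte, and flow volume, and the number of distinct source and destination addresses) can be illustrated with a small aggregation over packet records. This is a sketch only and does not use CoralReef's API: the `Packet` record, the `summarize` function, the sample addresses, and the definition of a flow as a 5-tuple are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str    # source IP address
    dst: str    # destination IP address
    sport: int  # source port
    dport: int  # destination port
    proto: int  # IP protocol number (6 = TCP, 17 = UDP)
    size: int   # packet size in bytes


def summarize(packets):
    """Aggregate per-packet records into one summary report."""
    flows, sources, destinations = set(), set(), set()
    n_packets = n_bytes = 0
    for p in packets:
        n_packets += 1
        n_bytes += p.size
        # Assumption: a flow is identified by the usual 5-tuple.
        flows.add((p.src, p.dst, p.sport, p.dport, p.proto))
        sources.add(p.src)
        destinations.add(p.dst)
    return {
        "packets": n_packets,
        "bytes": n_bytes,
        "flows": len(flows),
        "source_addresses": len(sources),
        "destination_addresses": len(destinations),
    }


# Hypothetical sample using documentation address ranges.
sample = [
    Packet("192.0.2.1", "198.51.100.7", 40000, 53, 17, 80),
    Packet("192.0.2.1", "198.51.100.7", 40000, 53, 17, 120),
    Packet("192.0.2.2", "198.51.100.7", 40001, 80, 6, 1500),
]
report = summarize(sample)
print(report)
```

A summary of this shape is small and self-describing, which is why such metadata is a natural candidate for indexing in a catalog.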
Major Milestones
- DatCat
- Opened DatCat to public viewing on June 12, 2006, initially with 12 collections of data from 3 organizations.
- Created browse interface that complements search interface by allowing users to view catalog contents without specific criteria in mind.
- Developed a comprehensive user Help section including a screen-shot based tutorial
- Augmented advanced search with additional query capabilities and made it more user friendly
- Added ability for users to add "note" annotations to any object
- Added BibTex citations to every DatCat object's detail page
- Releasing CAIDA data to the community
- Prepared and released six new datasets:
- Witty Worm Dataset (January 11, 2006)
- Code-Red Worms Dataset (February 1, 2006)
- Backscatter-2004-2005 Dataset (February 15, 2006)
- Backscatter 2006 Dataset (April 7, 2006)
- AS Taxonomy Dataset (April 14, 2006)
- DNS Root Server/gTLD RTT Dataset (August 3, 2006)
- New traffic report generator
- Added geographic analysis of traffic.
- Added a flexible display so that users can view a variety of graphs and tables displaying different time periods.
- Added the ability to collect statistically valid sampled flows on high speed (OC48, OC192, 10GigE) links and display the sampled reports.
PREDICT: Network Traffic Data Repository to Develop Secure IT Infrastructure
Goals
The Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT) was designed to provide sensitive security datasets to qualified researchers, while preserving privacy and preventing data misuse. PREDICT offers a secure technical and policy framework to process applications for data sharing from network providers, which includes tools for collection, processing, and hosting of data that will eventually be available through the program as well as secured infrastructure to support serving datasets to researchers. (Note that the PREDICT project has yet to launch; see www.predict.org for details.)
Activities
CAIDA's involvement in PREDICT includes assisting with: the technical framework to process applications for data; development of communications between Data Providers, Data Hosting Sites, and Researchers; collection and processing of data that will eventually be available through the program; developing and deploying infrastructure to allow CAIDA to serve datasets to researchers; and providing input for the development of a legal framework to support the project. In 2006, CAIDA helped develop Memoranda of Agreement to allow CAIDA to participate in the launch of the PREDICT program as a data provider and a data hosting site, serving OC48 backbone peering link data, denial-of-service backscatter data, Internet worm data, Network Telescope data, and IP topology data to approved researchers.
Major Milestones
- Macroscopic Topology Data Collection
- We collected 623 GB (9,181 files) of macroscopic topology measurements for distribution via PREDICT, bringing our total macroscopic topology data collection to 3.5 TB.
- Security Data Collection
- We collected 19.4 TB (8,545 files) of security measurements from the UCSD network telescope for distribution via PREDICT, bringing our total security data collection to 52.8 TB.
- New Datasets Released
- Witty Worm Dataset (January 11, 2006)
- Code-Red Worms Dataset (February 1, 2006)
- Backscatter-2004-2005 Dataset (February 15, 2006)
- Backscatter-2006 Dataset (April 7, 2006)
- AS Taxonomy Repository (April 14, 2006)
- the DNS Root Server/gTLD RTT Dataset (August 3, 2006)
Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure
Goals
This year, CAIDA began development of Archipelago (Ark), our next generation active measurement infrastructure supporting the Macroscopic Topology Project. We intend for Ark to eventually replace the previous skitter-based infrastructure. Inspired by the Community-Oriented Network Measurement Infrastructure (CONMI) Workshop Report, Ark aspires to achieve greater scalability and flexibility than our current measurement infrastructure and provides steps toward a community-oriented network measurement infrastructure intended to eventually allow collaborators to run their vetted measurement tasks on a security-hardened distributed platform.
We designed Ark with macroscopic topology as its first target application, using the scamper tool to measure topology and latency to large numbers of IPv4 and IPv6 addresses. Once Ark is established and stable, CAIDA intends for the infrastructure to support numerous other types of data collection, measurements and experiments, e.g., one-way loss, bandwidth estimation, DNS performance, router alias resolution, and more. Although geared toward active measurement, Ark could also enable some types of passive measurement.
Ark's design represents a new approach to coordinating the activities of a measurement infrastructure, including supporting multiple researchers attempting different types of measurements. Archipelago's coordination facility is called Marinda, loosely inspired by David Gelernter's tuple-space based Linda coordination language. Archipelago extends Gelernter's tuple space model with features needed to support a globally distributed measurement infrastructure that hosts heterogeneous measurements by a community of researchers.
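For readers unfamiliar with the tuple-space model that inspired Marinda, the sketch below shows its core idea: processes coordinate by writing tuples into a shared space and retrieving them by pattern rather than by addressing each other directly. This is not Marinda's interface. The class and method names are hypothetical (Linda itself calls the three operations `out`, `rd`, and `in`), the tuple contents are made up, and the operations here return `None` instead of blocking when nothing matches.

```python
class TupleSpace:
    """A minimal in-memory, single-process tuple space (illustrative only)."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        """Add a tuple to the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern position acts as a wildcard.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def read(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def take(self, pattern):
        """Remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("probe", "192.0.2.0/24"))     # a coordinator posts a task
space.write(("probe", "198.51.100.0/24"))
task = space.take(("probe", None))          # a monitor claims any probe task
space.write(("done", task[1]))              # and reports completion
print(task, space.read(("done", None)))
```

Because `take` removes the tuple it returns, two monitors cannot claim the same task, which is what makes the model convenient for distributing work without a central scheduler.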
Activities
The need to extend our aging active measurement architecture to support a wider variety of measurements spurred Ark's design and development. We upgraded the infrastructure to probe a broader range of dynamically generated IP addresses covering all /24 prefixes in the IPv4 address space. We also improved our mechanisms for signaling file and cycle completion.
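One simple way to generate a destination inside every /24 prefix of an address block is sketched below using Python's standard `ipaddress` module. The selection policy (one uniformly random address per /24, including the .0 and .255 addresses) and the function name are assumptions for illustration; the text above does not specify how the measurement system actually chooses its targets.

```python
import ipaddress
import random


def random_target_per_slash24(network, rng):
    """Yield one randomly chosen address from each /24 inside `network`.

    `network` must be a /24 or shorter IPv4 prefix; `rng` is a random.Random.
    """
    for subnet in ipaddress.ip_network(network).subnets(new_prefix=24):
        offset = rng.randrange(subnet.num_addresses)  # 0..255 within the /24
        yield subnet[offset]


# Hypothetical example block covering two /24 prefixes.
rng = random.Random(2006)  # seeded so a probing cycle can be reproduced
for target in random_target_per_slash24("192.0.2.0/23", rng):
    print(target)
```

Generating targets lazily matters at full scale: the IPv4 space contains 2^24 (about 16.8 million) /24 prefixes, so a generator avoids holding the whole destination list in memory.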
Major Milestones
In its first year, the Ark project met the following milestones.
- Completed the initial architecture and design of Archipelago, including security features, communication and coordination facilities, software configuration and installation, and data storage and management.
- Initial implementation and testing.
- Presentation: Young Hyun presented The Archipelago Measurement Infrastructure at the 7th CAIDA-WIDE Workshop, Nov 2006.
National Laboratory for Applied Network Research (NLANR) Resource Stewardship
Goals
The National Laboratory for Applied Network Research (NLANR) Project officially ended February 28, 2006. The funding for the NLANR project expired June 30, 2006 and the National Science Foundation has no plans to continue support. Starting in July 2006, CAIDA took over operational stewardship for all NLANR machines and data. Some of the NLANR data and resources still offer value to the network science, research and development communities, and can help support other forms of network research. Since much of the equipment is too old to maintain, CAIDA conducted a careful audit, gracefully decommissioned obsolete hardware and data, and attempted to apply the remaining resources to similar projects at CAIDA. As we performed this audit, we informed the community of changes to operations of the machines at the sites. At any time hosting sites may decline participation in our measurement activities, but so far many have found our proposed use sufficiently relevant to continue hosting a measurement platform.
NLANR Resources
- The NLANR.NET Domain and Services
CAIDA (temporarily) stewards the NLANR.NET domain, originally registered by Hans-Werner Braun, as well as eleven servers used to serve the various mail, web, application, and database services for the project, including serving the data collected by the Active Measurement Project (AMP) and Passive Measurement and Analysis (PMA) projects.
- PMA Project Resource Reuse
Of 33 original passive collection sites, fourteen appear to have been decommissioned by NLANR staff prior to July 2006. CAIDA has decommissioned five sites with outdated hardware. Seven sites have sufficiently modern equipment that we hope to repurpose them for passive data collection and reporting. Of the remaining nine monitors, eight were connected to a router located at the Indianapolis Internet2 site. This router is no longer part of the Internet2 infrastructure; however, the instruments are capable of measuring OC192/10Gb links, so we hope to repurpose them. The final site in Los Angeles hosts the lambdaMON optical network tap, but we have yet to revive it. CAIDA is still working with sites that have viable hardware to put acceptable use policies in place so that we can continue to conduct measurement experiments.
- AMP Project Resource Reuse
Of over 150 original active collection sites, approximately 30 have hardware that CAIDA may be able to reuse for its Macroscopic Topology Measurements Project, specifically in the Archipelago infrastructure. We have carefully decommissioned other hosts previously associated with the extensive AMP mesh. To date, CAIDA has directly decommissioned over 100 hosts, and continues to work with sites on another 36 that are unreachable and/or that we plan to physically remove from service.
Major Milestones
- On July 24, 2006, CAIDA ceased data collection at the PMA sites.
- On approximately August 31, 2006, CAIDA ceased data collection on most of the AMP mesh. (It took several days to track down and disable all of the boxes.)
- The PRAGMA project made use of a subset of nodes we left operational for a demonstration presented in October 2006.
Further Requests for Information
CAIDA will happily provide more specific details regarding the status of any of the NLANR sites and data. If sites have questions or comments on these changes, or input regarding future measurement experiments, please send the queries to nlanr-info at caida dot org.
Tools
CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management.
2006 Tool Development
CoralReef
The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (OC3 and OC12, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. This package is maintained by CAIDA developers with the support and collaboration of the Internet measurement community.
In 2006, CAIDA continued to develop CoralReef Software adding capabilities to collect realtime flow data on high speed (OC48, OC192/10GigE) links. New routines allow the software to adapt the sampling rate to the traffic mix thus providing provably accurate sampled flows while staying within fixed memory and reporting budget for all possible traffic mixes. Our approach is described in papers "Building a Better NetFlow" and "A Robust System for Accurate Real-time Summaries of Internet Traffic".
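The general idea of adapting the sampling rate to stay within a fixed memory budget can be illustrated with the toy sampler below. It is not the algorithm from the cited papers and not CoralReef code: the class name, the halving policy, and the per-packet thinning step are assumptions chosen to keep the example short. When the flow table grows past its budget, the sampler halves the sampling rate and re-thins the counts it already holds, so that every stored count remains consistent with the new rate.

```python
import random


class AdaptiveFlowSampler:
    """Toy packet sampler that keeps at most `max_flows` table entries."""

    def __init__(self, max_flows, rng=None):
        self.max_flows = max_flows
        self.rate = 1.0        # current packet sampling probability
        self.counts = {}       # flow key -> number of sampled packets
        self.rng = rng or random.Random()

    def observe(self, flow_key):
        """Process one packet belonging to `flow_key`."""
        if self.rng.random() >= self.rate:
            return  # packet not sampled
        self.counts[flow_key] = self.counts.get(flow_key, 0) + 1
        while len(self.counts) > self.max_flows:
            self._halve_rate()

    def _halve_rate(self):
        # Keep each already-sampled packet with probability 1/2, so existing
        # counts look as if they had been collected at the new, lower rate.
        self.rate /= 2
        thinned = {}
        for key, n in self.counts.items():
            kept = sum(1 for _ in range(n) if self.rng.random() < 0.5)
            if kept:
                thinned[key] = kept
        self.counts = thinned

    def estimate(self, flow_key):
        """Estimate the true packet count of a flow by scaling up its sample."""
        return self.counts.get(flow_key, 0) / self.rate


sampler = AdaptiveFlowSampler(max_flows=50, rng=random.Random(1))
for i in range(1000):              # 1000 hypothetical single-packet flows
    sampler.observe(("flow", i))
print(len(sampler.counts), sampler.rate)
```

Dividing the stored count by the current rate gives an unbiased estimate under this thinning scheme, at the cost of higher variance for small flows as the rate drops.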
We also updated and improved a realtime traffic report generator that allows dynamic selection of displayed reports. We expect to release CoralReef version 3.8 in summer 2007.
Cuttlefish
Cuttlefish produces animated GIFs that reveal the interplay between the diurnal and geographical patterns of displayed data. By showing how the Sun's shadow covers the world map, cuttlefish yields a direct feeling for the time of day at a given geographic region, while moving graphs illustrate the relationship between local time and the visualized events. We released Cuttlefish early in 2006.
CAIDA acknowledges WIDE for their generous support of cuttlefish development.
[Figure: traffic volume to and from residential broadband customers of a major Japanese ISP between 7:00 am May 4th and 7:00 am May 5th 2005 UTC. Please refer to the Cuttlefish home page for more information about this visualization.]
See the Cuttlefish home page for more information, Cuttlefish examples or to download the software.
CAIDA Tools Download Report
The table below displays all the CAIDA developed tools distributed via our home page at https://www.caida.org/tools/ and the number of downloads during 2006.
-
Currently Supported Tools
Tool | Description | Downloads |
---|---|---|
coralreef | A software suite to collect and analyze data from passive Internet traffic monitors. | 813 |
dsc | A system for collecting and exploring statistics from DNS servers. | 127 |
dnsstat | An application that collects DNS queries on UDP port 53 to report statistics. | 313 |
dnstop | A libpcap application that displays tables of DNS traffic. | 3141 |
sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 224 |
walrus | A tool for interactively visualizing large directed graphs in 3D space. | 4288 |
libsea | A file format and a Java library for representing large directed graphs. | 493 |
Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 359 |
plot-latlong | A tool for plotting points on geographic maps. | 213 |
-
Past Tools (Unsupported)
Tool | Description | Downloads |
---|---|---|
Mapnet | A tool for visualizing the infrastructure of multiple backbone providers simultaneously. | 19895 |
GeoPlot | A light-weight Java applet that creates a geographical image of a data set. | 740 |
GTrace | A graphical front-end to traceroute. | 1543 |
otter | A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. | 1997 |
plotpaths | An application that displays forward and reverse network path data. | 177 |
plankton | A tool for visualizing NLANR's Web Cache Hierarchy. | 107 |
Data
Data Collected in 2006
In 2006, CAIDA continued its data collection activities, adding over 20 terabytes of new data and bringing the total holdings to over 56 terabytes (uncompressed) of data.
Data Type | File count | First date | Last date | Total size (compressed) | Total size (uncompressed) |
---|---|---|---|---|---|
Macroscopic Topology Measurements | 9181 | 2006-01-01 | 2007-01-01 | N/A | 629 GB |
Network Telescope | 8545 | 2006-01-01 | 2007-01-01 | 6.5 TB | 19.4 TB |
Data Distributed in 2006
We process the raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2006, we released the following datasets:
- Witty Worm Dataset (January 11, 2006)
- Code-Red Worms Dataset (February 1, 2006)
- Backscatter-2004-2005 Dataset (February 15, 2006)
- Backscatter-2006 Dataset (April 7, 2006)
- AS Taxonomy Repository (April 14, 2006)
- the DNS Root Server/gTLD RTT Dataset (August 3, 2006)
While some of the resulting datasets are available to anyone without restriction, the access to others is restricted to academic, government, and non-profit researchers and CAIDA members. We ask for-profit companies to sponsor CAIDA in order to gain access to these restricted datasets. All data access is subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, ensure security of network infrastructure, and comply with the terms of our agreements with data providers.
-
Publicly Available Data
These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.
Data Set | Unique visitors | Number of Visits | Data Downloaded |
---|---|---|---|
AS Rank | 5,933 | 13,431 | 9.8 GB |
AS Adjacencies | 2,323 | 9,209 | 5.3 GB |
AS Relationships | 1,294 | 3,505 | 5.8 GB |
Router Adjacencies | 357 | 392 | 955 MB |
Witty Worm Dataset | 256 | 435 | 323 MB |
Code-Red Worms Dataset | 222 | 350 | 4.6 GB |
AS Taxonomy** | 134 | 135 | 63 MB |
** The AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus these numbers do not represent all access to this data.
-
Restricted Access Data
These datasets require that users:
- be academic or government researchers, or join CAIDA
- request an account and provide a brief description of their intended use of the data
- agree to an Acceptable Use Policy
Data Set | Unique visitors | Number of Visits | Data Downloaded |
---|---|---|---|
Raw Topology Traces | 249 | 2879 | 7.4 TB |
Anonymized OC48 Peering Link Traces | 198 | 636 | 2.2 TB |
2003 Internet Topology Data Kit | 107 | 242 | 39 GB |
Backscatter Datasets* | 77 | 407 | 1 TB |
Witty Worm Dataset | 65 | 148 | 165 GB |
DNS Root/gTLD server RTT Dataset | 9 | 18 | 25.4 MB |
* The Backscatter-2004-2005 and Backscatter-2006 datasets were released in 2006.
-
Restricted Access Data
This data was collected by the NLANR project. When this project came to an end in July 2006, CAIDA inventoried NLANR equipment and took over curation and distribution of NLANR data. CAIDA now maintains both the NLANR AMP and PMA public data repositories.
Data Set | Unique visitors | Number of Visits | Data Downloaded |
---|---|---|---|
NLANR PMA Data | 973 | 3,119 | 8.3 TB |
NLANR AMP Data | 298 | 2,422 | 10.68 GB |
-
Total Data Distributed in 2006
Data Set | Unique visitors | Number of Visits | Data Downloaded |
---|---|---|---|
All Data Distributed | 4,907 | 35,678 | 72.1 TB |
Workshops
As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security.
CAIDA Workshops
In 2006, CAIDA hosted the following workshops:
- CAIDA-WIDE Workshops
- The 6th CAIDA/WIDE Workshop was held on March 17-18, 2006 in Marina del Rey, CA, and the 7th CAIDA/WIDE Workshop was held on November 3-4, 2006 at the San Diego Supercomputer Center, La Jolla, CA. These biannual invitation-only workshops traditionally cover IPv6/IPv4 topology studies and DNS, as well as miscellaneous research and technical topics of mutual interest for CAIDA and WIDE participants.
- ISMA Workshop on the Internet Topology (WIT)
- On May 10-12, 2006, CAIDA hosted the first ISMA Workshop on the Internet Topology (WIT) supported by NSF, CAIDA and SDSC. The attendance at the workshop was by invitation only. The workshop produced a final report that appeared in ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp. 69-73, 2007.
- 2nd DNS-OARC Workshop
- CAIDA and ISC organized and Microsoft hosted the 2nd DNS-OARC Workshop on November 16-17, 2006 in Redmond, WA. The focus of this workshop included DNS-related Internet measurements and research, operational co-ordination and trusted communication between root, TLD and other DNS and Internet operators, and the future evolution and governance of OARC. Participation was open to OARC members and by invitation to other parties interested in DNS research and operations.
- 1st COMMONS Workshop
- The 1st COMMONS workshop (by invitation only) was held on December 12-13, 2006 at the San Diego Supercomputer Center in La Jolla, CA. This Workshop brought together representatives from industry, community and municipal networks, regional and state networks, as well as Internet researchers, community organizers, and developers building next-generation data communications technologies. Workshop participants also included heads of research, infrastructure, media, and policy organizations, as well as telecommunications lawyers. This unique event produced the COMMONS Workshop Final Report.
Attended meetings
In 2006 CAIDA staff attended the following workshops and conferences and made presentations there:
- ACM SIGCOMM Special Interest Group on Data Communications
- The OECD ICCP Workshop "The Future of the Internet"
- The 3rd Annual Workshop on Flow Analysis (FloCon 2006)
- Internet Engineering Task Force (67th IETF) and corresponding meeting of the Internet Engineering Planning Group
- The Intimate 2006 Network Anomaly Diagnosis Workshop
- The North American Network Operators' Group (NANOG36)
- The Alternative Telecommunications Policy Forum
- DNS Root Server System Advisory Committee (RSSAC)
- Passive and Active Measurement (PAM) Workshop
- The Quilt 2006 Workshop, Muni Wireless Conference
- Digital Inclusion Day
- 5th System Administration and Network Engineering Conference (SANE)
Publications
The following table contains the papers published by CAIDA for the calendar year of 2006. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.
Year | Author(s) | Title | Publication | Unique Hits |
---|---|---|---|---|
2006 | Huffaker, B. Friedman, T. claffy, k. | Approche Récursive d'Etiquetage des Chemins Alternatifs au Niveau IP | Colloque Francophone d'Ingénierie des Protocoles (CFIP) | 0 |
2006 | Huffaker, B. Friedman, T. claffy, k. | Evaluation of a Large-Scale Topology Discovery Algorithm | IEEE International Workshop on IP Operations & Management | 0 |
2006 | Krioukov, D. Fall, K. Vahdat, A. | Systematic Topology Analysis and Generation Using Degree Correlations | SIGCOMM | 825 |
2006 | Shang, H. Fomenkov, M. Hyun, Y. claffy, k. | The Windows of Private DNS Updates | ACM SIGCOMM Computer Communications Review (CCR) | 1379 |
2006 | Shannon, C. Brown, D. Voelker, G. Savage, S. | Inferring Internet Denial-of-Service Activity | ACM Transactions on Computer Systems | 2432 |
2006 | Riley, G. | Modeling Autonomous System Relationships | Principles of Advanced and Distributed Simulation (PADS) | 667 |
2006 | Crovella, M. Friedman, T. Shannon, C. Spring, N. | Community-Oriented Network Measurement Infrastructure (CONMI) Workshop Report | ACM SIGCOMM Computer Communications Review (CCR) | 1365 |
2006 | Huffaker, B. Friedman, T. claffy, k. | Implementation and Deployment of a Distributed Network Topology Discovery Algorithm | arXiv cs.NI/62 | 0 |
2006 | Krioukov, D. Riley, G. claffy, k. | Revealing the Autonomous System Taxonomy: The Machine Learning Approach | Passive and Active Network Measurement Workshop (PAM) | 1504 |
2006 | Krioukov, D. Fomenkov, M. Huffaker, B. Dimitropoulos, X. claffy, k. Vahdat, A. | The Internet AS-Level Topology: Three Data Sources and One Definitive Metric | ACM SIGCOMM Computer Communications Review (CCR) | 2693 |
Presentations
The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2006. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.
Web Site Usage
In 2006, CAIDA's web site continued to attract considerable attention from a broad, international audience. Visitors seem to have particular interest in CAIDA's tools and analysis.
The table below presents the monthly history of traffic to www.caida.org for 2006.
Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth |
---|---|---|---|---|---|
Jan 2006 | 123,644 | 274,068 | 1,312,791 | 3,254,556 | 92.29 GB |
Feb 2006 | 123,912 | 330,929 | 1,443,084 | 3,807,364 | 120.28 GB |
Mar 2006 | 109,712 | 421,473 | 2,039,879 | 3,914,781 | 128.43 GB |
Apr 2006 | 97,168 | 241,669 | 1,171,131 | 2,712,955 | 138.25 GB |
May 2006 | 82,183 | 236,902 | 1,158,110 | 2,698,210 | 114.66 GB |
Jun 2006 | 114,924 | 218,982 | 954,935 | 3,429,410 | 97.45 GB |
Jul 2006 | 70,616 | 164,026 | 789,864 | 2,105,385 | 80.12 GB |
Aug 2006 | 72,364 | 140,401 | 548,868 | 1,799,446 | 73.74 GB |
Sep 2006 | 76,076 | 173,234 | 557,594 | 1,849,575 | 69.04 GB |
Oct 2006 | 119,373 | 184,137 | 575,048 | 2,843,483 | 149.63 GB |
Nov 2006 | 71,563 | 172,279 | 516,079 | 1,892,643 | 71.29 GB |
Dec 2006 | 64,831 | 176,216 | 758,431 | 2,010,967 | 69.00 GB |
Total | 1,126,366 | 2,734,316 | 11,825,814 | 32,318,775 | 1,204.13 GB |
Organizational Chart
CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2006. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.
CAIDA Functional Organization Chart
Funding Sources
CAIDA thanks our 2006 sponsors, members, and collaborators.
The charts below depict funds received by CAIDA during the 2006 calendar year.
Funding Source | Allocations | Percentage of Total |
---|---|---|
GIFT | $200,000 | 11.9% |
ARIN | $100,000 | 6.0% |
DHS PREDICT | $348,926 | 20.8% |
NSF DNS | $11,000 | 0.7% |
NSF Trends | $343,440 | 20.4% |
UCSD/SDSC | $80,356 | 4.8% |
NSF Infrastructure | $583,900 | 34.8% |
NSF NeTS | $12,000 | 0.7% |
Total | $1,679,622 | 100.0% |

Figure 1. Allocations by funding source received during 2006.
Program Area | Allocations | Percentage of Total |
---|---|---|
Data Sharing | $692,366 | 37.6% |
DNS | $11,000 | 0.6% |
Infrastructure | $723,256 | 39.2% |
Outreach | $2,500 | 0.1% |
Policy | $100,000 | 5.4% |
Routing | $12,000 | 0.7% |
Security | $102,534 | 5.6% |
Sponsors | $200,000 | 10.8% |
Total | $1,843,656 | 100.0% |

Figure 2. Allocations by program area received during 2006.
Operating Expenses
The charts below depict CAIDA's Annual Expense Report for the 2006 calendar year.
Category | Description |
---|---|
LABOR | Salaries and benefits paid to staff and students. |
IDC | Indirect Costs paid to the University of California, San Diego, including grant overhead (52-54%) and telephone, Internet, and other IT services. |
SUBCONTRACTS | Subcontracts to the Internet Systems Consortium (ISC) and Georgia Institute of Technology. |
SUPPLIES & EXPENSES | All office supplies and equipment (including computer hardware and software) costing less than $5,000. |
TRAVEL | Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment. |
EQUIPMENT | Computer hardware or other equipment costing more than $5,000. |
TRANSFERS | Exchange of funds between groups for recharge for IT desktop support and Oracle database services. |
Expense Category | Expenses | Percentage of Total |
---|---|---|
Labor | $1,434,383 | 59.4% |
IDC | $624,789 | 25.9% |
Subcontract | $29,781 | 1.2% |
Supplies & Expenses | $106,829 | 4.4% |
Travel | $76,640 | 3.2% |
Equipment | $131,499 | 5.4% |
Transfers | $9,193 | 0.4% |
Total | $2,413,114 | 100.0% |

Figure 3. 2006 Operating Expenses
These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.
Program Area | Expenses | Percentage of Total |
---|---|---|
Data Sharing | $770,571 | 31.9% |
DNS | $495,780 | 20.5% |
Infrastructure | $155,134 | 6.4% |
Outreach | $34,526 | 1.4% |
Policy | $192,934 | 8.0% |
Routing | $473,310 | 19.6% |
Security | $247,919 | 10.3% |
Topology | $42,940 | 1.8% |
Total | $2,413,114 | 100.0% |

Figure 4. 2006 Expenses by Program Area
Funding Source | Expenses | Percentage of Total |
---|---|---|
NSF-Trends | $526,142 | 21.8% |
NSF-Nets | $481,002 | 19.9% |
NSF-DNS | $378,415 | 15.7% |
NSF-Trusted Computing | $82,385 | 3.4% |
NSF-CRI | $15,778 | 0.7% |
NSF-Other | $244,390 | 10.1% |
Gift | $350,895 | 14.5% |
DHS | $245,533 | 10.2% |
UCSD/CNS | $88,574 | 3.7% |
Total | $2,413,114 | 100.0% |

Figure 5. 2006 Expenses by Funding Source