CAIDA's Annual Report for 2006

A report on CAIDA research activities, project progress and results, and financial status for 2006.

Mission Statement: CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:

are macroscopic in nature and provide enhanced insight into the function of Internet infrastructure worldwide,
improve the integrity of the field of Internet science,
improve the integrity of operational Internet measurement and management,
inform science, technology, and communications public policies.

Executive Summary
Research Projects
Infrastructure Projects
Tools
Data
Workshops
Publications
Presentations
Web Site Usage
Organizational Chart
Funding Sources
Operating Expenses

Executive Summary

This annual report covers CAIDA's activities in 2006, summarizing highlights from our research, infrastructure, and outreach activities. Our current research projects, primarily funded by the U.S. National Science Foundation (NSF), include several measurement-based studies of the Internet's core infrastructure, with a focus on the health and integrity of the global Internet's topology, routing, addressing, and naming system. Our infrastructure activities, funded by NSF and DHS as well as other government and industry sources, include building a catalog of Internet measurement data sets, contributing to the (DHS-funded) PREDICT repository of datasets to support the (U.S.-based) network research community, and developing and deploying active and passive measurement infrastructure that cost-effectively supports the global Internet research community. We also lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Finally, we engage in a variety of outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. CAIDA's program plan for 2007-2010 will be available in July 2007.

Background and recent history

For just over ten years, CAIDA has pursued various approaches to narrowing a gap that now impedes both network research and telecommunications policy: a dearth of available empirical data on the public Internet since its infrastructure was privatized. We have managed to grow and maintain a research group through increasingly difficult science funding periods, yet over the last decade our ability to get data from the commercial Internet has gradually diminished, as has the quality of science in the field of Internet research.

As an Internet data analysis and research group largely supported with public funding to apply measurement and analysis toward understanding and solving globally relevant Internet engineering problems, we accept a responsibility to seek, analyze, and communicate the salient features of the best available data about the Internet. And of course, we want better answers to a question graduate students often ask: "What Internet research problems are important to work on?"

In 2003, frustrated by the lack of access to data and by the Internet research community's lack of progress on a number of important problems over the previous ten years, we began a survey to answer three questions: (1) What are the top operational and engineering problems of the Internet? (2) Are we making measurable progress toward solving them? (3) For problems on which we are not making progress, what is blocking progress?

These problems span four dimensions of the Internet as emerging critical infrastructure: safety, scalability, sustainability, and stewardship. Measurement made the list of problems, which was no surprise. At CAIDA we have experienced firsthand how difficult it is to measure operational infrastructure, and the biggest obstacles to measurement had long been economic (the cost of instrumentation and data management), ownership (legal access to data), and trust (privacy and security obstacles to measurement). More surprising was that measurement was not unique in this regard: all persistently unsolved operational problems of the Internet are similarly blocked on issues of economics, ownership, and trust (EOT).

Expansion of research agenda into policy and economics

This lesson meant a change in strategy for CAIDA. We recognize it is no longer appropriate to pursue solutions to the Internet's problems without tackling the EOT issues. CAIDA's activities have spanned the four S's (security, scalability, sustainability, stewardship) for years, but we have begun to re-focus current projects, and pursue new projects, that specifically recognize and openly navigate the economic and policy issues via activities that cross the boundaries between technology and policy. The table below lists examples of how we have expanded our horizons in the last couple of years. Most current projects will continue into 2007.

In addition to	we expanded our efforts to
collecting topology for use in modeling and simulation activities,	use the same data to provide a current ranking of Internet service providers according to sample observations of AS topology coverage (ASrank).
launching DatCat, for use by researchers to index data sets into a common catalog,	reach out to policymakers to show them how such a catalog could support informed policy making.
upgrading our measurement infrastructure and data curation tools to keep pace with technology changes and research community needs,	propose a project to provide otherwise non-existent incentive for operational networks to use these tools to contribute network measurements to the research community.
simulating routing protocols that could scale to billions of nodes,	simulate IPv4 address consumption scenarios for ARIN to support immediate policy needs.
providing the largest simultaneous collection of traffic to DNS root servers in support of the research community,	do macroscopic surveys of the health of the current DNS, and participate on two of ICANN's subcommittees (SSAC and RSSAC).

If you have comments or questions, please do not hesitate to contact us at info at caida dot org.

Research Projects

CAIDA's research projects span numerous domains related to network science and engineering. We hope to make a tangible impact on the field of (Inter)networking research by providing researchers with increased access to real world data sets for validation of scientific models and the promotion of reproducible science.

Domain Name System (DNS)

Improving the Integrity of Domain Name System (DNS) Monitoring and Protection

Goals

The main goal of our DNS project is to supply the research community with the best available, operationally relevant and methodologically sound DNS measurement data. We also develop tools, models, and analysis methodologies for use by DNS operators and researchers.

Activities

The DNS project spent its 2nd year focused on acquiring operationally relevant DNS data and making that data available to the research community.

DNS Root Servers Trace Collection
We conducted two large-scale experiments collecting data from the DNS root servers. In September 2005, anycast instances of the C, E, and F root servers participated in the first "Day in the Life of the DNS Root Servers" data collection event. The second collection took place in January 2006 and involved anycast instances of the C, F, and K root servers. The total volume of data (in compressed format) is 360 GB. This unique data set represents the most comprehensive measurements of the root servers and affords researchers unprecedented insight into the specifics of root server operations.

Analysis of the collected DNS traffic showed that the current method for limiting the catchment areas of local instances appears to be successful. To determine whether clients actually use the instance closest to them, we examined the client locations for each root instance and the geographic distances between each server and its clients. We found that the instance choice, which is entirely determined by BGP routing, is frequently not the geographically closest one. Since additional distance between a client and a server causes higher latency, such clients might benefit from examination and optimization of the routing to their local DNS root server instances.
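
A minimal sketch of this closest-instance check follows. It assumes that clients and anycast instances have already been geolocated to (latitude, longitude) pairs and measures great-circle (haversine) distance; the function names and the instance coordinates are hypothetical, and the sketch is not the analysis code used in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def extra_distance_km(client, serving_instance, instances):
    """Extra distance caused by BGP selecting `serving_instance` instead of
    the geographically closest instance (0.0 if it already is the closest)."""
    serving = haversine_km(client, instances[serving_instance])
    closest = min(haversine_km(client, loc) for loc in instances.values())
    return serving - closest


# Hypothetical anycast instance locations (lat, lon), for illustration only.
instances = {
    "sfo": (37.6, -122.4),
    "ams": (52.3, 4.8),
    "nrt": (35.8, 140.4),
}
```

For a client near Tokyo, extra_distance_km((35.7, 139.7), "sfo", instances) is several thousand kilometers, while the same client served by "nrt" yields 0.0.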

Instance selection by BGP is highly stable. Over a two-day period, fewer than 2% of C-root and F-root clients and fewer than 5% of K-root clients experienced a change in the anycast instance they were using. Most DNS traffic still uses the stateless UDP protocol, so the vast majority of clients would not be harmed by such changes, apart from the unavoidable delay caused by BGP convergence. However, emerging DNS technologies that support security and other extensions to the original DNS architecture will require the use of TCP (stateful) connections, which could encounter problems in the face of anycast instability.
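
One simple way to quantify this affinity is to count the clients that appear at more than one instance during the trace window. The sketch below assumes the trace has already been reduced to (client address, instance identifier) pairs; the metric, the instance names, and the function name are illustrative assumptions and may differ from the exact definition used in the study.

```python
from collections import defaultdict


def instance_change_fraction(observations):
    """Fraction of clients observed at more than one anycast instance.

    `observations` is an iterable of (client_address, instance_id) pairs
    taken from the whole trace window (for example, the two-day collection).
    """
    instances_per_client = defaultdict(set)
    for client, instance in observations:
        instances_per_client[client].add(instance)
    if not instances_per_client:
        return 0.0
    changed = sum(1 for seen in instances_per_client.values() if len(seen) > 1)
    return changed / len(instances_per_client)


# Hypothetical observations: only 192.0.2.2 is seen at two instances.
sample = [
    ("192.0.2.1", "f-pao"), ("192.0.2.1", "f-pao"),
    ("192.0.2.2", "f-ams"), ("192.0.2.2", "f-lhr"),
    ("198.51.100.7", "f-pao"), ("198.51.100.8", "f-sin"),
]
```

In the sample, one of four clients changes instance, so instance_change_fraction(sample) returns 0.25.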

Overall, anycasting the DNS root nameservers not only allowed deployment beyond the architectural limit of thirteen (13) DNS root servers, but also appears to improve the global robustness, resilience, and performance of the DNS.
Influence map of DNS root servers
Based on available data from anycast instances of the C, F, and K root servers, we developed a visualization that illustrates the geographical diversity of DNS root server clients and differences in deployment strategies of different root server nodes.

The visualization consists of two world maps for each root, both of which project the world as viewed from the North Pole. Each individual anycast instance is placed on the map at the 'center of influence' of its observed clients. To determine this location, we consider each of a server's clients as a point with mass=1 and use those client locations to compute the centroid. Thus, anycast root nodes that serve a primarily local clientele remain closer to their actual geographic locations on our map, while nodes that tend to serve a more geographically distant client set are displaced on our maps toward the regions where they have the most clients. In both maps, the location of each node reflects the centroid (center of influence) of its clients' geographic locations rather than the actual geographic location of the server.
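
The centroid computation can be sketched as below, with every client weighted equally (mass = 1). The report does not say how the centroid is computed on the globe; this sketch assumes one common approach, averaging client positions as 3-D unit vectors and projecting the mean back to latitude and longitude, so that clients on either side of the 180° meridian do not average to a point on the opposite side of the Earth. The function name is hypothetical.

```python
import math


def geographic_centroid(points):
    """Center of influence of equally weighted (lat, lon) points, in degrees.

    Each client is a unit mass. Points are converted to 3-D unit vectors,
    averaged, and the mean vector is projected back onto the sphere, which
    avoids the wrap-around error of averaging longitudes directly. The result
    is undefined if clients are spread symmetrically around the globe.
    """
    if not points:
        raise ValueError("need at least one client location")
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```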

Wedges fanning out from each server in the larger Location Map indicate the direction, distance, and number of clients observed within the bounding angle of the wedge. A smaller Displacement Map depicts servers whose actual location and centroid are markedly different.
RFC1918 updates
We completed analysis of the properties and sources of spurious RFC1918 updates that the root servers deflect to a specially created protective system of name servers known as AS112. (The so-called RFC1918 or private addresses are intended strictly for use inside networks and should not leak to the outside world.) We conducted laboratory experiments to verify the behavior of the most recent Windows versions.
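
The address check involved can be sketched with Python's standard ipaddress module. AS112 servers answer for the reverse (in-addr.arpa) zones of the RFC1918 blocks, so dynamic updates for PTR records under those zones end up at AS112. The helper names below are hypothetical; only the three RFC1918 blocks and the reverse-name format are standard.

```python
import ipaddress

# The three private address blocks defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """True if `address` falls inside one of the RFC 1918 blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in RFC1918_NETWORKS)


def leaked_update_name(address):
    """Reverse-DNS (PTR) name a host would try to update for a private
    address, or None for a public address."""
    if not is_rfc1918(address):
        return None
    return ipaddress.ip_address(address).reverse_pointer
```

For example, a host configured with 192.168.1.20 would try to update 20.1.168.192.in-addr.arpa.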

Analysis of update logs collected at two independent AS112 servers showed that leakage of DNS updates is a global network pollution problem, with update volume exceeding millions per hour. We applied three signature techniques to two short packet traces and found that over 97% of DNS updates come from various types of Windows systems and over 99% of unique IP addresses in our traces have at least one Windows machine at or behind them. Windows 2000 accounts for the majority of the update packets, while Windows XP is more stringent in sending private DNS updates. Our long-term data did not reveal an obvious decreasing trend in RFC1918 update rates due to the evolution of operating systems. Users, software vendors, and system administrators should all take responsibility for the behavior of their systems and mitigate polluting behavior with appropriate actions.
DNS Statistics Collector (dsc)
Under subcontract to CAIDA, Duane Wessels of The Measurement Factory developed the DNS Statistics Collector (dsc) tool during the first year of this project; dsc runs on selected instances of the C, E, and F root servers and is used by some OARC members to measure their own DNS infrastructure. With support from this project, OARC is making dsc statistics from F-root publicly available with a seven-day delay.

We also installed a dsc collector and presenter on our primary nameserver (ns1.caida.org), a single-processor Pentium 4 class server running FreeBSD 5.4. It serves about 5 domains and about 40 local machines. Using the default collector settings, we monitor what types of clients our nameserver has, where those clients are located, and what types of queries the server processes.
Open resolvers surveys
We initiated an ongoing survey that looks for open DNS resolvers. Open resolvers threaten normal DNS operation because they allow resource squatting, are easy to poison, and can be used in widespread Distributed Denial of Service (DDoS) attacks. Through a subcontract, The Measurement Factory developed a web interface for immediate real-time testing of individually entered addresses. We probe each of our target IP addresses no more than once every three days. Since June 2006 we have been archiving daily reports showing the number of open resolvers for each autonomous system number. As an example of this data, the report from Friday, June 22, 2007 includes 10,948 autonomous systems with 300,511 open resolvers.
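
The three-day probing limit and the per-AS daily report can be sketched as follows. The data structures (a map of last-probe times, a map of probe results, and an address-to-ASN table) and the function names are hypothetical, and the sketch does not show how a resolver is actually tested for openness.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_PROBE_INTERVAL = timedelta(days=3)  # probe each target at most once every three days


def eligible_targets(last_probed, now):
    """Return targets that may be probed at time `now`.

    `last_probed` maps an IP address to the datetime of its previous probe,
    or to None if it has never been probed.
    """
    return sorted(ip for ip, when in last_probed.items()
                  if when is None or now - when >= MIN_PROBE_INTERVAL)


def open_resolvers_per_as(results, ip_to_asn):
    """Build one day's report: the number of open resolvers in each AS.

    `results` maps an IP address to True if it answered as an open resolver;
    `ip_to_asn` maps each address to its origin AS number.
    """
    return Counter(ip_to_asn[ip] for ip, is_open in results.items() if is_open)


# Hypothetical probe history, evaluated on 2006-06-10.
example_last_probed = {
    "192.0.2.1": None,                  # never probed: eligible
    "192.0.2.2": datetime(2006, 6, 9),  # probed one day ago: skipped
    "192.0.2.3": datetime(2006, 6, 7),  # probed three days ago: eligible
}
```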
Longitudinal DNS RTT data
Using NeTraMet traffic meters installed at various locations in the US, New Zealand, and Japan, we continued monitoring requests from campus (i.e., large enterprise) networks to root/gTLD servers, paired with the corresponding responses. The resulting dataset, available to researchers, contains information useful for evaluating performance conditions and trends on the global Internet. DNS RTTs are influenced by several factors, including remote server load, network congestion, route instability, and local effects such as link or equipment failures.
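
Pairing requests with responses to obtain RTTs can be sketched as below. This is not NeTraMet's implementation: it assumes packets have already been decoded into (time, source, destination, DNS query ID, is-response) tuples in time order, and it matches a response to the outstanding request with the same query ID and swapped endpoints. The function name is hypothetical.

```python
def pair_rtts(packets):
    """Match DNS requests to responses and return the RTTs in seconds.

    `packets` is a time-ordered iterable of
    (ts, src, dst, query_id, is_response) tuples. Unanswered requests and
    unmatched responses are ignored.
    """
    outstanding = {}  # (client, server, query_id) -> request timestamp
    rtts = []
    for ts, src, dst, qid, is_response in packets:
        if not is_response:
            outstanding[(src, dst, qid)] = ts
        else:
            # A response travels server -> client, so swap the endpoints.
            sent = outstanding.pop((dst, src, qid), None)
            if sent is not None:
                rtts.append(ts - sent)
    return rtts
```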
Anycast Simulation
In collaboration with Prof. G. Riley (Georgia Institute of Technology), we began laboratory simulations to study the effectiveness of anycast deployment for DNS root servers. The goal is to determine the impact of BGP (mainly BGP convergence) on anycast deployment, as well as the possible effects of scaling up anycast deployment on DNS performance.

In our experiments, we simulate a realistic AS level Internet topology using data from CAIDA and the RouteViews project. We apply BGP++'s configuration generator to convert the inferred AS relationships to configuration files for our simulation. Next we model two kinds of failures.
1. Silent link failure: a silent link failure is detected only when the BGP hold timer expires. Although such failures could occur on any segment of the chosen best AS path, we restrict the current simulation to a single failure of the last-hop AS link. In this failure mode, the client still communicates with the same server (if it is up) through a different path.
2. Explicit withdrawal of an anycast prefix by an advertising AS: this failure makes the server unreachable, so the client switches to a different anycast server node.
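
The difference between the two failure modes can be illustrated on a toy AS graph. The sketch below is not BGP++ or the Georgia Tech simulator: it replaces BGP's policy-driven route selection with a shortest-AS-path rule, and the topology, AS names, and function names are hypothetical.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first AS-hop counts from `source` over an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def catchment(graph, anycast_origins):
    """Map every reachable AS to the anycast origin it reaches in the fewest
    AS hops, breaking ties by origin name. Shortest AS path is a stand-in
    for BGP's policy-based route selection."""
    best = {}
    for origin in sorted(anycast_origins):
        for node, d in hop_distances(graph, origin).items():
            if node not in best or d < best[node][1]:
                best[node] = (origin, d)
    return {node: origin for node, (origin, _) in best.items()}


def without_link(graph, u, v):
    """Copy of `graph` with the u-v link removed (a link failure)."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    g[u].discard(v)
    g[v].discard(u)
    return g


# Hypothetical topology: anycast instances are announced from A and E.
example_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
```

In this toy topology, removing the A-B link leaves client B on instance A via C (failure mode 1), while withdrawing the prefix announced from E moves D to A (failure mode 2).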
The Georgia Tech team has obtained initial results of the BGP-ANYCAST simulations for a small AS topology. In the link-down case, the distribution of requests among server nodes changed insignificantly: the clients could still reach the same servers through other links. In the explicit prefix withdrawal case, the network quickly converged to a new state, since the simulated graph was small and well connected. The requests were redistributed to other nodes, with only one flip for affected clients.

Major Milestones

DNS Root servers traces
- Presented results of our analysis at the July 2006 IEPG meeting.
- Submitted a paper to PAM (accepted in 2007), "Two days in the life of the DNS anycast root servers" by Liu, Huffaker, Fomenkov, Brownlee and claffy.
- Indexed the data analyzed in the paper in DatCat.
- Posted 2006 Influence map of DNS root servers.
RFC1918 analysis
- Published in CCR the paper "The windows of private DNS updates" by Broido, Shang, Fomenkov, Hyun, and claffy.
- Published explanations and instructions to end users on how to disable dynamic DNS updates on Microsoft Windows systems.
RTT data
- Released to the research community four years of data on RTTs from several campuses to root/gTLD servers.
Surveys
- Began posting daily reports identifying open DNS resolvers.
Outreach
- Made presentations at IEPG, RSSAC, WIDE, and NANOG.
  - "Two days in the life of three DNS root servers" (RSSAC and WIDE)
  - "DNS root traffic: how do anycast instances behave?" (IEPG)
  - "Identifying and Reducing Private DNS Updates" (WIDE)
  - "Traffic on TCP port 53 and DNS Response Sizes at U Auckland" (WIDE)
  - "DNS Cache Poisoners Lazy, Stupid, or Evil?" (NANOG)
- Co-organized the 2nd DNS-OARC workshop with ISC and OARC.

Student Involvement

Ziqian Liu, a visiting scientist from China, collaborated with CAIDA researchers in analysis of 48-hour tcpdump traces from 53 anycast root server instances (4 from C-root, 33 from F-root, 16 from K-root). He examined the geographical and topological coverage of each instance and analyzed the stability of client-to-server affinity.

Two graduate students from Georgia Tech, Sunitha Beeram and Talal Jaafar, worked on laboratory simulations of the anycast deployment effectiveness.

Security

The Collaborative Center for Internet Epidemiology (CCIED)

Goals

CAIDA participates in The Collaborative Center for Internet Epidemiology and Defenses (CCIED, "SeaSide"), a joint effort between researchers at UCSD and the International Computer Science Institute's Center for Internet Research. CCIED addresses critical challenges posed by large-scale Internet-based pathogens, such as worms and viruses. CCIED efforts focus on analyzing the behavior and limitations of Internet pathogens, developing early-warning and forensic capabilities, and developing technologies that can automatically defend against new outbreaks in real-time.

Activities

Analysis of the Blackworm email virus
The Nyxem (Blackworm) virus is a 95 KB Visual Basic executable that infects a computer when an unwary user runs an executable email attachment. On the 3rd day of every month, the virus searches all available drives for files with 11 common file extensions (.doc, .xls, .mdb, .mde, .ppt, .pps, .zip, .rar, .pdf, .psd, and .dmp) and replaces them with the text string "DATA Error [47 0F 94 93 F4 K5]".

The spread of most email viruses is extremely difficult to track because of their spread mechanism: using legitimate email addresses gathered from infected computers to spread to the people with whom virus victims normally interact. Unlike most email viruses, Nyxem caused each infected computer to automatically generate a single HTTP request for the URL of an online statistics page. Each request for the statistics page was displayed on the page itself. Presumably this behavior was included in the virus so that the virus author could track its progress.

Logs from the online statistics page provided a rare opportunity to track the global spread of an email virus. Unfortunately, uninfected spectators and numerous distributed denial-of-service attacks made it difficult to determine which IP addresses represented actual infections. Also, the use of web proxies and DHCP affected the number of infected computers represented by each IP address in the logs.

We estimate that between 469,507 and 946,835 computers in more than 200 countries were infected by the Nyxem/Blackworm virus between 23:40:54 UTC on January 15, 2006 and 05:00:12 UTC on February 1, 2006. Interestingly, the geographic distribution of computers infected by the Nyxem virus differs significantly from general estimates of Internet usage in countries around the world. The geographic distribution of Nyxem-infected hosts also differs from that of random-spread Internet worms that do not require human intervention such as opening and running an executable attachment from an email. The virus disproportionately affected the Middle East and some countries of South America, particularly Peru.

At least 45,401 of the infected computers were also compromised by other forms of spyware or bot software. These spyware/botnet infections were advertised in the browser string of Nyxem-infected computers and thus were visible in the web logs documenting the spread of the virus.

We also produced an animation of the global spread of the Blackworm virus using the Cuttlefish tool.
Anomaly Detection
While accurate, robust data collection and report generation for large events is largely a solved problem, detecting and reporting on novel or unusual events remains difficult. Current traffic volumes at both the enterprise and backbone level often prohibit inspection of every packet. Current sampling methods provide at best incomplete data, and at worst no data whatsoever, about low-volume yet significant events, such as botnet command-and-control activities. Current research in this area focuses on algorithms that preferentially sample and report on new and unusual traffic, while also providing provably accurate summaries of regularly occurring traffic patterns.
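
One generic way to bias sampling toward rare traffic is sketched below: always record the first few packets of each flow key, then subsample recurring keys and weight the kept packets so that per-key totals remain unbiased estimates. This is an illustrative technique chosen for this report, not the algorithm under development, and the class name and parameters are hypothetical.

```python
import random
from collections import Counter


class NoveltyBiasedSampler:
    """Sample packets so that rare flow keys are always recorded.

    The first `always_keep` packets of each key are kept with weight 1.
    Later packets of the same key are kept with probability `p_common` and
    weight 1 / p_common, so summing weights gives an unbiased estimate of
    each key's packet count.
    """

    def __init__(self, always_keep=3, p_common=0.01, seed=None):
        if not 0 < p_common <= 1:
            raise ValueError("p_common must be in (0, 1]")
        self.always_keep = always_keep
        self.p_common = p_common
        self.seen = Counter()  # exact per-key counts (memory grows with keys)
        self.rng = random.Random(seed)

    def offer(self, key):
        """Return the weight to record for one packet of `key` (0.0 = drop)."""
        self.seen[key] += 1
        if self.seen[key] <= self.always_keep:
            return 1.0
        if self.rng.random() < self.p_common:
            return 1.0 / self.p_common
        return 0.0
```

The exact per-key counter grows with the number of distinct keys; at backbone rates, a compact summary such as a Bloom filter or count-min sketch would be needed instead.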

Major Milestones

Provided Internet Service Providers with lists of Nyxem/Blackworm-infected computers so that customers could be warned before the virus destroyed their data.
Released a web report and animation of the Blackworm/Nyxem email virus.

Trusted Computing: Quantitative Network Security Analysis

Goals

Our work in quantitative security analysis attempts to move beyond anecdotal assessment of malicious events and accidental detection of compromised infrastructure. Because comprehensive security analysis is difficult, much work thus far on attack prevalence relies on surveys of impressions rather than direct measurement. We aim to develop and apply novel techniques to rigorously describe the volume of spoofed-source denial-of-service attacks, random-scanning Internet worms, and opportunistic host scanning. We also collect data and conduct analyses over a period of years, rather than days, to get a comprehensive view of trends in malicious activity over time. We process collected data into datasets useful to those studying various security phenomena and distribute those datasets to researchers. We also worked to develop a real-time monitor of the UCSD Network Telescope that provides an up-to-date view of traffic behavior.

Many, if not most, successful attacks that compromise computational infrastructure are discovered accidentally rather than via established monitoring techniques. In some cases, technology to detect these attacks exists but is not economically viable to support and use. In other cases, such as compromised routers and user accounts, the ability to detect attacks remains elusive. We made significant progress in developing techniques to detect these types of attacks in time to minimize the resulting damage.

Funding for this project ended in July 2006.

Major Milestones

DDoS Longitudinal Study
- In May 2006 we published in ACM Transactions on Computer Systems (TOCS) a longitudinal study of denial-of-service backscatter activity between January 2001 and November 2004. We identified the number, duration, and focus of distributed denial-of-service attacks, and quantified attack characteristics, including volume, frequency, duration, and protocol. We also assessed DDoS victims, including targeted services, locations, and attack impact. In total, we observed over 68,000 attacks directed at over 34,000 distinct victim IP addresses, ranging from well-known companies such as Amazon and Hotmail to small foreign ISPs and dial-up connections. We believe our technique is the first to provide quantitative estimates of Internet-wide denial-of-service activity and that this paper describes the most comprehensive public measurements of such activity to date.
- We also released the data necessary to reproduce this study. For more information, see The CAIDA Backscatter-TOCS Dataset.
Real-time UCSD Network Telescope Monitor
Toward our goal of providing a real-time view of anomalous traffic reaching the UCSD Network Telescope, we designed, implemented, and tested new monitor software to process and store data, generate graphs, and display traffic reports based on such criteria as application, topological and geographic source and destinations, and transport type. We also adapted our denial-of-service attack identification and classification methods to work in real-time so that we could display separate views of denial-of-service attacks and scanning activities.

UCSD Network Telescope

Goals

The UCSD network telescope acts as a passive data collection system. The network telescope consists of a globally routed /8 network that carries almost no legitimate traffic. Because the legitimate traffic is easy to separate from the incoming packets, the network telescope provides us with a monitoring point for anomalous traffic that represents almost 1/256th of all IPv4 destination addresses on the Internet.

Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, legitimate traffic has almost no reason to reach it. The network telescope collects traffic resulting from a wide range of events, including misconfiguration (e.g., a human mistyping an IP address), malicious scanning of address space by attackers looking for vulnerable targets, backscatter from random-source denial-of-service attacks, and the automated spread of malicious software (worms).

Please see the UCSD Network Telescope home page for details about the telescope, related information and data access.

Activities

We have used data from the UCSD Network Telescope to analyze Denial-of-Service attacks, Internet worms, and malicious network scans:

Denial-of-Service Attacks
We used the telescope to monitor random-source distributed denial-of-service attacks. When an attacker wants to make it difficult for the attack victim (and the victim's ISPs) to block an incoming attack, the attacker uses a fake source IP address (similar to a fake return address in postal mail) in each packet sent to the victim. Because the victim cannot distinguish between incoming requests from an attacker and legitimate inbound requests, it tries to respond to every request packet it receives. When the attacker spoofs source addresses randomly, the telescope observes the responses destined for fake source addresses that fall within the telescope's address space (where there is no host). By monitoring these unsolicited responses, researchers can identify denial-of-service attack victims and infer information about the volume of the attack, the bandwidth of the victim, the location of the victim, and the types of services the attacker targets.
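
The extrapolation step can be sketched as follows, assuming (as in the random-source model above) that spoofed source addresses are drawn uniformly from the whole IPv4 space, so that a /8 telescope sees about 1 in 256 of a victim's responses. Grouping backscatter by its source address identifies the victim. The names are hypothetical, and because a victim may not answer every attack packet, the result is best read as a lower bound on the attack rate.

```python
from collections import defaultdict

IPV4_SPACE = 2 ** 32
TELESCOPE_SIZE = 2 ** 24  # a /8 network


def estimate_attacks(backscatter, duration_s):
    """Estimate per-victim response rates (packets/s) from telescope backscatter.

    `backscatter` is an iterable of (victim_ip, telescope_dst_ip) pairs, one
    per unsolicited response seen during `duration_s` seconds. The source of
    each response is the victim; the observed rate is scaled by
    IPV4_SPACE / TELESCOPE_SIZE = 256 under the uniform-spoofing assumption.
    """
    counts = defaultdict(int)
    for victim, _dst in backscatter:
        counts[victim] += 1
    scale = IPV4_SPACE / TELESCOPE_SIZE
    return {victim: n / duration_s * scale for victim, n in counts.items()}
```

For example, 10 responses from one victim observed over 10 seconds extrapolate to about 256 responses per second across the whole Internet.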
Internet Worms
Many Internet worms spread by randomly generating an IP address to be the target of an infection attempt and sending the worm off to that IP address in the hope that that address is in use by a vulnerable computer. Because the network telescope includes one out of every 256 IPv4 addresses, it receives approximately one out of every 256 probes from hosts infected with randomly scanning worms.
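
The 1-in-256 figure also bounds how quickly the telescope notices a newly infected host. Assuming each probe targets a uniformly random IPv4 address, the chance that at least one of k probes reaches the /8 is 1 - (255/256)^k. The sketch below evaluates that expression; the function names are hypothetical.

```python
import math

TELESCOPE_FRACTION = 1 / 256  # a /8 covers 2**24 of the 2**32 IPv4 addresses


def prob_detected(num_probes, fraction=TELESCOPE_FRACTION):
    """Probability that at least one of `num_probes` uniformly random
    probes from an infected host lands in the telescope."""
    return 1.0 - (1.0 - fraction) ** num_probes


def probes_for_confidence(confidence, fraction=TELESCOPE_FRACTION):
    """Smallest number of random probes after which the telescope has seen
    the host with at least the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))
```

On average a host sends 256 probes before one reaches the telescope, and probes_for_confidence(0.5) gives 178 probes for even odds of detection.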
Malicious Network Scans
Malicious network scans include automated, semi-automated, and manual attempts to locate exploitable computers on the Internet. Scans often differ from other types of traffic visible on the network telescope because scan traffic arriving at the telescope is not driven by chance: the attacker selects the targets, although the motives behind those selections often appear arbitrary from the perspective of the recipient. The UCSD Network Telescope continuously receives a multitude of scans, including probes for the existence of a device at a given IP address, sequential scans of ports on a single IP address, methodical scans for one or a small number of vulnerable ports across an IP address range, and even scans using TCP resets.
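
The distinction between these scan types can be sketched with simple per-source counts: many destinations on a few ports suggests a horizontal scan, many ports on one address a vertical scan. The thresholds and names below are hypothetical choices for illustration, not the classification used for the telescope reports.

```python
from collections import defaultdict


def classify_scanners(packets, min_targets=10, max_ports=3):
    """Label each source 'horizontal', 'vertical', or 'other'.

    `packets` is an iterable of (src_ip, dst_ip, dst_port) tuples seen at
    the telescope. A horizontal scanner probes at least `min_targets`
    addresses on at most `max_ports` ports; a vertical scanner probes at
    least `min_targets` ports on a single address.
    """
    dsts = defaultdict(set)
    ports = defaultdict(set)
    for src, dst, port in packets:
        dsts[src].add(dst)
        ports[src].add(port)
    labels = {}
    for src in dsts:
        n_dst, n_port = len(dsts[src]), len(ports[src])
        if n_dst >= min_targets and n_port <= max_ports:
            labels[src] = "horizontal"
        elif n_dst == 1 and n_port >= min_targets:
            labels[src] = "vertical"
        else:
            labels[src] = "other"
    return labels
```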

Major Milestones

Data Collection
Collected, processed, and archived 19.4 TB of Network Telescope data.
Dataset Distribution
We released the following datasets:
- Witty Worm Dataset (released January 11, 2006)
- Code-Red Worms Dataset (released February 1, 2006)
- Backscatter-2004-2005 Dataset (released February 15, 2006)
- Backscatter-2006 Dataset (released April 7, 2006)
Realtime Report Generation
As detailed above, we implemented and tested new software to generate public, real-time reports of denial-of-service, worm, and scanning activities on the UCSD Network Telescope.

Routing

Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies

Goals

The Internet's routing system is facing stresses due to its poor fundamental scaling properties. The goal of this research project is to apply key theoretical results in distributed computation to an extremely practical purpose: truly scalable interdomain network routing.

Compact routing is a theoretical research field that studies fundamental limits of routing scalability and designs algorithms that try to meet these limits. In particular, compact routing research shows that shortest-path routing, which forms the core of traditional routing algorithms, cannot guarantee routing table (RT) sizes that grow slower than linearly with network size. However, many compact routing schemes relax the shortest-path requirement and allow for improved scaling on static network topologies.

Activities

This year we collaborated with A. Brady and L. Cowen from Tufts University to analyze several published compact routing schemes in terms of their average performance parameters (local memory space, stretch, control message sizes, etc.) on realistic Internet topologies as measured by skitter and DIMES. We also began to experiment with fundamentally different approaches to scalable routing, in particular exploring the possibility of connecting two currently disconnected areas of research: the Kleinberg model of greedy routing without updates and scale-free network topologies.
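
Greedy routing in Kleinberg's sense needs no routing tables: each node forwards a message to whichever neighbor is closest to the destination in some underlying metric space, and routing fails if no neighbor is closer. The sketch below shows that rule on a one-dimensional lattice with one long-range link; the names and the toy topology are hypothetical, and the sketch does not address the open question of how to assign such coordinates on scale-free topologies.

```python
def greedy_route(coords, neighbors, source, target, distance):
    """Route greedily from `source` to `target`.

    At each step the message moves to the neighbor whose coordinate is
    closest to the target's. Returns the list of visited nodes, or None if
    no neighbor is closer than the current node (a local minimum).
    """
    path = [source]
    current = source
    while current != target:
        here = distance(coords[current], coords[target])
        nxt = min(neighbors[current],
                  key=lambda n: distance(coords[n], coords[target]))
        if distance(coords[nxt], coords[target]) >= here:
            return None  # greedy routing is stuck
        path.append(nxt)
        current = nxt
    return path


# Hypothetical 1-D lattice 0..9 with one long-range contact between 0 and 7.
line_coords = {i: i for i in range(10)}
line_neighbors = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
line_neighbors[0].add(7)
line_neighbors[7].add(0)
```

Routing from 0 to 9 with distance lambda a, b: abs(a - b) takes the long-range link and visits [0, 7, 8, 9].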

In the Internet modeling area we continued our analysis of the properties of the best currently available non-equilibrium network growth model: the Positive-Feedback Preference (PFP) model by Zhou and Mondragon. At the same time, in collaboration with S. Shakkottai from UIUC and T. Vest from USC, we developed a new model of Internet growth that attempts to derive preferential attachment behavior from empirically grounded economic realities of the Internet's AS topology.

Major Milestones

Analysis of compact routing schemes
- Investigated two name-independent schemes, one by Arias and its more recent optimal improvement by Abraham, on synthetic power-law graphs and on realistic Internet topologies.
- Investigated three name-dependent schemes, by Cowen, Thorup and Zwick, and Brady and Cowen, on synthetic power-law graphs and on realistic Internet topologies.
- Found that these schemes yield remarkably low values of memory and stretch on real Internet topologies. For example, the BC routing algorithm guarantees a maximum routing table size of O(e log² n), where n is the total number of nodes in the network. This scaling means that even if IPv6 addresses were fully de-aggregated, the maximum routing table in such an Internet would still contain no more than ~16,000 entries (~1,000 for the de-aggregated IPv4 space).
- Found that none of the analyzed approaches leads to an acceptable scalable routing solution, since either the average stretch is too high (for the Arias scheme) or the number of updates remains unbounded (for the name-dependent schemes).
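
As a rough check of the table-size figures quoted for the BC scheme, assume base-2 logarithms and treat the factor e as a constant of order one (both are our assumptions; e is not defined above). Setting n to the size of the fully de-aggregated address space gives:

```latex
(\log_2 2^{128})^2 = 128^2 = 16{,}384 \approx 16{,}000 \quad \text{(IPv6)},
\qquad
(\log_2 2^{32})^2 = 32^2 = 1{,}024 \approx 1{,}000 \quad \text{(IPv4)}.
```

These values match the quoted ~16,000 and ~1,000 entries, which suggests the figures count only the polylogarithmic term.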
Internet evolution models
- Explained the success of the PFP model by the fact that it produces 2K-random graphs. One of the central elements of this model is its attempt to reproduce the so-called rich club connectivity property which severely constrains the joint degree distribution (JDD). We showed that JDD, in turn, fully defines the AS topology.
- Developed a new model of the Internet's growth. To the best of our knowledge, this is the first Internet evolution model that is realistic, analytically tractable, and entirely based on physical, that is, measurable, parameters.
- Verified that when we substitute the measured values of the parameters into analytic expressions of our Internet evolution model, it predicts and explains the observed value of the exponent of the power-law node degree distribution in real AS topologies.
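The ~16,000 and ~1,000 entry figures quoted for the BC scheme are consistent with the following back-of-the-envelope arithmetic. It rests on our own assumptions, which are not part of the stated bound: n is taken to be the size of the fully de-aggregated address space, logarithms are base 2, and both the factor e and the constant hidden in the O(·) are treated as 1.

```latex
\begin{align*}
n = 2^{128}\ \text{(IPv6)}: &\quad (\log_2 n)^2 = 128^2 = 16{,}384 \approx 16{,}000,\\
n = 2^{32}\ \text{(IPv4)}:  &\quad (\log_2 n)^2 = 32^2 = 1{,}024 \approx 1{,}000.
\end{align*}
```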

Student Involvement

The following graduate students participated in this project:

S. Shakkottai (UIUC) and T. Vest (USC) developed a new model of the evolution of the Internet's AS topology.

A. Brady (Tufts) analyzed the performance of existing compact routing schemes on real Internet topologies.

Topology

Macroscopic Topology Measurement, Analysis, and Modeling

Goals

CAIDA's topology research remains focused on three areas: macroscopic topology measurement; analysis of the observable ISP hierarchy; and topology modeling in support of routing research.

Activities

In 2006, CAIDA made significant progress in all three focus areas.

We continued our macroscopic topology measurements while preparing for an upgrade of our existing active probing system. We regularly post an update of the adjacency matrix of the Internet AS-level graph computed daily from skitter measurements.
We investigated the problem of inferring AS business relationships from available routing data. By reformulating AS relationship inference as a MAX2SAT-based multiobjective optimization problem, we fixed a number of serious flaws in the inference techniques previously considered state of the art. We also developed an approach to infer peering links with unprecedented accuracy. Finally, we validated our inference results by surveying network administrators of about 40 ASes to collect information on the actual connectivity and policies of their ASes. The survey results confirmed high accuracy: we correctly inferred 96.5% of customer-to-provider (c2p), 82.8% of peer-to-peer (p2p), and 90.3% of sibling-to-sibling (s2s) relationships. Comparing the reported AS connectivity with the AS connectivity data contained in BGP tables showed that BGP tables miss up to 86.2% of the true adjacencies of the surveyed ASes. Most of the missing links were p2p, which suggests that peering links are more prevalent in reality than previous measurement methods have been able to capture.
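The core of the satisfiability idea can be illustrated with a deliberately simplified sketch. It assumes only customer-to-provider links, with no peering or sibling links. Each link gets one boolean variable recording which end is the customer. Every pair of consecutive hops in an observed AS path then yields a two-literal clause that forbids a "valley", meaning a downhill (provider-to-customer) hop followed by an uphill (customer-to-provider) hop. When the paths cannot all be made valley-free, a MAX2SAT solution satisfies as many clauses as possible. This single-objective toy uses an exhaustive solver that only works for a handful of links. It is not the multiobjective formulation or solver from the published work, and all function names are hypothetical.

```python
from itertools import product


def up_literal(a, b):
    """Literal meaning 'hop a->b goes up', i.e. a is a customer of b.

    The variable for link (min, max) is True iff min(a, b) is the customer.
    A literal is (link, required_value).
    """
    return ((min(a, b), max(a, b)), a < b)


def down_literal(a, b):
    """Literal meaning 'hop a->b goes down', i.e. a is a provider of b."""
    return ((min(a, b), max(a, b)), a > b)


def valley_free_clauses(paths):
    """One 2-clause per consecutive hop pair: (hop1 up) OR (hop2 down)."""
    links, clauses = set(), []
    for path in paths:
        hops = list(zip(path, path[1:]))
        links.update((min(a, b), max(a, b)) for a, b in hops)
        for (a, b), (c, d) in zip(hops, hops[1:]):
            # NOT(hop1 down AND hop2 up) == hop1 up OR hop2 down
            clauses.append((up_literal(a, b), down_literal(c, d)))
    return sorted(links), clauses


def brute_force_max2sat(links, clauses):
    """Try every assignment; return the one satisfying the most clauses."""
    best, best_score = None, -1
    for values in product((False, True), repeat=len(links)):
        assignment = dict(zip(links, values))
        score = sum(any(assignment[link] == want for link, want in clause)
                    for clause in clauses)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


links, clauses = valley_free_clauses([[1, 2, 3], [3, 2, 4], [4, 2, 1]])
best, score = brute_force_max2sat(links, clauses)
print(score, "of", len(clauses), "clauses satisfied:", best)
```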

Based on our new relationship inference methodology, we improved our procedure for ranking Autonomous Systems (AS Rank) by their connectivity and inferred business relationships. The rank of each AS is a function of the number of IP prefixes advertised by the AS itself, its customer ASes, their customer ASes, and so on, that is, by all ASes in its customer cone.
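As an illustration of a cone-based ranking, the sketch below computes each AS's customer cone (the AS plus every AS reachable by following provider-to-customer links). It then ranks ASes by the number of distinct prefixes announced inside that cone. The exact ranking function behind AS Rank is not spelled out here. The choice to count distinct prefixes, the tie-breaking by AS number, the input dictionaries `customers` and `prefixes`, and the function names are all assumptions for the example.

```python
def customer_cone(asn, customers):
    """All ASes reachable from asn via provider->customer links, including asn."""
    cone, stack = {asn}, [asn]
    while stack:
        node = stack.pop()
        for child in customers.get(node, ()):
            if child not in cone:  # the visited set also guards against loops
                cone.add(child)
                stack.append(child)
    return cone


def rank_by_cone_prefixes(customers, prefixes):
    """Rank ASes by the number of distinct prefixes announced in their cone."""
    ases = (set(customers) | set(prefixes)
            | {c for cs in customers.values() for c in cs})
    scores = {}
    for asn in ases:
        cone_prefixes = set()
        for member in customer_cone(asn, customers):
            cone_prefixes |= prefixes.get(member, set())
        scores[asn] = len(cone_prefixes)
    # Highest count first; ties broken by AS number.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


customers = {1: [2, 3], 2: [4]}  # provider -> list of customers (hypothetical)
prefixes = {1: {"10.0.0.0/8"}, 2: {"20.0.0.0/8"}, 3: {"30.0.0.0/8"},
            4: {"40.0.0.0/8", "41.0.0.0/8"}}
print(rank_by_cone_prefixes(customers, prefixes))
```

The cone's prefixes are collected as a set. As a result, a prefix announced by two ASes in the same cone is counted once, and a customer reachable through several providers does not inflate its provider's count.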

We introduced a new approach that uses machine learning techniques to map all ASes in the Internet into a natural AS taxonomy: large ISPs, small ISPs, customer networks, university networks, Internet exchange points, and network information centers. The input to the classification algorithm includes the inferred AS relationships, WHOIS records, and a few other AS attributes. We successfully classified 95.3% of ASes with an estimated accuracy of 78.1%. The resulting dataset annotates every node in the AS topology in detail and reveals vastly different network characteristics, external connectivity patterns, network growth tendencies, and other properties intrinsic to various types of ASes.
We developed a new method for analyzing and synthesizing random network topologies: the dK-series. We define the dK-series of probability distributions by specifying all degree correlations within subgraphs of size d of a given graph G. Increasing values of d capture progressively more properties of G at the cost of a more complex representation. The dK-series allow us to quantify the distance between two graphs and to construct random graphs that accurately reproduce virtually all metrics proposed in the literature. Using this approach, we constructed graphs for d = 0, 1, 2, 3 and demonstrated that they reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies. We found that d = 2 is sufficient for most practical purposes, while d = 3 essentially reconstructs the Internet AS- and router-level topologies exactly.
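For small d, the dK distributions are easy to compute. The sketch below assumes an undirected simple graph given as an edge list. It returns the 0K statistic (average degree), the 1K distribution (number of nodes of each degree), and the 2K distribution (number of edges joining nodes of degrees k1 and k2, that is, the joint degree distribution). It reports raw counts rather than normalized probabilities. It also omits the 3K case, which additionally requires counting wedges and triangles. The function name is hypothetical.

```python
from collections import Counter


def dk_series(edges):
    """Return (0K, 1K, 2K) statistics of an undirected simple graph.

    Only nodes that appear in `edges` are counted; isolated nodes are ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    avg_degree = 2 * len(edges) / len(degree)   # 0K: average degree
    nodes_by_degree = Counter(degree.values())  # 1K: nodes per degree k
    edges_by_degree_pair = Counter(             # 2K: edges per (k1, k2), k1 <= k2
        tuple(sorted((degree[u], degree[v]))) for u, v in edges
    )
    return avg_degree, dict(nodes_by_degree), dict(edges_by_degree_pair)


# A 3-leaf star (center 0) plus one edge between two leaves.
print(dk_series([(0, 1), (0, 2), (0, 3), (1, 2)]))
```

A dK-random graph generator would then construct or rewire graphs that preserve these counts. That step is not shown here.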

The nature of the dK-series method implies that it will also capture any future structurally-based metrics that may be proposed. Therefore, this method represents a significant improvement to the set of tools available to network topology and protocol researchers and is relevant far beyond its initial scope.

Major Milestones

AS relationship inference technique
- Published "AS Relationships: Inference and Validation" in ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp. 29-40, 2007, by Dimitropoulos, Krioukov, Fomenkov, Huffaker, Hyun, Riley, and claffy
- Updated the interactive AS-ranking web page and posted weekly updates of the data at https://asrank.caida.org/.
- Published and archived, on a weekly basis, Internet AS-level topologies annotated with AS relationship information.
- Released the set of AS attributes we used to classify ASes, and the resulting empirical AS taxonomy
Topology modeling
- Published "Modeling Autonomous System Relationships", presented at the 20th Workshop on Principles of Advanced and Distributed Simulation (PADS), by Dimitropoulos and Riley.
- Published "Systematic Topology Analysis and Generation Using Degree Correlations", presented at SIGCOMM 2006, by Mahadevan, Krioukov, Fall, and Vahdat.
- Developed a new topology generator for constructing random graphs with defined correlations at levels d=2 and d=3.
- Sketched an approach for extending the topology generator to arbitrary d.
- Conducted the 1st Workshop on the Internet Topology (WIT).

Student Involvement

The following graduate students participated in this project:

X. Dimitropoulos (Georgia Tech) developed new approaches to inferring AS business relationships, conducted an extensive survey of ISPs to validate his algorithms, and created the AS taxonomy based on machine learning algorithms.

P. Mahadevan (UCSD) analyzed AS topologies derived from three different data sources, simulated dK-distributions for d=1, 2, 3 and compared them with real Internet topologies, and developed the dK-random graph generator.

Economics and Policy

Cooperative Measurement and Modeling of Open Networked Systems (COMMONS)

Goals

With support from Cisco Systems and the National Science Foundation, the COMMONS project began to take root this year. The COMMONS Project hopes to conduct a scientific experiment that offers measurement technology and connectivity to a research and education backbone infrastructure in exchange for access to measurement of the resulting interdomain network.

Major Milestones

In its first year, the COMMONS project reported the following major milestones.

1st COMMONS Workshop
- On December 12-13, 2006, CAIDA hosted 40 participants from across North America at the San Diego Supercomputer Center on the UCSD campus to discuss technical, policy, and operational issues related to the COMMONS initiative. The diversity of workshop participants helped ensure that many categories of stakeholders were represented in the proceedings and provided invaluable insights into the potential pitfalls and opportunities the COMMONS Project faces. We published a COMMONS Workshop Final Report.

Infrastructure Projects

Several of CAIDA's infrastructure projects cut across scientific and funding domains as they strive to provide long-term access to data, services, and measurement infrastructure. In addition to raw data collection, distribution, and archiving efforts, the DatCat project offers improved metadata searching and indexing. Further, the PREDICT project develops legal and policy infrastructure for data sharing among data providers, data hosting sites, and researchers.

In 2006, CAIDA took over operational stewardship for all NLANR machines and data. CAIDA gracefully decommissioned obsolete hardware and data and attempted to apply the remaining resources to projects of similar intent and spirit.

DatCat: the Internet Measurement Data Catalog

The goal of this project is to build an Internet Measurement Data Catalog (IMDC) that will facilitate access, archiving, and long-term storage of Internet data as well as sharing the data among Internet researchers. After 3.5 years of development we finally launched the catalog in June 2006 at www.datcat.org. Funding for the project also ended in 2006, so we are currently seeking funding to maintain and extend it.

Goals

CAIDA had three goals for the creation of the IMDC database:

provide a searchable index of Internet data available for research
enhance documentation of datasets via a public annotation system
enable reproducible research in network science

Activities

In 2006, the Trends project completed its final year.

DatCat
Our main priority was the development of DatCat, the Internet Measurement Data Catalog. This searchable catalog archives meta-data for Internet measurement data sets and provides information on data locations and access. We opened DatCat for public viewing on June 12, 2006 and continued to index new data sets and to improve user interfaces throughout the year.

CAIDA hosted the team behind CRAWDAD (a Community Resource for Archiving Wireless Data At Dartmouth) for a small workshop aimed at interoperation between these two NSF-funded catalogs. We hope to be able to automatically share data and metadata between the DatCat and CRAWDAD catalogs in fall 2007.
1st DatCat Community Contribution (DCC 1) Workshop
We created this workshop series, and hosted the 1st DatCat Community Contribution (DCC 1) Workshop, to provide community members with the resources they need to contribute data to DatCat. At the time of writing, the First DatCat Community Contribution Workshop (DCC 1) Collection contains 1,239 files from 9 subcollections and 4 publications.
Data Collection and Software Development
We collected and distributed data from passive monitors and 22 active monitors. We answered 310 requests for data, and more than 220 research groups accessed our data.

We developed a new report generator tool that produces easily configurable, comprehensive, real-time summaries of network traffic on monitored links. The following traffic characteristics can be reported: packet, byte, and flow volume, number of source and destination IP addresses and Autonomous Systems, the geographic regions exchanging traffic, and the types of applications used. This metadata is also ideal information to index in DatCat.
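As a rough illustration of the volume metrics listed above, the following sketch summarizes a sequence of packet records into packet, byte, and flow counts and the number of distinct source and destination addresses. It is not the CoralReef report generator. The record layout, the definition of a flow as a distinct 5-tuple with no timeout, and the function name are all assumptions. AS, geographic, and application breakdowns would need additional lookup tables and are left out.

```python
def traffic_summary(packets):
    """Summarize (src, dst, sport, dport, proto, length) packet records."""
    flows, sources, destinations = set(), set(), set()
    packet_count = byte_count = 0
    for src, dst, sport, dport, proto, length in packets:
        packet_count += 1
        byte_count += length
        flows.add((src, dst, sport, dport, proto))  # flow = distinct 5-tuple, no timeout
        sources.add(src)
        destinations.add(dst)
    return {
        "packets": packet_count,
        "bytes": byte_count,
        "flows": len(flows),
        "src_addrs": len(sources),
        "dst_addrs": len(destinations),
    }


sample = [("10.0.0.1", "192.0.2.1", 1234, 80, 6, 60),
          ("10.0.0.1", "192.0.2.1", 1234, 80, 6, 1500),
          ("10.0.0.2", "192.0.2.1", 5353, 53, 17, 80)]
print(traffic_summary(sample))
```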
Data Analysis
We continued ongoing efforts to characterize the applications used for peer-to-peer file-sharing exchange and their prevalence on the Internet. We expect to publish new results in this area in 2007.

Major Milestones

DatCat
- Opened DatCat to public viewing on June 12, 2006, initially with 12 collections of data from 3 organizations.
- Created browse interface that complements search interface by allowing users to view catalog contents without specific criteria in mind.
- Developed a comprehensive user Help section including a screen-shot based tutorial
- Augmented advanced search with additional query capabilities and made it more user friendly
- Added ability for users to add "note" annotations to any object
- Added BibTeX citations to every DatCat object's detail page
Releasing CAIDA data to the community
- Prepared and released six new datasets:
  - Witty Worm Dataset (January 11, 2006)
  - Code-Red Worms Dataset (February 1, 2006)
  - Backscatter-2004-2005 Dataset (February 15, 2006)
  - Backscatter 2006 Dataset (April 7, 2006)
  - AS Taxonomy Dataset (April 14, 2006)
  - DNS Root Server/gTLD RTT Dataset (August 3, 2006)
New traffic report generator
- Added geographic analysis of traffic.
- Added a flexible display so that users can view a variety of graphs and tables displaying different time periods.
- Added the ability to collect statistically valid sampled flows on high-speed (OC48, OC192, 10GigE) links and display the sampled reports.

PREDICT: Network Traffic Data Repository to Develop Secure IT Infrastructure

Goals

The Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT) was designed to provide sensitive security datasets to qualified researchers, while preserving privacy and preventing data misuse. PREDICT offers a secure technical and policy framework to process applications for data sharing from network providers, which includes tools for collection, processing, and hosting of data that will eventually be available through the program as well as secured infrastructure to support serving datasets to researchers. (Note that the PREDICT project has yet to launch, see www.predict.org for details.)

Activities

CAIDA's involvement in the effort, called the Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT), includes assisting with: the technical framework to process applications for data; development of communications between Data Providers, Data Hosting Sites, and Researchers; collection and processing of data that will eventually be available through the program; developing and deploying infrastructure to allow CAIDA to serve datasets to researchers; and providing input for the development of a legal framework to support the project. In 2006, CAIDA helped develop Memoranda of Agreement to allow CAIDA to participate in the launch of the PREDICT program as a data provider and a data hosting site, serving OC48 backbone peering link data, denial-of-service backscatter data, Internet worm data, Network Telescope data, and IP topology data to approved researchers.

Major Milestones

Macroscopic Topology Data Collection
- We collected 623 GB (9,181 files) of macroscopic topology measurements for distribution via PREDICT, bringing our total macroscopic topology data collection to 3.5 TB.
Security Data Collection
- We collected 19.4 TB (8,545 files) of security measurements from the UCSD network telescope for distribution via PREDICT, bringing our total security data collection to 52.8 TB.
New Datasets Released
- Witty Worm Dataset (January 11, 2006)
- Code-Red Worms Dataset (February 1, 2006)
- Backscatter-2004-2005 Dataset (February 15, 2006)
- Backscatter-2006 Dataset (April 7, 2006)
- AS Taxonomy Repository (April 14, 2006)
- DNS Root Server/gTLD RTT Dataset (August 3, 2006)

Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure

Goals

This year, CAIDA began development of Archipelago (Ark), our next-generation active measurement infrastructure supporting the Macroscopic Topology Project. We intend for Ark to eventually replace the previous skitter-based infrastructure. Inspired by the Community-Oriented Network Measurement Infrastructure (CONMI) Workshop Report, Ark aims to achieve greater scalability and flexibility than our current measurement infrastructure. It also aims to take steps toward a community-oriented network measurement infrastructure that will eventually allow collaborators to run their vetted measurement tasks on a security-hardened distributed platform.

We designed Ark with macroscopic topology as its first target application, using the scamper tool to measure topology and latency to large numbers of IPv4 and IPv6 addresses. Once Ark is established and stable, CAIDA intends for the infrastructure to support numerous other types of data collection, measurements and experiments, e.g., one-way loss, bandwidth estimation, DNS performance, router alias resolution, and more. Although geared toward active measurement, Ark could also enable some types of passive measurement.

Ark's design represents a new approach to coordinating the activities of a measurement infrastructure, including supporting multiple researchers attempting different types of measurements. Archipelago's coordination facility is called Marinda, loosely inspired by David Gelernter's tuple-space based Linda coordination language. Archipelago extends Gelernter's tuple space model with features needed to support a globally distributed measurement infrastructure that hosts heterogeneous measurements by a community of researchers.
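Linda's central idea is that processes coordinate only by putting tuples into, and reading or taking tuples out of, a shared associative memory. The toy class below illustrates that idea in a single process. It implements Linda's non-blocking `rdp` (read) and `inp` (take) operations and uses `None` as a wildcard in templates. It is not Marinda's interface. The class name, the wildcard convention, and the task tuples are assumptions. Blocking operations, persistence, and the distribution and security features Ark needs are omitted.

```python
class TupleSpace:
    """A toy, single-process Linda-style tuple space (not Marinda's API)."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(template, tup):
        # None in a template matches any value; other fields must be equal.
        return len(template) == len(tup) and all(
            t is None or t == f for t, f in zip(template, tup))

    def rdp(self, *template):
        """Non-blocking read: return the first matching tuple, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def inp(self, *template):
        """Non-blocking take: remove and return the first matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out("task", "trace", "192.0.2.0/24")        # a coordinator posts work
claimed = space.inp("task", "trace", None)        # a monitor claims it
print(claimed, space.rdp("task", "trace", None))  # the task is gone after inp
```

In this model, a coordinator posts measurement tasks with `out`, and each monitor claims one with `inp`. Because `inp` removes the tuple, two monitors cannot claim the same task.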

Activities

The need to extend our aging active measurement architecture to support a wider variety of measurements spurred Ark's design and development. We upgraded the infrastructure to probe a broader range of dynamically generated IP addresses covering all /24 prefixes in the IPv4 address space. We also improved our mechanisms for signaling file and cycle completion.
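One simple way to generate such a destination list is to pick one random host address in every /24 prefix. The sketch below does this with Python's standard `ipaddress` module. Choosing a uniformly random host in .1-.254 is our assumption, since the selection policy Ark actually uses is not described here, and the function name is hypothetical. Because the function is a generator, passing "0.0.0.0/0" would enumerate all 2^24 /24 prefixes lazily.

```python
import ipaddress
import random


def one_target_per_slash24(prefix, rng):
    """Yield one random host address (.1-.254) from every /24 inside `prefix`.

    `prefix` must be /24 or shorter; ipaddress raises ValueError otherwise.
    """
    network = ipaddress.IPv4Network(prefix)
    for subnet in network.subnets(new_prefix=24):
        yield subnet.network_address + rng.randint(1, 254)


rng = random.Random(7)  # seeded so the example is reproducible
for target in one_target_per_slash24("10.1.0.0/23", rng):
    print(target)
```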

Major Milestones

In its first year, the Ark project met the following milestones.

- Completed the initial architecture and design of Archipelago, including security features, communication and coordination facilities, software configuration and installation, and data storage and management.
- Initial implementation and testing.
Presentation:
- Young Hyun presented "The Archipelago Measurement Infrastructure" at the 7th CAIDA-WIDE Workshop, Nov 2006.

National Laboratory for Applied Network Research (NLANR) Resource Stewardship

Goals

The National Laboratory for Applied Network Research (NLANR) Project officially ended February 28, 2006. The funding for the NLANR project expired June 30, 2006 and the National Science Foundation has no plans to continue support. Starting in July 2006, CAIDA took over operational stewardship for all NLANR machines and data. Some of the NLANR data and resources still offer value to the network science, research and development communities, and can help support other forms of network research. Since much of the equipment is too old to maintain, CAIDA conducted a careful audit, gracefully decommissioned obsolete hardware and data, and attempted to apply the remaining resources to similar projects at CAIDA. As we performed this audit, we informed the community of changes to operations of the machines at the sites. At any time hosting sites may decline participation in our measurement activities, but so far many have found our proposed use sufficiently relevant to continue hosting a measurement platform.

NLANR Resources

The NLANR.NET Domain and Services
CAIDA (temporarily) stewards the NLANR.NET domain, originally registered by Hans-Werner Braun, as well as eleven servers used to serve the various mail, web, application, and database services for the project, including serving the data collected by the Active Measurement Project (AMP) and Passive Measurement and Analysis (PMA) projects.
PMA Project Resource Reuse
Of 33 original passive collection sites, fourteen appear to have been decommissioned by NLANR staff prior to July 2006. CAIDA has decommissioned five sites with outdated hardware. Seven sites have equipment modern enough that we hope to repurpose them for passive data collection and reporting. Of the remaining nine monitors, eight were connected to a router located at the Indianapolis Internet2 site. This router is no longer part of the Internet2 infrastructure. However, the instruments are capable of measuring OC192/10Gb links, so we hope to repurpose them. The final site, in Los Angeles, hosts the lambdaMON optical network tap, which we have yet to revive. CAIDA is still working with sites that have viable hardware to put acceptable use policies in place so that we can continue to conduct measurement experiments.
AMP Project Resource Reuse
Of more than 150 original active collection sites, approximately 30 have hardware that CAIDA may be able to reuse for its Macroscopic Topology Measurements Project, specifically in the Archipelago infrastructure. We have carefully decommissioned other hosts previously associated with the extensive AMP mesh. To date, CAIDA has directly decommissioned over 100 hosts, and continues to work with sites on another 36 hosts that are unreachable or that we plan to physically remove from service.

Major Milestones

On July 24, 2006, CAIDA ceased data collection at the PMA sites.
On approximately August 31, 2006, CAIDA ceased data collection on most of the AMP mesh. (It took several days to track down and disable all of the boxes.)
The PRAGMA project made use of a subset of nodes we left operational for a demonstration presented in October 2006.

Further Requests for Information

CAIDA will happily provide more specific details regarding the status of any of the NLANR sites and data. If sites have questions or comments on these changes, or input regarding future measurement experiments, please send the queries to nlanr-info at caida dot org.

Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management.

2006 Tool Development

CoralReef

The CoralReef software suite, developed by CAIDA, provides a comprehensive solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (OC3 and OC12, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. CAIDA developers maintain this package with the support and collaboration of the Internet measurement community.

In 2006, CAIDA continued to develop the CoralReef software, adding the capability to collect real-time flow data on high-speed (OC48, OC192/10GigE) links. New routines allow the software to adapt the sampling rate to the traffic mix. This provides provably accurate sampled flows while staying within a fixed memory and reporting budget for all possible traffic mixes. Our approach is described in the papers "Building a Better NetFlow" and "A Robust System for Accurate Real-time Summaries of Internet Traffic".
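The adaptive-sampling idea can be sketched as follows, loosely modeled on the approach described in "Building a Better NetFlow". Packets are sampled with probability `rate`. Whenever the flow table exceeds its budget, the rate is halved and every existing count is thinned by keeping each counted packet with probability 1/2. This keeps `count / rate` an unbiased estimate of a flow's packet total. Halving as the reduction step, the class and method names, and the use of a Python dict as the flow table are our assumptions. This is not CoralReef code.

```python
import random


class AdaptiveFlowSampler:
    """Packet-sampled flow table that lowers its sampling rate to stay within budget."""

    def __init__(self, max_flows, seed=None):
        if max_flows < 1:
            raise ValueError("max_flows must be at least 1")
        self.max_flows = max_flows
        self.rate = 1.0   # current packet sampling probability
        self.counts = {}  # flow key -> number of sampled packets
        self.rng = random.Random(seed)

    def observe(self, flow_key):
        """Process one packet belonging to flow_key."""
        if self.rng.random() >= self.rate:
            return  # packet not sampled
        self.counts[flow_key] = self.counts.get(flow_key, 0) + 1
        while len(self.counts) > self.max_flows:
            self._halve_rate()

    def _halve_rate(self):
        """Halve the rate and thin existing counts so estimates stay unbiased."""
        self.rate /= 2
        thinned = {}
        for key, n in self.counts.items():
            kept = sum(self.rng.random() < 0.5 for _ in range(n))
            if kept:
                thinned[key] = kept  # flows left with zero packets are dropped
        self.counts = thinned

    def estimates(self):
        """Estimated packets per flow: sampled count divided by sampling rate."""
        return {key: n / self.rate for key, n in self.counts.items()}


sampler = AdaptiveFlowSampler(max_flows=10, seed=1)
for flow_id in range(1000):
    sampler.observe(flow_id)
print(sampler.rate, len(sampler.counts))
```

Thinning the existing counts means every packet ends up kept with the same overall probability (the final `rate`), regardless of when it arrived.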

We also updated and improved a real-time traffic report generator that allows dynamic selection of displayed reports. We expect to release CoralReef version 3.8 in summer 2007.

Cuttlefish

Cuttlefish produces animated GIFs that reveal the interplay between the diurnal and geographical patterns of displayed data. By showing how the Sun's shadow moves across the world map, Cuttlefish gives a direct sense of the time of day in a given geographic region, while moving graphs illustrate the relationship between local time and the visualized events. We released Cuttlefish early in 2006.
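The relationship that Cuttlefish's day/night shading conveys can be approximated with one line of arithmetic. The Earth turns 15 degrees of longitude per hour, so local mean solar time is UTC shifted by longitude/15 hours. The sketch below ignores the equation of time and political time zones. The function name is hypothetical, and the code is not taken from the Cuttlefish source.

```python
def local_mean_solar_hour(utc_hour, longitude_deg):
    """Approximate local mean solar time (hours, 0-24); east longitude positive."""
    return (utc_hour + longitude_deg / 15.0) % 24


# 22:00 UTC at 135 degrees east (Japan's standard meridian) is about 07:00 local.
print(local_mean_solar_hour(22, 135))
```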

CAIDA acknowledges WIDE for their generous support of cuttlefish development.

	This visualization depicts the traffic volume to and from residential broadband customers of a major Japanese ISP between 7:00 am May 4th and 7:00 am May 5th 2005 UTC. Please refer to the Cuttlefish home page for more information about this visualization.

See the Cuttlefish home page for more information, Cuttlefish examples or to download the software.

CAIDA Tools Download Report

The table below lists all CAIDA-developed tools distributed via our home page at https://www.caida.org/tools/ and the number of downloads of each during 2006.

Currently Supported Tools

Tool	Description	Downloads
coralreef	A software suite to collect and analyze data from passive Internet traffic monitors.	813
dsc	A system for collecting and exploring statistics from DNS servers.	127
dnsstat	An application that collects DNS queries on UDP port 53 to report statistics.	313
dnstop	A libpcap application that displays tables of DNS traffic.	3141
sk_analysis_dump	A tool for analysis of traceroute-like topology data.	224
walrus	A tool for interactively visualizing large directed graphs in 3D space.	4288
libsea	A file format and a Java library for representing large directed graphs.	493
Chart::Graph	A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available.	359
plot-latlong	A tool for plotting points on geographic maps.	213

Past Tools (Unsupported)

Tool	Description	Downloads
Mapnet	A tool for visualizing the infrastructure of multiple backbone providers simultaneously.	19895
GeoPlot	A lightweight Java applet that creates a geographical image of a data set.	740
GTrace	A graphical front-end to traceroute.	1543
otter	A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths.	1997
plotpaths	An application that displays forward and reverse network path data.	177
plankton	A tool for visualizing NLANR's Web Cache Hierarchy	107

Data

Data Collected in 2006

In 2006, CAIDA continued its data collection activities, adding over 20 terabytes of new data and bringing the total holdings to over 56 terabytes (uncompressed) of data.

Data Type	File count	First date	Last date	Total size (compressed)	Total size (uncompressed)
Macroscopic Topology Measurements	9181	2006-01-01	2007-01-01	N/A	629 GB
Network Telescope	8545	2006-01-01	2007-01-01	6.5 TB	19.4 TB

Data Distributed in 2006

We process the raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2006, we released the following datasets:

Witty Worm Dataset (January 11, 2006)
Code-Red Worms Dataset (February 1, 2006)
Backscatter-2004-2005 Dataset (February 15, 2006)
Backscatter-2006 Dataset (April 7, 2006)
AS Taxonomy Repository (April 14, 2006)
DNS Root Server/gTLD RTT Dataset (August 3, 2006)

While some of the resulting datasets are available to anyone without restriction, access to others is restricted to academic, government, and non-profit researchers and CAIDA members. We ask for-profit companies to sponsor CAIDA in order to gain access to these restricted datasets. All data access is subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, ensure security of network infrastructure, and comply with the terms of our agreements with data providers.

Publicly Available Data

These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

Data Set	Unique visitors	Number of Visits	Data Downloaded
AS Rank	5,933	13,431	9.8 GB
AS Adjacencies	2,323	9,209	5.3 GB
AS Relationships	1,294	3,505	5.8 GB
Router Adjacencies	357	392	955 MB
Witty Worm Dataset	256	435	323 MB
Code-Red Worms Dataset	222	350	4.6 GB
AS Taxonomy**	134	135	63 MB
** AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.

Restricted Access Data

These datasets require that users:
- be academic or government researchers, or join CAIDA
- request an account and provide a brief description of their intended use of the data
- agree to an Acceptable Use Policy

Data Set	Unique visitors	Number of Visits	Data Downloaded
Raw Topology Traces	249	2879	7.4 TB
Anonymized OC48 Peering Link Traces	198	636	2.2 TB
2003 Internet Topology Data Kit	107	242	39 GB
Backscatter Datasets*	77	407	1 TB
Witty Worm Dataset	65	148	165 GB
DNS Root/gTLD server RTT Dataset	9	18	25.4 MB
* The Backscatter-2004-2005 and Backscatter-2006 datasets were released in 2006.

NLANR Data

This data was collected by the NLANR project. When this project came to an end in July 2006, CAIDA inventoried NLANR equipment and took over curation and distribution of NLANR data. CAIDA now maintains both the NLANR AMP and PMA public data repositories.

Data Set	Unique visitors	Number of Visits	Data Downloaded
NLANR PMA Data	973	3,119	8.3 TB
NLANR AMP Data	298	2,422	10.68 GB

Total Data Distributed in 2006

	Unique visitors	Number of Visits	Data Downloaded
All Data Distributed	4,907	35,678	72.1 TB

Workshops

As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security.

CAIDA Workshops

In 2006, CAIDA hosted the following workshops:

CAIDA-WIDE Workshops
The 6th CAIDA/WIDE Workshop was held on March 17-18, 2006 in Marina del Rey, CA, and the 7th CAIDA/WIDE Workshop was held on November 3-4, 2006 at the San Diego Supercomputer Center, La Jolla, CA. These biannual, invitation-only workshops traditionally cover IPv6/IPv4 topology studies and DNS, as well as miscellaneous research and technical topics of mutual interest to CAIDA and WIDE participants.
ISMA Workshop on the Internet Topology (WIT)
On May 10-12, 2006, CAIDA hosted the first ISMA Workshop on the Internet Topology (WIT) supported by NSF, CAIDA and SDSC. The attendance at the workshop was by invitation only. The workshop produced a final report that appeared in ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, pp. 69-73, 2007.
2nd DNS-OARC Workshop
CAIDA and ISC organized, and Microsoft hosted, the 2nd DNS-OARC Workshop on November 16-17, 2006, in Redmond, WA. Topics included DNS-related Internet measurement and research, operational coordination and trusted communication among root, TLD, and other DNS and Internet operators, and the future evolution and governance of OARC. Participation was open to OARC members and, by invitation, to other parties interested in DNS research and operations.
1st COMMONS Workshop
The 1st COMMONS Workshop (by invitation only) was held on December 12-13, 2006, at the San Diego Supercomputer Center in La Jolla, CA. The workshop brought together representatives from industry, community and municipal networks, and regional and state networks, as well as Internet researchers, community organizers, and developers building next-generation data communications technologies. Participants also included heads of research, infrastructure, media, and policy organizations, as well as telecommunications lawyers. This unique event produced the COMMONS Workshop Final Report.

Attended meetings

In 2006, CAIDA staff attended and presented at the following workshops and conferences:

ACM SIGCOMM Special Interest Group on Data Communications
The OECD ICCP Workshop "The Future of the Internet"
The 3rd Annual Workshop on Flow Analysis (FloCon 2006)
Internet Engineering Task Force (67th IETF) and the corresponding meeting of the Internet Engineering and Planning Group (IEPG)
The Intimate 2006 Network Anomaly Diagnosis Workshop
The North American Network Operators' Group (NANOG36)
The Alternative Telecommunications Policy Forum
DNS Root Server System Advisory Committee (RSSAC)
Passive and Active Measurement (PAM) Workshop
The Quilt 2006 Workshop
Muni Wireless Conference
Digital Inclusion Day
5th System Administration and Network Engineering Conference (SANE)

Publications

The following table lists the papers CAIDA published in calendar year 2006. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

Year	Author(s)	Title	Publication	Unique Hits
2006	Donnet, B.; Huffaker, B.; Friedman, T.; claffy, k.	Approche Récursive d'Etiquetage des Chemins Alternatifs au Niveau IP [A Recursive Approach to Labeling Alternative Paths at the IP Level]	Colloque Francophone d'Ingénierie des Protocoles (CFIP)	0
2006	Donnet, B.; Huffaker, B.; Friedman, T.; claffy, k.	Evaluation of a Large-Scale Topology Discovery Algorithm	IEEE International Workshop on IP Operations & Management	0
2006	Mahadevan, P.; Krioukov, D.; Fall, K.; Vahdat, A.	Systematic Topology Analysis and Generation Using Degree Correlations	SIGCOMM	825
2006	Broido, A.; Shang, H.; Fomenkov, M.; Hyun, Y.; claffy, k.	The Windows of Private DNS Updates	ACM SIGCOMM Computer Communications Review (CCR)	1379
2006	Moore, D.; Shannon, C.; Brown, D.; Voelker, G.; Savage, S.	Inferring Internet Denial-of-Service Activity	ACM Transactions on Computer Systems	2432
2006	Dimitropoulos, X.; Riley, G.	Modeling Autonomous System Relationships	Principles of Advanced and Distributed Simulation (PADS)	667
2006	claffy, k.; Crovella, M.; Friedman, T.; Shannon, C.; Spring, N.	Community-Oriented Network Measurement Infrastructure (CONMI) Workshop Report	ACM SIGCOMM Computer Communications Review (CCR)	1365
2006	Donnet, B.; Huffaker, B.; Friedman, T.; claffy, k.	Implementation and Deployment of a Distributed Network Topology Discovery Algorithm	arXiv cs.NI/62	0
2006	Dimitropoulos, X.; Krioukov, D.; Riley, G.; claffy, k.	Revealing the Autonomous System Taxonomy: The Machine Learning Approach	Passive and Active Network Measurement Workshop (PAM)	1504
2006	Mahadevan, P.; Krioukov, D.; Fomenkov, M.; Huffaker, B.; Dimitropoulos, X.; claffy, k.; Vahdat, A.	The Internet AS-Level Topology: Three Data Sources and One Definitive Metric	ACM SIGCOMM Computer Communications Review (CCR)	2693

Presentations

The following table lists the presentations and invited talks given by CAIDA staff in calendar year 2006. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

Year	Month	Presenter(s)	Title	Venue	Unique Hits
2006	Dec	Keys, K	Introduction to CSS for HTML	CAIDA	216
2006	Nov	Shannon, C	Internet Measurement Data Catalog and Security Research Overview	WIDE	142
2006	Nov	Krioukov, D	Scalability of Routing: Compactness and Dynamics	IETF	99
2006	Nov	Hyun, Y	The Archipelago Measurement Infrastructure	WIDE	314
2006	Nov	Huffaker, B	Two days in the life of three DNS root servers	WIDE	145
2006	Nov	Huffaker, B	Two days in the life of three DNS root servers	RSSAC	145
2006	Oct	Moore, D	Anomaly Sampling (bringing diversity to network security)	FloCon	103
2006	Oct	Meinrath, S	LANs, MANs, & Beyond: Community Intranets & the COMMONS Project	Alternative Telecommunications Policy Forum	146
2006	Oct	claffy, k	the Internet as emerging critical infrastructure: what needs to be measured?	The Quilt	162
2006	Oct	claffy, k	Cooperative Measurement and Modeling of Open Networked Systems (COMMONS) proposal	Muni Wireless Conference	189
2006	Jul	Moore, D	Anomaly Sampling (bringing diversity to network security)	Intimate Workshop	207
2006	Jul	Brownlee, N	DNS root traffic: how do anycast instances behave?	IEPG	969
2006	Jun	Krioukov, D	Something We Always Wanted to Know about ASs: Relationships and Taxonomy	LIP6	156
2006	Jun	Krioukov, D	dK-series: Systematic Topology Analysis and Generation Using Degree Correlations	LIAFA	140
2006	Jun	Huffaker, B	CAIDA Recent and Future Work	WIDE	370
2006	May	Mahadevan, P	A Basis for Systematic Analysis of Network Topologies	WIT	299
2006	May	Krioukov, D	Power laws as a pre-asymptotic regime of the PFP model	WIT	214
2006	May	Fomenkov, M	The Joint Degree Distribution as a Definitive Metric of the Internet AS-level Topologies	WIT	280
2006	May	Dimitropoulos, X	A New Approach to Topology Generation: Random Annotated Graphs	WIT	132
2006	May	claffy, k	Internet measurement: what have we learned?	SANE	2190
2006	Apr	Krioukov, D	Flat Routing on Curved Spaces	UC Berkeley	128
2006	Mar	Vest, T	toward an empirical network macro-economics	WIDE	360
2006	Mar	Vest, T	The Wealth of Networks	SDSC	1415
2006	Mar	Shang, H	Identifying and Reducing Private DNS Updates	WIDE	340
2006	Mar	Kim, H	Classifying the Internet Backbone Traffic	WIDE	591
2006	Mar	Huffaker, B	cuttlefish: diurnal and geographic animated GIF generating tool	WIDE	420
2006	Mar	Huffaker, B	IPv4 Consumption: the end	WIDE	426
2006	Mar	claffy, k	the Internet as emerging critical infrastructure: what needs to be measured?	OECD	1230
2006	Mar	claffy, k	CAIDA Update	WIDE	365
2006	Mar	Brownlee, N	Traffic on TCP port 53 and DNS Response Sizes at U Auckland	WIDE	758
2006	Feb	Wessels, D	DNS Cache Poisoners: Lazy, Stupid, or Evil?	NANOG	814
2006	Feb	claffy, k	caida priorities: 2006-2008	WIDE	461

Web Site Usage

In 2006, CAIDA's web site continued to attract considerable attention from a broad, international audience, with particular interest in CAIDA's tools and analyses.

The table below presents the monthly history of traffic to www.caida.org for 2006.

Month	Unique visitors	Number of visits	Pages	Hits	Bandwidth
Jan 2006	123,644	274,068	1,312,791	3,254,556	92.29 GB
Feb 2006	123,912	330,929	1,443,084	3,807,364	120.28 GB
Mar 2006	109,712	421,473	2,039,879	3,914,781	128.43 GB
Apr 2006	97,168	241,669	1,171,131	2,712,955	138.25 GB
May 2006	82,183	236,902	1,158,110	2,698,210	114.66 GB
Jun 2006	114,924	218,982	954,935	3,429,410	97.45 GB
Jul 2006	70,616	164,026	789,864	2,105,385	80.12 GB
Aug 2006	72,364	140,401	548,868	1,799,446	73.74 GB
Sep 2006	76,076	173,234	557,594	1,849,575	69.04 GB
Oct 2006	119,373	184,137	575,048	2,843,483	149.63 GB
Nov 2006	71,563	172,279	516,079	1,892,643	71.29 GB
Dec 2006	64,831	176,216	758,431	2,010,967	69.00 GB
Total	1,126,366	2,734,316	11,825,814	32,318,775	1,204.13 GB
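As a consistency check, the Total row can be reproduced by summing each monthly column. The short Python sketch below does this for the four count columns; bandwidth is left out because its monthly values are rounded to 0.01 GB, so their sum need not match the printed total exactly. The names `MONTHLY`, `REPORTED_TOTAL`, and `column_totals` are illustrative only and are not part of any CAIDA tool.

```python
"""Recompute the Total row of the 2006 www.caida.org traffic table (illustrative sketch)."""

# Monthly figures copied from the table: (unique visitors, visits, pages, hits).
# Bandwidth is omitted because its monthly values are rounded to 0.01 GB.
MONTHLY = {
    "Jan": (123_644, 274_068, 1_312_791, 3_254_556),
    "Feb": (123_912, 330_929, 1_443_084, 3_807_364),
    "Mar": (109_712, 421_473, 2_039_879, 3_914_781),
    "Apr": (97_168, 241_669, 1_171_131, 2_712_955),
    "May": (82_183, 236_902, 1_158_110, 2_698_210),
    "Jun": (114_924, 218_982, 954_935, 3_429_410),
    "Jul": (70_616, 164_026, 789_864, 2_105_385),
    "Aug": (72_364, 140_401, 548_868, 1_799_446),
    "Sep": (76_076, 173_234, 557_594, 1_849_575),
    "Oct": (119_373, 184_137, 575_048, 2_843_483),
    "Nov": (71_563, 172_279, 516_079, 1_892_643),
    "Dec": (64_831, 176_216, 758_431, 2_010_967),
}

# Total row as printed in the report.
REPORTED_TOTAL = (1_126_366, 2_734_316, 11_825_814, 32_318_775)


def column_totals(rows):
    """Sum each column; zip(*rows) transposes the monthly rows into columns."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    totals = column_totals(MONTHLY.values())
    print(totals == REPORTED_TOTAL)  # prints True
```

Assuming each month's unique-visitor count is computed independently (the usual behavior of web log analyzers), a visitor who returns in several months is counted once per month, so the 1,126,366 total is best read as visitor-months rather than distinct annual visitors.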

Organizational Chart

CAIDA would like to acknowledge the many people whose efforts made CAIDA a success in 2006. The image below shows the functional organization of CAIDA. For more complete information about CAIDA staff, please see the CAIDA home page.

[Image of CAIDA Functional Organization Chart]

CAIDA Functional Organization Chart

Funding Sources

CAIDA thanks its 2006 sponsors, members, and collaborators.

The tables below show the funds received by CAIDA during the 2006 calendar year.

Funding Source	Allocations	Percentage of Total
GIFT	$200,000	11.9%
ARIN	$100,000	6.0%
DHS PREDICT	$348,926	20.8%
NSF DNS	$11,000	0.7%
NSF Trends	$343,440	20.4%
UCSD/SDSC	$80,356	4.8%
NSF Infrastructure	$583,900	34.8%
NSF NeTS	$12,000	0.7%
Total	$1,679,622	100.0%
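Each Percentage of Total value is an allocation divided by the total allocation, times 100. The Python sketch below reproduces the column for Figure 1, assuming each share is rounded independently to one decimal place. `FUNDING_2006` and `percent_of_total` are hypothetical names used only for illustration.

```python
"""Recompute the Percentage of Total column for Figure 1 (illustrative sketch)."""

# 2006 allocations in US dollars, copied from the Figure 1 table.
FUNDING_2006 = {
    "GIFT": 200_000,
    "ARIN": 100_000,
    "DHS PREDICT": 348_926,
    "NSF DNS": 11_000,
    "NSF Trends": 343_440,
    "UCSD/SDSC": 80_356,
    "NSF Infrastructure": 583_900,
    "NSF NeTS": 12_000,
}


def percent_of_total(allocations):
    """Return the grand total and each source's share of it in percent,
    rounded to one decimal place (assumed rounding convention)."""
    total = sum(allocations.values())
    shares = {
        name: round(100 * amount / total, 1)
        for name, amount in allocations.items()
    }
    return total, shares


if __name__ == "__main__":
    total, shares = percent_of_total(FUNDING_2006)
    print(f"Total: ${total:,}")
    for name, share in shares.items():
        print(f"{name:<20} {share:5.1f}%")
```

Because each share is rounded on its own, the rounded values in the table add up to 100.1%, while the Total row reports the exact 100.0%.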

Figure 1. Allocations by funding source received during 2006.

Program Area	Allocations	Percentage of Total
Data Sharing	$692,366	37.6%
DNS	$11,000	0.6%
Infrastructure	$723,256	39.2%
Outreach	$2,500	0.1%
Policy	$100,000	5.4%
Routing	$12,000	0.7%
Security	$102,534	5.6%
Sponsors	$200,000	10.8%
Total	$1,843,656	100.0%

Figure 2. Allocations by program area received during 2006.

Operating Expenses

The tables below summarize CAIDA's operating expenses for the 2006 calendar year.


LABOR	Salaries and benefits paid to staff and students
IDC	Indirect Costs paid to the University of California, San Diego including grant overhead (52-54%) and telephone, Internet, and other IT services.
SUBCONTRACTS	Subcontracts to the Internet Systems Consortium (ISC) and Georgia Institute of Technology
SUPPLIES & EXPENSES	All office supplies and equipment (including computer hardware and software) costing less than $5000.
TRAVEL	Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
EQUIPMENT	Computer hardware or other equipment costing more than $5000.
TRANSFERS	Transfers of funds between groups as recharges for IT desktop support and Oracle database services.

Expense Category	Expenses	Percentage of Total
Labor	$1,434,383	59.4%
IDC	$624,789	25.9%
Subcontract	$29,781	1.2%
Supplies & Expenses	$106,829	4.4%
Travel	$76,640	3.2%
Equipment	$131,499	5.4%
Transfers	$9,193	0.4%
Total	$2,413,114	100.0%

Figure 3. 2006 Operating Expenses

These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.

Program Area	Expenses	Percentage of Total
Data Sharing	$770,571	31.9%
DNS	$495,780	20.5%
Infrastructure	$155,134	6.4%
Outreach	$34,526	1.4%
Policy	$192,934	8.0%
Routing	$473,310	19.6%
Security	$247,919	10.3%
Topology	$42,940	1.8%
Total	$2,413,114	100.0%

Figure 4. 2006 Expenses by Program Area

Funding Source	Expenses	Percentage of Total
NSF-Trends	$526,142	21.8%
NSF-NeTS	$481,002	19.9%
NSF-DNS	$378,415	15.7%
NSF-Trusted Computing	$82,385	3.4%
NSF-CRI	$15,778	0.7%
NSF-Other	$244,390	10.1%
Gift	$350,895	14.5%
DHS	$245,533	10.2%
UCSD/CNS	$88,574	3.7%
Total	$2,413,114	100.0%

Figure 5. 2006 Expenses by Funding Source
