Center for Applied Internet Data Analysis
CAIDA's Annual Report for 2009
A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2009.

Mission Statement: CAIDA investigates practical and theoretical aspects of the Internet, focusing on activities that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution,
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
  • improve the integrity of the field of Internet science,
  • inform science, technology, and communications public policies.


Executive Summary

This annual report covers CAIDA's activities in 2009, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, and policy. Our infrastructure activities support several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems.

We made significant advances (again...) in Internet topology research, supported by the expanding Ark measurement infrastructure and growing interest in understanding more about the Internet's robustness, security, and scalability. We continue to share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6, and ten of our hosting sites provide IPv6 connectivity. We have developed substantial additional software to better support distributed measurement experiments. Specific to our IPv4 topology mapping project, we have taken on the task of optimizing and improving on existing techniques for IP address alias resolution for large Internet graphs, and are planning to package up and release an implementation of our algorithms next year. In 2009 we expanded the capability of other researchers to use the Ark infrastructure for independent experiments, including an extensive Internet-wide test of network filtering hygiene.

On the theoretical side of topology research, we finally published our topology modeling framework that treats annotations as an extended correlation profile of a network, which supports rescaling topologies while retaining the same (measured) annotation profile. We also advanced our exploration of geometric structure underlying Internet-like topologies as observed in our and other measurements. Specifically, hyperbolic geometry captures an important property of complex networks: exponential expansion in space. We explored even deeper connections between network topological structure (e.g., degree distribution, clustering) and physical phenomena such as curvature and temperature.

These discoveries about topology drive our routing research agenda, a long-term objective of which is to enable dramatically more scalable global Internet routing. We explored the ramifications of the discoveries we made last year regarding efficient routing on graph topologies statistically similar to those of the Internet. Based on the evidence, e.g., clustering, observable on the Internet and other complex networks, we found that underlying hyperbolic hidden metric spaces provide a natural explanation for why so many of these complex networks found in nature can achieve such phenomenally efficient (greedy) routing without distributing global topology knowledge. Since the distribution of global knowledge about network structure is perhaps the most critically limiting requirement of the current Internet interdomain routing system, we are still investigating theoretical details of a potentially radical solution to Internet routing scalability, which takes advantage of what nature knows that we do not (yet).

We undertook several traffic analysis activities, including creating a structured taxonomy of Internet traffic classification papers and their data sets, and analyzing the "Day in the Life of the Internet" 2009 data set, consisting of 24 hours of detailed DNS packet data collected at many participating root servers as well as other high-profile DNS servers. We have reduced our traffic analysis activities in favor of pursuing progress in the policy space through participation in DHS's PREDICT project (Protected Repository of Data for Internet Cyber Threats). As part of this project, we have proposed a more flexible privacy-sensitive data-sharing framework and an experiment to test it on the UCSD network telescope instrumentation next year.

We are growing the scope of our economics and policy research. We responded to several requests from Internet governance bodies as well as U.S. government agencies for comments and guidance on policy matters. We launched a workshop series in Internet economics to begin framing a research agenda for the emerging but stunted field of Internet infrastructure economics. On the theoretical side, we published an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes), which builds on the preferential attachment (PA) model but captures fundamental differences between transit and non-transit networks. This multi-class PA model predicts a definitive set of statistics characterizing the AS topology structure, closing the "measure-model-validate-predict" loop, and providing further evidence that preferential attachment is the main driving force behind Internet evolution.

Finally, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2010-2013 is available at https://www.caida.org/home/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's topology research agenda includes three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.

Activities

  1. Macroscopic Topology Measurements:
    1. We continued large-scale macroscopic topology measurements using Archipelago (Ark), our state-of-the-art global measurement platform. We completed the second full calendar year of the IPv4 Routed /24 Topology Dataset. By the end of 2009, we increased the number of vantage points to 40 Ark monitors deployed in 22 countries.
    2. We added more monitors with native IPv6 connectivity to the Ark infrastructure. As of the end of 2009, Ark had 10 monitors collecting the IPv6 Topology Dataset for researchers to get a view of the emerging IPv6 global topology.
    3. We continued to collect automated DNS reverse lookups for IP addresses discovered by the Ark probes and annotated the IPv4 topology data with corresponding DNS names.
  2. Analysis of the Observable Topology:
    1. We improved our measurement techniques and analysis methodologies for alias resolution inferences. We use the Ark platform to run three tools: kapar, iffinder, and MIDAR. We then combine their outcomes to map IP addresses to routers as accurately and completely as feasible. Using publicly available data from many networks and ground-truth data provided to us by a large ISP, we tested the efficiency and veracity of various combinations of alias resolution methods. We submitted our preliminary results to ACM SIGCOMM Computer Communication Review (CCR), where they appeared ("Internet-Scale IP Alias Resolution Techniques") in the January 2010 issue.
    2. We continued to produce, on a bi-weekly basis, our dataset of AS-level topologies annotated with business relationships between ASes. We use our published algorithms to infer these relationships, recognizing their directional nature, and annotate each link in an AS topology as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship.
    3. We created a new version of our popular AS Core Graph visualizations for both IPv4 and IPv6 address space using January 2009 data collected by Ark monitors.
  3. Topology Modeling:
    1. We introduced a network topology modeling framework that treats annotations as an extended correlation profile of a network. The framework includes an algorithm to rescale and construct networks of varying size that still reproduce the original measured annotation profile. These results are published in a paper "Graph Annotations in Modeling Complex Network Topologies" in ACM Transactions on Modeling and Computer Simulation (TOMACS).
    2. We developed an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASes) -- the multi-class preferential attachment (MPA) model. All of the model parameters are measurable from available Internet topology data. Given the estimated values of these parameters, our analytic results predict a definitive set of statistics characterizing the AS topology structure that is not part of the model formulation. The MPA model thus closes the "measure-model-validate-predict" loop, and provides further evidence that preferential attachment is the main driving force behind Internet evolution. The results were published in "Evolution of the Internet AS-Level Ecosystem", presented at the First International Conference on Complex Sciences: Theory and Applications (Complex'2009).
    3. We established a connection between observed scale-free topologies and hidden hyperbolic geometries of complex networks. Space expands exponentially in hyperbolic geometry, and scale-free topologies emerge as a consequence of this exponential expansion. Fermi-Dirac statistics connects observed topology to hidden geometry: observed edges are fermions, hidden distances are their energies; the curvature of the hidden space affects the heterogeneity of the degree distribution, while clustering is a function of temperature. Understanding the connection between topology and geometry of complex networks contributes to studying the efficiency of their functions, and may find practical applications in many disciplines, ranging from Internet routing to brain, cell signaling, or protein folding research. We published the paper "Curvature and Temperature of Complex Networks" in Physical Review E.
    4. We showed that the global structure of some real networks is statistically determined by the distributions of local motifs (small building blocks of complex networks) of size at most 3, once we augment motifs to include node degree information. We applied our analysis to various complex networks: a social web of trust, protein interactions, scientific collaborations, air transportation, the Internet, and a power grid. In all cases except the power grid, random networks that maintain the degree-enriched connectivity profiles for node triples in the original network reproduce all its local and global properties. Therefore, network topology generators are guaranteed to reproduce essential local and global network properties as soon as they reproduce 3-node connectivity statistics. Our results are published on our web site ("How Small Are Building Blocks of Complex Networks") and on arXiv.
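
The connection between hidden hyperbolic geometry and observed topology described in item 3 of Topology Modeling can be illustrated with a short numerical sketch. It assumes one common formulation: nodes have polar coordinates (r, θ) in a hyperbolic disk of radius R with curvature -1, and two nodes at hyperbolic distance x are linked with the Fermi-Dirac probability p(x) = 1 / (exp((x - R) / (2T)) + 1). The function names and parameter values are illustrative assumptions, not taken from the paper or from CAIDA software.

```python
import math

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane (curvature -1),
    given in polar coordinates, using the hyperbolic law of cosines."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(theta1 - theta2))
    # Clamp to 1.0 so rounding error cannot push acosh outside its domain.
    return math.acosh(max(arg, 1.0))

def connection_probability(x, R, T):
    """Fermi-Dirac form (assumed): the hidden distance x plays the role of
    an energy, R of a chemical potential, and T of a temperature."""
    return 1.0 / (math.exp((x - R) / (2.0 * T)) + 1.0)

# Illustrative values only: two nodes on opposite sides of the disk.
R, T = 6.0, 0.5
x = hyperbolic_distance(2.0, 0.0, 3.0, math.pi)
p = connection_probability(x, R, T)
```

In this sketch, lowering T sharpens the step around x = R, so nodes connect almost only when they are hyperbolically close, which is the sense in which clustering depends on temperature.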

Major Milestones

Funding Sources

Our topology research received support from:


Routing

Toward Mathematically Rigorous Next-Generation Routing Protocols for Realistic Network Topologies

Goals

The primary objective of CAIDA's research in Internet routing is to develop and evaluate solutions to the impending routing scalability problems. Our relevant activities focused on two related sub-topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. While motivated by Internet routing, our work in this area has profound implications for network science in other disciplines (physics, biology, chemistry, social sciences).

Activities

  1. We studied the process of routing information through networks as a universal phenomenon existing in both natural and man-made complex systems. In many complex networks found in nature, nodes communicate efficiently even without full knowledge of global network connectivity. We demonstrated that the peculiar structural characteristics of observable complex networks are consistent with maximizing communication efficiency when using greedy routing approaches without global knowledge. We also described a general mechanism that explains this connection between network structure and function, in "Navigability of complex networks", published in Nature Physics and given significant press coverage.
  2. Random scale-free networks are ultrasmall worlds since the average length of the shortest paths in networks of size N scales as ln ln N. We showed that these ultrasmall worlds can be navigated in ultrashort time. Greedy routing on scale-free networks embedded in metric spaces uses only local information yet asymptotically finds the shortest paths, direct computation of which requires global topology knowledge. Our findings imply that the peculiar structure of complex networks ensures that the lack of global topological awareness has asymptotically no impact on the length of communication paths. These results have important consequences for communication systems such as the Internet, where maintaining knowledge of current topology is a major scalability bottleneck. We published "Navigating Ultrasmall Worlds in Ultrashort Time" in Physical Review Letters. This paper received favorable press coverage in Nature, NewScientist, and PhysOrg.
  3. We showed that complex (scale-free) network topologies naturally emerge from hyperbolic metric spaces. The negatively curved hyperbolic spaces also ensure extremely efficient greedy forwarding on these topologies, achieving almost 100% reachability and optimal (i.e., shortest) path lengths, even under dynamic network conditions. Our findings suggest that forwarding information through complex networks like the Internet may be possible without the current overhead of routing protocols, and may also find practical applications in overlay networks for tasks such as application-level routing, information sharing, and data distribution. These results are published in "Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces" in ACM SIGMETRICS Performance Evaluation Review.
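
The greedy forwarding idea in the activities above can be sketched as follows. Each node knows only its own coordinates, its neighbors' coordinates, and the destination's coordinates, and it hands the packet to the neighbor closest to the destination in hyperbolic distance. This is a minimal sketch under stated assumptions: the coordinates, the toy graph, and the names greedy_route, coords, and neighbors are hypothetical and do not come from the published papers.

```python
import math

def hyperbolic_distance(a, b):
    """Hyperbolic distance (curvature -1) between polar points a and b."""
    (r1, t1), (r2, t2) = a, b
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(arg, 1.0))

def greedy_route(coords, neighbors, source, target):
    """Forward hop by hop to the neighbor nearest the target.
    Returns the list of visited nodes, or None if the packet reaches a
    local minimum (no neighbor is strictly closer to the target)."""
    path = [source]
    current = source
    while current != target:
        here = hyperbolic_distance(coords[current], coords[target])
        best = min(neighbors[current],
                   key=lambda n: hyperbolic_distance(coords[n], coords[target]),
                   default=None)
        if best is None or hyperbolic_distance(coords[best], coords[target]) >= here:
            return None  # stuck: greedy forwarding fails for this pair
        path.append(best)
        current = best
    return path

# Toy star-like topology: a hub near the disk center, leaves near the edge.
coords = {"hub": (0.5, 0.0), "A": (3.0, 0.2), "B": (3.0, 3.0), "C": (3.0, 4.5)}
neighbors = {"hub": ["A", "B", "C"], "A": ["hub"], "B": ["hub"], "C": ["hub"]}
route = greedy_route(coords, neighbors, "A", "B")
```

Because every hop must strictly reduce the distance to the target, the loop always terminates. The fraction of source-destination pairs for which it returns a path is the reachability figure that this kind of study reports.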

Major Milestones

Student Involvement

Undergraduate student Connie Liu developed informative visualizations representing hyperbolic spaces and other routing research results.

Funding Sources

Our routing research received support from:


Traffic Analysis

Internet traffic measurement, classification, and analysis

Goals

CAIDA has a long history of acquiring and curating passive traces for traffic monitoring, classification, and workload characterization. In 2009 we continued to host visiting researchers who, in collaboration with CAIDA researchers, analyzed properties of available traces.

Activities

  1. With the help of visiting scholar Mia Zhang, we created a structured taxonomy of Internet traffic classification papers and their data sets.
  2. Visiting scholars Maurizio Dusi and Wolfgang John developed a flow-based symmetry estimation tool to evaluate routing asymmetry in Internet traffic.
  3. Maurizio Dusi developed and tested his new tool, gt, which gathers and indexes ground truth information about passively collected network traffic. A paper describing the tool, "GT: picking up the truth from the ground for Internet traffic" was published in ACM SIGCOMM Computer Communication Review (CCR).
  4. Working with traffic traces from backbone links in the US and in Sweden collected over the period 2002-2009, visiting scholars Wolfgang John and Mia Zhang analyzed UDP traffic in the Internet. They found that most UDP flows use random high ports and carry few packets with little content, consistent with UDP's role in signaling protocols for increasingly popular P2P applications.

Major Milestones

Student Participants

CAIDA hosted the following visiting graduate student scholars:

  • Maurizio Dusi from the Universita di Brescia, Italy
  • Mia Zhang from Beijing Jiaotong University, China
  • Wolfgang John from Chalmers University of Technology, Sweden

Funding Sources

Our traffic research received indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.


DNS

Improving the Integrity of Domain Name System (DNS) Monitoring and Protection

Goals

CAIDA researchers conduct DNS measurements and develop tools, models, and analysis methodologies for use by DNS operators and researchers.

Activities

NSF funding supporting CAIDA DNS research ended in August 2009. However, we continued to collect and analyze data from the DNS root nameservers, extending the series of annual Day-in-the-life-of-the-Internet (DITL) experiments.

  1. In collaboration with ISC and OARC, we held the fourth large-scale data collection event on March 30 - April 1, 2009 (DITL 2009). We captured tcpdump traces at nearly all anycast instances of the A, C, E, F, H, K, L, and M root servers as well as numerous AS112, gTLD and ccTLD domain servers. The 2009 collection spans three full days of continuous capture. This unique dataset again represents the most comprehensive measurements of the root servers to date, and provides researchers with unprecedented insight into root server workload characteristics and performance. OARC published a summary of the collection event. These data are available to the research community via the DNS-OARC. Academic researchers can participate in the DNS-OARC for free.
  2. We also capture tcpdump traces of these DNS queries for other potential annotations and for analysis of EDNS0, DNSSEC, and other emerging protocols.

Major Milestones

  • DNS Research Update
    • Participation in DITL 2009 large-scale simultaneous DNS root data collection event.
    • kc claffy prepared and distributed CAIDA's DNS Research Updates to the attendees of the DNS RSSAC meeting in March 2009.

Student Involvement

CAIDA hosted the following visiting graduate student scholars:

  • Mia Zhang, a PhD candidate from Beijing Jiaotong University, China; and
  • Wolfgang John, a PhD candidate from Chalmers University of Technology, Sweden.

Funding Sources

Our DNS research received support from NSF grant (SCI-0427144) DNS-ITR: "Improving the Integrity of Domain Name System (DNS) Monitoring and Protection" (though it ended early this year).

Our traffic research received indirect support from the following institutions through their generous sponsorship of capable graduate students to visit our lab to collaborate on research.


Data Sharing for Security

Goals

CAIDA recognizes the UCSD Network Telescope, a passive data collection system focused on a globally routed /8 network that carries almost no legitimate traffic, as a unique resource whose data may provide insights for network security researchers. Because we can easily separate the legitimate traffic from the incoming packets, the network telescope provides us with a monitoring point for anomalous traffic that represents almost 1/256th of all IPv4 destination addresses on the Internet.
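
The 1/256 figure follows directly from prefix arithmetic: a /8 fixes the first 8 of 32 address bits, leaving 2^24 of the 2^32 IPv4 addresses. The check below is a small worked example. The prefix 10.0.0.0/8 is only a placeholder for "some /8" and is not the telescope's actual address block, and the helper name fraction_of_ipv4 is hypothetical.

```python
import ipaddress

def fraction_of_ipv4(prefix_len):
    """Share of the whole IPv4 address space covered by one prefix
    of the given length."""
    return 2 ** (32 - prefix_len) / 2 ** 32

# Placeholder /8 used purely for the arithmetic (assumption, not the real prefix).
example_slash8 = ipaddress.ip_network("10.0.0.0/8")
addresses_in_slash8 = example_slash8.num_addresses   # 2**24 addresses
share = addresses_in_slash8 / 2 ** 32                # exactly 1/256
```

The exact value is an upper bound under the assumption that the entire /8 is monitored; the text above says "almost" 1/256th.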

Because a network telescope (also known as a blackhole, an Internet sink, or a darknet) does not contain any real computers, the monitor does not capture legitimate traffic, but rather communications that result from a wide range of events, including misconfiguration (e.g., a human being mistyping an IP address), malicious scanning of address space by hackers looking for vulnerable targets, backscatter from random source denial-of-service attacks, and the automated spread of malicious software (worms).

To deliver such data to the research community requires technology to accomplish the data capture and further requires policy infrastructure to protect the rights and avoid risk to stakeholders. CAIDA spent much effort in 2009 on building the policy infrastructure and data sharing framework required to enable the sharing of the data we capture with the network security researcher community.

Activities

  1. UCSD Network Telescope
    • In line with our mission to foster a collaborative environment for data acquisition and sharing, we made two days of network telescope data from November available to researchers.
    • Reports in late November 2008 of a worm outbreak prompted us to look for evidence in our telescope data. We published a web report, "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope", that includes background information on the worm, a description of the scanning behavior we observed, heuristics for determining which packets were likely associated with the Conficker worm, some animations that show the growth of TCP/445 scanning globally, and some correlations we observed with other data sources.
    • We developed and refined measurement and analysis tools for monitoring one-way unsolicited traffic.
  2. Data Sharing Policy Development

Major Milestones

Funding Sources

Our research in data sharing for security receives support from DHS contract (NBCHC 070133) "Supporting Research and Development of Security Technologies through Network and Security Data Collection".


Economics, Ownership, and Trust

Goals

Our dependence on the Internet for our professional, personal, and political lives has rapidly grown much stronger than our comprehension of its underlying structure, performance limits, dynamics, and evolution. In light of recent milestones in regulatory policy, our understanding of the underlying economic forces and dynamics of the Internet is of increasing relevance.

To follow up on our several years of work studying IPv4 exhaustion and IPv6 deployment (or lack thereof) in response to RIR needs, in 2009 we offered a draft recommendation for an IPv4 exhaustion research agenda (none of which, so far as we know, has been pursued). We also responded to requests from government agencies and policymaking bodies (including the FCC, DHS, and FTC) for comments and positions to inform policy with the best available empirical data. As society recognizes the need for an equitable way to pay for this new communications infrastructure, policymakers will need metrics to more effectively describe, and policies for more transparently reporting on, infrastructure penetration, performance, peering, and prices for bit transmission services.

Activities

  1. In March, PI kc claffy presented "Broadband Conditions" at the NTIA Broadband Technology Opportunities Program meeting. The video is also available at the meeting URL under March 23, 2009 "Show Session 1: Roundtable on Nondiscrimination and Interconnection Obligations", and a text transcript of that session is available on the NTIA website. kc claffy followed up with a posting, "Top ten ($7.2B) broadband stimulus: ideal conditions".
  2. kc claffy presented "Ten Things the FCC Should Know about the Internet" to the Federal Communications Commission in Washington D.C. on May 29, 2009.
  3. Early in the year, we put forth a proposal for an ICANN/RIR scenario planning exercise to conduct a more structured conversation according to the established discipline of scenario planning. While this never happened, later in the year, on September 23, 2009, CAIDA, in collaboration with Georgia Tech, hosted the 1st Workshop on Internet Economics via web videoconference. The event made use of the electronic conference hosting facilities supported by the California Institute of Technology (Caltech) EVO Collaboration Network. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics. We published the final report in ACM SIGCOMM Computer Communication Review (CCR), April 2010, vol. 40, no. 2, pp. 55-59.
  4. In October, Dmitri Krioukov presented "Evolution of the Internet Ecosystem" at the Southern California Symposium on Network Economics and Game Theory (SoCal NEGT). The paper was also presented at The First International Conference on Complex Sciences: Theory and Applications (Complex'2009), and published in the European Physical Journal B, vol. 74, no. 2, March 2010, pp. 271-278.

Major Milestones

Funding Sources

Our economics research received support from:


Infrastructure Projects


Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure

Goals

On September 12, 2007, our next generation active measurement infrastructure, Archipelago (Ark) began collecting its first production data as part of the IPv4 Routed /24 Topology Dataset, using the scamper probing tool. Ark provides the hardware and software infrastructure for the Macroscopic Topology Project and replaces the previous skitter-based infrastructure. Ark achieves greater scalability and flexibility than the previous measurement infrastructure and provides steps toward a community-oriented network measurement infrastructure intended to support vetted measurement tasks on a dedicated distributed platform.

Ark's unique design considers coordination the fundamental activity of a measurement infrastructure. Coordination allows the many pieces of the infrastructure to work together efficiently toward a common goal and is necessary to enable collaborative use of the infrastructure by multiple researchers. Archipelago utilizes Marinda, a coordination facility inspired by David Gelernter's tuple-space based Linda coordination language. Archipelago extends Gelernter's tuple space model with features needed to support a globally distributed measurement infrastructure that hosts heterogeneous measurements by a community of researchers.
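
To make the tuple-space idea concrete, here is a minimal in-memory sketch of the three classic Linda operations: writing a tuple, reading a matching tuple without removing it, and taking a matching tuple out. It is not Marinda's API. The class, the method names, the use of None as a wildcard, and the example tuple contents are all assumptions chosen for illustration, and the sketch ignores distribution, persistence, and the Ark-specific extensions mentioned above.

```python
import threading

class TupleSpace:
    """Toy single-process tuple space in the style of Linda (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # None in a template position acts as a wildcard (assumed convention).
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def out(self, tup):
        """Write a tuple into the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, template, timeout=None):
        """Return a matching tuple without removing it; None on timeout."""
        with self._cond:
            if not self._cond.wait_for(
                    lambda: self._find(template) is not None, timeout):
                return None
            return self._find(template)

    def in_(self, template, timeout=None):
        """Remove and return a matching tuple; None on timeout."""
        with self._cond:
            if not self._cond.wait_for(
                    lambda: self._find(template) is not None, timeout):
                return None
            tup = self._find(template)
            self._tuples.remove(tup)
            return tup

# Hypothetical usage: one component posts a probing task, another claims it.
space = TupleSpace()
space.out(("probe", "monitor-a", "192.0.2.1"))
peeked = space.rd(("probe", None, None))
claimed = space.in_(("probe", "monitor-a", None))
```

The appeal of this model for coordination is that producers and consumers never address each other directly. They only agree on the shape of the tuples, which is what lets independent measurement components cooperate loosely.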

Activities

  1. The Archipelago (Ark) Project expanded its infrastructure scope in 2009, from 30 monitors in 21 countries at the end of 2008 to 41 monitors in 25 countries at the end of 2009. We also implemented IPv6 measurements on 10 Ark boxes, and prototyped a systemwide topology-measurement-on-demand service.
  2. We improved our infrastructure for meta-data annotations of Autonomous Systems and IP addresses, augmenting it with DNS data.
  3. Building on our study of existing state-of-the-art IP address alias resolution technology, we did research, development, and evaluation of probing and inference algorithms to resolve independent IP addresses into the same physical device (router). We are planning to publish the results of this work in 2010.
  4. Our team-probing application uses scamper as its primary active topology measurement tool. Developed by Matthew Luckie, scamper supports IPv4 and IPv6; TCP, UDP, and ICMP traceroutes; ping; path MTU discovery; fine-grained multiplexing of destination lists; programmatic control via a socket; and warts format files, which carry more information than arts++ files, including cycle start and end markers and measurement metadata (e.g., probing parameters). We contributed patches to scamper, and several software tools to make it easier to write measurement tools and servers: ScamperDataFeed and ScamperIO. We also implemented a derivative tool based on scamper to enable lighter-weight measurements that can still benefit from some of scamper's functionality.
  5. We implemented persistence in the Marinda tuple space, allowing us to transparently checkpoint and restart the global server without disrupting ongoing experiments. We wrote extensive Marinda installation and programming guides and shared the software with collaborators for evaluation and feedback before we release it more broadly.
  6. In collaboration with Rob Beverly of the Naval Postgraduate School, we developed software support to enhance the spoofer project, which used Ark to globally expand its measurement of source address validation and filtering. Using Ark's distributed infrastructure and approximately 12,000 active measurement clients, our measurements revealed little improvement over four years of measurement. 80% of the source address filters we observed were implemented a single IP hop from sources, with over 95% of blocked packets observably filtered within the source's autonomous system. Our results were published and presented at IMC 2009 in "Understanding the Efficacy of Deployed Internet Source Address Validation Filtering".

Major Milestones


Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

2009 Tool Development

CoralReef

The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (10GE/OC192, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. This package is maintained by CAIDA developers with the support and collaboration of the Internet measurement community.

We released CoralReef version 3.8.6 in June of 2009.

CAIDA Tools Download Report

The tables below list all the CAIDA-developed tools distributed via our home page at https://www.caida.org/tools/ and the number of downloads of each tool during 2009.

  • Currently Supported Tools

    Tool | Description | Downloads
    Autofocus | Tool for generating Internet traffic reports and time-series graphs. | 324
    CoralReef | A software suite to collect and analyze data from passive Internet traffic monitors. | 818
    Cuttlefish | Produces animated graphs showing diurnal and geographical patterns. | 150
    dsc | A system for collecting and exploring statistics from DNS servers. | 1,570
    dnsstat | An application that collects DNS queries on UDP port 53 to report statistics. | 199
    dnstop | A libpcap application that displays tables of DNS traffic. | 8,944
    iffinder | One of several tools to perform alias resolution, to discover IP interfaces belonging to the same router. | 288
    sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 164
    Walrus | A tool for interactively visualizing large directed graphs in 3D space. | 3,410
    libsea | A file format and a Java library for representing large directed graphs. | 353
    Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 91
    plot-latlong | A tool for plotting points on geographic maps. | 248
  • Past Tools (Unsupported)

    Tool | Description | Downloads
    arts++ | A binary file format for storing network data. | 1,647
    cflowd | A NetFlow analysis tool. | 311
    Mapnet | A tool for visualizing the infrastructure of multiple backbone providers simultaneously. | 12,215
    GeoPlot | A lightweight Java applet that creates a geographical image of a data set. | 552
    GTrace | A graphical front-end to traceroute. | 579
    otter | A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. | 374
    plotpaths | An application that displays forward and reverse network path data. | 114
    plankton | A tool for visualizing NLANR's Web Cache Hierarchy. | 35

Data

In 2009, CAIDA captured and curated data from three primary sources of network data:

  • macroscopic topology data with the Archipelago infrastructure,
  • passive traffic traces from Tier 1 OC192 Internet backbone links,
  • passive traffic traces from the UCSD Network Telescope.

We derived several datasets from this data that we make publicly available to researchers. These include our AS Rank, AS adjacencies, and Router adjacencies datasets. In addition we released a Telescope Internet "background radiation" dataset and a Telescope Conficker dataset. Some datasets are made publicly available by CAIDA without restrictions to the user, while access to other datasets is restricted to academic researchers and CAIDA members, with data access subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, to ensure security of network infrastructure, and to comply with the terms of our agreements with data providers.

Major Milestones

Data Collected in 2009

Data Type | First date | Last date | Total size ¹
Macroscopic Topology Measurements, IPv4 (Archipelago) | 2009-01-01 | 2009-12-31 | 388 GB (1.2 TB)
Macroscopic Topology Measurements, IPv6 (Archipelago) | 2009-01-01 | 2009-12-31 | 222 MB (784 MB)
Internet backbone Traces | 2009-01-15 | 2009-12-17 | 4.8 TB (8.8 TB)
Network Telescope Datasets | 2009-01-01 | 2009-12-31 | 96 GB (233 MB)
"Live" Network Telescope Data | 2009-01-01 | 2009-12-31 | 3.2 TB (6.2 TB) ²
DNS Names for IPv4 Routed /24 Topology Dataset | 2009-01-01 | 2009-12-31 | 4.7 GB (17 GB) ²
DNS root/gTLD RTT Dataset | 2009-01-01 | 2009-12-31 | 1.2 GB

¹ The total size represents actual disk space. If data are stored in compressed form, the uncompressed size is given in brackets.
² The size of these datasets varies over time as we store and serve a rotating window of the last 30 days only.
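
The rotating 30-day window mentioned in the second footnote can be illustrated with a short Python sketch. This is only an assumption about how such a window might be selected; the function name `within_window` and the sample dates are hypothetical and do not describe CAIDA's actual storage tooling.

```python
from datetime import date, timedelta


def within_window(file_dates, today, days=30):
    """Return the dates that fall inside the last `days` days, ending today.

    Hypothetical helper: a date is kept if it is later than the cutoff
    (today minus `days`) and not in the future.
    """
    cutoff = today - timedelta(days=days)
    return [d for d in file_dates if cutoff < d <= today]


# Illustrative dates only, not real dataset timestamps.
today = date(2009, 12, 31)
candidates = [
    date(2009, 11, 30),
    date(2009, 12, 1),
    date(2009, 12, 2),
    date(2009, 12, 31),
]
kept = within_window(candidates, today)
print(kept)
```

With a 30-day window ending on 2009-12-31, the earliest retained day is 2009-12-02, so older entries drop out as each new day arrives.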

Data Distributed in 2009

We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2009, this resulted in the following datasets:

  • Publicly Available Data

    These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

DatasetUnique visitors (IPs)Data Downloaded
AS Rank34194.7 GB
AS Links (AS Adjacencies)3612.40 GB
AS Relationships133618.2 GB
Router Adjacencies287657.5 MB
Witty Worm Dataset94231 MB
AS Taxonomy197104.3 MB*
Code-Red Worms Dataset723.8 GB
We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we count that file only once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2009, but is necessary because of limitations in dataserver logging combined with aberrant user behavior.
* The AS Taxonomy dataset is included in a mirror of the Georgia Tech main AS Taxonomy site, and thus these numbers do not represent all access to this data.
  • Restricted Access Data

    These datasets require that users:

    • be academic or government researchers, or join CAIDA;
    • request an account and provide a brief description of their intended use of the data; and
    • agree to an Acceptable Use Policy.
DatasetUnique visitors (usernames)Data Downloaded *
Anonymized Internet Backbone Traces1294.5 TB
Backscatter Datasets39567 GB
(Raw Topology Traces from Archipelago infrastructure)
351.0 TB
Raw Topology Traces (skitter)211.2 TB
Witty Worm Dataset1171 GB
DNS Names for IPv4 Routed /24 Topology Dataset2819.4 GB
2003 Internet Topology Data Kit192.9 GB
DNS Root/gTLD server RTT Dataset12.7 MB
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we count that file only once for that user. This methodology significantly undercounts the total volume of data served through our dataservers in 2009, but is necessary because of limitations in dataserver logging combined with aberrant user behavior.
  • Restricted Access Data Requests

    The following table shows some statistics about data requests for CAIDA datasets: the number of requests received, the number of users whose request was granted, and the number of users that actually downloaded data.

    We received about 4% more requests in 2009 than in 2008, and approved 16% more requests for access to restricted datasets. Almost 80% of the users granted access (276 of 354) went on to download data from our web servers.

Dataset | Number of requests received | Number of users granted access | Number of users that accessed data
Anonymized Backbone and Peering Link Traces | 242 | 181 | 151
Active Topology Trace Datasets | 136 | 90 | 63
Backscatter Datasets | 101 | 62 | 45
Witty Worm Dataset | 28 | 18 | 14
DNS Root/gTLD server RTT Dataset | 7 | 3 | 3
Totals | 514 | 354 | 276
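
The download volumes in the "Data Distributed in 2009" tables are counted once per unique user per unique file. A minimal Python sketch of that counting rule follows. The record layout (`user`, `filename`, `size_bytes`) and the sample log are hypothetical, since the actual format of CAIDA's dataserver logs is not described in this report.

```python
from typing import Iterable, NamedTuple


class DownloadRecord(NamedTuple):
    """One hypothetical log entry: who fetched which file, and its size."""
    user: str
    filename: str
    size_bytes: int


def deduplicated_volume(records: Iterable[DownloadRecord]) -> int:
    """Sum file sizes, counting each (user, file) pair at most once."""
    seen = set()
    total = 0
    for rec in records:
        key = (rec.user, rec.filename)
        if key in seen:
            continue  # repeat download by the same user: not counted again
        seen.add(key)
        total += rec.size_bytes
    return total


# Illustrative log, not real data.
log = [
    DownloadRecord("alice", "trace-a.gz", 500),
    DownloadRecord("alice", "trace-a.gz", 500),  # repeat, ignored
    DownloadRecord("alice", "trace-b.gz", 200),
    DownloadRecord("bob", "trace-a.gz", 500),
]
print(deduplicated_volume(log))  # 1200
```

Counting this way gives a lower bound on the bytes actually served: the four transfers in the sample log total 1700 bytes, but only 1200 are reported.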

Workshops

As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA staff actively attend, contribute to, and host workshops relevant to research and better understanding of Internet infrastructure, trends, topology, routing, and security. Our web site has a complete listing of past and upcoming CAIDA Workshops.

ISMA 2009 AIMS - 1st Workshop on Active Internet Measurements

On February 12-13, 2009, CAIDA hosted the 1st Active Internet Measurements (AIMS) Workshop supporting science and policy in La Jolla, CA. This workshop sought to define priority directions of various active measurement infrastructures especially aimed at macroscopic security, stability, and performance measurements.

2nd CAIDA/WIDE/CASFI Workshop

The 2nd CAIDA/WIDE/CASFI Workshop was held on April 4-5, 2009 in Seoul, South Korea. This workshop continued a tradition of workshops supporting a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main areas of the workshop were Internet measurement projects, analysis of data to reveal current Internet trends, and DNS research. The workshop also covered miscellaneous research and technical topics of mutual interest to CAIDA, WIDE and CASFI participants.

1st Workshop on Internet Economics (WIE'09)

The 1st Workshop on Internet Economics (WIE'09) hosted by CAIDA and Georgia Tech was held on September 23, 2009 by web videoconference. The goal of this workshop was to bring together researchers, commercial Internet facilities and service providers, technologists, theorists, policy makers, RIR stakeholders, and pundits of Internet economics to try to frame a concrete and useful research agenda for the emerging but stunted field of Internet infrastructure economics.


Publications

The following table contains the papers published by CAIDA for the calendar year of 2009. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

Year | Month | Author(s) | Title | Publication
2009 | Nov | Beverly, Robert; Berger, Arthur; Hyun, Young; claffy, kc | Understanding the Efficacy of Deployed Internet Source Address Validation Filtering | ACM Internet Measurement Conference (IMC)
2009 | Oct | Gringoli, Francesco; Salgarelli, Luca; Dusi, Maurizio; Cascarano, Niccolo; Risso, Fulvio; claffy, kc | GT: picking up the truth from the ground for Internet traffic | ACM SIGCOMM Computer Communication Review (CCR)
2009 | Oct | Kenneally, Erin; claffy, kc | An Internet Data Sharing Framework For Balancing Privacy and Utility | Engaging Data: First International Forum on the Application and Management of Personal Electronic Information
2009 | Oct | claffy, kc; Fomenkov, Marina; Katz-Bassett, Ethan; Beverly, Robert; Cox, Beverly; Luckie, Matthew | The Workshop on Active Internet Measurements (AIMS) Report | ACM SIGCOMM Computer Communication Review (CCR)
2009 | Oct | Dimitropoulos, Xenofontas; Krioukov, Dmitri; Vahdat, Amin; Riley, George | Graph Annotations in Modeling Complex Network Topologies | ACM Transactions on Modeling and Computer Simulation (TOMACS)
2009 | Sep | Papadopoulos, Fragkiskos; Krioukov, Dmitri; Boguñá, Marián; Vahdat, Amin | Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces | ACM SIGMETRICS Performance Evaluation Review
2009 | Sep | Krioukov, Dmitri; Papadopoulos, Fragkiskos; Vahdat, Amin; Boguñá, Marián | On Curvature and Temperature of Complex Networks | Physical Review E
2009 | Sep | Jamakovic, Almerima; Mahadevan, Priya; Vahdat, Amin; Boguñá, Marián; Krioukov, Dmitri | How Small Are Building Blocks of Complex Networks | arXiv physics.soc-ph/0908.1143
2009 | Jul | Papadopoulos, Fragkiskos; Krioukov, Dmitri; Boguñá, Marián; Vahdat, Amin | Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces: Technical Report | Cooperative Association for Internet Data Analysis (CAIDA)
2009 | Mar | claffy, kc; Hyun, Young; Keys, Ken; Fomenkov, Marina; Krioukov, Dmitri | Internet Mapping: from Art to Science | IEEE DHS Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH)
2009 | Feb | Boguñá, Marián; Krioukov, Dmitri | Navigating ultrasmall worlds in ultrashort time | Physical Review Letters
2009 | Feb | Shakkottai, Srinivas; Fomenkov, Marina; Koga, Ryan; Krioukov, Dmitri; claffy, kc | Evolution of the Internet AS-Level Ecosystem | International Conference on Complex Sciences (Complex)
2009 | Jan | Boguñá, Marián; Krioukov, Dmitri; claffy, kc | Navigability of Complex Networks | Nature Physics

Presentations

The following table contains the presentations and invited talks given by CAIDA staff and collaborators in calendar year 2009. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

Year | Month | Presenter(s) | Title | Venue
2009 | Dec | claffy, kc | Historical and Architectural Context for Traffic Management Needs Today | FCC Technical Advisory Process Workshop
2009 | Nov | claffy, kc | Archipelago Measurement Infrastructure | Africa-Asia Forum Workshop
2009 | Oct | Krioukov, Dmitri | Hyperbolic geometry of complex networks | USC Center for Applied Mathematical Sciences Colloquia
2009 | Oct | Kenneally, Erin | An Internet Data Sharing Framework For Balancing Privacy and Utility | Engaging Data: First International Forum on the Application and Management of Personal Electronic Information
2009 | Oct | Krioukov, Dmitri | Evolution of the Internet Ecosystem | Southern California Symposium on Network Economics and Game Theory (SoCal NEGT)
2009 | Sep | claffy, kc | Leveraging the Science and Technology of Internet Mapping for Homeland Security | DHS Cybersecurity PI Meeting
2009 | Aug | claffy, kc | CAIDA participation in PREDICT | DHS PREDICT PI Meeting
2009 | Jun | Papadopoulos, Fragkiskos | Greedy Forwarding in Scale-Free Networks Embedded in Hyperbolic Metric Spaces | ACM SIGMETRICS MAMA
2009 | Jun | Krioukov, Dmitri | dK-series and hidden hyperbolic metric spaces | Telefonica Research
2009 | Jun | Kenneally, Erin | Belmont Report Overview | DHS Directorate for Science and Technology Workshop
2009 | May | Kenneally, Erin | Legal Issues in Network Research: Determining Content in Web-Browsing Communications | DHS Directorate for Science and Technology Workshop
2009 | May | claffy, kc | Ten Things the FCC Should Know about the Internet | Federal Communications Commission (FCC)
2009 | Apr | Huffaker, Bradley | CAIDA's Topology Updates and Analysis | WIDE-CASFI
2009 | Apr | Fomenkov, Marina | CAIDA Report | WIDE-CASFI
2009 | Apr | Huffaker, Bradley | DatCat: Lessons Learned | WIDE-CASFI
2009 | Apr | Brownlee, Nevil | netmap: the mini-Ark project | WIDE-CASFI
2009 | Apr | Zhang, Min | State of the Art in Traffic Classification | WIDE-CASFI
2009 | Apr | Zhang, Min | A Measurement-Based Study of Xunlei | WIDE-CASFI
2009 | Apr | Krioukov, Dmitri | Hidden Metric Spaces and Navigability of Complex Networks | NeTS FIND
2009 | Mar | claffy, kc | DNS Research Update from CAIDA | DNS RSSAC Meeting
2009 | Mar | claffy, kc | Broadband Conditions | NTIA Broadband Technology Opportunities Program
2009 | Mar | Hyun, Young | Internet Visualization with Walrus | Simulation Interoperability Workshop
2009 | Feb | Krioukov, Dmitri | Evolution of the Internet AS-Level Ecosystem | International Conference on Complex Sciences (Complex)
2009 | Feb | Keys, Ken | IP-to-Router Mapping Techniques | ISMA Workshop on Active Internet Measurements (AIMS)
2009 | Feb | Hyun, Young | Archipelago Measurement Infrastructure: Updates and Analyses | ISMA Workshop on Active Internet Measurements (AIMS)
2009 | Jan | Aben, Emile | Conficker | ISOI

Web Site Usage

In 2009, CAIDA's web site continued to attract considerable attention from a broad, international audience. Part of the increased traffic in the first half of the year can be attributed to attention following our release of the "Conficker/Conflicker/Downadup worm as seen from the UCSD Network Telescope" web report.

The table below presents the monthly history of traffic to www.caida.org for 2009. To show a more accurate representation of website traffic, these statistics do not include traffic from spiders, crawlers or other robots.



Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth (GB)
Jan 2009 | 38,521 | 64,600 | 203,654 | 1,155,049 | 40.14
Feb 2009 | 41,436 | 67,709 | 208,371 | 1,246,721 | 46.61
Mar 2009 | 42,996 | 70,676 | 229,058 | 1,338,538 | 64.77
Apr 2009 | 40,486 | 64,965 | 209,267 | 1,183,418 | 44.84
May 2009 | 33,777 | 57,081 | 201,441 | 1,013,896 | 37.83
Jun 2009 | 34,253 | 57,040 | 174,455 | 1,105,566 | 43.86
Jul 2009 | 31,552 | 54,348 | 178,023 | 954,572 | 38.93
Aug 2009 | 33,145 | 54,392 | 170,597 | 1,040,296 | 37.07
Sep 2009 | 36,824 | 62,205 | 200,598 | 1,034,684 | 42.82
Oct 2009 | 39,742 | 67,003 | 220,984 | 1,080,407 | 52.81
Nov 2009 | 35,262 | 58,451 | 254,290 | 1,007,644 | 46.47
Dec 2009 | 31,377 | 53,110 | 204,633 | 868,402 | 35.87
Total | 439,371 | 731,580 | 2,455,371 | 13,029,193 | 532.02


Organizational Chart

CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2009. The image below shows the functional organization of CAIDA. Please check the home page for more complete information about CAIDA staff.

[Image of CAIDA Functional Organization Chart]

CAIDA Functional Organization Chart


Funding Sources

CAIDA thanks our 2009 sponsors, members, and collaborators.

The charts below depict funds received by CAIDA during the 2009 calendar year.

Funding Source | Allocations | Percentage of Total
DHS | 331,816 | 31%
DOI | 488,830 | 46%
NSF | 17,160 | 2%
GIFT | 200,000 | 19%
CSE | 27,716 | 3%
Total | 1,065,522 | 100%

Figure 1. Allocations by funding source received during 2009.


Operating Expenses

The charts below depict CAIDA's Annual Expense Report for the 2009 calendar year.

LABOR: Salaries and benefits paid to staff and students.
IDC: Indirect Costs paid to the University of California, San Diego including grant overhead (52-54%) and telephone, Internet, and other IT services.
SUBCONTRACTS: There were no subcontracts in 2009.
TRAVEL: Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
SUPPLIES & EXPENSES: All office supplies and equipment (including computer hardware and software) costing less than $5000.
EQUIPMENT: Computer hardware or other equipment costing more than $5000.
TRANSFERS: Exchange of funds between groups for recharge for IT desktop support and Oracle database services.

Program Area | Expenses | Percentage of Total
Labor | 1,413,587 | 60%
IDC | 769,474 | 33%
Subcontract | 0 | 0%
Travel | 62,427 | 3%
Supplies & Expenses | 69,550 | 3%
Equipment | 27,586 | 1%
Transfers | 6,046 | 0%
Total | 2,293,238 | 100.0%

Figure 2. 2009 Operating Expenses
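
The percentage column for Figure 2 can be recomputed from the amounts. The Python sketch below is illustrative only: the names `expenses`, `share`, and `line_item_sum` are ours, and it assumes the denominator is the sum of the seven line items. That sum is 2,348,670, which reproduces the published percentages and agrees with the Figure 3 total (2,348,671) to within rounding, but differs from the Total row printed for Figure 2 (2,293,238).

```python
# Amounts copied from the operating-expenses table.
expenses = {
    "Labor": 1_413_587,
    "IDC": 769_474,
    "Subcontract": 0,
    "Travel": 62_427,
    "Supplies & Expenses": 69_550,
    "Equipment": 27_586,
    "Transfers": 6_046,
}

# Assumption: shares are taken relative to the sum of the line items.
line_item_sum = sum(expenses.values())


def share(amount: int, total: int) -> int:
    """Percentage of total, rounded to the nearest whole percent."""
    return round(100 * amount / total)


shares = {name: share(amount, line_item_sum) for name, amount in expenses.items()}

for name, pct in shares.items():
    print(f"{name:20s} {expenses[name]:>10,d} {pct:>3d}%")
print(f"{'Sum of line items':20s} {line_item_sum:>10,d}")
```

Because each share is rounded independently, the rounded percentages need not add up to exactly 100.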

These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.


Program Area | Expenses | Percentage of Total
DNS | 407,447 | 17.3%
Infrastructure | 909,442 | 38.7%
Routing | 399,360 | 17.0%
Policy | 36,133 | 1.5%
Topology | 550,304 | 23.4%
Outreach | 45,985 | 2.0%
Total | 2,348,671 | 100.0%

Figure 3. 2009 Expenses by Program Area

  Last Modified: Tue Oct-13-2020 22:21:53 UTC
  Page URL: https://www.caida.org/home/about/annualreports/2009/index.xml