Center for Applied Internet Data Analysis
CAIDA's Annual Report for 2008
A report on CAIDA research initiatives, project progress and results, data sets, tool development, publications, presentations, workshops, web site statistics, funding sources, and operating expenses for 2008.

Mission Statement: CAIDA investigates both practical and theoretical aspects of the Internet, with particular focus on topics that:

  • provide insight into the macroscopic function of Internet infrastructure, behavior, usage, and evolution
  • foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared
  • improve the integrity of the field of Internet science
  • inform science, technology, and communications public policies


Executive Summary

This annual report covers CAIDA's activities in 2008, summarizing highlights from our research, infrastructure, and outreach activities. Our current research projects, funded by the U.S. National Science Foundation (NSF) and the Department of Homeland Security (DHS), include several measurement-based studies of the Internet's core infrastructure, with focus on the health and integrity of the global Internet's topology, routing, addressing, and naming systems. We made fundamental advances in several of our research projects this year, supported by increased coverage by our measurement infrastructure, and increased collaborations with colleagues around the world. We completed the first full calendar year of continuous provisioning of the most comprehensive annotated view of IPv4 topology thus far. We determined experimentally which IPv4 topology probing method worked best, and began to integrate and optimize our IP alias resolution techniques for large graphs. We also began to deploy IPv6 Ark nodes and early IPv6 probe destination lists.

Some of our topology research focused on how different routing approaches in nature are maximally efficient on certain types of peculiarly structured topologies, conveniently, those structured like the Internet AS graph. Further, we found that self-similarity of clustering in real complex networks provides strong empirical evidence that some hidden metric spaces underlie these networks. In trying to model self-similar (scale-free) networks embedded into such a hidden space, we discovered that a certain approach to routing -- greedy routing -- is phenomenally successful and efficient in such a model. We are still exploring the ramifications of this discovery, and the even more intriguing finding that this hidden space seems to be hyperbolic. Our research into network growth dynamics also yielded two papers with surprising results about different regimes of network growth: (1) there may be a vast pre-asymptotic regime of complex network growth that gives rise to power-law-like effects in degree distribution; (2) a simple customer-provider-based modification of the preferential attachment model can account for Internet topology evolution, including ISP consolidation toward monopoly.

Per our mission, our infrastructure activities aim to narrow the growing gap that impedes the field of network research, as well as telecommunications policy and infrastructure sustainability: a dearth of available empirical data on the public Internet since the infrastructure privatized in the mid-1990s. In 2008 we continued to maintain a catalog of Internet measurement data sets, contributed to and used the (DHS-funded) PREDICT repository of datasets to support cybersecurity research, and developed and deployed new active and passive measurement infrastructure. We continued expanding our newest active measurement infrastructure, now collecting the most comprehensive set of IPv4 topology measurements ever made available to researchers, enhanced with DNS information. Our data repository includes weekly archives of complete Internet AS-level topologies enriched with AS relationship information, and weekly updates of AS (ISP) rankings. We also coordinated and analyzed another DITL's worth of data, and wrote a few web pages and published a paper in CCR; hopefully someone else will pick up DITL next year since we've not had dedicated funding for it yet.

We also led and participated in tool development to support analysis, indexing, and dissemination of Internet infrastructure data. Highlights include updates to our real-time report generator of passively observed traffic, geographical visualizations of DNS workload to a given set of servers, updates to our IPv4 and IPv6 AScore posters, and visual maps of IPv4 address space consumption.

In 2007 (annual report) CAIDA began to expand its scope to include economics and policy research. Our notable contribution in 2008 was a set of blog entries that became a short Internet research tutorial for lawyers. Finally, we engaged in a variety of outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and workshops. Details of our activities are below. CAIDA's program plan for 2007-2010 is available at https://www.caida.org/home/about/progplan/progplan2007/. Please do not hesitate to send comments or questions to info at caida dot org.


Research Projects


Topology

Macroscopic Topology Measurements, Analysis, and Modeling

Goals

CAIDA's topology research agenda is focused on three strategic areas: 1) macroscopic topology measurement; 2) analysis of the observable AS-level and router-level hierarchy; 3) topology modeling in support of routing research.

Activities

In 2008 CAIDA made steady progress in all three of its topology research focus areas.

  1. Macroscopic Topology Measurement:
    1. We continued large-scale macroscopic topology measurements using our set of monitors distributed worldwide and coordinated by Archipelago (Ark), our state-of-the-art global measurement platform. We are aware of the gaps in geography and topology coverage -- still not well-quantified by researchers -- induced by the relatively small number of vantage points. Yet with 30 Ark monitors deployed in 21 countries by the end of 2008, we achieve our most comprehensive view of IPv4 topology thus far, completing the first full calendar year of the IPv4 Routed /24 Topology Dataset.

    2. On December 12, 2008, we began to use the Ark infrastructure to collect IPv6 topology data. We released the IPv6 Topology Dataset for researchers to get a view of the nascent IPv6 global topology as seen by six Ark monitors. More IPv6 topology data will be available in 2009.

    3. Led by Matthew Luckie in the WAND research group at the University of Waikato, we conducted experiments to see which traceroute probing methods captured the most topology information and published our results in "Traceroute Probe Method and Forward IP Path Inference" in IMC '08. We also released the corresponding Traceroute Probe Method 2008-08 Dataset.

    4. We implemented a new technique for alias resolution measurements on the Ark platform. Our new tool kapar is a scalable version of APAR developed by M. Gunes and K. Sarac at the University of Texas. Using publicly available data from four networks, we tested the efficiency and veracity of various combinations of alias resolution methods including iffinder, TTL measurements, and kapar. We published our findings as a CAIDA technical report, "IP Alias Resolution Techniques". A paper is in preparation.

  2. Analysis of the Observable Topology:
    1. We continue to annotate the IPv4 topology graph with automated DNS reverse lookups of IP addresses discovered by the probes.

    2. We continued work on techniques to accurately annotate Internet topologies based on observations and inference of Internet structural and commercial characteristics. Our efforts focused on the AS-level Internet with AS links annotated by business relationships between ASes. We infer these relationships, recognizing their bidirectional nature, and annotate each link as a customer-provider or a peer-to-peer (settlement-free interconnection) relationship. The paper "Graph Annotations in Modeling Complex Network Topologies" was accepted for publication in ACM Transactions on Modeling and Computer Simulation (TOMACS).

    3. We created new versions of our popular AS Core Graph visualizations for both IPv4 and IPv6. The 2008 IPv4 graph was the first to make use of the topology data collected on the new Ark platform. Because we did not have enough IPv6 support in the Ark infrastructure, the 2008 IPv6 graph relied on data collected by volunteers responding to a request sent to the North American Network Operators' Group (NANOG) mailing list. We will have semi-automated IPv6 topology discovery running on Ark early in 2009.

    4. Several users of CAIDA's AS relationship inference data asked us why it contained AS relationship cycles, e.g., cases where AS A is a provider of AS B, B is a provider of C, and C is a provider of A, or other cycle types. We published a paper, "On Cycles in AS Relationships", in ACM SIGCOMM Computer Communication Review (CCR), v.38, n.3, p.102-104, 2008, that provides our answers.

  3. Topology Modeling:
    1. We demonstrated that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces are able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization. We published our results in "Self-similarity of Complex Networks and Hidden Metric Spaces" in Physical Review Letters, v. 100, 078701, 2008.

    2. We studied the paradox associated with networks growing according to super-linear preferential attachment: super-linear preference cannot produce scale-free networks in the thermodynamic (asymptotic) limit, but there are super-linearly growing network models that perfectly match the scale-free structure of some real networks, including the Internet. We demonstrated that a super-linearly growing network model can reproduce, in its pre-asymptotic regime, the structure of a real network, if the model captures some sufficiently strong structural constraints, e.g., rich-club connectivity. These findings suggest that real scale-free networks of finite size may exist in pre-asymptotic regimes of network evolution processes that lead to degenerate network formations in the thermodynamic limit. We published our results in "Scale-free networks as pre-asymptotic regimes of super-linear preferential attachment" in Physical Review E, v.78, 026114, 2008.

    3. In collaboration with S. Shakkottai from Texas A&M University we refined our model of Internet growth which attempts to explain preferential attachment based on economic realities of the AS-level Internet. We simulated a growing AS-level topology and annotated links in the model graph with AS relationships (customer-provider or peer-to-peer). We compared the degree distributions for all different types of nodes in the simulated topology to those observed in measured Internet topologies, and found that the distributions are similar. To our knowledge, this is the first Internet evolution model that is realistic, analytically tractable, and entirely based on physical, that is, measurable, parameters. The paper was accepted for publication in The First International Conference on Complex Sciences: Theory and Applications (Complex 2009).
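The cycle example in the AS relationship item above (A is a provider of B, B of C, and C of A) can be checked mechanically. The following is a minimal sketch, not CAIDA's inference code: it assumes the inferred relationships are available as (provider, customer) pairs, and the function name `find_provider_cycle` and the input format are hypothetical choices made for illustration. It runs a depth-first search over the provider-to-customer graph and returns one cycle if any exists.

```python
from collections import defaultdict


def find_provider_cycle(provider_customer_pairs):
    """Return one list of ASes forming a provider->customer cycle, or None.

    provider_customer_pairs: iterable of (provider, customer) tuples
    (an assumed input format, chosen only for this illustration).
    """
    customers = defaultdict(list)
    for provider, customer in provider_customer_pairs:
        customers[provider].append(customer)

    state = {}  # AS -> "active" (on the current DFS path) or "done"
    path = []   # ASes on the current DFS path, in visiting order

    def visit(asn):
        state[asn] = "active"
        path.append(asn)
        for customer in customers.get(asn, ()):
            if state.get(customer) == "active":
                # Back edge: the customer is already on the path, so the
                # slice from that point onward is a cycle.
                return path[path.index(customer):]
            if customer not in state:
                cycle = visit(customer)
                if cycle:
                    return cycle
        path.pop()
        state[asn] = "done"
        return None

    for asn in list(customers):
        if asn not in state:
            cycle = visit(asn)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    links = [("A", "B"), ("B", "C"), ("C", "A")]
    print(find_provider_cycle(links))
```

The recursive search is adequate for small examples; a graph the size of the full AS-level Internet would likely call for an iterative traversal to stay within Python's recursion limit.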

Major Milestones

Student Involvement

A. Jamakovic (TU Delft) continued her work from 2007 on application of the dK-series methodology to study randomness of various complex networks, which we hope to publish in 2009.

Funding Sources

This research received support from:


Routing

Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures without Topology Updates

Goals

CAIDA's research in Internet routing continued to focus on two related topics: greedy routing based on hidden metric spaces underlying real networks; and the relationship between routing efficiency and the structure of the network topology. Leveraging a decade of CAIDA institutional knowledge of topology discovery, collection, and analysis, we have a bold research objective: a far-reaching solution to the routing scalability problems of today's Internet. But our work in this area has profound implications for network science in other disciplines (physics, biology, chemistry, social sciences).

To foster our research goals, we developed and pursued the following step-by-step program:

  • Obtain empirical evidence that hidden spaces do underlie complex networks and that they are metric;
  • Identify navigability mechanisms that influence the efficiency of greedy routing in complex networks;
  • Find the basic geometrical and topological properties of hidden spaces that make them maximally congruent with respect to the identified navigability mechanisms;
  • Obtain empirical evidence that hidden spaces underlying real networks do possess these properties; and
  • Find mappings of nodes in real networks to the identified spaces or their models.

Activities

  1. We studied the process of routing information through networks as a universal phenomenon existing in both natural and man-made complex systems. In many complex networks found in nature, nodes communicate efficiently even without full knowledge of global network connectivity. We demonstrated that the peculiar structural characteristics of observable complex networks are consistent with maximizing communication efficiency when using greedy routing approaches without global knowledge. We also described a general mechanism that explains this connection between network structure and function. The paper "Navigability of complex networks" was published in Nature Physics online.

  2. We began the follow-on work to the above, and submitted for publication a paper, "Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces". This paper shows that the hierarchical structure of complex networks is congruent with negatively curved geometries hidden beneath observed topologies, i.e., the hidden metric space is hyperbolic. Mapping nodes to these hidden metrics leads to scale-free topologies in the observable network, and even more pleasantly surprising, greedy routing on this embedding can achieve 100% reachability and optimal paths. The question remains as to whether we can find hidden metric spaces to map and better navigate real world networks such as the Internet.
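To make the idea of greedy routing concrete, here is a minimal sketch under stated assumptions; it is not the model or code from the papers above. It assumes each node has been assigned polar coordinates (r, theta) in a hyperbolic plane of curvature -1, and that a node knows only its own neighbors' coordinates and the destination's coordinates. The names `hyperbolic_distance` and `greedy_route`, and the toy graph, are hypothetical. At each hop the packet moves to the neighbor closest to the destination, and routing fails if no neighbor is closer than the current node.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between points p=(r1, t1) and q=(r2, t2), given in polar
    coordinates on the hyperbolic plane (hyperbolic law of cosines)."""
    (r1, t1), (r2, t2) = p, q
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1


def greedy_route(adjacency, coords, source, target,
                 distance=hyperbolic_distance):
    """Return the greedy path from source to target, or None if stuck."""
    path = [source]
    current = source
    while current != target:
        best = min(adjacency[current],
                   key=lambda n: distance(coords[n], coords[target]),
                   default=None)
        if best is None or (distance(coords[best], coords[target])
                            >= distance(coords[current], coords[target])):
            return None  # local minimum: no neighbor is closer to target
        path.append(best)
        current = best
    return path


if __name__ == "__main__":
    # Toy star topology: a hub near the origin and three peripheral nodes.
    coords = {"H": (0.1, 0.0), "A": (3.0, 0.0), "B": (3.0, 2.0), "C": (3.0, 4.0)}
    adjacency = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
    print(greedy_route(adjacency, coords, "A", "B"))
```

Because every hop strictly decreases the distance to the destination, the loop cannot revisit a node, so it always terminates with either a path or a failure.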

Major Milestones

Funding Sources

Our routing research received support from:


DNS

Improving the Integrity of Domain Name System (DNS) Monitoring and Protection

Goals

CAIDA researchers conduct DNS measurements and develop tools, models, and analysis methodologies for use by DNS operators and researchers.

Activities

  1. Measurements of traffic at the DNS Root Servers
    1. During our January 2008 CAIDA/WIDE workshop, participants discussed the Day in the Life of the Internet (DITL) project, reflected on the lessons and results of the 2007 collection event, and compiled a sample list of the top research questions and the corresponding data that researchers would like to procure from the DITL project.

    2. In collaboration with ISC and OARC, we held the third large-scale simultaneous "Day in the Life of the DNS Root Servers" data collection event on March 18-19, 2008 (DITL 2008). We captured tcpdump traces at nearly all anycast instances of the A, C, E, F, H, old-J, K, L, old-L, and M root servers and from two alternative Open Root Server Network (ORSN) servers. In comparison with DITL 2007, the total amount of data doubled. This unique dataset represents the most comprehensive measurements of the root servers to date, and provides researchers with unprecedented insight into the root server workload characteristics and performance. We wrote a summary of the collection event and cataloged the data into DatCat. These data are available to the research community via the DNS-OARC. Academic researchers can participate in the DNS-OARC for free.

    3. We began analysis of the DNS root server data collected during DITL 2008 and presented our findings at NANOG42 and at the DNS-OARC 2008 DNS Ops Workshop in Brooklyn. We published a paper, "A Day at the Root of the Internet", in ACM SIGCOMM Computer Communication Review, v. 38, pp.41-46, 2008.

    4. Sebastian Castro (a visiting student from Chile) collaborated with CAIDA researchers to analyze DNS data collected during the 2006, 2007, and 2008 DITL events. He focused on the "heavy hitters" analysis, which raised more questions than we answered. The sources of heavy pollution (queries that cannot possibly be appropriate) change over time, often associated with application software that is lazy (laissez-faire) about managing its own DNS traffic. One clear pattern is continuous growth -- there is an order of magnitude more pollution (invalid queries) at the roots than valid queries, and the number of invalid queries grows faster than the number of valid queries. Perhaps more importantly, no organization has both the incentive and the capital to fix this pollution. Much of it stems from lazily written software, which is cheaper to produce.

    5. We published a paper "Influence Maps - a novel 2-D visualization of massive geographically distributed data sets" in the Internet Protocol Forum in October, 2008. In this paper, we present a novel visualization technique -- the Influence Map -- which renders a compressed representation of geospatially distributed Internet data.

  2. Other DNS measurements:
    1. To complement the IPv4 Routed /24 Topology Dataset, in March 2008, we began using our custom-built bulk DNS lookup service to resolve the fully-qualified domain names for IP addresses seen by our monitors. We make these names available in the DNS Names for IPv4 Routed /24 Topology Dataset.

    2. Duane Wessels, who became director of DNS-OARC in June 2008, continued our open resolvers survey and posted daily reports that identify open DNS resolvers. These resolvers represent a dangerous vulnerability to Internet users since they allow resource squatting, are easy to poison, and can be used in widespread Distributed Denial of Service (DDoS) attacks.

    3. In collaboration with Prof. N. Brownlee (University of Auckland (UA), New Zealand), we maintained NeTraMet traffic meters installed at various locations in the US, New Zealand, and Japan, and continued monitoring requests to, paired with responses from, root/gTLD servers generated by large campus/enterprise networks. The resulting longitudinal dataset is available for researchers. It contains information useful for evaluating performance conditions and trends on the global Internet, although note that DNS RTTs are influenced by several factors, including remote server load, network congestion, route instability, and local effects such as link or equipment failures.
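The reverse lookups described above for the Routed /24 topology data start from the PTR query name of each observed address. The sketch below is illustrative only and says nothing about how CAIDA's custom bulk lookup service works: it uses Python's standard `ipaddress` module to build the query names, and the function name `ptr_query_names` is hypothetical. Sending the queries and handling timeouts at scale is the hard part and is deliberately left out.

```python
import ipaddress


def ptr_query_names(addresses):
    """Map each distinct IP address string to the name queried for its
    PTR record (in-addr.arpa for IPv4, ip6.arpa for IPv6)."""
    names = {}
    for text in addresses:
        if text not in names:  # resolve each distinct address only once
            names[text] = ipaddress.ip_address(text).reverse_pointer
    return names


if __name__ == "__main__":
    # Documentation-range addresses, used purely as placeholders.
    for addr, name in ptr_query_names(["192.0.2.1", "2001:db8::1"]).items():
        print(addr, "->", name)
```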

Major Milestones

Funding Sources

This research received support from:


Economics and Policy

Research and Analysis on Internet Protocol version 6 (IPv6)

Goals

In the face of exhaustion of the Internet Assigned Numbers Authority (IANA) IPv4 address resources in the next several years, CAIDA seeks salient data and objective quantitative analysis that will inform the development of address allocation policies that will accommodate continued growth and innovation of the Internet. We hope to foster discussion of scenarios that accommodate four realities: 1) the current Internet has become critical infrastructure for governments, organizations, and individuals throughout the world; 2) the Internet (on any timeline) requires an upgrade to a more scalable and sustainable addressing solution; 3) any such solution requires an infusion of capital and skilled labor; and 4) the major organizations currently associated with ownership, maintenance, and upgrade of such Internet infrastructure do not currently enjoy resources that would allow for such investments in the required upgrades. All four S's -- security, scalability, sustainability, and stewardship -- in one messy problem.

Activities

  1. In 2008, CAIDA worked with ARIN to collect and analyze information on IPv6 uptake. We conducted two surveys: the March 2008 survey went to respondents from the ARIN region, and the September 2008 survey collated responses from all regions. Claffy presented these results at the April and October ARIN Public Policy Meetings, and also presented remotely to the Internet Society's (ISOC) Advisory Council in November.

  2. Claffy chaired a panel at TPRC's 36th Research Conference on Communication, Information, and Internet Policy hosted by the Center for Technology and the Law, George Mason University Law School, Arlington, Virginia in September 2008.

Major Milestones

Student Involvement

REU student Jennifer Hsu worked with CAIDA staff to produce A History of Internet Infrastructure Ownership, but gave up on it until we get a better source of available data.

Funding Sources

This research received support from a gift made by the American Registry for Internet Numbers (ARIN).


Cooperative Measurement and Modeling of Open Networked Systems (COMMONS)

Goals

We continued activity on the COMMONS project, mostly in learning what policy changes or support would be required to support such an experiment. Last year we proposed that NLR and/or Internet2 offer measurement technology and connectivity to community networks in exchange for opt-in access to measurement of the resulting interdomain network and its costs. This year we shifted our efforts toward educating enough people in the communications policy community about Internet technology problems so we can have a more interdisciplinary conversation next year.

Activities

  1. In collaboration with co-author Sascha Meinrath, we published "The COMMONS Initiative: Cooperative Measurement and Modeling of Open Networked Systems" in the CommLaw Conspectus: Journal of Communications Law and Policy, Volume 16.2. This article proposes to develop a requirements document and roadmap to support the use of a national OC-192 transit backbone for community wireless networks and other public sector networks to reach each other. This would enable a large-scale, incentive-based network of Internet workload, performance, economic, and behavioral measurement on an unprecedented national, inter-segment, inter-provider scale. First we should talk for a couple of years about how to respect privacy in Internet research.

  2. In March 2008, kc claffy attended a meeting hosted by Google and Stanford Law School - Legal Futures, which inspired a follow-up set of blog postings and a slightly updated PDF, "Ten Things Lawyers Should Know About the Internet".

Major Milestones

In its second year, the COMMONS project reported the following major milestones.

Student Involvement

We supervised an undergraduate student, Connie Lyu, who worked with Adobe Illustrator to add graphics and layout to the top ten things lawyers should know about the Internet blog entries to produce the booklet version.

Funding Sources

This research was supported by a gift from Cisco Systems, Inc.


Infrastructure Projects


CONMI: Community-Oriented Network Measurement Infrastructure

Goals

The core objective of the "Community-Oriented Network Measurement Infrastructure" (CONMI) project is to provide needed data sets to the scientific community studying the Internet. To accomplish this goal, CAIDA deploys both active and passive infrastructure to measure a wide cross-section of the Internet and collect and distribute the resulting data.

Activities

In 2008, we continued to focus our efforts on the two tasks described in the proposal: 1) implementing Archipelago, our state-of-the-art, community-oriented, active measurement infrastructure; and 2) deploying monitors capable of collecting passive traces on Internet links, including new monitors for OC192 backbone links and web pages displaying reports from publicly accessible realtime traffic monitors.

  1. Archipelago (Ark): A Coordination-Oriented Measurement Infrastructure
    1. The Archipelago (Ark) Project made progress in 2008 on monitor deployment and software development. By the end of 2008, CAIDA had 30 Ark monitors deployed in 21 countries, including eight monitors in the US. The topology of Archipelago includes much of that of our previous skitter infrastructure, which used PCs located in networks around the world that send measurement results via the Internet to a central server located at CAIDA at the San Diego Supercomputer Center. The new design pays a great deal of attention to communication & coordination, software installation & execution environment, and data storage & management. In 2008, the Ark infrastructure was primarily used for ongoing topology measurements in the IPv4 address space; we will expand its research scope in 2009.

    2. We continue to expand the Ark infrastructure, adding 1-2 monitors per month. Increasing the number of monitors is vital for topology measurements since it reduces the gaps in coverage, decreases the cycle times, and allows us to increase the number of traces attempted for each destination /24. The application load gets distributed across the teams and monitors based on resource availability. Organizations interested in hosting an Ark monitor should send a message to ark-info@caida.org.

    3. We added infrastructure for automated IPv4 Routed /24 AS Links Dataset creation, automated ongoing DNS lookup of IP addresses seen in the Routed /24 Topology traces, and tcpdump captures of DNS query/response traffic.

    4. We released the rb-wartslib library that enables warts data processing from Ruby. We also produced numerous scripts to further automate collection, data management, and archival. All tools for downloading and managing collected data stress scalability and fault tolerance.

    5. With a nod toward the future, we implemented a prototypical probing methodology for Internet Protocol version 6 (IPv6) on six nodes of the Ark infrastructure with the requisite IPv6 connectivity. We will expand IPv6 measurements in 2009.

  2. Deployment of Passive Monitors for Trace Collection on Backbone Links
    1. Early in the year, we spent much time and effort dealing with problematic older unsupported hardware hoping to repurpose it, but have not had any resources to upgrade these older systems. Finally, in July 2008, we successfully deployed four passive traffic monitors on high-speed, tier 1, OC192 Internet backbone links. Working with Equinix and a tier 1 ISP, we sited two monitors (four hosts) to tap two bidirectional links, one from Seattle, WA to Chicago, IL and another from San Jose, CA to Los Angeles, CA.

    2. The first 1-hour trace was obtained in March 2008 as a part of our global Internet measurement experiment, 'Day in the Life of the Internet 2008'. The data was anonymized using the Crypto-PAn prefix-preserving anonymization technique and is available for use by vetted researchers. We continue monthly collection of traces at the Equinix facilities.

    3. We improved our CoralReef based traffic report generator that produces publicly accessible realtime reports from the data we capture on traffic monitors. We also publish observed packet size distributions.
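The prefix-preserving property of the anonymization mentioned above for the backbone traces can be illustrated with a short sketch. This is not the Crypto-PAn implementation: it follows the same general construction (each output bit is the input bit XORed with a keyed pseudorandom function of the preceding input bits), but it swaps in HMAC-SHA256 as the pseudorandom function for convenience, and the function name `anonymize_ipv4` and the key are hypothetical. Its output is therefore not compatible with Crypto-PAn output.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address, key):
    """Prefix-preserving anonymization of one IPv4 address (illustrative).

    Two addresses that share exactly k leading bits map to anonymized
    addresses that also share exactly k leading bits.
    """
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i in range(32):
        # The flip bit depends only on the key and the first i input bits,
        # so addresses with a common prefix get identical flips over it.
        prefix = bits[:i].encode("ascii")
        flip = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


if __name__ == "__main__":
    key = b"example-key-not-for-production"  # placeholder key
    for addr in ("192.0.2.17", "192.0.2.200", "198.51.100.5"):
        print(addr, "->", anonymize_ipv4(addr, key))
```

The mapping is deterministic for a given key, which is what lets researchers correlate flows across a trace while the real addresses stay hidden; it also means the key must be protected as carefully as the raw data.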

Major Milestones

Student Involvement

CAIDA hosted visits of graduate students Alberto Dainotti and Maurizio Dusi who worked on various research tasks for the CONMI project.

Funding Sources

The CONMI project received support from the NSF grant (CRI 05-51542) "Toward Community-Oriented Network Measurement Infrastructure."


PREDICT: Network Traffic Data Repository to Develop Secure IT Infrastructure

Goals

The Protected Repository for the Defense of Infrastructure against Cyber Threats (PREDICT) was designed to provide sensitive security datasets to qualified researchers, while preserving privacy and preventing data misuse. PREDICT seeks to provide a secure technical and policy framework to process applications for data sharing from network providers that include tools for collection, processing, and hosting of data that PREDICT makes available through the program as well as secured infrastructure to support serving datasets to researchers.

Activities

CAIDA's involvement in the PREDICT effort included assisting with development of background pieces of the project, from iterating on NDAs to deploying measurement infrastructure to curation of data. CAIDA acts as a data provider and a data hosting site, serving denial-of-service backscatter data, Internet worm data, Network Telescope data, and IP topology data to approved researchers. We also hired part-time an attorney with experience in cyberlaw, who gave us feedback on our data descriptions and IRB application, described below.

  1. CAIDA continued delivering both active and passive data as a PREDICT data provider. Our Macroscopic Topology data deliverable is the IPv4 Routed /24 Topology Dataset collected on Ark infrastructure (which has replaced previous skitter-based measurements). Our passive data deliverables are monthly hour-long traffic traces captured by two monitors (four hosts) on two bidirectional links, one from Seattle, WA to Chicago, IL and another from San Jose, CA to Los Angeles, CA. We clean, anonymize, and distribute these traces to researchers.

  2. In October 2008, we submitted an application to the UCSD Human Research Protections Program (HRPP) office requesting review of our research protocol by the campus Institutional Review Board (IRB). The application covered the general traffic and other data analysis work we have done for the last 10 years, not including any research involving payload (which we define as anything past the TCP/IP header). Although we expected it to go to a full panel review, our application was given expedited review and approved within 10 days. Since we would like to begin a longer conversation with our IRB regarding appropriate conduct during network research, we plan to submit a follow up application that will propose privacy-respecting payload analysis and we will ask that the application specifically get a full panel review.

  3. We wrote a report describing the landscape of anonymization tools for network data, "summary of anonymization best practice techniques" and created an anonymization bibliography on our web site.

  4. We participated in the first ACM Workshop on Network Data Anonymization (NDA 2008) which convened in Washington DC in association with the 15th ACM Conference on Computer and Communications Security (CCS). The workshop focused on the theory and practice of anonymization as it applies to network data for use by the Internet measurement research community and operators deploying network measurement technologies. PI Dr. Claffy moderated a panel discussion on Economic, Ownership, and Trust Issues in Network Data Sharing. Workshop participants seemed to agree on the need for two documents:

    1. An ethics-based code of conduct for the network measurement community. PREDICT will host a workshop in 2009 to discuss this topic.
    2. A case for legislative change. All three attorneys at the workshop echoed the belief that by the time Internet measurement-related legislation comes up for revision, the network measurement community better have a compelling story for what we want changed and why.

  5. As a result of internal assessment of the research utility and use of the anonymized telescope data balanced against the (small) privacy risk and (large) cost of upkeep, CAIDA decommissioned the current network telescope data collection infrastructure on October 13, 2008 but then had to immediately kickstart it again because of the onset of the Conficker worm. With much help from Professor Stefan Savage and his team in CSE, we are re-implementing the network telescope with a fresh research agenda, data collection and curation methodology, as well as new hardware, in 2009.

  6. We collaborated with Internet2 to draft a proposal for the Network Research Review Council (NRRC), which would act in some ways like an IRB for the Internet2 community. Although Internet2 faces the same privacy concerns, fear of lawsuits, and operational costs as commercial providers, we hope the NRRC can help Internet2 navigate these issues to better provide network data to researchers.

Major Milestones

  • Improvements in data collection infrastructure
    • Completed conversion from skitter-based active measurements to the Archipelago platform.
    • Installed, configured, tested and deployed four OC192 monitors.
  • Adding Metadata to PREDICT Portal for Researchers
    • Submitted four quarters of the Denial-of-Service (DoS) Backscatter Datasets.
    • Submitted Denial-of-Service (DoS) Backscatter-TOCS Dataset.
  • Outreach

Student Involvement

Graduate student Wolfgang John analyzed data collected on the UCSD Network Telescope, looking for worm traffic.

Funding Sources

The PREDICT project received support from DHS contract NBCHC 070133, "Supporting Research and Development of Security Technologies through Network and Security Data Collection."


DatCat: Internet Measurement Data Catalog

Goals

CAIDA's Internet Measurement Data Catalog (IMDC) facilitates access, archiving, and long-term storage of Internet data as well as sharing Internet measurement metadata among Internet researchers. Since its launch in June 2006 at www.datcat.org, the catalog has received contributions of metadata for over 100 collections indexing 150,000+ files totaling over 26TB of data. Funding for the project ended in 2006, but we still hope to add some usability features: (1) incorporate extensive user feedback into development of a streamlined contribution mechanism requiring much less time from the contributor; (2) perform more detailed log analysis of DatCat user behavior, and refine the user interface to optimize user time searching the catalog; (3) maintain and extend the catalog with additional and newer datasets. We're making slight progress on (1) and (3); we'll get back to this in 2009 or 2010 if it's still considered useful.

Major Milestones

  • New Contributions of Metadata
    • During 2008, the DatCat catalog received 37 entries documenting the metadata for collections and publications. Highlights include:
      • Collection: Day in the Life of the Internet, March 18-19, 2008 (DITL-2008-03-18)
      • Collection: CAIDA Anonymized 2008 Internet Traces Dataset
      • Collection: Mesh Routing Data Collection - Routing tables and ScanWireless Scans
      • Publication: Traceroute Probe Method and Forward IP Path Inference published 2008-10 in ACM SIGCOMM Internet Measurement Conference

Funding Sources

Though no longer directly funded, DatCat received some support in 2008 from:


Tools

CAIDA's mission includes providing access to tools for Internet data collection, analysis and visualization to facilitate network measurement and management. However, CAIDA does not receive specific funding for support and maintenance of the tools we develop. Please check our home page for a complete listing and taxonomy of CAIDA tools.

2008 Tool Development

CoralReef

The CoralReef software suite, developed by CAIDA, provides a comprehensive software solution for data collection and analysis from passive Internet traffic monitors, in real time or from trace files. Real-time monitoring support includes system network interfaces (via libpcap) and FreeBSD drivers for a number of network capture cards, including the popular Endace DAG (10GE/OC192, POS and ATM) cards. The package also includes programming APIs for C and Perl, and applications for capture, analysis, and web report generation. CAIDA developers maintain this package with the support and collaboration of the Internet measurement community.

We released CoralReef version 3.8.2 late in 2008.

Anonymization Tools Taxonomy

In late 2008, we published the Anonymization Tools Taxonomy to help those searching for tools to anonymize Internet log files and trace data. The taxonomy provides a summary of each tool, pointers to more detailed information, and review comments when available. We also released the Summary of Anonymization Best Practice Techniques as part of the DHS PREDICT project.

CAIDA Tools Download Report

The table below lists all the CAIDA-developed tools distributed via our home page at https://www.caida.org/tools/ and the number of downloads of each version during 2008.

Unlike in 2007, this year's download report excludes accesses by spiders, crawlers, and other robots, and counts multiple accesses by the same downloader only once.

  • Currently Supported Tools

    Tool | Description | Downloads
    coralreef | A software suite to collect and analyze data from passive Internet traffic monitors. | 1,157
    dsc | A system for collecting and exploring statistics from DNS servers. | 2,154
    dnsstat | An application that collects DNS queries on UDP port 53 to report statistics. | 222
    dnstop | A libpcap application that displays tables of DNS traffic. | 7,405
    sk_analysis_dump | A tool for analysis of traceroute-like topology data. | 242
    walrus | A tool for interactively visualizing large directed graphs in 3D space. | 3,569
    libsea | A file format and a Java library for representing large directed graphs. | 406
    Chart::Graph | A Perl module that provides a programmatic interface to several popular graphing packages. Note: Chart::Graph is also available on CPAN.org. The numbers here reflect only downloads directly from caida.org, as download statistics from CPAN are not available. | 127
    plot-latlong | A tool for plotting points on geographic maps. | 255
  • Past Tools (Unsupported)

    Tool | Description | Downloads
    Mapnet | A tool for visualizing the infrastructure of multiple backbone providers simultaneously. | 14,065
    GeoPlot | A light-weight Java applet that creates a geographical image of a data set. | 538
    GTrace | A graphical front-end to traceroute. | 714
    otter | A tool used for visualizing arbitrary network data that can be expressed as a set of nodes, links or paths. | 535
    plotpaths | An application that displays forward and reverse network path data. | 136
    plankton | A tool for visualizing NLANR's Web Cache Hierarchy. | 51
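
The counting rule described for this report (robots excluded, repeat accesses by the same downloader counted once) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CAIDA's actual log-processing code: the record layout, the `ROBOT_MARKERS` list, and the sample entries are all hypothetical.

```python
from collections import Counter

# Hypothetical substrings used to recognize robot user agents (an assumption).
ROBOT_MARKERS = ("bot", "spider", "crawler")


def is_robot(user_agent):
    """Return True if the user-agent string looks like a spider, crawler, or robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def count_downloads(records):
    """Count downloads per tool, once per (downloader, tool) pair, skipping robots.

    Each record is a (downloader_ip, tool, user_agent) tuple.
    """
    seen = set()
    counts = Counter()
    for ip, tool, user_agent in records:
        if is_robot(user_agent):
            continue  # robots are excluded from the report
        key = (ip, tool)
        if key in seen:
            continue  # repeat access by the same downloader
        seen.add(key)
        counts[tool] += 1
    return counts


# Made-up sample log entries, for illustration only.
sample = [
    ("192.0.2.1", "dnstop", "Mozilla/5.0"),
    ("192.0.2.1", "dnstop", "Mozilla/5.0"),     # repeat access, ignored
    ("192.0.2.2", "dnstop", "ExampleBot/1.0"),  # robot, ignored
    ("192.0.2.3", "walrus", "curl/7.19"),
]
downloads = count_downloads(sample)
print(dict(downloads))
```

Identifying a downloader by IP address alone is a simplification; a real report might key on other attributes as well.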

Data

In 2008, CAIDA captured and curated data from three primary sources of network data: 1) macroscopic topology, 2) passive traffic traces at tier1 Internet backbone links, and 3) passive traffic traces from the UCSD Network Telescope. We derived several datasets from this data that we make publicly available to researchers, including our AS Rank, AS adjacencies, and Router adjacencies datasets as well as several Backscatter datasets. CAIDA makes some data available to anyone without restriction. CAIDA makes a subset of its collected data available only to academic researchers and CAIDA members, with data access subject to Acceptable Use Policies (AUP) designed to protect the privacy of monitored communications, ensure security of network infrastructure, and comply with the terms of our agreements with data providers.

Major Milestones

  • We collected our first trace data from the equinix-chicago and equinix-sanjose passive monitors connected to tier1 ISP backbone links at Equinix facilities in Chicago, IL, and San Jose, CA.
  • We deactivated skitter data collection and transitioned to our next generation topology measurement infrastructure named Archipelago (Ark) for collecting IPv4 topology data.
  • We started collecting IPv6 topology data on the Archipelago infrastructure.
  • We collected data on the Conficker worm on the UCSD Network Telescope.

Data Collected in 2008

Data Type | First date | Last date | Total size (on disk)
Macroscopic Topology Measurements, IPv4 (Archipelago) | 2008-01-01 | 2008-12-31 | 259 GB
Macroscopic Topology Measurements, IPv6 (Archipelago) | 2008-12-12 | 2008-12-31 | 7.6 MB
Internet Backbone Traces | 2008-03-19 | 2008-12-17 | 2.0 TB
Network Telescope | 2008-01-01 | 2008-12-31 | 7.2 TB
A Day In The Life (DITL) of the Internet - OARC 2008 | 2008-03-18 | 2008-03-19 | 1.9 TB
DNS Names for IPv4 Routed /24 Topology Dataset | 2008-03-01 | 2008-12-31 | 11 GB *
DNS root/gTLD RTT Dataset | 2008-01-01 | 2008-12-31 | 1.5 GB
* Size of this dataset may vary as we store and serve a rotating window of the last 30 days for this dataset.

Data Distributed in 2008

We process raw data into specialized datasets to increase its utility to researchers and to satisfy security and privacy concerns. In 2008, this resulted in the following datasets:

  • Publicly Available Data

    These datasets require that users agree to an Acceptable Use Policy, but are otherwise freely available.

Dataset | Unique visitors (IPs) | Data Downloaded
AS Rank | 4200 | 10.8 GB
AS Links (AS Adjacencies) | 571 | 2.69 GB
AS Relationships | 779 | 11.4 GB
Router Adjacencies | 106 | 316 MB
Witty Worm Dataset | 139 | 244 MB
AS Taxonomy | 45 | 23.9 MB *
Code-Red Worms Dataset | 115 | 9.07 GB
We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
* AS Taxonomy dataset is included in a mirror of the GA Tech main AS Taxonomy site, and thus does not represent all access to this data.
  • Restricted Access Data

    These datasets require that users:

    • be academic or government researchers, or join CAIDA;
    • request an account and provide a brief description of their intended use of the data; and
    • agree to an Acceptable Use Policy.
Dataset | Unique visitors (usernames) | Data Downloaded *
Anonymized Internet Backbone Traces | 122 | 3.31 TB
Backscatter Datasets | 42 | 1.81 TB
Raw Topology Traces from Archipelago infrastructure | 58 | 702 GB
Raw Topology Traces (skitter) | 48 | 496 GB
Witty Worm Dataset | 20 | 144 GB
DNS Names for IPv4 Routed /24 Topology Dataset | 32 | 3.57 GB
2003 Internet Topology Data Kit | 48 | 8.10 GB
DNS Root/gTLD server RTT Dataset | 6 | 14.7 MB
* We count the volume of data downloaded per unique user per unique file, so even if a user downloads a file 100 times, we only count that file once for that user. This methodology results in significantly undercounting the total volume of data served through our dataservers in 2008, but is necessary because of limitations in dataserver logging combined with aberrant user behaviour.
  • Restricted Access Data Requests

    The table below shows how many requests for data access we received and how many we granted. We received 60% more requests in 2008 than in 2007, and approved 50% more requests for access to restricted datasets.

Dataset | Number of requests received | Number of requests granted access
Anonymized Backbone and Peering Link Traces | 207 | 139
Active Topology Trace Datasets | 134 | 77
Backscatter Datasets | 109 | 52
Witty Worm Dataset | 33 | 21
DNS Root/gTLD server RTT Dataset | 12 | 7
Totals | 495 | 296
  • NLANR Datasets

    This data was collected by the NLANR project. When this project came to an end in July 2006, CAIDA inventoried NLANR equipment and took over temporary curation and distribution of NLANR data. CAIDA now maintains both the NLANR AMP and PMA public data repositories on a best-effort basis. Our efforts to serve this data are currently unfunded, and we plan to cease serving this data in May 2009. To sponsor or take over hosting of this data, please contact nlanr-info@caida.org. We've had significant outages and gaps in logging, which make it impossible for us to provide relevant statistics for the AMP Topology Traces.

Dataset | Unique visitors (IPs) | Data Downloaded
PMA Traffic Traces | 3991 | 206 TB
AMP Topology Traces | -- | --
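
The per-user, per-file counting rule described in the download footnotes above can be illustrated with a short Python sketch. It shows why the reported figure undercounts the volume actually served: repeat downloads of the same file by the same user add nothing. The record layout, user names, file names, and sizes below are made up for the example and do not come from CAIDA's dataserver logs.

```python
def deduplicated_volume(transfers):
    """Sum file sizes, counting each (user, file) pair at most once.

    Each transfer is a (user, file_name, size_bytes) tuple.
    """
    seen = set()
    total = 0
    for user, file_name, size_bytes in transfers:
        key = (user, file_name)
        if key not in seen:
            seen.add(key)
            total += size_bytes
    return total


def raw_volume(transfers):
    """Sum the size of every transfer, including repeats."""
    return sum(size_bytes for _, _, size_bytes in transfers)


# Hypothetical transfer log, for illustration only.
sample = [
    ("alice", "trace-a.pcap.gz", 500),
    ("alice", "trace-a.pcap.gz", 500),  # repeat download, not counted again
    ("alice", "trace-b.pcap.gz", 300),
    ("bob", "trace-a.pcap.gz", 500),    # same file, different user: counted
]
counted = deduplicated_volume(sample)
served = raw_volume(sample)
print(counted, served)
```

In this toy log the deduplicated figure is smaller than the raw sum, which mirrors the undercounting the footnotes acknowledge.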

Workshops

As part of our mission to investigate both practical and theoretical aspects of the Internet, CAIDA hosted the 9th CAIDA-WIDE Workshop, co-hosted the SFI Workshop on Networks and Navigation, and held the 1st CAIDA-WIDE-CASFI workshop.

Please check our web site for a complete listing of past and upcoming CAIDA workshops.

9th CAIDA-WIDE Workshop

The 9th CAIDA/WIDE workshop was held on January 19th and 20th, 2008 (by invitation only) in the East-West Center on the University of Hawaii campus as part of Techs in Paradise (TIP2008). The main topics presented and discussed at the workshop included: Internet measurement projects and DNS. We used the venue to discuss the upcoming 2008 Day In The Life of the Internet event and compiled a list of the top questions and data types.

External - SFI Workshop on Networks and Navigation

On August 4-6, 2008 Santa Fe Institute (SFI) hosted an interdisciplinary "Networks and Navigation" workshop jointly organized and supported by SFI and CAIDA. The main topics of discussion included examination of similarities between complex networks sharing small world characteristics and navigation of such networks using only local information. A deeper understanding of the origin of these locally-navigable structures would (1) clarify the role (if any) that latent metric spaces play in the navigability of networks, and potentially point to novel generative mechanisms based on such spaces, (2) point us toward novel routing algorithms and search protocols for Internet-like topologies, other communication networks, and possibly social networks, (3) shed light on the potentially different behavior of passive versus active spreading on these networks, i.e., diffusion versus search, (4) identify the relationship between navigability and other network properties, e.g., community structure, degree heterogeneity, etc.

10th CAIDA-WIDE Workshop - 1st CAIDA/WIDE/CASFI Workshop

The 10th CAIDA/WIDE workshop was held on August 15-16, 2008 (by invitation only) in Marina del Rey, CA. This workshop supported a three-way collaboration between researchers from CAIDA (USA), WIDE (Japan) and CASFI (South Korea). The main topics presented and discussed at the workshop included: updates on active Internet measurements of Internet topology, reverse paths, DNS source port randomness, analysis of DITL 2008 data, trends in residential user traffic, and automated application signature generation for traffic identification.


Publications

The following table contains the papers published by CAIDA for the calendar year of 2008. Please refer to Papers by CAIDA on our web site for a comprehensive listing of publications.

Year | Month | Author(s) | Title | Publication
2008 | Dec | Keys, Ken | IP Alias Resolution Techniques: Technical Report | Cooperative Association for Internet Data Analysis (CAIDA)
2008 | Dec | Kim, Hyunchul; claffy, kc; Fomenkov, Marina; Barman, Dhiman; Faloutsos, Michalis; Lee, KiYoung | Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices | ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
2008 | Oct | Luckie, Matthew; Hyun, Young; Huffaker, Bradley | Traceroute Probe Method and Forward IP Path Inference | ACM Internet Measurement Conference (IMC)
2008 | Oct | Castro, Sebastian; Wessels, Duane; Fomenkov, Marina; claffy, kc | A Day at the Root of the Internet | ACM SIGCOMM Computer Communication Review (CCR)
2008 | Oct | Huffaker, Bradley; Fomenkov, Marina; claffy, kc | Influence Maps - a novel 2-D visualization of massive geographically distributed data sets | Internet Protocol Forum
2008 | Sep | Raghavan, Veena; Riley, George; Jaafar, Talal | Realistic Topology Modeling for the Internet BGP Infrastructure | IEEE Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems (MASCOTS)
2008 | Aug | Krapivsky, Paul; Krioukov, Dmitri | Scale-free networks as pre-asymptotic regimes of super-linear preferential attachment | Physical Review E
2008 | Aug | claffy, kc | Ten Things Lawyers Should Know About Internet Research | Cooperative Association for Internet Data Analysis (CAIDA)
2008 | Jul | Dimitropoulos, Xenofontas; Serrano, Mirian Ángeles; Krioukov, Dmitri | On Cycles in AS Relationships | ACM SIGCOMM Computer Communication Review (CCR)
2008 | Jul | Meinrath, Sascha; claffy, kc | The COMMONS Initiative: Cooperative Measurement and Modeling of Open Networked Systems | CommLaw Conspectus
2008 | May | Krioukov, Dmitri; Papadopoulos, Fragkiskos; Boguñá, Marián; Vahdat, Amin | Efficient Navigation in Scale-Free Networks Embedded in Hyperbolic Metric Spaces | arXiv cond-mat.stat-mech/0805.1266
2008 | Feb | Serrano, Mirian Ángeles; Krioukov, Dmitri; Boguñá, Marián | Self-similarity of complex networks and hidden metric spaces | Physical Review Letters

Presentations

CAIDA staff and collaborators actively attend and contribute to relevant workshops, conferences, and other events to present our research and gain a better understanding of Internet infrastructure, trends, topology, routing, and security. In 2008, CAIDA staff presented at the ARIN meeting, DHS/SRI, the University of Chile and Jornadas Chilenas de Computación, the Internet Measurement Conference, the University of Aveiro, the DHS Cybersecurity PI Meeting, the Santa Fe Institute, NANOG, our own WIDE and WIDE-CASFI workshops, and DNS/OARC workshops.

The following table contains the presentations and invited talks published by CAIDA for the calendar year of 2008. Please refer to Presentations by CAIDA on our web site for a comprehensive listing.

Year | Month | Presenter(s) | Title | Venue
2008 | Dec | Krioukov, Dmitri | Hyperbolic geometry and scale-free topology of complex networks | BCNet Workshop
2008 | Nov | claffy, kc | Internet as emerging critical infrastructure: what needs to be measured? | UChile / Jornadas Chilenas de Computacion (JCCC)
2008 | Oct | claffy, kc | Internet Science: Why Wall Street and Main Street Should Care (a survey of CAIDA activities) | DHS/SRI Infosec Technology Transition Council Meeting
2008 | Oct | claffy, kc | ARIN & CAIDA IPv6 Survey Summary | ARIN
2008 | Oct | Huffaker, Bradley | Traceroute Probe Method and Forward IP Path Inference | ACM Internet Measurement Conference (IMC)
2008 | Sep | Aben, Emile | CAIDA Passive Measurement Infrastructure | MOMENT
2008 | Sep | claffy, kc | Leveraging the Science and Technology of Internet Mapping for Homeland Security | DHS Cybersecurity PI Meeting
2008 | Sep | Aben, Emile | DITL 2008 and DatCat | MOMENT
2008 | Aug | Hyun, Young | Archipelago Measurement Infrastructure: Status and Experiences | WIDE-CASFI
2008 | Aug | Castro, Sebastian | Measurements of Root Server Traffic in DITL 2008 | WIDE-CASFI
2008 | Aug | Vest, Tom | What Happens When IPv4 Runs Out? | WIDE-CASFI
2008 | Aug | Aben, Emile | DITL 2008 and DatCat | WIDE-CASFI
2008 | Aug | Huffaker, Bradley | ARIN & CAIDA IPv6 Survey | WIDE-CASFI
2008 | Aug | Papadopoulos, Fragkiskos | Application of Hyperbolic Embedding in Overlay Network Construction | Santa Fe Institute (SFI)
2008 | Aug | Moore, David | Understanding Global Internet Health | Santa Fe Institute (SFI)
2008 | Aug | Aben, Emile | CAIDA Passive Measurement Infrastructure | CAIDA-WIDE-CASFI Joint Measurement Workshop
2008 | Jun | Castro, Sebastian | DITL 2008 Analysis | OARC
2008 | Jun | Krioukov, Dmitri | Routing in the Internet and Navigability of Scale-Free Networks | Various
2008 | Jun | Castro, Sebastian | In the search of heavy hitters | OARC
2008 | May | claffy, kc | CAIDA participation in PREDICT | DHS PREDICT PI Meeting
2008 | May | Krioukov, Dmitri | What we know and what we don't about the Internet | University of Aveiro
2008 | Apr | claffy, kc | ARIN & CAIDA IPv6 Survey Results | ARIN
2008 | Feb | Wessels, Duane | Day In The Life of the Internet 2008 Data Collection Event | North American Network Operators' Group (NANOG)
2008 | Jan | Aben, Emile | Cataloging DITL data for research use | WIDE
2008 | Jan | Wessels, Duane | October 2007 survey of open resolvers in the Internet | WIDE
2008 | Jan | Castro, Sebastian | DNS: comparison of 2006 and 2007 snapshots | WIDE
2008 | Jan | Polterock, Joshua | DITL 2007 Collection Summary | WIDE
2008 | Jan | Wessels, Duane | Lessons from DITL 2007 - and what we should do different in 2008 | WIDE
2008 | Jan | Castro, Sebastian | Comprehensive approach to the analysis of DITL DNS data | WIDE
2008 | Jan | Huffaker, Bradley | IPv6 Collection 2008 a View of the IPv6 Networks | WIDE
2008 | Jan | Polterock, Joshua | Bulk DNS Lookup Service | WIDE
2008 | Jan | Wessels, Duane | DNS nameserver database at OARC | WIDE

Web Site Usage

In 2008, CAIDA's web site continued to attract considerable attention from a broad, international audience. Visitors seem to have particular interest in CAIDA's tools and analysis.

The table below presents the monthly history of traffic to www.caida.org for 2008. To show a more accurate representation of website traffic, these statistics do not include traffic from spiders, crawlers or other robots.



Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth
Jan 2008 | 40,897 | 72,619 | 307,839 | 1,362,935 | 120.67 GB
Feb 2008 | 44,872 | 74,938 | 249,354 | 1,437,310 | 58.29 GB
Mar 2008 | 42,439 | 78,108 | 278,366 | 1,535,978 | 54.83 GB
Apr 2008 | 46,375 | 79,438 | 280,187 | 1,571,985 | 53.85 GB
May 2008 | 46,354 | 78,448 | 272,240 | 1,440,674 | 50.09 GB
Jun 2008 | 45,458 | 76,786 | 248,134 | 1,431,765 | 41.89 GB
Jul 2008 | 44,671 | 75,530 | 235,541 | 1,427,714 | 42.61 GB
Aug 2008 | 49,759 | 77,842 | 279,990 | 1,686,476 | 56.82 GB
Sep 2008 | 48,804 | 72,929 | 272,963 | 1,607,294 | 49.06 GB
Oct 2008 | 50,830 | 76,711 | 279,988 | 1,596,161 | 59.53 GB
Nov 2008 | 45,743 | 70,342 | 233,295 | 1,367,139 | 52.13 GB
Dec 2008 | 41,957 | 63,733 | 225,140 | 1,301,988 | 46.05 GB
Total | 548,159 | 897,424 | 3,163,037 | 17,767,419 | 685.81 GB

Organizational Chart

CAIDA would like to acknowledge the many people who put forth great effort towards making CAIDA a success in 2008. The image below shows the functional organization of CAIDA. Please check the CAIDA Staff page for more complete information about CAIDA staff.

[Image of CAIDA Functional Organization Chart]

CAIDA Functional Organization Chart


Funding Sources

CAIDA thanks our 2008 sponsors, members, and collaborators.

The charts below depict funds received by CAIDA during the 2008 calendar year.

Funding Source | Allocations | Percentage of Total
DHS | 551,850 | 33.1%
DOI | 483,216 | 29.0%
NSF | 274,999 | 16.5%
GIFT | 231,942 | 13.9%
CNS | 62,410 | 3.7%
CSE | 61,503 | 3.7%
Total | 1,665,920 | 100%

Figure 1. Allocations by funding source received during 2008
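
As a worked check of the percentage column, the shares can be recomputed from the allocation figures reported for 2008. The sketch below is illustrative only; the variable names are our own, and rounding to one decimal place is an assumption about how the published percentages were derived.

```python
# Allocation figures as reported in the funding table for 2008.
allocations = {
    "DHS": 551850,
    "DOI": 483216,
    "NSF": 274999,
    "GIFT": 231942,
    "CNS": 62410,
    "CSE": 61503,
}

total = sum(allocations.values())

# Share of the total for each source, as a percentage rounded to one decimal place.
shares = {
    source: round(100 * amount / total, 1)
    for source, amount in allocations.items()
}

for source, share in shares.items():
    print(f"{source}: {share}%")
print("Total:", total)
```

Under these assumptions the recomputed total and shares agree with the figures in the table.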


Operating Expenses

The charts below depict CAIDA's Annual Expense Report for the 2008 calendar year.

LABOR: Salaries and benefits paid to staff and students.
IDC: Indirect costs paid to the University of California, San Diego, including grant overhead (52-54%) and telephone, Internet, and other IT services.
SUBCONTRACTS: Subcontracts to the Internet Systems Consortium (ISC), Georgia Institute of Technology, and The Measurement Factory.
TRAVEL: Trips to conferences, PI meetings, operational meetings, and sites of remote monitor deployment.
SUPPLIES & EXPENSES: All office supplies and equipment (including computer hardware and software) costing less than $5000.
EQUIPMENT: Computer hardware or other equipment costing more than $5000.
TRANSFERS: Exchange of funds between groups for recharge for IT desktop support and Oracle database services.
Expense Category | Expenses | Percentage of Total
Labor | 1,509,134 | 56.5%
IDC | 819,285 | 30.7%
Subcontract | 141,070 | 5.3%
Travel | 109,833 | 4.1%
Supplies & Expenses | 67,663 | 2.5%
Equipment | 20,758 | 0.8%
Transfers | 5,170 | 0.2%
Total | 2,672,913 | 100.0%

Figure 2. 2008 Operating Expenses

These numbers do not include salaries or expenses paid by the Computer Science & Engineering Department of the Jacobs School of Engineering at the University of California, San Diego.


Program Area | Expenses | Percentage of Total
DNS | 899,198 | 33.7%
Topology | 551,291 | 20.6%
Infrastructure | 938,373 | 35.1%
Routing | 209,306 | 7.8%
Policy | 51,483 | 1.9%
Outreach | 20,830 | 0.8%
Total | 2,670,481 | 100.0%

Figure 3. 2008 Expenses by Program Area

  Last Modified: Tue Oct-13-2020 22:21:53 UTC
  Page URL: https://www.caida.org/home/about/annualreports/2008/index.xml