Network Modeling and Simulation Proposal

Macroscopic Internet data collection and analysis
in support of the NMS community

Macroscopic Internet data measurement and analysis

Correlating sources of possibly predictive data:

routing, performance, topology, and workload

PROPOSAL - Section I - Administrative

BAA-0018
Network Monitoring and Simulation:
Measurement, Validation, and On-line Simulation

Global Internet Data in support of
Modeling and Simulation

Dr. K.C. Claffy, P.I. - Technical POC
Cooperative Association for Internet Data Analysis (CAIDA)
University of California's San Diego Supercomputer Center
9500 Gilman Dr. MS-0505
La Jolla, CA 92093-0505
858/534-8333; 858/534-5113 fax
kc@caida.org

Bobbie Velasquez - Administrative POC
University of California's Office of Contract & Grant Administration
9500 Gilman Dr. MS-0934
La Jolla, CA 92093-0934
858/534-0242; 858/534-0280 fax
bvalesquez@ucsd.edu

Signature:__________________________________ Date: _________________
K. C. Claffy, Principal Investigator

Signature: _____________________________ Date: _________________
Peter Arzberger, SDSC Executive Director

OFFICIAL AUTHORIZED TO SIGN FOR INSTITUTION:

Signature:______________________________________ Date:__________________
Richard Attiyeh, VC for Research

UCSD is Other Education - An Institution of Higher Education

I.  Administrative

II.     Detailed Proposal Information

    A. Innovative claims for the proposed research 

    B. Proposal Roadmap 

    C. Technical rationale, technical approach and constructive plan 
        1.  Technical Rationale
        2.  Technical Approach 
            a.  Monitoring 
            b.  Archiving and Serving Data Sets to the Community
            c.  Interactive Visualization and Navigation Tools
            d.  Developing model of Internet core

    D. Deliverables 

    E. Statement of Work (SOW) 

    F. Milestones and Schedule 

    G. Technology Transfer 

    H. Comparison with Other Ongoing Research

    I. Key Personnel 

    J. Description of the Facilities 

    K. Experimentation and Integration Plans 

    L. Cost by task 

III. Additional Information

Section II. Detailed Proposal Information

A. Innovative claims for the proposed research

The Cooperative Association for Internet Data Analysis (CAIDA) is a program of the University of California's San Diego Supercomputer Center. CAIDA is actively involved in the monitoring and analysis of Internet performance, topology, workload, and routing data from the global Internet. This includes DARPA-supported active monitoring of the global infrastructure; OC3 and OC12 passive monitoring at commercial and educational facilities; DARPA-sponsored development of an OC48 level passive monitor; and monitoring of commercial and research multicast traffic.

We firmly believe that modeling and simulation tools provide fundamental building blocks for the research, vendor, and service provider communities. We also feel that reasonable WAN models or simulations should:

include workload, topology, routing, and performance data that reflect the actual commercial Internet, for setting/validating/calibrating assumptions and for introducing appropriate complexity;
address scalability (e.g., number of network nodes and links and ranges of bandwidth);
address policy-based inter-domain route exchange and inter-domain multicast routing protocols;
include involvement of commercial networks and vendors in defining the scope and priorities of WAN models.

Through this proposal, we describe our plan to develop tools, data, and analyses aimed at supporting the network modeling and simulation communities. Specifically, we plan to:

develop and/or enhance Internet measurement tools and collection efforts; expand the quantity and quality of datasets available to the community (e.g., topology, performance, workload, and routing);
develop mechanisms to manipulate, analyze, navigate, and compare large inter-domain routing tables, and compare them not only to one another but to topology maps derived from active probe measurements. A graphical interface to a routing table that is intuitive and efficient could facilitate the ability to identify incongruities quickly.
develop mechanisms to analyze a topological model of the `Internet core', including techniques to identify infrastructural vulnerabilities created by dependencies on critical components, e.g., the relative importance of specific public exchange points and private peering points to reachability of select groups of ASes and networks.
analyze the extent and effect of unicast and multicast routing incongruities on performance. Anticipated growth in streaming media and multicast technologies will increase the need for recommendations regarding measurement, analysis, and mitigation of problems generated by routing incongruities between these types of traffic.
make the resulting tools, data, and lessons learned publicly available through documented web-based resources and outreach efforts targeting users and designers of simulators, faculty, researchers, and vendors.

B. Proposal Roadmap

Area of Concentration	Section
Main goal of work: Enhancing the quality of WAN simulators through acquisition and analysis of real-world Internet data and making such data available to third party (research and commercial) model/simulation efforts, and enhancing the utility of various tools to acquire, navigate, and analyze such data.	C1
Tangible benefits to end users: The availability of Internet datasets and Internet case studies should enhance the relevance of WAN simulators by introducing observations on actual real-world topology, routing, performance, and workload information into the data flow inputs. The project's focus on commercial involvement and education and outreach should facilitate transfer of these data and results into the commercial, research, and academic communities.	C2,D
Critical technical barriers: The primary technical barriers involve: (1) developing techniques for accurate correlation of various Internet data (topology, performance, workload, and routing); (2) determining how to best make available data sets (both data type and formats) in ways that provide meaningful real-world Internet information for simulators; (3) maintaining the ability to do IP measurements in the face of changing underlying transmission and switching technologies. Note that significant barriers facing this project are political and logistical rather than technical: instrumentation of the commercial Internet for collection of relevant workload and routing information requires both new tools and, equally important, a high level of trust from providers. Engaging vendors as partners in the evaluation and refinement of their own measurement functionality similarly requires cultivation of strong relationships with relevant technical and management personnel.	C2a,b,c,d
Main elements of the proposed approach: The four (4) major areas of this proposal are: (1) developing and/or enhancing Internet measurement tools to collect and correlate Internet data relating to multicast and unicast traffic workload, performance, topology, and inter-domain routing; (2) making resulting datasets available to the modeling and simulation community in formats most useful to them; (3) developing mechanisms to manipulate, analyze, navigate, and compare large routing tables and other topology data sets; and (4) evaluating topological vulnerabilities in the Internet infrastructure created by dependencies on critical links or nodes in the Internet core.	C2a,b,c,d
Specific basis for confidence that the proposed approach will overcome the technical barriers: CAIDA has a history of success working with the research and commercial sectors as leaders in the field of measurement tool development and data collection and analysis of Internet data. We currently have several data collection initiatives underway at sites throughout the Internet, with strong collaborative relationships with collection sites (representing both the commercial and non-commercial sectors). CAIDA and SDSC have visualization experts with years of experience dealing with large and varied Internet data sets. We work directly with router vendors to evaluate, and suggest improvements to, the measurement functionality internal to their equipment, and have provided several simulation developers (MIL3, ISI, UC Berkeley) with constructive criticism of their products/efforts.	C,G,H,J
Nature of expected results: The products of this effort will include: enhanced public measurement/monitoring tools, new Internet datasets (raw/anonymized and correlated) available in formats the community needs, visualization tools tailored to handle large Internet data sets, and mechanisms and techniques to identify strategic vulnerabilities in macroscopically measured IP topologies.	C2,D
The risk if the work is not done: Realistic Internet modeling and simulation is impossible without multi-dimensional datasets from real-world Internet topologies, traffic and routing. Lack of legitimately representative data has already cost the community significant progress in being able to design tools, architectures, or new protocols.	C1,2
Criteria for evaluating progress and capabilities: The timelines and deliverables described in this project are aggressive. More relevant however will be the response we receive from the modeling and simulation community using our data sets. Public feedback will supply a constant measure of the relevance and usefulness of our efforts.	C,D,F

C. Technical rationale, technical approach and constructive plan

1. Technical Rationale

In the IEEE's Millennial Forecast issue, kc claffy writes about current conditions facing modeling of today's Internet.

"Internet traffic behavior has been resistant to modeling. The reasons derive from the Internet's evolution as a composition of independently developed and deployed (and by no means synergistic) protocols, technologies, and core applications. Moreover, this evolution, though "punctuated" by new technologies, has experienced no equilibrium thus far.
The state of the art, or lack thereof, in high-speed measurement is neither surprising nor profound. It is a natural consequence of the economic imperatives in the current industry, where empirically grounded research in wide-area Internet modeling has been an obvious casualty. Specifically, the engineering know-how required to develop advanced measurement technologies, whether in software or hardware, is essentially the same skill set required to develop advanced routing and switching capabilities. Since the latter draw far greater interest, and profit, from the marketplace, it is where the industry allocates engineering talent."

kc claffy, "Measuring the Internet", IEEE Internet Computing: An Internet Millennial Forecast Mosaic, January/February 2000, see http://computer.org/internet/v4n1/index.htm

The commercial and research communities are vocal about perceived inadequacies in currently available Internet modeling and simulation products, yet the tools available to them remain insufficient for research and development of many WAN-related technologies. Through this project CAIDA proposes to expand Internet data acquisition, analysis, and visualization initiatives in support of efforts to develop realistic Internet datasets for community use in modeling, simulation and related research.

2. Technical Approach and Plan

This research project is predicated on the belief that the development of scalable solutions for future internetworking requires improvements in our understanding of the behavior and trends associated with the existing Internet. Through this research effort, we plan to develop techniques for correlating massive volumes of Internet data (routing, topology, workload and performance) and various tools aimed at enhancing the community's ability to model future traffic and routing behavior of the Internet.

a) Monitoring (Task 1)

This task involves developing new measurement/monitoring tools, expanding the scope and breadth of existing monitoring and measurement initiatives, and creating techniques for correlating the resulting datasets. Datasets from these efforts will be publicly available to researchers and used in developing the case studies described below. Key tools and data acquisition efforts include:

Multicast Traffic

Multicast represents an important emerging subset of the global Internet infrastructure. Since the technology is still rather young, wide area topologies tend to be relatively small, and there are fewer barriers to inter-ISP data sharing. The Internet's multicast routing architecture is currently evolving from the original DVMRP tunneled infrastructure to the more recent, vendor-recommended, inter-domain multiprotocol MBGP (with PIM-SM+MSDP to build multicast distribution trees). Although mapping and monitoring the DVMRP infrastructure was challenging enough, with diagnostic tools only casually supported by the research and vendor community, the transition to MBGP/PIM/MSDP has made the situation worse, namely: there are very few data collection and analysis efforts for MBGP routing information bases, and there is currently no mechanism to determine the MSDP topology or reasonably estimate the amount of control traffic this portion of the system is generating.

Future expected transitions include replacing the current interdomain solution with the IETF's BGMP (Border Gateway Multicast Protocol), which will impose a new control plane (for tree-building) on top of the routing information exchange plane that will further complicate the measurement problem. BGMP also requires BGP extensions to carry class D addresses in the multicast RIB. Overall, we have found that the best global picture of multicast traffic and usage is through views from routers in different administrative domains.

The Mantra tool (see https://www.caida.org/tools/measurement/mantra/) collects router-based data (e.g., from tables and SNMP MIBs) associated with various aspects of multicast routing. This includes information on routing, such as the usage and characteristics associated with MBGP routes, characteristics of DVMRP routes, traffic flow statistics, AS-specific topology information, and data on the extent to which GLOP Addressing (233/8 Space) and Multicast Source Discovery Protocol (MSDP) are used throughout the Internet (see Figures 1-3).
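For readers unfamiliar with GLOP addressing, the sketch below shows how a 16-bit AS number maps onto a /24 inside 233/8: the two middle octets carry the high and low bytes of the AS number. The function names are ours and purely illustrative; this is not Mantra code, only a minimal illustration of the address layout that such usage statistics rely on.

```python
import ipaddress

GLOP_SPACE = ipaddress.ip_network("233.0.0.0/8")

def glop_block(asn: int) -> ipaddress.IPv4Network:
    """Return the 233/8 GLOP /24 assigned to a 16-bit AS number."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("GLOP addressing is defined only for 16-bit AS numbers")
    high, low = asn >> 8, asn & 0xFF      # AS number split into two octets
    return ipaddress.ip_network(f"233.{high}.{low}.0/24")

def glop_asn(group: str) -> int:
    """Recover the AS number embedded in a GLOP multicast group address."""
    addr = ipaddress.ip_address(group)
    if addr not in GLOP_SPACE:
        raise ValueError("not a 233/8 address")
    octets = addr.packed
    return (octets[1] << 8) | octets[2]

print(glop_block(5662))          # 233.22.30.0/24
print(glop_asn("233.22.30.7"))   # 5662
```

Given a table of observed multicast groups, a reverse mapping of this kind is one way to attribute 233/8 group usage to the AS that owns the block.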

Figures 1-3: Illustrative Statistical and Topology Data Collected by Mantra on Multicast Traffic at Select Routers, see https://www.caida.org/tools/measurement/mantra/.

[Mantra mroute]

[Mantra routes]

[Mantra mcast topology]

Mantra's early development was led by a UCSB graduate student, Prashant Rajvaidya, with support provided by the NSF Internet Atlas project, see http://atlas.caida.org/. Continued development and enhancement of Mantra's monitoring and related multicast tools (described below) will require external support from this project. Mantra development tasks include:

complete current skeleton of the WWW page. (The first version focused on mlisten-like intra-domain statistics; the second version, only in skeleton format, focuses more on inter-domain routing statistics. It is available at http://atlas.caida.org/.)
incorporate additional sites into monitoring database
develop standard format for storing Mantra tables
make data tables available (with permission from networks) for distribution as inter-domain multicast routing data sets
develop models of what ideal operating statistics should look like for individual sites
develop Mantra to be a distributed processing tool where multiple sites generate, process, and exchange data

Other Internet Traffic Datasets

CAIDA is gathering indicators of path-specific network performance (e.g., round-trip latency and loss) on the global infrastructure using skitter active measurement probes (https://www.caida.org/tools/measurement/skitter/). This active monitoring project is supported by DARPA under CA N66001-98-2-8922. Twenty-two monitors are currently deployed, including four at DNS root-server locations, with destination lists targeting several different investigations. Additional monitors will be deployed during the year 2001. Data from this effort are being used by various research groups throughout the U.S. and abroad. Data from nine of these monitors formed the basis of an Asia-Pacific traffic study presented at INET'2000: https://www.caida.org/publications/papers/2000/asia_paper/ (paper attached).

These data can assist in developing topologies, performance and route stability scenarios. DARPA's support for skitter continues through July 2001.
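As a rough illustration of how path-specific indicators such as round-trip latency and loss might be derived from active probe results, the sketch below summarizes a list of probe records per destination. The record layout (destination, RTT in milliseconds, or None when no reply arrived) is an assumption made for this example and is not the skitter file format.

```python
from collections import defaultdict
from statistics import median

def summarize_probes(probes):
    """Summarize per-destination loss and median RTT.

    probes: iterable of (destination, rtt_ms); rtt_ms is None when the
    probe received no reply (hypothetical record layout).
    """
    by_dest = defaultdict(list)
    for dest, rtt in probes:
        by_dest[dest].append(rtt)

    summary = {}
    for dest, rtts in by_dest.items():
        replies = [r for r in rtts if r is not None]
        summary[dest] = {
            "sent": len(rtts),
            "loss": 1 - len(replies) / len(rtts),           # fraction unanswered
            "median_rtt_ms": median(replies) if replies else None,
        }
    return summary

sample = [("a", 10.0), ("a", None), ("a", 30.0), ("a", 20.0), ("b", None)]
for dest, stats in summarize_probes(sample).items():
    print(dest, stats)
```

The median is used rather than the mean because a few very slow replies would otherwise dominate the latency figure for a path.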

Workload (passively monitored) data are available using tools such as OCxmon/CoralReef (see https://www.caida.org/tools/measurement/coralreef/) and cflowd (see https://www.caida.org/tools/measurement/cflowd/). We plan to develop additional workload datasets and correlate them with topology, performance, and routing information as part of this proposed project. Support for CoralReef is provided through September 2000 by NSF and July 2001 by DARPA. We propose to continue enhancing CoralReef under this project in order to maintain its usefulness for measurements involving changes in network transport and application protocols. Cflowd is supported by CAIDA members.

Several BGP4 route mirror sites are available for analysis of routing data. The foremost among these is the industry-supported University of Oregon Route Views project (see http://www.antc.uoregon.edu/route-views/). We plan to correlate these routing data (archived at http://moat.nlanr.net/Routing/rawdata/) with other datasets, as well as expand monitoring of routers co-located with skitter and OCxmon/CoralReef monitors (for correlation with active and passive datasets). File formats for various datasets will be published on the web.

b) Archiving and Serving Data Sets to the Community (Task 2)

In their paper "Why We Don't Know How To Simulate The Internet", Paxson and Floyd approach the development of data flows for study by reducing the sources of actual traffic traces to the point that they can derive Poisson arrival times for each flow which they then characterize. We believe that additional details about the underlying networks can provide valuable insights to the development of data flow information for simulators, including data on the topology and performance associated with specific workloads or events.
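To make the notion of Poisson arrival times concrete: under a Poisson arrival assumption with rate lambda, inter-arrival gaps are independent and exponentially distributed with mean 1/lambda. The sketch below generates such an arrival sequence for use as a simulator input. The function name, the rate, and the duration are illustrative assumptions and are not taken from the paper cited above.

```python
import random

def poisson_arrivals(rate_per_s, duration_s, seed=None):
    """Generate arrival times (seconds) of a Poisson process.

    Successive gaps are drawn from an exponential distribution with
    mean 1 / rate_per_s; generation stops at duration_s.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)   # exponential inter-arrival gap
        if t >= duration_s:
            return arrivals
        arrivals.append(t)

arrivals = poisson_arrivals(rate_per_s=50.0, duration_s=100.0, seed=1)
print(len(arrivals), "arrivals; observed rate",
      round(len(arrivals) / 100.0, 1), "per second")
```

In a data-driven workflow the rate would be estimated from a measured trace rather than chosen by hand, which is where the additional topology and performance context discussed above becomes useful.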

We also plan to work with the network modeling and simulation (NMS) community to identify what formats of datasets are most useful to them. Each format must focus on the potential use of the data, e.g., a study focusing on emerging protocols may require a different data format from one designed to profile the effects of network congestion, outages, or route flapping. Libraries of case studies may be developed on select topics by varying data from different locations (e.g., core vs. edge), times, topology (e.g., what may be typical for one network may be unusual for another), etc.

We propose to work with organizations that publish the API for their simulator, including MIL3's OPnet and the research community's USC/ISI/UCB VINT/ns and Winlab's SSFnet, to develop data flow information inputs using case studies based on observations of real Internet traffic.

Data sets provided will include:

Inter-domain Routing - In collaboration with the University of Oregon's Route Views project and NLANR's MOAT group, we will analyze and archive commercial routing table data from a variety of core Internet providers.
Topology and Performance - CAIDA has for several years collected, and continues to maintain, end-to-end ICMP-based latency data from its skitter source monitors to many thousands of Internet destinations.
Traffic workload - data flow modules from analysis of traffic profiles. These data may be obtained from Coral monitors at NASA's Ames Internet Exchange (AIX); CAIDA's commercial collaborators, e.g., MCI Worldcom's MAE-West Network Access Point (NAP); HPC sites monitored under NLANR's MOAT initiative; or Mantra monitoring for multicast traffic. Note that all data flow information will be anonymized to ensure privacy of networks and their customers.
Multicast traffic behavior - The collaborative CAIDA and UC Santa Barbara effort on Mantra (see https://www.caida.org/tools/measurement/mantra/) enables collection and archiving of router-based data (e.g., from tables and SNMP MIBs) associated with various aspects of multicast routing. This includes information on routing, such as the usage and characteristics associated with MBGP routes, characteristics of DVMRP routes, traffic flow statistics, AS-specific topology information, and data on the extent to which GLOP Addressing (233/8 Space) and Multicast Source Discovery Protocol (MSDP) are used throughout the Internet (see Figures 1-3).
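The anonymization of data flow information noted in the traffic workload item above can take many forms. As one minimal sketch, and not a description of the method CAIDA uses, the example below replaces each IPv4 address with a keyed-hash pseudonym so that the same address always maps to the same pseudonym without exposing the original. The function name and the choice of HMAC-SHA256 are assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonymous IPv4 address.

    Illustrative only: uses the first four bytes of an HMAC-SHA256 of
    the packed address under a secret key held by the collection site.
    """
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))

key = b"site-secret-key"   # hypothetical; would be kept private by the site
print(anonymize_ipv4("192.0.2.1", key))
print(anonymize_ipv4("192.0.2.1", key))   # same pseudonym as the line above
```

A scheme of this kind deliberately destroys prefix structure, so two hosts in the same network no longer appear related. Datasets meant for topology or routing correlation would need a prefix-preserving alternative instead.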

CAIDA analyses involving each of the above types of datasets can be found at https://www.caida.org/tools/measurement/skitter/RSSAC/ and https://www.caida.org/publications/papers/2000/AIX0005/.

As an example, Figure 4 below shows a traffic workload cross-section on UCSD's Internet access link, collected using a Coral OC3 passive monitor.

Figure 4: Analysis of UCSD traffic application cross-section as collected 5 August 00 from UCSD's CERFnet link

[UCSD traffic application cross-section]

Figures 5-6: Protocol breakdown from OC3 CoralReef monitor on AT&T/Cerfnet's connection at SDSC, 3 August 00

[OC3 CoralReef Protocol breakdown]

Emerging applications such as video games and mp3-related files are also possible to track. CoralReef's filtering capabilities permit real-time identification of many of these signatures at OC3 and OC12 speeds. Analysis of traffic at NASA's Ames Internet eXchange (AIX) by CAIDA indicates that up to 5% of packets at this exchange point consisted of traffic from three online games: Quake-1, Quake-2, and StarCraft. Figure 7a shows a decline in the fraction of traffic generated by these popular online games over the first 8 months of our AIX study. These games were popular in early 1999, but as the graph shows, their popularity has since declined fairly steadily. Figure 7b shows quite a different trend, because it includes traffic from several newer games in addition to the older ones: Half Life, Quake 3: Arena, and Unreal in addition to Starcraft, Quake II, and QuakeWorld. Although there is not enough data to determine a trend, the median traffic fractions are much higher on this graph than on the previous one. Hence, the overall fraction of online game traffic seems to be on the rise, but it is a moving target. The increase is primarily generated by new games as they gain popularity, while older games seem to decrease in popularity over time. We recognize that emerging-application traffic characterization is difficult, if not impossible, using current methodologies. Whether alternative methodologies can succeed, or whether the problem is unsolvable, remains an open question.

Figure 7a: Traffic from three games at AIX constitutes a significant portion of total Internet traffic across the exchange. Traces are collected into weekly bins, and the median and first and third quartiles for each bin are plotted. Note that these games were popular when we started collecting our data, but as the graph shows, their popularity has declined fairly steadily since July 1999.

[Game traffic at AIX]

Figure 7b: Fraction of online game traffic, including Half Life, Quake II, QuakeWorld, Quake 3: Arena, Starcraft, and Unreal. Traces are collected into weekly bins, and the median and first and third quartiles for each bin are plotted.

[Fraction of online game traffic]
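The weekly binning described in the captions of Figures 7a and 7b can be sketched as follows. The input layout (one game-traffic fraction per trace, tagged with its collection date) and the use of ISO calendar weeks are assumptions for this example; the original analysis may have defined its bins differently.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles

def weekly_quartiles(samples):
    """Group per-trace fractions by ISO week; return (Q1, median, Q3) per week.

    samples: iterable of (collection_date, fraction), one per trace.
    """
    bins = defaultdict(list)
    for day, fraction in samples:
        iso = day.isocalendar()
        bins[(iso[0], iso[1])].append(fraction)   # key: (ISO year, ISO week)

    result = {}
    for week, values in sorted(bins.items()):
        if len(values) < 2:
            q1 = med = q3 = values[0]             # a single trace in the bin
        else:
            q1, med, q3 = quantiles(values, n=4, method="inclusive")
        result[week] = (q1, med, q3)
    return result

traces = [(date(1999, 7, 5 + i), f) for i, f in
          enumerate([0.01, 0.02, 0.03, 0.04, 0.05])]
traces.append((date(1999, 7, 14), 0.02))
for week, quartile_triple in weekly_quartiles(traces).items():
    print(week, quartile_triple)
```

Plotting the three values per week gives the median line and quartile band shown in the figures.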

Identifying signatures of certain forms of security infractions is also relatively straightforward in traffic profiling. Denial of service (DOS) attacks, e.g., smurf attacks, and similar packet storms can be identified by monitoring peaks in ICMP or UDP traffic on specific links.

Cataloguing attack profiles from single hosts or from hundreds of intermediary hosts, as in the distributed/hierarchical DOS attack tool known as trinoo, is important to the development of ways to identify and protect against threats. Industry is making some progress in this area, but is paying less attention to counter-measures, e.g., specific (potentially real-time) filters to install on routers to forestall a catalogued set of profiles.

Acquiring data to support these tasks requires significant instrumentation at the edges of networks to observe bi-directional effects of specific attacks. SDSC's prominence as a target for hackers (un)fortunately provides opportunity for gathering data both on existing threats and new intrusion detection techniques as they develop.

Figure 8: Monitoring of specific protocols, such as ICMP traffic, provides indications of potential suspicious activity as it is occurring (e.g., note the peak in ICMP traffic).

[ICMP Peak]
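A peak like the one in Figure 8 could be flagged automatically with a simple robust threshold. The sketch below marks any interval whose packet count exceeds the series median by more than k times the median absolute deviation (MAD). The threshold rule, the default k, and the sample counts are illustrative assumptions, not a description of CoralReef's filtering.

```python
from statistics import median

def find_peaks(counts, k=5.0):
    """Return indices of intervals whose count exceeds median + k * MAD.

    counts: per-interval packet counts for one protocol (e.g., ICMP).
    A floor of 1.0 on the MAD keeps a perfectly flat series from
    flagging trivial fluctuations.
    """
    base = median(counts)
    mad = median(abs(c - base) for c in counts)
    threshold = base + k * max(mad, 1.0)
    return [i for i, c in enumerate(counts) if c > threshold]

icmp_per_minute = [100, 110, 95, 105, 2000, 102, 98]   # made-up sample values
print(find_peaks(icmp_per_minute))   # [4]
```

Median-based statistics are used here because the attack interval itself would inflate a mean and standard deviation enough to mask smaller peaks.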

Internet data gathered through existing CAIDA/SDSC initiatives currently amount to over a terabyte. Continuation of this collection, processing (data reduction), and archiving will require additional storage capabilities and labor devoted to data management and dissemination. Under this project, we have budgeted for eighteen additional 18.2-Gbyte disks for a high performance RAID disk array system. In addition, we will use SDSC's High Performance Storage System (HPSS) for backup of all data. As in other CAIDA initiatives, data are encrypted at the collection points and aggregated at SDSC.

c) Interactive Visualization and Navigation Tools (Task 3)

One of the most intriguing sources of Internet data today is core inter-domain Border Gateway Protocol (BGP) routing tables from backbone transit providers. The University of Oregon's Route Views project [Meyer 97] (see http://www.antc.uoregon.edu/route-views/) provides information about the global routing system from several backbone perspectives around the Internet. While initially intended to help operators assess how the global routing system views their address prefixes and/or Autonomous System (AS) space, many researchers have found it a rich source for routing analysis and visualizations [HWB 97] [Broido 00] [HFMNC 00] [HCN 99] [CAIDA 00].

A core technology necessary for most of the objectives in this project will be mechanisms to manipulate, analyze, and navigate large BGP routing tables, and compare them not only to one another but to topology maps as derived from active probe measurements. A routing table is a complex, semantically rich graph, and as such, simple graph-drawing tools are insufficient. CAIDA has made modifications to its visualization tool Otter (see https://www.caida.org/tools/visualization/otter/) in order to accommodate the special semantics of routing tables, which open the tool up to fundamentally different and acutely relevant classes of Internet visualization.

At its simplest, a graphical interface to a routing table that is intuitive and efficient could facilitate the ability to identify and remedy problems quickly. Though more complicated, the ability to compare routing tables not only to one another but also with actively probed topologies, which are often not the same, as well as with measured performance data across a given topology, has high potential payoff for both the Internet research and operational communities.

Briefly, representing a routing table topologically requires more than simple nodes and links. A BGP routing table holds routing information in the form of AS paths, i.e., sequences of autonomous systems (ISPs) transited from the source to the destination node. (Although it is an oversimplified definition, it will suffice for this discussion to think of an `Autonomous System' (AS) as roughly a single administratively independent Internet service provider.) A link between two ASes exists if those two ASes have established a peering session between them, the mechanism via which they announce to each other a list of routes to which they can carry traffic.

An elementary approach would be to split AS path information into node pairs (2 nodes + 1 link) to display each AS adjacency. Unfortunately, the resulting set of raw individual AS adjacencies does not reflect anything particularly relevant about operational Internet connectivity, since, in a global Internet routing context (our domain of interest), AS adjacencies have no meaning without regard for the semantics of the peering session represented by that `link'. Indeed, depicting pure AS connectivity on a per-hop basis is misleading, since each link represents only a valid connection for the specific set of routes announced from one AS to the other, and only in one direction. Peering is neither a semantically symmetric nor an automatically transitive relationship.

It is thus of questionable utility to visualize any one given link between ASes without incorporating the information on the routes exchanged across the session represented by that link. Because routing policy might allow ASes to pass subsets of routes they receive on to other neighboring ASes, any legitimate visualization must incorporate an entire end-to-end `route-ful' AS path as the atomic `unit' of visualization, rather than an individual link between two given ASes. Specifically, we need to display each AS path as an ordered, directed, route-ful set of individual AS adjacencies. We added a new type of object to Otter's data file format for this purpose: an entire path object, which allows more accurate representation of BGP data. The figures below illustrate the difference: on the left Otter is drawing only adjacencies, on the right Otter draws complete AS paths atomically. In the interactive Java applet, resting the mouse on a single link in the figure on the right highlights the entire path of which that link (i.e., peering session) is a part.

Figure 9a: BGP AS path data, reflecting only adjacencies (piece of CERFNET routing table from May 1998)
Figure 9b: BGP AS path data, reflecting complete paths (piece of CERFNET routing table from May 1998)
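The contrast between the two representations in Figures 9a and 9b can be sketched in a few lines. The first function flattens AS paths into directed adjacencies and so loses the route context. The second keeps whole paths as the atomic unit and answers the same question as resting the mouse on a link: which complete paths traverse it? The function names and sample paths are ours (the AS numbers come from the range reserved for documentation), and this is not Otter's data file format.

```python
def adjacencies(as_paths):
    """Flatten AS paths into a set of directed AS pairs (route context lost)."""
    pairs = set()
    for path in as_paths:
        pairs.update(zip(path, path[1:]))
    return pairs

def paths_through(as_paths, link):
    """Return the complete AS paths that traverse a given directed link."""
    return [p for p in as_paths if link in zip(p, p[1:])]

PATHS = [
    (64496, 64497, 64499),
    (64496, 64498, 64499),
    (64500, 64497, 64501),
]

adj = adjacencies(PATHS)
# The adjacency view contains both 64496->64497 and 64497->64501, which
# suggests a path 64496->64497->64501 that no router actually announced.
print((64496, 64497) in adj and (64497, 64501) in adj)   # True
print((64496, 64497, 64501) in PATHS)                    # False
print(paths_through(PATHS, (64496, 64497)))              # [(64496, 64497, 64499)]
```

The spurious path is exactly the kind of misleading inference that drawing per-hop adjacencies invites, and that the path object avoids.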

While Otter's original specification could handle data sets with around 200-700 nodes, several of our tasks, including the routing tables described above, involve tens of thousands of nodes. The figure below illustrates an early and not overwhelmingly successful attempt to view a data set of this size, specifically derived from running skitter from a single host to 30,000 destination addresses. Otter's default placement algorithm is unable to provide a meaningful visualization; many nodes end up clumped together, obscuring the view. Fortunately, Otter supports minimizing the node size to zero, i.e., just drawing links, a technique we use in Figure 11 to reduce clutter for another large graph layout.

Figure 10: Otter visualizing skitter-gathered topology (RTT) data (approximately 30,000 nodes)
[skitter-gathered topology]

Most automatic graph layout algorithms are insufficient to display such large networks interactively. One solution is precomputation of the graph layout, leaving an input file of (X,Y) layout coordinates so that Otter could place nodes directly without the CPU-intensive graph layout. Figure 11 reflects Otter's use of such a technique, with a graph layout algorithm implemented by Hal Burch, a graduate student at Carnegie Mellon University, during a summer internship at Lucent/Bell Laboratories [Ches99]. The placement took 24 hours to pre-compute and about a minute for Otter to draw the approximately 35,000 nodes.

Figure 11: Otter visualizing skitter data using a precomputed graph layout
[Using precomputed graph layout]
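The precomputation idea behind Figure 11 separates an expensive offline layout step from a cheap load step at display time. The sketch below illustrates that split. The circular placement merely stands in for a real layout algorithm, and the tab-separated node/X/Y file layout is an assumption for this example, not Otter's actual input format.

```python
import csv
import io
import math

def compute_layout(nodes):
    """Stand-in for an expensive layout algorithm: place nodes on a unit circle."""
    n = len(nodes)
    return {node: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i, node in enumerate(nodes)}

def save_layout(layout, fh):
    """Write one 'node<TAB>x<TAB>y' row per node (run once, offline)."""
    writer = csv.writer(fh, delimiter="\t")
    for node, (x, y) in layout.items():
        writer.writerow([node, f"{x:.6f}", f"{y:.6f}"])

def load_layout(fh):
    """Read coordinates back so the viewer can place nodes without recomputing."""
    return {node: (float(x), float(y))
            for node, x, y in csv.reader(fh, delimiter="\t")}

buf = io.StringIO()                      # stands in for a layout file on disk
save_layout(compute_layout(["a", "b", "c", "d"]), buf)
buf.seek(0)
print(load_layout(buf))
```

Because loading is linear in the number of nodes, the display cost no longer depends on how long the layout took to compute.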

Improved layout algorithms and alternative visualization techniques will be a secondary research goal of this project. We also plan to modify Otter or other visualization tools to allow:

. manipulation of parameters such as thickness and color of links
. real-time connection to a routing table, showing where forward paths diverge from announced reverse paths
. pop-up information on each link showing which routes were sent across that link (peering session)

Tools for effective analysis of large topologies can also help with analyzing two types of incongruity in the Internet routing system that remain largely unexplored. The first is the disparity between the topology suggested by the Autonomous System (AS) BGP paths active at a given routing table, and the actual forward IP path taken by a packet leaving that router. Since BGP semantics are executed and transmitted independently at each hop with each peer, such incongruity is inevitable. AS 1 is telling AS 2 about next-hop information for prefix p according to what AS 2 heard from AS 3. The tricky part is that the `next-hop' is not necessarily AS 2. This complexity is executed/transmitted at every AS boundary (of which there can be a dozen along a single path).

CAIDA's continuing collection and active analysis of large skitter topology graphs, often with routing tables co-located with the source, puts us in a unique position to taxonomize the different classes of incongruity between forward and announced paths.

As an example, one class of incongruity derives from the use of exchange points. Note that an AS may advertise routes to an exchange point, but not actually be involved in forwarding the traffic. In this case, the exchange point's AS will not be in the AS path, though it will occur in the forward path. Appendix A surveys a few examples of forward versus announced AS path incongruity. Such investigations can also elucidate anomalies such as cases of unintended transit. Deriving a list of transit ASes from a routing table, one can use skitter data to identify paths that are using for transit an AS that is only announced as an origin AS according to the routing table.
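The unintended-transit check described above can be sketched roughly as follows. This is a minimal illustration, not CAIDA's tooling: the function names are hypothetical, announced AS paths are assumed to be already extracted from a routing table as lists of AS numbers, and forward paths are assumed to be skitter hops already mapped to ASes. Any AS numbers used with it are made up.

```python
def collapse(path):
    """Drop consecutive repeats (AS path prepending, or several hops in one AS)."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def classify_ases(announced_paths):
    """Split ASes seen in a routing table into transit ASes and origin-only ASes.

    An AS counts as transit if it appears anywhere before the last (origin)
    position of at least one announced path.
    """
    transit, origin = set(), set()
    for path in announced_paths:
        hops = collapse(path)
        if not hops:
            continue
        transit.update(hops[:-1])
        origin.add(hops[-1])
    return transit, origin - transit


def unintended_transit(forward_paths, announced_paths):
    """Return (asn, forward_path) pairs where an origin-only AS carries transit.

    Interior positions of a forward path (neither source nor destination AS)
    are treated as transit positions.
    """
    _, origin_only = classify_ases(announced_paths)
    findings = []
    for path in forward_paths:
        hops = collapse(path)
        for asn in hops[1:-1]:
            if asn in origin_only:
                findings.append((asn, tuple(hops)))
    return findings
```

ASes that never appear in the routing table are deliberately not flagged, since the table says nothing about their announced role.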

A second type of topological incongruity is between unicast and multicast routing systems. Multicast routing deployment evolved rather independently from the existing unicast system, and with originally completely different routing models. Although evolution has brought policy functionality to the multicast infrastructure, it is still possible, and typical, for the multicast topologies to be fundamentally different from the unicast topology. This problem will become increasingly relevant given anticipated growth in streaming media and associated expanded deployment of multicast technologies. We expect results of exploratory data analysis tools to allow development of recommendations for measurement, analysis, and mitigation of problems generated by routing incongruities between these types of traffic.

d) Developing model of Internet core (Task 4)

Policy-based topology modeling

To our knowledge, CAIDA is undertaking the most comprehensive Internet public topology analysis effort in the world. Further analysis of multiple datasets will augment CAIDA's current techniques to identify and visualize critical components of the Internet's core and the relative importance of specific public exchange points and private peering points to reachability of select groups of ASes and networks.

A model of the Internet core can provide a framework for parameterizing routing policy relationships among ISPs in the Internet today. We seek a model with just enough flexibility to express actual routing policies as implemented by ISPs in the Internet, and no unnecessary components. Understanding routing policy is key to understanding the structure of the Internet, as routing policy is the vehicle that supports the economic model of the modern commercial Internet.

We propose a network model that extends the model initially proposed by Griffin and Wilfong [GW 00] and extended by Gao and Rexford in [GR 00]. They model the Internet as a directed graph with two types of edges. Nodes in this graph are labelled with (AS number, peering point prefix) pairs, allowing multiple connections between the same two ASes to be included in the analysis. The two types of edges of this graph are distinguished by whether they connect nodes within the same AS, or cross peering points to other ASes. Intra-AS links carry a simple scalar weight value, while inter-AS links need more complex weights, which may depend on the destination prefix of any traffic crossing the link.
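One way to picture the graph just described is as a small data structure. The sketch below is only our reading of the prose above, not the formal model of [GW 00] or [GR 00]: the class and method names are hypothetical, inter-AS weights are looked up by exact destination-prefix string (a real implementation would need proper prefix matching), and a missing entry is taken to mean the link is not usable for that destination.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A node is an (AS number, peering point prefix) pair.
Node = Tuple[int, str]


@dataclass
class CoreModel:
    # Intra-AS links: one scalar weight per directed edge.
    intra: Dict[Tuple[Node, Node], float] = field(default_factory=dict)
    # Inter-AS links: weight depends on the destination prefix of the traffic.
    inter: Dict[Tuple[Node, Node], Dict[str, float]] = field(default_factory=dict)

    def add_intra_link(self, src: Node, dst: Node, weight: float) -> None:
        if src[0] != dst[0]:
            raise ValueError("intra-AS link must connect nodes of the same AS")
        self.intra[(src, dst)] = weight

    def add_inter_link(self, src: Node, dst: Node,
                       weights_by_prefix: Dict[str, float]) -> None:
        if src[0] == dst[0]:
            raise ValueError("inter-AS link must connect different ASes")
        self.inter[(src, dst)] = dict(weights_by_prefix)

    def weight(self, src: Node, dst: Node, dest_prefix: str) -> Optional[float]:
        """Weight of the edge for traffic to dest_prefix, or None if unusable."""
        if (src, dst) in self.intra:
            return self.intra[(src, dst)]
        return self.inter.get((src, dst), {}).get(dest_prefix)
```

Keeping the two edge types in separate tables makes the asymmetry explicit: only inter-AS edges need a destination prefix to be evaluated.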

In order to apply this model, each of the parameters must be extracted from measurements of the Internet itself. There are three basic types of information required to flesh out the model: AS adjacencies, which can be extracted directly from BGP routing tables; peering point prefixes, which can be extracted from Skitter traces or equivalent traceroute data; and routing policy estimates which may be obtainable from routing tables and Skitter data as well.

AS adjacencies in BGP AS paths are a record of the peering relationships between pairs of ISPs. Others have studied AS adjacencies directly available from BGP routing tables, both with and without regard to per-prefix policy. However, an effective routing policy study cannot treat ASes as point-like objects. We have found several cases in which ASes do not exhibit the same per-prefix policy at all adjacencies, and so it is necessary to know which peering point the traffic crosses to know what policy specification applies. Previous work at CAIDA has concentrated on the topology of the Internet at the scale of ASes, but did not preserve enough per-prefix information to study the policy overlay on this topology. The model described above has the necessary structure to adequately describe these policy relationships.

Determining the prefixes used for peering between ASes is the primary goal of this project. This includes a list of public and private peering points identified by address prefix as well as a list of ISPs that are present at each of these locations, and which of these ISPs have an observed BGP peering session. This will allow us to develop a more accurate picture of the potential connectivity among ISPs independent of policy restrictions, and give us a more concrete picture of the effects of routing policy on paths through the network.

This list of peering points will have many uses over and above the larger routing policy study described above. Peering points have the distinct feature that they are physically localizable, unlike ASes. Where a transit AS in the Internet core is distributed over a continental or global region, a peering point is typically a single physical location where several routers are co-located. IP addresses inside these peering points form a small subset of all IP addresses that can be used for unambiguously mapping a path through the Internet to a series of geographical locations. Hence, the location of each peering point can be the basis for another method for mapping network devices and paths to geographical locations with a high degree of accuracy.

The next step in filling out the model is to characterize the `peering' relationships between ISPs, for example as `transit' or `true peering', or simply `provider-customer'. It may be possible to make a simple estimation of this relationship from skitter data alone, although the exact methodology needs much additional developmental work.

Identifying peering points

There are several mechanisms for determining a set of exchange points for an Internet map beyond a list of known exchange point prefixes, e.g., as maintained by EP.NET: http://www.ep.net/. One can augment this list using prefixes from skitter data, assuming an IP address corresponds to an exchange point if it connects two other ASes that were different from its own. Using skitter topology in conjunction with RouteViews type data, one can map each IP address to an origin AS. The ASes we know are present at a given exchange point are the two other ASes connected by the exchange point IP address. One can also track the skitter target address of those paths, identifying the prefix used for the traffic. This method may yield many `spurious' exchange points due to flaws in the IP address-to-AS mapping technique, so methods are necessary for filtering out the hops that are not authentic.
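The heuristic in the paragraph above might be sketched as follows. This is an illustrative sketch under stated assumptions: the function name is hypothetical, each path is a list of (IP address, AS number) hops whose AS numbers were already obtained by an origin-AS lookup, and unmapped hops carry None. It performs no filtering of spurious candidates, which the text notes is still needed.

```python
from collections import defaultdict


def exchange_point_candidates(paths):
    """Map candidate exchange-point IP addresses to the ASes seen around them.

    A hop is a candidate when the hops immediately before and after it belong
    to two ASes that differ from each other and from the hop's own AS.
    """
    present = defaultdict(set)
    for hops in paths:
        triples = zip(hops, hops[1:], hops[2:])
        for (_, prev_as), (ip, own_as), (_, next_as) in triples:
            if None in (prev_as, own_as, next_as):
                continue  # cannot judge hops without an AS mapping
            if prev_as != own_as and next_as != own_as and prev_as != next_as:
                present[ip].update((prev_as, next_as))
    return dict(present)
```

Accumulating the neighbor ASes over many paths gives, per candidate address, the set of ASes observed to be present at that exchange point.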

Determining peering relationships

The next step in filling out the model is to characterize the `peering' relationships between ISPs, for example as `transit' or `true peering', or simply `provider-customer'. It may be possible to make a simple estimation of this relationship from skitter data alone, although the methodology needs development. Typically, it is infeasible to probe for policy, so we guess, assuming three types of peering:

Full route exchange (true peering), which allows the edge to be traversed towards any destination. Not valid as a general assumption.
Advertising `my client' prefixes only. Estimate clients by looking at the tail of an AS path. If the destination AS is the origin AS of the core route for that prefix, or if it is next to last, or if it is 3rd from last, then we assume the prefix is paying for transit, and the link is open for transit to that destination. (Stub ASes only appear as origin ASes or as an upstream AS to stub ASes that are only connected through that AS. We use this fact to fine-tune client lists.)
Advertising a subset of `my client' prefixes. This is `regional peering'. We assume that the shortest path chosen by IGP will minimize differences for this case, so we can use the full client prefix list.
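The client-estimation rule in the second case can be read in more than one way. The sketch below shows one possible reading, offered only as an assumption: a prefix is counted as a client prefix of an AS whenever that AS appears in one of the last three positions of an AS path for the prefix (origin, next to last, or third from last). The function name and the `tail` parameter are hypothetical.

```python
from collections import defaultdict


def estimate_client_prefixes(routes, tail=3):
    """Guess client prefix lists from the tails of AS paths.

    routes: iterable of (prefix, as_path) pairs, with as_path ordered from
    the observing router toward the origin AS.
    Returns {asn: set of prefixes assumed to be that AS's client prefixes}.
    """
    clients = defaultdict(set)
    for prefix, as_path in routes:
        # Collapse AS path prepending so repeated ASes count once.
        hops = []
        for asn in as_path:
            if not hops or hops[-1] != asn:
                hops.append(asn)
        for asn in hops[-tail:]:
            clients[asn].add(prefix)
    return dict(clients)
```

The stub-AS refinement mentioned in the text is not implemented here.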

For simulation, the choice of the next peering point, made among peering points with next-AS presence (from the AS path), should be made by choosing the lowest metric, which will approximate shortest-path IGP, modulo policy constraints. However, the movement toward MPLS and other traffic engineering techniques will invalidate the assumption that all IGPs are exclusively shortest-path, so using hop count between peering points in skitter topologies would be insufficient.

Identifying Internet `core'

One of the most enticing questions with a large set of topology cross-sections is how to identify and depict the `core' of the Internet, or particularly ultra-connected components that might represent infrastructural vulnerabilities if damaged or removed. Recently, CAIDA researchers have attempted to strip away lesser connected autonomous systems (or `ASes') in order to find out how Internet connectivity is distributed among ISPs. CAIDA developed an AS-based Internet core snapshot in early 2000 (see https://www.caida.org/research/topology/as_core_network/).

Figure 12: AS Internet `AS core' graph
[AS core graph]

We describe various ways in which one can define the most well-connected part of the Internet. The set of nodes we consider most relevant to this question are those that

can reach the larger part of the Internet core, and
can do it in the minimum number of hops.

In more technical terms, we are going to find centers, with regard to different metrics, of the giant connected component within the core of Internet graphs.

Background from graph theory

Definitions:

a) The combinatorial core of a directed graph is its subset obtained by iterative stripping of nodes that have outdegree 0 and of 2-loops involving nodes that have no other outgoing edges except those connecting them to each other. Note that the indegree of either type of stripped node may be arbitrarily large.

In examples discussed below, the core contains between 7% and 12% of the nodes of the original non-stripped graph. In typical skitter snapshots we have studied, the combinatorial core contains approximately 10% of all nodes in the snapshots.

The iterative stripping procedure may be selectively applied only to the nodes of outdegree 0 that have indegree 1. In that case, the stubs of the graph, i.e., trees connected to the rest of the graph by a single link, are stripped. We call this superset of the core the extended core, and in our skitter topology snapshots the extended core typically contains 2/3 of the nodes of the original graph, a much greater fraction than in the positive outdegree part (our combinatorial core).
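The stripping procedure of definition (a) can be sketched as below. This is a simple, unoptimized illustration rather than the code used for the results quoted here; the function name is hypothetical, the graph is given as a list of directed (from, to) edges, and self-loops are ignored as an assumption because the definition does not mention them.

```python
def combinatorial_core(edges):
    """Return the node set left after iterative stripping.

    Repeatedly removes (1) nodes with outdegree 0 and (2) pairs of nodes
    whose only outgoing edges point at each other (2-loops), until neither
    kind of node remains.
    """
    nodes = set()
    succ = {}
    for a, b in edges:
        nodes.update((a, b))
        succ.setdefault(a, set())
        succ.setdefault(b, set())
        if a != b:  # assumption: self-loops are ignored
            succ[a].add(b)

    while True:
        # Nodes with no outgoing edges.
        dead = {n for n in nodes if not succ[n]}
        # 2-loops: n points only at m, and m points only at n.
        for n in nodes:
            if len(succ[n]) == 1:
                (m,) = succ[n]
                if succ[m] == {n}:
                    dead.update((n, m))
        if not dead:
            return nodes
        nodes -= dead
        for n in dead:
            del succ[n]
        for n in nodes:
            succ[n] -= dead
```

Each pass rescans every node, so this is quadratic in the worst case; a worklist keyed on outdegree would be the natural refinement for graphs of the size discussed here.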

b) We call a node B reachable from a node A if there is a directed path from A to B, i.e. a path in which each directed edge is taken toward B in the proper direction.

c) A strongly connected component of a graph is a set of nodes such that each node pair (A,B) has a path from A to B and a path from B to A.

If A is reachable from B and B is reachable from A, then any node reachable from A is reachable from B and vice-versa.
Conversely, if the set of nodes reachable from A is the same as the set of nodes reachable from B, then A is reachable from B and B is reachable from A, since we always assume that A is reachable from A. This property means that a connected component is uniquely defined by the set of nodes that are reachable from nodes of this component. Note that different connected components will reach different, although not necessarily disjoint, sets of nodes.

d) A connected component is called a giant component if it contains more than 50% of the nodes in the graph.

In the cases of skitter data described below, the giant component and the combinatorial core are very close to each other - in all our examples the giant component contains over 85% of the core. Furthermore, almost all of the core is reachable from the giant component, typically near 99%.
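Definitions (c) and (d) translate directly into code. The following sketch uses Kosaraju's two-pass algorithm, chosen here only for brevity; the text does not say which algorithm produced the figures above, and the function names are hypothetical. Both passes are iterative so that large graphs do not hit Python's recursion limit.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return a list of node sets, one per strongly connected component."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        succ[a].append(b)
        pred[b].append(a)

    # Pass 1: depth-first search on the graph, recording finish order.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)  # all successors explored
                stack.pop()

    # Pass 2: search the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, stack = {root}, [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            for p in pred[node]:
                if p not in assigned:
                    assigned.add(p)
                    comp.add(p)
                    stack.append(p)
        components.append(comp)
    return components


def giant_component(edges):
    """Largest component if it holds more than 50% of all nodes, else None."""
    comps = strongly_connected_components(edges)
    total = sum(len(c) for c in comps)
    biggest = max(comps, key=len, default=set())
    return biggest if 2 * len(biggest) > total else None
```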

Shortest paths on Internet topology graphs suffer from potential non-conformance with routing policies. We can view them only as shortcuts or lower bounds for paths actually taken by packets. Although shortest-path physical connectivity is there, policy prevents its use. (Skitter data suggests that Internet paths actually traversed by packets were often twice as long as the shortest possible paths.)

Quantifying well-connectedness

The reasoning above shows that it makes sense to consider the giant connected component as the core. We will therefore look for the central portion of the Internet within the giant component.

Three metrics may be used to distinguish central vertices:

the minimum number of hops by which a given node can reach the part of the core reachable by nodes in the giant component; (for simplicity, we will identify this part of the core with the whole core)
a percentile of the whole core, e.g. the number of hops sufficient for a given node to reach 95% of the core;
the average number of hops in which a given node can reach every node of the core
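The three metrics above can all be read off a single breadth-first search from the node in question. The sketch below is illustrative only: the function names are hypothetical, the graph is a dict from node to an iterable of successors, and the metrics are computed over the targets the node can actually reach (following the simplification in the first item, which identifies the reachable part with the whole core).

```python
import math
from collections import deque


def hop_distances(succ, source):
    """Shortest hop count from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def centrality_metrics(succ, source, targets, fraction=0.95):
    """Maximum, percentile, and mean hop counts from source to the targets."""
    dist = hop_distances(succ, source)
    hops = sorted(dist[t] for t in targets if t in dist and t != source)
    if not hops:
        return None
    # Smallest hop count that covers at least `fraction` of reachable targets.
    needed = math.ceil(fraction * len(hops))
    return {
        "max_hops": hops[-1],
        "percentile_hops": hops[needed - 1],
        "mean_hops": sum(hops) / len(hops),
    }
```

Sorting nodes by `max_hops` first and `mean_hops` second would give one possible ranking of central vertices.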

We have investigated topology data on three structural levels: IP addresses; IP address prefixes; and AS numbers. Each level of detail comes with its own advantages, limitations, and stumbling blocks. Preliminary work is further documented at http://ipn.caida.org/~broido/dest/top50.html.

We have been astounded to learn that, despite the perceived heterogeneity of Internet topology, and in the absence of any authority that shapes it, there are a number of universal invariants that apply to connectivity properties of almost every node in the core. One of them is the neighborhood size distribution, i.e. the curve showing how many nodes are located at each hop distance from a given node. As Figure 13 illustrates for a selected sample, this curve has a remarkable bell shape that (shifts and minor dilations notwithstanding) is almost the same for any node in the giant component.

Figure 13: Neighborhood size distribution (showing how many nodes are located at each hop distance from a given node, for a sample of several hundred nodes in core)
[Neighborhood size distribution]
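The neighborhood size distribution for a single node can be computed as in the sketch below, which counts how many nodes sit at each hop distance found by a breadth-first search. The function name is hypothetical, and the graph is assumed to be a dict from node to an iterable of successors.

```python
from collections import Counter, deque


def neighborhood_sizes(succ, source):
    """Return [n1, n2, ...] where nk is the number of nodes k hops from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    counts = Counter(d for node, d in dist.items() if node != source)
    return [counts[h] for h in range(1, max(counts, default=0) + 1)]
```

Plotting these lists for many source nodes on shared axes gives curves of the kind shown in the figure.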

Our `top 50' analysis seeks to identify the most highly connected units of Internet infrastructure, measured by the three units listed above (IP addresses, prefixes, and AS numbers). We use skitter-derived BGP prefix and AS graphs, as obtained by converting skitter forward paths (and not e.g. BGP AS paths) to BGP prefixes and/or AS numbers and running our stripping and transitive closure code on the graphs obtained that way. We used the RouteViews tables as a source of prefixes and for conversion of prefixes to origin AS numbers, but not for AS paths in those tables themselves. Instead, we create our own prefix and AS paths out of forward path information.

The next step is to sort reachability of nodes by the maximum number of hops it takes for a given IP or prefix to reach as much of the core as the giant component's nodes can reach. The second sorting key is the average of the hop count shortest path distances between the given node and the rest of the core.

To calibrate, we try two different approaches to selecting central nodes of the IP graph, using data from July 15-18, 2000: (1) minimizing the maximum hop; and (2) minimizing the 95th percentile hop. They result in two sets of 50 IP addresses with 37 common IPs, which is close to 60% of the total of 63 IPs in both sets. (Each set has 13 IP addresses that are not in the other set.)
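The overlap figures follow from simple set arithmetic: 50 + 50 - 37 = 63 distinct addresses, and 37/63 is roughly 0.59. A small helper for comparing two such top-N sets might look like this (the function name is hypothetical, and no real address lists are reproduced here):

```python
def overlap_summary(set_a, set_b):
    """Return (common count, union count, common/union ratio) for two sets."""
    set_a, set_b = set(set_a), set(set_b)
    common = set_a & set_b
    union = set_a | set_b
    ratio = len(common) / len(union) if union else 0.0
    return len(common), len(union), ratio
```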

To continue and expand the core and infrastructural vulnerability analysis, we will need to expand the number of skitter sources, probed destinations, and analysis utilities. We will also develop and make available basic utilities to extract, for a given set of routing topologies, answers to questions such as:

identify for a given AS number or prefix p, what ASes/prefixes are reached only through that AS or prefix?
for a given prefix, how many unique AS paths are potentially used across a set of routing tables
what is the frequency of AS path changes for specific destinations over day/week/month.
what routes are announced by a given AS on a given day? (plot historical data)
what is the dispersion across ASes?
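Two of the questions above can be answered with very little code once routing tables are reduced to (prefix, AS path) pairs. The sketch below is illustrative only, with hypothetical function names; it assumes AS paths are ordered toward the origin AS and interprets "reached only through" as "every observed path crosses the AS before reaching the origin".

```python
from collections import defaultdict


def paths_by_prefix(routes):
    """Group distinct AS paths by prefix across a set of routing tables."""
    grouped = defaultdict(set)
    for prefix, as_path in routes:
        grouped[prefix].add(tuple(as_path))
    return grouped


def unique_paths_per_prefix(routes):
    """Number of distinct AS paths observed for each prefix."""
    return {p: len(paths) for p, paths in paths_by_prefix(routes).items()}


def reached_only_through(routes, asn):
    """Prefixes for which every observed AS path crosses asn before the origin."""
    return {
        prefix
        for prefix, paths in paths_by_prefix(routes).items()
        if all(asn in path[:-1] for path in paths)
    }
```

The result depends entirely on which vantage points contributed tables, so a prefix reported as reachable only through one AS may simply lack coverage from elsewhere.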

The development of datasets, techniques, and tools to support macroscopic Internet topology analysis will provide currently lacking support for empirically-based modeling of the Internet.

D. Deliverables

a. Monitoring

measurement infrastructure for gathering topology, workload, multicast, and performance data
conversion of 20 NLANR passive monitoring boxes to CoralReef data
integration with other national measurement infrastructures as possible

b. Archiving/Serving data

General Note : It is CAIDA's policy to make Internet data available whenever possible to other researchers for use in their projects. For some data we utilize an Acceptable Use Policy intended to protect the privacy of networks and their customers and encourage publication of relevant research findings.

website(s) for public availability of multi-dimensional data sets: mantra, skitter, CoralReef data
interactive interface to database of a wide variety of data sets, with format, type, and size of data sets in response to NMS community feedback
publicly available code prototypes
published analyses of archived data, including interactive access to daily summaries
published survey of community use of data

c. Navigational/Visualization Technologies

prototype hyperbolic viewer that can handle up to 100,000 nodes
2D graph layout viewer that can handle up to 50,000 nodes, with semantically accurate navigational techniques and visual metaphors appropriate for routing table analysis
integration of visualization tools with routing tables in real time
integration of visualization tools with other types of data, e.g., SNMP, performance, and passive measurement data.

d. Internet core model

prototype model of peering points
empirically-calibrated tool for analyzing and visualizing peering relationships
methodology for identifying `core' Internet nodes, prefixes, Autonomous Systems, or geographic regions
analysis of Internet core using continuously gathered skitter data, including characterization of how it's changing over time.

E. Statement of Work

a. Monitoring Task

Year 1:

add 5 additional Mantra monitoring sites
add 5 additional skitter source sites
convert 20 NLANR passive monitoring boxes to CoralReef data
develop comprehensive website(s) for public availability of data

Year 2:

make prototypes of code publicly available
make datasets publicly available for Mantra, skitter, CoralReef data

Year 3:

make additional datasets publicly available for Mantra, skitter, CoralReef workload data
develop public website for distribution of multi-dimensional data sets (performance, workload, topology, and routing data)

b. Data Archiving and Availability

Year 1:

establish archive and interactive database for community access to skitter, Mantra, routing, and CoralReef data
solicit community feedback regarding needed data types, formats, and dataset sizes

Year 2:

incorporate wider variety of data sets into public database.
publish sample analysis of archived data, including interactive access to daily summaries.
work with community to determine most appropriate dataset formats for use in Internet simulation and modeling.

Year 3:

expand/refine formatting and size/type of data in response to community feedback
further expansion of database in response to community need for access to wider variety of data.

c. Interactive Visualization and Navigation Tools

Year 1:

develop prototype hyperbolic viewer that can handle up to 100,000 nodes
enhance otter to handle 2D automatic graph layout of up to 50,000 nodes

Year 2:

develop and implement more semantically accurate navigational techniques and visual metaphors appropriate for routing table analysis

Year 3:

integrate visualization tools with routing tables in real time, visualize differences
integrate visualization tools with other types of data, e.g., SNMP, performance, and passive measurement data.

d. Internet core model

Year 1:

develop prototype model of peering points

Year 2:

tools for analyzing and visualizing peering relationships
methodology for identifying 'core' Internet nodes, prefixes, Autonomous Systems, or geographic regions

Year 3:

complete analysis of Internet core using continuously gathered skitter data, including characterization of how characteristics are changing over time

G. Technology Transfer

CAIDA's goal is to supplement the tools and data available for monitoring, depicting, and simulating the behavior of the commercial Internet. The outcomes of this project are focused on advancing the overall state of Internet measurement, modeling and simulation. Products of these efforts will directly benefit the various Defense Department agencies through their use in developing and evaluating new Internet protocols (e.g., protocols that are more resilient, secure, and scalable) and enhanced network planning, management, and control capabilities.

CAIDA has already successfully transferred several of its technologies to the commercial sector through licensing agreements with Caimis, Inc. (http://www.caimis.com). CAIDA will work with Caimis on tool development, analysis, and incorporation of commercial Internet data into the research archives (where ISPs are willing).

The tools and knowledge that we hope to transfer to the community through this project are described below and include:

tools to measure and monitor network behavior and topology;
rich datasets focusing on performance, topology, routing, and workload data;
tools for navigating and analyzing the above data sets
techniques for developing realistic topological models of the Internet, including peering points and `critical' infrastructure

Tools

CAIDA will use several existing measurement tools and sources of data as part of this project, and work closely with Caimis (www.caimis.com/) on enhancement of their tools for use in research and analysis projects.

Skitter - the skitter tool https://www.caida.org/tools/measurement/skitter for active scanning of path-specific information at varying granularities, e.g., large datasets containing tens of thousands of destinations for infrastructure topology analyses and smaller datasets for higher frequency measurements (several packets per second) for path-specific routing behavior and performance analysis. Anomalous behavior will be further examined using active measurement techniques including traceroute and other similar custom tools. For Task 3, we will be using skitter data extensively to identify peering points along the paths. Historical data is useful as well as current data since the presence of routers in physical peering points is likely to change only slowly over time. AS adjacencies from BGP routing tables
CoralReef - Passively measured traffic data and traffic traces are available from CAIDA https://www.caida.org/tools/measurement/coralreef and from the NLANR MOAT project (http://moat.nlanr.net/Traces/), including workload data such as that collected at NASA's Ames Internet Exchange https://anala.caida.org/AIX. A developers' mail-list is proving instrumental in engaging individuals throughout the globe in the development and enhancement of the code, including having participating institutions share upgrades and patches for this public code with others in the community.
otter -- https://www.caida.org/tools/visualization/otter/, supported by CAIDA for several years and used for visualizing a variety of different Internet data sets. (See https://www.caida.org/research/topology/as_core_network/ for a recent example.) Will also be publicly available.
Mantra - the Mantra service https://www.caida.org/tools/measurement/mantra/ is available for monitoring multicast routers at select points throughout It is already a widely used multicast utility today, and extending its measurement and diagnosis capabilities promises to provide an important monitoring function supporting multicast deployment and use throughout the community. Participation in public Mantra monitoring activities will be open, including participation by interested DOD sites that possess non-sensitive multicast routers.
hypnet -- a 3D Java hyperbolic viewer currently under development at CAIDA, for navigating extremely large network topologies.

Datasets

Currently, the CAIDA and NLANR MOAT programs have more than a terabyte of archived traffic and trace data from commercial and research networks. Most of these Internet datasets, including skitter and some Coral and cflowd datasets (with sensitive IP addresses anonymized), are available in processed form to researchers. As part of this project, we propose to continue making individual datasets available to researchers, tracking researchers' use of these datasets. More importantly, we will also make some correlated datasets available to researchers (correlating topology, performance, routing or other data).

We will correlate traffic data against information available in various commercial routing tables collected by CAIDA and by the University of Oregon's RouteViews project (http://www.antc.uoregon.edu/route-views/). This site is today's most important source of available routing table data for the research sector and includes peering information from 18 major service providers. We use the tables extensively to map IP addresses to ASes using origin AS lookups.
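An origin AS lookup of the kind mentioned above is a longest-prefix match of the IP address against the prefixes in a routing table. The sketch below uses Python's standard `ipaddress` module and a linear scan, which is adequate for illustration but far too slow for full tables (a radix trie would be the usual choice). The function names are hypothetical, and the prefixes and AS numbers in any example are documentation values, not real routing data.

```python
import ipaddress


def build_table(entries):
    """entries: iterable of (prefix string, origin AS number) pairs."""
    return [(ipaddress.ip_network(prefix), asn) for prefix, asn in entries]


def origin_as(table, address):
    """Origin AS of the most specific prefix covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for net, asn in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None
```

Addresses covered by no prefix return None; how such hops are handled downstream is a separate design decision.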

We will also develop various analysis scripts and visualization code throughout this project. The massive volumes of data to be analyzed, correlated, and rendered as part of this project will require the development of new techniques for data aggregation and reduction and will also require the development and use of parallel processing algorithms to make efficient use of available supercomputers at SDSC. In addition to developing tools to try to visualize routing problems, either in real-time or as post-mortem, we also expect to arrive at reasonable suggestions to offer router vendors with respect to what instrumentation within the router would improve an engineer's ability to debug such problems.

Impact on the Advancement in Information Technology and Related Sciences

The research and tools proposed under this effort can lead to important enhancements in the community's understanding of the Internet infrastructure. The proposed project will also assist in the development of tools for navigation, analysis, and correlated visualization of routing tables and massive volumes of traffic data and path-specific performance and routing data that are critical to advancing both research and operational efforts regarding the evolving commercial Internet.

We also expect to be able to offer suggestions to routing vendors with respect to what instrumentation within the router would facilitate diagnosing and fixing problems in [closer to] real-time.

This research has obvious relevance to public policy and regulatory questions regarding concentration of administration of Internet infrastructure.

In addition to the research results anticipated through this project, CAIDA plans to develop routing-related curriculum materials for undergraduate and graduate use as part of CAIDA's NSF-supported Internet Engineering Curriculum (IEC) repository and Internet Teaching Labs (http://iec.caida.org/).

H. Comparison with Other Ongoing Research

Numerous research and commercial groups are building simulators to analyze and understand traffic behavior at the physical and protocol level. Some of these products are capable of importing tcpdump data, SNMP utilization data, or measurements from HP Openview or other network measurement software. However, none of these efforts are currently equipped to deal with actual data from BGP routing tables or forward path measurement data or global multicast traffic. Nevertheless, these data types reflect commercial realities and should be considered in developing baselines for much of today's WAN-related modeling and simulation efforts.

CAIDA personnel are familiar with several research simulators (e.g., VINT/ns http://www.openmash.org and http://netweb.usc.edu/vint/) and commercial products such as OPnet (http://www.opnet.com/) and MakeSys' NetMaker (http://www.makesys.com/).

What the community currently needs, and what CAIDA is in an optimal position to provide, is tools for measurement and access to increasingly complex, multi-dimensional datasets (topology, performance, workload, and routing).

In terms of measurement of Internet traffic, several groups are active in this arena: the University of Massachusetts and ACIRI in multicast; CMU/ACIRI in NIMI; Internet-2 (Surveyor), XIWT/IPERF, SLAC, RIPE, and NLANR/MOAT in active measurement of specific infrastructures, and CAIDA and NLANR in the passive measurement of high performance links. Several organizations host sites to view various core routing tables (particularly the University of Oregon's RouteViews), and/or collect routing table data, e.g., University of Michigan and UCSB/CAIDA. Few groups, however, actively correlate Internet datasets, and none are making multi-dimensional datasets available in forms useful for simulations of WAN networking as is proposed in this project.

Advantages of the Proposed Effort

There is a growing recognition within academic, vendor, and network provider communities of the importance of effective Internet monitoring and simulation tools to architecting and managing the Internet, as well as to the development of new protocols and Internet hardware and software. Existing public and commercial offerings in this area, however, fail to address critical routing, traffic, and vendor interoperability issues at scales of the global Internet. Accurate insights concerning generalizable traffic require data from multiple monitoring sites, times, and event occurrences. CAIDA's monitoring efforts and those of collaborators help position us to both collect and correlate relevant topology, performance, workload and routing data in order to develop traffic- and protocol-specific case studies.

Data sets such as these can be publicly available for use as inputs to various models and simulations or as scenarios available to vendors (in developing new products), researchers (in developing new protocols/technologies), and faculty (in constructing tutorials and lab exercises for students). An example of support for faculty is CAIDA's Traffic Analysis CD (http://traffic.caida.org/), which includes packet traces, analysis software, tutorials, animations, and reference materials.

Our relationships with vendors (such as Cisco and MIL3 - both CAIDA members), ISPs (both members and collaborators), and the research community (e.g., UC Berkeley and ISI) will directly assist us in focusing our tasks and transferring the findings and products to industry and to other segments of the research sector.

Through the approaches described in this proposal and with our education and outreach efforts, we are confident that we can achieve the goals of this project and assist in encouraging widespread adoption of new approaches to monitoring and simulating Internet traffic.

I. Key Personnel

K.C. Claffy

Dr. K.C. Claffy is an Associate Research Scientist at the University of California's San Diego Supercomputer Center (UCSD/SDSC). She received her BS with honors from Stanford University and her MS and Ph.D. from the University of California, San Diego in 1989, 1991, and 1994 respectively.

Dr. Claffy's recent efforts have been as the founder and Principal Investigator for the Cooperative Association for Internet Data Analysis (CAIDA), based at the University of California's San Diego Supercomputer Center. Her research focuses on collection, analysis and visualization of wide-area Internet data on topology, workload, performance, and routing, particularly with respect to commercial ISP collaboration/cooperation and sharing of analysis resources. Through the development of neutral, open tools and methodologies, she seeks to promote cooperation among Internet Service Providers (ISPs) in the face of commercialization and competition, and to meet engineering and traffic analysis requirements of the commercial Internet community.

In addition to her research efforts, Dr. Claffy is committed to information dissemination and technology transfer. Her involvement in the NANOG and USENIX communities is indicative of her commitment to bridging the gap between the research and commercial networking communities and to focusing attention on the importance of empirical Internet measurements. In January 2000, Claffy was selected as one of this year's Top 25 Women on the Web; see http://www.top25.org/. Claffy's article depicting the state of Internet measurement and analysis was profiled in the IEEE's Millennium Issue, http://www.computer.org/internet/v4n1/, along with articles by other Internet notables, including Len Kleinrock, Bob Metcalfe, and Bill Gates.

Dr. Claffy's current and pending support with allocated time is as follows:

DARPA, PI "Predictability and Security of High Performance Networks" 35% (07-15-98 thru 07-14-00, option year thru 07-14-01) ($2.5M obligated; $6.6M ceiling)
NSF, PI "CAIDA Seed Grant" 50% (09-01-97 thru 08-30-00) ($3.1M)
NSF, PI "Internet Atlas" 10% (03-15-99 thru 09-30-01) ($450K)
NSF, PI "Internet Engineering Curriculum Repository" PI - no time charged; Evi Nemeth co-PI (07-15-97 thru 09-30-00) ($680K)

Other key personnel include David Moore, Acting Director for CAIDA. Mr. Moore graduated from the University of California, San Diego (UCSD) with a Bachelor's in Computer Science and Engineering. He has been involved in systems and networking research with UCSD for 6 years.

J. Description of the Facilities

CAIDA is a program of UCSD's San Diego Supercomputer Center (SDSC), where a range of equipment and laboratory resources is available to aid in producing the outlined deliverables. These include supercomputers, archival storage systems (hundreds of terabytes), data handling platforms, and advanced visualization systems. Major hardware resources currently at SDSC include a 14-processor CRAY T90, a 256-processor CRAY T3E, a 128-node IBM SP, over two terabytes of disk storage and 100 terabytes of tape storage. The archival storage system sustains over a terabyte of data movement per day, with peak access rates over 2000 transactions per hour. SDSC's resources were augmented in December 1999 with the addition of a novel multithreaded supercomputer, IBM's Tera MTA. SDSC's Visualization Laboratory will be available to assist with visualizing the traffic datasets. Visualization resources include Silicon Graphics MXE workstations, an audio/video production suite, and various high-end visualization and rendering software.

Connectivity at UCSD/SDSC includes: vBNS (OC12), Abilene (OC12), CALREN2 (OC12), DREN (OC3), ESNET (OC3 through GA), Cerfnet (OC3), and various commercial providers at SDSC/CAIDA's SD-NAP facility (https://www.caida.org/Caida/sdnap.html).

K. Experimentation and Integration Plans

Funding for attending annual DARPA events and meeting with collaborators is included in the budget.

Details on our technical approach, including experimentation plans, are covered in section II.C and in our scope of work in section II.E.

L. Cost by task

Personnel

Current plans are for David Moore to allocate half his time to Task 1 (Monitoring) and 25% each to Tasks 2 and 3. Ken Keys will be split 50% into Tasks 1 and 2. Nevil Brownlee will devote 50% of his time to Tasks 1, 3, and 4. Bradley Huffaker will be dedicated to Task 3 (Visualization). Andre Broido will be fully dedicated to Task 4 (Modeling). Amy Blanchard will take on administrative project management responsibilities across all tasks. We may subcontract some development work out to other organizations where that seems cost-effective.

Equipment is included only in year one for prototype monitors, processing hosts, and storage. We expect significant cost sharing from ISPs (including CAIDA members as well as others) and organizations that have already expressed interest in purchasing monitors and analysis.

Travel

Support is requested to cover travel to scientific conferences for principal investigators.

Consumable supplies

Photocopying, transparencies, equipment, postage, and express delivery charges directly related to the project.

Section III. Additional Information

A bibliography of relevant technical papers and research notes (published and unpublished) is provided below.

Bibliography

[AJB00] Albert, R., Jeong, H., and Barabasi, A.-L., "Error and attack tolerance of complex networks", Letters to Nature, 27 July 2000, http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v406/n6794/abs/406378a0_fs.html.

[Almeroth 00] Almeroth, K., "The Evolution of Multicast: From the MBone to Inter-Domain Multicast to Internet2 Deployment", IEEE Network Special Issue on Multicasting, January/February 2000, http://imj.ucsb.edu/publications.html#mdeploy.

[Bolot 93] Bolot, J., "End-to-end packet delay and loss behavior in the Internet", Proceedings of SIGCOMM '93, pp. 289-298, September 1993.

[HWB 97] Braun, H.-W., "Autonomous system based visualizations of BGP routing tables", 1997, http://moat.nlanr.net/AS/

[Broido 00] Broido, A., "Finding the `Internet Core': the top-most IP addresses, BGP prefixes and AS numbers", http://ipn.caida.org/~broido/dest/top50.html

[Card 99] Card, S., Mackinlay, J., and Shneiderman, B., Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999.

[Claffy 00] claffy, k, "Tracking a metamorphic infrastructure: observations on our (in)ability to accurately predict, analyze or even measure conditions on the global Internet", https://www.caida.org/publications/presentations/Soa0005/.

[Claffy 00] claffy, k, "Measuring the Internet", IEEE Internet Computing, Millennium Forecasts, Jan/Feb 2000. http://www.computer.org/internet/ic2000/w1toc.htm

[Claffy 99] claffy, k, "Internet measurement and data analysis: topology, workload, performance and routing statistics", presented at the National Academy of Engineering 1999 conference, April 1999 and submitted for special IEEE publication, https://www.caida.org/publications/papers/.

[Claffy et al 00] claffy, k, "Visualizing Internet Topology at a Macroscopic Scale", https://www.caida.org/research/topology/as_core_network/, June 2000.

[CM 99] claffy, k and McCreary, Sean, "Internet measurement and data analysis: passive and active measurement", Statistical Computing and Graphics, Volume 10, No. 1, https://www.caida.org/publications/papers/.

[CMT 98] Claffy, k., Miller, G., and Thompson, K., "the nature of the beast: recent traffic measurements from an Internet backbone", Proceedings Inet '98, 1998.

[CMM 99] Claffy, K., Monk, T., McRobb, D., "Internet tomography", Nature Magazine, Web Matters. http://helix.nature.com/webmatters/tomog/tomog.html

[CO '00] Coffman, K. G. and Odlyzko, A. M., "Internet growth: Is there a `Moore's Law' for data traffic?", July 2000. http://www.dtc.umn.edu/~odlyzko/doc/networks.html.

[DC00] Doyle, J., and Carlson, J. M., "Power laws, highly optimized tolerance and generalized source coding", Caltech Dept of Control and Dynamical Systems, 26 April 99.

[CD00] Carlson, J. M., and Doyle, J., "Highly optimized tolerance: a mechanism for power laws in designed systems", Caltech Dept of Control and Dynamical Systems, Dept. Physics TR, 28 Feb 00.

[HFMNC 00] Huffaker, B., Fomenkov, M., Moore, D., Nemeth, E., and Claffy, K., "Measurements of the Internet topology in the Asia-Pacific Region", Proceedings of Inet '2000, https://www.caida.org/publications/papers/.

[HCN 99] Huffaker, B., claffy, k, Nemeth, E., "Tools to Visualize the Internet Multicast Backbone", INET '99, June 1999, https://www.caida.org/publications/papers/.

[HCN 99] Huffaker, B., claffy, k, Nemeth, E., "Otter: A general-purpose network visualization tool", INET '99, June 1999, https://www.caida.org/publications/papers/.

[HJWC 98] Huffaker, B., Jung, J., Wessels, D., Claffy, K., "Visualization of the Growth and Topology of the NLANR Caching Hierarchy", 2nd Annual Caching Workshop, and special issue of Computer Networks and ISDN Systems, 1998. https://www.caida.org/publications/papers/

[MCB 98] McRobb, D., claffy, k, Bachman, N., "skitter: macroscopic Internet topology mapping and visualization", XIWT workshop on Critical Infrastructure: The Path Ahead Symposium on Cross-Industry Activities for Information Infrastructure Robustness, May 1998.

[CM 98] Claffy K. and Monk, T., Comments by CAIDA Concerning the FCC's Review of the Acquisition of MCI Communications Corp. by Worldcom, Inc. April 27, 1998.

[CMM 98] Claffy, K., Monk, T., and McRobb, D., "Preliminary Measurement Specification for Internet Routers", unpublished draft, https://www.caida.org/tools/measurement/measurementspec/.

[GR 00] Gao, Lixin and Rexford, J., "Stable Internet routing without global coordination," ACM SIGMETRICS, June 2000.

[GW 00] Griffin, T., and Wilfong, G. ``An Analysis of BGP Convergence Properties.'' Proceedings of the ACM Conference of the Special Interest Group on Data Communication, September 1999 (SIGCOMM'99). http://www.acm.org/sigcomm/sigcomm99/papers/session8-1.html

[CAIDA 00] Huffaker, B., Broido, A., Fomenkova, M., Claffy, K., Jacubiec, O., "Visualizing Internet Topology at a Macroscopic Scale", unpublished web page, July 2000, https://www.caida.org/research/topology/as_core_network/.

[Jamin 00] Jamin, S. et al., "On the placement of Internet infrastructure", Proceedings of Infocom '00, 2000. http://idmaps.eecs.umich.edu/

[Jamin 00] Jamin, S. et al., "An Architecture for a Global Internet Host Distance Estimation Service", Proceedings of Infocom '99, 1999, http://idmaps.eecs.umich.edu/.

[LWVA 00] Labovitz, C., Wattenhofer, R., Venkatachary, S., Ahuja, A., "The Impact of Internet Policy and Topology on Delayed Routing Convergence", Microsoft Research Technical Report (MSR-TR-2000-74), http://www.research.microsoft.com/users/labovit/.

[LWVA 00] Labovitz, C., Ahuja, A., Bose, A., Jahanian, F., "Delayed Internet Routing Convergence", Proceedings of ACM SIGCOMM 2000, http://www.research.microsoft.com/users/labovit/.

[Meyer 97] D. Meyer, University of Oregon Route Views project, http://www.antc.uoregon.edu/route-views/.

[MPDC 00] Moore, D., Periakaruppan, R., Donohoe, J., "Where in the world is netgeo.caida.org?", Proceedings of Inet '2000, https://www.caida.org/publications/papers/.

[MPDC 00] Nemeth, E., Ott, T., Thompson, K., Claffy, K., "The Internet Engineering Curriculum Repository", Proceedings of Inet '2000, https://www.caida.org/publications/papers/.

[PF99] Paxson, V., and Floyd, S., "Why We Don't Know How To Simulate The Internet", under review, October 1999, http://www.aciri.org/floyd/papers/wsc.ps.

[Ware 00] Ware, C., Information Visualization: Perception for Design, Morgan Kaufmann/Academic Press, 2000.

[Huston 00] Huston, G., BGP Table Statistics/Monitoring System (AS 1221), http://bgp.potaroo.net/

Appendices

Appendix A: Discrepancy of announced BGP AS path versus forward path

This appendix records discrepancies between the announced BGP AS path and the forward path as measured by skitter boxes, using skitter source riesling.caida.org on net 192.172.226. Checking the paths from riesling on 11/16: most of the path differences seem to be insertion of an additional AS hop, typically 2914 (Verio) just after 1740 (CERF), but also 1877.

##Target: 193.14.46.1
##BGP path:             195 1740 1800 1257
##Forwarding path:      195 1740 2914 1800 1877 1257

This is interesting though:

##Target: 196.38.47.1
##BGP path:             195 1740 7018 3741
##Forwarding path:      195 1740 701 3741

The path is advertised through ATT (7018), but passes through UUNET (701).

##Target: 216.72.47.1
##BGP path:             195 1740 4000
##Forwarding path:      195 1740 2914 4000 7984 4000

##Target: 195.187.47.1
##BGP path:             195 1740 701 1833 1299 8308
##Forwarding path:      195 1740 701 1833 3301 8308

##Target: 194.250.47.1
##BGP path:             195 1740 1239 5511 3215
##Forwarding path:      195 1740 1239 3215 5511 3215

##Target: 200.6.48.1
##BGP path:             195 1740 7018 3561 1916
##Forwarding path:      195 1740 7018 3561 2561

##Target: 203.16.48.1
##BGP path:             195 1740 6453 1221 9324
##Forwarding path:      195 1740 6453 1221 2764

##Target: 193.40.48.1
##BGP path:             195 1740 6453 2603 1741 3221
##Forwarding path:      195 1740 6453 2603 UNKN 3221

This last is probably just a 10/8 intermediate hop...

##Target: 216.72.48.1
##BGP path:             195 1740 4000
##Forwarding path:      195 1740 2914 4000 7984 4000
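
The comparisons above can be made mechanically. The following is a minimal sketch, assuming each path is already available as a list of AS numbers; the function name classify_discrepancy and the use of Python's difflib are illustrative choices and not part of skitter.

```python
from difflib import SequenceMatcher


def classify_discrepancy(bgp_path, fwd_path):
    """Return a list of (kind, bgp_segment, fwd_segment) differences.

    kind is 'insert' (extra AS hops seen only in the forwarding path),
    'delete' (announced AS hops never seen while forwarding), or
    'replace' (a different AS where the announced one was expected).
    """
    # autojunk is disabled so that no AS number is treated as noise.
    matcher = SequenceMatcher(a=bgp_path, b=fwd_path, autojunk=False)
    diffs = []
    for kind, a0, a1, b0, b1 in matcher.get_opcodes():
        if kind != "equal":
            diffs.append((kind, bgp_path[a0:a1], fwd_path[b0:b1]))
    return diffs


# Target 193.14.46.1: two inserted hops (2914 and 1877).
print(classify_discrepancy([195, 1740, 1800, 1257],
                           [195, 1740, 2914, 1800, 1877, 1257]))
# Target 196.38.47.1: 7018 announced, 701 observed.
print(classify_discrepancy([195, 1740, 7018, 3741],
                           [195, 1740, 701, 3741]))
```

An edit-script view such as this separates pure insertions (the Verio case) from substitutions (the ATT/UUNET case), which are the more interesting discrepancies.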

Skipping forwards a bit:

##Target: 207.24.67.1
##BGP path:             195 1740 1673 
{1324,12217,1333,1326,1695,1322,1335,5113,1334,1670,1677,1665,1325,
1669,1323,1327,1684,1667,1661,1672,1663,1331,1675,1321,1685,689,
1225,1332,1330,1674,11563,1671,1662}
##Forwarding path:      195 1740 2914 1673 1325

This seems perfectly OK, as 1325 is a member of the AS set
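
A check of this kind can be automated. The sketch below assumes the BGP path is given as text in the format shown above, with at most one AS set written in braces at the end; parse_bgp_path and origin_matches are hypothetical helper names.

```python
import re


def parse_bgp_path(text):
    """Split 'A B C {x,y,z}' into ([A, B, C], {x, y, z}).

    The AS set is empty when the path has no braces. Only a single
    trailing AS set is handled, which is an assumption of this sketch.
    """
    match = re.search(r"\{([^}]*)\}", text)
    as_set = set()
    if match:
        members = match.group(1).replace(",", " ").split()
        as_set = {int(tok) for tok in members}
        text = text[:match.start()]
    sequence = [int(tok) for tok in text.split()]
    return sequence, as_set


def origin_matches(bgp_text, fwd_path):
    """True if the last forwarding AS is consistent with the BGP origin."""
    sequence, as_set = parse_bgp_path(bgp_text)
    last = fwd_path[-1]
    if as_set:
        return last in as_set
    return bool(sequence) and last == sequence[-1]


# Abbreviated form of the 207.24.67.1 entry (AS set shortened here).
bgp = "195 1740 1673 {1324,12217,1333,\n1325,1669}"
print(origin_matches(bgp, [195, 1740, 2914, 1673, 1325]))
```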

##Target: 198.97.67.1
##BGP path:             145 7170 132
##Forwarding path:      195 145 7170 132

Quite bizarre! For some reason this path went out through SDSC? 
Most likely the route changed during the day, but it currently
goes through the vBNS directly (11/18, ~23:00)

Verio AS insertion occurs from:

traceroute to 194.23.44.1
traceroute to 193.14.46.1
[...]
14      134.24.46.114   AS1740
15      192.157.69.21   AS2914
16      198.67.133.37   AS1800

traceroute to 195.70.45.1
traceroute to 194.211.46.1
traceroute to 194.55.47.1
[...]
9       134.24.29.37    AS1740
10      198.32.136.28   AS2914
11      134.222.228.18  AS286

traceroute to 203.111.46.1
traceroute to 194.235.47.1
[...]
14      134.24.46.169   AS1740
15      192.157.69.80   AS2914
16      204.59.136.193  AS4000

192.157.69.0/24 belongs to Sprint, and the whois record indicates it is used
for NAPs. Known NAP prefixes:

Sprint (NETBLK-SPRINT-NAP)      SPRINT-NAP   192.157.64.0 - 192.157.73.255
Sprint NAP Team (NET-ICMNET-6)  ICMNET-6     192.157.69.0

Exchange Point Blocks (NET-EP-)

Netname: NET-EP-1
Netblock: 198.32.0.0 - 198.32.255.255

Suspected NAP prefixes:

ENEA Data AB (NET-SWNET4)

Netname: SWNET4
Netnumber: 192.36.147.0

Cable & Wireless USA (NETBLK-CW-02-BLK)

Netname: CW-02-BLK
Netblock: 204.188.0.0 - 204.189.255.255
Maintainer: CWUS
(only 204.189.152.0/24)
from:
traceroute to 192.160.50.1
[...]
13      204.70.2.14     AS3561
14      204.189.152.170 AS2561
15      200.130.255.21  AS1916

Egyptian Universities Network (EUN) (ASN-FRCU-EUN)
Autonomous System Name: FRCU-EUN
   Autonomous System Number: 2561

Performance Systems International, Inc. (NETBLK-PSI-C)

   Netname: PSINET-C4
   Netblock: 204.4.0.0 - 204.7.255.0
   Maintainer: PSI
(only 204.6.118.0/24)
from:
traceroute to 199.67.51.1
[...]
7       134.24.29.74    AS1740
8       204.6.118.137   AS8656
9       38.1.3.45       AS174

aut-num:     AS8656
descr:       PSINet Europe DE

PSINet Inc. (ASN-PSINET)
   Autonomous System Name: PSINET
   Autonomous System Number: 174

Hmm, this last one might not be a NAP.

So any hop from these prefixes should probably be ignored 
in the forwarding AS path
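
As a rough sketch of that filtering step, assuming traceroute hops are available as (address, AS number) pairs: the prefix list below is copied from the notes above, except that the /24 length for 192.36.147.0 is an assumption (the whois record gives only a network number). The names EXCHANGE_PREFIXES, is_exchange_hop and forwarding_as_path are illustrative.

```python
import ipaddress

# Known and suspected exchange-point prefixes from the notes above.
EXCHANGE_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "192.157.69.0/24",   # Sprint NAP
        "198.32.0.0/16",     # NET-EP-1 exchange point block
        "192.36.147.0/24",   # SWNET4 (suspected; /24 is assumed)
        "204.189.152.0/24",  # CW-02-BLK (suspected)
        "204.6.118.0/24",    # PSINET-C4 (suspected, might not be a NAP)
    )
]


def is_exchange_hop(address):
    """True if the hop address falls inside a listed exchange prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in EXCHANGE_PREFIXES)


def forwarding_as_path(hops):
    """Build an AS path from (address, asn) hops, skipping exchange
    hops and collapsing consecutive repeats of the same AS."""
    path = []
    for address, asn in hops:
        if is_exchange_hop(address):
            continue
        if not path or path[-1] != asn:
            path.append(asn)
    return path


# Hops 14-16 of the traceroute to 193.14.46.1 shown earlier.
hops = [("134.24.46.114", 1740),
        ("192.157.69.21", 2914),
        ("198.67.133.37", 1800)]
print(forwarding_as_path(hops))
```

With the exchange hop dropped, the spurious 2914 disappears and the forwarding path agrees with the announced 1740 1800 segment.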

So is Global One Colombia (ASN-GIPCO, Autonomous System Number 7984)
providing transit for Sprint (4000) in Colombia?

traceroute to 216.72.47.1
[...]
12      134.24.46.169   AS1740
13      192.157.69.80   AS2914
14      204.59.137.134  AS4000
15      206.49.176.241  AS7984
16      206.49.176.130  AS7984
17      196.27.25.81    AS4000
##BGP path:             195 1740 4000
##Forwarding path:      195 1740 2914 4000 7984 4000

So BGP isn't being used end-to-end (AS-wise) on this path. For a 
multihomed provider, this opens the possibility of loops. Why is 
Sprint proxy-advertising the destination, but the intermediate 
prefixes are still being advertised by GIPCO? Where is GIPCO
advertising these prefixes, and do they peer with anyone other
than Sprint? Has GIPCO been bought by Sprint, and is this just
displaying a transitional phase?

Similar weirdness between OPENTRANSIT (5511) and RAIN (3215) in .fr:

traceroute to 194.250.47.1
[...]
11      194.206.207.53  AS3215
12      193.251.128.129 AS5511
13      193.251.128.34  AS5511
14      194.250.89.13   AS3215
15      194.250.89.2    AS3215
16      194.250.88.5    AS3215
17      194.250.88.10   AS3215
18      194.250.180.146 AS3215
19      195.101.11.58   AS3215
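
Both cases above share one pattern: the forwarding path leaves an AS and later re-enters it. A minimal sketch for flagging such paths follows, assuming the per-hop AS numbers have already been extracted from traceroute output; collapse and reentered are hypothetical helper names.

```python
def collapse(as_hops):
    """Collapse consecutive repeats: [a, a, b, b, a] -> [a, b, a]."""
    path = []
    for asn in as_hops:
        if not path or path[-1] != asn:
            path.append(asn)
    return path


def reentered(as_hops):
    """Return the ASes that the path leaves and later re-enters,
    in order of first re-entry."""
    seen = set()
    again = []
    for asn in collapse(as_hops):
        if asn in seen and asn not in again:
            again.append(asn)
        seen.add(asn)
    return again


# Hops 12-17 of the traceroute to 216.72.47.1.
sprint = [1740, 2914, 4000, 7984, 7984, 4000]
# Hops 11-19 of the traceroute to 194.250.47.1.
rain = [3215, 5511, 5511, 3215, 3215, 3215, 3215, 3215, 3215]
print(collapse(sprint), reentered(sprint))
print(collapse(rain), reentered(rain))
```

A non-empty result does not by itself indicate a routing loop; it only marks paths where BGP does not appear to be used end-to-end and that merit a closer look.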

Appendix B: Glossary

ANSI - American National Standards Institute
API - Application Programming Interface
ATM - Asynchronous Transfer Mode
Benchmark - usage in this proposal refers to a comparative evaluation of accuracy and reliability of various vendor implementations of IP layer measurement features
BGMP - Border Gateway Multicast Protocol
CoralReef - Measurement suite for workload characterization, www.caida.org/Tools/CoralReef
DAG - University of Waikato's family of passive measurement cards (DS3, OC3/12/48)
DVMRP - Distance Vector Multicast Routing Protocol 
IETF - Internet Engineering Task Force
INET - INet Conference
IPSEC - Internet Protocol Security
IP Telephony - Internet Protocol Telephony
MBGP - Multi-protocol Border Gateway Protocol
MIB - Management Information Base 
MPLS - Multi-protocol Label Switching
MRM - Multicast Reachability Monitor
MSDP - Multicast Source Discovery Protocol
MSTP - Multicast Source Transfer Protocol
NANOG - North American Network Operators Group
NMS - Network Modeling and Simulation
PIM-SM - Protocol Independent Multicast - Sparse Mode
POS - Packet over SONET
SDR - Session Directory
Skitter - traceroute-like tool for analysis of path-specific connectivity, performance and route stability
SNMP - Simple Network Management Protocol 
TCP - Transmission Control Protocol
SSM - single source multicast 
UDP - User Datagram Protocol
Usenix - Unix User Group
