



The proposal "IODA-NP: Multi-source Realtime Detection of Macroscopic Internet Connectivity Disruption" is also available in PDF.
Abstract
As most of the world's communication infrastructure converges on use of the Internet, governments around the world are increasingly realizing they must understand how to provide the level of reliability and resilience we have come to take for granted from the public switched telephony network, now on its way out in the U.S. But in spite of the societal and economic impact of Internet connectivity disruptions, we lack any rigorous understanding of Internet outages or even sufficient tools for their systematic and timely identification. This gap is rooted in persistent challenges in Internet measurement and monitoring, such as the lack of measurement-aware design of Internet protocols, inherent complexity of (noisy) network measurement data, and the lack of systematic ground-truth data, combined with the highly diverse and continuously evolving nature of Internet infrastructure. Challenges in Internet connectivity outage identification are exacerbated by the heterogeneity of disruption events and their characteristics: many endemic or external factors can trigger an outage, ranging from human error such as misconfiguration, government-mandated shutdowns, and cyber-attacks, to cable cuts, network failures, natural disasters, power outages, etc. In the last 5 years, the Center for Applied Internet Data Analysis (CAIDA) at UC San Diego (Offerer) leveraged its two decades of experience in Internet measurement and data analysis, to tackle this research problem. Specifically, we focused on the detection and characterization of large-scale Internet outages, i.e., connectivity disruptions significantly affecting large geographic regions or specific Internet operators. Our work proceeded in parallel along two dimensions: we proposed and combined innovative methodologies for outage detection and we developed software frameworks as building blocks to implement and test our approaches. As a result, early this year we presented a prototype of our "IODA" (Internet Outage Detection and Analysis) system to detect events of connectivity disruption in near-realtime. In response to CSD BAA, HSHQDC-17-R-B0002, via strategically planned participation in the PARIDINE program TTA#1, we propose to conduct applied research and development in order to (i) define a rigorous framework, IODA-NP (Next Phase), for large-scale outage detection based on IODA methodologies and software building blocks, (ii) perform methodological and technological improvements to our approaches and systems in order to develop and deploy this framework as a near-realtime capability monitoring the Internet 24/7, and (iii)systematically identify and quantitatively evaluate its capabilities and limitations in the real world. In addition, we propose an optional Pilot Task in which we will integrate, deploy and test our solution into operation in cooperation with an entity identified by the DHS. Our project includes applied research, software development, new data analytics, systems integration, operations and maintenance, validation, and event analysis and reporting.Performance Goals
The Regents of the University of California; University of California, San Diego on behalf of the San Diego Supercomputer Center's Center for Applied Internet Data Analysis (CAIDA), offer this technical proposal, which includes the following goals: (1) a measurement and data analysis system, based on a rigorously defined framework, which will improve our ability to identify and monitor episodes of macroscopic Internet connectivity disruption that threaten the security and reliability of the nation's communication capabilities; (2) a Web interface to access dashboards and alerts and visually inspect specific events (software-as-a-service) as well as programmatic APIs to access both live streams and historical data of the alerts generated by the system; (3) periodic reports about the events detected by the system and analyzing its performance, limitations and potential improvements; (4) (optional) deployment and testing of our solution into operation in cooperation with an entity identified by the DHS. The proposed work targets objectives outlined in TTA#1 of PARIDINE's solicitation: Definition, Identification, and Production of Network / Internet Disruptive Events (NIDEs). We will develop and evaluate a methodology and build, integrate, and operate multiple open-source software tools for near-realtime detection of large-scale Internet outages affecting large geographic regions or specific Internet operators. Our detection, reporting, and analysis system will help identify different types of events of connectivity disruption with diverse potential impact on communications, other critical infrastructures, public safety, financial services, etc. To achieve the four mentioned goals, we propose to leverage the results of technologies and infrastructure funded by the Department of Homeland Security and the National Science Foundation, in particular the methodologies and the software components developed in CAIDA's IODA (Internet Outage Detection and Analysis) project. We will combine outage inference approaches operating both at the Internet control plane (BGP) and data plane, using both passive (Internet Background Radiation) and active approaches (continuous probing). We will define specific metrics and provide definitions of events for each inference approach and propose an applied methodology to fuse these inferences in order to (i) increase their accuracy and (ii) yield a better understanding of the characteristics of the disruptive event detected (e.g., a cable cut or a natural disaster will likely affect the control plane, whereas network congestion will appear as a data-plane anomaly only in most situations). In addition to APIs enabling access to alerts generated by our service, to enhance user capabilities to understand, quantify and categorize each event, we will provide visual interfaces for the inspection and correlation of current and historical data. Our proposed work in response to PARIDINE's call will result in IODA-NP (IODA Next Phase), a rigorously defined and evaluated (on real events) framework and a usable standalone product (provided as software-as-a-service) as well as methods and software modules that in the future could be integrated into broader solutions (e.g., for risk assessment, root cause analysis, or event prediction). The resulting technologies will support situational awareness and decision analytic needs of NCCIC and other government agencies. We perform this applied research and development on a reasonable efforts basis.Detailed Technical Approach
Introduction The focus of our proposed work is the identification and characterization of macroscopic events of connectivity disruption that affect the Internet, potentially disrupting other critical infrastructure components that now rely on Internet communications, e.g., power grid, VoIP enhanced 911, financial institutions. These types of events cover the disruptive event scenarios exemplified in sections 2.7.1, 2.7.2, 2.7.4, 2.7.5, 2.7.6 of the PARIDINE BAA. CAIDA pioneered methodologies to characterize large-scale Internet outages in 2011 and 2012, caused by natural disasters and human intervention In 2012 we received funding from the NSF to start a project called IODA (Internet Outage Detection and Analysis) to develop a proof-of-concept system implementing our methodologies [1]. To support this project, we developed several IODA software components (e.g., [2]) thanks to this and other funding from NSF and DHS S&T. We completed this project in early 2017, resulting in mature methodologies to extract "liveness" signals from various types of Internet measurement data, a set of open source software components and visual interfaces, and a demo dashboard. In response to BAA HSHQDC-17-R-B0002, we propose to take IODA to the next phase (IODA-NP) and produce a framework and a tool addressing the requirements of the PARIDINE solicitation, significantly advancing the current state of the art in near-realtime global Internet monitoring and specifically in the detection and classification of disruptive Internet events such as connectivity outages. Existing Work: IODA In IODA we developed measurement and data analysis methodologies based on three distinct types of sources of Internet measurement data:- Internet Background Radiation (IBR) is one-way unsolicited traffic generated by millions of Internet hosts worldwide, due to misconfiguration, malware propagation, scanning, etc. IODA is the only outage inference system using IBR, thanks to a methodology that we demonstrated is capable of detecting outages caused by state censorship [3], natural disasters [4], and border router misconfiguration in Autonomous Systems (ASes) [5]. From IBR, we filter out spoofed traffic and bursty traffic components (e.g., due to scanning from large botnets) and extract a "liveness signal" based on the number of distinct source IP addresses observed from a given geographic region or AS. We collect IBR traffic through the UCSD Network Telescope, an almost entirely unutilized /8 IPv4 address block, estimated to observe 1/256th of all the IBR generated in the Internet. As of March 2017, the telescope captures more than 1TB of compressed traffic per day. We continuously process this traffic using our Corsaro open-source software platform.
- BGP routing information. For this data source, we leverage the collection infrastructure operated by the RouteViews and RIPE RIS projects. We infer the state of the routing tables exported by hundreds of operational routers by processing BGP updates and RIB dumps and we extract information about which network blocks (BGP prefixes) appear reachable on the Internet control plane from most of these vantage points. Different from other organizations occasionally reporting BGP-visible connectivity disruption (e.g., Renesys/Dyn), our approach counts visible /24 blocks instead of prefixes, more meaningfully quantifying which fraction of the address space normally announced by an AS or from a region is reachable at a certain point in time. We continuously process data from more than 300 operational BGP routers using our BGPStream open source software framework.
- Active probing. We periodically probe approximately 3.5 M /24 network blocks worldwide and adaptively send more probes upon lack of response using the Trinocular methodology developed by ISI/USC [6]. We run our measurements from a few dozen CAIDA Archipelago (Ark) nodes distributed worldwide and from a central node at UC San Diego.
References
- [1]
- A. Dainotti and K. Claffy, "Detection and analysis of large-scale Internet infrastructure outages (IODA)." https://www.caida.org/projects/ioda/, 2012.
- [2]
- C. Orsini, A. King, D. Giordano, V. Giotsas, and A. Dainotti, "BGPStream: a software framework for live and historical BGP data analysis," in Internet Measurement Conference (IMC), Nov 2016.
- [3]
- A. Dainotti, C. Squarcella, E. Aben, K. C. Claffy, M. Chiesa, M. Russo, and A. Pescapé, "Analysis of country-wide Internet outages caused by censorship," in Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement, IMC '11, pp. 1-18, ACM, 2011.
- [4]
- A. Dainotti, R. Amman, E. Aben, and K. C. Claffy, "Extracting benefit from harm: using malware pollution to analyze the impact of political and geophysical events on the internet," SIGCOMM Comput. Commun. Rev., vol. 42, pp. 31-39, Jan. 2012.
- [5]
- K. Benson, A. Dainotti, k. claffy, and E. Aben, "Gaining Insight into AS-level Outages through Analysis of Internet Background Radiation," in Traffic Monitoring and Analysis Workshop (TMA), Apr 2013.
- [6]
- L. Quan, J. Heidemann, and Y. Pradkin, "Trinocular: Understanding Internet Reliability Through Adaptive Probing," pp. 255-266, 2013.
Testing and Evaluation
Proposed use for DHS/S&T. The system will allow DHS components to improve the ability to identify, monitor, and model critical cyber-infrastructure, and to assess the impact of disruptions in terms of number and type of networks affected, and how long before connectivity is restored. Operational Utility Assessment Plan. The initial testing environment for the first two years of the project will be PARIDINE PI meetings and CAIDA workshops series (AIMS, WIE, DUST, IMAPS). To measure, assess and evaluate the current and projected operational value of this technology, we propose to visit partner locations including the NCCIC and FCC to provide hands-on tutorials and documentation to explain how to effectively use the IODA platform. The iterative process that is essential to our software development methodology will allow us to integrate feedback from DHS components on how to improve operational utility into subsequent versions of our platform. We will seek also seek testing and evaluation from network operators (through NANOG and IETF interactions). Their interest in an ability to use the tool for monitoring and detection of outages, hijacks, and other disruptive events will form the basis of our utility assessment. Metrics of utility and management of data and/or tools We will use traditional quantitative metrics including number of users, activity on project mailing lists from external participants, usage statistics of the IODA platform. We will also solicit feedback at workshops and via periodic surveys of users.File translated from TEX by TTH, version 4.03.
On 5 Sep 2018, 14:38.