Paul Barford
KC Claffy
Alfred Hero
Craig Partridge
Walter Willinger
The first Workshop on Internet Signal Processing
(WISP1) was hosted by CAIDA at the University of California - San Diego campus
on November 11-12, 2004. The goal of the workshop was to explore opportunities
for interdisciplinary collaboration and cross-pollination through the
application of novel analytic techniques to a variety of challenging networking
problems. This invitation-only workshop brought together a diverse group of 36
researchers (10 of whom were students) from the networking, signal processing
and applied mathematics communities for an exchange of ideas. Support for
travel and local expenses for attendees was provided by generous donations from
the National Science Foundation
(CNS-0456821) and from Cisco Systems,
which helped make it possible to assemble an outstanding group. Logistical support
was provided by CAIDA staff, which enabled the workshop to run smoothly and
professionally.
The workshop began with a keynote talk
delivered by Craig Partridge, and was then organized as a series of sessions in
which a wide variety of subjects were treated. Each session began with several
30-minute talks that treated broader issues of Internet science and/or analytic
methods. These were followed by sets of 15-minute talks that treated more
specific networking issues and/or analytic techniques. Discussion during talks
was encouraged but moderated, and breaks/meals offered opportunities for
continued discussion. The format enabled almost all of the attendees to give a
presentation over the two days of the workshop.
WISP concluded with a wrap-up session in
which the major themes that emerged over the course of the workshop were
discussed. There was strong consensus on the following points:
• There are many opportunities for collaborations between network researchers and signal processing/applied math experts. Open areas include network characterization (e.g., RTT or bandwidth estimation), network anomaly detection and classification, intrusion detection, distributed event monitoring and analysis, and traffic forecasting among others.
• The development of mechanisms for establishing ground truth in all estimation, identification and classification studies is essential, but continues to pose significant practical and theoretical challenges.
• Data sets that are labeled and documented by domain experts are essential for minimizing misuse or inappropriate analysis of available measurements by other communities.
Finally, there was a discussion of the utility of
WISP1 and the interest in the possibility of another WISP in the future. There
was strong positive support for this year's workshop and a great deal of
interest in participating in another WISP in '05. In general, attendees would have
liked an expansion of the workshop to include more time for discussion, panels
and possibly tutorials. The organizers are in the process of reviewing the
feedback provided on workshop questionnaires and expect to announce WISP2 in
early 2005.
Our basic understanding of Internet characteristics
and behaviors is a key foundational component upon which new technologies will
be developed. An important means for expanding the basic knowledge of how the Internet
functions is through the application of mathematical, statistical and
analytical techniques that lend themselves naturally to specific problem
domains. The difficulty in this regard is that it is rarely the case that one
can simply grab a technique, "turn the crank" and get results that
are either correct or meaningful. Most methods have nuance in their application
that is only understood within the community that developed them. Conversely,
it is often difficult for experts in specific analytical domains to identify or
address Internetworking problems because they lack the domain knowledge to make
sense of the available data or the results.
The motivation for the first Workshop on
Internet Signal Processing (WISP) was the emergence of new and powerful signal
processing (SP) and multiresolution analysis (MRA) techniques in the networking
domain. It is becoming increasingly apparent that SP/MRA-based or related
techniques could lead to advances in areas such as network tomography; network
data collection; data dimensionality reduction; data compression; traffic
analysis in wired, ad-hoc wireless, and sensor networks; and network
anomaly/intrusion detection. This workshop was also interested in new analysis
techniques and approaches that are motivated by and account for the increasing
availability of spatio-temporal network measurements from PlanetLab-like or
Abilene-type infrastructures.
The goal of WISP was to shine light on the
opportunities for analysis of spatio-temporal network data and to foster
discussion between network researchers and groups from the traditional signal
processing, statistics and applied mathematics areas. To that end, the
technical focus of the workshop featured:
• Application of a range of signal analysis methods in the acquisition and evaluation of Internet behavior and performance
• Advances in signal processing and related areas
• Advances in applied mathematics specific to multiresolution analysis
• Advances in statistical analysis techniques
• Emerging communication and computing technologies over wireless, optical and quantum media
WISP was conceived and organized by the following
committee:
Paul Barford - University of Wisconsin
KC Claffy - CAIDA/University of California, San Diego
Alfred Hero - University of Michigan
Craig Partridge - BBN
Walter Willinger - AT&T Labs-Research
The primary organizational objective for the workshop
was to assemble a small (targeted at 30) but strong and diverse group of
students and senior researchers from the fields of networking, signal
processing and applied math. The intention for keeping the size limited was to
facilitate full group discussion. Participation in the workshop was by
invitation only. It proved impossible to reconcile the desire for a diverse
set of participants with keeping the workshop below 30 attendees; the final
number of participants was 36. Of critical importance among these were the 10
student attendees, all of whom made strong contributions to both presentations
and technical discussions.
The workshop was hosted at CAIDA's facility
on the University of California San Diego campus. It was arranged as a series
of talks on a wide selection of topics over a two-day period (November 11 and
12). Each session consisted of
either longer 30-minute talks aimed at outlining broad problem domains or
analytic techniques, or shorter 15-minute talks aimed at highlighting a more
specific technique or problem. Discussion during talks was encouraged but
moderated in order to stay close to the target schedule. Discussions also took
place during breaks and meals.
The final agenda for the workshop was as
follows:
Thursday, November 11, 2004
8:15 - 9:00 Breakfast
9:00 - 10:00 Signal Processing
o Craig Partridge - Whither Signal Processing and the Internet? (30 minutes)
o kc claffy - Measurements fueling Internet SP (30 minutes)
10:00 - 10:30 Break
10:30 - 12:00 Short Presentations
o Alefiya Hussain - Identification of Repeated Denial of Service Attacks (15 minutes)
o Marcos Aguilera - Performance Debugging for Distributed Systems of Black Boxes (15 minutes)
o Rui Castro - Hierarchical Clustering and Network Topology Identification (15 minutes)
o Abhishek Kumar - Data-Streaming Algorithms for Monitoring High speed Traffic (15 minutes)
o Nick Feamster - On BGP instabilities and end-to-end path failures (15 minutes)
o Darryl Veitch - A Tricky Problem (15 minutes)
12:00 - 1:30 Lunch
1:30 - 3:00 Signal Processing, Methods
o Bin Yu - Statistical Aspects of SP (30 minutes)
o Mark Crovella - Applying the Subspace Method to Network Traffic Analysis (30 minutes)
o Eric Moulines - Applied SP (30 minutes)
3:00 - 3:30 Break
3:30 - 4:30 Network and Wireless
o Dave Marchette - Statistical and Visualization Techniques for Streaming Data (30 minutes)
o Ramesh Rao - Reflections on Modeling, Analysis and Insights in Networking Research (30 minutes)
4:30 - 5:00 Break
5:00 - 6:00 Short Presentations
o Rajesh Krishnan - Traffic and Topological Analysis of Wireless Networks (15 minutes)
o Thomas Karagiannis - A Nonstationary Poisson View of Internet Traffic (15 minutes)
o Joel Sommers - Phase plot analysis of Internet packet traffic (15 minutes)
o Neal Patwari - Watching Traffic for an Anomaly: Data Visualization using Dimensionality Reduction (15 minutes)
6:00 Reception
Friday, November 12, 2004
8:15 - 9:00 Breakfast
9:00 - 10:00 Trends
o Alfred Hero - Network Inference and Signal Processing (30 minutes)
o Paul Barford - Trends in Network Measurements (30 minutes)
10:00 - 10:30 Break
10:30 - 12:00 Short Presentations
o Konstantina Papagiannaki - Long-Term Forecasting of Internet Backbone Traffic (15 minutes)
o Harsha Madhyastha - Wasted Measurements in the Internet (15 minutes)
o Felix Hernandez-Campos - From Traffic Measurement to Realistic Workload Generation (15 minutes)
o Abhishek Kumar - Outwitting the Witty Worm -- Reconstruction and Analysis of an Internet-Scale Event by Exploiting Pseudo-Random Number Generation (15 minutes)
o Nick Feamster - Open problems in anomaly detection in BGP data (15 minutes)
o Christos Papadopoulos - ANT: Analysis of Network Traffic (15 minutes)
12:00 - 1:30 Lunch
1:30 - 2:30 Signal Processing, Methods
o Amos Ron - Mathematical Aspects of SP (30 minutes)
o Andre Broido - Spectroscopy Methods (30 minutes)
2:30 - 3:00 Break
3:00 - 4:30 Networks
o Constantine Dovrolis - Why is the Internet traffic bursty in short (sub-RTT) time scales? (30 minutes)
o Randy Moses - Distributed Sensing and Inference (30 minutes)
o David Meyer - Quantum computing and internet signal processing (30 minutes)
4:30 - 5:00 Break
5:00 - 5:45 Short Presentations
o Jorma Kilpi - Passive Monitoring of RTT spikes (15 minutes)
o Vinay Ribeiro - Optimal probing schemes for estimation of multiscale traffic (15 minutes)
o Ljupco Kocarev - Nonlinear Dynamics in TCP/IP networks (15 minutes)
5:45 Wrap-up Discussions
Support was sought to defray travel and local
expenses for participants. WISP was extremely fortunate to receive support from
both the National Science Foundation and Cisco Systems toward this objective.
This funding was especially important for student participants who otherwise
might not have funding available. Logistical, web page management and on-site
support were provided by the CAIDA staff. Their participation and assistance enabled
the workshop to run smoothly and professionally for all participants. The
organizing committee is deeply grateful to NSF, Cisco and CAIDA for their
support.
Talk abstracts, speaker bios as well as other
materials (including talk slides) from the workshop can all be found on the
WISP web site: https://www.caida.org/workshops/isma/0411/. The goal of this section is to
give an overview of some of the themes that emerged during the workshop.
In his keynote, Partridge urged participants
to move beyond pretty pictures and to try to understand what underlying
reality different signal processing techniques expose (i.e., why the pretty
pictures look the way they do).
While a few of the later speakers dutifully admitted that many of their
results were, in fact, pretty pictures, the workshop spent a lot of time
working on understanding.
A number of talks presented interesting
algorithms. Aguilera talked about
ways to treat distributed systems as black boxes and then use statistical
techniques to find causal paths and understand how information moves
through a distributed system. The
presentation nicely complemented Krishnan's talk on statistical techniques
(including causal techniques) to understand traffic flows in encrypted wireless
networks. Crovella gave a
wonderful talk that showed how Principal Component Analysis (PCA) could be used
to improve network understanding. Papagiannaki discussed the use of wavelet
analysis and linear time series models to show that IP backbone traffic
exhibits long-term trends, strong periodicities, and variability at multiple
time scales. Yu talked about
Internet tomography and compared different approaches to inferring traffic
matrices. Castro discussed what
sort of network problems can be addressed through the use of sandwich probes. Hussain presented work on using
Fourier-based techniques to better understand the structure of DDoS attacks,
and Papadopoulos discussed the application of spectral analysis techniques to the study
of network traffic in general. Kumar
presented his work on applying techniques from data-streaming to the design of
efficient monitoring applications. In a second talk, he used network telescope traces of the
Witty worm and reconstructed the series of actions performed by the worm at
each infected site. Computing
statistics and estimates on streaming data was also the topic of Marchette's
talk.
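To make the PCA idea mentioned above more concrete, here is a minimal sketch of subspace-style anomaly detection on link traffic. It is not taken from any of the talks: the traffic matrix is synthetic, the "normal" subspace is assumed to be a single principal component found by power iteration, and the names `top_principal_component` and `squared_prediction_error` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def top_principal_component(rows, iterations=100):
    """Return (column means, leading eigenvector of the sample covariance).

    The eigenvector is found by plain power iteration, which is adequate
    here because the leading eigenvalue dominates the others.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def squared_prediction_error(row, means, v):
    """Squared norm of the part of a measurement outside the normal subspace."""
    centered = [x - m for x, m in zip(row, means)]
    score = sum(c * vi for c, vi in zip(centered, v))
    residual = [c - score * vi for c, vi in zip(centered, v)]
    return sum(r * r for r in residual)


# Synthetic data (an assumption for illustration): four links share one
# periodic traffic pattern, each with its own scale, plus small noise.
rng = random.Random(1)
link_weights = [1.0, 0.8, 1.2, 0.6]
traffic = []
for t in range(96):
    base = 100.0 + 50.0 * math.sin(2 * math.pi * t / 48)
    traffic.append([w * base + rng.gauss(0.0, 2.0) for w in link_weights])
traffic[60][3] += 40.0  # inject a spike on one link in time bin 60

means, v = top_principal_component(traffic)
spe = [squared_prediction_error(row, means, v) for row in traffic]
most_anomalous_bin = max(range(len(spe)), key=spe.__getitem__)
print("time bin with largest residual:", most_anomalous_bin)
```

The design choice is that the shared periodic pattern is absorbed by the leading component, so a deviation confined to one link shows up mainly in the residual. A real study would need more components and a principled threshold, both of which this sketch leaves out.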
Overall, the talks clearly described a number
of exciting research methods and several important problems to which they may
be applied. It is clear, however,
that we are still in the early stages of exploration, and that validation
(i.e., establishing ground truth) has to play a much more dominant role than in
the past.
There were a number of talks that focused
simply on how to get good data to feed our nifty signal processing algorithms. kc talked about the difficulties of
getting data in the first place. Feamster gave two talks, both of which dealt
with BGP measurements and illustrated the difficulties in analyzing them. Barford's talk was forward looking and
described the sort of network measurements that can be expected to be collected
on a regular basis in the not-too-distant future.
The talks and ensuing discussions made it very clear that in order for the different research communities to interact in a productive manner, there is a dire need for data sets that are labeled and documented by domain experts to minimize misuse or inappropriate analysis of available measurements by other communities. NSF has recognized some of the difficulties of even getting high quality data sets to researchers and has funded CAIDA to design an Internet Measurement Data Catalog (IMDC) to index the heterogeneous datasets (both publicly accessible and restricted usage) into a database that researchers can query to find relevant data to support their work. The IMDC includes annotation capabilities for users so that bugs, novel features, and other information about datasets can be shared by investigators with experience in analyzing a particular dataset. In addition to providing the fodder for new inquiries, the IMDC will also facilitate more robust science by documenting exactly the data used in a study and enabling others to reproduce published results.
Veitch described an identifiability problem
in statistical modeling of network traffic that illustrates the benefits of
accounting for known network structures as compared to traditional time series
modeling. Karagiannis presented a statistical analysis of recent backbone
traces and claimed that traffic in today's large backbone networks is no longer
self-similar and is more like Poisson over time scales relevant to the calculation
of primary performance metrics such as delays, queue lengths, etc. In stark contrast, relying on a more
networking-centric analysis of trace data, Dovrolis showed that real network
traffic exhibits pronounced burstiness over those time scales and is far from
Poisson. This observation was supported by Sommers who gave a demo of a tool
for visualizing network traffic dynamics over small time scales. Broido's talk about spectroscopy methods
applied to network traffic analysis also illustrated the presence of
interesting phenomena of packet-level traffic over small time scales and tried
to associate them with networking-specific mechanisms. Hernandez-Campos talked about his work
that tries to offer a more complete understanding of the interactions between the
different forces behind Internet traffic dynamics. Finally, Moulines focused again on statistical issues and
compared the use of Fourier versus wavelet methods for estimating the
long-memory or Hurst parameter.
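As a rough illustration of what estimating the Hurst parameter involves, the sketch below uses the aggregated-variance method. This is a simpler time-domain estimator, not the Fourier or wavelet estimators compared in the talk, and the function name `aggregated_variance_hurst`, the block sizes, and the synthetic input are all assumptions made for this example. For a self-similar process, the variance of block means of size m scales like m^(2H-2), so H can be read off the slope of a log-log fit.

```python
import math
import random


def aggregated_variance_hurst(series, block_sizes):
    """Estimate H from the slope of log Var(block mean) versus log(block size)."""
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # a variance needs at least two block means
        block_means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        grand_mean = sum(block_means) / n_blocks
        var = sum((b - grand_mean) ** 2 for b in block_means) / (n_blocks - 1)
        if var <= 0.0:
            continue
        xs.append(math.log(m))
        ys.append(math.log(var))
    if len(xs) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope; slope = 2H - 2, so H = 1 + slope / 2.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0


# Synthetic input (an assumption): independent samples, for which the
# estimate should come out near H = 0.5, i.e. no long memory.
rng = random.Random(0)
iid_counts = [rng.gauss(100.0, 10.0) for _ in range(4096)]
h_iid = aggregated_variance_hurst(iid_counts, [1, 2, 4, 8, 16, 32, 64])
print(f"estimated H for independent samples: {h_iid:.2f}")
```

An estimate well above 0.5 on real traces would point toward long-range dependence, but estimators of this kind are sensitive to trends and nonstationarity, so the result should be checked against other evidence.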
These talks demonstrated that the analysis of
network traffic over small time scales is still a wide open problem, but that
substantial progress will only occur when statistical analysis techniques are
combined with more network-centric approaches to produce results that are
consistent with respect to other available and relevant measurements and agree
with networking reality and intuition.
There were a number of talks that focused on
existing trends and future problems. Hero described how network inference from
traffic measurements can be placed in the context of general spatio-temporal
analysis, dimension reduction, and classification. Patwari illustrated this
idea in his talk about visualizing traffic anomalies using dimensionality
reduction. Madhyastha discussed
the problem of wasted measurements in planned triggered measurement
architectures and talked about ways to be more efficient. Ron's talk focused on multiresolution
representations and their relevance to analyzing network measurements. Moses described inference problems (detection,
estimation, and tracking) using distributed sensors and signal processing.
Finally, Meyer's talk covered basic ideas of quantum computing and surveyed
existing quantum algorithms that are potentially applicable to signal
processing.
4. Feedback
An on-line anonymous feedback form was used to solicit
comments from attendees. Selected comments are given below. The organizers are
using this feedback to help plan the future of WISP.
• Comments on utility/relevance of talks
◦ "I found talks on signal processing methods and trends the most interesting."
◦ "All talks were relevant. The talks
by Harsha and Neal have some relationship to my work and so are most useful to
me."
◦ "For me, a signal processing
person, the network talks were most useful - especially k c claffy, Craig
Partridge, Paul Barford, Nick Feamster; they provided good insight that I
didn't have prior to the workshop."
• Comments on utility of workshop
◦ "Overall, I thought the workshop was very good, though, and I learned quite a lot. Thanks for giving me the opportunity to come."
◦ "For someone like me who wants to enter this research
area, this was the perfect opportunity."
◦ "The opportunity to meet excellent
networking researchers with an SP perspective."
◦ "Really nice to get everybody
together and encourage discussion and exchange of ideas between the
statistical/signal processing group and networking group of researcher."
◦ "It was a great opportunity for
discussing among people doing research in the same area and for exchanging
ideas about new directions and issues to solve."
◦ "High degree of audience
participation and interest. A well-focused and active community of interest and
an excellent selection of talks and readings."
◦ "The organization was great, and
the group of attendees made for interesting discussions."
• Suggestions to improve future workshops
◦ "Have a 30-60min discussion session with a mediator at the end of each morning and
afternoon."
◦ "Maybe a discussion panel/open
discussion of what the group believed were the emerging problems/trends would
have been interesting."
◦ "I think WISP was a fantastic idea
and would be really nice to continue
getting together once a year."
◦ "My biggest, strongest suggestion
would be, let's do it again!"
5. Next Steps
The organizing
committee is currently discussing the format and organization for WISP 2005. It
will take place in late summer or early fall and will be hosted on the
University of Wisconsin - Madison campus. Dates are currently being considered
within the context of other related events (e.g., SIGCOMM) and an announcement
of WISP 2005 should be made in February 2005.