WAN Bandwidth Estimation

Stefano Masini
ste@caida.org

Summer 2001
Existing Bandwidth Estimation (bwest) tools

Actively being tested (collecting data):
- pathchar (the first such tool, by Van Jacobson)
- pathrate (Dovrolis, U Delaware)
- pipechar (LBL, Berkeley Lab)
- pchar (Bruce Mah, Cisco)
- {b,c}probe (Oceans group, Boston U)

Work in progress:
- nettest
- nettimer (Kevin Lai, Stanford)
- bing (not maintained)
- clink (Downey, Wellesley Coll.)
- netperf (HP)
- treno (Pittsburgh Supercomputing Ctr)
- ttcp (Army Research Lab)
- nttcp (Tech. Univ. München)
- sprobe (U Washington)
The testbed
The main testbed
The main testbed (cont)

- Fully connected graph
  - 20 distinct source-destination pairs (i.e., 5 hosts, counting ordered pairs)
- Each host probes every other host every 3 minutes via traceroute
- A few hundred distinct paths seen over 3 months
Controlling the testbed

- Centralized headquarters (Stefano's laptop)
- Perl server running on each host
- Headquarters issues commands to the hosts:
  - convenient command-line syntax
  - data is encrypted
  - fast response (no ssh connection delay)
- Headquarters keeps a local mirror of the hosts' file systems (rsync) for convenient access to the data
Server features

- Simple encryption scheme for communication with headquarters:
  - XOR of the data with a shared secret key (sketched below)
- Resistant to crashes and/or kill
- Job control
  - job timeout
- Output of issued commands logged to file
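
As an illustration, a minimal sketch of the XOR scheme in Python (the actual server is Perl); the key value, the command string, and the framing are all invented here.

```python
# Minimal sketch of the XOR scheme above: each byte of the message is
# XORed with the corresponding byte of a repeating shared secret key.
# XOR is its own inverse, so one function both encrypts and decrypts.
# Key and command values are hypothetical.
from itertools import cycle

def xor_crypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

secret = b"shared-secret"               # hypothetical shared key
cmd = b"run pipechar host-b"            # hypothetical command
wire = xor_crypt(cmd, secret)           # what travels on the network
assert xor_crypt(wire, secret) == cmd   # XORing again restores it
```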
Automated tool execution

- Tools must be run across every distinct source-destination pair (as far as portability allows).
- At any given time only one src-dest pair is being tested:
  - some tools suffer from machine load and/or network load
  - the easiest way to avoid both is to do one thing at a time
- Tool execution is "wrapped" in an executor script, which takes care of scheduling the runs at the various hosts.
- A distributed approach was preferred over a centralized one:
  - no need for a centralized scheduler
  - no need for extra communication
The executor wrapper

- A global run is defined as a sequence of local runs:
  - a local run is one execution of the tool over a given src-dest pair
  - a global run contains exactly one local run for every src-dest pair
- The state of a global run records which src-dest pairs have been tested so far (and which have not).
- The wrapper running at a given host starts the tool with the correct destination based on the state; after completion it updates the state and starts a new wrapper on the next designated host, passing along the new state (sketched below).
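
A minimal Python sketch of one wrapper step under the scheme above (the real wrapper is a Perl script); the pair list, the pair ordering, the tool invocation, and the hand-off mechanism are all simplifying assumptions.

```python
# Sketch of one wrapper step, assuming the global-run state is the
# set of src-dest pairs already tested. In the real system the
# hand-off goes through the per-host Perl servers; both the pair
# list and the tool command line are hypothetical.
import subprocess

ALL_PAIRS = [("host-a", "host-b"), ("host-b", "host-a"),
             ("host-a", "host-c")]          # hypothetical pair list

def wrapper_step(done: set) -> set:
    remaining = [p for p in ALL_PAIRS if p not in done]
    if not remaining:
        return done                         # global run complete
    src, dst = remaining[0]                 # next designated local run
    subprocess.run(["pipechar", dst])       # run the tool toward dst
    done = done | {(src, dst)}              # update the global-run state
    # ...then ship `done` to the wrapper on the next source host
    return done
```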
Tool evaluation
Evaluation strategy

- Collect the outputs of the various tools:
  - as many src-dest pairs as possible
  - a testing period as long as possible
- Parse the outputs down to bare numbers, in a way that allows statistical analysis across many runs (sketched below)
- Compare the estimated network properties with the real network topology (where known)
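
To make the parsing step concrete, a hypothetical Python parser; the per-hop line format assumed here is invented for illustration, and each real tool's output differs and needs its own parser.

```python
# Hypothetical parser reducing a tool's per-hop text output to bare
# numbers suitable for statistics. The line format is invented; every
# real tool (pathchar, pipechar, pchar, ...) prints differently.
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+\S+\s+([\d.]+)\s*Mbps", re.M)

def parse_run(output: str) -> list[tuple[int, float]]:
    """Return (hop number, estimated bandwidth in Mbps) pairs."""
    return [(int(h), float(bw)) for h, bw in HOP_RE.findall(output)]

sample = "1 gw.example 94.3 Mbps\n2 core.example 622.1 Mbps\n"
print(parse_run(sample))   # [(1, 94.3), (2, 622.1)]
```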
Sample results

[Figure: pipechar results, 647 runs over 2 weeks.]
The topology knowledge
Information about the real topology

- CAIDA is in contact with the sysadmins of many ASes.
- Topology information is difficult to manage:
  - incredibly high level of detail
  - no homogeneous structure/technology
  - no steady technology (it changes fast over time)
- We need to make as few assumptions and simplifications as possible, because we don't know which factors matter for bandwidth estimation.
Database overview

- Main components:
  - nodes (routers, workstations, switches, etc.)
  - interfaces (network cards, router/switch ports, etc.)
  - links
  - IP addresses (including names and ASes)
- Keeping track of the information sources and the people behind them is extremely important.
Important relations among components

- One interface → many IP addresses
  - e.g. IP aliasing
- Many interfaces → one IP address
  - e.g. load balancing in a server cluster
- Many interfaces → one link
  - e.g. Ethernet
- One interface → many links
  - e.g. special hardware for low-level load balancing (Cisco?)

A sketch of these relations follows.
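
A rough Python sketch of these relations; the real storage is a relational database, and all structure and field names below are assumptions.

```python
# Rough sketch of the relations above as plain Python structures.
# Interfaces and IP addresses are many-to-many (IP aliasing in one
# direction, cluster load balancing in the other), and so are
# interfaces and links. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    ips: list[str] = field(default_factory=list)   # one iface -> many IPs

@dataclass
class Link:
    descr: str
    interfaces: list[Interface] = field(default_factory=list)  # many ifaces -> one link

eth0 = Interface("rtr1:eth0", ips=["10.0.0.1", "10.0.0.2"])  # IP aliasing
eth1 = Interface("rtr2:eth0", ips=["10.0.0.3"])
lan = Link("shared Ethernet segment", interfaces=[eth0, eth1])

# The reverse mapping (one IP served by many interfaces, as in a
# load-balanced cluster) makes IP <-> interface fully many-to-many:
vip_owners = {"10.0.1.1": [eth0, eth1]}   # hypothetical cluster VIP
```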
The Database E/R Scheme
Links: logical or physical?

- Link technology is difficult to categorize:
  - Ethernet, Token Ring, FDDI
  - packet over ATM
  - ATM virtual circuits
  - Packet over SONET (POS)
  - software-limited-bandwidth channels
  - ... and in a few months?
Links: logical or physical? (cont)

- The definition is deliberately left vague in the DB design.
- Added the attribute "containing_link_id" to link_info
  - currently unused
  - allows representation of tree structures (sketched below)
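
A minimal sketch of how a self-referencing containing_link_id yields tree structures; only the attribute name comes from the slides, and the row contents are assumed.

```python
# Sketch: a self-referencing containing_link_id turns the flat link
# table into a forest, so e.g. ATM virtual circuits can be recorded
# as contained in the physical link that carries them. Row contents
# are hypothetical; only containing_link_id appears in the slides.
links = {
    1: {"descr": "OC-12 SONET link", "containing_link_id": None},
    2: {"descr": "ATM VC to peer X", "containing_link_id": 1},
    3: {"descr": "ATM VC to peer Y", "containing_link_id": 1},
}

def contained_links(link_id):
    return [i for i, row in links.items()
            if row["containing_link_id"] == link_id]

print(contained_links(1))   # [2, 3]: both VCs ride the same physical link
```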
Information sources

- Every piece of knowledge about the topology is added to the database together with its source and a timestamp (sketched below):
  - allows fetching a "snapshot" of the topology at any given time, past or present
  - useful for evaluating tool performance, because by the time the logs are analyzed the topology may have changed
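
A minimal sketch of the snapshot idea, assuming each topology fact is stored as a (time, source, subject, value) record; the record shape and values are assumptions.

```python
# Sketch of the time-stamped-facts idea: each topology fact carries
# its source and the time it was learned, so a "snapshot" is simply
# the latest fact per subject at or before the chosen instant.
facts = [
    # (time, source, subject, value) -- all values hypothetical
    (100, "sysadmin@asX", "link:1:bandwidth", "155 Mbps"),
    (250, "sysadmin@asX", "link:1:bandwidth", "622 Mbps"),  # upgrade
]

def snapshot(at_time):
    state = {}
    for t, src, subj, val in sorted(facts):
        if t <= at_time:
            state[subj] = (val, src, t)
    return state

print(snapshot(200))  # link 1 still at 155 Mbps at t=200
print(snapshot(300))  # reflects the 622 Mbps upgrade
```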
How is the topology determined?

- The real topology is impossible to determine in an automated way:
  - information must be gathered from the sysadmins
  - a slow process
- Massive use of traceroute and/or skitter:
  - the discovered topology is only an approximation of reality, but good enough for evaluating tool performance
  - a fast, automated process
Sample topology and traceroute behaviour

[Figure, built up over a sequence of animation slides: a sample topology and the path traceroute reconstructs across it, hop by hop.]

Traceroute reconstructed path: interfaces C and E are missing!
Hidden interfaces assumption

- In interpreting a traceroute path we assume the existence of hidden interfaces (sketched below).
  - e.g. for the example above, the database would receive not only the four interfaces A, B, D and F, but also two unnamed interfaces (which in reality correspond to C and E)
- This assumption may cause a reverse path to appear distinct from the forward one, even when the two are symmetric.
  - The only way to resolve this is to ask the sysadmins :-(
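
A minimal sketch of the path expansion implied above, assuming the known interfaces are A (the source's own), the routers' inbound interfaces B and D, and the destination F, with the routers' outbound interfaces hidden; the function and naming are illustrative only.

```python
# Sketch of the hidden-interfaces assumption: between the inbound
# interface of one hop and that of the next we insert an unnamed
# placeholder for the unseen outbound interface. The source's own
# interface and the first hop share a link directly, so no hidden
# interface sits between them. Names are illustrative.
def expand_path(hops):
    """hops: known interfaces, e.g. ['A', 'B', 'D', 'F'].
    Returns the path with hidden outbound interfaces interleaved."""
    path = [hops[0]]
    for i, nxt in enumerate(hops[1:], 1):
        if i >= 2:                          # skip the source->first-hop link
            path.append(f"hidden-{i}")      # unnamed iface (really C, E)
        path.append(nxt)
    return path

print(expand_path(["A", "B", "D", "F"]))
# ['A', 'B', 'hidden-2', 'D', 'hidden-3', 'F']
```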
More details to come...

- Database on Kokoro
- Tools and documentation on Cider
- Stefano on the Moo

Thanks CAIDA !!