The work for this lab will be done in groups of three people. Each group will focus in depth on one of the traces from Lab 1. An experimental setup is available to emulate the topologies used for each of the traces in Lab 1.
[Insert a figure here]
One lab setup is available for each exercise. Groups should select one of the following exercises to work on for this lab.
Performance Under Heavy Loss: Reno TCP is known to have problems when multiple packets are lost in a single RTT. In this exercise, the group will explore the performance of Reno, NewReno, and SACK TCP under heavy loss conditions.
Configure the NetBSD kernel to compare Reno, SACK, and NewReno under identical conditions. This may require compiling several new kernels, unless the relevant options can be switched with sysctl; check which variants your kernel exposes before rebuilding.
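If the kernel already exposes the relevant options, a quick check with sysctl can save several kernel builds. The variable names in the commented-out lines below are assumptions for illustration only; list what your kernel actually provides first.

    # List the TCP tunables this kernel exposes (always safe to run).
    sysctl -a | grep net.inet.tcp

    # Hypothetical examples: if knobs like these exist, switching TCP
    # variants is a one-line change instead of a kernel rebuild.
    #sysctl -w net.inet.tcp.sack.enable=1    # assumed name; verify first
    #sysctl -w net.inet.tcp.newreno=0        # assumed name; verify first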
Configure Dummynet to vary the packet loss rate. Dummynet is configured through the command ipfw pipe .... It supports specification of bandwidth, delay, and loss rates on pipes.
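For example, a minimal pipe applied to all traffic crossing the router might be set up as follows (rule number, pipe number, and interface name are placeholders for the lab setup):

    # Send all IP traffic through pipe 1 (adjust the interface name).
    ipfw add 100 pipe 1 ip from any to any via fxp0

    # Shape the pipe: 1 Mbit/s bandwidth, 50 ms delay, 5% packet loss.
    ipfw pipe 1 config bw 1Mbit/s delay 50 plr 0.05

    # Inspect the pipe, including its drop counters.
    ipfw pipe 1 show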
Run combinations of performance tests using different kernels and different packet loss rates. You may also wish to vary the bandwidth and delay of the dummynet pipe. Focus on packet loss rates between 1% and 30%.
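One way to organize the runs is a small script that reconfigures the pipe and then measures throughput with netperf at each loss rate. Note that the ipfw command runs on the FreeBSD router while netperf runs on an end host, so in practice the two steps may live on different machines; the host address, bandwidth, delay, and run length below are placeholders.

    #!/bin/sh
    # Sweep the packet loss rate and record the throughput netperf reports.
    for plr in 0.01 0.05 0.10 0.20 0.30; do
        ipfw pipe 1 config bw 1Mbit/s delay 50 plr $plr
        echo "=== loss rate $plr ==="
        netperf -H 10.0.0.2 -l 60 -t TCP_STREAM
    done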
Plot bandwidth as a function of packet loss for each of the TCP implementations. Compare this to several of the performance formulae (to be discussed in the next lecture). You may need to obtain the average RTT for the connections using tcptrace (or some other technique) to do this comparison accurately.
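tcptrace can extract per-connection RTT statistics from a tcpdump capture of a test run, for example (filter and file name are placeholders):

    # Capture one test transfer.
    tcpdump -w run.pcap host 10.0.0.2

    # -l gives the long per-connection summary; -r adds RTT statistics
    # (min, max, and average RTT samples for each direction).
    tcptrace -l -r run.pcap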
References: The Floyd Reno multi-loss paper will be available in the lab.
TCP Timer Analysis: In this lab, the group will analyze the performance of the TCP retransmission timer. A number of operating systems today use fine-granularity timers for storing and computing the RTO. Under these circumstances, it is possible for the actual RTT of the network to become larger than the computed RTO, causing unnecessary timeouts.
The system in use has a modified slow tick. Normally on NetBSD, this value is 500 msec; it has been set to 100 msec.
Trace ??? from Lab 1 was the result of a bug in the RTO code. The first part of this exercise is to find and fix this bug. You should look in the files /usr/src/sys/netinet/tcp_timer.c and /usr/src/sys/netinet/tcp_input.c for the code which executes timeouts and which computes the RTO value.
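A quick way to locate the relevant code is to search for the identifiers the BSD stacks conventionally use for the retransmission timer and the RTT estimator; the names below are the usual ones, so confirm them against the source tree in the lab.

    # Where the retransmission timeout is handled:
    grep -n TCPT_REXMT /usr/src/sys/netinet/tcp_timer.c

    # Where RTT samples are folded into srtt/rttvar and the RTO is derived:
    grep -n -e t_srtt -e t_rttvar -e TCP_REXMTVAL /usr/src/sys/netinet/tcp_input.c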
Task two is to explore networks with variable delay. Modify dummynet on FreeBSD to inject different forms of variable delay. Dummynet is normally configured through the command ipfw pipe .... It supports specification of bandwidth, delay, and loss rates on pipes. In order to add variable delay support, you will need to modify the kernel dummynet.c module.
Run test TCP connections over this path and measure how often TCP experiences unnecessary timeouts under different conditions.
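The kernel's TCP statistics give a quick count of retransmission timeouts around each run (the exact wording of the counters varies slightly between BSD versions; host address and run length are placeholders):

    # Snapshot TCP statistics before and after a test transfer; the change
    # in the timeout-related counters shows how many retransmission
    # timeouts fired during the run.
    netstat -s -p tcp > before.txt
    netperf -H 10.0.0.2 -l 60 -t TCP_STREAM
    netstat -s -p tcp > after.txt
    diff before.txt after.txt | grep -i retransmit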
If time permits, consider various fixes to address any problems you see. Paxson and Allman have recently suggested that the most effective "fix" for the Internet is to impose a minimum RTO of 1 sec. Explore several possible fixes.
References: The Paxson/Allman SIGCOMM '99 paper will be available in the lab.
Three Dupacks Check: This test is used to distinguish between network reordering and packet loss. For small windows, the three dupacks check can fail, resulting in unnecessary timeouts. It has been suggested that this check be relaxed when cwnd is small.
In order to generate scenarios with small windows, two Dummynet configurations are available. One uses low bandwidth and small buffer sizes, guaranteeing consistently small window sizes. The second has high packet loss rates, which will often result in losses while cwnd is small.
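As a sketch, the two configurations might look like the following (the exact numbers are placeholders chosen by the lab staff):

    # Configuration 1: low bandwidth and a small queue, keeping the window small.
    ipfw pipe 1 config bw 64Kbit/s delay 50 queue 5

    # Configuration 2: more bandwidth but heavy random loss, so packets are
    # often lost while cwnd is still small.
    ipfw pipe 1 config bw 1Mbit/s delay 50 plr 0.10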
Modify the kernel to reduce the three-dupacks threshold in the case where cwnd is small.
An alternate solution would be to generate additional duplicate ACKs. Modify the kernel to send one new packet per dupack in order to increase the number of dupacks available.
Are there any other solutions you can think of?
Compare the effectiveness of each solution via trace examination.
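tcpdump plus tcptrace's time-sequence graphs are a convenient way to see whether a given loss was recovered by fast retransmit or only by a timeout (host address is a placeholder):

    # Capture a test transfer and generate the tcptrace graph files.
    tcpdump -w dupack-test.pcap host 10.0.0.2
    tcptrace -G dupack-test.pcap   # -G writes all graphs, including the
                                   # time-sequence graph, as .xpl files
    xplot a2b_tsg.xpl              # file name depends on the connection;
                                   # check tcptrace's output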
Compare the solutions numerically by measuring the bandwidth each one achieves under each of the test network configurations.
Slowstart and Window Sizing: In this exercise, the group will examine how socket buffer sizes and slowstart behavior affect TCP performance over a path with a large bandwidth-delay product.
For this exercise, the test network is configured with a bandwidth of 10 Mb/s and a delay of 100 msec. This corresponds to a bandwidth*delay of 125 kB.
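The corresponding dummynet configuration, with the bandwidth-delay arithmetic spelled out (pipe number is a placeholder):

    # 10 Mbit/s * 100 ms = 10,000,000 bit/s * 0.1 s = 1,000,000 bits
    #                    = 125,000 bytes, i.e. about 125 kB in flight.
    ipfw pipe 1 config bw 10Mbit/s delay 100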
Using command line switches, modify the socket buffer sizes used by netperf. Measure the performance (bandwidth) obtained as you vary the socket buffer size, and plot this function (bandwidth vs. buffer size). Is there a disadvantage to using too large a socket buffer?
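netperf's test-specific -s and -S options set the local and remote socket buffer sizes; a sweep might look like this (host address and the particular sizes are placeholders):

    #!/bin/sh
    # Measure TCP_STREAM throughput as a function of socket buffer size.
    for buf in 8192 16384 32768 65536 131072 262144; do
        echo "=== socket buffer $buf bytes ==="
        netperf -H 10.0.0.2 -l 30 -t TCP_STREAM -- -s $buf -S $buf
    done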
Now use netperf in request/response mode with a block size of 100 kB (make sure the socket buffer size is at least 100 kB). In this mode, TCP is always in slowstart. Examine a packet trace of this connection. What problems can you see?
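In netperf terms this is a TCP_RR test with 100 kB requests and responses and socket buffers large enough to hold a full block; capture the run with tcpdump for the trace examination (host address is a placeholder):

    # Start a capture of the test traffic, then run the request/response test
    # with 100 kB transferred in each direction per transaction.
    tcpdump -w slowstart.pcap host 10.0.0.2 &
    netperf -H 10.0.0.2 -l 60 -t TCP_RR -- -r 102400,102400 -s 131072 -S 131072
    kill %1    # stop the capture when the test is done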
Consider mechanisms to improve slowstart performance. Here are two possible suggestions:
Implement at least one "alternate" start mechanism and compare its performance to normal TCP. What are the tradeoffs?
In order to run connections in parallel, we will be using the webpolygraph tool, designed to benchmark web proxies. This tool allows us to simulate a multiuser workload, with variable connection sizes.
To simulate mice, we will use many connections of small transfer size (10kB). To simulate elephants, we will use a small number of connections with large transfer sizes (1MB). Each simulation will run through the same simulated 100msec T1 network.
The goal will be to measure the packet loss rate at the bottleneck as a function of network load. The easiest way to measure the packet loss rate is to use the interface counters reported by netstat on the FreeBSD router. By comparing the packets in and out on the network interfaces before and after a run, you can compute the total number of packets that arrived during the run and the total number that were dropped, and from these derive a packet loss percentage.
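A minimal sketch of the bookkeeping on the FreeBSD router (interface names are placeholders; the drop count is the difference between what entered on one interface and what left on the other):

    # Snapshot the interface counters before and after a run.
    netstat -i -n > counters-before.txt
    # ... run the webpolygraph workload ...
    netstat -i -n > counters-after.txt

    # drops  = (delta of packets in on the ingress interface)
    #        - (delta of packets out on the egress interface)
    # loss % = 100 * drops / (delta of packets in on the ingress interface)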
To vary the load, use the --robots switch to webpolygraph to modify the number of simulated users. With mice workloads, small numbers of robots won't be able to fill the link. As you increase the number of robots, the average bandwidth available to each robot will decrease. Measure the packet loss with 10, 20, 30, 40, and 50 robots for both the mice and the elephant workloads.
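A sketch of the measurement loop, assuming a hypothetical run-polygraph wrapper that launches the workload with the --robots switch mentioned above; the netstat snapshots are taken on the FreeBSD router.

    #!/bin/sh
    # For each robot count, record the router's interface counters before
    # and after the run so the loss rate can be computed as described above.
    for n in 10 20 30 40 50; do
        netstat -i -n > before-$n.txt
        run-polygraph --robots $n    # hypothetical wrapper around webpolygraph
        netstat -i -n > after-$n.txt
    done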
Plot these results on the same graph. Produce the plot three ways: once using the number of robots as the x-axis, once using the available bandwidth per robot, and once using the average achieved bandwidth per robot. What can you conclude about the aggressiveness of mice vs. elephants?
Note: If two groups do this lab, one will use a simulation with a drop-tail router with 30 queue buffers. The second will use a RED router (using AltQ). Because the RED router should treat connections more fairly, there could be some significant difference between the two sets of results.