NeTraMet DNS Root Server Performance

Motivation

This project began as a test for packet-pair matching using DNS request/response packets destined for the 13 root nameservers, named A through M.

NeTraMet Ruleset

Development of this ruleset presented some interesting problems. First attempts crashed the NeTraMet meter from time to time. The problematic ruleset was essentially as follows:

if SourcePeerType == IPv4 save;
else ignore; # not IP
if SourceTransType == UDP save;
else ignore; # not UDP

if DestTransAddress == DNS save, {
  TestDestAddress; # Sets FlowKind
  if FlowKind == 0 nomatch; # Not a root nameserver
  else {
    save ToTurnaroundTime1 = 100.11.0!0 & 1.3.1!700;
      # 100 buckets, PP_UDP_DNS, linear scale, 10**3 => 1..700 ms
    count;
    }
  }

The ruleset tests for UDP packets with a DestTransAddress (destination port) of DNS (53). DestPeerAddress is compared against a list of the root nameserver addresses, creating a flow for each root nameserver and building a ToTurnaroundTime distribution for it. Most of the time this worked as expected, but occasionally the meter would find a stream that was ambiguous, in that it could belong to more than one flow. To see how this can happen, imagine that the meter sees the following three packets in sequence:

  • 0.1.0.1|53 --> 0.180.0.2|53
    Matched forward, creating flow A between host 0.1.0.1 port 53 and host 0.180.0.2 port 53. (The IP addresses come from a trace file; they have been 'sanitized' so that the actual IP addresses do not appear in the trace file.)
  • 0.180.0.2|6225 --> 0.1.0.1|53
    Matched in reverse to flow A. This is possible because the ruleset doesn't save the SourceTransAddress. It can't, because that would cause the meter to create flows for every pair of TransAddresses.
  • 0.1.0.1|53 --> 0.180.0.2|6225
    Source-->Dest match failed, but Dest-->Source matched, creating a new flow, B. Flows A and B have streams in common, since streams use all of Source/Dest Peer/Trans Addresses (flows only use the parts saved under their masks).

Both flows could see packets for the same stream, leaving their stream queues with a stream in common. This doesn't become a problem until one of the flows times out and the meter recovers its streams. At that point the other flow still has pointers to the just-recovered stream, i.e. its stream queue has been corrupted.
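
To make the mechanism concrete, here is a small Python sketch (illustrative only; the class and helper names are invented for this page, not taken from the NeTraMet source). It models flows whose keys omit the SourceTransAddress and streams whose keys use all four addresses, and shows how recovering one flow's streams leaves the other flow with a stale reference:

# Simplified model of the problem described above (illustrative only, not
# NeTraMet internals). A stream key uses all of Source/Dest Peer/Trans
# Addresses and is direction-insensitive, so a request and its response
# belong to the same stream.

def stream_key(src_ip, src_port, dst_ip, dst_port):
    return frozenset({(src_ip, src_port), (dst_ip, dst_port)})

class Stream:
    def __init__(self, key):
        self.key = key
        self.recovered = False          # set when the meter reclaims the stream

class Flow:
    def __init__(self, name):
        self.name = name
        self.streams = []               # this flow's stream queue

# Packets 2 and 3 from the example above are one stream seen in both
# directions, yet they were attributed to two different flows (A in reverse,
# B forward).
pkt2 = ("0.180.0.2", 6225, "0.1.0.1", 53)
pkt3 = ("0.1.0.1", 53, "0.180.0.2", 6225)
assert stream_key(*pkt2) == stream_key(*pkt3)

shared = Stream(stream_key(*pkt2))
flow_a, flow_b = Flow("A"), Flow("B")
flow_a.streams.append(shared)           # flow A claimed the stream first
flow_b.streams.append(shared)           # flow B claims the same stream

# Flow A times out and the meter recovers its streams ...
for s in flow_a.streams:
    s.recovered = True

# ... leaving flow B's stream queue pointing at a recovered stream.
print([s.recovered for s in flow_b.streams])    # -> [True]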

To detect ambiguous streams as they appear, I modified the stream data structure to keep a pointer to the flow that created it. As part of packet processing, when the meter locates a packet's stream, it checks that the stream belongs to the packet's flow. If it doesn't, the meter displays a warning message and prevents the flow from creating any more streams. This is a somewhat extreme measure, but it makes sure that the meter's data structures remain intact.
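
A minimal sketch of that check, again in Python with invented names rather than the actual meter code: each stream records the flow that created it, and a flow found holding another flow's stream is warned about and barred from creating further streams.

# Sketch of the defensive check described above (hypothetical names; the real
# change was made inside the NeTraMet meter's own data structures).

class Flow:
    def __init__(self, name):
        self.name = name
        self.may_create_streams = True

class Stream:
    def __init__(self, key, creator):
        self.key = key
        self.creator = creator          # pointer to the flow that created this stream

streams = {}                            # stream key -> Stream

def locate_stream(flow, key):
    """Find (or create) the packet's stream, checking that it belongs to this flow."""
    stream = streams.get(key)
    if stream is None:
        if flow.may_create_streams:
            stream = streams[key] = Stream(key, flow)
        return stream
    if stream.creator is not flow:
        print("warning: stream %r created by flow %s, matched by flow %s"
              % (sorted(key), stream.creator.name, flow.name))
        flow.may_create_streams = False # extreme, but keeps the structures intact
    return stream

flow_a, flow_b = Flow("A"), Flow("B")
key = frozenset({("0.180.0.2", 6225), ("0.1.0.1", 53)})
locate_stream(flow_a, key)              # flow A creates the stream
locate_stream(flow_b, key)              # flow B triggers the warning and is blocked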

It may be possible for the SRL compiler to determine that a ruleset can create ambiguous streams, but that is a long-term research question. In the short term one must simply be careful not to write such rulesets. The root nameserver ruleset is easily modified by adding one further statement, like this:

if DestTransAddress == DNS save, {
  if SourceTransAddress == DNS save; # Ambiguous flow, keep it separate
  TestDestAddress;  # Sets FlowKind
  .....
  }

This tests for the ambiguous case (SourceTransAddress and DestTransAddress are both DNS), and creates a separate flow for it. The resulting flow data files have two records per meter reading for each root server, but it is not difficult to combine them into a single flow.
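
The sketch below shows the kind of merge this involves; the record layout and the numbers in it are purely illustrative and do not reproduce NeMaC's actual flow data format. For each root server and meter reading, the counters from the ordinary flow and the port-53-to-port-53 flow are simply summed.

# Hypothetical post-processing sketch: merge the two flow records produced for
# each root server at each meter reading. Field names and numbers are
# illustrative only.

from collections import defaultdict

records = [
    # (root server, reading, request PDUs, lost PDUs)
    ("F", "reading-1", 1200, 150),   # ordinary flow (ephemeral source port)
    ("F", "reading-1",  300,  40),   # separate flow for the ambiguous port-53 case
    ("M", "reading-1",   80,  30),
]

combined = defaultdict(lambda: [0, 0])
for server, reading, pdus, lost in records:
    combined[(server, reading)][0] += pdus
    combined[(server, reading)][1] += lost

for (server, reading), (pdus, lost) in sorted(combined.items()):
    print("%s %s: PDUs=%d lost=%d" % (server, reading, pdus, lost))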

The ruleset employed for this project measures ToTurnaroundTime using 100 bins with values ranging linearly from 1 to 700 msec. If the meter doesn't observe a response packet within ten times the upper limit (7 seconds in this case), the request packet is deleted from its stream's packet queue and counted as 'lost', i.e. it is not counted in the distribution. Packets are also counted as 'lost' if they are dropped from a stream's packet queue to make room for a new request packet. The ToLostPDUs attribute is used to compute percentage loss rates for the root nameservers.
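
To make these parameters concrete, here is a plain Python reading of them; it is a sketch, and the loss-percentage formula is an assumption about how ToLostPDUs is used rather than something taken from the meter or from NeMaC.

# 100 linear bins spanning 1..700 ms, with a response timeout of ten times the
# distribution's upper limit. The loss-percentage formula below is an assumption.

N_BINS = 100
LOWER_MS = 1.0
UPPER_MS = 700.0
TIMEOUT_MS = 10 * UPPER_MS                  # 7000 ms = 7 seconds

def bin_index(turnaround_ms):
    """Linear bin (0..99) for a matched request/response turnaround time."""
    if turnaround_ms >= UPPER_MS:
        return N_BINS - 1                   # clamp into the top bin
    frac = (turnaround_ms - LOWER_MS) / (UPPER_MS - LOWER_MS)
    return max(0, min(N_BINS - 1, int(frac * N_BINS)))

def loss_percent(matched, lost_pdus):
    """Assumed use of ToLostPDUs: lost requests as a share of all requests seen."""
    total = matched + lost_pdus
    return 100.0 * lost_pdus / total if total else 0.0

print(bin_index(20.0), bin_index(175.0), bin_index(800.0))      # -> 2 24 99
print("%.1f%% loss" % loss_percent(matched=950, lost_pdus=50))  # -> 5.0% loss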

Graphs

On the SDSC link we can see packets for eight of the 13 root nameservers; routes to the other five go via the vBNS. Their responses are plotted in pairs, making four plots for each week measured. The upper half of each plot shows the measured response time in milliseconds and the lower half shows the percentage packet loss.



Activity on DNS Servers A and G
  Week 1: Sat 3 Jun 00 - Fri 9 Jun 00 (UCSD normal week)
  Week 2: Sat 10 Jun 00 - Fri 16 Jun 00 (UCSD finals week)
  Week 3: Sat 17 Jun 00 - Fri 23 Jun 00 (UCSD summer break)
  Week 4: Sat 24 Jun 00 - Fri 30 Jun 00 (UCSD summer break)


Activity on DNS Servers C and K
  Weeks 1-4: same dates and UCSD calendar as above.


Activity on DNS Servers F and M
  Weeks 1-4: same dates and UCSD calendar as above.


Activity on DNS Servers J and L
  Weeks 1-4: same dates and UCSD calendar as above.

Discussion

The observed loss rates are rather high, raising questions about their accuracy. Three reasons why they might be artificially high were checked:

  1. Packets were dropped from a full packet queue
    The queue size limit was initially 30 packets and the total counts per five-minute interval were of the same order, making it unlikely that packets were being dropped. To test this, the queue limit was increased to 60 packets; this made no difference to the observed loss rate.
  2. Some packets were returning via another route, bypassing the meter
    This seems highly unlikely, since if the routing were stable, this would produce a steady loss rate for the affected server. This does not happen. Instead, the loss rates fluctuate along with the network traffic rate.
  3. Responses were returning after the timeout period of 7 seconds
    This was tested by running a second copy of the ruleset using a logarithmic range of bins from 1 msec to 30 seconds, i.e. a timeout period of five minutes (see the sketch after this list). This made no observable difference to the loss rates.
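
A quick arithmetic check of that third test follows; it is a sketch that assumes the logarithmic run also used 100 bins, which is not stated above.

# Logarithmic bins from 1 ms to 30 s; the timeout is again ten times the upper
# limit. The bin count of 100 is an assumption -- only the range is given above.

N_BINS = 100
LOWER_MS = 1.0
UPPER_MS = 30000.0

edges = [LOWER_MS * (UPPER_MS / LOWER_MS) ** (i / float(N_BINS))
         for i in range(N_BINS + 1)]
print("first edges (ms): %.2f %.2f %.2f" % tuple(edges[:3]))   # 1.00 1.11 1.23
print("last edge (ms): %.0f" % edges[-1])                      # 30000

timeout_s = 10 * UPPER_MS / 1000.0
print("timeout: %.0f s = %.0f minutes" % (timeout_s, timeout_s / 60))  # 300 s = 5 minutes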

It appears that the observed loss rates are real. Loss may be attributed to congestion at various points in the Internet, to load at the root servers themselves, or to both. Bear in mind that high loss rates for DNS requests do not necessarily mean that other applications will suffer similar loss rates!

About the A and G root servers

Both the A and G roots are in Virginia. A (dark green) is at NSI, and G (red) is at DoD. Both show strong diurnal variations through the first two weeks, and are comparatively quiet in the third week. Response time for G improves through the second week, and is generally better in the third. These two servers have very similar loss patterns, which are in turn very similar to the pattern for the F server (see below).

About the C and K root servers

The K root (dark blue) is run by the RIPE NCC and is housed in London. Its traces are very similar to those from M, except that the minimum delay time is about 175 msecs. Response time for the K root varies similarly to that for the C root (magenta), located at PSI in Virginia, except that its minimum is about 96 msecs. C shows strong diurnal variations during the quiet week (Week 3), when traffic appears to have been at its lowest. This contrasts significantly with other root servers (e.g. F) and has not yet been explained.

About the F and M root servers

The F root, in Palo Alto, is both one of the roots most heavily used by hosts at SDSC and the one with the shortest response time. Its values are plotted in grey. The M root, plotted in light green, is in Tokyo. Looking at the week starting on Saturday 17 Jun (the first week of summer break), we see clear minima in the two response times: SDSC to Palo Alto is about 20 msecs, while SDSC to Tokyo is about 130 msecs. These times seem reasonable for the geographical distances involved.

For the two preceding weeks the response time traces vary diurnally, with the weekend (10-11 Jun) showing less variation than the weekdays. This suggests that the congestion is occurring close to SDSC, affecting both paths in the same way.

The F server loss rate plot shows strong diurnal variations, and these correlate well with the observed inbound traffic rates. Comparatively few requests are sent to the M server, which is why its plots are more 'grainy' and don't seem to correlate with the traffic.

About the J and L root servers

The J root (dark brown) is located at NSI in Virginia, the same site as the A root. The L root (light blue) is at ISI in California. The J root behaves in much the same way as A, as might be expected. It is not clear why performance for the geographically closer L root is consistently poorer.
