From kc@ipn.caida.org Tue Mar 31 15:31:22 1998
Date: Tue, 31 Mar 1998 10:32:50 -0800
From: k claffy <kc@ipn.caida.org>
To: brad <bhuffake@sdsc.edu>
Subject: made some changes to dw's adds, can you replace?


I added paragraphs 2,3,4 and modified 5.

Data
--------------------

total number of HTTP requests.  

One restriction of the ``Planet Cache'' visualization
was that we could show only our direct neighbor caches (i.e.
the other caches that connect directly to ours), since we were 
limited to data from our own daily access logs.

With Plankton, however, we are able to see much more than
our direct neighbors.  
We modified Squid to log the X-Forwarded-For header 
from HTTP requests, which allows us to retain knowledge of 
the identity of our clients' clients, and so on.  This header,
invented for Squid, lists the IP addresses of the requesting
client(s).  Each cache (optionally) appends the requesting
client's IP address to this list.  A cache administrator
may choose not to add the client's address, in which 
case the label `unknown' will appear instead.
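
As a rough illustration of the appending behavior described above, the
following Python sketch shows how a cache might extend the header before
forwarding a request.  The function name append_forwarded_for and the
disclose flag are hypothetical and are not taken from Squid's source; the
comma-separated layout is assumed here.

```python
def append_forwarded_for(header_value, client_ip, disclose=True):
    """Return the X-Forwarded-For value a cache would send upstream.

    header_value: value received from the client, or None if absent.
    client_ip:    address of the requesting client.
    disclose:     hypothetical switch; when False the cache records
                  'unknown' instead of the client's address.
    """
    entry = client_ip if disclose else "unknown"
    # No usable header yet: this cache starts the list.
    if not header_value or not header_value.strip():
        return entry
    # Otherwise extend the existing comma-separated list.
    return header_value.strip() + ", " + entry
```

For example, a cache that receives the value "10.0.0.1" from a client at
192.0.2.7 would forward "10.0.0.1, 192.0.2.7", or "10.0.0.1, unknown" if
its administrator chose not to disclose the client.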

The first address in the X-Forwarded-For header is most
likely a Web browser (the end user).  Because we are only interested
in visualizing Web caches, not end user browsers, 
we remove the first address of every list.  Additionally, 
because Squid logs the X-Forwarded-For information
as received from the client, it does not include our 
cache's own IP address, so we append this latter address
to the list.
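
The two adjustments above can be sketched in Python as follows.  This is
only an illustration under stated assumptions: the function name
cache_path is hypothetical, the logged value is assumed to be a
comma-separated list, and the addresses in the usage note are
documentation placeholders rather than real caches.

```python
def cache_path(logged_value, our_cache_ip):
    """Convert a logged X-Forwarded-For value into a list of caches.

    logged_value: the header as logged, e.g. "browser, cache1, cache2".
    our_cache_ip: address of the cache that wrote the log entry.
    """
    # Split the comma-separated list and discard empty entries.
    hops = [h.strip() for h in logged_value.split(",") if h.strip()]
    # The first address is most likely the end user's browser: drop it.
    caches = hops[1:]
    # The logged header does not include the logging cache: append it.
    caches.append(our_cache_ip)
    return caches
```

With a logged value of "10.0.0.1, 192.0.2.7, 198.51.100.3" and a logging
cache at 203.0.113.9, the sketch yields the three cache addresses
192.0.2.7, 198.51.100.3 and 203.0.113.9.  Any `unknown' entries are kept
as they appear.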

To create a complete view of the hierarchy, we combine 
data from all 7 NLANR caches, with resulting file sizes 
that range from 2 to 4 Mbytes, large enough to render download
times problematic.  These data sets also contain fully qualified domain
names (hostnames) of caches within the hierarchy.  We realize that many
people use Web proxies to achieve some level of anonymity.  Thus, to
protect their privacy and identities, we only make smaller, filtered
versions of Plankton data available for public consumption.