From wessels@nlanr.net Tue Mar 31 09:43:14 1998
Date: Mon, 30 Mar 1998 23:55:21 -0700
From: Duane Wessels
To: jjung@caida.org, bhuffake@caida.org
Cc: k claffy
Subject: Re: k'ified paper

I added paragraphs 2,3,4 and modified 5.

Data
--------------------

The raw data being visualized is collected only at the NLANR root caches. Because an HTTP request arrives at one of the root caches only after it has missed at all the caches along the way, the number of HTTP requests seen in the data is not representative of the number of requests made throughout the system. The root caches see only the misses that client caches direct at them. Log files from the client caches would be necessary for a more complete picture of the total number of HTTP requests.

One restriction of the ``Planet Cache'' visualization was that we could only show our direct neighbor caches (i.e. the other caches which connect to ours). This is because the visualization data was generated from our daily access logs.

With Plankton, however, we are able to see much more than our direct neighbors. We have knowledge of the identity of our clients' clients, and so on. Squid was modified to log the X-Forwarded-For header from HTTP requests. This header, invented for Squid, lists the IP addresses of the requesting client(s). Each cache (optionally) appends the requesting client's IP address to this list. A cache administrator may choose not to add the client's address, in which case the word `unknown' is used instead.

The first address in the X-Forwarded-For header is most likely a Web browser (the end user). Because we are only interested in visualizing Web caches, the first address of every list is removed. Additionally, because the X-Forwarded-For header is logged as received from the client, it does not include our cache's own IP address. We therefore append our cache's own IP address to the list.

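The two list adjustments described above (dropping the first address, then appending our own cache's address) can be sketched roughly as follows. This is only an illustration, not the code used to build the Plankton data: the function name `cache_path`, its parameters, and the sample addresses are hypothetical, and it assumes the logged header value is a comma-separated list.

```python
def cache_path(xff_header, own_ip):
    """Return the chain of cache addresses for one logged request.

    xff_header: X-Forwarded-For value as logged (assumed comma-separated).
    own_ip: address of the cache that wrote the log entry (hypothetical
            parameter; the header as received does not contain it).
    """
    # Split the list and drop surrounding whitespace and empty entries.
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    # The first entry is most likely the end user's browser, not a cache.
    caches = hops[1:]
    # Entries withheld by an administrator stay as the literal "unknown".
    # Finally, add the logging cache itself to the end of the chain.
    caches.append(own_ip)
    return caches


if __name__ == "__main__":
    # Documentation-range addresses, made up for the example.
    print(cache_path("10.0.0.1, 192.0.2.7, unknown", "198.51.100.1"))
```

Keeping `unknown` entries in place, rather than dropping them, preserves the number of hops in the chain even when a cache has withheld its client's address.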
In creating a complete view of the hierarchy, we combine the data from all 7 NLANR caches, which results in file sizes that range from 2 to 4 Mbytes, large enough to make download times problematic. These data sets also contain fully qualified domain names (hostnames) of caches within the hierarchy. We realize that many people use Web proxies to achieve some level of anonymity. Thus, to protect their privacy and identities, we only make smaller, filtered versions of Plankton data available for public consumption.