Despite these drawbacks, the user-friendly interface
and the visually informative and dynamic nature
of both the standalone and the applet version are
vast improvements over currently available tools,
and provide a good basis for future analysis.
Keywords: Web Caching, Visualization, Java
By extending this concept across a cooperative mesh, one can form cache meshes, or hierarchies, to distribute load and leverage content among caches, further increasing overall performance. For these reasons, rapid growth in the demand for web caching has led to many research, development, and deployment projects in the last few years. One well-known caching project in the United States, the NSF-sponsored NLANR root cache system has been developing and deploying a prototype global web caching hierarchy since 1995.
With root caches located at each node of the NSF's very high performance Backbone Network Service (vBNS), the NLANR caches have experienced a steady increase in use. This growth can be seen in figure 1:
Currently over a thousand caches in over 45 countries participate in the NLANR caching hierarchy, with around four million requests per day reaching the NLANR root caches. The growth rate has clarified the need for a more organized structure that supports a scalable global Internet information distribution system.
Web caching can serve at least three purposes: content control; increased performance in terms of reduced external bandwidth required; and faster response times (lower latency). Web caching activities have grown much faster outside the United States than domestically, primarily due to the relatively higher cost of bandwidth -- typically 10-100 times the cost in the US. A few countries (i.e. China and Singapore) even mandate the government-sponsored use of caching to allow content control by blocking web pages from particular sites.
In the next section we describe the first prototype visualization of the underlying structure of the NLANR caching hierarchy, Planet Cache, undertaken in 1996. Section 3 covers the design and implementation of Plankton, the follow-on visualization tool to Planet Cache. Section 4 describes the Plankton user interface and options. We conclude with a discussion of the results of this visualization and possible future directions.
Using data collected at the NLANR caches by
their web caching software (Squid) [4],
Munzner created a 3D model of the hierarchy with nodes
on the globe representing caches and lines reflecting
peering relationships between them. The color of the
lines indicated the number of HTTP requests received
and the node positions reflected the latitude and longitude
of the participating caches. NLANR stored VRML images
reflecting the configuration for each day, posted them
on the web for analysis, and created a video animating
the change in configured hierarchy over the course of the
project.
Figure 2 shows several caches in Europe and Asia peering with between 4 and 6 NLANR caches, so that every request to the NLANR system results in transmission of a separate query to each cache. Although this behavior may maximize the client's chance of getting a HIT response quickly, it is not in the best interests of either the user or the system as a whole. What really matters is how long it takes to retrieve the entire document, not just the single packet acknowledging the HIT.
Since the NLANR caches are connected via the vBNS, they effectively behave as a single distributed cache with six access points. Therefore, even if a document is only in a west coast NLANR cache, a cache client in Europe will get better response time by retrieving it via an east coast NLANR cache, which can leverage the relatively uncongested vBNS rather than having to use the often overloaded commodity Internet.
Insight gained from this initial visualization resulted in NLANR's cache access restriction policies, designed to force a more hierarchical global cache topology. Limiting access to the closest (throughput-wise) of the NLANR caches significantly reduced the number of duplicated transoceanic requests to the NLANR cache system, and at the same time decreased response time delay for requests for documents already stored in any NLANR cache.
While this investigation proved useful at the time, it depicted only a general sense of topology, limited to geographical illustration of the architecture, rather than allowing display of the hierarchical complexity of the virtual global cache mesh. Furthermore, the original tool could not zoom in to narrow the field of focus.
Our design goal for Plankton was to fill in this gap in functionality, to provide the ability to see the complete hierarchy either geographically or topologically, to focus in on a specific part, to dynamically alter the graph layout, and to explore the changes in the network over time. Such visualization will facilitate our ability to understand and improve on current load balancing and resource distribution as well as to predict future growth.
One restriction of the ``Planet Cache'' visualization was that we could only show our direct neighbor caches (i.e. the other caches which connect to ours), since we were limited to data generated from only our daily access logs. One goal of Plankton was to allow viewing of more than our direct neighbors. We modified Squid to log the X-Forwarded-For header from HTTP requests, which allows us to retain knowledge of the identity of our client's clients, and so on. This header, invented for Squid, lists the IP addresses of the requesting client(s). Each cache (optionally) appends the requesting client's IP address to this list. A cache administrator may choose not to add its client's address, in which case the label `unknown' will appear instead.
The first address in the X-Forwarded-For header is most likely a Web browser of an end user. Because we are only interested in visualizing Web caches, not end user browsers, we remove the first address of every list. Additionally, because Squid logs the X-Forwarded-For information as received from the client, it does not include our cache's own IP address, so we append this latter address to the list.
To create a complete view of the hierarchy we combine data from all seven NLANR caches, with resulting file sizes that range from 2 to 4 Mbytes, large enough to render download times problematic. These data sets also contain fully qualified domain names (hostnames) of caches within the hierarchy. We realize that many people use Web proxies to achieve some level of anonymity. Thus, to protect their privacy and identities, we only make smaller, filtered versions of Plankton data available for public consumption.
CGI scripts provide another level of user control over the data set, but without sufficient user interactivity, particularly in the ability to dynamically modify the caching topology layout. Achieving this level of interactivity using a standard CGI-script interface would have been difficult for the programmer and for the user, the latter of whom would experience a large wait time between user action and feedback while data was in transit across the net.
Choosing Java had many other advantages as well. First, Java allows us to implement both an applet and a standalone application. An applet is a complete set of code designed to be downloaded across the network for safe execution by a browser on a client machine. An applet provides advantages of both faster response time to the user as well as decreased network traffic. In contrast, a standalone program is a more canonical, independent executable piece of code, local to a one's system and able to run independently from other applications.
So Java allows the same source code to provide the advantages of an applet's wide-spread (`all-you-need's-a-browser') accessibility to the data set, and the standalone program's improved performance without the requirement for Internet connectivity.
The algorithm begins by determining whether there are any root caches in the data set. The default roots are the NLANR caches, but the user can specify any particular nodes as roots while the program is running. If Plankton does not find the root caches, it selects as root the node with the most children. If there is only one root node, Plankton places it in the center of the display window; otherwise Plankton places the root nodes evenly around the circumference of a circle centered in the middle of the display window. The radius of each node from the center is proportional to its number of children.
In figure 3, A is further from the center because
it has children and both B and C do not.
Figure 4 illustrates the default case when no root node is present;
As Plankton places each root node in the image, it also puts
the node's data structure into a queue that it can pass to a
function for processing. This function pops the node
from the queue and puts all of its as yet unplaced children
into an array. Plankton then places each node in this array
across a base angle that defaults to 180 degrees (see figure 5)
unless there is only a single root node, in which case it uses
360 degrees (figure 6).
Figure 5. Plankton base angle 180 | Figure 6. Plankton base angle 360 |
---|---|
![]() |
![]() |
The distance from the parent to a given child node is a weighted sum of the number of children of the parent plus the number of children of the node itself. ( (node's radius) = (# of node's parent's children) + (# node's children) ).
After having drawn the position of the node on the display, Plankton puts the nodes at the end of the queue, takes the next node off the top of the queue, and starts the cycle again. The size of a child node is one less than the size of its parent, which renders node size an indicator of its distance from a root node. Because the placement algorithm uses the size of the node when it calculates distance from the parent, decreasing size allows more nodes to fit on the screen without overlapping.
In the figure 7, F is further away from D than E because
F has more children than E does. Likewise,
F's children are further from F than D's
children are from D because F has more children than D.
Figure 8 shows the complete NLANR caching hierarchy (February 7, 1998).
Plankton draws each cache on the display by converting the latitude and longitude into an (x,y) coordinate, mapped in the display area. This (x,y) coordinate pair changes as the display changes in size, such as when the user zooms in or out, causing a potentially lengthy recomputation. Plankton then draws each cache node. If multiple caches map to the same point, either because their lat/long is the same or because the display's current resolution maps them to the same (x,y) point, Plankton represents the whole group with a single node. The corresponding textual display for this node would then contain summary data for all the caches it represents. The lines drawn between the nodes represents peering between these caches.
The overall NLANR caching hierarchy, drawn in geographical layout,
is shown below:
The user interface feature set includes:
We describe each in more detail below.
By studying the way the caching hierarchy changes over time, it is possible to formulate future trends, which can provide greater insight into trends and changes in network usage:
Answers to these questions will help network administrators decide where to best allocate resources to achieve an effective global caching hierarchy. Combining free Internet access, dynamic topology depictions, and the ability to view the data temporally, we are able to offer a platform through which researchers and network architects can gain new insights into web caching.
The current data displayed in the public version of the tools conservatively identifies caches with labels such as client37 or client152 to avoid any possible privacy sensitivities; we hope to have future releases append top level domain information, yielding names like client37.au and client152.com, which will allow both the standalone and publically accessible applet versions to color nodes by top level domain.
The research described here is in part based on funding from two National Science Foundation support grants: the NLANR Cache Hierarchy Project, NCR-9796082 and the CAIDA Project, NCR-9711092.
[1] | http://ircache.nlanr.net |
  | |
[2]   |
T. Munzner, E. Hoffman, K. Claffy, "Planet Cache: visualization of the NLANR caching hierarchy", http://ircache.nlanr.net/Cache/cacheviz.html, NLANR, 1996 |
  | |
[3]   |
K Claffy, A. Rousskov, D. Wessels, "NLANR cache hierarchy statistics, visualization, and roi", https://www.caida.org/Presentations/Nanog9802/, Nanog Presentation, 1998 |
  | |
[4] | http://squid.nlanr.net/ |
  | |
[5] | E. Shaffer, "Host name to Latitude/Longitude", http://cello.cs.uiuc.edu/cgi-bin/slamm/ip2ll/ |
  | |
[6]   |
C. Davis, P. Vixie, T. Goodwin, I. Dickinson, "A Means for Expressing Location Information in the Domain Name System", RFC 1876, 1996 |
  | |
[7] | T. Munzner, E. Hoffman, K. Claffy, "IEEE Symposium on Information Visualization", 1996 |
  | |
[8]   |
R. Becker, S. Erick, A. Wilks "IEEE Transactions on Visualization and Computer Graphics", Vol 1, No. 1, pp. 16-21, 1995 |
  | |
[9] | http://www.javasoft.com/products/jdk |
Jaeyeon Jung is a PhD student in Computer Science at the Korea Advanced Institute of Science and Technology. She is currently a visiting researcher at CAIDA (Cooperative Association for Internet Data Analysis). Her research interests include caching visualization, web proxy benchmarks, and web cache statistics.
Evi Nemeth is leading CAIDA's Internet Engineering Curriculum Repository project. She is a visiting faculty member in the Computer Science Department at the University of California, San Diego on leave from the University of Colorado, Boulder.
Duane Wessels heads UCSD's NLANR (National Laboratory for Applied Network Research) based at the National Center for Atmospheric Research in Boulder, Colorado. He is researching and developing a global Web caching infrastructure using the Squid Cache software. Other interests include Internet measurement, multicast, and missions to Mars.
k claffy is a research scientist with CAIDA (Cooperative Association for Internet Data Analysis) based at the University of California's San Diego Supercomputer Center (SDSC). Her research interests include traffic analysis, impact of high-demand (e.g., multimedia) applications on the integrity of current infrastructure, equity among users, and changing financial structure of the Internet.