Why do research in Web Caching?

There are a number of reasons to do research in web caching. Many of them were brought up in a discussion on the ircache mailing list. For the original posts, see the thread on 'free bandwidth' in the ircache archives for September 1998.

I generally divide the motivations for web caching research into short-term and long-term ones. In the short term, there is the desire to improve web caching systems because they affect the performance of the web today. This includes:

  • Reducing the cost of connecting to the Internet. Traffic on the web consumes more bandwidth than any other Internet service, and so any method that can reduce the bandwidth requirements is particularly welcome in parts of the world in which telecommunication services are much more expensive than in the U.S., and in which service is often provided on a cost-per-bit basis. Even when telecommunication costs are reasonable, a large percentage of traffic is destined to or from the U.S. and so must use expensive trans-oceanic network links.

  • Reducing the latency of today's WWW. One of the most common end-user desires is for more speed. Many people believe web caching can help reduce the "World Wide Wait", and so research that reduces user-perceived latency is quite welcome. Latency improvements are most noticeable, again, in areas of the world in which data must travel long distances: such data accumulates significant latency from speed-of-light constraints, accumulates processing time in many systems over many network hops, and is more likely to experience congestion as more networks are traversed to cover such distances. High latency as a result of speed-of-light constraints is particularly taxing in satellite communications.

In the long term, one might argue that research in web caching is not necessary, as the cost of bandwidth continues to drop. However, research in web caching will continue to reap benefits for the following reasons:

  • Bandwidth will always have some cost. The cost of bandwidth will never reach zero, even though costs are currently going down as competition increases, the market grows, and economies of scale contribute. No matter what the cost for bandwidth, one will always want to maximize the return on investment, and caching will often help.

  • Non-uniform bandwidth and latencies. Because of physical limitations such as environment and location, as well as financial limitations, there will always be variations in bandwidth and latencies. Caching can help to smooth these effects.

  • Bandwidth demands continue to increase. New users are still getting connected to the Internet in large numbers. Even as growth in the user base slows, demand for increased bandwidth will continue as high-bandwidth media such as audio and video increase in popularity. If the price is low enough, demand will always outstrip supply. Additionally, as the availability of bandwidth increases, user expectations are also likely to increase.

  • Hot spots in the web will continue. Some user demand can be predicted (such as for the latest version of a free web browser), and thus can be handled by intelligent load balancing that distributes content among many systems and networks. Other popular web destinations come as a surprise, sometimes as a result of current events, but also potentially just as a result of desirable content and word of mouth. These 'hot spots' will continue to affect availability and response time and can be alleviated through web caching.

  • Communication vs. computation. Communication is likely to always be more expensive (to some extent) than computation. We can build CPUs that are much faster than main memory, and so memory caches are utilized. Likewise, caches will continue to be used as computer systems and network connectivity both get faster.

There are a number of parallels to the bandwidth situation; one is that of main memory in computer systems. The cost of RAM has decreased tremendously over the past decades. Yet relatively few people would claim to have enough, and in fact, demand for additional memory to handle larger applications continues unabated. Therefore, virtual memory (a form of caching) continues to be used strategically instead of purchasing additional memory.

Of course, there are secondary benefits to web caching as well. These include reduced load on the originating servers, and improved reliability, as objects may be available in a cache even when the original web server is inaccessible.

