Web Caching

Tips for Building a Cache-Aware Site

Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.

  • Refer to objects consistently - this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use /index.html in your HTML as a reference once, always use it that way.
  • Use a common library of images and other elements and refer back to them from different places.
  • Make caches store images and pages that don't change often by specifying a far-away Expires header.
  • Make caches recognize regularly updated pages by specifying an appropriate expiration time.
  • If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time.
  • Don't change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don't copy over the entire site; just move the files that you've changed.
  • Use cookies only where necessary - cookies are difficult to cache, and aren't needed in most situations. If you must use a cookie, limit its use to dynamic pages.
  • Minimize use of SSL - because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
  • Use the Cacheability Engine - it can help you apply many of the concepts in this tutorial.
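
The tip about renaming a resource when it changes can be sketched in a few lines. This is only an illustration, not a prescribed method: deriving the new name from a hash of the content is one possible convention among several (a version number or date would work as well), and the names versioned_name, far_future_headers, and ONE_YEAR are hypothetical.

```python
import hashlib
import time
from email.utils import formatdate
from typing import Dict, Optional

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an arbitrary "far future" choice


def versioned_name(filename: str, content: bytes) -> str:
    """Return filename with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def far_future_headers(now: Optional[float] = None) -> Dict[str, str]:
    """Freshness headers for a resource whose name changes whenever it does."""
    now = time.time() if now is None else now
    return {
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        "Cache-Control": f"max-age={ONE_YEAR}",
    }
```

Because the name changes whenever the content does, only the page that links to the file needs a short expiry time; the file itself can safely be served with the long-lived headers.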

Writing Cache-Aware Scripts

By default, most scripts won't return a validator (e.g., a Last-Modified or ETag HTTP header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.

Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what's in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn't.

  • The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
  • Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it's probably easiest to do so with Cache-Control: max-age, which keeps the response fresh for a set amount of time after it is requested.
  • If you can't do that, you'll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task.
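
To make the last two approaches concrete, here is a minimal sketch of a script that sets Cache-Control: max-age, generates validators, and answers conditional requests with 304 Not Modified. It assumes the request headers arrive as a plain dictionary and that the script knows when its data last changed; the function name respond, its parameters, and the hash-based ETag are illustrative choices, not part of any particular server's API.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, body, last_modified, max_age=300):
    """Return (status, headers, body), honoring conditional request headers.

    request_headers: dict of request header names to values (an assumption)
    last_modified:   Unix timestamp of the last change to the underlying data
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }

    not_modified = False
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is not None:
        # Simplified parsing: split on commas and ignore the weak "W/" prefix.
        tags = [t.strip() for t in inm.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        not_modified = "*" in tags or etag in tags
    elif ims is not None:
        try:
            since = parsedate_to_datetime(ims)
            if since.tzinfo is None:  # treat a zone-less date as GMT
                since = since.replace(tzinfo=timezone.utc)
            not_modified = int(last_modified) <= since.timestamp()
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: send the full response

    if not_modified:
        return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

A cache that already holds the response can send back the ETag it received in If-None-Match; when the content is unchanged, the script answers with an empty 304 instead of sending the body again.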

Some other tips:

  • If you have to use scripting, don't POST unless it's appropriate. The POST method is (practically) impossible to cache; if you send information in the path or query (via GET), caches can store that information for the future. POST, on the other hand, is good for sending large amounts of information to the server (which is why it won't be cached; it's very unlikely that the exact same POST will be made twice).
  • Don't embed user-specific information in the URL unless the content generated is completely unique to that user.
  • Don't count on all requests from a user coming from the same host, because caches often work together.
  • Generate Content-Length response headers. It's easy to do, and it will allow the response of your script to be used in a persistent connection. This allows a client (whether a proxy or a browser) to request multiple objects on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.
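
As a rough illustration of the GET and Content-Length tips, the sketch below is a small WSGI application that reads its input from the query string and declares the length of its response. The application name app, the query parameter q, and the ten-minute max-age are assumptions made for the example.

```python
from urllib.parse import parse_qs


def app(environ, start_response):
    """Answer a GET such as /search?q=term with a sized, cacheable response."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0]

    # Encode first: Content-Length counts bytes, not characters.
    body = f"Results for: {term}\n".encode("utf-8")

    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "max-age=600"),
    ])
    return [body]
```

Since everything the script needs is in the URL, two users who search for the same term request the same URL, and a shared cache can answer the second one.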

See the Implementation Notes for more specific information.


Last modified: Sun Sep 10 20:46:50 EDT 2006
