Web Caching



A Note About HTTP

HTTP 1.1 compliance is mentioned several times in this document. As of the time it was written, the protocol is a work in progress. Because of this, it is virtually impossible for an application (whether a server, proxy or client) to be truly compliant. However, the protocol has been openly discussed for some time, and feature-frozen for enough time to allow developers to use the ideas contained in it, like Cache-Control and ETags. When HTTP 1.1 is final, expect more vendors to openly state that their applications are compliant.

Implementation Notes - Web Servers

Generally speaking, it's best to use the latest version of whatever Web server you've chosen to deploy. Not only is it likely to contain more cache-friendly features, but new versions also usually include important security and performance improvements.

Apache 1.3

Apache (http://www.apache.org/) uses optional modules to set headers such as Expires and Cache-Control. Both modules are available in the 1.2 or greater distribution.

The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out if the modules are enabled in your server, find the httpd binary and run httpd -l; this should print a list of the available modules. The modules we're looking for are mod_expires and mod_headers.

  • If they aren't available, and you have administrative access, you can recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the Configuration file, or by passing the --enable-module=expires and --enable-module=headers arguments to configure (1.3 or greater). Consult the INSTALL file found with the Apache distribution.

Once you have an Apache with the appropriate modules, you can use mod_expires to specify when objects should expire, either in .htaccess files or in the server's access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See http://www.apache.org/docs/mod/mod_expires.html for more information, and speak with your local Apache guru if you have trouble.

To apply Cache-Control headers, you'll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers for a resource. See http://www.apache.org/docs/mod/mod_headers.html

Here's an example .htaccess file that demonstrates the use of some headers.

  • .htaccess files allow web publishers to use commands normally only found in configuration files. They affect the content of the directory they're in and their subdirectories. Talk to your server administrator to find out if they're enabled.
### activate mod_expires
ExpiresActive On
### Expire .gif's 1 month from when they're accessed
ExpiresByType image/gif A2592000
### Expire everything else 1 day from when it's last modified
### (this uses the Alternative syntax)
ExpiresDefault "modification plus 1 day"
### Apply a Cache-Control header to index.html
<Files index.html>
Header append Cache-Control "public, must-revalidate"
</Files>
  • Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.
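
The arithmetic behind the two Expires directives above can be sketched in Python (chosen here only for illustration; Apache does this work internally). The function name expiry_headers and the sample timestamps are assumptions of this sketch, not part of Apache. A2592000 counts 2592000 seconds (30 days) from the access time, while "modification plus 1 day" counts 86400 seconds from the file's modification time. Clamping max-age at zero is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(base_time, seconds, now):
    """Return (Expires value, Cache-Control value) for an object that
    expires `seconds` after `base_time`, for a response sent at `now`.

    All datetimes must be timezone-aware and in UTC.
    """
    expires_at = base_time + timedelta(seconds=seconds)
    # max-age is relative to the moment the response is sent.
    # Assumption of this sketch: never report a negative max-age.
    max_age = max(0, int((expires_at - now).total_seconds()))
    return format_datetime(expires_at, usegmt=True), "max-age=%d" % max_age


if __name__ == "__main__":
    # Hypothetical request time and file modification time.
    accessed = datetime(1998, 10, 29, 17, 4, 19, tzinfo=timezone.utc)
    modified = datetime(1998, 10, 29, 5, 4, 19, tzinfo=timezone.utc)

    # ExpiresByType image/gif A2592000 (base is the access time)
    print(expiry_headers(accessed, 2592000, accessed))
    # ExpiresDefault "modification plus 1 day" (base is the mtime)
    print(expiry_headers(modified, 86400, accessed))
```

With a modification-based rule, the remaining max-age shrinks as the file ages, which is why the second call yields a smaller value than a full day.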

Netscape Enterprise 3.6

Netscape Enterprise Server (http://www.netscape.com/) does not provide any obvious way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take advantage of Cache-Control settings you make.

To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click 'OK'. For more information, see http://developer.netscape.com/docs/manuals/enterprise/admnunix/content.htm#1006282

MS IIS 4.0

Microsoft's Internet Information Server (http://www.microsoft.com/) makes it very easy to set headers in a somewhat flexible way. Note that this is only possible in version 4 of the server, which will run only on NT Server.

To specify headers for an area of a site, select it in the Administration Tools interface, and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas: Enable Content Expiration and Custom HTTP headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.

See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.

Lotus Domino R5

Lotus' (http://www.lotus.com/) servers are notoriously difficult to cache; they don't provide any validators, so both browser and proxy caches can only use default mechanisms (i.e., once per session, and a few minutes of 'fresh' time, usually) to cache any content from them.

Even if this limitation is overcome, Notes' habit of referring to the same object by different URLs (depending on a variety of factors) bars any measurable gains. There is also no documented way to set an Expires, Cache-Control or other arbitrary HTTP header.

Implementation Notes - Server-Side Scripting

Because the emphasis in server-side scripting is on dynamic content, it doesn't make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting an Expires header, even if just for a few hours. Most users access pages again in a relatively short period of time. For instance, when users hit the 'back' button, if there isn't any validator or freshness information available, they'll have to wait until the page is re-downloaded from the server to see it.

  • One thing to keep in mind is that it may be easier to set HTTP headers with your Web server rather than in the scripting language. Try both.

CGI

CGI scripts are one of the most popular ways to generate content. You can easily add HTTP response headers by printing them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:

#!/usr/bin/perl
print "Content-type: text/html\n";
print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
print "\n";
### the content body follows...

Since it's all text, you can easily generate Expires and other date-related headers with built-in functions. It's even easier if you use Cache-Control: max-age:

print "Cache-Control: max-age=600\n";

This will make the script cacheable for 10 minutes after the request, so that if the user hits the 'back' button, they won't be resubmitting the request.
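
As a minimal sketch of generating both headers with built-in functions, here is the same idea in Python rather than Perl. The function names cache_headers and render are hypothetical, and sending Expires alongside max-age (so that HTTP 1.0 caches also get freshness information) is a choice made for this sketch.

```python
import time
from email.utils import formatdate


def cache_headers(max_age, now=None):
    """Build CGI response header lines that keep the output fresh for
    `max_age` seconds, counted from `now` (defaults to the current time)."""
    if now is None:
        now = time.time()
    return [
        "Content-Type: text/html",
        # formatdate(..., usegmt=True) produces an HTTP-style GMT date.
        "Expires: " + formatdate(now + max_age, usegmt=True),
        "Cache-Control: max-age=%d" % max_age,
    ]


def render(body, max_age=600, now=None):
    """Return the full CGI output: headers, a blank line, then the body."""
    return "\n".join(cache_headers(max_age, now)) + "\n\n" + body


if __name__ == "__main__":
    print(render("<p>Hello</p>"))
```

Because the Expires date is computed from the same max_age value, the two headers cannot drift apart.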

The CGI specification also makes request headers that the client sends available in the environment of the script; each header name is converted to upper case, has its hyphens replaced with underscores, and has 'HTTP_' prepended to it. So, if a client makes an If-Modified-Since request, it may show up like this:

HTTP_IF_MODIFIED_SINCE = Fri, 30 Oct 1998 14:19:41 GMT 
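
A script can use this variable to answer a validation request without regenerating the page. The following Python sketch shows one way to do so, under stated assumptions: the names cgi_variable_name and is_not_modified are hypothetical, and the script is assumed to know when its content last changed. When is_not_modified returns True, the script could send a "Status: 304 Not Modified" header and no body.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cgi_variable_name(header_name):
    """Map a request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def is_not_modified(environ, last_modified):
    """Return True if the client's copy is still current.

    `environ` is the CGI environment (a dict such as os.environ);
    `last_modified` is a timezone-aware UTC datetime.
    """
    header = environ.get(cgi_variable_name("If-Modified-Since"))
    if not header:
        return False
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # An unparseable date is treated as if no header had been sent.
        return False
    if since.tzinfo is None:
        # Assumption of this sketch: a date without a zone is taken as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since


if __name__ == "__main__":
    env = {"HTTP_IF_MODIFIED_SINCE": "Fri, 30 Oct 1998 14:19:41 GMT"}
    changed = datetime(1998, 10, 30, 14, 0, 0, tzinfo=timezone.utc)
    print(is_not_modified(env, changed))
```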

Server Side Includes

SSI (often used with the extension .shtml) is one of the first ways that Web publishers were able to get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting was available.

Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache's implementation does allow users to specify which SSI files can be cached, by setting the group execute permissions on the appropriate files, combined with the XbitHack full directive. For more information, see http://www.apache.org/docs/mod/mod_include.html

PHP 3

PHP (http://www.php.net/) is a server-side scripting language that, when built into the server, can be used to embed scripts inside a page's HTML, much like SSI, but with a far larger number of options. PHP can be used as a CGI script on any Web server (Unix or Windows), or as an Apache module.

By default, objects processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.

For example, this will create a Cache-Control header, as well as an Expires header three days in the future:

<?php
  Header("Cache-Control: must-revalidate");

  $offset = 60 * 60 * 24 * 3;
  $ExpireString = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
  Header($ExpireString);
?>

Remember that the Header() function MUST come before any other output.

As you can see, you'll have to create the HTTP date for an Expires header by hand; PHP doesn't provide a function to do it for you. Of course, it's easy to set a Cache-Control: max-age header, which is just as good for most situations.

For more information, see http://www.php.net/manual/function.header.php3

Cold Fusion 4.0

Cold Fusion, by Allaire (http://www.allaire.com/) is a commercial server-side scripting engine, with support for several Web servers on Windows and Solaris.

Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the CFHEADER tag. Unfortunately, setting date-related headers in Cold Fusion isn't as easy as Allaire's documentation leads you to believe; their example for setting an Expires header, shown below, won't work.

<CFHEADER NAME="Expires" VALUE="#Now()#">

It doesn't work because the time (in this case, when the request is made) doesn't get converted to an HTTP-valid date; instead, it just gets printed as a representation of Cold Fusion's Date/Time object. Most clients will either ignore such a value, or convert it to a default, like January 1, 1970.

Cold Fusion's date formatting functions make it difficult to generate a date that is HTTP-valid; you'll need to either use a combination of DateFormat, Hour, Minute and Second, or roll your own. Of course, you can still use the CFHEADER tag to set Cache-Control: max-age and other headers.
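
For reference, this is the shape that a hand-rolled HTTP date has to take, sketched in Python rather than Cold Fusion. The function name http_date is hypothetical; the point is that the day and month names are fixed English abbreviations, every numeric field is zero-padded, and the time is expressed in GMT.

```python
from datetime import datetime

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def http_date(utc_time):
    """Format a datetime (already in UTC) as an HTTP date string,
    assembling it field by field instead of relying on locale settings."""
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        DAYS[utc_time.weekday()],      # weekday() counts Monday as 0
        utc_time.day,
        MONTHS[utc_time.month - 1],
        utc_time.year,
        utc_time.hour,
        utc_time.minute,
        utc_time.second,
    )


if __name__ == "__main__":
    print(http_date(datetime(1998, 10, 29, 17, 4, 19)))
```

The same field-by-field assembly is what a combination of DateFormat, Hour, Minute and Second would need to reproduce.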

Also, remember that Web server headers are passed through with some implementations (such as CGI); check yours to determine whether you can use this to your advantage, by setting headers on the server instead of in Cold Fusion.

ASP

Active Server Pages, built into IIS and now becoming available in other implementations, also allow you to set HTTP headers. For instance, to set an expiry time, use the properties of the Response object in your page, like this:

<% Response.Expires=1440 %>

specifying the number of minutes from the request to expire the object. Likewise, an absolute expiry time can be set like this (make sure you format the HTTP date correctly):

<% Response.ExpiresAbsolute=#May 31,1996 13:30:15 GMT# %>

Cache-Control headers can be added like this:

<% Response.CacheControl="public" %>
  • When setting HTTP headers from ASPs, make sure you either place the Response method calls before any HTML generation, or use Response.Buffer to buffer the output.
  • Note that ASPs set a Cache-Control: private header by default, and must be declared public to be cacheable by HTTP 1.1 shared caches. While you're at it, consider giving them an Expires header as well.


Cache Now!

 Copyright ©1999-2006
 All rights reserved.

Last modified: Sun Sep 10 20:46:50 EDT 2006
Comments, corrections, and suggestions gratefully accepted.