What are the most important things to make cacheable?
A good strategy is to identify the most popular, largest objects (especially images) and work with them first.
How can I make my pages as fast as possible with caches?
The most cacheable object is one with a long freshness time set. Validation does help reduce the time that it takes to see an object, but the cache still has to contact the origin server to see if it's fresh. If the cache already knows it's fresh, it will be served directly.
I understand that caching is good, but I need to keep statistics on how many people visit my page!
If you must know every time a page is accessed, select ONE small object on a page (or the page itself), and make it uncacheable, by giving it a suitable headers. For example, you could refer to a 1x1 transparent uncacheable image from each page. The Referer header will contain information about what page called it.
Be aware that even this will not give truly accurate statistics about your users, and is unfriendly to the Internet and your users; it generates unnecessary traffic, and forces people to wait for that uncached item to be downloaded. For more information about this, see On Interpreting Access Statistics in the references.
I've got a page that is updated often. How do I keep caches from giving my users a stale copy?
The Expires header is the best way to do this. By setting the server to expire the document based on its modification time, you can automatically have caches mark it as stale a set amount of time after it is changed.
For example, if your site's home page changes every day at 8am, set the Expires header for 23 hours after the last modification time. This way, your users will always get a fresh copy of the page.
See also the Cache-Control: max-age header.
How can I see which HTTP headers are set for an object?
To see what the Expires and Last-Modified headers are, open the page with Netscape and select 'page info' from the View menu. This will give you a menu of the page and any objects (like images) associated with it, along with their details.
To see the full headers of an object, you'll need to manually connect to the Web server using a Telnet client. Depending on what program you use, you may need to type the port into a separate field, or you may need to connect to www.myhost.com:80 or www.myhost.com 80 (note the space). Consult your Telnet client's documentation.
Once you've opened a connection to the site, type a request for the object. For instance, if you want to see the headers for http://www.myhost.com/foo.html, connect to www.myhost.com, port 80, and type:
GET /foo.html HTTP/1.1 [return] Host: www.myhost.com [return][return]
Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full object. To see the headers only, substitute HEAD for GET.
My pages are password-protected; how do proxy caches deal with them?
By default, pages protected with HTTP authentication are marked private; they will not be cached by shared caches. However, you can mark authenticated pages public with a Cache-Control header; HTTP 1.1-compliant caches will then allow them to be cached.
If you'd like the pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client's authentication information to the origin server before releasing the object from the cache.
Whether or not this is done, it's best to minimize use of authentication; for instance, if your images are not sensitive, put them in a separate directory and configure your server not to force authentication for it. That way, those images will be naturally cacheable.
Should I worry about security if my users access my site through a cache?
SSL pages are not cached (or unencrypted) by proxy caches, so you don't have to worry about that. However, because caches store non-SSL requests and URLs fetched through them, you should be conscious of security on unsecured sites; an unscrupulous administrator could conceivably gather information about their users.
In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and user their login.
If you're aware of the issues surrounding Web security in general, you shouldn't have any surprises from proxy caches.
I'm looking for an integrated Web publishing solution. Which ones are cache-aware?
It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are ones which dynamically generate all content and don't provide validators; they may not be cacheable at all. Speak with your vendor's technical staff for more information, and see the Implementation notes below.
My images expire a month from now, but I need to change them in the caches now!
The Expires header can't be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the objects, the cached copy will be used until then.
The most effective solution is to rename the files; that way, they will be completely new objects, and loaded fresh from the origin server. Remember that the page that refers to an object will be cached as well. Because of this, it's best to make static images and similar objects very cacheable, while keeping the HTML pages that refer to them on a tight leash.
If you want to reload an object from a specific cache, you can either force a reload (in Netscape, holding down shift while pressing 'reload' will do this by issuing a Pragma: no-cache request header) while using the cache. Or, you can have the cache administrator delete the object through their interface.
I run a Web Hosting service. How can I let my users publish cache-friendly pages?
If you're using Apache, consider allowing them to use .htaccess files, and provide appropriate documentation.
Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache-1m that will be cached for one month after access, and a /no-cache area that will be served with headers instructing caches not to store objects from it.
Whatever you are able to do, it is best to work with your largest customers first on caching. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.
Sun Sep 10 20:46:50 EDT 2006