Background

A while ago I mentioned that I was working on client-side HTTP caching (using Ehcache) for REST clients. After a short hiatus, I’m back to finish that work, specifically cache validation support (using the ETag, Last-Modified, If-None-Match, and If-Modified-Since headers). I also explained the approach I was taking to implement the solution, using Java’s ResponseCache mechanism.

However, I think I’ve hit a dead end implementing the solution using that approach. I will try to explain it here and hope that smarter people out there provide their thoughts.

Overview

Let’s start with a simple, straightforward scenario. Here is the control flow of the Java protocol handler approach:

  1. A client application gets an instance of sun.net.www.protocol.http.HttpURLConnection, which extends java.net.HttpURLConnection, via url.openConnection(). This handler picks up the registered java.net.ResponseCache instance, if one is available.
  2. When a request is sent to the server via HttpURLConnection, the protocol handler first checks whether the representation is present in the cache by calling the get() method of the ResponseCache. If it is in the cache, the handler returns it to the client; otherwise it sends the request to the origin server.
  3. If the request is sent to the origin server and the response has a 200, 301, or other “cacheable” status, the handler then calls the put() method of the ResponseCache to potentially cache the representation.
  4. The ResponseCache stores the element in the cache. It uses the Expires, Date, and Cache-Control headers to determine the time to live and sets it on the element. Let’s ignore the expiration model for this post, as the focus is on validation.

Note: You have to write your own concrete implementation for ResponseCache to store and retrieve elements from the cache. Java doesn’t provide an out-of-the-box implementation for it, but it provides a framework for doing so.
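
As a rough illustration of the note above, here is a minimal in-memory sketch of such an implementation. The class name InMemoryResponseCache and its Entry holder are made up for this example (they are not part of the JDK or Ehcache). The sketch keys entries by URI only and ignores expiration and validation entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class: a bare-bones ResponseCache backed by a map.
class InMemoryResponseCache extends ResponseCache {

    // Hypothetical holder for one cached response: headers plus body bytes.
    static final class Entry {
        final Map<String, List<String>> headers;
        final byte[] body;

        Entry(Map<String, List<String>> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final Map<URI, Entry> store = new ConcurrentHashMap<>();

    // Called by the protocol handler before it contacts the origin server.
    @Override
    public CacheResponse get(URI uri, String method,
                             Map<String, List<String>> requestHeaders) {
        if (!"GET".equals(method)) {
            return null; // only GET responses are served from this cache
        }
        final Entry entry = store.get(uri);
        if (entry == null) {
            return null; // cache miss: the handler goes to the origin server
        }
        return new CacheResponse() {
            @Override
            public Map<String, List<String>> getHeaders() {
                return entry.headers;
            }

            @Override
            public InputStream getBody() {
                return new ByteArrayInputStream(entry.body);
            }
        };
    }

    // Called by the protocol handler after it receives a cacheable response.
    @Override
    public CacheRequest put(final URI uri, URLConnection conn) {
        final Map<String, List<String>> headers = conn.getHeaderFields();
        // The handler writes the body into this stream; store it once closed.
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                store.put(uri, new Entry(headers, toByteArray()));
            }
        };
        return new CacheRequest() {
            @Override
            public OutputStream getBody() {
                return buffer;
            }

            @Override
            public void abort() {
                store.remove(uri); // drop anything stored for this URI
            }
        };
    }
}
```

Assuming this class, it would be registered once at startup with ResponseCache.setDefault(new InMemoryResponseCache()).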

Validating Cached Element

Now let’s look at a scenario that does cache validation. First, what is validation? There are two headers a server may send for validating the resource: a timestamp (Last-Modified) indicating when the resource was last changed, and an entity tag value (ETag). A server may choose to send only one of these headers, as both serve the same purpose.

In response to a request for resource X, a server sends one or both of these headers along with X’s representation. On any subsequent request for resource X, the client may honor these response headers by sending two headers of its own: If-Modified-Since (with the value of the Last-Modified header) and If-None-Match (with the value of the ETag header). The former asks the server to send the representation only if the resource has been modified since the Last-Modified time the client holds, and the latter asks it to send the representation only if the ETag no longer matches the value supplied.
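
On the client side, replaying the validators could look roughly like the sketch below. ConditionalGet and its open() helper are hypothetical names for illustration; setRequestProperty() and setIfModifiedSince() are the standard java.net.URLConnection methods. Nothing is sent until the caller connects or reads the response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: prepares a conditional GET from stored validators.
class ConditionalGet {

    /**
     * Opens (but does not send) a GET request carrying the validators
     * remembered from an earlier response for the same resource.
     *
     * @param etag               ETag value from the earlier response, or null
     * @param lastModifiedMillis Last-Modified as epoch millis, or 0 if unknown
     */
    static HttpURLConnection open(URL url, String etag, long lastModifiedMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        if (etag != null) {
            // Ask for the body only if the entity tag no longer matches.
            conn.setRequestProperty("If-None-Match", etag);
        }
        if (lastModifiedMillis > 0) {
            // The handler turns this into an If-Modified-Since header.
            conn.setIfModifiedSince(lastModifiedMillis);
        }
        return conn;
    }
}
```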

If there is a change in the resource, the server sends an updated representation with new values for the ETag and/or Last-Modified headers. This scenario works with no issues: you get a 200 response back, and the protocol handler handles it just fine (as in the straightforward scenario above). The issue I’m going to describe arises when the server determines that the resource has not changed and sends back a status 304, NOT MODIFIED, with no body.
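
To make the 304 case concrete, the sketch below shows what a caller (or cache) has to do with the outcome of a conditional GET. ValidationResult.resolve() is a hypothetical helper, not part of any library; it assumes the cached body, if any, is passed in as a byte array.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: decides which representation the caller should see.
class ValidationResult {

    /**
     * @param status       HTTP status of the conditional GET
     * @param responseBody body returned by the server (empty or null on 304)
     * @param cachedBody   previously cached body, or null if nothing is cached
     */
    static byte[] resolve(int status, byte[] responseBody, byte[] cachedBody) {
        if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
            if (cachedBody == null) {
                // A 304 is useless without a cached copy to fall back on.
                throw new IllegalStateException("304 received but nothing is cached");
            }
            return cachedBody;
        }
        // 200 and friends: the server sent a fresh representation.
        return responseBody;
    }
}
```

The exception branch is exactly the situation the rest of this post worries about: validators were sent, a 304 came back, and there is no cached representation to serve.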

The following sequence of events ends with a status code 304 from the server:

[Figure: Conditional GET]

Issues with Java’s HTTP Handler

  • A client or client-side cache should first check whether a cached representation is available before sending a conditional GET of this sort. (There is no point sending If-Modified-Since and/or If-None-Match headers if it doesn’t have a representation to fall back on.) Java’s cache handler framework using HttpURLConnection does not provide an option to do so. Let’s see the relevant source code of sun.net.www.protocol.http.HttpURLConnection, lines 399-410:
    // Set modified since if necessary
    long modTime = getIfModifiedSince();
    if (modTime != 0) {
        Date date = new Date(modTime);
        //use the preferred date format according to RFC 2068(HTTP1.1),
        // RFC 822 and RFC 1123
        SimpleDateFormat fo = new SimpleDateFormat(
            "EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fo.setTimeZone(TimeZone.getTimeZone("GMT"));
        requests.setIfNotSet("If-Modified-Since", fo
            .format(date));
    }
    

    The above block of code adds the If-Modified-Since header but never checks whether the representation is available in the cache.

  • I don’t see a reference to the If-None-Match header in the source. So if the client sends that header, it will be sent to the origin server without an availability check.
  • If there is no representation in the cache, the cache must be able to remove the validation headers from the request before it is sent to the origin server. I don’t see this framework supporting such behavior.
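
For illustration, the behavior I find missing in the last bullet could look something like this if the cache were allowed to rewrite request headers. ValidatorFilter.forOrigin() is a hypothetical helper written for this post; as far as I can tell, ResponseCache offers no hook where such a method could be called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: drops validators when there is nothing to validate.
class ValidatorFilter {

    /**
     * Returns a copy of the request headers to send to the origin server.
     * If no cached representation exists, the validation headers are removed
     * so that the server answers with a full 200 response instead of a 304.
     */
    static Map<String, List<String>> forOrigin(
            Map<String, List<String>> requestHeaders,
            boolean hasCachedRepresentation) {
        // Header names are case-insensitive, so compare them that way.
        Map<String, List<String>> out =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, List<String>> e : requestHeaders.entrySet()) {
            if (e.getKey() != null) { // skip a null key, if any
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        if (!hasCachedRepresentation) {
            out.remove("If-None-Match");
            out.remove("If-Modified-Since");
        }
        return out;
    }
}
```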

Thoughts??


