Posts Tagged Java

Java’s HTTP Handler and Cache Validation Issues

Background

A little while ago I mentioned that I was working on client-side HTTP caching (using Ehcache) for REST clients. After a short hiatus, I’m back to complete the unfinished business, specifically cache validation support (using the ETag, Last-Modified, If-None-Match, and If-Modified-Since headers). I also explained the approach I was taking to implement the solution, using Java’s ResponseCache mechanism.

However, I think I’ve hit a dead end implementing the solution using that approach. I will try to explain it here and hope that smarter people out there provide their thoughts.

Overview

Let’s start with a simple, straightforward scenario. Here is the control flow of the Java protocol handler approach:

  1. A client application gets an instance of sun.net.www.protocol.http.HttpURLConnection, which extends java.net.HttpURLConnection, via url.openConnection(). This handler picks up the registered instance of java.net.ResponseCache, if there is one available.
  2. When a request is sent to the server via HttpURLConnection, the protocol handler first checks whether the representation is present in the cache by calling the get() method of the ResponseCache. If it is in the cache, the handler returns it to the client; otherwise it sends the request to the origin server.
  3. If the request is sent to the origin server and the response status is 200, 301, or another “cache-able” status, the handler then calls the put() method of the ResponseCache to potentially cache the representation.
  4. The ResponseCache then stores the element in the cache. It uses the Expires, Date, and Cache-Control headers to determine the time to live and sets it on the element. Let’s ignore the expiration model for this post, as the focus is on validation.

Note: You have to write your own concrete implementation for ResponseCache to store and retrieve elements from the cache. Java doesn’t provide an out-of-the-box implementation for it, but it provides a framework for doing so.
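To make the get()/put() contract above concrete, here is a minimal sketch of such an implementation. It is not the Ehcache-backed code this post is about: InMemoryResponseCache and Entry are hypothetical names, the store is a plain in-memory map, and the sketch deliberately ignores expiration and cacheability rules.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory cache; a real one would delegate to Ehcache and honor expiry. */
class InMemoryResponseCache extends ResponseCache {

    /** One stored representation: response headers plus body bytes. */
    static final class Entry extends CacheResponse {
        final Map<String, List<String>> headers;
        final byte[] body;

        Entry(Map<String, List<String>> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }

        @Override public Map<String, List<String>> getHeaders() { return headers; }
        @Override public InputStream getBody() { return new ByteArrayInputStream(body); }
    }

    final Map<URI, Entry> store = new ConcurrentHashMap<>();

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> requestHeaders) {
        // Returning null tells the protocol handler to go to the origin server.
        return "GET".equals(method) ? store.get(uri) : null;
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) throws IOException {
        final Map<String, List<String>> headers = conn.getHeaderFields();
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        return new CacheRequest() {
            private boolean aborted;

            @Override public OutputStream getBody() {
                // The handler copies the response body into this stream;
                // the entry is stored once the stream is closed.
                return new FilterOutputStream(buffer) {
                    @Override public void close() throws IOException {
                        super.close();
                        if (!aborted) {
                            store.put(uri, new Entry(headers, buffer.toByteArray()));
                        }
                    }
                };
            }

            @Override public void abort() { aborted = true; }
        };
    }
}
```

Registering it is a single call, ResponseCache.setDefault(new InMemoryResponseCache()), made before the first connection is opened.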

Validating Cached Element

Now let’s look at a scenario that does cache validation. First, what is validation? There are two headers a server may send for validating the resource: a timestamp (Last-Modified) indicating when the resource was last changed, and an entity tag value (ETag). A server may choose to send only one of these headers, as both serve the same purpose.

When responding to a request for resource X, a server sends one or both of these headers to the client along with X’s representation. On any subsequent request for resource X, the client may honor these response headers and send two headers of its own: If-Modified-Since (with the value of the Last-Modified header) and If-None-Match (with the value of the ETag header). The former asks the server to send the representation only if the resource has been modified since the Last-Modified time the client holds, and the latter asks it to send the representation only if the ETag value no longer matches the one supplied.

If there is a change in the resource, the server sends an updated representation with new values for the ETag and/or Last-Modified headers. This scenario works with no issues: you get a 200 response back, and the protocol handler handles it just fine (as in the straightforward scenario above). The issue I’m going to describe arises when the server determines that the resource has not changed and sends back a 304 (Not Modified) status with no body.

See the following sequence of events, which ends with a 304 status code from the server:

Conditional GET
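The same exchange can be sketched from the client side with plain HttpURLConnection. This is only an illustration of a conditional GET, not the protocol handler’s own logic: ConditionalGet, applyValidators and fetch are hypothetical names, and the caller is assumed to already hold the cached body together with the ETag and Last-Modified values it arrived with.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class ConditionalGet {

    /** Adds validators to a not-yet-connected request; null or 0 means "no validator known". */
    static void applyValidators(HttpURLConnection conn, String etag, long lastModifiedMillis) {
        if (etag != null) {
            conn.setRequestProperty("If-None-Match", etag);
        }
        if (lastModifiedMillis > 0) {
            // The protocol handler formats this value as the If-Modified-Since header.
            conn.setIfModifiedSince(lastModifiedMillis);
        }
    }

    /** Returns the fresh body, or the cached one when the server answers 304. */
    static byte[] fetch(URL url, String etag, long lastModifiedMillis, byte[] cachedBody)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        applyValidators(conn, etag, lastModifiedMillis);
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return cachedBody; // a 304 carries no body, so reuse the copy we already hold
        }
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes(); // requires Java 9 or later
        }
    }
}
```

Note that the decision to reuse cachedBody is made by the calling code here, which is exactly the step the built-in handler gives a ResponseCache no hook for.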

Issues with Java’s HTTP Handler

  • A client or client-side cache should first check whether a cached representation is available before sending a conditional GET of this sort. (There is no point sending If-Modified-Since and/or If-None-Match headers if it doesn’t have a representation to fall back on.) Java’s cache handler framework using HttpURLConnection does not provide an option to do so. Let’s see the relevant source code of sun.net.www.protocol.http.HttpURLConnection, lines 399-410:
    // Set modified since if necessary
    long modTime = getIfModifiedSince();
    if (modTime != 0) {
        Date date = new Date(modTime);
        //use the preferred date format according to RFC 2068(HTTP1.1),
        // RFC 822 and RFC 1123
        SimpleDateFormat fo = new SimpleDateFormat(
            "EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fo.setTimeZone(TimeZone.getTimeZone("GMT"));
        requests.setIfNotSet("If-Modified-Since", fo
            .format(date));
    }
    

    The above block of code adds the If-Modified-Since header but never checks whether the representation is available in the cache.

  • I don’t see a reference to the If-None-Match header in the source. So if the client sets that header, it will be sent to the origin server without an availability check.
  • If there is no representation in the cache, the cache must be able to remove the validation headers from the request before it is sent to the origin server. I don’t see this framework supporting such behavior.
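For illustration, the behavior I would like the framework to support looks roughly like the sketch below. ValidatorFilter and prepare are hypothetical names, not part of java.net; the point is only that validators get dropped when there is no cached copy to fall back on.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ValidatorFilter {

    /** Returns a copy of the request headers, minus validators when nothing is cached. */
    static Map<String, List<String>> prepare(Map<String, List<String>> requestHeaders,
                                             boolean cachedCopyAvailable) {
        Map<String, List<String>> result = new LinkedHashMap<>(requestHeaders);
        if (!cachedCopyAvailable) {
            // HTTP header names are case-insensitive, so compare accordingly.
            result.keySet().removeIf(name -> name != null
                    && (name.equalsIgnoreCase("If-None-Match")
                        || name.equalsIgnoreCase("If-Modified-Since")));
        }
        return result;
    }
}
```

With a hook like this in front of the connection, a request for an uncached resource would go out as a plain GET and could never come back as a 304 with nothing to serve.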

Thoughts??


Thoughts on Class Reloading and JavaRebel

A couple of days ago, I attended a Philly JUG talk titled Non-stop Java Development, presented by Ivo Magi of Zero Turnaround. The focus was on their product JavaRebel and how it can speed up development by cutting down on deployment time and server restarts.

So what problem is JavaRebel trying to solve? When you make code changes in the application, you go through the usual cycle of:

{make-a-change} -> {waitFor: build-deploy-serverRestart} -> {check-the-change}.

JavaRebel claims to eliminate the second step of waiting for build/deploy, along with the need to restart the application server for code changes to take effect.

It goes without saying that I’m a big fan of the open source model. JavaRebel is not in that category; it is a commercial tool. The presenter discussed the available non-commercial options and why this product is better than the existing tools. As for the available options (or approaches), there are two major ones: hot deploy and hot swap.

Hot deployment: When a class is changed, build and deploy take place without the need for a container restart. My experience with this so far has met with varying degrees of success. It works fine a few times, but I have had to restart the container/server after a few changes. (I also don’t have enough confidence that it can work effectively in a clustered production environment, but that is not the focus of this post, which is about development time and how to minimize the wait.) The OSGi approach is perhaps very similar to hot deploy, except that it works at a granular level on individual bundles, as opposed to updating the entire application.

Hot swap: The ability to change a class while the JVM is running, without the application ever noticing. This is useful for quickly reloading classes while debugging. Changing the body of an existing method works fine, but if you add a new method to a class, or add a new class, the changes are not detected.

Another option is to try your luck with some class loader hacks: loading the changed class in a different class loader, etc. I never tried it. Again, if anybody has used this approach successfully, please provide your comments. A few frameworks mentioned in the presentation use this approach; I can only remember Tapestry at the moment.

As for JavaRebel, it extends the hot swap functionality. It is implemented as a JVM plugin (-javaagent). In simple terms, an agent in the JVM works like an interceptor in front of your main method. The -javaagent command line option is used to register custom instrumentation plugins. What I understood from the talk is that JavaRebel works at the class loader level; it doesn’t create new class loaders but extends the functionality of the existing ones.
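For readers who have not seen the -javaagent hook, here is a minimal sketch of a generic agent. It says nothing about how JavaRebel itself is built; DemoAgent and RecordingTransformer are hypothetical names, and the transformer only records class names without touching any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sees every class as it is loaded; a reloading tool would rewrite bytecode here. */
class RecordingTransformer implements ClassFileTransformer {
    final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null) {
            seen.add(className); // names arrive in internal form, e.g. "java/util/ArrayList"
        }
        return null; // null means "leave the bytecode unchanged"
    }
}

/** Agent entry point; declare it public in its own file when packaging a real agent. */
class DemoAgent {
    // The JVM calls premain before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }
}
```

To try it, the agent jar’s manifest needs a Premain-Class: DemoAgent entry, and the application is started with java -javaagent:agent.jar followed by the usual arguments (agent.jar being whatever you name the jar).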

Preserving the application state across class reloads saves quite a bit of time, especially if you are working on a workflow kind of application and you are testing, say, step 8 of 10. You can continue your test from where you stopped and don’t have to go back to step 1.

JavaRebel also seems to have some plugins to support popular web frameworks. I remember Spring, Guice and Struts were mentioned.

Obviously, I haven’t tried the tool, so I cannot make any recommendations. Try it for yourself and see if it suits your needs.


HTTP Response Cache Mechanism in Java

One of the tasks that I’m currently working on (albeit at a pace not to my liking) is to add an HTTP response cache implementation to Ehcache’s web module. This will be particularly useful for REST-based clients that want to optionally cache the responses to GET requests.

Java has a built-in mechanism for response caching. It was introduced in Java 5 by adding three abstract classes to the java.net package: ResponseCache, CacheRequest, and CacheResponse. You need to extend these classes for your own cache implementation.

The flow of events is something like the following:

  • A concrete class of ResponseCache registers with the system by using the static method ResponseCache.setDefault(ResponseCache).
  • There are two methods in the ResponseCache that are invoked by the protocol handlers. get() returns a CacheResponse and put() returns a CacheRequest.
  • When you create a URLConnection and attempt to read content, the appropriate stream handler is created and it checks for the content in the cache by invoking ResponseCache.get().
  • If the content is found in the cache, it is returned. Otherwise a request is sent to the origin server, and the received response is then passed on to ResponseCache.put() to see if the content is cacheable (based on the response headers) and possibly store it in the cache.
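A quick way to watch this flow is to register a cache that stores nothing and only counts the calls. The sketch below assumes nothing beyond the java.net classes named above; TracingResponseCache and install are hypothetical names.

```java
import java.io.IOException;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Caches nothing; it only counts how often the protocol handler consults it. */
class TracingResponseCache extends ResponseCache {
    final AtomicInteger lookups = new AtomicInteger();
    final AtomicInteger stores = new AtomicInteger();

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> requestHeaders)
            throws IOException {
        lookups.incrementAndGet();
        return null; // null = cache miss, the handler goes to the origin server
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) throws IOException {
        stores.incrementAndGet();
        return null; // null = decline to cache this response
    }

    /** Registers a new instance as the system-wide default and returns it. */
    static TracingResponseCache install() {
        TracingResponseCache cache = new TracingResponseCache();
        ResponseCache.setDefault(cache);
        return cache;
    }
}
```

After calling TracingResponseCache.install() and reading a few URLs through URLConnection, the two counters show when the handler looked in the cache and when it offered a response for storage.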

The bulk of the work (determining cache-ability, placing the resource content in the cache, evicting content based on the Expires or Date headers, and retrieving the resource) will be done by your own cache implementation. This is where I will be spending the bulk of my time, satisfying the intricacies of RFC 2616 (Section 13), which deals with caching in HTTP.

This implementation works only for clients that connect using URLConnection, as Java’s response cache mechanism described above works only for URLConnection. But once a pattern is set, hopefully we can extend it or write adapters for clients using other mechanisms, like Apache Commons HttpClient.

Hope to elaborate more on this soon …
