Posts Tagged Ehcache

Greg Luck’s Ehcache Presentation

A few days ago, at the Philly JUG, Greg Luck (CTO, Ehcache at Terracotta) spoke about Ehcache and Terracotta. The event was attended by over 100 professionals from the area.

History

Greg started off with a brief history of Ehcache's progress over the years [I still remember the early conversations I had with Greg about 5 years ago, when I first contributed to the project]. The project was never static; far from it. It progressed steadily from a simple yet powerful standalone cache to a well-implemented, extensible distributed framework.

In the early stages, Greg was not too keen on building a distributed cache. Once the goal of shipping a great standalone cache was achieved, contributions from various people (and feature requests) started the move towards a distributed cache. His blog post "Ehcache goes distributed" explains the thought process of that time quite well.

Ehcache was founded in 2003 as a fork of another open source caching framework, Apache JCS. It progressed steadily with additions like Hibernate integration, web caching, distributed caching, and Cache Server with REST and SOAP services. Terracotta acquired Ehcache in 2009.

Ehcache Configuration and Performance

Greg explained the basic configuration of Ehcache using Spring's Pet Clinic as an example: configuring Hibernate with Ehcache as the second-level cache. He went over the configuration options and the cache eviction algorithms.

I think the performance discussion evoked quite a bit of interest in the audience (well, who doesn't like talking about benchmarks and pretty charts! ;) ). Arguably, Ehcache is one of the best out there in terms of performance. If you want to know more about the tests, check out the source code yourself (browsing that repository requires a Terracotta user account; registration is free).

Here are a few performance figures from the presentation, showing Ehcache's superiority in terms of performance. The speaker also showed the performance of Hibernate reads and writes with Ehcache as the second-level cache (not shown below). Tests were performed on a cluster of 8 client JVMs (1.75G heap), 1 Terracotta server (6G heap) and MySQL.

[Charts: Get (read) and Put (write) performance]

Ehcache in-process vs Memcached:

Ehcache in-process has to be faster than Memcached. If nothing else, an in-process setup avoids the serialization and network overhead of the Memcached setup. I'd be more interested in a more apples-to-apples comparison of Ehcache Server (REST-based) vs Memcached. I mentioned that to Greg, and I'm sure he is on to it next ...

Performance conclusions:
After the app servers and databases were tuned by independent third parties, the results showed a 30-95% reduction in database load, read-only performance improved to 80 times that of MySQL, and notably lower latency.

Ehcache Monitor

Greg demonstrated Ehcache Monitor, a console application for managing and monitoring caches. The main goals of this tool are tuning cache usage and detecting errors. It provides information about cache hit ratios, hit/miss rates, hits on the database, and the detailed efficiency of cache regions. It also has some administrative capabilities that I still have to explore. The tool is still in beta, but looks promising.

New Features in Ehcache 2.0

  • Hibernate 3.3+ caching API: The new SPI addresses some of the synchronization issues of previous versions and is better suited to clustered environments.
  • JTA: From version 2.0, Ehcache acts as an XAResource and participates in JTA transactions. It detects the most common transaction managers of various popular application servers.
  • Write-through and write-behind caching: Version 2.0 introduces write-through and write-behind caching. In the write-through pattern, the cache acts as a facade to an underlying resource, and a write to the cache causes a write to the resource behind it. Write-behind is much the same, except that the writes to the resource happen asynchronously.
  • Bulk loading: Greg said this is one of the features that has been requested for a while now. Version 2.0 can significantly speed up bulk loading into caches using the Terracotta server array.
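To make the write-through/write-behind distinction concrete, here is a minimal sketch in plain Java. It does not use Ehcache's own write-through API. BackingStore, WriteThroughCache and WriteBehindCache are hypothetical names, and the backing store is simulated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the underlying resource (for example a database table).
interface BackingStore {
    void write(String key, String value);
}

// Write-through: put() returns only after the backing store has been written.
class WriteThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BackingStore store;

    WriteThroughCache(BackingStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        store.write(key, value); // synchronous write to the resource
        cache.put(key, value);
    }

    String get(String key) {
        return cache.get(key);
    }
}

// Write-behind: put() updates the cache at once; the store is written later on another thread.
class WriteBehindCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BackingStore store;
    // A single writer thread applies queued writes in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    WriteBehindCache(BackingStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        cache.put(key, value);
        writer.submit(() -> store.write(key, value)); // asynchronous write to the resource
    }

    String get(String key) {
        return cache.get(key);
    }

    // Waits (up to 5 seconds) for queued writes to finish, then stops the writer thread.
    void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The trade-off is the usual one: write-through keeps the resource consistent on every put at the cost of put latency, while write-behind makes puts cheap but leaves a window in which the resource lags behind the cache.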

Conclusion

All in all, I think it was a great session, and as always, I enjoy using Ehcache and being associated with the project. I also got a chance to hack on some code with Greg while he was here in Philadelphia. Hope to contribute more ...


Java’s HTTP Handler and Cache Validation Issues

Background

A little while ago I mentioned that I was working on client-side HTTP caching (using Ehcache) for REST clients. After a little hiatus, I'm back to finish the job, specifically cache validation support (using the ETag, Last-Modified, If-None-Match and If-Modified-Since headers). I also explained the approach I was taking to implement the solution, using Java's ResponseCache mechanism.

However, I think I've hit a dead end implementing the solution with that approach. I will try to explain it here, and hope that smarter people out there will share their thoughts.

Overview

Let's start with a simple, straightforward scenario. Here is the control flow of the Java protocol handler approach:

  1. A client application gets an instance of sun.net.www.protocol.http.HttpURLConnection, which extends java.net.HttpURLConnection, via url.openConnection(). This handler picks up the registered java.net.ResponseCache instance, if one is available.
  2. When a request is sent to the server via HttpURLConnection, the protocol handler first checks whether the representation is in the cache by calling the get() method of the ResponseCache. If it is, the cached representation is returned to the client; otherwise the request is sent to the origin server.
  3. If the request is sent to the origin server and the response status is 200, 301, or another cacheable status, the handler calls the put() method of the ResponseCache to potentially cache the representation.
  4. The ResponseCache stores the element in the cache. It uses the Expires, Date and Cache-Control headers to determine the time to live and sets it on the element. Let's ignore the expiration model for this post, as the focus is on validation.

Note: You have to write your own concrete implementation for ResponseCache to store and retrieve elements from the cache. Java doesn't provide an out-of-the-box implementation for it, but it provides a framework for doing so.

Validating Cached Element

Now let’s look at a scenario that does cache validation. First, what is validation? A server may send two headers for validating a resource: a timestamp (Last-Modified) indicating when the resource was last changed, and an entity tag value (ETag). A server may choose to send only one of these headers, as both serve the same purpose.

When responding to a request for resource X, a server sends one or both of these headers along with X’s representation. On any subsequent request for resource X, the client may honor these response headers by sending two headers of its own: If-Modified-Since (with the value of the Last-Modified header) and If-None-Match (with the value of the ETag header). The former asks the server to send the representation only if the resource has been modified since the Last-Modified time the client received, and the latter asks it to send the representation only if the ETag value has changed from the one supplied.
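As an illustration of the client side of this exchange, the sketch below turns the validators saved with a cached representation into conditional request headers, and adds none when nothing is cached. CachedValidators and ConditionalHeaders are hypothetical classes, not part of the JDK or Ehcache.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical holder for the validators saved with a cached representation.
class CachedValidators {
    final String etag;          // ETag response header value, or null
    final Instant lastModified; // Last-Modified response header value, or null

    CachedValidators(String etag, Instant lastModified) {
        this.etag = etag;
        this.lastModified = lastModified;
    }
}

class ConditionalHeaders {
    // HTTP date layout (RFC 1123 style, always GMT), e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Returns the headers for a conditional GET; empty when nothing is cached,
    // so that a plain, unconditional GET is sent instead.
    static Map<String, String> build(CachedValidators cached) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (cached == null) {
            return headers;
        }
        if (cached.etag != null) {
            headers.put("If-None-Match", cached.etag);
        }
        if (cached.lastModified != null) {
            headers.put("If-Modified-Since", HTTP_DATE.format(cached.lastModified));
        }
        return headers;
    }
}
```

Each returned entry could then be applied with URLConnection.setRequestProperty(name, value) before connecting.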

If the resource has changed, the server sends an updated representation with new values for the ETag and/or Last-Modified headers. This scenario works fine: you get a 200 response back, and the protocol handler handles it just like the straightforward scenario above. The issue I'm going to describe is with the case in which the server determines that the resource has not changed, and sends back status 304 (Not Modified) with no body.
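A sketch of how a client-side cache might resolve the two outcomes, assuming it kept the body and ETag from the earlier 200 response. CachedEntry and ConditionalGet are hypothetical names.

```java
// Hypothetical cached entry: the stored body plus the ETag it was served with.
class CachedEntry {
    final byte[] body;
    final String etag;

    CachedEntry(byte[] body, String etag) {
        this.body = body;
        this.etag = etag;
    }
}

class ConditionalGet {
    /**
     * Decides which entry to serve (and keep cached) after a conditional GET.
     * status, freshBody and freshEtag describe the origin server's response;
     * cached is the entry whose validators were sent, or null.
     */
    static CachedEntry resolve(int status, byte[] freshBody, String freshEtag, CachedEntry cached) {
        if (status == 304) {
            if (cached == null) {
                // Nothing to fall back on: validators should not be sent without a cached copy.
                throw new IllegalStateException("304 Not Modified, but no cached representation");
            }
            // Not modified: a 304 has no body, so reuse the cached one (keeping any updated ETag).
            return new CachedEntry(cached.body, freshEtag != null ? freshEtag : cached.etag);
        }
        if (status == 200) {
            // Modified: the new representation replaces the cached one.
            return new CachedEntry(freshBody, freshEtag);
        }
        throw new IllegalArgumentException("status not handled in this sketch: " + status);
    }
}
```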

See the following sequence of events that end with a status code 304 from the server:

Conditional GET

Issues with Java's HTTP Handler

  • A client or client-side cache should first check whether a cached representation is available before sending a conditional GET of this sort. (There is no point sending If-Modified-Since and/or If-None-Match headers if it doesn’t have a representation to fall back on.) Java’s cache handler framework using HttpURLConnection does not provide an option to do so. Let's see the relevant source code of sun.net.www.protocol.http.HttpURLConnection, lines 399-410:
    // Set modified since if necessary
    long modTime = getIfModifiedSince();
    if (modTime != 0) {
        Date date = new Date(modTime);
        //use the preferred date format according to RFC 2068(HTTP1.1),
        // RFC 822 and RFC 1123
        SimpleDateFormat fo = new SimpleDateFormat(
            "EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fo.setTimeZone(TimeZone.getTimeZone("GMT"));
        requests.setIfNotSet("If-Modified-Since", fo
            .format(date));
    }
    

    The above block of code adds the If-Modified-Since header but makes no check whatsoever on whether the representation is available in the cache.

  • I don't see a reference to the If-None-Match header in the source. So if the client sets that header, it will be sent to the origin server without an availability check.
  • If there is no representation in the cache, the cache must be able to remove the validation headers from the request before it is sent to the origin server. I don't see this framework supporting such behavior.

Thoughts??



HTTP Response Caching with Ehcache

As I mentioned recently on this blog, I'm working in my spare time on an HTTP response cache implementation using Ehcache. When all is said and done, it will be part of Ehcache's web module. The work is far from over, so treat this as a status update.

Motivation

Who are the users of such an implementation? There have been one or two feature requests from teams implementing REST clients, who want to use Ehcache on the client side to optionally cache the results of GET operations based on various headers and directives.

High-level Design

This design leverages Java's response cache mechanism, introduced back in version 1.5. I've written about it in an earlier post. On a GET request, the getInputStream() method of Java's URLConnection invokes the appropriate protocol handler, which in turn interacts with the concrete implementation of java.net.ResponseCache.

Java's mechanism does not provide any default cache implementation, but you can extend java.net.ResponseCache and write your own. EhcacheResponseCacheAdaptor extends ResponseCache and implements its get() and put() methods. Also, when an instance of EhcacheResponseCacheAdaptor is created it registers itself with the JVM as a default cache.

Whenever the system tries to load a URL (using URLConnection) via a protocol handler, it first checks the cache via get(). If the data is in the cache, it is returned; if not, a connection is made to the origin server to GET the content. After downloading the content, the protocol handler then attempts to put it in the cache via put().

URI Mappings and Cache

After creating a new instance of EhcacheResponseCacheAdaptor, the caller calls its addMapping() method to map a specific URI pattern to a cache defined in the ehcache config file. For example:

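// The no-arg CacheManager loads the default ehcache configuration from the classpath
EhcacheResponseCacheAdaptor responseCache =
    new EhcacheResponseCacheAdaptor(new CacheManager());
// Map a URI prefix to a cache defined in the ehcache configuration file
responseCache.addMapping("http://somedomain.com:9000/rest/customers/",
    "sampleCache1");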

An instance of Ehcache's CacheManager is passed to the constructor of the adaptor, and then one or more URIs can be mapped to caches.

Cache elements

If you have used Ehcache, you know that an Element needs a key and a value. The adaptor implementation parses the URI and matches it to a cache, and the rest of the URI is treated as the key. So if a request is made for http://somedomain.com:9000/rest/customers/12887876320, the system identifies that the user intends to use sampleCache1 as the cache, and the remaining part of the URI that is not part of the mapping is used as the key. In this example, 12887876320 is the key.
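The prefix-to-cache lookup described above can be sketched as follows. This is a hypothetical stand-alone class, not the adaptor's code, and picking the longest matching prefix when several mappings match is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of URI-prefix mappings; not the EhcacheResponseCacheAdaptor code.
class UriMappings {
    private final Map<String, String> prefixToCache = new LinkedHashMap<>();

    void addMapping(String uriPrefix, String cacheName) {
        prefixToCache.put(uriPrefix, cacheName);
    }

    /** Returns {cacheName, key}, or null if no mapping matches. The longest prefix wins. */
    String[] resolve(String uri) {
        String bestPrefix = null;
        for (String prefix : prefixToCache.keySet()) {
            if (uri.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            return null; // not mapped: the response is not cached
        }
        // Whatever follows the mapped prefix becomes the Element key.
        String key = uri.substring(bestPrefix.length());
        return new String[] { prefixToCache.get(bestPrefix), key };
    }
}
```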

Cacheability

Cacheability is determined based on the Cache-Control (no-cache, max-age=0) and/or Pragma headers, supporting both HTTP 1.0 and 1.1.
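A minimal sketch of that check, assuming headers in the Map&lt;String, List&lt;String&gt;&gt; shape returned by URLConnection.getHeaderFields(). Only the directives named above are handled, and Cacheability is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class Cacheability {
    // Header names are matched case-insensitively, as HTTP requires.
    static boolean isCacheable(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> header : headers.entrySet()) {
            if (header.getKey() == null) {
                continue; // getHeaderFields() stores the status line under a null key
            }
            String name = header.getKey().toLowerCase(Locale.ROOT);
            for (String value : header.getValue()) {
                for (String part : value.split(",")) {
                    String directive = part.trim().toLowerCase(Locale.ROOT);
                    // HTTP/1.1: Cache-Control: no-cache or max-age=0
                    if (name.equals("cache-control")
                            && (directive.equals("no-cache") || directive.equals("max-age=0"))) {
                        return false;
                    }
                    // HTTP/1.0: Pragma: no-cache
                    if (name.equals("pragma") && directive.equals("no-cache")) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```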

Invalidation

Time-to-live (TTL) is calculated using the Cache-Control (max-age, min-fresh) and/or Expires headers. The TTL is assigned to the element directly. If the required headers are not present, the values from the cache configuration are used.
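A sketch of the TTL computation under stated assumptions: max-age wins over Expires (as RFC 2616 specifies), Expires is measured against the response's Date header, and the configured default is the fallback. min-fresh, a request directive, is not handled here, and TtlCalculator is a hypothetical name.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class TtlCalculator {
    /**
     * cacheControl: Cache-Control header value, or null
     * expires, date: Expires and Date header values (RFC 1123 dates), or null
     * defaultTtlSeconds: fallback, e.g. timeToLiveSeconds from the cache configuration
     */
    static long ttlSeconds(String cacheControl, String expires, String date, long defaultTtlSeconds) {
        // 1. Cache-Control: max-age takes precedence over Expires.
        if (cacheControl != null) {
            for (String part : cacheControl.split(",")) {
                String d = part.trim().toLowerCase(Locale.ROOT);
                if (d.startsWith("max-age=")) {
                    try {
                        return Math.max(0, Long.parseLong(d.substring("max-age=".length())));
                    } catch (NumberFormatException ignored) {
                        // malformed value: fall through to Expires
                    }
                }
            }
        }
        // 2. Expires, relative to the server's Date header (avoids client clock skew).
        if (expires != null && date != null) {
            try {
                ZonedDateTime exp = ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME);
                ZonedDateTime now = ZonedDateTime.parse(date, DateTimeFormatter.RFC_1123_DATE_TIME);
                return Math.max(0, Duration.between(now, exp).getSeconds());
            } catch (DateTimeParseException ignored) {
                // unparsable dates: fall back to the configured default
            }
        }
        // 3. No usable headers: use the cache configuration.
        return defaultTtlSeconds;
    }
}
```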

Conditional GET

An important directive related to conditional requests is only-if-cached (a Cache-Control request directive). With this directive, the client asks for the value to be served from the cache if present, and not requested from the origin server. I've looked briefly into the protocol handler implementation; it seems to handle this scenario, so perhaps I don't have to do much here.
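For reference, RFC 2616 (section 14.9.4) says a cache receiving only-if-cached should either answer from a cached entry or respond with 504 (Gateway Timeout). A minimal sketch of that rule, with hypothetical names and a plain map standing in for the cache; freshness checks are omitted.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class OnlyIfCached {
    // Hypothetical minimal response: status code plus body.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static boolean hasOnlyIfCached(String cacheControl) {
        if (cacheControl == null) {
            return false;
        }
        for (String part : cacheControl.split(",")) {
            if (part.trim().toLowerCase(Locale.ROOT).equals("only-if-cached")) {
                return true;
            }
        }
        return false;
    }

    // cache: cached bodies by URI; origin: hypothetical function that fetches from the server.
    static Response fetch(String uri, String requestCacheControl,
                          Map<String, String> cache, Function<String, Response> origin) {
        String cached = cache.get(uri);
        if (cached != null) {
            return new Response(200, cached);
        }
        if (hasOnlyIfCached(requestCacheControl)) {
            return new Response(504, null); // never contact the origin server
        }
        return origin.apply(uri);
    }
}
```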

From the User's point of view

For end users, once they create the adaptor and add the mappings (as described above), there isn't much left to do. They can make requests without any specific knowledge of this ResponseCache implementation. The following code would do just fine:


URL url = new URL("http://somedomain.com:9000/rest/customers/12887876320");

URLConnection urlConnection = url.openConnection();
urlConnection.setDefaultUseCaches(true);
InputStream inputStream = urlConnection.getInputStream();

...

Next Steps

There are still quite a few loose ends in the implementation that need tightening. The challenge is to support as many of the HTTP headers from RFC 2616 as is reasonable; ETag is one that I still have to look at. More soon ...


HTTP Response Cache Mechanism in Java

One of the tasks I'm currently working on (albeit at a pace not to my liking) is adding an HTTP response cache implementation to ehcache's web module. This will be particularly useful for REST-based clients to optionally cache the responses to GET requests.

Java has a built-in mechanism for response caching. It was introduced in Java 5, which added three abstract classes to the java.net package: ResponseCache, CacheRequest and CacheResponse. You extend these classes for your own cache implementation.

The flow of events is something like the following:

  • A concrete class of ResponseCache registers with the system by using the static method ResponseCache.setDefault(ResponseCache).
  • There are two methods in the ResponseCache that are invoked by the protocol handlers. get() returns a CacheResponse and put() returns a CacheRequest.
  • When you create a URLConnection and attempt to read content, the appropriate stream handler is created, and it checks for the content in the cache by invoking ResponseCache.get().
  • If the content is found in the cache, it is returned. Otherwise a request is sent to the origin server, and the received response is passed on to ResponseCache.put(), which decides whether the content is cacheable (based on the response headers) and possibly stores it in the cache.
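A bare-bones, in-memory sketch of these hooks using only the java.net classes named above. A ConcurrentHashMap stands in for ehcache, no cacheability or expiration checks are done, and InMemoryResponseCache is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InMemoryResponseCache extends ResponseCache {
    private static final class Entry {
        final Map<String, List<String>> headers;
        final byte[] body;

        Entry(Map<String, List<String>> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final Map<URI, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public CacheResponse get(URI uri, String method, Map<String, List<String>> requestHeaders) {
        if (!"GET".equals(method)) {
            return null; // only GET responses are served from this cache
        }
        Entry e = entries.get(uri);
        if (e == null) {
            return null; // miss: the protocol handler goes to the origin server
        }
        return new CacheResponse() {
            @Override public Map<String, List<String>> getHeaders() { return e.headers; }
            @Override public InputStream getBody() { return new ByteArrayInputStream(e.body); }
        };
    }

    @Override
    public CacheRequest put(URI uri, URLConnection conn) {
        // A real implementation would inspect the response headers here and return null to skip caching.
        Map<String, List<String>> headers = conn.getHeaderFields();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        return new CacheRequest() {
            @Override
            public OutputStream getBody() {
                // The handler copies the response body into this stream; the entry is stored on close().
                return new OutputStream() {
                    @Override public void write(int b) { buffer.write(b); }
                    @Override public void write(byte[] b, int off, int len) { buffer.write(b, off, len); }
                    @Override public void close() { entries.put(uri, new Entry(headers, buffer.toByteArray())); }
                };
            }

            @Override
            public void abort() {
                buffer.reset(); // discard a partially received body
            }
        };
    }
}
```

Registering it with ResponseCache.setDefault(new InMemoryResponseCache()) makes URLConnection consult it for connections whose useCaches flag is true (the default).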

The bulk of the work (determining cacheability, placing the resource content in the cache, evicting the content based on the Expires or Date headers, and retrieving the resource) will be done by your own cache implementation. This is where I will spend most of my time, satisfying the intricacies of RFC 2616 (section 13), which deals with caching in HTTP.

This implementation works only for clients that connect using URLConnection, as Java's response cache mechanism described above works only for URLConnection. But once a pattern is established, hopefully we can extend it or write adapters for clients using other mechanisms, like Apache Commons HttpClient.

Hope to elaborate more on this soon ...


ehcache Caching Policies (Contd..)

Following up on my earlier post:

I'm finally done with the last few changes related to the new caching policies (FIFO, LFU). A new attribute, memoryStorePolicy, is being added to the cache element in the configuration file:


<cache name="sampleCache1" maxElementsInMemory="10000" eternal="false" overflowToDisk="true" timeToIdleSeconds="300" timeToLiveSeconds="600" memoryStorePolicy="LFU"/>

The memoryStorePolicy attribute is optional. Legal values are LRU, LFU and FIFO; it defaults to the LRU (Least Recently Used) policy.
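To illustrate how LRU and FIFO differ, here is a sketch built on java.util.LinkedHashMap, not ehcache's memory store code; BoundedCache is a hypothetical name. The only difference between the two policies is whether reads reorder entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded map that evicts in LRU or FIFO order (hypothetical sketch).
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxElements;

    // lru = true: accessOrder mode, so iteration runs from least to most recently accessed.
    // lru = false: insertion order, so the oldest inserted entry is evicted first (FIFO).
    BoundedCache(int maxElements, boolean lru) {
        super(16, 0.75f, lru);
        this.maxElements = maxElements;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxElements; // evict the head of the order once over capacity
    }
}
```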

Next I should update the site documentation for the caching policies ..


ehcache Caching Policies

ehcache is one of the open source projects I have contributed to recently. For those who haven't heard about ehcache yet: it is a simple, fast and thread-safe cache for Java. The Hibernate community uses ehcache heavily. More about ehcache at http://sourceforge.net/projects/ehcache/.

I provided a patch to support caching policies other than LRU (the only one it currently supports). FIFO and LFU are the new policies I have added and submitted in the patch. For now they can only be tested programmatically; I still have to add a declarative way of defining the FIFO and LFU policies. Hope I can finish that soon ..
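For comparison, a minimal LFU sketch (a hypothetical LfuCache class, not the patch itself): each entry counts its hits, and when the cache is full the entry with the fewest hits is evicted. The eviction scan is O(n) and ties are broken arbitrarily, which keeps the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LFU cache (hypothetical sketch): evicts the entry with the fewest hits.
class LfuCache<K, V> {
    private final int maxElements;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> hits = new HashMap<>();

    LfuCache(int maxElements) {
        this.maxElements = maxElements;
    }

    V get(K key) {
        V value = values.get(key);
        if (value != null) {
            hits.merge(key, 1L, Long::sum); // count a hit
        }
        return value;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= maxElements) {
            // Linear scan for the least frequently used entry.
            K victim = null;
            long fewest = Long.MAX_VALUE;
            for (Map.Entry<K, Long> e : hits.entrySet()) {
                if (e.getValue() < fewest) {
                    fewest = e.getValue();
                    victim = e.getKey();
                }
            }
            values.remove(victim);
            hits.remove(victim);
        }
        values.put(key, value);
        hits.putIfAbsent(key, 0L);
    }

    boolean containsKey(K key) {
        return values.containsKey(key);
    }
}
```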
