Site hosted by Angelfire.com: Build your free website today!


Caching for Java Applications


Caching is one of the primary techniques to decrease the response times of applications that frequently access data stores. These data stores could be directory servers, relational databases or file systems. Caching results of DNS lookups significantly reduces the response times of browsers. Caching frequently accessed web pages decreases the response times of web servers. Caching is even used by the CPU for faster access to data stored in memory. Similarly there are a whole host of scenarios in which caching mechanisms come very handy.
In this article I will discuss the implementation of a caching component in Java. We will use the MRU (Most Recently Used) caching algorithm. Currently, we will concentrate on the implementation of the cache itself. We can add additional features like Cache Managers as we go along. We will also look at features like a TTL for cache entries and tracking data store changes ... but that's for later.

Operation of a cache
Entries in the cache are stored with a key for each entry. Entries are read or added or removed based on this key. When an application using a caching mechanism needs to access the data store, it recreates the key for the entry and checks if it is present in the cache. If the entry is present, the entry from the cache is returned. If the entry is not present, it is retrieved from the data store and is added to the cache.

Caching Issues
Caching comes with its own set of constraints that require some amount of house keeping. Lets take a look at some of the challenges before us while implementing a caching mechanism.
Constraining the size of the cache is the biggest issue. If we had caches without any limitations on the size of the cache, they will take up a lot of memory space. In addition, searching for an entry in the cache will take longer and would defeat the whole purpose of caching. We need to do is come up with algorithms that constrain the size of the cache to either a specific number of entries or a specific memory size.

Another issue is that of synchronization with the data in the store (RDBMS, file system etc.). What if the data in the store itself changes? Our cache will become dirty, i.e. the cache will no longer contain the latest data. There are two methods to resolve this - maintaining a TTL for cache entries or using event notification.

A Time-To-Live (TTL) for a cache specifies the time after which the cache becomes invalid. The cache is emptied when the TTL for the cache expires. The cache is then rebuilt again or all the entries in the cache at the time it was flushed are read back into the cache. We could also specify a TTL for each entry and rewrite the entry into the cache from the store when the TTL for the entry expires.

Event notification is a better way to synchronize caches with data stores. In this mechanism, an event listener is registered with the data store. This listener tracks the changes occurring on the data store. If any entry present in the cache is changed (in the store) the entry is re-written into the cache.

The MRU algorithm
The 'Most Recently Used' technique is a simple algorithm that limits the size of the cache to a pre-defined number of entries or a pre-defined size (in memory). In this algorithm, the most recently used entry is at the top of the list and entries are removed from the end of the list. As is obvious, we will need to maintain a record of most recent cache accesses. A cache implementing the MRU algorithm is called an MRU cache. Here is how an MRU cache operates:

  • If the cache is not full, a new object is added at the start of the list.
  • If the cache is not full and an entry is read from the cache, the entry is moved to the start of the list.
  • If the cache is full, new objects that are not in the cache are added at the start of the list after removing an entry from the end of the list.

The data structures we will use for implementing this algorithm are a HashMap and a LinkedList. The HashMap will store the cache entries and the LinkedList will be used for keeping a record of the most recent cache access. The LinkedList will store the keys for cache entries. Most recent accesses will be moved to the top of the list and entries will be removed from the end of the list. The Hashmap stores entries that are retrieved based on their key values.

The source code for our cache is available here. In addition we also define a CacheException which is thrown whenever a cache level exception occurs.

Object Key Generation
The key generation for cache entries is a non-trivial task. We could make it trivial by just storing the entire search string as the key and searching for these strings before every data store read operation. But, this approach would take up quite a lot of memory in space and also slow down our search operations on the cache. Also, how would you handle cases in which there is not one search string, but a number of search arguments?
The method we will use is to convert the search arguments into a unique long integer and use this long integer as the key. The following code snippet shows how it can be done:

for (int i=0;i<searchArgs.length;i++)
{
	key = key + appendString(searchArgs[i]);
}
long value = getCRC32(key.getBytes());
				
The appendString() method adds a delimiter at the end of the string and appends the next search argument. getCRC32() method returns the CRC32 checksum value for the byte array. This value will be unique for every unique set of search arguments. Long integers also make for faster compare operations and are more compact in memory.

Quo Vadis
We just looked at the implementation of a very basic cache for Java applications. From here, we build more complex caching mechanisms that support TTLs and fixed sizes in memory.