How is Memcached different from traditional Java caching?

Developed by Danga Interactive to improve site performance on LiveJournal.com, Memcached's distributed architecture today supports the exponential scalability of social web applications like Twitter, Facebook, and Wikipedia. In this two-part tutorial, Sunil Patil introduces Memcached's distributed hashtable architecture and gets you started with using it to cache data for your own database-driven Java enterprise applications.

This tutorial introduces you to using Memcached to improve the performance of Java enterprise applications. The first half starts with an overview of traditional Java caching architectures as compared to Memcached's architecture. We'll also get Memcached installed on your machine and I'll introduce you to the setup and commands for working with Memcached via Telnet. In the second half we'll develop a "Hello Memcached" client program in Java, which we'll use to look under the hood of a spymemcached client. You'll also learn about using Memcached to reduce the load on your database server, and using it to cache dynamically generated page markup. Finally, we'll consider some advanced options for configuring spymemcached clients.

Overview of Memcached and Java caching architectures

Java caching frameworks like EHCache and OSCache are essentially HashMap objects in your application code. Whenever you add a new object to the cache, it is stored in the memory of your application. This strategy works fine for storing small amounts of data, but it doesn't work for caching more than a few gigabytes (GB). The designers of the Memcached server took a distributed architectural approach, which allows the cache to scale across machines. As a result, you can use Memcached to cache a huge amount of data.

The architecture of Memcached consists of two pieces. First is a Memcached server that runs in its own process. If you want to scale your application, you can install and run the Memcached server on additional machines. Instances of the Memcached server are not aware of each other. The Memcached client, the second piece of the Memcached system, does know about each of the servers. The client is responsible for picking the server for each cache entry and either storing or getting the cache entry, a process I'll discuss in detail later in the article.

If you have some experience working on Java EE web applications, chances are that you've previously used an open source Java caching framework such as EHCache or OSCache. You might also have used a commercial caching framework that shipped as part of your application server, such as DynaCache (which ships with IBM WebSphere Application Server) or JBoss Cache (which ships with JBoss AS). Before we get into the hands-on learning part of this tutorial, it's important to understand how Memcached differs from these traditional Java caching frameworks.

Using a traditional Java cache

Using a traditional Java caching framework is quite easy, regardless of whether you choose an open source or commercial option. For an open source framework such as EHCache or OSCache, you would need to download the binaries and add necessary JAR files to the classpath of your application. You might also need to create a configuration file, which you would use to configure the size of the cache, disk offload, and so on. For a caching framework that came bundled with an application server you typically would not have to download any additional JARs because they would be bundled with the software.

Figure 1. Architecture of traditional Java caching

After adding support for the caching framework in your application, you could start using it by creating a CacheManager object and getting and setting cache entries in it. Under the hood, the caching framework would create the CacheManager objects in the same JVM where your application was running. Every time you added a cache entry, that object would also be added to some type of hashtable maintained by the caching framework.
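To make the in-JVM model concrete, here is a minimal sketch of a cache backed by a hashtable on the application's own heap. The class InProcessCache and its methods are hypothetical stand-ins for illustration; they are not the EHCache or OSCache API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a framework-managed cache; not the EHCache or OSCache API.
class InProcessCache {
    // Entries live on the heap of the same JVM that runs the application.
    private final Map<String, Object> entries = new ConcurrentHashMap<>();

    void put(String key, Object value) {
        entries.put(key, value);
    }

    Object get(String key) {
        return entries.get(key); // null when the key was never cached
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        InProcessCache cache = new InProcessCache();
        StringBuilder report = new StringBuilder("quarterly totals");
        cache.put("report:q1", report);
        // The cache returns the very same object reference; nothing is copied or serialized.
        System.out.println(cache.get("report:q1") == report); // prints "true"
    }
}
```

Because every entry stays referenced from the map, the cached data counts against the application's own heap, which is why this approach is limited by the memory of a single JVM.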

If your application server were running on multiple nodes, then you might also want support for distributed caching. In a distributed cache system, when you add an object in cache on AppServer1, that object is also available on AppServer2 and AppServer3. Traditional Java caches use replication for distributed caching, meaning that when you add a cache entry on AppServer1 it is automatically replicated to the other app servers in your system. As a result, the entry will be available on all of your nodes.

Using Memcached

In order to use Memcached for caching you must first download and install the Memcached server for the platform of your choice. Once you've installed the Memcached server it will listen on either a TCP or UDP port for caching calls.

Figure 2. Architecture of Memcached

Next, you'll download a Java client for Memcached and add the client JARs to your application. After that, you can create a Memcached client object and start calling its methods to get and set cache entries. When you add an object to the cache, the Memcached client will take that object, serialize it, and send a byte array to the Memcached server for storage. At that point, the cached object might be garbage collected from the JVM where your application is running.

When you need that cached object, you can call the Memcached client's get() method. The client will take the get request, serialize it, and send it to the Memcached server. The Memcached server will use the request to look up the object from the cache. Once it has the object, it will return the byte array back to the Memcached client. The Memcached client object will then take the byte array and deserialize it to create the object and return it to your application.
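The serialize-then-deserialize round trip described above can be sketched with standard Java serialization. This is only an illustration of the idea under the assumption that the value implements Serializable; the helper names toBytes and fromBytes are hypothetical, and a real Memcached client may encode values differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical helper; illustrates the round trip, not a real client's transcoder.
class SerializationRoundTrip {
    // What a client does before sending a value to the server.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // What a client does with the byte array the server returns.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>();
        original.add("alice");
        original.add("bob");
        Object copy = fromBytes(toBytes(original));
        System.out.println(copy.equals(original)); // prints "true": same contents
        System.out.println(copy == original);      // prints "false": a new object was built
    }
}
```

Note the contrast with an in-JVM cache: what comes back is an equal copy rebuilt from bytes, not the original object reference.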

Even if your application is running on more than one application server, all of them can point to the same Memcached server and use it for getting and setting cache entries. If you have more than one Memcached server, the servers won't know about each other. Instead, you'll configure your Memcached client so that it knows all the available Memcached servers. For example, if your application creates a Java object on AppServer1 and calls the set() method of Memcached, then the Memcached client will figure out which Memcached server that entry goes to. It will then start communicating with that Memcached server only. Likewise, when your code in AppServer2 or AppServer3 tries to get an entry, the Memcached client will first figure out which server that entry is stored on, and then communicate with that server only.

Memcached client logic

In its default configuration, the Memcached client uses very simple logic to select the server for a get or set operation. When you make a get() or set() call, the client takes the cache key and calls its hashCode() method to get an integer, such as 11. It then divides that number by the number of available Memcached servers, say two, and takes the remainder, which is 1 in this case. The cache entry will therefore go to Memcached server 1. This simple algorithm ensures that the Memcached client on each of your application servers always chooses the same server for a given cache key.
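The hash-and-remainder selection just described can be sketched in a few lines of Java. This is a sketch of the logic as the article describes it, not the client's actual source; the class ServerSelector and the server addresses are hypothetical, and the use of Math.floorMod to keep the index non-negative when hashCode() is negative is my own assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of modulo-based server selection.
class ServerSelector {
    private final List<String> servers;

    ServerSelector(List<String> servers) {
        this.servers = servers;
    }

    // Remainder of hash divided by server count; floorMod avoids negative indexes.
    static int indexFor(int hash, int serverCount) {
        return Math.floorMod(hash, serverCount);
    }

    // Every client configured with the same server list picks the same server for a key.
    String serverFor(String key) {
        return servers.get(indexFor(key.hashCode(), servers.size()));
    }

    public static void main(String[] args) {
        // The article's example: hash 11 with two servers gives remainder 1.
        System.out.println(indexFor(11, 2)); // prints "1"

        // Placeholder addresses, assumed for illustration only.
        ServerSelector selector =
                new ServerSelector(Arrays.asList("cache1:11211", "cache2:11211"));
        System.out.println(selector.serverFor("user:42"));
    }
}
```

One consequence of this scheme is worth keeping in mind: because the remainder depends on the server count, adding or removing a server changes the chosen server for most keys.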

Installing Memcached

Memcached runs on Unix, Linux, Windows, and Mac OS X. You can either download the Memcached source and compile it, or download binaries compiled by someone else and use them to install Memcached. Here I'll walk through the process of downloading the binaries for the platform of your choice; see Resources if you prefer to compile from source.

The following installation instructions are for a Windows XP 32-bit machine. See Resources for installation instructions for other platforms such as Linux. Also note that the sample code for this article was developed on a Windows XP 32-bit machine, though it should work on any other platform.

Jellycan code has a modified version of Memcached that is easy and efficient to work with. Start by downloading the win32 binary ZIP file.

Expand Memcached-<versionnumber>-win32-bin.zip on your hard disk. Note that all it contains is memcached.exe. Execute this file to start the Memcached server.

Now execute memcached.exe -d install to register memcached.exe as a service. You'll be able to use the Services console to start and stop the Memcached server.

Command-line start/stop

Try starting and stopping the Memcached server from the command line instead of from the Services console. Doing that will give you more flexibility to try out different command-line options and figure out the best possible configuration for your requirements.

When you execute memcached.exe without any command-line options, the Memcached server starts up on port 11211 with 64 MB of memory. In some cases you might want more granular control of the configuration. For example, say port 11211 is used by some other process on your machine and you want the Memcached server to use port 12000; or say you are starting the Memcached server in a QA or production environment and want to give it more memory than the default 64 MB. In these cases you can use command-line options to customize the server's behavior. Executing the memcached.exe -help command will yield a complete list of command-line options like the ones shown in Figure 3.
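As a sketch of the port and memory scenario above, the following Java snippet assembles a command line equivalent to typing memcached.exe -p 12000 -m 128 at a prompt. In the standard Memcached server, -p sets the TCP port and -m sets the cache memory in megabytes; check the help output of your own build before relying on them. The class MemcachedLauncher, the executable location, and the 128 MB figure are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds, but does not start, a Memcached command line.
class MemcachedLauncher {
    // -p: TCP port to listen on; -m: memory for the cache, in megabytes.
    static List<String> command(String executable, int port, int memoryMb) {
        return Arrays.asList(
                executable, "-p", String.valueOf(port), "-m", String.valueOf(memoryMb));
    }

    public static void main(String[] args) {
        // Assumes memcached.exe is on the PATH or in the working directory.
        List<String> cmd = command("memcached.exe", 12000, 128);
        System.out.println(String.join(" ", cmd)); // prints "memcached.exe -p 12000 -m 128"
    }
}
```

The resulting list could be passed to java.lang.ProcessBuilder if you wanted to start the server from a test harness rather than by hand.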

Connect with Memcached via Telnet

After the Memcached server is started it listens on the port you've assigned it to. The Memcached client connects to the server on either the TCP or UDP port, sends commands and receives responses, and eventually closes the connection. (See Resources for details of the protocol the client uses to communicate with the server.)

You can connect to your Memcached server in a variety of ways. If you're using a Java client, as we'll do in the second half of this tutorial, you'll be able to access a simple API for storing and getting objects from the cache. Alternately, you could use a Telnet client to connect to the server directly. Knowing how to use the Telnet client to communicate with the Memcached server is important for debugging the Java client, so we'll start there.

Telnet commands

First you'll need to use the Telnet client of your choice to connect to the Memcached server. On a Windows XP machine, you can simply execute telnet localhost 11211, assuming the Memcached server is running on the same machine and listening on the default port, 11211. The following commands are essential for working with Memcached via Telnet:

set stores an item in the cache, overwriting any existing value for the key. The call is: set <keyName> <flags> <expiryTime> <bytes>. Type the value to be stored on the next line; it must be exactly <bytes> bytes long. If you don't want the cache entry to expire, enter 0 as the expiry time.

get returns the value of the cache key. Use get <keyName> to get the value of the keyName.

add adds a new key only if it does not already exist. For instance: add <keyName> <flags> <expiryTime> <bytes>

replace will replace a value only if the key exists. For instance: replace <keyName> <flags> <expiryTime> <bytes>

delete deletes the cache entry for the key. Use delete <keyName> to delete the value stored under keyName.

The screenshot in Figure 4 represents a sample interaction with the Memcached server via Telnet. As you can see, the Memcached server provides feedback to each command, such as STORED, NOT_STORED, and so on.
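To show exactly what is typed for each of the commands listed above, here is a small Java sketch that formats them as text lines. Each line ends with a carriage return and line feed, and the byte count in a storage command is the length of the value in bytes. The class TelnetCommands and its methods are hypothetical helpers for illustration; they only build strings and do not talk to a server.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical formatter for the text commands typed in a Telnet session.
class TelnetCommands {
    // Builds a set, add, or replace command: header line, then the value on the next line.
    static String storage(String command, String key, int flags, int expiryTime, String value) {
        int byteCount = value.getBytes(StandardCharsets.UTF_8).length;
        return command + " " + key + " " + flags + " " + expiryTime + " " + byteCount
                + "\r\n" + value + "\r\n";
    }

    static String get(String key) {
        return "get " + key + "\r\n";
    }

    static String delete(String key) {
        return "delete " + key + "\r\n";
    }

    public static void main(String[] args) {
        // Flags 0, expiry time 0 (never expire), and a 5-byte value.
        System.out.print(storage("set", "greeting", 0, 0, "hello"));
        System.out.print(get("greeting"));
        System.out.print(delete("greeting"));
    }
}
```

Typing the first two output lines into a Telnet session should make the server answer STORED, after which get greeting returns the value.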

Conclusion to Part 1

So far we've briefly discussed the differences between Memcached's distributed architecture and more traditional Java cache systems. We've also set up a Memcached implementation in your development environment, and you've practiced connecting to Memcached via Telnet. In the next part of this tutorial we'll use the Java client spymemcached to set up a distributed caching solution for a sample Java application. In the process, you'll learn a lot more about Memcached and how it can improve the performance of your Java EE applications.

Sunil Patil is a Java EE Architect working for Avnet Technology in San Francisco, California. He is the author of Java Portlets 101 (SourceBeat, April 2007) and has written numerous articles published by JavaWorld, IBM developerWorks, and O'Reilly Media. In addition to being an IBM Certified WebSphere Portal Server Application Developer and Administrator, he is a Sun Microsystems Certified Java Programmer, a Web component developer, and a business component developer. You can view Sunil's blog at http://www.webspherenotes.com.