The only remaining disadvantage of using caching wrappers is their
non-reusability. Every time I wish to add caching to a class, I need to write a
specialized caching wrapper for the target interface. This manual approach is
tedious and error-prone.

Version 1.3 of the Java™ 2 SDK, Standard Edition,
introduced support for dynamic proxy classes: special classes that can
choose which interfaces they implement when created at runtime. (Contrast this
to normal Java classes, whose implemented interfaces are known -- and
fixed -- at compile time.) By using this feature, it is possible to write a
generic caching wrapper class, called Memoizer, that determines at
runtime which interface it implements. There is no longer any need for the
CachingBinaryDigitsCalculator wrapper; calls such as this one:
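The code listing that originally followed is not reproduced here. Based on the
names used elsewhere in the article (BinaryDigitsCalculator,
PiBinaryDigitsCalculator), such a call might look like the one in this minimal,
self-contained sketch. The Memoizer shown is a deliberately simplified
stand-in for the article's fuller implementation, and the stub calculation is
invented for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical target interface and implementation, named after the text.
interface BinaryDigitsCalculator {
    int calculateBinaryDigit(int n);
}

class PiBinaryDigitsCalculator implements BinaryDigitsCalculator {
    public int calculateBinaryDigit(int n) {
        // Stub: a real implementation would compute the nth binary digit of pi.
        return n % 2;
    }
}

// Simplified Memoizer sketch: one flat cache keyed by (method, arguments).
class Memoizer implements InvocationHandler {
    private final Object target;
    private final Map<List<Object>, Object> cache = new HashMap<>();

    private Memoizer(Object target) { this.target = target; }

    public static Object memoize(Object target) {
        return Proxy.newProxyInstance(
            target.getClass().getClassLoader(),
            target.getClass().getInterfaces(),
            new Memoizer(target));
    }

    public Object invoke(Object proxy, Method method, Object[] args)
            throws Throwable {
        List<Object> argList = (args == null)
            ? List.of() : Arrays.asList(args);
        List<Object> key = Arrays.asList(method, argList);
        if (!cache.containsKey(key)) {
            cache.put(key, method.invoke(target, args));
        }
        return cache.get(key);
    }
}

public class MemoizeDemo {
    public static void main(String[] args) {
        // No specialized wrapper class; a single call creates the caching proxy.
        BinaryDigitsCalculator calculator = (BinaryDigitsCalculator)
            Memoizer.memoize(new PiBinaryDigitsCalculator());
        System.out.println(calculator.calculateBinaryDigit(5)); // prints 1
    }
}
```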

A call to the static method memoize with an object creates a
new proxy instance -- that is, an instance of
java.lang.reflect.Proxy -- that implements the same set of
interfaces as the passed-in object. The set of interfaces is determined by
object.getClass().getInterfaces(). Each proxy instance has an
invocation handler (java.lang.reflect.InvocationHandler) that
handles all method invocations that are made on the instance. In this case,
Memoizer is the invocation handler.
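The proxy-creation mechanism described above can be sketched as follows. The
Greeter interface and the pass-through handler are invented for illustration;
only the java.lang.reflect calls are the real API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface Greeter {
    String greet(String name);
}

class ConsoleGreeter implements Greeter {
    public String greet(String name) { return "Hello, " + name; }
}

public class ProxyCreationDemo {
    // Wraps any object in a proxy implementing the same interfaces,
    // with a handler that simply forwards every call to the target.
    static Object passThroughProxy(Object target) {
        InvocationHandler handler =
            (proxy, method, args) -> method.invoke(target, args);
        return Proxy.newProxyInstance(
            target.getClass().getClassLoader(),
            target.getClass().getInterfaces(), // same interfaces as the target
            handler);
    }

    public static void main(String[] args) {
        Object proxy = passThroughProxy(new ConsoleGreeter());
        System.out.println(proxy instanceof Greeter);      // prints true
        System.out.println(((Greeter) proxy).greet("pi")); // prints Hello, pi
    }
}
```

Every call made through the returned proxy, whatever the interface, is routed
to the handler's invoke method.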

When a method is invoked on the proxy instance (for example,
calculateBinaryDigit), the invoke method is called on
the Memoizer instance owned by the proxy instance. The
invoke method is passed information that allows it to determine
which method was called on the proxy instance, along with the array of
arguments. In this example, the java.lang.Method object given to
the Memoizer instance represents
calculateBinaryDigit, and the array of Object
arguments encapsulates the int parameter n (wrapped as
an Integer). At this point, Memoizer can step in to
do its caching.

The key used for the cache is essentially the method invocation: a
combination of the Method object and the array of
Object arguments passed to the invoke method. For
simplicity of implementation, Memoizer has a map of caches, one
for each method, and combines the array of Object arguments
into a java.util.List object to form the key to the relevant cache. It
is convenient to use an implementation of java.util.List for the
key, since it makes strong guarantees about the equals method, thus
ensuring that two keys match only if all of the corresponding pairs of
arguments are equal.
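The data structure just described can be sketched like this (the use of
String.substring as the sample Method is purely illustrative):

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheKeyDemo {
    // One cache per method; each cache is keyed by the list of arguments.
    static Map<Method, Map<List<Object>, Object>> caches = new HashMap<>();

    public static void main(String[] args) throws Exception {
        Method m = String.class.getMethod("substring", int.class);
        Map<List<Object>, Object> cache =
            caches.computeIfAbsent(m, k -> new HashMap<>());

        // List.equals compares element by element, so two independently
        // built lists of equal arguments produce equal cache keys.
        List<Object> key1 = Arrays.asList(new Object[] { 3 });
        List<Object> key2 = Arrays.asList(new Object[] { 3 });
        System.out.println(key1.equals(key2)); // prints true

        cache.put(key1, "lo");
        System.out.println(cache.get(key2));   // prints lo: a cache hit
    }
}
```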

Once the key has been created, the cache is checked for a previously
computed value. If one is found, it is simply returned. Otherwise, the method
has not been called with these arguments before, so it is invoked reflectively
by calling the invoke method on java.lang.Method. In
this example, the invocation will call calculateBinaryDigit on the
concrete class PiBinaryDigitsCalculator. The return value is
stored in Memoizer's cache before it is returned. The invocation
handler has now finished its work.

When To Use Memoizer

As a general rule, Memoizer can be used whenever a more
traditional cache -- such as those developed above -- is suitable.
More specifically, each method on the interface to be memoized should meet all
of the following criteria:

The return values of the method should not change from call to call.

The method should not have side effects.

The method should not take mutable arguments.

Clearly, it is futile to memoize methods whose return values can change every
time you invoke them. Equally, it is important not to memoize methods that have
intentional side effects (methods that update state in some way), since the
side effects will not be repeated for subsequent calls. The single exception to
this advice is void methods. The implementation of
Memoizer will always invoke a void method, since the
only reason to invoke it is for its side effects.

It can be dangerous to memoize methods with mutable arguments for the same
reason that it can be dangerous to store mutable classes in hash tables. As the
doc comment for java.util.Map states, "The behavior of a map is
not specified if the value of an object is changed in a manner that affects
equals comparisons while the object is a key in the map." If you call a
memoized method with an argument once, mutate the argument object, then call
the method a second time, Memoizer might see this as a cache hit
and not recompute the value. In short, the wrong value could be returned.
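The hazard quoted from the Map doc comment is easy to demonstrate directly
with a mutable list used as a hash key, which is exactly the position a
memoized method's argument list would be in:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    public static void main(String[] args) {
        Map<List<Integer>, String> cache = new HashMap<>();

        List<Integer> argument = new ArrayList<>();
        argument.add(1);
        cache.put(argument, "result for [1]");

        // Mutating the argument after it has been used as a key changes its
        // hashCode, so the map can no longer locate the entry reliably.
        argument.add(2);
        System.out.println(cache.get(argument));           // prints null
        System.out.println(cache.containsKey(List.of(1))); // prints false
    }
}
```

The entry is still physically in the map, but it is unreachable by either the
old or the new value of the key; a memoizer built on such a map can return
stale or missing results in the same way.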

Performance

The whole point of using caching is to improve the speed of your
application. However, reflection is notorious for its lack of performance. (The
gap is closing, though: Bloch reports that in the Sun implementation of J2SE
1.4, reflective calls are typically only twice as slow as normal calls,
compared to a forty-fold difference in J2SE 1.3.) Memoizer makes
heavy use of reflection, so it might seem a poor candidate for achieving
performance gains. This conclusion is not necessarily valid: if the
cost of calling the method to be memoized is high compared to the cost of a
reflective method call (that is, the call to the invoke method on
the InvocationHandler interface that Memoizer
implements), then memoizing the method call can result in a speedup.

Running the PiBinaryDigitsCalculator example on a recent J2SE
1.4 JVM shows that the cost of a memoized method call is comparable to the cost
of a normal call for very small values of n (for instance,
10). The overhead of the memoization is quickly swamped by the
cost of performing the calculation for larger magnitudes of n.
Therefore, clients making frequent use of PiBinaryDigitsCalculator
could certainly benefit from a memoized version.

Unfortunately, the only way to tell if your code can benefit from
memoization is to profile it. Nevertheless, it is very easy to apply a
memoizing wrapper to existing code, and, importantly, it is just as easy to
remove a wrapper. This suggests the following simple optimization strategy:

1. Choose a class as a candidate for memoization.

2. Profile your class (and the system as a whole).

3. If the performance is acceptable, go to step 6; otherwise add a memoizing
wrapper to the class.

4. Profile your memoized class (and the system as a whole).

5. If there is little or no significant performance gain, remove the
memoizing wrapper.

6. Repeat as necessary.

Ideally, you should analyze the impact of adding memoization to a class on
the system as a whole, since only then can you be sure that it is worth adding.
Some methods, even though they are computationally expensive, may not benefit
from being memoized, simply because they are never called with the same
arguments more than once. To gauge this, I am making available a more fully featured version of Memoizer that implements an interface called
CacheStatistics that allows you to retrieve the number of cache
hits and cache misses. You can use this interface during performance testing to
measure whether memoization is "pulling its weight."
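The extended Memoizer mentioned above is not shown in this excerpt, so the
exact shape of CacheStatistics is unknown; the method names below are
assumptions based on the description ("the number of cache hits and cache
misses"), and the CountingCache class is a toy stand-in for illustration:

```java
// Assumed shape of the interface described in the text; the real version
// ships with the author's extended Memoizer and may differ.
interface CacheStatistics {
    int getCacheHits();
    int getCacheMisses();
}

// A toy cache that tracks its own statistics, for illustration only.
class CountingCache implements CacheStatistics {
    private final java.util.Map<Object, Object> map = new java.util.HashMap<>();
    private int hits;
    private int misses;

    public Object get(Object key) {
        if (map.containsKey(key)) { hits++; return map.get(key); }
        misses++;
        return null;
    }

    public void put(Object key, Object value) { map.put(key, value); }

    public int getCacheHits()   { return hits; }
    public int getCacheMisses() { return misses; }
}

public class CacheStatisticsDemo {
    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        cache.get("pi");       // miss: nothing cached yet
        cache.put("pi", 3);
        cache.get("pi");       // hit: value was cached above
        System.out.println(cache.getCacheHits() + " hit, "
                + cache.getCacheMisses() + " miss"); // prints 1 hit, 1 miss
    }
}
```

A hit count of zero after a realistic workload is a strong sign the method is
never called twice with the same arguments, and memoization should be removed.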

Extending Memoizer

It is straightforward to modify the Memoizer class to support
different cache implementation strategies. A common type of cache is the
Least-Recently-Used (LRU) cache that holds a bounded number of
entries. The cache ensures that it does not exceed its maximum size by
discarding the oldest entry, that is, the one that was least recently accessed.
A class might choose to use an LRU cache to protect against memory filling up in
an unbounded manner on long-running applications. To select the type of cache
used, you simply pass an extra argument to the memoize method
that specifies the CacheFactory implementation to use. Here the
LRU cache can hold a maximum of 1000 entries:
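The listing that originally followed, with the CacheFactory argument, is not
shown here. A standard way to build the bounded LRU cache it would produce is
java.util.LinkedHashMap in access order with an eviction policy; this sketch
uses a maximum of 3 entries so the eviction is visible (the article's example
uses 1000):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {
    // An LRU cache: access-order iteration plus a bounded eviction policy.
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Discard the least-recently-accessed entry once full.
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = lruCache(3);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.put(3, "c");
        cache.get(1);       // touch 1, so 2 is now least recently used
        cache.put(4, "d");  // exceeds the bound: 2 is evicted
        System.out.println(cache.keySet()); // prints [3, 1, 4]
    }
}
```

A CacheFactory implementation would simply return such a map in place of the
default unbounded HashMap.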

Tom White
has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop.