Serializing Immutables and Singletons with a Serialization Proxy

[Update: 3 January 2011. Blog commenter Konrad correctly points out a serious error in the original version of this post. It turns out you can serialize immutable objects because there’s no requirement that there be a public no-argument constructor. Konrad’s comment includes example code.]

A question recently came up on our mailing list/newsgroup that asked how to implement java.io.Serializable for classes containing non-serializable singletons. I decided to kick up the abstraction level of the answer and discuss LingPipe’s serialization proxy approach to serializing everything, including singletons and immutables.

Josh Bloch discusses the serialization proxy pattern in the very last section of the second edition of Effective Java. Java lets you take complete control over serialization, and I strongly advise you to use this control for forward compatibility, working around non-serializable (but constructable) member objects like singletons, and minimizing the size of serialized objects.

As a running example, consider a class Vector2D representing a two-dimensional vector. The class is immutable. The reason to prefer immutable classes is discussed at the beginning of Bloch’s book; the basic idea is that immutables guarantee thread safety (assuming the immutable member variables are thread safe for reads) and consistent object state (guaranteed by the constructor). The only constructor takes two arguments, and sets the final variables for the x and y coordinates of the vector. For illustration, there’s only a single non-trivial method computing vector length, but the same basic argument applies for any immutable class. A more robust implementation might require values to be finite (that is, not infinite and not not-a-number).
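Here’s a minimal sketch of a class matching that description. The field names (mX, mY) and the method name length() are my assumptions for illustration, and the class is left package-private only so the sketch compiles from a single file.

```java
// Immutable two-dimensional vector: all state is final and set once in the constructor.
class Vector2D {
    private final double mX;
    private final double mY;

    // The only constructor; it fixes the x and y coordinates for the object's lifetime.
    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    // Euclidean length of the vector.
    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }
}
```

With this sketch, new Vector2D(3.0, 4.0).length() evaluates to 5.0.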

If we simply declare Vector2D to implement Serializable, we have a problem. It’ll write OK. [Update: The following is wrong and has thus been redacted.] But trying to read it back in causes a problem when reflection tries to invoke a nullary (no argument) constructor Vector2D(), which doesn’t exist. [Update: See my reply to Konrad’s comment to see where my thinking went wrong and why you still need custom serialization to deal with immutables that have consistency conditions on their members enforced through the constructor.]

Serialization Proxy

Before serializing an object, Java checks to see if the class implements the method Object writeReplace(). If so, when an instance of the class is serialized, the value returned by writeReplace() is serialized instead of the object itself.

When an instance of a class is being deserialized, Java checks to see if it implements Object readResolve(). If it does, after the usual steps of deserialization, the return value of readResolve() is returned as the result of deserialization.

The serialization proxy pattern typically employs a static, private nested class (called the “serialization proxy”) that is serialized in place of an instance of the class through writeReplace(). The proxy itself stores all the information needed to reconstitute the object being serialized. The proxy’s readResolve() method is then used to return an instance of the original, immutable class. It sounds like a mouthful, but is really quite simple.
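A minimal sketch of the pattern for Vector2D follows. The field names and the nested class name SerializationProxy are assumptions for illustration, and the classes are package-private only so the sketch compiles from one file.

```java
import java.io.Serializable;

class Vector2D implements Serializable {
    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }

    // Invoked by Java serialization; the proxy is written instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(mX, mY);
    }

    // Holds just enough state to rebuild a Vector2D through its constructor.
    private static class SerializationProxy implements Serializable {
        private final double mX;
        private final double mY;

        SerializationProxy(double x, double y) {
            mX = x;
            mY = y;
        }

        // Invoked after the proxy is deserialized; its return value replaces the proxy.
        private Object readResolve() {
            return new Vector2D(mX, mY);
        }
    }
}
```

Writing a Vector2D to an ObjectOutputStream and reading it back with an ObjectInputStream returns a Vector2D built by the ordinary constructor, so any checks in the constructor are run on the way back in.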

That’s it. What’s really nice is that it doesn’t affect the original class (Vector2D) interface. The writeReplace() method and nested static serialization proxy class Vector2D.SerializationProxy are both private.

Serializing Singletons

Suppose we have a singleton and we want to make it serializable. Continuing our earlier example, let’s define a singleton, Origin2D.
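One way such a singleton might look is sketched here, assuming Origin2D extends Vector2D and exposes its single instance through a constant I’ve called ORIGIN (both the superclass relationship and the constant name are assumptions).

```java
class Vector2D {
    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }
}

// Singleton: the private constructor means ORIGIN is the only instance ever built.
final class Origin2D extends Vector2D {
    public static final Origin2D ORIGIN = new Origin2D();

    private Origin2D() {
        super(0.0, 0.0);
    }
}
```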

(Yes, I know that this isn’t a good example of a singleton because you can construct the same vector directly, but it’ll suffice for this example.)

Because Origin2D is a singleton, we don’t have a public nullary constructor. But even if we did, we’d have the problem that the deserialized instance would be a new instance, thus defeating the singleton pattern. Here all we need to do is to define a readResolve() method to return the singleton.
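A sketch of the readResolve() approach, under the same assumptions as before (Origin2D extends a serializable Vector2D, and the constant ORIGIN is a name I’ve made up):

```java
import java.io.Serializable;

class Vector2D implements Serializable {
    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }
}

final class Origin2D extends Vector2D {
    public static final Origin2D ORIGIN = new Origin2D();

    private Origin2D() {
        super(0.0, 0.0);
    }

    // Deserialization builds a throwaway instance, then swaps in the singleton.
    private Object readResolve() {
        return ORIGIN;
    }
}
```

After a round trip through ObjectOutputStream and ObjectInputStream, the object read back is identical (==) to Origin2D.ORIGIN.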

Complete Control with Externalizable

LingPipe takes even more control by defining the serialization proxies to implement the interface java.io.Externalizable, which extends Serializable. Externalizable defines two methods, writeExternal(ObjectOutput) and readExternal(ObjectInput). If an object implements Externalizable, these methods are called instead of the default reflection-based serialization, which simply tries to serialize each of the member objects in turn (writing primitive values directly using DataOutput and reading them back using DataInput). If a class implements Externalizable, only the fully qualified class name and serial version ID are written automatically; the class itself is responsible for all other serialization and deserialization.
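To make that concrete, here’s a sketch of the Vector2D proxy rewritten to implement Externalizable. The nested class name Externalizer and the field names are assumptions for illustration; this isn’t taken from LingPipe’s source.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;

class Vector2D implements Serializable {
    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }

    private Object writeReplace() {
        return new Externalizer(mX, mY);
    }

    private static class Externalizer implements Externalizable {
        // Not final: readExternal() fills these in after construction.
        private double mX;
        private double mY;

        // Externalizable requires a public no-argument constructor.
        public Externalizer() {
        }

        Externalizer(double x, double y) {
            mX = x;
            mY = y;
        }

        // We decide exactly what goes on the wire: two doubles.
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeDouble(mX);
            out.writeDouble(mY);
        }

        // Must read back exactly what writeExternal() wrote, in the same order.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            mX = in.readDouble();
            mY = in.readDouble();
        }

        private Object readResolve() {
            return new Vector2D(mX, mY);
        }
    }
}
```

The immutable class keeps its final fields; only the throwaway proxy needs mutable state and a no-argument constructor.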

LingPipe’s AbstractExternalizable

LingPipe provides an abstract base class, com.aliasi.util.AbstractExternalizable, which may be used as a serialization proxy. It has two abstract methods, Externalizable’s writeExternal() and Object readObject(ObjectInput), whose return value is used in the concrete implementation of readResolve(). It also has some static utility methods to read and write objects to streams.
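To show the general shape of such a base class without depending on LingPipe, here’s a self-contained stand-in. ProxyBase, its read() method, and the nested Proxy class are all hypothetical names of mine and aren’t meant to match LingPipe’s actual signatures.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;

// Hypothetical reusable proxy base class; not LingPipe's code.
abstract class ProxyBase implements Externalizable {
    private Object mResolved;

    // Subclasses read their state back and return the rebuilt object.
    protected abstract Object read(ObjectInput in)
        throws IOException, ClassNotFoundException;

    @Override
    public final void readExternal(ObjectInput in)
        throws IOException, ClassNotFoundException {
        mResolved = read(in);
    }

    // Hands the rebuilt object to the deserializer in place of the proxy.
    protected Object readResolve() {
        return mResolved;
    }
}

class Vector2D implements Serializable {
    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }

    private Object writeReplace() {
        return new Proxy(this);
    }

    private static class Proxy extends ProxyBase {
        private final Vector2D mVector;

        // Public no-argument constructor, as Externalizable requires.
        public Proxy() {
            this(null);
        }

        Proxy(Vector2D vector) {
            mVector = vector;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeDouble(mVector.mX);
            out.writeDouble(mVector.mY);
        }

        @Override
        protected Object read(ObjectInput in) throws IOException {
            double x = in.readDouble();
            double y = in.readDouble();
            return new Vector2D(x, y);
        }
    }
}
```

The base class takes care of the readExternal()/readResolve() plumbing once, so each proxy only has to say how to write its state and how to rebuild the object from it.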

Serial Version ID

I didn’t add serial version IDs to the example classes to keep the explanation of the proxies simple. You should add these to all serializable classes so that if the class changes, it won’t break serialization. That is, it preserves forward compatibility. This’ll typically be a declaration of a static final long field named serialVersionUID.
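For instance, a declaration along these lines; the particular number is arbitrary and made up for this sketch.

```java
import java.io.Serializable;

class Vector2D implements Serializable {
    // Arbitrary value chosen for illustration; what matters is that it stays fixed.
    private static final long serialVersionUID = 4290876530114629145L;

    private final double mX;
    private final double mY;

    public Vector2D(double x, double y) {
        mX = x;
        mY = y;
    }

    public double length() {
        return Math.sqrt(mX * mX + mY * mY);
    }
}
```

You can check which ID Java will use for a class with java.io.ObjectStreamClass.lookup(Vector2D.class).getSerialVersionUID().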

Every time Java serializes an object, it writes the object’s serial version ID. If one is not defined in a static variable named serialVersionUID, one is computed through reflection. So defining one also speeds up the serialization/deserialization process. If you default to the reflection-based version, if the class changes, so will the ID, and you’ll get conflicts in trying to read in objects. If you define the ID yourself from the beginning, the value doesn’t matter; if you already have a class released into the wild without one, you should use Java’s serialver utility (which is distributed with Sun’s JDK) to compute the value created by reflection to ensure backward compatibility.

Compilable versus Serializable

LingPipe also provides the util.Compilable interface, which defines a method void compileTo(ObjectOutput). We use this method to compile objects like language-model classifiers, whose compiled forms are very different from their regular forms. Some of these classes, like TradNaiveBayesClassifier, also implement Serializable, which writes and reads back in an object with the same behavior as the one serialized. The deserialized form allows further training; the compiled form is faster and more memory efficient.

1. You still need custom deserialization for singletons so that you always get the same object back.

2. As Josh Bloch explains in Effective Java, you still need to customize serialization for immutable objects as I described above if they have consistency conditions on their members. The problem is that you can serialize an instance, tweak the bytes representing numbers, then deserialize. Because default deserialization bypasses the constructor (easily seen by putting a print in the constructor itself), it bypasses any consistency checks in the constructor.

I think this second point is why I was confused about the base case. Thanks again for pointing out my error.
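The constructor bypass in the second point can be seen with a small experiment. This is only a sketch: FiniteValue and BypassDemo are hypothetical classes of mine, and the byte surgery relies on the class having a single primitive field and using default serialization, in which case the field’s eight bytes are the last bytes of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

// Hypothetical immutable class whose constructor enforces a consistency condition.
class FiniteValue implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double mValue;

    FiniteValue(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value))
            throw new IllegalArgumentException("value must be finite");
        mValue = value;
    }

    double value() {
        return mValue;
    }
}

class BypassDemo {
    // Serializes a valid instance, overwrites the stored double, and deserializes.
    static FiniteValue tamper() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(new FiniteValue(1.0));
        }
        byte[] bytes = buf.toByteArray();

        // Overwrite the trailing eight bytes (the double field) with NaN.
        ByteBuffer.wrap(bytes).putDouble(bytes.length - 8, Double.NaN);

        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            // Default deserialization never calls the FiniteValue constructor.
            return (FiniteValue) in.readObject();
        }
    }
}
```

The object returned by BypassDemo.tamper() holds NaN even though new FiniteValue(Double.NaN) throws. Routing deserialization through a serialization proxy sends the data back through the constructor, so the check is applied again.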
