Saturday, 7 July 2012

Item 75: Consider using a custom serialized form

Do not accept
the default serialized form without first considering whether it is
appropriate. Accepting the default serialized form should be a conscious
decision that this encoding is reasonable from the standpoint of flexibility,
performance, and correctness. Generally speaking, you should accept the default
serialized form only if it is largely identical to the encoding that you would
choose if you were designing a custom serialized form.

The default
serialized form is likely to be appropriate if an object’s physical
representation is identical to its logical content. For example,
the default serialized form would be reasonable for the following class, which
simplistically represents a person’s name:

// Good candidate for default serialized form
public class Name implements Serializable {
    /**
     * Last name. Must be non-null.
     * @serial
     */
    private final String lastName;

    /**
     * First name. Must be non-null.
     * @serial
     */
    private final String firstName;

    /**
     * Middle name, or null if there is none.
     * @serial
     */
    private final String middleName;

    ... // Remainder omitted
}

Logically
speaking, a name consists of three strings that represent a last name, a first
name, and a middle name. The instance fields in Name
precisely
mirror this logical content.

Even if you
decide that the default serialized form is appropriate, you often must provide
a readObject method to ensure
invariants and security. In the case of Name, the readObject method must ensure that lastName and firstName are
non-null. This issue is discussed at length in Items 76 and 78.
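
As a rough illustration of the readObject check described above, here is a minimal sketch. The constructor, the accessors, the serialVersionUID value, and the NameDemo helper are all assumptions added for the example; they are not part of the original class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical helper that serializes a Name to memory and reads it back.
class NameDemo {
    static Name roundTrip(Name name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(name);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Name) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Name copy = roundTrip(new Name("Bloch", "Joshua", null));
        System.out.println(copy.lastName() + ", " + copy.firstName());
    }
}

// Sketch of Name using the default serialized form plus a validating readObject.
class Name implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary value for this sketch

    private final String lastName;   // must be non-null
    private final String firstName;  // must be non-null
    private final String middleName; // null if there is none

    Name(String lastName, String firstName, String middleName) {
        this.lastName = Objects.requireNonNull(lastName, "lastName");
        this.firstName = Objects.requireNonNull(firstName, "firstName");
        this.middleName = middleName;
    }

    String lastName() { return lastName; }
    String firstName() { return firstName; }
    String middleName() { return middleName; }

    // Deserialization bypasses the constructor, so the invariants are rechecked here.
    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        if (lastName == null || firstName == null)
            throw new InvalidObjectException("lastName and firstName must be non-null");
    }
}
```

The check only matters for streams that were not produced by a well-behaved Name instance, such as a hand-crafted or corrupted byte stream.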

Note that
there are documentation comments on the lastName, firstName, and middleName
fields,
even though they are private. That is because these private fields define a
public API, which is the serialized form of the class, and this public API must
be documented. The presence of the @serial
tag
tells the Javadoc utility to place this documentation on a special page that
documents serialized forms.

Near the
opposite end of the spectrum from Name, consider the
following class, which represents a list of strings (ignoring for the moment
that you’d be better off using one of the standard List implementations):

// Awful candidate for default serialized form
public final class StringList implements Serializable {
    private int size = 0;
    private Entry head = null;

    private static class Entry implements Serializable {
        String data;
        Entry next;
        Entry previous;
    }

    ... // Remainder omitted
}

Logically
speaking, this class represents a sequence of strings. Physically, it
represents the sequence as a doubly linked list. If you accept the default serialized
form, the serialized form will painstakingly mirror every entry in the linked
list and all the links between the entries, in both directions.

Using the
default serialized form when an object’s physical representation differs
substantially from its logical data content has four disadvantages:

• It permanently
ties the exported API to the current internal representation. In the above
example, the private StringList.Entry class becomes
part of the public API. If the representation is changed in a future release,
the StringList class will
still need to accept the linked list representation on input and generate it on
output. The class will never be rid of all the code dealing with linked list
entries, even if it doesn’t use them anymore.

• It can consume
excessive space. In the above example, the serialized form unnecessarily represents
each entry in the linked list and all the links. These entries and links are
mere implementation details, not worthy of inclusion in the serialized form.
Because the serialized form is excessively large, writing it to disk or sending
it across the network will be excessively slow.

• It can consume
excessive time. The serialization logic has no knowledge of the topology of the
object graph, so it must go through an expensive graph traversal. In the
example above, it would be sufficient simply to follow the next references.

• It can cause
stack overflows. The default serialization procedure performs a recursive
traversal of the object graph, which can cause stack overflows even for
moderately sized object graphs. Serializing a StringList
instance
with 1,258 elements causes the stack to overflow on my machine. The number of
elements required to cause this problem may vary depending on the JVM
implementation and command line flags; some implementations may not have this
problem at all.
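
To get a feel for the space cost, the sketch below compares the byte count of the default serialized form of a small doubly linked list with the byte count of just its logical content (a count followed by the strings). LinkedStrings is a hypothetical stand-in for the StringList above, with an assumed add method; the logical-form figure is only an estimate, since a real custom form would also carry a class descriptor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FormSizeDemo {
    // Bytes produced by default serialization of an n-element linked list.
    static int defaultFormSize(int n) {
        LinkedStrings list = new LinkedStrings();
        for (int i = 0; i < n; i++)
            list.add("s" + i);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.size();
    }

    // Bytes produced by writing only the logical content: a count, then the strings.
    static int logicalFormSize(int n) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(n);
            for (int i = 0; i < n; i++)
                out.writeObject("s" + i);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int n = 100; // kept small so the recursive traversal stays shallow
        System.out.println("default form: " + defaultFormSize(n) + " bytes");
        System.out.println("logical form: " + logicalFormSize(n) + " bytes");
    }
}

// Hypothetical stand-in for StringList that accepts the default serialized form.
class LinkedStrings implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary value for this sketch
    private int size = 0;
    private Entry head = null;

    private static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        String data;
        Entry next;
        Entry previous;
    }

    // Appends by walking to the end of the list; fine for a small demo.
    void add(String s) {
        Entry e = new Entry();
        e.data = s;
        if (head == null) {
            head = e;
        } else {
            Entry last = head;
            while (last.next != null)
                last = last.next;
            last.next = e;
            e.previous = last;
        }
        size++;
    }
}
```

The exact numbers depend on the JVM and on the strings used, so none are quoted here; the point is only that the default form carries per-entry and per-link overhead that the logical content does not.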

A reasonable
serialized form for StringList is simply the
number of strings in the list, followed by the strings themselves. This
constitutes the logical data represented by a StringList, stripped of
the details of its physical representation. Here is a revised version of StringList containing writeObject
and
readObject methods
implementing this serialized form. As a reminder, the transient modifier indicates that an instance field is
to be omitted from a class’s default serialized form:

// StringList with a reasonable custom serialized form
public final class StringList implements Serializable {
    private transient int size = 0;
    private transient Entry head = null;

    // No longer Serializable!
    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }

    // Appends the specified string to the list
    public final void add(String s) { ... }

    /**
     * Serialize this {@code StringList} instance.
     *
     * @serialData The size of the list (the number of strings
     * it contains) is emitted ({@code int}), followed by all of
     * its elements (each a {@code String}), in the proper
     * sequence.
     */
    private void writeObject(ObjectOutputStream s)
            throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);

        // Write out all elements in the proper order.
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int numElements = s.readInt();

        // Read in all elements and insert them in list
        for (int i = 0; i < numElements; i++)
            add((String) s.readObject());
    }

    ... // Remainder omitted
}

Note that the
first thing writeObject does is to
invoke defaultWriteObject, and the
first thing readObject does is to
invoke defaultReadObject, even though
all of StringList’s fields are
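transient. If all instance fields are transient, it is technically permissible
to dispense with invoking defaultWriteObject and defaultReadObject, but it is not
recommended. Invoking them affects the serialized form in a way that makes it
possible to add nontransient instance fields in a later release while preserving
backward and forward compatibility.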
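
The revised StringList above elides add and the rest of the class. The following is one possible runnable completion, offered as a sketch: the tail field, size(), toList(), the serialVersionUID value, and the StringListDemo helper are assumptions made for the example, not part of the original listing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that serializes a StringList to memory and reads it back.
class StringListDemo {
    static StringList roundTrip(StringList list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (StringList) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringList list = new StringList();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(roundTrip(list).toList());
    }
}

final class StringList implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary value for this sketch

    private transient int size = 0;
    private transient Entry head = null;
    private transient Entry tail = null; // assumption: tail pointer so add is O(1)

    private static class Entry { // not Serializable
        String data;
        Entry next;
        Entry previous;
    }

    // Appends the specified string to the list.
    public void add(String s) {
        Entry e = new Entry();
        e.data = s;
        e.previous = tail;
        if (tail == null)
            head = e;
        else
            tail.next = e;
        tail = e;
        size++;
    }

    public int size() { return size; }

    // Copies the elements, in order, into a standard list (added for inspection).
    public List<String> toList() {
        List<String> result = new ArrayList<>(size);
        for (Entry e = head; e != null; e = e.next)
            result.add(e.data);
        return result;
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for (Entry e = head; e != null; e = e.next) // iterative, no deep recursion
            s.writeObject(e.data);
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject(); // transient fields start at 0 / null
        int numElements = s.readInt();
        for (int i = 0; i < numElements; i++)
            add((String) s.readObject());
    }
}
```

Because writeObject walks the next references in a loop, the depth of the call stack does not grow with the length of the list, which is what sidesteps the stack overflow described earlier.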

Before deciding
to make a field nontransient, convince yourself that its value is part of the
logical state of the object. If you use a custom serialized form, most or
all of the instance fields should be labeled transient, as in the StringList example shown above.

If you are
using the default serialized form and you have labeled one or more fields transient, remember that these fields will be
initialized to their default values when an
instance is deserialized: null for object reference fields, zero
for numeric primitive fields, and false
for
boolean fields [JLS,
4.12.5]. If these values are unacceptable for any transient fields, you must
provide a readObject method that
invokes the defaultReadObject method and
then restores transient fields to acceptable values (Item 76). Alternatively,
these fields can be lazily initialized the first time they are used (Item 71).
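
As an illustration of restoring a transient field, the sketch below uses a hypothetical EventCounter class whose transient list must never be null. All names here are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that serializes an EventCounter to memory and reads it back.
class TransientDemo {
    static EventCounter roundTrip(EventCounter counter) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(counter);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EventCounter) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EventCounter counter = new EventCounter();
        counter.record("login");
        EventCounter copy = roundTrip(counter);
        copy.record("logout"); // would throw NullPointerException without readObject
        System.out.println(copy.count() + " events, recent = " + copy.recent());
    }
}

// Default serialized form with one transient field that must not be null.
class EventCounter implements Serializable {
    private static final long serialVersionUID = 1L; // arbitrary value for this sketch

    private int count;                                         // logical state, serialized
    private transient List<String> recent = new ArrayList<>(); // scratch state, not serialized

    void record(String event) {
        count++;
        recent.add(event);
    }

    int count() { return count; }
    List<String> recent() { return recent; }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        // Field initializers do not run during deserialization, so recent is
        // null at this point and has to be restored by hand.
        recent = new ArrayList<>();
    }
}
```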

Whether or not
you use the default serialized form, you must impose any synchronization on object
serialization that you would impose on any other method that reads the entire
state of the object. So, for example, if you have a thread-safe
object (Item 70) that achieves its thread safety by synchronizing every method,
and you elect to use the default serialized form, use the following writeObject method:

// writeObject for synchronized class with default serialized form
private synchronized void writeObject(ObjectOutputStream s)
        throws IOException {
    s.defaultWriteObject();
}

If you put
synchronization in the writeObject method, you
must ensure that it adheres to the same lock-ordering constraints as other
activity, or you risk a resource-ordering deadlock [Goetz06, 10.1.5].

Regardless of
what serialized form you choose, declare an explicit serial version UID in
every serializable class you write. This eliminates the serial version UID as a
potential source of incompatibility (Item 74). There is also a small
performance benefit. If no serial version UID is provided, an expensive
computation is required to generate one at runtime.

Declaring a
serial version UID is simple. Just add this line to your class:

private static final long serialVersionUID = randomLongValue;
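
For a concrete picture, the sketch below declares a serial version UID on a hypothetical Account class and reads it back through ObjectStreamClass. The class name and the literal value are arbitrary choices for the example; in practice the JDK's serialver tool is one way to obtain a value, though any long will do for a new class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // Returns the serial version UID the serialization runtime will use for c.
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Account serialVersionUID = " + uidOf(Account.class));
    }
}

// Hypothetical serializable class with an explicitly declared serial version UID.
class Account implements Serializable {
    private static final long serialVersionUID = 7526472295622776147L; // arbitrary value

    private String owner;

    Account(String owner) { this.owner = owner; }

    String owner() { return owner; }
}
```

Because the value is declared explicitly, the runtime uses it directly instead of computing one from the class's structure.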

To summarize,
when you have decided that a class should be serializable (Item 74), think hard
about what the serialized form should be. Use the default serialized form only
if it is a reasonable description of the logical state of the object; otherwise
design a custom serialized form that aptly describes the object. You should
allocate as much time to designing the serialized form of a class as you
allocate to designing its exported methods (Item 40). Just as you cannot
eliminate exported methods from future versions, you cannot eliminate fields
from the serialized form; they must be preserved forever to ensure
serialization compatibility. Choosing the wrong serialized form can have a
permanent, negative impact on the complexity and performance of a class.