Java RMI: Serialization

This excerpt is Chapter 10 from Java RMI, published in October 2001 by O'Reilly.

Serialization is the process of
converting a set of object instances that contain references to each other
into a linear stream of bytes, which can then be sent through a socket, stored
to a file, or simply manipulated as a stream of data. Serialization is the
mechanism used by RMI to pass objects between JVMs, either as arguments in a
method invocation from a client to a server or as return values from a method
invocation. In the first section of this book, I referred to this process
several times but delayed a detailed discussion until now. In this chapter, we
drill down on the serialization mechanism; by the end of it, you will
understand exactly how serialization works and how to use it efficiently
within your applications.

The Need for Serialization

Envision the banking application while a client is executing a
withdrawal. The part of the application we're looking at has the runtime
structure shown in Figure
10-1.

Figure 10-1. Runtime structure when making a withdrawal

What does it mean for the client to pass an instance of
Money to the server? At a minimum, it means that the
server is able to call public methods on the instance of
Money.

One way to do this would be to implicitly make
Money into a server as well. (Just to be clear: doing things
this way would be a bad idea, and it is not the way RMI passes instances
over the wire.)
For example, imagine that the client sends the following two pieces of
information whenever it passes an instance as an argument:

The type of the instance; in this case,
Money.

A unique identifier for the object (i.e., a logical
reference). For example, the address of the instance in memory.

The RMI runtime layer in the server can use this information to
construct a stub for the instance of Money, so that whenever the
Account server calls a method on what it thinks of as the instance of
Money, the method call is relayed over the wire, as shown in Figure
10-2.

Figure 10-2. Relaying a Money method call from the server

Attempting to do things this way has three significant
drawbacks:

You can't access fields on the objects that have been passed as
arguments.

Stubs work by implementing an interface. They implement the methods in
the interface by simply relaying the method invocation across the network.
That is, the stub methods take all their arguments and simply marshall them
for transport across the wire. Accessing a public field is really just
dereferencing a pointer--there is no method invocation and hence, there
isn't a method call to forward over the wire.

It can result in unacceptable performance due to network latency.

Even in our simple case, the instance of Account is going to need to call
getCents( ) on the instance of Money. This means that a simple call to
makeDeposit( ) really involves at least two distinct networked method calls:
makeDeposit( ) from the client and getCents( ) from the server.

It makes the application much more vulnerable to partial failure.

Let's say that the server is busy and doesn't get around to handling the
request for 30 seconds. If the client crashes in the interim, or if the
network goes down, the server cannot process the request at all. Until all
data has been requested and sent, the application is particularly vulnerable
to partial failures.

This last point is an interesting one. Any time you have an
application that requires a long-lasting and durable connection between client
and server, you build in a point of failure. The longer the connection needs
to last, or the higher the communication bandwidth the connection requires,
the more likely the application is to occasionally break down.

TIP: The original design of the Web, with its
stateless connections, serves as a good example of a distributed application
that can tolerate almost any transient network failure.

These three reasons imply that what is really needed is a way to
copy objects and send them over the wire. That is, instead of turning
arguments into implicit servers, arguments need to be completely copied so
that no further network calls are needed to complete the remote method
invocation. Put another way, we want the result of
makeWithdrawal( ) to involve creating a copy of the
instance of Money on the server side. The runtime
structure should resemble Figure 10-3.

Figure 10-3. Making a remote method call can create deep copies of the arguments and return values

The desire to avoid unnecessary network dependencies has two
significant consequences:

Once an object is duplicated, the two objects are completely
independent of each other.

Any attempt to keep the copy and the original in sync would involve
propagating changes over the network, entirely defeating the reason for
making the copy in the first place.

The copying mechanism must create deep copies.

If the instance of Money references another
instance, then copies must be made of both instances. Otherwise, when a
method is called on the second object, the call must be relayed across the
wire. Moreover, all the copies must be made immediately--we can't wait until
the second object is accessed to make the copy because the original might
change in the meantime.

These two consequences have a very important third
consequence:

If an object is sent twice, in separate method calls, two copies of
the object will be created.

In addition to arguments to method calls, this holds for objects that are
referenced by the arguments. If you pass object A, which has a reference to
object C, and in another call you pass object B, which also has a reference
to C, you will end up with two distinct copies of C on the receiving side.

Drilling Down on Object Creation

To see why this last point holds, consider a client that
executes a withdrawal and then tries to cancel the transaction by making a
deposit for the same amount of money. That is, the following lines of code are
executed:

server.makeWithdrawal(amount);
....
server.makeDeposit(amount);

The client has no way of knowing whether the server still has a
copy of amount. After all, the server may have used
it and then thrown the copy away once it was done. This means that the client
has to marshall amount and send it over the wire to
the server.

The RMI runtime can demarshall amount, which is the instance of
Money the client sent. However, even if it has the
previous object, it has no way (unless equals( ) has been overridden)
to tell whether the instance it just demarshalled is
equal to the previous object.

More generally, if the object being copied isn't immutable, then
the server might change it. In this case, even if the two objects are
currently equal, the RMI runtime has no way to tell if the two copies will
always be equal and can potentially be replaced by a single copy. To see why,
consider our Printer example again. At the end of
Chapter 3, we considered a list of possible feature requests that could be
made. One of them was the following:

Managers will want to track resource consumption.
This will involve logging print requests and, quite possibly, building a set
of queries that can be run against the printer's log.

This can be implemented by adding a few more fields to
DocumentDescription and having the server store an
indexed log of all the DocumentDescription objects
it has received. For example, we may add the following fields to
DocumentDescription:

public Time whenPrinted;
public Person sender;
public boolean printSucceeded;

Now consider what happens when the user actually wants to print
two copies of the same document. The client application could call:

server.printDocument(document);

twice with the "same" instance of
DocumentDescription. And it would be an error for the RMI
runtime to create only one instance of
DocumentDescription on the server side. Even though the
"same" object is passed into the server twice, it is passed as parts of
distinct requests and therefore as different objects.

TIP: This is true even if the runtime can
tell that the two instances of
DocumentDescription are equal when it finishes
demarshalling. An implementation of a printer may well have a notion of a
job queue that holds instances of
DocumentDescription. So our client makes the first
call, and the copy of document is placed in the
queue (say, at number 5), but not edited because the document hasn't been
printed yet. Then our client makes the second call. At this point, the two
copies of document are equal. However, we don't
want to place the same object in the printer queue twice. We want to place
distinct copies in the printer queue.

Thus, we come to the following conclusion: network latency, and
the desire to avoid vulnerability to partial failures, force us to have a deep
copy mechanism for most arguments to a remote method invocation. This copying
mechanism has to make deep copies, and it cannot perform any validation to
eliminate "extra" copies across methods.

TIP: While this discussion provides examples
of implementation decisions that force two copies to occur, it's important
to note that, even without such examples, clients should be written as if
the servers make independent copies. That is, clients are written to use
interfaces. They should not, and cannot, make assumptions about server-side
implementations of the interfaces.

Using Serialization

Serialization is a mechanism built into the core Java libraries
for writing a graph of objects into a stream of data. This stream of data can
then be programmatically manipulated, and a deep copy of the objects can be
made by reversing the process. This reversal is often called deserialization.

In particular, there are three main uses of serialization:

As a persistence mechanism

If the stream being used is
FileOutputStream, then the data will automatically be
written to a file.

As a copy mechanism

If the stream being used is
ByteArrayOutputStream, then the data will be written to
a byte array in memory. This byte array can then be used to create
duplicates of the original objects.

As a communication mechanism

If the stream being used comes from a socket, then
the data will automatically be sent over the wire to the receiving socket,
at which point another program will decide what to do.

The important thing to note is that the use of serialization is
independent of the serialization algorithm itself. If we have a serializable
class, we can save it to a file or make a copy of it simply by changing the
way we use the output of the serialization mechanism.

As you might expect, serialization is implemented using a pair
of streams. Even though the code that underlies serialization is quite
complex, the way you invoke it is designed to make serialization as
transparent as possible to Java developers. To serialize an object, create an
instance of ObjectOutputStream and call the
writeObject( ) method; to read in a serialized object,
create an instance of ObjectInputStream and call
the readObject( ) method.

ObjectOutputStream

ObjectOutputStream, defined in the
java.io package, is a stream that implements the
"writing-out" part of the serialization algorithm. (RMI actually uses a subclass of
ObjectOutputStream to customize its behavior.)
The methods implemented by ObjectOutputStream can
be grouped into three categories: methods that write information to the
stream, methods used to control the stream's behavior, and methods used to
customize the serialization algorithm.

The "write" methods

The first, and most intuitive, category consists of the "write"
methods:

For the most part, these methods should seem familiar.
writeFloat( ), for example, works exactly as you would
expect after reading Chapter 1 -- it takes a floating-point number and encodes
the number as four bytes. There are, however, two new methods here:
writeObject( ) and defaultWriteObject( ).

writeObject( ) serializes an object.
In fact, writeObject( ) is often the instrument of
the serialization mechanism itself. In the simplest and most common case,
serializing an object involves doing two things: creating an
ObjectOutputStream and calling
writeObject( ) with a single "top-level" instance. The
following code snippet shows the entire process, storing an object--and all
the objects to which it refers--into a file:
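The listing itself is not reproduced in this excerpt. A minimal sketch of those two steps might look like the following; the class name WriteObjectToFile, the helper store( ), and the file name names.ser are assumptions made for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class WriteObjectToFile {
    // Writes topLevel, and every object reachable from it, to the named file.
    static void store(Serializable topLevel, String fileName) throws IOException {
        try (FileOutputStream fileOut = new FileOutputStream(fileName);
             ObjectOutputStream out = new ObjectOutputStream(fileOut)) {
            out.writeObject(topLevel);
        }
    }

    public static void main(String[] args) throws IOException {
        ArrayList<String> names = new ArrayList<>();
        names.add("checking");
        names.add("savings");
        store(names, "names.ser"); // hypothetical file name
    }
}
```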

Of course, this works seamlessly with the other methods for
writing data. That is, if you wanted to write two floats, a String, and an
object to a file, you could do so with the following code snippet:
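That listing is also missing from this excerpt. A sketch of what such code could look like is shown here, assuming a hypothetical helper named store( ) and a made-up file name; the String is written with writeObject( ), though writeUTF( ) would be another reasonable choice.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

class WriteMixedData {
    // Writes two floats, a String, and an arbitrary serializable object, in
    // that order. A reader has to consume them in exactly the same order.
    static void store(String fileName, float first, float second,
                      String text, Serializable object) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeFloat(first);
            out.writeFloat(second);
            out.writeObject(text);   // a String is itself a serializable object
            out.writeObject(object);
        }
    }

    public static void main(String[] args) throws IOException {
        store("mixed.ser", 3.14f, 2.72f, "a string", new Date()); // hypothetical file name
    }
}
```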

TIP: ObjectOutputStream's constructor takes an
OutputStream as an argument. This is analogous to many
of the streams we looked at in Chapter 1.
ObjectOutputStream and
ObjectInputStream are simply encoding and
transformation layers. This enables RMI to send objects over the wire by
opening a socket connection, associating the
OutputStream with the socket connection, creating an
ObjectOutputStream on top of the socket's
OutputStream, and then calling
writeObject( ).

The other new "write" method is
defaultWriteObject( ).
defaultWriteObject( ) makes it much easier to customize
how instances of a single class are serialized. However,
defaultWriteObject( ) has some strange restrictions
placed on when it can be called. Here's what the documentation says about
defaultWriteObject( ):

Write the nonstatic and nontransient fields of the
current class to this stream. This may only be called from the
writeObject method of the class being serialized. It
will throw the NotActiveException if it is called
otherwise.

That is, defaultWriteObject( ) is a
method that works only when it is called from another specific method at a
particular time. Since defaultWriteObject( ) is
useful only when you are customizing the information stored for a particular
class, this turns out to be a reasonable restriction. We'll talk more about
defaultWriteObject( ) later in the chapter, when we
discuss how to make a class serializable.
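The restriction is easy to observe. In the following sketch (the class and method names are made up for illustration), client code calls defaultWriteObject( ) directly on a stream, outside of any class's writeObject( ) method, and the stream refuses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotActiveException;
import java.io.ObjectOutputStream;

class DefaultWriteObjectRestriction {
    // Returns true if defaultWriteObject() is rejected when called directly
    // by client code rather than from a class's own writeObject() method.
    static boolean rejectedOutsideWriteObject() throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.defaultWriteObject();
            return false;
        } catch (NotActiveException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rejectedOutsideWriteObject()); // true
    }
}
```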

The stream manipulation methods

ObjectOutputStream also implements
four methods that deal with the basic mechanics of manipulating the
stream:

public void close( )
public void flush( )
public void reset( )
public void useProtocolVersion(int version)

With the exception of useProtocolVersion( ), these methods should be
familiar. In fact, reset( ), close( ), and
flush( ) are standard stream methods.
useProtocolVersion( ), on the other hand, changes the
version of the serialization mechanism that is used. This is necessary because
the serialization format and algorithm may need to change in a way that's not
backwards-compatible. If another application needs to read in your serialized
data, and the applications will be versioning independently (or running in
different versions of the JVM), you may want to standardize on a protocol
version.

TIP: There are two versions of the
serialization protocol currently defined: PROTOCOL_VERSION_1 and
PROTOCOL_VERSION_2. If you send serialized data to a 1.1 (or earlier) JVM,
you should probably use PROTOCOL_VERSION_1. The most common case of this
involves applets. Most applets run in browsers over which the developer has
no control. This means, in particular, that the JVM running the applet could
be anything, from Java 1.0.2 through the latest JVM. Most servers, on the
other hand, are written using JDK 1.2.2 or later. (The main exception is EJB
containers that require earlier versions of Java. At this writing, for
example, Oracle 8i's EJB container uses JDK 1.1.6.)
If you pass serialized objects between an applet and a server, you should
specify the serialization protocol.
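As a sketch of how the protocol might be pinned, the following uses useProtocolVersion( ) with the constants from java.io.ObjectStreamConstants. The class and helper names are assumptions for illustration. Note that the version has to be chosen before anything is serialized to the stream; the second helper shows a late switch being rejected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

class ProtocolVersionDemo {
    // Serializes value using the older stream protocol.
    static byte[] writeWithVersion1(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Returns true if switching versions after an object has already been
    // written is rejected with an IllegalStateException.
    static boolean rejectsLateSwitch() throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject("already written");
            out.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeWithVersion1("hello").length > 0); // true
        System.out.println(rejectsLateSwitch());                   // true
    }
}
```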

Methods that customize the serialization mechanism

The last group of methods consists mostly of protected methods
that provide hooks that allow the serialization mechanism itself, rather than
the data associated to a particular class, to be customized. These methods
are:

These methods are more important to people who tailor the
serialization algorithm to a particular use or develop their own
implementation of serialization. As such, they require a deeper understanding
of the serialization algorithm. We'll discuss these methods in more detail
later, after we've gone over the actual algorithm used by the serialization
mechanism.

ObjectInputStream

ObjectInputStream, defined in the
java.io package, implements the "reading-in" part
of the serialization algorithm. It is the companion to
ObjectOutputStream--objects serialized using
ObjectOutputStream can be deserialized using
ObjectInputStream. Like
ObjectOutputStream, the methods implemented by
ObjectInputStream can be grouped into three categories:
methods that read information from the stream, methods that are used to
control the stream's behavior, and methods that are used to customize the
serialization algorithm.

The "read" methods

The first, and most intuitive, category consists of the "read"
methods:

Just as with ObjectOutputStream's
write( ) methods, these methods should be familiar.
readFloat( ), for example, works exactly as you
would expect after reading Chapter 1: it reads four bytes from the stream and
converts them into a single floating-point number, which is returned by the
method call. And, again as with ObjectOutputStream,
there are two new methods here:
readObject( ) and defaultReadObject( ).

Just as writeObject( ) serializes an
object, readObject( ) deserializes it.
Deserializing an object involves doing two things: creating an
ObjectInputStream and then calling
readObject( ). The following code snippet shows the
entire process, creating a copy of an object (and all the objects to which it
refers) from a file:
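The listing is missing from this excerpt; a minimal sketch of the two steps could look like this. The class name ReadObjectFromFile and the helper load( ) are assumptions, and the file is expected to have been produced earlier by writeObject( ).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

class ReadObjectFromFile {
    // Rebuilds the top-level object, and everything it refers to, from the
    // named file.
    static Object load(String fileName) throws IOException, ClassNotFoundException {
        try (FileInputStream fileIn = new FileInputStream(fileName);
             ObjectInputStream in = new ObjectInputStream(fileIn)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Expects the name of a file that holds a serialized object.
        Object restored = load(args[0]);
        System.out.println(restored);
    }
}
```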

This code is the exact inverse of the code we used for serializing
the object in the first place. If we wanted to make a deep copy of a
serializable object, we could first serialize the object and then deserialize
it, as in the following code example:
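The original example is not included in this excerpt. A sketch of the technique, using a hypothetical DeepCopy class and in-memory byte array streams, might read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;

class DeepCopy {
    // Serializes original into memory, then deserializes from the same bytes.
    static Object copy(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(memory)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(memory.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Date> original = new ArrayList<>();
        original.add(new Date(0L));

        @SuppressWarnings("unchecked")
        ArrayList<Date> duplicate = (ArrayList<Date>) copy(original);

        System.out.println(duplicate.equals(original));          // true
        System.out.println(duplicate.get(0) == original.get(0)); // false
    }
}
```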

This code simply places an output stream into memory, serializes
the object to the memory stream, creates an input stream based on the same
piece of memory, and runs the deserializer on the input stream. The end result
is a deep copy of the object with which we started.

The stream manipulation methods

There are five basic stream manipulation methods defined for
ObjectInputStream:

The three new methods are also straightforward.
skipBytes( ) skips the indicated number of bytes in the
stream, blocking until all the information has been read. And the two
readFully( ) methods perform a batch read into a byte
array, also blocking until all the data has been read in.

Methods that customize the serialization mechanism

The last group of methods consists mostly of protected methods
that provide hooks, which allow the serialization mechanism itself, rather
than the data associated to a particular class, to be customized. These
methods are:

These methods are more important to people who tailor the
serialization algorithm to a particular use or develop their own
implementation of serialization. As before, they also require a deeper
understanding of the serialization algorithm, so I'll hold off on discussing
them right now.

How to Make a Class Serializable

So far, we've focused on the mechanics of serializing an object.
We've assumed we have a serializable object and discussed, from the point of
view of client code, how to serialize it. The next step is discussing how to
make a class serializable.

There are four basic things you must do when you are making a
class serializable. They are:

Implement the Serializable interface.

Make sure that instance-level, locally defined state is
serialized properly.

Make sure that superclass state is serialized properly.

Override equals( ) and hashCode( ).

Let's look at each of these steps in more detail.

Implement the Serializable Interface

This is by far the easiest of the steps. The
Serializable interface is an empty interface; it declares
no methods at all. So implementing it amounts to adding "implements
Serializable" to your class declaration.
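To make the effect of the marker concrete, here is a small sketch. Tagged and Untagged are hypothetical classes that differ only in the "implements Serializable" clause; the stream accepts the first and throws NotSerializableException for the second.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarkerInterfaceDemo {
    static class Tagged implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    static class Untagged {
        int value = 1;
    }

    // Returns true if the serialization mechanism accepts the candidate.
    static boolean canSerialize(Object candidate) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Tagged()));   // true
        System.out.println(canSerialize(new Untagged())); // false
    }
}
```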

Reasonable people may wonder about the utility of an empty
interface. Rather than define an empty interface, and require class
definitions to implement it, why not just simply make every object
serializable? The main reason not to do this is that there are some classes
that don't have an obvious serialization. Consider, for example, an instance
of File. An instance of
File represents a file. Suppose, for example, it was
created using the following line of code:

File file = new File("c:\\temp\\foo");

It's not at all clear what should be written out when this is
serialized. The problem is that the file itself has a different lifecycle than
the serialized data. The file might be edited, or deleted entirely, while the
serialized information remains unchanged. Or the serialized information might
be used to restart the application on another machine, where
"c:\\temp\\foo" is the name of an entirely different
file.

Another example is provided by the
Thread class. (If you don't know much about threads, just wait a few chapters and then revisit this example. It will make more sense then.)
A thread represents a flow of execution within a particular JVM. To
serialize one, you would have to store not only the stack and all the local
variables, but also all the related locks and threads, and then restart all
the threads properly when the instance is deserialized.

TIP: Things get worse when you consider
platform dependencies. In general, any class that involves native code is
not really a good candidate for serialization.

Make Sure That Instance-Level, Locally Defined State Is Serialized Properly

Class definitions contain variable declarations. The
instance-level, locally defined variables (i.e., the nonstatic variables) are
the ones that contain the state of a particular instance. For example, in our
Money class, we declared one such field:

public class Money extends ValueObject {
    private int _cents;
    ....
}

The serialization mechanism has a nice default behavior -- if all
the instance-level, locally defined variables have values that are either
serializable objects or primitive datatypes, then the serialization mechanism
will work without any further effort on our part. For example, our
implementations of
Account, such as
Account_Impl, would present no problems for the default
serialization mechanism:
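The Account_Impl listing is not reproduced in this excerpt. The following is a hypothetical, heavily simplified stand-in rather than the book's code: Money and AccountSketch are invented here, and the only detail carried over from the text is a _balance field that refers to a serializable Money.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DefaultSerializationSketch {
    // Simplified, hypothetical stand-in for the book's Money class.
    static class Money implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int _cents;
        Money(int cents) { _cents = cents; }
        int getCents() { return _cents; }
    }

    // Hypothetical account whose only state is a reference to a Money.
    static class AccountSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        private Money _balance;
        AccountSketch(Money startingBalance) { _balance = startingBalance; }
        Money getBalance() { return _balance; }
    }

    static Object roundTrip(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        AccountSketch account = new AccountSketch(new Money(2500));
        AccountSketch copy = (AccountSketch) roundTrip(account);
        System.out.println(copy.getBalance().getCents()); // 2500
    }
}
```

No writeObject( ) or readObject( ) method is needed here; the default mechanism follows the reference and serializes the Money as well.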

While _balance doesn't have a
primitive type, it does refer to an instance of
Money, which is a serializable class.

If, however, some of the fields don't have primitive types, and
don't refer to serializable classes, more work may be necessary. Consider, for
example, the implementation of
ArrayList from the
java.util package. An
ArrayList really has only two pieces of state:

private Object elementData[];
private int size;

But hidden in here is a huge problem:
ArrayList is a generic container class whose state is
stored as an array of objects. While arrays are first-class objects in Java,
they aren't serializable objects. This means that
ArrayList can't just implement the
Serializable interface. It has to provide extra
information to help the serialization mechanism handle its nonserializable
fields. There are three basic solutions to this problem:

Fields can be declared to be transient.

The writeObject( )/readObject( ) methods can be implemented.

serialPersistentFields can be declared.

Declaring transient fields

The first, and easiest, thing you can do is simply mark some
fields using the transient keyword. In
ArrayList, for example,
elementData is actually declared to be a transient
field:

private transient Object elementData[];

This tells the default serialization mechanism to ignore the
variable. In other words, the serialization mechanism simply skips over the
transient variables. In the case of ArrayList, the
default serialization mechanism would attempt to write out
size, but ignore elementData entirely.

This can be useful in two, usually distinct, situations:

The variable isn't serializable

If the variable isn't serializable, then the
serialization mechanism will throw an exception when it tries to serialize
the variable. To avoid this, you can declare the variable to be transient.

The variable is redundant

Suppose that the instance caches the result of a
computation. Locally, we might want to store the result of the computation,
in order to save some processor time. But when we send the object over the
wire, we might worry more about consuming bandwidth and thus discard the
cached computation since we can always regenerate it later on.
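The second situation can be sketched as follows. Circle is a hypothetical class that caches its area in a transient field: the cache is skipped by serialization, arrives empty on the other side, and is simply recomputed on demand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientCacheDemo {
    static class Circle implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double radius;
        private transient Double cachedArea; // redundant, so not serialized

        Circle(double radius) { this.radius = radius; }

        double area() {
            if (cachedArea == null) {
                cachedArea = Math.PI * radius * radius; // recompute on demand
            }
            return cachedArea;
        }

        boolean isAreaCached() { return cachedArea != null; }
    }

    static Object roundTrip(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle circle = new Circle(2.0);
        circle.area(); // fills the cache locally
        Circle copy = (Circle) roundTrip(circle);

        System.out.println(circle.isAreaCached());        // true
        System.out.println(copy.isAreaCached());          // false
        System.out.println(copy.area() == circle.area()); // true
    }
}
```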

Implementing writeObject( ) and readObject( )

Suppose that the first case applies. A field takes values that
aren't serializable. If the field is still an important part of the state of
our instance, such as elementData in the case of an
ArrayList, simply declaring the variable to be
transient isn't good enough. We need to save and
restore the state stored in the variable. This is done by implementing a pair
of methods with the following signatures:

private void writeObject(java.io.ObjectOutputStream out) throws IOException;
private void readObject(java.io.ObjectInputStream in)
    throws IOException, ClassNotFoundException;

When the serialization mechanism starts to write out an object,
it will check to see whether the class implements
writeObject( ). If so, the serialization mechanism will
not use the default mechanism and will not write out any of the instance
variables. Instead, it will call writeObject( ) and
depend on the method to store all the important state. Here is
ArrayList's implementation of writeObject( ):

The first thing this method does is call
defaultWriteObject( ).
defaultWriteObject( ) invokes the default serialization
mechanism, which serializes all the nontransient, nonstatic instance
variables. Next, the method writes out
elementData.length and then calls the stream's
writeObject( ) for each element of elementData.

There's an important point here that is sometimes missed:
readObject( ) and writeObject( )
are a pair of methods that need to be implemented together. If you do
any customization of serialization inside one of these methods, you need to
implement the other method. If you don't, the serialization algorithm will
fail.
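The pairing can be sketched with a small container modeled on the description above. SimpleList is a hypothetical class written for this illustration, not the JDK's ArrayList source; what matters is that readObject( ) consumes exactly what writeObject( ) produced, in the same order.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class SimpleList implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient Object[] elementData = new Object[10]; // handled by hand
    private int size;                                        // handled by default

    void add(Object element) {
        if (size == elementData.length) {
            elementData = Arrays.copyOf(elementData, size * 2);
        }
        elementData[size++] = element;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return elementData[index];
    }

    int size() { return size; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // writes size
        out.writeInt(elementData.length);  // writes the capacity
        for (int i = 0; i < size; i++) {
            out.writeObject(elementData[i]);
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();            // restores size
        int capacity = in.readInt();       // same order as writeObject()
        elementData = new Object[capacity];
        for (int i = 0; i < size; i++) {
            elementData[i] = in.readObject();
        }
    }
}
```

Because no constructor of a serializable class runs during deserialization, the field initializer for elementData is not executed on the receiving side; readObject( ) is the only place the array gets rebuilt.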

Unit Tests and Serialization

Unit tests are used to test a specific piece of
functionality in a class. They are explicitly not end-to-end or
application-level tests. It's often a good idea to adopt a
unit-testing harness such as JUnit when
developing an application. JUnit gives you
an automated way to run unit tests on individual classes and is
available from http://www.junit.org/.

If you adopt a unit-testing methodology, then any
serializable class should pass the following three tests:

If it implements readObject( ), it should implement
writeObject( ), and vice versa.

It is equal (using the equals( ) method) to a serialized copy of
itself.

It has the same hashcode as a serialized
copy of itself.

Similar constraints hold for classes that
implement the Externalizable interface.
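Those three checks can be written without any particular harness. The following sketch uses plain Java and reflection; SerializationChecks and its method names are assumptions, and in a JUnit test the AssertionError lines would normally be replaced by the harness's own assertions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationChecks {
    static Object roundTrip(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Test 1: writeObject() and readObject() are declared together or not at all.
    static boolean declaresBothOrNeither(Class<?> type) {
        return declares(type, "writeObject", ObjectOutputStream.class)
            == declares(type, "readObject", ObjectInputStream.class);
    }

    private static boolean declares(Class<?> type, String name, Class<?> parameter) {
        try {
            type.getDeclaredMethod(name, parameter);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Runs all three tests against one instance.
    static void check(Serializable original)
            throws IOException, ClassNotFoundException {
        if (!declaresBothOrNeither(original.getClass())) {
            throw new AssertionError("writeObject/readObject are not paired");
        }
        Object copy = roundTrip(original);
        if (!original.equals(copy)) {                    // Test 2
            throw new AssertionError("not equal to its serialized copy");
        }
        if (original.hashCode() != copy.hashCode()) {    // Test 3
            throw new AssertionError("hashcode differs from its serialized copy");
        }
    }
}
```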

Declaring serialPersistentFields

The final option that can be used is to explicitly declare which
fields should be stored by the serialization mechanism. This is done using a
special static final variable called
serialPersistentFields, as shown in the following code
snippet:

private static final ObjectStreamField[] serialPersistentFields =
    { new ObjectStreamField("size", Integer.TYPE) };

This line of code declares that the field named
size, which is of type int, is
a serial persistent field and will be written to the output stream by the
serialization mechanism. Declaring
serialPersistentFields is almost the opposite of
declaring some fields transient. The meaning of
transient is, "This field shouldn't be stored by serialization," and the
meaning of serialPersistentFields is, "These fields
should be stored by serialization."
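As a sketch of the effect, consider a hypothetical Counter class with two ordinary fields, only one of which is listed in serialPersistentFields. The listed field survives a round trip; the unlisted one comes back with its default value, even though it was never marked transient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PersistentFieldsDemo {
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;

        // Only the fields named here are stored by default serialization.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("size", Integer.TYPE)
        };

        int size;
        String label; // not listed above, so not stored
    }

    static Object roundTrip(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter counter = new Counter();
        counter.size = 3;
        counter.label = "printer queue";

        Counter copy = (Counter) roundTrip(counter);
        System.out.println(copy.size);  // 3
        System.out.println(copy.label); // null
    }
}
```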

But there is one important difference between declaring some
variables to be transient and others to be
serialPersistentFields. In order to declare variables to
be transient, they must be locally declared. In other words, you must have
access to the code that declares the variable. There is no such requirement
for serialPersistentFields. You simply provide the
name of the field and the type.

TIP: What if you try to do both? That is,
suppose you declare some variables to be
transient, and then also provide a definition for
serialPersistentFields? The answer is that the
transient keyword is ignored; the definition of
serialPersistentFields is definitive.

So far, we've talked only about instance-level state. What about
class-level state? Suppose you have important information stored in a static
variable. Static variables won't get saved by serialization unless you add
special code to do so. In our context (shipping objects over the wire between
clients and servers), statics are usually a bad idea anyway.

Make Sure That Superclass State Is Handled Correctly

After you've handled the locally declared state, you may still
need to worry about variables declared in a superclass. If the superclass
implements the Serializable interface, then you
don't need to do anything. The serialization mechanism will handle everything
for you, either by using default serialization or by invoking
writeObject( )/readObject( ) if they are declared in the superclass.

If the superclass doesn't implement
Serializable, you will need to store its state. There are
two different ways to approach this. You can use
serialPersistentFields to tell the serialization
mechanism about some of the superclass instance variables, or you can use
writeObject( )/readObject( ) to handle the superclass state explicitly.
Both of these, unfortunately, require you to know a fair amount about the
superclass. If you're getting the .class files from another source,
you should be aware that versioning issues can cause some really nasty
problems. If you subclass a class, and that class's internal representation of
instance-level state changes, you may not be able to load in your serialized
data. While you can sometimes work around this by using a sufficiently
convoluted readObject( ) method, this may not be a
solvable problem. We'll return to this later. However, be aware that the
ultimate solution may be to just implement the
Externalizable interface instead, which we'll talk about
later.

Another aspect of handling the state of a nonserializable
superclass is that nonserializable superclasses must have a zero-argument
constructor. This isn't important for serializing out an object, but it's
incredibly important when deserializing an object. Deserialization works by
creating an instance of a class and filling out its fields correctly. During
this process, the deserialization algorithm doesn't actually call any of the
serialized class's constructors, but does call the zero-argument constructor
of the first nonserializable superclass. If there isn't a zero-argument
constructor, then the deserialization algorithm can't create instances of the
class, and the whole process fails.

WARNING: If you can't create a
zero-argument constructor in the first nonserializable superclass, you'll
have to implement the Externalizable interface
instead.
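The constructor behavior described above can be observed directly. In this sketch, Base and Derived are hypothetical classes: Base is not serializable and records which of its constructors ran, while Derived is serializable. After a round trip, the copy's serialized field is intact, but the superclass state has been reset by Base's zero-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SuperclassConstructorDemo {
    // Not Serializable: its state is not written to the stream.
    static class Base {
        String origin;
        Base() { origin = "zero-argument constructor"; }
        Base(String origin) { this.origin = origin; }
    }

    static class Derived extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
        Derived(int value) {
            super("one-argument constructor");
            this.value = value;
        }
    }

    static Object roundTrip(Serializable original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Derived original = new Derived(7);
        Derived copy = (Derived) roundTrip(original);

        System.out.println(original.origin); // one-argument constructor
        System.out.println(copy.origin);     // zero-argument constructor
        System.out.println(copy.value);      // 7
    }
}
```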

Simply adding a zero-argument constructor might seem a little
problematic. Suppose the object already has several constructors, all of which
take arguments. If you simply add a zero-argument constructor, then the
serialization mechanism might leave the object in a half-initialized, and
therefore unusable, state.

However, since serialization will supply the instance variables
with correct values from an active instance immediately after instantiating
the object, the only way this problem could arise is if the constructors
actually do something with their arguments--besides setting variable
values.

If all the constructors take arguments and actually execute
initialization code as part of the constructor, then you may need to refactor
a bit. The usual solution is to move the local initialization code into a new
method (usually named something like
initialize( )), which is then called from the original constructor:

public MyObject(arglist) {
// set local variables from arglist
// perform local initialization
}

After this is done, writeObject( )/readObject( ) should be implemented, and
readObject( ) should end with a call to initialize( ). Sometimes this will
result in code that simply invokes the default serialization mechanism, as in
the following snippet:
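A minimal sketch of this refactoring is shown here. The Temperature class, its fields, and the RoundTrip helper are hypothetical; the point is only the shape of the code: initialize( ) is called from the constructor and again at the end of readObject( ).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class whose constructor does real work beyond setting fields.
class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double celsius;
    private transient double fahrenheit; // derived state, rebuilt by initialize()

    Temperature(double celsius) {
        this.celsius = celsius;
        initialize();
    }

    // The local initialization code, moved out of the constructor.
    private void initialize() {
        fahrenheit = celsius * 9.0 / 5.0 + 32.0;
    }

    double getFahrenheit() {
        return fahrenheit;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // the default mechanism is all we need
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        initialize(); // redo the work the constructor would have done
    }
}

class RoundTrip {
    // Serializes an object into a byte array and reads a copy back.
    static Object copy(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```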

TIP: If creating a zero-argument constructor is difficult (for example, you
don't have the source code for the superclass), your class will need to
implement the Externalizable interface instead of Serializable.

Override equals( ) and hashCode( ) if Necessary

The default implementations of equals( ) and hashCode( ), which are inherited
from java.lang.Object, simply use an instance's location in memory. This can
be problematic. Consider our previous deep copy code example, and ask whether
the copy it produces should be equal to the original.

Sometimes, as in the case of Money and DocumentDescription, the answer should
be true. If two instances of Money have the same values for _cents, then they
are equal. However, the implementation of equals( ) inherited from Object will
return false.

The same problem occurs with hashCode( ). Note that Object implements
hashCode( ) by returning a value based on the memory address of the instance.
Hence, two distinct instances will almost never have the same hashCode( )
using Object's implementation. If two objects are equal, however, then they
should have the same hashcode. So if you need to override equals( ), you
probably need to override hashCode( ) as well.
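To illustrate, here is a minimal sketch of a value class that overrides both methods. It is a stripped-down stand-in for Money, not the class from the bank example: the type of _cents (long) and the DeepCopy helper are assumptions made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Simplified stand-in for Money; only the _cents field is modeled.
class Money implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long _cents; // assumed to be a long for this sketch

    Money(long cents) {
        _cents = cents;
    }

    // Two Money instances are equal when they hold the same number of cents.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Money)) {
            return false;
        }
        return _cents == ((Money) other)._cents;
    }

    // Equal instances must produce equal hashcodes, so hash the same field.
    @Override
    public int hashCode() {
        return Long.hashCode(_cents);
    }
}

class DeepCopy {
    // Makes a deep copy by serializing to memory and deserializing again.
    static Object of(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }
}
```

With these overrides in place, a deep copy is a different instance in memory but still compares as equal to the original and hashes to the same bucket.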

TIP: With the exception of declaring variables
to be transient, all our changes involve adding functionality. Making a
class serializable rarely involves significant changes to its functionality
and shouldn't result in any changes to method implementations. This means
that it's fairly easy to retrofit serialization onto an existing object
hierarchy. The hardest part is usually implementing equals( ) and
hashCode( ).

Making DocumentDescription Serializable

To make this more concrete, we now turn to the DocumentDescription class from
the RMI version of our printer server, which we implemented in Chapter 4. The
code for the first nonserializable version of DocumentDescription was the
following:

Make sure that instance-level, locally defined state is
serialized properly

Of the five instance-level fields in this class, four are primitive types that
serialization can handle without any problem. However, _actualDocument is a
problem. InputStream is not a serializable class. And the contents of
_actualDocument are very important; _actualDocument contains the document we
want to print. There is no point in serializing an instance of
DocumentDescription unless we somehow serialize _actualDocument as well.

If we have fields that serialization cannot handle, and they must be
serialized, then our only option is to implement readObject( ) and
writeObject( ). For DocumentDescription, we declare _actualDocument to be
transient and then implement readObject( ) and writeObject( ) as follows:
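A minimal sketch of such a pair of methods might look like the following. Only _actualDocument and _length are taken from the discussion here; the other three field names are hypothetical placeholders, the document is buffered in memory on the reading side as a simplifying assumption, and DocumentCopier is a helper invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DocumentDescription implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient InputStream _actualDocument; // skipped by default serialization
    private int _length;                           // number of bytes in the document
    private int _documentType;                     // hypothetical field name
    private boolean _printTwoSided;                // hypothetical field name
    private int _printQuality;                     // hypothetical field name

    DocumentDescription(InputStream actualDocument, int length) {
        _actualDocument = actualDocument;
        _length = length;
    }

    InputStream getActualDocument() {
        return _actualDocument;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes _length and the other nontransient fields
        byte[] buffer = new byte[4096];
        int remaining = _length;
        while (remaining > 0) {
            int read = _actualDocument.read(buffer, 0, Math.min(buffer.length, remaining));
            if (read < 0) {
                throw new EOFException("Document ended before _length bytes were read");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores _length first, mirroring the write order
        byte[] contents = new byte[_length];
        in.readFully(contents);
        _actualDocument = new ByteArrayInputStream(contents);
    }
}

class DocumentCopier {
    // Serializes a description of the given bytes and returns the document read back.
    static byte[] copyThroughSerialization(byte[] document) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new DocumentDescription(
                        new ByteArrayInputStream(document), document.length));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                DocumentDescription copy = (DocumentDescription) in.readObject();
                byte[] result = new byte[document.length];
                new DataInputStream(copy.getActualDocument()).readFully(result);
                return result;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }
}
```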

Note that we declare _actualDocument to be transient and call
defaultWriteObject( ) in the first line of our writeObject( ) method. Doing
these two things allows the standard serialization mechanism to serialize the
other four instance variables without any extra effort on our part. We then
simply copy _actualDocument to the stream.

Our implementation of readObject( ) simply calls defaultReadObject( ) and then
reads _actualDocument from the stream. In order to read _actualDocument from
the stream, we used the length of the document, which had previously been
written to the stream. In essence, we needed to encode some metadata into the
stream, in order to correctly pull our data out of the stream.

This code is a little ugly. We're using serialization, but we're still forced
to think about how to encode some of our state when we're writing it out to
the stream. In fact, the code for writeObject( ) and readObject( ) is
remarkably similar to the marshalling code we implemented directly for the
socket-based version of the printer server. This is, unfortunately, often the
case. Serialization's default implementation handles simple objects very well.
But, every now and then, you will want to send a nonserializable object over
the wire, or improve the serialization algorithm for efficiency. Doing so
amounts to writing the same code you write if you implement all the socket
handling yourself, as in our socket-based version of the printer server.

TIP: There is also an order dependency here. The first value written must be
the first value read. Since we start writing by calling defaultWriteObject( ),
we have to start reading by calling defaultReadObject( ). On the bright side,
this means we'll have an accurate value for _length before we try to read
_actualDocument from the stream.

Make sure that superclass state is handled correctly

This isn't a problem. The superclass,
java.lang.Object, doesn't actually have any important
state that we need to worry about. Since it also already has a zero-argument
constructor, we don't need to do anything.

Override equals( ) and hashCode( ) if necessary

In our current implementation of the printer server, we don't
need to do this. The server never checks for equality between instances of
DocumentDescription. Nor does it store them in a
container object that relies on their hashcodes.

Did We Cheat When Implementing Serializable for
DocumentDescription?

It may seem like we cheated a bit in implementing DocumentDescription. Three
of the five steps in making a class serializable didn't actually result in
changes to the code. Indeed, the only work we really did was implementing
readObject( ) and writeObject( ). But it's not really cheating. Serialization
is just designed to be easy to use. It has a good set of defaults, and, at
least in the case of value objects intended to be passed over the wire, the
default behavior is often good enough.

The Serialization Algorithm

By now, you should have a pretty good feel for how the serialization
mechanism works for individual classes. The next step in explaining
serialization is to discuss the actual serialization algorithm in a little
more detail. This discussion won't handle all the details of serialization
(though we'll come close). Instead, the idea is to cover the algorithm and
protocol, so you can understand how the various hooks for customizing
serialization work and how they fit into the context of an RMI application.

The Data Format

The first step is to discuss what gets written to the stream
when an instance is serialized. Be warned: it's a lot more information than
you might guess from the previous discussion.

An important part of serialization involves writing out class-related
metadata associated with an instance. Most instances are instances of more
than one class. For example, an instance of String is also an instance of
Object. Any given instance, however, is an instance of only a few classes.
These classes can be written as a sequence: C1, C2, ..., CN, in which C1 is a
superclass of C2, C2 is a superclass of C3, and so on. This is actually a
linear sequence because Java is a single inheritance language for classes. We
call C1 the least superclass and CN the most-derived class. See Figure 10-4.

Figure 10-4. Inheritance diagram

After writing out the associated class information, the
serialization mechanism stores out the following information for each
instance:

A description of the most-derived class.

Data associated with the instance, interpreted as an
instance of the least superclass.

Data associated with the instance, interpreted as an
instance of the second least superclass.

And so on until:

Data associated with the instance, interpreted as an
instance of the most-derived class.

So what really happens is that the type of the instance is
stored out, and then all the serializable state is stored in discrete chunks
that correspond to the class structure. But there's a question still
remaining: what do we mean by "a description of the most-derived class?" This
is either a reference to a class description that has already been recorded
(e.g., an earlier location in the stream) or the following information:

The version ID of the class, which is an integer used
to validate the .class files

This should, of course, immediately seem familiar. The class
descriptions consist entirely of metadata that allows the instance to be read
back in. In fact, this is one of the most beautiful aspects of serialization;
the serialization mechanism automatically, at runtime, converts class objects
into metadata so instances can be serialized with the least amount of
programmer work.
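One way to get a feel for how much metadata is involved is to look at the raw bytes. The sketch below is illustrative only: the Point class and the StreamInspector helper are hypothetical, while the magic number comes from java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical class with exactly four bytes of instance data.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 3;
}

class StreamInspector {
    // Serializes an object and returns the raw bytes of the stream.
    static byte[] serialize(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Reads the first two bytes of the stream as an unsigned 16-bit value.
    static int magic(byte[] stream) {
        return ((stream[0] & 0xFF) << 8) | (stream[1] & 0xFF);
    }

    // Returns true if the given ASCII text appears somewhere in the stream.
    static boolean contains(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }

    // True if the stream starts with the standard serialization magic number.
    static boolean hasStandardHeader(byte[] stream) {
        return magic(stream) == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF);
    }
}
```

Even though a Point carries only a single int, the stream also holds a header, the class name, the version ID, and a description of the field, so it is considerably longer than four bytes.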

A Simplified Version of the Serialization Algorithm

In this section, I describe a slightly simplified version of the
serialization algorithm. I then proceed to a more complete description of the
serialization process in the next section.

Writing

Because the class descriptions actually contain the metadata,
the basic idea behind the serialization algorithm is pretty easy to describe.
The only tricky part is handling circular references.

The problem is this: suppose instance A refers to instance B. And instance B
refers back to instance A. Completely writing out A requires you to write out
B. But writing out B requires you to write out A. Because you don't want to
get into an infinite loop, or even write out an instance or a class
description more than once, you need to keep track of what's already been
written to the stream. (Serialization is a slow process: it uses the
reflection API quite heavily, in addition to consuming bandwidth.)

ObjectOutputStream does this by maintaining a mapping from instances and
classes to handles. When writeObject( ) is called with an argument that has
already been written to the stream, the handle is written to the stream, and
no further operations are necessary.

If, however, writeObject( ) is passed an instance that has not yet been
written to the stream, two things happen. First, the instance is assigned a
reference handle, and the mapping from instance to reference handle is stored
by ObjectOutputStream. The handle that is assigned is the next integer in a
sequence.

TIP: Remember the reset( ) method on ObjectOutputStream? It clears the mapping
and resets the handle counter to 0x7E0000. RMI also automatically resets its
serialization mechanism after every remote method call.

Second, the instance data is written out as per the data format
described earlier. This can involve some complications if the instance has a
field whose value is also a serializable instance. In this case, the
serialization of the first instance is suspended, and the second instance is
serialized in its place (or, if the second instance has already been
serialized, the reference handle for the second instance is written out).
After the second instance is fully serialized, serialization of the first
instance resumes. The contents of the stream look a little bit like Figure
10-5.

Figure 10-5. Contents of Serialization's data stream.
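The effect of the handle table can be seen with a small experiment. This is a minimal sketch; the Node class and the HandleDemo helper are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used to build a circular reference.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Node partner;

    Node(String name) {
        this.name = name;
    }
}

class HandleDemo {
    // Writes the same instance twice, optionally calling reset() in between,
    // and returns the two objects read back from the stream.
    static Object[] writeTwice(Object obj, boolean resetBetween) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
                if (resetBetween) {
                    out.reset(); // forget every handle assigned so far
                }
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Object[] { in.readObject(), in.readObject() };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Without reset( ), the second write emits only a handle, so both reads return the very same instance, and a cycle such as A to B to A survives intact. With reset( ), the instance is written in full a second time and the reader sees two independent copies.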

Reading

From the description of writing, it's pretty easy to guess most of what
happens when readObject( ) is called. Unfortunately, because of versioning
issues, the implementation of readObject( ) is actually a little bit more
complex than you might guess.

When it reads in an instance description, ObjectInputStream gets the following
information:

Descriptions of all the classes involved

The serialization data from the instance

The problem is that the class descriptions that the instance of
ObjectInputStream reads from the stream may not be equivalent to the class
descriptions of the same classes in the local JVM. For example, if an instance
is serialized to a file and then read back in three years later, there's a
pretty good chance that the class definitions used to serialize the instance
have changed.

This means that ObjectInputStream uses the class descriptions in two ways:

It uses them to actually pull data from the stream,
since the class descriptions completely describe the contents of the stream.

It compares the class descriptions to the classes it
has locally and tries to determine if the classes have changed, in which
case it throws an exception. If the class descriptions match the local
classes, it creates the instance and sets the instance's state
appropriately.

RMI Customizes the Serialization Algorithm

RMI doesn't actually use ObjectOutputStream and ObjectInputStream. Instead, it
uses custom subclasses so it can modify the serialization process by
overriding some protected methods. In this section, we'll discuss the most
important modifications that RMI makes when serializing instances. RMI makes
similar changes when deserializing instances, but they follow from, and can
easily be deduced from, the description of the serialization changes.

Recall that ObjectOutputStream contained the following protected methods:

These all have default implementations in ObjectOutputStream. That is,
annotateClass( ) and annotateProxyClass( ) do nothing. enableReplaceObject( )
returns false, and so on. However, these methods are still called during
serialization. And RMI, by overriding these methods, customizes the
serialization process.

annotateClass( )

ObjectOutputStream calls annotateClass( ) when it writes out class
descriptions. Annotations are used to provide extra information about a class
that comes from the serialization mechanism and not from the class itself. The
basic serialization mechanism has no real need for annotations; most of the
information about a given class is already stored in the stream.

RMI, on the other hand, uses annotations to record codebase information. That
is, RMI, in addition to recording the class descriptions, also records
information about the location from which it loaded the class's bytecode.
Codebases are often simply locations in a filesystem. Incidentally, locations
in a filesystem are often useless information, since the JVM that deserializes
the instances may have a very different filesystem than the one from where the
instances were serialized. However, a codebase isn't restricted to being a
location in a filesystem. The only restriction on codebases is that they have
to be valid URLs. That is, a codebase is a URL that specifies a location on
the network from which the bytecode for a class can be obtained. This enables
RMI to dynamically load new classes based on the serialized information in the
stream.

TIP: RMI's dynamic classloading system uses annotateClass( ) to record where
.class files are stored. We'll discuss this more in Chapter 19.
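The annotation hook can be sketched with a pair of custom streams. This is not RMI's actual implementation; it is a minimal illustration in which every class description is followed by a fixed codebase string. The class names and the example URL are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Writes a (hypothetical) codebase URL after each class description.
class AnnotatingOutputStream extends ObjectOutputStream {
    private final String codebase;

    AnnotatingOutputStream(OutputStream out, String codebase) throws IOException {
        super(out);
        this.codebase = codebase;
    }

    @Override
    protected void annotateClass(Class<?> cl) throws IOException {
        writeUTF(codebase);
    }
}

// Reads the annotation back at the matching point during deserialization.
class CodebaseReadingInputStream extends ObjectInputStream {
    final List<String> codebasesSeen = new ArrayList<>();

    CodebaseReadingInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        codebasesSeen.add(readUTF()); // a real system would load classes from here
        return super.resolveClass(desc);
    }
}

// Hypothetical serializable payload.
class Greeting implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;

    Greeting(String text) {
        this.text = text;
    }
}

class CodebaseDemo {
    // Round-trips an object and returns the codebases recovered while reading.
    static List<String> codebasesSeenOnRead(Serializable obj, String codebase) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (AnnotatingOutputStream out = new AnnotatingOutputStream(bytes, codebase)) {
                out.writeObject(obj);
            }
            try (CodebaseReadingInputStream in = new CodebaseReadingInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return in.codebasesSeen;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The important design point is symmetry: whatever annotateClass( ) writes must be consumed by the reading stream at the corresponding point, here in resolveClass( ).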

replaceObject( )

The idea of replacement is simple; sometimes the instance that is passed to
the serialization mechanism isn't the instance that ought to be written out to
the data stream. To make this more concrete, recall what happened when we
called rebind( ) to register a server with the RMI registry. The following
code was used in the bank example:

This creates an instance of Account_Impl and then calls rebind( ) with that
instance. Account_Impl is a server that implements the Remote interface, but
not the Serializable interface. And yet, somehow, the registry, which is
running in a different JVM, is sent something.

What the registry actually gets is a stub. The stub for Account_Impl, which
was automatically generated by rmic, begins with:

public final class Account_Impl_Stub extends java.rmi.server.RemoteStub

java.rmi.server.RemoteStub is a class that implements the Serializable
interface. The RMI serialization mechanism knows that whenever a remote server
is "sent" over the wire, the server object should be replaced by a stub that
knows how to communicate with the server (e.g., a stub that knows on which
machine and port the server is listening).

Calling Naming.rebind( ) actually winds up passing a stub to the RMI registry.
When clients make calls to Naming.lookup( ), as in the following code snippet,
they also receive copies of the stub. Since the stub is serializable, there's
no problem in making a copy of it:

_account = (Account)Naming.lookup(_accountNameField.getText( ));

In order to enable this behavior, ObjectOutputStream calls
enableReplaceObject( ) and replaceObject( ) during the serialization process.
In other words, when an instance is about to be serialized, ObjectOutputStream
does the following:

It calls enableReplaceObject( ) to see whether instance replacement is
enabled.

If instance replacement is enabled, it calls replaceObject( ), passing in the
instance it was about to serialize, to find out which instance it should
really write to the stream.

It then writes the appropriate instance to the stream.
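The replacement hook can be sketched in a few lines. This is an analogy to what RMI does, not RMI's own code: PrintServer, PrintServerStub, StubbingOutputStream, and StubDemo are hypothetical names, and the "stub" here merely records a host and port.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Stands in for a remote server object; deliberately not Serializable.
class PrintServer {
    final String host;
    final int port;

    PrintServer(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

// Serializable stand-in that knows where the server is listening.
class PrintServerStub implements Serializable {
    private static final long serialVersionUID = 1L;
    final String host;
    final int port;

    PrintServerStub(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

class StubbingOutputStream extends ObjectOutputStream {
    StubbingOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true); // replaceObject() is ignored unless enabled
    }

    // Swaps each server for its stub just before it would be written.
    @Override
    protected Object replaceObject(Object obj) throws IOException {
        if (obj instanceof PrintServer) {
            PrintServer server = (PrintServer) obj;
            return new PrintServerStub(server.host, server.port);
        }
        return obj;
    }
}

class StubDemo {
    // Writes an object with replacement enabled and returns what a reader receives.
    static Object sendOverWire(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (StubbingOutputStream out = new StubbingOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("send failed", e);
        }
    }
}
```

Because the replacement happens before the stream checks whether the instance is serializable, the nonserializable PrintServer never causes an exception; the receiver simply gets a PrintServerStub.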

Maintaining Direct Connections

A question that frequently arises as distributed applications get more
complicated involves message forwarding. For example, suppose that we have
three communicating programs: A, B, and C. At the start, A has a stub for B,
B has a stub for C, and C has a stub for A. See Figure 10-6.

Figure 10-6. Communication between three applications.

Now, what happens if A calls a method, for example, getOtherServer( ), on B
that "returns" C? The answer is that A gets a deep copy of the stub B uses to
communicate with C. That is, A now has a direct connection to C; whenever A
tries to send a message to C, B is not involved at all. This is illustrated in
Figure 10-7.

Figure 10-7. Improved communication between three applications.

This is very good from a bandwidth and network latency point of view. But it
can also be somewhat problematic. Suppose, for example, B implements load
balancing. Since B isn't involved in the A to C communication, it has no
direct way of knowing whether A is still using C, or how heavily. We'll
revisit this in later chapters, when we discuss the distributed garbage
collector and the Unreferenced interface.

Versioning Classes

A few pages back, I described the serialization mechanism:

The serialization mechanism automatically, at
runtime, converts class objects into metadata so instances can be serialized
with the least amount of programmer work.

This is great as long as the classes don't change. When classes
change, the metadata, which was created from obsolete class objects,
accurately describes the serialized information. But it might not correspond
to the current class implementations.

The Two Types of Versioning Problems

There are two basic types of versioning problems that can occur. The first
occurs when a change is made to the class hierarchy (e.g., a superclass is
added or removed). Suppose, for example, a personnel application made use of
two serializable classes: Employee and Manager (a subclass of Employee). For
the next version of the application, two more classes need to be added:
Contractor and Consultant. After careful thought, the new hierarchy is based
on the abstract superclass Person, which has two direct subclasses: Employee
and Contractor. Consultant is defined as a subclass of Contractor, and Manager
is a subclass of Employee. See Figure 10-8.

Figure 10-8. Changing the class hierarchy.

While introducing Person is probably good object-oriented design, it breaks
serialization. Recall that serialization relied on the class hierarchy to
define the data format.

The second type of version problem arises from local changes to a serializable
class. Suppose, for example, that in our bank example, we want to add the
possibility of handling different currencies. To do so, we define a new class,
Currency, and change the definition of Money:

This completely changes the definition of Money but doesn't change the object
hierarchy at all.

The important distinction between the two types of versioning
problems is that the first type can't really be repaired. If you have old data
lying around that was serialized using an older class hierarchy, and you need
to use that data, your best option is probably something along the lines of
the following:

Using the old class definitions, write an application
that deserializes the data into instances and writes the instance data out
in a neutral format, say as tab-delimited columns of text.

Using the new class definitions, write a program that
reads in the neutral-format data, creates instances of the new classes, and
serializes these new instances.

The second type of versioning problem, on the other hand, can be
handled locally, within the class definition.

How Serialization Detects When a Class Has Changed

In order for serialization to gracefully detect when a
versioning problem has occurred, it needs to be able to detect when a class
has changed. As with all the other aspects of serialization, there is a
default way that serialization does this. And there is a way for you to
override the default.

The default involves a hashcode. Serialization creates a single hashcode, of
type long, from the following information:

The class name and modifiers

The names of any interfaces the class implements

Descriptions of all methods and constructors except private methods and
constructors

Descriptions of all fields except private static and private transient fields

This single long, called the class's stream unique identifier (often
abbreviated suid), is used to detect when a class changes. It is an
extraordinarily sensitive index. For example, suppose we add the following
method to Money:

public boolean isBigBucks( ) {
return _cents > 5000;
}

We haven't changed, added, or removed any fields; we've simply added a method
with no side effects at all. But adding this method changes the suid. Prior to
adding it, the suid was 6625436957363978372L; afterwards, it was
-3144267589449789474L. Moreover, if we had made isBigBucks( ) a protected
method, the suid would have been 4747443272709729176L.

TIP: These numbers can be computed using the serialver program that ships with
the JDK. For example, these were all computed by typing serialver
com.ora.rmibook.chapter10.Money at the command line for slightly different
versions of the Money class.

The default behavior for the serialization mechanism is a
classic "better safe than sorry" strategy. The serialization mechanism uses
the
suid, which defaults to an extremely sensitive
index, to tell when a class has changed. If so, the serialization mechanism
refuses to create instances of the new class using data that was serialized
with the old classes.

Implementing Your Own Versioning Scheme

While this is reasonable as a default strategy, it would be painful if
serialization didn't provide a way to override the default behavior.
Fortunately, it does. Serialization uses only the default suid if a class
definition doesn't provide one. That is, if a class definition includes a
static final long named serialVersionUID, then serialization will use that
static final long value as the suid. In the case of our Money example, if we
included the line:

private static final long serialVersionUID = 1;

in our source code, then the suid would be 1, no matter how many changes we
made to the rest of the class. Explicitly declaring serialVersionUID allows us
to change the class, and add convenience methods such as isBigBucks( ),
without losing backwards compatibility.

TIP: serialVersionUID doesn't have to be private. However, it must be static,
final, and long.
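The value serialization will actually use for a class can also be checked from code, via java.io.ObjectStreamClass. The sketch below uses a stripped-down stand-in for Money (the long type of _cents is an assumption) and a hypothetical helper named SuidLookup.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Simplified stand-in for Money with an explicit version ID.
class Money implements Serializable {
    private static final long serialVersionUID = 1;
    private long _cents; // assumed to be a long for this sketch

    public boolean isBigBucks() {
        return _cents > 5000;
    }
}

// Same shape, but without an explicit ID; its suid is computed by hashing.
class UnversionedMoney implements Serializable {
    private long _cents;
}

class SuidLookup {
    // Returns the suid that serialization associates with a serializable class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

Calling SuidLookup.suidOf(Money.class) returns the declared value, however many methods are added to the class; for UnversionedMoney the result is the computed hash and will change as the class changes.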

The downside to using serialVersionUID is that, if a significant change is
made (for example, if a field is added to the class definition), the suid will
not reflect this difference. This means that the deserialization code might
not detect an incompatible version of a class. Again, using Money as an
example, suppose we had:

The serialization mechanism won't detect that these are
completely incompatible classes. Instead, when it tries to create the new
instance, it will throw away all the data it reads in. Recall that, as part of
the metadata, the serialization algorithm records the name and type of each
field. Since it can't find the fields during deserialization, it simply
discards the information.

The solution to this problem is to implement your own versioning inside of
readObject( ) and writeObject( ). The first line in your writeObject( ) method
should begin:

Doing this will enable you to explicitly control the versioning of your class.
In addition to the added control you gain over the serialization process,
there is an important consequence you ought to consider before doing this. As
soon as you start to explicitly version your classes, defaultWriteObject( )
and defaultReadObject( ) lose a lot of their usefulness.

Trying to control versioning puts you in the position of
explicitly writing all the marshalling and demarshalling code. This is a
trade-off you might not want to make.
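A minimal sketch of this kind of explicit versioning follows. The class name VersionedMoney, the version numbers, the _currencyCode field, and the "USD" default are all assumptions made for illustration; the essential idea is only that a version number is the first thing written and the first thing read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class VersionedMoney implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int CURRENT_VERSION = 2;

    // transient: this class marshals its own state, so default serialization skips it
    private transient long _cents;
    private transient String _currencyCode; // hypothetical field added in version 2

    VersionedMoney(long cents, String currencyCode) {
        _cents = cents;
        _currencyCode = currencyCode;
    }

    long getCents() { return _cents; }

    String getCurrencyCode() { return _currencyCode; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeInt(CURRENT_VERSION); // the version number always goes first
        out.writeLong(_cents);
        out.writeUTF(_currencyCode);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        int version = in.readInt();
        switch (version) {
            case 1: // older layout: cents only, so supply an assumed default currency
                _cents = in.readLong();
                _currencyCode = "USD";
                break;
            case 2:
                _cents = in.readLong();
                _currencyCode = in.readUTF();
                break;
            default:
                throw new InvalidObjectException("Unknown version: " + version);
        }
    }
}

class VersionedCopy {
    // Serializes an object into a byte array and reads a copy back.
    static Object of(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Note how every field is written and read by hand; this is the marshalling burden the trade-off above refers to.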

Performance Issues

Serialization is a generic marshalling and demarshalling
algorithm, with many hooks for customization. As an experienced programmer,
you should be skeptical--generic algorithms with many hooks for customization
tend to be slow. Serialization is not an exception to this rule. It is, at
times, both slow and bandwidth-intensive. There are three main performance
problems with serialization: it depends on reflection, it has an incredibly
verbose data format, and it is very easy to send more data than is required.

Serialization Depends on Reflection

The dependence on reflection is the hardest of these to eliminate. Both
serializing and deserializing require the serialization mechanism to discover
information about the instance it is serializing. At a minimum, the
serialization algorithm needs to find out things such as the value of
serialVersionUID, whether writeObject( ) is implemented, and what the
superclass structure is. What's more, using the default serialization
mechanism (or calling defaultWriteObject( ) from within writeObject( )) will
use reflection to discover all the field values. This can be quite slow.

TIP: Setting serialVersionUID is a simple, and often surprisingly noticeable,
performance improvement. If you don't set serialVersionUID, the serialization
mechanism has to compute it. This involves going through all the fields and
methods and computing a hash. If you set serialVersionUID, on the other hand,
the serialization mechanism simply looks up a single value.

Serialization Has a Verbose Data Format

Serialization's data format has two problems. The first is all the class
description information included in the stream. To send a single instance of
Money, we need to send all of the following:

The description of the ValueObject class

The description of the Money class

The instance data associated with the specific instance of Money

This isn't a lot of information, but it's information that RMI computes and
sends with every method invocation. (Recall that RMI resets the serialization
mechanism with every method call.) Even if the first two bullets comprise only
100 extra bytes of information, the cumulative impact is probably significant.

The second problem is that each serialized instance is treated
as an individual unit. If we are sending large numbers of instances within a
single method invocation, then there is a fairly good chance that we could
compress the data by noticing commonalities across the instances being
sent.

It Is Easy to Send More Data Than Is Required

Serialization is a recursive algorithm. You pass in a single object, and all
the objects that can be reached from that object by following instance
variables are also serialized. To see why this can cause problems, suppose we
have a simple application that uses the Employee class:

What happens as a result of this? On the bright side, the
application still works. After everything is recompiled, the entire
application, including the remote method invocations, will still work. That's
the nice aspect of serialization--we added new fields, and the data format
used to send arguments over the wire automatically adapted to handle our
changes. We didn't have to do any work at all.

On the other hand, adding a new field redefined the data format associated
with Employee. Because serialVersionUID wasn't defined in the first version of
the class, none of the old data can be read back in anymore. And there's an
even more serious problem: we've just dramatically increased the bandwidth
required by remote method calls.

Suppose Bob works in the mailroom. And we serialize the object
associated with Bob. In the old version of our application, the data for
serialization consisted of:

The class information for
Employee

The instance data for Bob

In the new version, we send:

The class information for
Employee

The instance data for Bob

The instance data for Sally (who runs the mailroom and
is Bob's manager)

The instance data for Henry (who is in charge of
building facilities)

The instance data for Alison (Director, Corporate
Infrastructure)

The instance data for Mary (VP in charge of IT)

And so on...

The new version of the application isn't backwards-compatible
because our old data can't be read by the new version of the application. In
addition, it's slower and is much more likely to cause network congestion.
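The growth in stream size is easy to measure. The following is a minimal sketch: this Employee class, with just a name and a manager reference, and the SizeProbe helper are hypothetical simplifications rather than the application's actual classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical, simplified Employee: a name plus a reference to a manager.
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final Employee manager; // serialization follows this reference up the chain

    Employee(String name, Employee manager) {
        this.name = name;
        this.manager = manager;
    }
}

class SizeProbe {
    // Returns the number of bytes produced by serializing the given object.
    static int serializedSize(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }
}
```

Serializing Bob with a chain of managers attached produces a noticeably larger stream than serializing Bob alone, even though the caller only asked for a single object to be written.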

The Externalizable Interface

To solve the performance problems associated with making a class Serializable,
the serialization mechanism allows you to declare that a class is
Externalizable instead. When ObjectOutputStream's writeObject( ) method is
called, it performs the following sequence of actions:

It tests to see if the object is an instance of Externalizable. If so, it uses
externalization to marshall the object.

If the object isn't an instance of Externalizable, it tests to see whether the
object is an instance of Serializable. If so, it uses serialization to
marshall the object.

The Externalizable interface defines two methods, readExternal( ) and
writeExternal( ). These have roughly the same role that readObject( ) and
writeObject( ) have for serialization. There are, however, some very important
differences. The first, and most obvious, is that readExternal( ) and
writeExternal( ) are part of the Externalizable interface. An object cannot be
declared to be Externalizable without implementing these methods.

However, the major difference lies in how these methods are used. The
serialization mechanism always writes out class descriptions of all the
serializable superclasses. And it always writes out the information associated
with the instance when viewed as an instance of each individual superclass.

Externalization gets rid of some of this. It writes out the identity of the
class (which boils down to the name of the class and the appropriate
serialVersionUID). It also stores the superclass structure and all the
information about the class hierarchy. But instead of visiting each superclass
and using that superclass to store some of the state information, it simply
calls writeExternal( ) on the local class definition. In a nutshell: it stores
all the metadata, but writes out only the local instance information.

TIP: This is true even if the superclass implements Serializable. The metadata
about the class structure will be written to the stream, but the serialization
mechanism will not be invoked. This can be useful if, for some reason, you
want to avoid using serialization with the superclass. For example, some of
the Swing classes, while they claim to implement Serializable, do so
incorrectly (and will throw exceptions during the serialization process).
(JTextArea is one of the most egregious offenders.) If you really need to use
these classes, and you think serialization would be useful, you may want to
think about creating a subclass and declaring it to be Externalizable.
Instances of your class will be written out and read in using externalization.
Because the superclass is never serialized or deserialized, the incorrect code
is never invoked, and the exceptions are never thrown.
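The fact that an Externalizable subclass bypasses its serializable superclass can be checked with a small experiment. This is a minimal sketch; Widget, ExternalizableWidget, and ExternalizationDemo are hypothetical names, and Widget merely stands in for a superclass whose serialization you want to avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable superclass whose serialization we want to avoid.
class Widget implements Serializable {
    private static final long serialVersionUID = 1L;
    int width = 5; // value assigned whenever a Widget is constructed
}

class ExternalizableWidget extends Widget implements Externalizable {
    private static final long serialVersionUID = 1L;
    String text = "";

    // Externalization requires a public zero-argument constructor.
    public ExternalizableWidget() {
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(text); // only the state written here reaches the stream
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        text = in.readUTF();
    }
}

class ExternalizationDemo {
    // Serializes an object into a byte array and reads a copy back.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

After a round trip, text is restored from the stream, while width reverts to the value assigned during construction. If the superclass state matters, writeExternal( ) has to write it explicitly.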

Comparing Externalizable to Serializable

Of course, this efficiency comes at a price. Serializable can be frequently
implemented by doing two things: declaring that a class implements the
Serializable interface and adding a zero-argument constructor to the class.
Furthermore, as an application evolves, the serialization mechanism
automatically adapts. Because the metadata is automatically extracted from the
class definitions, application programmers often don't have to do anything
except recompile the program.

On the other hand, Externalizable isn't particularly easy to do, isn't very
flexible, and requires you to
rewrite your marshalling and demarshalling code whenever you change your class
definitions. However, because it eliminates almost all the reflective calls
used by the serialization mechanism and gives you complete control over the
marshalling and demarshalling algorithms, it can result in dramatic
performance improvements.

To demonstrate this, I have defined the EfficientMoney class. It has the same
fields and functionality as Money but implements Externalizable instead of
Serializable:
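A minimal sketch of such a class, which is not the book's original listing,
might look like the following. It assumes that the money value is held as a
single int of cents and uses an empty stand-in for ValueObject; the field
name, the accessor, and the EfficientMoneyDemo helper are all assumptions
made for illustration.

```java
import java.io.*;

// Stand-in for the ValueObject superclass (assumed shape, not the original).
abstract class ValueObject implements Serializable {
}

// Sketch of an externalizable money class holding a single int of cents.
class EfficientMoney extends ValueObject implements Externalizable {
    private static final long serialVersionUID = 1L;
    private int cents;

    // Externalization requires a public no-argument constructor.
    public EfficientMoney() {}

    public EfficientMoney(int cents) {
        this.cents = cents;
    }

    public int getCents() { return cents; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(cents);   // hand-written marshalling of local state
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        cents = in.readInt();  // must mirror writeExternal() exactly
    }
}

// Hypothetical helper that round-trips an instance through memory.
class EfficientMoneyDemo {
    static EfficientMoney roundTrip(EfficientMoney money)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(money);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EfficientMoney) in.readObject();
        }
    }
}
```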

On my home machine, averaging over 10 trial runs for both Money and
EfficientMoney, I get the results shown in Table
10-1. (We need to average because the
elapsed time can vary, depending on what else the computer is doing. The
size of the file is, of course, constant.)
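A rough way to take this kind of measurement yourself is sketched below. The
class names (PlainAmount, ExternalAmount, SerializationBenchmark) are
hypothetical, and the sketch serializes to an in-memory buffer rather than to
a file, so its numbers are not comparable to the ones in the table. It only
illustrates the approach: measure the stream size once and average the
elapsed time over several trials.

```java
import java.io.*;

// Hypothetical class that relies on default serialization.
class PlainAmount implements Serializable {
    private final int cents;
    PlainAmount(int cents) { this.cents = cents; }
}

// Hypothetical class with the same state, written by hand.
class ExternalAmount implements Externalizable {
    private int cents;
    public ExternalAmount() {}
    ExternalAmount(int cents) { this.cents = cents; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(cents);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        cents = in.readInt();
    }
}

class SerializationBenchmark {
    // Serializes one object into a fresh stream and returns the bytes.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Averages the elapsed time of several independent serializations.
    static long averageNanos(Object o, int trials) throws IOException {
        long total = 0;
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            serialize(o);
            total += System.nanoTime() - start;
        }
        return total / trials;
    }

    public static void main(String[] args) throws IOException {
        Object[] samples = { new PlainAmount(1250), new ExternalAmount(1250) };
        for (Object o : samples) {
            System.out.println(o.getClass().getName() + ": "
                    + serialize(o).length + " bytes, "
                    + averageNanos(o, 10) + " ns on average");
        }
    }
}
```

The first few timings are usually dominated by class loading and JIT warm-up,
so treat small differences in elapsed time with suspicion.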

These results are fairly impressive. By simply converting a leaf
class in our hierarchy to use externalization, I save 67 bytes and 10
milliseconds when serializing a single instance. In addition, as I pass larger
data sets over the wire, I save more and more bandwidth: on average, 18 bytes
per instance.

TIP: Which numbers should we pay attention
to? The single-instance costs or the 10,000-instance costs? For most
applications, the single-instance cost is the most important one. A typical
remote method call involves sending three or four arguments (usually of
different types) and getting back a single return value. Since RMI clears
the serialization mechanism between calls, a typical remote method call
looks a lot more like serializing 3 or 4 single instances than serializing
10,000 instances of the same class.

If I need more efficiency, I can go further and remove
ValueObject from the hierarchy entirely. The
ReallyEfficientMoney class directly extends
Object and implements Externalizable:
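Again as a sketch rather than the original listing, a class along these lines
might look like the following. It assumes the same single int of cents; the
field name, the accessor, and the roundTrip helper are assumptions. Because
there is no superclass to inherit from, the class has to supply its own
equals( ) and hashCode( ).

```java
import java.io.*;

// Sketch: extends Object directly and does all of its own marshalling.
class ReallyEfficientMoney implements Externalizable {
    private static final long serialVersionUID = 1L;
    private int cents;

    // Externalization requires a public no-argument constructor.
    public ReallyEfficientMoney() {}

    public ReallyEfficientMoney(int cents) {
        this.cents = cents;
    }

    public int getCents() { return cents; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(cents);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        cents = in.readInt();
    }

    // Value semantics must be written by hand; nothing is inherited.
    @Override
    public boolean equals(Object other) {
        return other instanceof ReallyEfficientMoney
                && ((ReallyEfficientMoney) other).cents == cents;
    }

    @Override
    public int hashCode() {
        return cents;
    }

    // Hypothetical helper that round-trips an instance through memory.
    static ReallyEfficientMoney roundTrip(ReallyEfficientMoney money)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(money);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ReallyEfficientMoney) in.readObject();
        }
    }
}
```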

ReallyEfficientMoney has much better
performance than either Money or
EfficientMoney when a single instance is serialized but
is almost identical to EfficientMoney for large
data sets. Again, averaging over 10 iterations, I record the numbers in Table
10-2.

Compared to Money, this is quite
impressive; I've shaved almost 200 bytes of bandwidth and saved 40
milliseconds for the typical remote method call. The downside is that I've had
to abandon my object hierarchy completely to do so; a significant percentage
of the savings resulted from not including
ValueObject in the inheritance chain. Removing
superclasses makes code harder to maintain and forces programmers to implement
the same method many times (ReallyEfficientMoney can't use
ValueObject's implementation of equals( ) and
hashCode( ) anymore). But it does lead to significant performance improvements.

One Final Point

An important point is that you can decide whether to implement
Externalizable or Serializable on a class-by-class basis. Within the same
application, some of your classes can be
Serializable, and some can be
Externalizable. This makes it easy to evolve your
application in response to actual performance data and shifting requirements.
The following two-part strategy is often quite nice:

Make all your classes implement Serializable.

After that, make some of them, the ones you send often
and for which serialization is dramatically inefficient, implement
Externalizable instead.

This gets you most of the convenience of serialization and lets
you use Externalizable to optimize when
appropriate.
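To illustrate the class-by-class choice, here is a minimal sketch of one
object graph that mixes the two approaches. AccountSnapshot, HotMoney, and
MixedGraphDemo are hypothetical names, not classes from the banking
application. The container keeps default serialization, while the frequently
sent value it holds has been converted to externalization; the stream handles
both in a single writeObject( ) call.

```java
import java.io.*;

// Rarely sent: stays with default serialization for convenience.
class AccountSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String owner;
    private final HotMoney balance;

    AccountSnapshot(String owner, HotMoney balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String getOwner() { return owner; }
    HotMoney getBalance() { return balance; }
}

// Sent often: converted to externalization after measurement.
class HotMoney implements Externalizable {
    private static final long serialVersionUID = 1L;
    private int cents;

    public HotMoney() {}
    HotMoney(int cents) { this.cents = cents; }

    int getCents() { return cents; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(cents);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        cents = in.readInt();
    }
}

class MixedGraphDemo {
    // Writes a whole object graph to memory and reads it back.
    static Object roundTrip(Object graph)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Callers do not need to know which mechanism a given class uses, which is what
makes the incremental conversion practical.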

Experience has shown that, over time, more and more objects will
gradually come to directly extend Object and
implement Externalizable. But that's fine. It
simply means that the code was incrementally improved in response to
performance problems when the application was deployed.
