Software Maintenance: File Format Evolution in Java

Joshua Engel examines how code changes require an evolution in file formats and how to deal with those changes. As he points out, it's not uncommon to lose data when new application versions change how some tasks are accomplished. While there's no completely graceful solution, you can make file format upgrades as painless as possible. This article examines how Java serialized files can be made to evolve better.

Adding a new capability to a released program often requires changing the way
users save data, which means a change to the file format. Usually you'll
have to store additional information. Sometimes you'll drastically alter
the way information is organized or represented. The file format evolves to
match the new capabilities of the program. However, you can't afford to
forget about the old versions. In the animal kingdom, those who don't adapt
die out; in software, users may upgrade, or they may not.

No matter how much better your new file format is, however, and no matter how
many improvements it includes, it's generally unacceptable to users for
their old files to become unusable with the new software. You have a couple of
options for dealing with this problem:

Keep your old code around for reading old files. You'll have to
write additional code to convert the old data into the new format (usually done
most easily by converting it into your new internal objects, and then using the
code you've already written for the new objects to write the new file
format). As a bonus, you can keep the old writing code and make it compatible
with your new objects. There's still sometimes some loss of information,
but it's better than losing everything.

Be able to read and write old file formats. This can be a lot of work,
since new versions of a program often have capabilities that older ones lack, so
there's usually no place to store the data required to make the new
capabilities work.

Data loss is not uncommon when new versions fundamentally change the way some
things are done. Old capabilities may no longer be necessary in the new version
when the new version achieves the same goal in a different fashion. For example,
a program that has changed from a Swing-based interface to a web-oriented
interface will lose a lot of information about user preferences that no longer
apply. A mail program that changes from a folder-based indexing system to a
word-based system will probably lose information in the upgrade between index
file formats, which can be especially tragic if one index has saved a lot of
user preferences and optimizations that are no longer necessary.

There's no completely graceful solution to these scenarios. However, you
can try to make file format upgrades as painless as possible. Java
serialization is simple and easy to use, which is making it a popular option
for saving files, so let's examine how Java serialized files can be made to
evolve better.

Java Serialization Evolution

There are numerous advantages to using Java serialization:

It's very easy to do.

It writes out all the objects that your object links to.

If an object occurs more than once, it's only written a single time.
This is particularly important not only because it saves space in the file, but
because you don't have to worry about the potential infinite loops
you'd get if you were to write this code in a naïve way. (The
naïve way would be to recursively write out each object, but if you
don't keep track of what you've already written out, you can find
yourself going forever.)
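
The shared-reference behavior described above can be checked with a small round trip. This is a minimal sketch, not code from the article: the Node class, the SharedReferenceDemo class, and the roundTrip helper are names made up for illustration, and the stream is kept in memory rather than in a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    // Hypothetical helper: serializes an object graph to memory and reads it back.
    static Object roundTrip(Object graph) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;               // a cycle: a -> b -> a

        Node[] pair = { a, a };   // the same object appears twice
        Node[] copy = (Node[]) roundTrip(pair);

        System.out.println(copy[0] == copy[1]);           // prints true
        System.out.println(copy[0].next.next == copy[0]); // prints true
    }
}

// Hypothetical linked node, used only to build a graph with a cycle.
class Node implements Serializable {
    String label;
    Node next;

    Node(String label) {
        this.label = label;
    }
}
```

Both slots of the restored array refer to one restored object, and the cycle comes back intact, without any bookkeeping in the calling code.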

Unfortunately, file formats defined by Java serialization tend to be very
fragile; very simple modifications to your class can make old objects
unreadable. Even simple extensions are not handled easily. For example, this
code has a very simple file format:
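
A class of this kind might look like the following minimal sketch. The class name Save, the field name, the method save, and the Serializable interface come from the description in this article; the File parameter, the temporary file, the load helper, and the SaveDemo class are assumptions added so the example can run on its own.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SaveDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical file location, chosen only for this sketch.
        File file = File.createTempFile("save-demo", ".ser");
        file.deleteOnExit();

        Save original = new Save();
        original.name = "first draft";
        original.save(file);

        Save restored = Save.load(file);
        System.out.println(restored.name); // prints first draft
    }
}

// No serialVersionUID is declared here, so the runtime computes one from the class.
class Save implements Serializable {
    String name;

    // Writes this object, and anything it references, to the given file.
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Hypothetical helper, not part of the article: reads a Save back from a file.
    static Save load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Save) in.readObject();
        }
    }
}
```

Nothing in this class states a version, so the runtime derives one from the shape of the class when the object is written and again when it is read.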

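If you change the class after saving an object (by adding a field, for
instance) and then try to read the old file, you get a
java.io.InvalidClassException whose message reports two serialVersionUID
values: the "stream classdesc" one from the file and the "local class" one
from the current code. Each of those big numbers is a hash of various
properties of the class: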

Class name (Save)

Field names (name)

Method names (save)

Implemented interfaces (Serializable)

Add, delete, or change any of those items and you'll get a different hash,
which triggers the exception. The hash is called the serial version unique
identifier (serialVersionUID). You can get around this problem by forcing
the class to keep the old serialVersionUID, which you do by adding a field
to the class. It must be

staticso that it's a property of the class, not the
object

finalso that it can't change as the code is
running

longbecause it's a 64-bit number

So you add the following line:

static final long serialVersionUID=-2805274842657356093L;

The number to use is the one the exception message labels "stream
classdesc"; that is, the one in the saved stream. The L tacked onto the end
marks a long constant; this is about the only time I ever use long constants.
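
To confirm that the declared value is the one the runtime will use, you can ask ObjectStreamClass for the class's identifier. The sketch below assumes a Save class that has gained a field since the file was written; the val field and the PinnedUidDemo class are hypothetical, while the serialVersionUID value is the one given above.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class PinnedUidDemo {
    public static void main(String[] args) {
        // Look up the serialization descriptor the runtime builds for Save.
        ObjectStreamClass desc = ObjectStreamClass.lookup(Save.class);
        System.out.println(desc.getSerialVersionUID()); // prints -2805274842657356093
    }
}

class Save implements Serializable {
    // Pinned to the value recorded in the old stream, so old files still match.
    static final long serialVersionUID = -2805274842657356093L;

    String name;

    // Hypothetical field added after the old files were written.
    int val;
}
```

With the identifier pinned, an old file that has no val entry should still load; the new field is simply left at its default value (0 here), because the stream holds nothing to put in it.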

Of course, not all changes are compatible. If you change the type of a field
from a String to an int, the de-serializer won't know
what to do with the value, and you'll get an error message like this:

java.io.InvalidClassException: Save; incompatible types for field name