There are a number of problems with Java serialization and numerous alternatives have been developed. But my focus here is on a particular use case, databases, and a single issue, performance.

Databases generally work with very large byte arrays. This is because seek time is very slow compared to the data transfer rate, so working with larger byte arrays often results in a performance gain. This is true for both hard disks and solid-state drives (SSDs). On the other hand, deserializing very large byte arrays is very CPU intensive. So Java databases choose the size of the byte arrays they use to balance the performance of the disk against the performance of the CPU. Fortunately, the price of RAM has dropped significantly over the years, so large memory caches can be used to hold the deserialized data, which reduces the need to repeatedly read and deserialize the same data. Unfortunately, because of the transactional nature of many databases, there is still a need to frequently reserialize the updated data, which is also CPU intensive, and write it back to disk.

Significant performance gains are achieved by not using Java serialization and instead working more closely with the data, reading and writing the binary form of integers, floats, strings and the like. This is not difficult to do, especially as most databases work only with well-defined tables where all the data in a given column is of the same type. But even then, when reading or writing the data to disk, the entire byte array must be deserialized and subsequently reserialized. So although working directly with the binary data is much faster than using Java serialization, it is still a CPU-intensive process. And the irony is that many database transactions only access or update a minuscule amount of data.
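As a rough illustration of what reading and writing the binary form directly can look like, here is a minimal sketch using java.nio.ByteBuffer. The RowCodec class and its row layout (an int, a float and a length-prefixed UTF-8 string) are assumptions made up for this example; they are not part of JID or of any particular database.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row layout, for illustration only:
// [int id][float score][int nameLength][UTF-8 name bytes]
class RowCodec {
    // Writes the three column values straight into a byte array.
    static byte[] write(int id, float score, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + nameBytes.length);
        buf.putInt(id);
        buf.putFloat(score);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Reads only the id, using an absolute read at offset 0.
    static int readId(byte[] row) {
        return ByteBuffer.wrap(row).getInt(0);
    }

    // Reads only the name: the length sits at offset 8, the bytes start at 12.
    static String readName(byte[] row) {
        int length = ByteBuffer.wrap(row).getInt(8);
        return new String(row, 12, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = write(7, 1.5f, "Ann");
        System.out.println(readId(row) + " " + readName(row));
    }
}
```

Because every column has a known type and position, a single value can be read without touching the rest of the row, which is the property the remainder of this article builds on.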

Now in an ideal world, we would only deserialize data as needed, and then only reserialize the data that has changed. Doing this means that we can work with much larger byte arrays, resulting in an overall improvement in Java database performance. The data structures for doing this efficiently may be somewhat complex, but that should not be an issue so long as the API is reasonable. The term I use for this technology is JID, or Java Incremental Deserialization/reserialization.
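The idea of deserializing only on demand and reserializing only what has changed can be sketched with a single value that keeps both its serialized and deserialized forms. This is a simplified illustration under my own assumptions, not JID's actual implementation; LazyString and its decodeCount field are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder that decodes lazily and re-encodes only after a change.
class LazyString {
    private byte[] serialized; // cached serialized form; null once the value changes
    private String value;      // deserialized form; null until first read
    int decodeCount = 0;       // counts how often the bytes were actually decoded

    LazyString(byte[] serialized) {
        this.serialized = serialized;
    }

    // Deserializes on first access only.
    String getValue() {
        if (value == null) {
            value = new String(serialized, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return value;
    }

    // An update invalidates the cached bytes.
    void setValue(String newValue) {
        value = newValue;
        serialized = null;
    }

    // Reuses the cached bytes when nothing changed; encodes only when needed.
    byte[] getBytes() {
        if (serialized == null) {
            serialized = value.getBytes(StandardCharsets.UTF_8);
        }
        return serialized;
    }

    public static void main(String[] args) {
        LazyString s = new LazyString("hello".getBytes(StandardCharsets.UTF_8));
        s.getBytes(); // no decoding needed to write unchanged data back
        System.out.println("decodes so far: " + s.decodeCount);
    }
}
```

An untouched value is written back from its cached bytes without ever being decoded, and a value that is read many times is decoded once. Applying the same idea at every node of a tree is what makes very large byte arrays affordable.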

Java Incremental Deserialization/reserialization (JID) provides a near-ideal solution for updating serialized data structures. On an Intel i7, an entry can be inserted in the middle of the byte array of a serialized list with 100,000 entries in 2.4 * 10^-4 seconds (240 microseconds, or about a quarter of a millisecond). And an entry can be updated in the byte array of a serialized map with 100,000 entries in 4.8 * 10^-4 seconds (480 microseconds, or about half a millisecond).

To achieve this level of performance, JID is not reflection based, nor does it support cyclic data structures. JID instead requires that serializable data structures be built with objects which are instances of the Jid class. Each type of object used in a serializable data structure must also be registered.

Download JActor-4.5.0.zip and JID-2.0.4.zip from here. (The versions will change--the current versions are 4.5.0 and 2.0.4.) Then extract the JActor-4.5.0.jar and JID-2.0.4.jar files and copy them to a directory, GettingJidStarted. You will also need some slf4j jar files.

The j.bat file can be used to compile and run a test with the following command:

j className

The test consists of a single file in the GettingJidStarted directory, GettingStarted.java, with a single method, main.

import org.agilewiki.jactor.factory.JAFactory;
import org.agilewiki.jid.JidFactories;
import org.agilewiki.jid.scalar.vlens.actor.RootJid;
import org.agilewiki.jid.scalar.vlens.string.StringJid;

public class GettingStarted {
    public static void main(String[] args) throws Exception {
        .
        .
        .
    }
}

Factories

Factories are integral to the operation of JID, as they are needed for deserialization. (JID uses factory objects to create Jid objects, with each factory assigned a type name. See JActor Factories for more information.) Our main method begins with initializing the factories.

JAFactory factory = new JAFactory();
(new JidFactories()).initialize(factory);

JAFactory is the repository of factory objects. JidFactories, when initialized, adds a number of useful Jid factory objects to JAFactory.

Creating and Serializing an Empty RootJid

Jid objects are used to create tree structures, with the root of the tree always an instance of class RootJid.

The rootJid0 object is created and initialized by the JAFactory.newActor method.

The RootJid.getSerializedLength method returns the length of the byte array needed to hold the serialized RootJid. This method involves a minimum of calculation, using information that is updated whenever the contents of the RootJid are updated. The length is zero when a RootJid is empty.

The RootJid.save method takes two arguments, (1) the byte array where the serialized data is to be saved and (2) an offset. The returned value is the sum of the offset and the length of the serialized data.

The RootJid.load method is used to load a RootJid with the serialized data created by the method RootJid.save. The load method takes 3 arguments, (1) the byte array holding the serialized data, (2) the offset to where the serialized data is located in the byte array and (3) the length of the serialized data. The returned value is the sum of the offset and the length of the serialized data.
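The offset arithmetic described for getSerializedLength, save and load can be illustrated with a stand-in class. ToyValue below is hypothetical and only mimics the contract as described in the text (both save and load return the offset plus the serialized length); it does not use or reproduce the JID classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in that follows the save/load contract described above.
class ToyValue {
    private byte[] data = new byte[0]; // an empty value serializes to zero bytes

    void setValue(String s) {
        data = s.getBytes(StandardCharsets.UTF_8);
    }

    String getValue() {
        return new String(data, StandardCharsets.UTF_8);
    }

    int getSerializedLength() {
        return data.length;
    }

    // Copies the serialized form into bytes at offset; returns offset + length.
    int save(byte[] bytes, int offset) {
        System.arraycopy(data, 0, bytes, offset, data.length);
        return offset + data.length;
    }

    // Reads length bytes starting at offset; returns offset + length.
    int load(byte[] bytes, int offset, int length) {
        data = Arrays.copyOfRange(bytes, offset, offset + length);
        return offset + length;
    }

    public static void main(String[] args) {
        ToyValue a = new ToyValue();
        ToyValue b = new ToyValue();
        a.setValue("ab");
        b.setValue("cde");
        byte[] buffer = new byte[a.getSerializedLength() + b.getSerializedLength()];
        int offset = a.save(buffer, 0); // the returned offset feeds the next save
        offset = b.save(buffer, offset);
        System.out.println("bytes written: " + offset);
    }
}
```

Returning the updated offset lets several values be written into, or read from, one shared byte array without the caller recomputing positions.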

Serializing a RootJid with an Empty String

A RootJid object can hold one Jid object.

rootJid0.setValue(JidFactories.STRING_JID_TYPE);
serializedLength0 = rootJid0.getSerializedLength();
serializedData0 = new byte[serializedLength0];
updatedOffset0 = rootJid0.save(serializedData0, 0);
if (!(rootJid0.getValue() instanceof StringJid))
    throw new Exception("unexpected result");
if (updatedOffset0 != serializedLength0)
    throw new Exception("unexpected result");

The RootJid.setValue method is used to create and initialize the Jid object held by a RootJid.

Super-fast persistent data structures composed of pre-built classes are a step in the right direction, but they are still a step backwards from what we are used to being able to do. Being able to easily write custom classes that are just as fast is also important.

Custom JID classes are subclasses of AppJid, which provides a persistent tuple with any number of entries of different types. To illustrate this we will look at a simple User class which has two persistent values, a name and an age.

import org.agilewiki.jid.scalar.vlens.string.StringJid;
import org.agilewiki.jid.scalar.flens.integer.IntegerJid;
import org.agilewiki.jid.collection.flenc.AppJid;

public class User extends AppJid {
    private StringJid getNameJid() throws Exception {
        return (StringJid) _iGet(0);
    }

    private IntegerJid getAgeJid() throws Exception {
        return (IntegerJid) _iGet(1);
    }

    public String getName() throws Exception {
        return getNameJid().getValue();
    }

    public void setName(String name) throws Exception {
        getNameJid().setValue(name);
    }

    public int getAge() throws Exception {
        return getAgeJid().getValue();
    }

    public void setAge(int age) throws Exception {
        getAgeJid().setValue(age);
    }
}

User accesses the Jid objects in its persistent tuple using the protected method _iGet(int). But this tuple must have a StringJid as its first element and an IntegerJid as its second element. This requirement is met by using a factory object to create and initialize User objects.

For high performance, array-backed data structures are generally recommended. On the other hand, inserting into a large array is slow, because every entry after the insertion point must be shifted. BListJid uses small arrays (max size is 27) in a balanced tree structure to support fast updates and super-fast incremental deserialization/reserialization.
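To see why small arrays help, consider the following much-simplified sketch. ChunkedIntList is a hypothetical class written for this article, not BListJid: it keeps its leaves in a flat list rather than in a balanced tree, so locating a leaf is linear here, but an insertion still shifts at most 27 entries within one leaf.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration: a list stored as small leaves of at most MAX_LEAF entries.
// A real balanced tree would also index the leaves; this sketch walks them linearly.
class ChunkedIntList {
    static final int MAX_LEAF = 27;
    private final List<List<Integer>> leaves = new ArrayList<>();
    private int size = 0;

    int size() {
        return size;
    }

    int leafCount() {
        return leaves.size();
    }

    void add(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (leaves.isEmpty()) {
            leaves.add(new ArrayList<>());
        }
        // Find the leaf that contains the insertion point.
        int i = 0;
        int remaining = index;
        while (i < leaves.size() - 1 && remaining > leaves.get(i).size()) {
            remaining -= leaves.get(i).size();
            i++;
        }
        List<Integer> leaf = leaves.get(i);
        leaf.add(remaining, value); // shifts entries in this one small leaf only
        size++;
        // Split a leaf that has grown past the limit into two halves.
        if (leaf.size() > MAX_LEAF) {
            int mid = leaf.size() / 2;
            List<Integer> right = new ArrayList<>(leaf.subList(mid, leaf.size()));
            leaf.subList(mid, leaf.size()).clear();
            leaves.add(i + 1, right);
        }
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int remaining = index;
        for (List<Integer> leaf : leaves) {
            if (remaining < leaf.size()) {
                return leaf.get(remaining);
            }
            remaining -= leaf.size();
        }
        throw new AssertionError("unreachable");
    }
}
```

The same locality is what helps with serialization: if each leaf keeps its own serialized bytes, an update only invalidates the bytes of one small leaf and of the nodes above it.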

As with other Jid classes, a registered factory object is mandatory. The JidFactories class registers 8 different types of BListJid, one of which is a list of integers.

The iAdd method creates and inserts a new Jid object at a given location, where 0 is the first location, 1 is the second, -1 is the last location, -2 is the next-to-last location, etc.
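The indexing convention can be made concrete with a small helper. This is a sketch of one plausible reading, under the assumption that a negative index names the position the new entry will occupy after the insert, so that -1 makes it the last entry. InsertIndex and resolve are hypothetical names, not part of the JID API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps a possibly negative insert position to a
// zero-based index. Assumes -1 means "the new entry becomes the last one".
class InsertIndex {
    static int resolve(int index, int size) {
        int resolved = index < 0 ? size + 1 + index : index;
        if (resolved < 0 || resolved > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        list.add(resolve(-1, list.size()), "last");         // appended at the end
        list.add(resolve(-2, list.size()), "next-to-last"); // placed just before "last"
        System.out.println(list);
    }
}
```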

BListJid supports only homogeneous lists, where all the entries are of the same class. The advantage is that the serialized data is smaller than it would otherwise be and performance is a bit better as well. But sometimes we need a list of heterogeneous objects. This can be achieved with ActorJid, which is a superclass of RootJid and which can hold any object that subclasses Jid.

Being able to constrain the types used in a data structure can be important, and this is one of the advantages of using UnionJid instead of ActorJid. It also results in serialized data that is a bit more compact and a bit faster to deserialize. It also supports recursive types, which is what we will be looking at in this next example.

In the above code we define the type union1, which can hold either a StringJid or a Jid of type unions. And the type unions is defined as a BListJid whose elements are of type union1. We then proceed to create a list whose first element is a StringJid with a value of "a" and whose second element is a list whose first element is a StringJid with a value of "b".
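The recursive shape described here, a list whose elements are either a string or another such list, can be modelled in plain Java. This is only a sketch of the data shape: UnionList is a hypothetical class, not part of JID, and it enforces the constraint with a run-time check rather than with registered types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a constrained, recursive list:
// each element must be a String or another UnionList.
class UnionList {
    private final List<Object> items = new ArrayList<>();

    void add(Object element) {
        if (!(element instanceof String) && !(element instanceof UnionList)) {
            throw new IllegalArgumentException("only String or UnionList elements are allowed");
        }
        items.add(element);
    }

    Object get(int index) {
        return items.get(index);
    }

    @Override
    public String toString() {
        return items.toString();
    }

    public static void main(String[] args) {
        UnionList inner = new UnionList();
        inner.add("b");
        UnionList outer = new UnionList();
        outer.add("a");   // first element: a string
        outer.add(inner); // second element: a list whose first element is "b"
        System.out.println(outer);
    }
}
```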

BMapJid is the base class for balanced tree maps which, like BListJid, provide for super-fast incremental deserialization and reserialization. BMapJid has 3 subclasses, IntegerBMapJid, LongBMapJid and StringBMapJid, which support Integer, Long and String keys respectively.

BMapJid<KEY_TYPE, VALUE_TYPE> is a collection of MapEntry objects, where MapEntry holds a key/value pair. BMapJid is effectively a sorted list of MapEntry objects with fast indexing, and it supports the same methods as BListJid, except that the iAdd and iAddBytes methods are not supported. Access by key is also supported. These additional methods include

MapEntry<KEY_TYPE, VALUE_TYPE> getCeiling(KEY_TYPE key) - Returns the MapEntry with the smallest key that is greater than or equal to the given key, or null.

MapEntry<KEY_TYPE, VALUE_TYPE> getHigher(KEY_TYPE key) - Returns the MapEntry with the smallest key that is greater than the given key, or null.
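The difference between the two lookups is easiest to see with an exact match. The standard java.util.TreeMap has methods with the same semantics as described above, ceilingEntry and higherEntry, so it is used here as a stand-in; this example does not call the JID API, and CeilingHigherDemo is just a name for the demo.

```java
import java.util.Map;
import java.util.TreeMap;

// Uses java.util.TreeMap as an analogy for the ceiling/higher lookups.
class CeilingHigherDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(10, "ten");
        map.put(20, "twenty");
        map.put(30, "thirty");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        Map.Entry<Integer, String> ceiling = map.ceilingEntry(20); // exact match counts
        Map.Entry<Integer, String> higher = map.higherEntry(20);   // must be strictly greater
        System.out.println(ceiling.getKey() + " " + higher.getKey());
        System.out.println(map.higherEntry(30)); // no greater key, so null
    }
}
```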

BMapJid objects are created using a registered factory object. As a convenience, JidFactories registers 24 such factory objects, though it is easy enough to define and register additional factory objects using the IntegerBMapJidFactory, LongBMapJidFactory and StringBMapJidFactory classes.

All Jid objects have the same superclass, Jid, which in turn is a subclass of JLPCActor, which means that all Jid objects are actors.

So far, we have not given any examples of a Jid object initialized with a Mailbox, which means that none of the Jid objects shown are able to send or process messages. But initializing a Jid object with a Mailbox is easy to do and most of the methods in the JID API have corresponding Request classes. Also, the Jid objects in a Jid tree structure will always share the same mailbox, so an application Jid never needs to send Requests to the Jid objects in its tuple--it can just call their methods directly.

In the code below we create a RootJid with a StringJid set to "Hello world!", serialize it and then deserialize it. Many of the method calls shown earlier have been replaced with request messages to illustrate their use. However, the serialization and deserialization logic still uses method calls, which means that thread safety is the responsibility of the application developer for these operations. (Thread safety can always be achieved by performing these operations within an actor which uses the same mailbox as the Jid actor.)

When deserialization and reserialization are reasonably fast, their use to make deep copies of data structures becomes a reasonable approach. And Jid provides the CopyJid request, which is supported by all Jid actors. The GetSerializedBytes request is similar, in that it returns a byte array holding the serialized data of a Jid actor, and it too is supported by all Jid actors.
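The general technique of copying by serializing and then deserializing can be sketched without any JID classes. ToyProfile below is a hypothetical class that uses java.io data streams; it only illustrates why the result is a deep copy, and says nothing about how CopyJid or GetSerializedBytes are implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that copies itself through its own serialized bytes.
class ToyProfile {
    String name;
    final List<String> tags = new ArrayList<>();

    ToyProfile(String name) {
        this.name = name;
    }

    // Serializes the name, the tag count and then each tag.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeInt(tags.size());
            for (String tag : tags) {
                out.writeUTF(tag);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a new, independent object from serialized bytes.
    static ToyProfile fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            ToyProfile profile = new ToyProfile(in.readUTF());
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                profile.tags.add(in.readUTF());
            }
            return profile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deep copy: nothing is shared between the original and the copy.
    ToyProfile copy() {
        return fromBytes(toBytes());
    }
}
```

Because the copy is rebuilt entirely from bytes, it shares no mutable state with the original, which is what makes it safe to hand to another thread.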

Making copies of data structures is important in a multithreaded application when it can be used to reduce the number of messages sent between threads. Conversely, being able to add a copy of a Jid actor to a collection may be even more useful. This is done by first getting the byte array of a Jid's serialized data and passing it in one of several requests which then create a copy of that Jid and add it to their collection. These requests include SetActorBytes for RootJid, ActorJid and UnionJid, IAddBytes for BListJid and KMakeBytes for BMapJid.