Aleksandar Seovic's Coherence Blog

September 19, 2009

Why "Portable Objects"?

I guess I should've covered why I named this blog the way I did in the intro post a few months back, but better late than never ;)

One of the things I like most about Coherence is how simple it makes interop between various platforms. While the cluster-side code has to be written in Java, writing client applications is equally simple whether you write them in Java, C# or C++. There are no web services or other heavy-weight technologies to make your life miserable -- you simply code your application against the appropriate client library, and the client library itself takes care of data marshaling and the low-level communication details between the client and the cluster. Best of all, the API used is the same one you would use within the cluster, and apart from the minor platform idiosyncrasies, it is equivalent across the supported platforms.

There are two underlying technologies that make this possible. The first one is Coherence*Extend, a low-level, TCP/IP-based messaging protocol that is used for communication between the client and the cluster. While it is a truly amazing piece of software in its own right, Coherence*Extend by itself wouldn't help much if it weren't for the second piece of the puzzle -- Portable Object Format, or POF.

Portable Object Format

POF is a platform-independent serialization format that allows you to encode equivalent Java, .NET and C++ objects into an identical sequence of bytes. In other words, for any given class with identical data members, the bytes on the wire will be exactly the same, regardless of the platform.

Implementing POF Serialization

In order to make your objects portable, you need to implement serialization code by hand. While this definitely isn't the most exciting code you'll ever write, it is quite simple and typically doesn't take more than a minute or two per class.

There are two ways to implement POF serialization. The first one is to implement the PortableObject interface (IPortableObject in .NET):
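A minimal sketch of this first approach, assuming a hypothetical Person class with a name and an age. The nested PofReader, PofWriter and PortableObject interfaces below are simplified stand-ins that mirror the shape of the Coherence interfaces of the same names, so that the example compiles on its own; real code would import them from com.tangosol.io.pof. The InMemoryStream helper is purely illustrative and does not produce real POF bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class PortableObjectSketch {

    // Simplified stand-ins for the Coherence interfaces of the same names.
    // Assumption: real code would import them from com.tangosol.io.pof.
    interface PofReader {
        String readString(int index) throws IOException;
        int readInt(int index) throws IOException;
    }

    interface PofWriter {
        void writeString(int index, String value) throws IOException;
        void writeInt(int index, int value) throws IOException;
    }

    interface PortableObject {
        void readExternal(PofReader in) throws IOException;
        void writeExternal(PofWriter out) throws IOException;
    }

    // Hypothetical domain class; the two fields are illustrative only.
    static class Person implements PortableObject {
        private String name;
        private int age;

        Person() {
            // No-arg constructor so an empty instance can be filled by readExternal.
        }

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }

        @Override
        public void readExternal(PofReader in) throws IOException {
            // Properties are identified by integer index, not by name.
            name = in.readString(0);
            age = in.readInt(1);
        }

        @Override
        public void writeExternal(PofWriter out) throws IOException {
            out.writeString(0, name);
            out.writeInt(1, age);
        }
    }

    // Toy in-memory store keyed by property index. It exists only to exercise
    // the sketch and does not produce the real POF wire format.
    static class InMemoryStream implements PofReader, PofWriter {
        private final Map<Integer, Object> values = new HashMap<>();

        @Override public void writeString(int index, String value) { values.put(index, value); }
        @Override public void writeInt(int index, int value) { values.put(index, value); }
        @Override public String readString(int index) { return (String) values.get(index); }
        @Override public int readInt(int index) { return (Integer) values.get(index); }
    }

    // Writes a Person through the PofWriter side and reads a copy back.
    static Person roundTrip(Person original) {
        try {
            InMemoryStream stream = new InMemoryStream();
            original.writeExternal(stream);
            Person copy = new Person();
            copy.readExternal(stream);
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = roundTrip(new Person("Ana", 33));
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```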

As you can see, making a class portable is not rocket science -- you simply implement the PortableObject interface and use the appropriate PofReader and PofWriter methods to read and write class members to a POF stream.

However, what if we don't have the source code for the class, or are not allowed to modify it? We can still make it portable using the second approach, which is to implement an external serializer. This is exactly what we will do for our .NET class:
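The same idea can be sketched in Java rather than C#, assuming a hypothetical immutable Person class that we cannot change. The nested interfaces are simplified stand-ins modeled on Coherence's PofSerializer, PofReader and PofWriter; in particular, the remainder is represented here as a plain byte array, which is an assumption made to keep the example self-contained. The InMemoryStream helper is illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class ExternalSerializerSketch {

    // Simplified stand-ins modeled on the Coherence POF interfaces.
    // Assumption: the remainder is shown as byte[] purely for illustration.
    interface PofReader {
        String readString(int index) throws IOException;
        int readInt(int index) throws IOException;
        byte[] readRemainder() throws IOException;
    }

    interface PofWriter {
        void writeString(int index, String value) throws IOException;
        void writeInt(int index, int value) throws IOException;
        void writeRemainder(byte[] remainder) throws IOException;
    }

    interface PofSerializer {
        void serialize(PofWriter out, Object o) throws IOException;
        Object deserialize(PofReader in) throws IOException;
    }

    // Hypothetical class we are not allowed to modify: no POF code inside it.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // All serialization logic lives outside the class being serialized.
    static final class PersonSerializer implements PofSerializer {
        @Override
        public void serialize(PofWriter out, Object o) throws IOException {
            Person person = (Person) o;
            out.writeString(0, person.getName());
            out.writeInt(1, person.getAge());
            out.writeRemainder(null); // nothing beyond the known properties
        }

        @Override
        public Object deserialize(PofReader in) throws IOException {
            String name = in.readString(0);
            int age = in.readInt(1);
            in.readRemainder(); // consume anything this version does not know
            return new Person(name, age);
        }
    }

    // Toy in-memory store keyed by property index; not the real wire format.
    static final class InMemoryStream implements PofReader, PofWriter {
        private final Map<Integer, Object> values = new HashMap<>();
        private byte[] remainder;

        @Override public void writeString(int index, String value) { values.put(index, value); }
        @Override public void writeInt(int index, int value) { values.put(index, value); }
        @Override public void writeRemainder(byte[] bytes) { remainder = bytes; }
        @Override public String readString(int index) { return (String) values.get(index); }
        @Override public int readInt(int index) { return (Integer) values.get(index); }
        @Override public byte[] readRemainder() { return remainder; }
    }

    // Serializes a Person with the external serializer and reads a copy back.
    static Person roundTrip(Person original) {
        try {
            PofSerializer serializer = new PersonSerializer();
            InMemoryStream stream = new InMemoryStream();
            serializer.serialize(stream, original);
            return (Person) serializer.deserialize(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = roundTrip(new Person("Ana", 33));
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```

Note that Person itself stays untouched; the serializer only relies on its public constructor and getters.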

As you can see, writing an external serializer is not much more complex either (ignore for now the calls to the Read/WriteRemainder methods -- they have to do with object evolvability, which is a subject that warrants its own post).

POF Context

The POF serializer does not encode the class name into the POF stream -- after all, doing so would defeat its purpose, as class names are platform-specific. Instead, it encodes an integer type identifier, and leaves it up to the user to ensure that type identifiers map to the appropriate classes on each platform.

The mapping is achieved using one of the PofContext (IPofContext in .NET) implementations. SimplePofContext allows you to map types programmatically, and is very useful for unit testing. However, in real applications you will likely want to externalize type mappings into a file, which is where ConfigurablePofContext comes in.
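To make the idea of a type mapping concrete, here is a toy two-way registry between integer type identifiers and classes. It is only a sketch of the concept: TypeRegistry and its method names are hypothetical and are not the PofContext API, and the identifier 1000 is an arbitrary illustrative value. The point is that each platform keeps its own table, and interop works as long as the same identifier points to the equivalent class on both sides.

```java
import java.util.HashMap;
import java.util.Map;

class TypeMappingSketch {

    // Hypothetical registry; it illustrates the mapping, not the Coherence API.
    static final class TypeRegistry {
        private final Map<Integer, Class<?>> classById = new HashMap<>();
        private final Map<Class<?>, Integer> idByClass = new HashMap<>();

        // Registers a one-to-one mapping between a type id and a class.
        void register(int typeId, Class<?> type) {
            if (classById.containsKey(typeId) || idByClass.containsKey(type)) {
                throw new IllegalArgumentException("Duplicate mapping for " + typeId + " / " + type);
            }
            classById.put(typeId, type);
            idByClass.put(type, typeId);
        }

        // Used when writing: which id goes into the stream for this class?
        int typeIdOf(Class<?> type) {
            Integer typeId = idByClass.get(type);
            if (typeId == null) {
                throw new IllegalArgumentException("Unknown type: " + type);
            }
            return typeId;
        }

        // Used when reading: which local class does this id stand for?
        Class<?> classOf(int typeId) {
            Class<?> type = classById.get(typeId);
            if (type == null) {
                throw new IllegalArgumentException("Unknown type id: " + typeId);
            }
            return type;
        }
    }

    // Hypothetical domain class standing in for the Java-side Person.
    static final class Person { }

    public static void main(String[] args) {
        TypeRegistry registry = new TypeRegistry();
        registry.register(1000, Person.class);
        System.out.println(registry.typeIdOf(Person.class));
        System.out.println(registry.classOf(1000).getSimpleName());
    }
}
```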

The ConfigurablePofContext allows you to specify mappings in an XML file. For example, the configuration file for the Java POF serializer that can serialize our Person class would look like this:

On the .NET side the configuration is very similar. The only real difference is that an XML schema is used instead of a DTD, and that, because our class doesn't implement the IPortableObject interface, we also need to configure the external serializer explicitly:

Now that both the Java and .NET serializers have been configured, you can serialize a Person instance on one platform and deserialize it on the other.

Conclusion

In addition to portability and platform independence, there is much more to like about POF.

For one, it is an extremely compact format. Instead of verbose class names, it uses integer identifiers to represent types, and you have probably already realized from the code examples that the same is true for class attributes, where integer indexes are used instead of property names. It is quite common to achieve a 3-5 times size reduction of serialized data when compared to standard Java or .NET binary serialization, which in the context of Coherence means 3-5x less network traffic and, more importantly, 3-5x less RAM needed in the cluster (or caching 3-5x the data in the same amount of RAM, depending on how you look at it).
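A rough way to see where the savings come from, using only the JDK. The sketch below compares standard Java serialization of a hypothetical Person with a hand-rolled encoding that writes an integer type id and integer property indexes instead of names. The hand-rolled encoding is not POF (POF has its own wire format), and the type id 1000 is an arbitrary illustrative value; the example only demonstrates the principle that replacing names with integers shrinks the payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class CompactEncodingSketch {

    // Hypothetical domain class; Serializable only for the comparison.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Size of the object under standard Java serialization, which embeds
    // the class name and field descriptors in the stream.
    static int javaSerializedSize(Person person) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(person);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of a hand-rolled encoding: integer type id, then each property
    // tagged with a one-byte index. This is NOT the POF wire format.
    static int indexedSize(Person person, int typeId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(typeId);      // type identified by number, not class name
                out.writeByte(0);          // property index 0
                out.writeUTF(person.name);
                out.writeByte(1);          // property index 1
                out.writeInt(person.age);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person("Ana", 33);
        System.out.println("Java serialization: " + javaSerializedSize(person) + " bytes");
        System.out.println("Indexed encoding:   " + indexedSize(person, 1000) + " bytes");
    }
}
```

The actual ratio depends heavily on the class and its data, so treat the output as an illustration rather than a benchmark.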

The second benefit is the raw serialization speed. POF serialization is consistently 10-12 times faster than Java or .NET serialization. While this is pretty much irrelevant for individual cache puts and gets, due to network access overhead, it can significantly improve the performance of queries and aggregations against the cache that require objects to be deserialized.

Finally, as of Coherence 3.5, binary POF values can be manipulated directly using the PofValue interface and related classes, providing you with one additional way to avoid excessive serialization and deserialization of objects in a partitioned cache. The widely discussed PofExtractor and PofUpdater are built on top of this functionality, but I personally find the underlying functionality much more interesting, as it opens up a world of possibilities when it comes to direct binary manipulation of cached objects.

In the next post, I will show one such example by implementing an equivalent of a VersionedPut entry processor that does not need to deserialize either the old or the new value.

Comments

I don't think you ever shared your VersionedPut entry processor that skips deserialization. Seems like you had planned on using a PofExtractor - would you mind sharing a little more about that? Would you be using a PofNavigator impl such as SimplePofPath?

My apologies, I meant to post about BinaryVersionedPut a long time ago, but things got a bit out of control. It's amazing how time flies by...

Now that the book is out, I should be able to devote more time to blogging, and this is one of the first topics I'll write about.

In the meantime, you can find the implementation of BinaryVersionedPut in the Coherence Tools project on Google Code (http://code.google.com/p/coherence-tools/source/browse/trunk/main/src/com/seovic/coherence/util/processor/BinaryVersionedPut.java)