Aleksandar Seovic's Coherence Blog

May 28, 2009

Coherence Identity Generator

One of the things that often trips up Coherence newbies is that there is no built-in facility for generating object identifiers, something similar to Oracle sequences or SQL Server identity columns. Unlike databases, which store data as tuples within tables, Coherence stores objects as cache entries, each consisting of a key (the identity) and an associated value.

That means that in order to insert an object into a Coherence cache, you need to provide its identity as well, to serve as the key.

If the object you need to insert into the cache has a natural key, your job is easy. For example, if you need to cache Country instances, you can simply use their ISO codes as cache keys. Unfortunately, most objects either lack a natural key or have a composite one that is too complex to use, so you need a way to generate synthetic keys up front.

One option is to use a UUID as the identifier. Coherence provides excellent UUID implementations in Java, .NET and C++, so if that alternative works for you, problem solved.

However, many people dislike using UUIDs as synthetic identifiers for their objects for several reasons:

They are big. The Coherence UUID implementation is 256 bits, which means that each key will be 32 bytes, compared to 4 bytes for an integer or 8 for a long (and likely even less than that, as both integers and longs are packed when serialized using POF).

They are cumbersome to use. While for the most part you don't care about the type of identifier, UUIDs get unwieldy when they leak to the surface, and more often than not they eventually will. For example, a UUID within a URL doesn't look very friendly.
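To put the size argument in perspective, here is a back-of-the-envelope sketch of raw key storage for a cache with ten million entries. It only multiplies fixed widths (32 bytes for a 256-bit UUID, 4 for an int, 8 for a long). It deliberately ignores serialization overhead and POF's packing of integers and longs, so real numbers will differ. The class name is made up for illustration.

```java
// Rough raw key-size comparison; ignores serialization overhead and POF packing.
class KeySizeEstimate {
    static final int UUID_BYTES = 256 / 8;       // 256-bit Coherence UUID -> 32 bytes
    static final int INT_BYTES  = Integer.BYTES; // 4 bytes
    static final int LONG_BYTES = Long.BYTES;    // 8 bytes

    /** Total raw bytes needed to store the given number of keys of keyBytes each. */
    static long totalKeyBytes(long entries, int keyBytes) {
        return entries * keyBytes;
    }

    public static void main(String[] args) {
        long entries = 10_000_000L;
        System.out.println("UUID keys: " + totalKeyBytes(entries, UUID_BYTES) + " bytes");
        System.out.println("long keys: " + totalKeyBytes(entries, LONG_BYTES) + " bytes");
        System.out.println("int keys:  " + totalKeyBytes(entries, INT_BYTES) + " bytes");
    }
}
```

That works out to 320 million bytes of UUID keys versus 80 million for longs and 40 million for integers.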

If you are in that group, and are looking for an easy way to assign unique integer-based identifiers to your objects, keep reading.

Sequence Generator Design

Conceptually, generating sequential numbers is quite simple: all you need to do is associate a sequence name with the last assigned number and provide a way to increment it. However, there are several issues to consider, especially in a highly concurrent, distributed system such as Coherence:

Accessibility -- it should be easy for a client application to obtain the next sequence number, which means that the sequences should be stored in a central location accessible to all cluster nodes, as well as remote clients.

Concurrency -- in order to ensure that no duplicate numbers are issued, you need to synchronize access to a sequence. If many clients try to obtain the next number from a sequence at the same time, this can lead to a bottleneck.

Performance -- obtaining the next number should be fast. Ideally, there should be no network or database calls involved, especially in a highly concurrent environment.

Reliability -- you must ensure that sequence numbers are persisted reliably as soon as they are incremented, or you might end up issuing duplicate numbers after system failure.
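For contrast, the core idea fits in a few lines when everything lives in a single JVM. The sketch uses a map from sequence name to last assigned number, plus an atomic increment. This `LocalSequences` class is hypothetical and not part of Coherence. It handles concurrency only within one JVM and ignores accessibility, remote clients and reliability entirely, which is why the design needs more than this.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-JVM sequence store: sequence name -> last assigned number.
class LocalSequences {
    private final ConcurrentHashMap<String, Long> lastAssigned = new ConcurrentHashMap<>();

    /** Atomically increments the named sequence and returns the new value (first call returns 1). */
    long next(String name) {
        // ConcurrentHashMap.merge runs atomically per key, so two threads can never
        // receive the same number from the same sequence.
        return lastAssigned.merge(name, 1L, Long::sum);
    }

    public static void main(String[] args) {
        LocalSequences seq = new LocalSequences();
        System.out.println(seq.next("account")); // 1
        System.out.println(seq.next("account")); // 2
        System.out.println(seq.next("order"));   // 1
    }
}
```

Every call here is a cheap local operation. Once the map moves into a cluster, each call becomes a potentially contended, networked and persisted operation, which is what the rest of the design addresses.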

Coherence itself is an obvious central location, so we will create a sequences cache to store our sequences. The cache will be keyed by sequence name, and the values will be instances of a very simple Sequence class:

public class Sequence implements Serializable, PortableObject {

    private long last;

    ...
}

Basically, the only state our sequence has is a long field that is used to store the last assigned number.
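Here is a self-contained approximation of the `Sequence` value class. It implements only `java.io.Serializable`, because `PortableObject` needs the Coherence libraries. The `allocateBlock` method is a hypothetical helper added for illustration. It reserves `blockSize` numbers by advancing `last` and returns the first number of the reserved range.

```java
import java.io.Serializable;

// Approximation of the Sequence value; PortableObject support omitted (needs Coherence).
class Sequence implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Last number handed out from this sequence (0 means none yet). */
    private long last;

    long getLast() {
        return last;
    }

    /**
     * Hypothetical helper: reserves blockSize numbers and returns the first one.
     * After the call, the reserved range is [returned value, getLast()].
     */
    long allocateBlock(int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        long first = last + 1;
        last += blockSize;
        return first;
    }
}
```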

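In order to improve performance and reduce contention on individual sequences, we will allow clients to request a block of numbers at once. The size of the block should be configurable on a case-by-case basis: for some high-contention sequences it might make sense to allocate a few hundred numbers in a single call, while for others you might want to allocate individual numbers by setting the block size to 1. Each allocated block is represented by a SequenceBlock that holds the first and last reserved numbers. The SequenceGenerator hands out numbers from its current block and requests a new block when it runs out.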
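The effect of the block size on contention is easy to quantify. Handing out `n` identifiers with block size `b` needs ceil(n / b) trips to the shared sequences cache instead of `n`. The small helper below (its name is made up for illustration) does just that arithmetic.

```java
// Hypothetical helper: how many block allocations are needed for n identifiers.
class BlockMath {
    /** Number of allocation calls for n ids with block size b, i.e. ceil(n / b). */
    static long allocations(long n, int b) {
        if (n < 0 || b < 1) {
            throw new IllegalArgumentException("n must be >= 0 and b >= 1");
        }
        return (n + b - 1) / b; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(allocations(1000, 1));   // 1000
        System.out.println(allocations(1000, 100)); // 10
        System.out.println(allocations(1000, 300)); // 4
    }
}
```

The price of larger blocks is that numbers reserved by a client that shuts down before using them are lost. Sequences with large blocks will therefore have gaps, and numbers issued by different clients will not be strictly ordered in time.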

/**
 * Construct a new sequence block.
 *
 * @param first  first number in a sequence
 * @param last   last number in a sequence
 */
public SequenceBlock(long first, long last) {
    m_next = new AtomicLong(first);
    m_last = last;
}
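Only the constructor of `SequenceBlock` appears here. The class has `AtomicLong m_next` and `long m_last` fields, and `generateIdentity` calls `hasNext()` and `next()` on it. Based on that, the rest of the class could plausibly look like the reconstruction below. Treat it as a reconstruction under those assumptions, not the original code.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

// Plausible reconstruction of SequenceBlock: an inclusive range [first, last].
class SequenceBlock {
    private final AtomicLong m_next;
    private final long m_last;

    SequenceBlock(long first, long last) {
        m_next = new AtomicLong(first);
        m_last = last;
    }

    /** True while at least one number in the range has not been handed out. */
    boolean hasNext() {
        return m_next.get() <= m_last;
    }

    /** Returns the next number; getAndIncrement never returns the same value twice. */
    long next() {
        long value = m_next.getAndIncrement();
        if (value > m_last) {
            throw new NoSuchElementException("sequence block exhausted");
        }
        return value;
    }
}
```

Calling `next()` on its own is safe from several threads without locking, because `getAndIncrement` hands each caller a distinct value. The `hasNext()`-then-`next()` pair is only safe when callers are serialized, which is what the `synchronized` on `generateIdentity` provides.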

/**
 * Return the next number in the sequence.
 *
 * @return the next number in the sequence
 */
public synchronized Long generateIdentity() {
    if (allocatedSequences == null || !allocatedSequences.hasNext()) {
        allocatedSequences = allocateSequenceBlock();
    }
    return allocatedSequences.next();
}
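To see the refill logic in isolation, here is a runnable sketch that mirrors `generateIdentity` but replaces the cluster call with a pluggable allocator. The `BlockAllocator` interface, the `LocalSequenceGenerator` class and its `blockSize` parameter are assumptions made for this example. For brevity it also tracks the current block as a plain `next`/`last` pair instead of a `SequenceBlock`. In the real generator, `allocateSequenceBlock()` would invoke the SequenceBlockAllocator entry processor against the sequences cache.

```java
// Hypothetical stand-in for the remote allocation: reserves blockSize numbers
// and returns the first one of the reserved range.
interface BlockAllocator {
    long allocate(int blockSize);
}

// Mirrors generateIdentity: hand out numbers from the current block and
// fetch a new block only when the current one is used up.
class LocalSequenceGenerator {
    private final BlockAllocator allocator;
    private final int blockSize;
    private long next = 0;  // next number to hand out from the current block
    private long last = -1; // last number of the current block; next > last means "no block"

    LocalSequenceGenerator(BlockAllocator allocator, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        this.allocator = allocator;
        this.blockSize = blockSize;
    }

    /** synchronized, as in the original, so one JVM never hands out a number twice. */
    synchronized Long generateIdentity() {
        if (next > last) {
            next = allocator.allocate(blockSize);
            last = next + blockSize - 1;
        }
        return next++;
    }
}

class GeneratorDemo {
    public static void main(String[] args) {
        long[] high = {0}; // plays the role of the sequence's "last" field
        BlockAllocator alloc = size -> {
            long first = high[0] + 1;
            high[0] += size;
            return first;
        };
        LocalSequenceGenerator gen = new LocalSequenceGenerator(alloc, 3);
        for (int i = 0; i < 7; i++) {
            // 1..7, fetched as blocks 1-3, 4-6 and 7-9
            System.out.println(gen.generateIdentity());
        }
    }
}
```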

The generateIdentity method is synchronized, which makes the generator thread-safe within a single JVM. Cluster-wide concurrency is ensured by the SequenceBlockAllocator entry processor, which is guaranteed to execute atomically against its target entry:

/**
 * Allocates a block of sequences from a target entry.
 * <p/>
 * If the target entry for the given name does not already exist in a cache,
 * it will be created automatically.
 *
 * @param entry  target entry to allocate sequence block from
 *
 * @return allocated sequence block
 */
public Object process(InvocableMap.Entry entry) {
    Sequence sequence = entry.isPresent()
            ? (Sequence) entry.getValue()
            : new Sequence();
    ...
}
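The listing stops right after the sequence is loaded or created. An entry processor cannot run without the Coherence libraries. Within one JVM, however, `ConcurrentHashMap.compute` offers a comparable guarantee for a single key, because the remapping function runs atomically for that entry. The sketch below uses it to mimic what `process` presumably does next: advance `last` by the block size and return the reserved range. The `LocalBlockAllocator` class and its method are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Local analogue of the SequenceBlockAllocator: compute() runs atomically per key,
// much as an entry processor runs atomically against its target entry.
class LocalBlockAllocator {
    // sequence name -> last number reserved so far (absent means nothing reserved yet)
    private final ConcurrentHashMap<String, Long> sequences = new ConcurrentHashMap<>();

    /** Reserves blockSize numbers from the named sequence; returns {first, last}. */
    long[] allocate(String name, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        // A missing entry starts at 0, similar to process() creating a new Sequence.
        long newLast = sequences.compute(name,
                (key, last) -> (last == null ? 0L : last) + blockSize);
        return new long[] {newLast - blockSize + 1, newLast};
    }
}
```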

In order to ensure reliability, we need to store the sequence numbers in a persistent store that is accessible from all Coherence nodes, such as a relational database. This can easily be accomplished by configuring the sequences cache to use a read/write backing map and an appropriate cache store implementation. Because environments and persistence framework preferences differ, this is left as an exercise for the reader.
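Whatever cache store you choose, the ordering matters. The new high-water mark must be durably stored before any number from the block is handed out. That way, after a restart the sequence resumes above everything that may already have been issued. The sketch below illustrates only that principle, not a Coherence cache store. The `DurableStore` interface and `DurableSequences` class are hypothetical, and an in-memory map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache store backed by a database.
interface DurableStore {
    Long load(String name);             // null if the sequence was never stored
    void store(String name, long last);
}

// Write-through allocation: persist the new "last" before handing out the block.
class DurableSequences {
    private final DurableStore store;
    private final Map<String, Long> cache = new HashMap<>();

    DurableSequences(DurableStore store) {
        this.store = store;
    }

    /** Returns the first number of a freshly reserved block of blockSize numbers. */
    synchronized long allocate(String name, int blockSize) {
        Long last = cache.get(name);
        if (last == null) {
            last = store.load(name);    // warm up from the durable copy after a restart
            if (last == null) {
                last = 0L;
            }
        }
        long newLast = last + blockSize;
        store.store(name, newLast);     // persist first...
        cache.put(name, newLast);       // ...then update memory and issue numbers
        return last + 1;
    }
}
```

If a node dies after `store` but before its numbers are used, the rest of that block is skipped after restart. That leaves a gap, but no number is ever issued twice.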

Using Sequence Generator

Once you have everything in place, using the sequence generator is quite simple: all you need to do is create an instance of the SequenceGenerator class and invoke its generateIdentity method whenever you need a new id.

The most logical place for this code is the class for which you need to generate identifiers. For example, to create identifiers for new Account instances, you would make a SequenceGenerator a static field of the Account class and use it within the create factory method.
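A rough sketch of that pattern follows. It includes a small local stand-in for `SequenceGenerator` so that it compiles on its own. The stand-in's constructor, its in-memory counter, the `Account` fields and the `create` signature are all assumptions for illustration. A real generator would allocate blocks from the sequences cache instead.

```java
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for SequenceGenerator (assumption): a real one would reserve
// blocks from the "sequences" cache instead of counting in memory.
class SequenceGenerator {
    private final String name; // sequence name; the cache key in a real generator
    private final AtomicLong counter = new AtomicLong();

    SequenceGenerator(String name) {
        this.name = name;
    }

    Long generateIdentity() {
        return counter.incrementAndGet();
    }
}

class Account {
    // One generator shared by all Account instances in this JVM.
    private static final SequenceGenerator ID_GENERATOR = new SequenceGenerator("account");

    private final Long id;
    private final String owner;

    private Account(Long id, String owner) {
        this.id = id;
        this.owner = owner;
    }

    /** Factory method: every new Account gets the next identifier from the sequence. */
    static Account create(String owner) {
        return new Account(ID_GENERATOR.generateIdentity(), owner);
    }

    Long getId() { return id; }

    String getOwner() { return owner; }
}
```

A caller would then write `Account account = Account.create("alice");` and put the result into the cache under `account.getId()`.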

Comments

Why so much blocking in the Sequence stuff? I mean, if you are context switching for the synchronization on this for every generation of a long, why the AtomicLong in SequenceBlock? It would be nice to see a non-blocking version of this.