Monday, November 26, 2007

How to generate a unique ID within a distributed environment in consideration of scalability ?

Also, the following feature requirements are given ...

The ID must be guaranteed globally unique (this rule out probabilistic approaches)

The assigned ID will be used by client for an unbounded time period (this means that once assigned, the ID is gone forever and cannot be reused for subsequent assignment)

The length of the ID is small (say 64-bit) compared to machine unique identifier (this rule out the possibility of using a combination of MAC address + Process id + Thread id + Timestamp to create a unique ID)

Architectural goals ...

The solution needs to be scalable with respects to growth of request rates

The solution needs to be resilient to component failure

General Solution

Using a general distributed computing scheme, there will be a pool of ID generators (called "workers") residing in many inter-connected, cheap, commodity machines.

Depends on the application, the client (which request for a unique id) may collocate in the same machine of the worker, or the client may reside in a separate machine. In the latter case, a load balancer will be sitting between the client and the worker to make sure the client workload is evenly distributed.

Central Bookkeeper

One straightforward (maybe naive) approach is to have the worker (ie: the ID generator) make a request to a centralized book-keeping server who maintains a counter. Obviously this central book-keeper can be a performance bottleneck as well as a single point of failure.

The performance bottleneck can be mitigated by having the worker making a request to the centralized book-keeper for a "number range" rather than the "id" itself. In this case, id assignment will be done locally by the worker within this number range, only when the whole range is exhausted will the book-keeper be contacted again.

When the book-keeper receive a request of a number range, it will persist to disk the allocated number range before returning to the client. So if the book-keeper crashes, it knows where to start allocating the number range after reboot. To prevent the disk itself being a SPOF, mirrored disk should be used.

The SPOF problem of the book-keeper can be addressed by having a pair of book-keepers (primary/standby). The primary book-keeper need to synchronize its counter state to the standby via shared disks or counter change replication. The standby will continuously monitor the health of the primary book-keeper and will take over its role when it crashes.

Peer Multicast

Instead of using a central book-keeper to track the number range assignment, we can replicate this information among the workers themselves. Under this scheme, every worker keep track of its current allocated range as well as the highest allocated range. So when a worker exhaust its current allocated range, it broadcast a "range allocation request" to all other workers, wait for all of their acknowledgments, and then update its current allocated range.

It is possible that multiple clients making request at the same time. This kind of conflicts can be resolved by distributed co-ordination algorithms (there are many of well-known ones and one of them is bully algorithm)

For performance reason, the worker doesn't persist its the most updated allocated id. In case the client crashes, it will request a brand-new range after bootup, even the previous range was not fully utilized.

Distributed Hashtable

By using the "key-based-routing" model of DHT, each worker will create some "DHT Node" (with a randomly generated 64-bit node id) and join a "DHT ring". Under this scheme, the number range is allocated implicitly (between the node id and its immediate neighbor's node id).

Now, we can utilize a number of nice characteristics of the DHT model such as we can use large number of user's resource for workers with O(logN) routing hobs. And also the DHT nodes contains replicated data of its neighbors so that if one DHT node crashes, its neighbor will take over its number range immediately.

Now, what if a DHT node has exhausted its implicitly allocated number range ? When this happens, the worker will start a new DHT node (which will join at some point in the ring and then has a newly assigned number range).