SequenceId Generator for Azure or any Big Table Storage

Windows Azure and some other cloud table storage systems do not support auto-increment identity columns the way Microsoft SQL Server does. Azure addresses records by a dual key: the partition key and the row key. A partition key groups related records that should be stored together, and you usually query within a single partition at a time. The partition key need not be unique, but the row key must be unique within its partition.

How do you create a unique identifier in Azure or other cloud table storage systems? You have to think differently than in traditional relational databases. There are no foreign keys and no table joins, so why would you want a meaningless value like an auto-increment primary key? Your partition and row key values should be meaningful. You may even end up duplicating your records in multiple tables with different keys just so that they can be searched by different partition keys. Sometimes you need a meaningful column value that you can give someone as a reference number, perhaps an order number or a confirmation number. That is the scenario the code in this article targets.

Sequence Requirements

The required features for a cloud table unique identifier are:

It must be unique every time, obviously.

A role (web or worker) crash must not allow any reuse of identities. Skipping some due to a crash or role stop is OK.

It must not require identities to be inserted in the order they were generated, since HTTP requests to table storage may not complete in order of submission.

Ids can be short, memorable values: alphanumeric, just alpha, just numeric, or a custom set of characters. 36-character GUIDs are not memorable.

The sequence generator should not require synchronization to a central database server for each new identity value needed. You may have many nodes (web or worker) all attempting to insert records at the same time, and they should not block each other. Some synchronization is expected, but it should be minimal.

Each node (web or worker) should not require special configuration, so that you can deploy your cloud application and easily scale up and down at will.

Usage Options

You can define multiple instances of a sequence generator, such as one per repository with a singleton lifetime, or you can share a single sequence generator instance across all repositories. That decision depends on your dependency inversion architecture. Don’t create a sequence generator for each occasion you need an identity, as each abandoned instance wastes the rest of its reserved block. I generally create one sequence generator instance per role (web or worker) per repository type. I do this using ProviderResolver&lt;T&gt; found in a previous article; you can also use a dependency injection framework of your choice.

The first time an identity is needed for a sequence, a block is reserved from the IOptimisticSyncStore given to the generator. The SequenceGenerator will reserve a block of generated sequences and callers will get those until they are used up, then it will repeat by reserving another block. Here is how you use it.

private SequenceIdGenerator ConfirmationNumberGenerator { get { ... } }

public void SubmitOrder(Order order)
{
    // Pass in the name of the table the sequence is for. You can share a single
    // sequence name across many tables if you don't want the same Id duplicated
    // across them all.
    var generatedId = ConfirmationNumberGenerator.NextId(ConfirmTableProxy.TableName);
    order.ConfirmationNumber = generatedId;
    // TODO: persist the order...
}
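The reserve-a-block-then-hand-out-locally pattern described above can be modeled in a few lines. This is an illustrative Python sketch, not the Candor implementation: `InMemorySyncStore` stands in for an `IOptimisticSyncStore` backed by Azure table storage, using an ETag-style version check for optimistic concurrency.

```python
import threading

class InMemorySyncStore:
    """Stand-in for IOptimisticSyncStore: holds the last reserved value
    plus an ETag-like version, and only accepts writes carrying the
    current version (optimistic concurrency, as Azure storage does)."""
    def __init__(self):
        self._value, self._etag = 0, 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._etag

    def try_write(self, value, etag):
        with self._lock:
            if etag != self._etag:
                return False  # another node reserved a block first
            self._value, self._etag = value, self._etag + 1
            return True

class SequenceIdGenerator:
    """Reserves a block of ids from the store; callers draw from the
    cached block until it is exhausted, then another block is reserved.
    A crash simply abandons the rest of the block (ids are skipped,
    never reused)."""
    def __init__(self, store, block_size=100):
        self._store, self._block_size = store, block_size
        self._next, self._last = 0, -1  # empty local cache

    def next_id(self):
        while self._next > self._last:  # cache exhausted: reserve a block
            value, etag = self._store.read()
            if self._store.try_write(value + self._block_size, etag):
                self._next, self._last = value + 1, value + self._block_size
        nid = self._next
        self._next += 1
        return nid
```

Two generators sharing one store never hand out the same id, and each only touches the store once per block rather than once per id.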

You can create an instance of a sequence generator for a single sequence, or for all defined sequences. To define a sequence generator that works with a single sequence, you supply the definition of the sequence in-line. The in-line definition is only used if the Azure table data store does not already have the sequence schema defined. The sequence schema is stored separately from the last reserved value of the sequence, so re-writing the schema when multiple application role instances start does not hurt anything.

How is it different from Snowmaker?

This idea originated from Snowmaker by Tatham Oddie, whose idea in turn came from Josh Twist on MSDN; see the references below. The biggest difference between this implementation and the others is that this version supports alphanumeric identities, or identities over any other character set. Also, both of those implementations used Azure blob storage text files to store the last reserved identity per table, whereas this implementation uses Azure table storage. Both blob storage and table storage support optimistic concurrency using ETags. This implementation uses table storage so that it can also record schema information for each sequence.
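The core of alphanumeric support is incrementing a string over an arbitrary character set, carrying between positions like ordinary addition. Here is a rough Python sketch of that idea, assuming a case-insensitive base-36 alphabet; this illustrates the technique behind Candor's LexicalIncrement, not its actual code:

```python
def lexical_increment(value, alphabet="0123456789abcdefghijklmnopqrstuvwxyz"):
    """Increment a string id over an arbitrary character set, carrying
    from the rightmost position; grows by one character on overflow.
    Assumes every character of `value` appears in `alphabet` (input is
    lowercased for the default case-insensitive base-36 alphabet)."""
    digits = [alphabet.index(c) for c in value.lower()]
    i = len(digits) - 1
    while i >= 0:
        digits[i] += 1
        if digits[i] < len(alphabet):
            break                 # no carry needed, done
        digits[i] = 0             # carry into the next position left
        i -= 1
    else:
        digits.insert(0, 1)       # overflowed every position: "zz" -> "100"
    return "".join(alphabet[d] for d in digits)
```

Passing a different alphabet gives just-alpha, just-numeric, or custom character set sequences, matching the requirement above.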

13 Comments

By Norbert Kardos

Is it possible to include a decrementing sequence type in this solution?
Thanks, Norbert

By mlang74

It is possible. Here are a few options.

1) You can create a new implementation of IOptimisticSyncStore.
2) You can update or derive from CloudTableSequenceIdOptimisticSyncStore, adding a direction parameter.

Combine one of those options with a LexicalSubtract extension method that does roughly the opposite of LexicalAdd in a couple of key spots. I have not looked into this because I don’t see a need for it yet.

So why do you need this? Isn’t it better to start with a shorter Id string and grow over time than to start with a max-length string and worry about when you get down to ‘a’?

Do you just want values that sort in table storage with the latest values first? There is a simpler solution for that: use a reverse timestamp instead…
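The reverse-timestamp trick works because Azure table storage sorts row keys lexically: subtract the current ticks from DateTime.MaxValue.Ticks and zero-pad, so newer records get lexically smaller keys and sort first. A small Python sketch of this (the constant is .NET's DateTime.MaxValue.Ticks):

```python
from datetime import datetime, timezone

TICKS_PER_SECOND = 10_000_000          # a .NET tick is 100 nanoseconds
MAX_TICKS = 3_155_378_975_999_999_999  # DateTime.MaxValue.Ticks in .NET

def reverse_tick_row_key(dt):
    """Zero-padded (MaxValue.Ticks - ticks) so that lexical row-key
    ordering in table storage puts the newest records first."""
    # .NET ticks count 100ns intervals since 0001-01-01; compute with
    # integer arithmetic to avoid float precision loss
    delta = dt - datetime(1, 1, 1, tzinfo=timezone.utc)
    ticks = ((delta.days * 86_400 + delta.seconds) * TICKS_PER_SECOND
             + delta.microseconds * 10)
    return str(MAX_TICKS - ticks).zfill(19)
```

Because the key is fixed-width, lexical string ordering matches numeric ordering, which is what makes the scheme safe.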

By Norbert Kardos

Thanks for the explanation!
The reason was what you already guessed: I want the latest inserted records to come first in some storage tables.
First I wanted to use the ticks solution too, but I’m not sure it can be treated as truly unique (theoretically) if I get a high number of concurrent inserts (I’m writing a web API). Also, I don’t want to use 64 bits in all cases; sometimes 32 is enough, and a timestamp in ticks is an Int64.
A SQL Server 2012 sequence would be fine as well, but it’s not implemented in Azure SQL Database.
Also, you implemented radix-36 string sequences, which was my intention too, so it is almost perfect for me 😉

By mlang74

The ticks solution will at least get things in the correct order. You can make composite partition and row keys. I haven’t written about that yet, but plan to eventually.

Basically, add another unique value to the row key after the ticks… such as a user Id, company id, or whatever is applicable to your scenario.

I typically store records more than once. First with a short auto generated id value per this article. Then another record per search combination, one search usually is a sort by time which means it includes ticks at the start of the row key.

As for efficiently querying records based on composite partition key and row key values, I use candor.windowsazure TableProxy … method QueryPartition.

By Norbert Kardos

I’m aware of the solution of adding something extra to make it unique, but that will make my row key even bigger, and I’m expecting, let’s say, 50M rows a day (in some tables).
My goal is to keep it short and unique in all cases.
I would store some rows more than once, but not for sorting reasons; rather for “joining” reasons, to work around some queries that would be impossible on table storage.

By Norbert Kardos

Is it possible that this package breaks on Azure Storage 3.0.2.0?
I’m experiencing exceptions from a NextId call that I’ve never seen before.

By mlang74

You must have the Azure v3 SDK installed for the Azure 3.x NuGet packages to work, whether or not you are using the Candor Azure library. I don’t believe it has been released yet. Basically, your Azure emulator does not support the 3.x packages. Wait until you have the Azure 3.x SDK installed before you use those packages.

Once the azure SDK 3 is released, I will update the candor azure package.

By Girish

Hi, I’m pretty impressed with your code compared to SnowMaker. I was able to successfully implement SnowMaker, but your code is a little hard for me to understand. I tried to put every piece of the Candor code together, but some piece is still missing and I need help. Can I get your email ID for future reference and help?
I would be very glad.

By Girish Kalamati

Can you please provide more detail in this article? The doc mentions ConfirmTableProxy, but the details of its methods are inline and no documentation is available.

I am having trouble setting the connection string property; it says it is not able to parse the connection string because it is null, but I am passing in the connection name:
private SequenceIdGenerator _confirmationNumberGenerator;

private SequenceIdGenerator ConfirmationNumberGenerator
{ //lazy loaded property, need to wait for ConnectionName to be set before loading

at Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(String connectionString)
at Candor.WindowsAzure.Storage.Table.CloudTableProxy`1.GetTable()
at Candor.WindowsAzure.Storage.Table.CloudTableProxy`1.InsertOrUpdate(T item)
at Candor.WindowsAzure.Storage.Table.CloudTableSequenceIdOptimisticSyncStore.InsertOrUpdate(SequenceIdSchema sequence)
at Candor.Data.SequenceIdGenerator.RenewCachedIds(SequenceIdStore sequence)
at Candor.Data.SequenceIdGenerator.NextId(String tableName)
at SeqGenRole

By mlang74

Are you using this as a nuget package?

The code shown is to explain the differences from the Azure SDK. It isn’t detailed enough to copy only the code from the article directly into your project.

The source on github has been updated with fixes since the article was published. The nuget package is also updated.

I can give more usage examples once I know how you are using it. It seems you are either not passing in a connection string or not using what you pass in.

By johnhamm

I was inspired by your alphanumeric identifiers and implemented a pure JavaScript version of Snowmaker for Node.js and the latest 0.3.3 version of the azure-storage SDK.

Plus, large numbers are compressed into small strings using base62. For instance, if one of your table identifiers ever reached 68,289,801,377,242 then that ID would be stored as “johnhamm”!
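For reference, a base62 encoder using a digits-then-lowercase-then-uppercase alphabet (one common convention, and the one that reproduces the mapping above) can be sketched as follows; this is an illustration, not the actual Node.js package:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Compress a non-negative integer into a short base62 string,
    e.g. base62_encode(68289801377242) == 'johnhamm'."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)   # peel off the least significant base62 digit
        out.append(ALPHABET[r])
    return "".join(reversed(out))
```
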

By mlang74

I’m glad to have helped.

It seems you still have an integer behind the scenes and you just convert it to a base62-encoded string to shorten the length. Can it handle reaching the maximum value of your integer type? It’s not unrealistic to hit that limit in some use cases.

The Candor version has no realistic upper integral limit short of reaching the maximum-length string that can be used as a key, which would be totally insane.

I would warn against using mixed case when storing a memorable id as a key. Azure table storage is case sensitive, and users don’t expect a case sensitive identifier; they wonder why ‘a’ does not equal ‘A’. I generate all my alphanumeric keys using the case insensitive option of LexicalIncrement, so essentially a base-36 number rather than your base 62.

By johnhamm

It uses the JavaScript Number type (which has a max integer of 9,007,199,254,740,992, encoding to FfGNdXsE8).

I think I’ll make a caseSensitive property that will toggle base36 vs base62.