
Creating Sequences using SDB

Thanks to the new consistent reads and conditional puts in Amazon SimpleDB, we can now choose consistency over maximum scalability when we need it. Previously, if we wanted a database that was guaranteed to be immediately consistent, our only choices were to run our own SQL database or use RDS.

But why would you want to trade in performance and availability for consistency? It's quite simple, if you've ever tried to generate sequential numbers for any reason (typically because people don't like using random UUIDs), then you know the ONLY way to do this is by using a locking mechanism. Since SDB was previously only eventually consistent, this made it impossible to use such a database for that purpose.
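To make the locking argument concrete, here is a minimal in-memory sketch of the compare-and-set pattern that conditional puts make possible. InMemoryDomain, conditional_put, ConflictError, next_value and concurrent_demo are hypothetical names used only for illustration, not boto or SimpleDB APIs; a real implementation would send a conditional PutAttributes request to SDB instead of updating a Python dict.

```python
import threading


class ConflictError(Exception):
    """Raised when a conditional put finds an unexpected current value."""


class InMemoryDomain:
    """Hypothetical stand-in for an SDB domain that supports conditional puts."""

    def __init__(self):
        self._items = {}
        # Models the store's own atomicity; clients never hold this lock.
        self._lock = threading.Lock()

    def get(self, item_name):
        with self._lock:
            return self._items.get(item_name)

    def conditional_put(self, item_name, new_value, expected_value):
        """Write new_value only if the stored value still equals expected_value."""
        with self._lock:
            if self._items.get(item_name) != expected_value:
                raise ConflictError(item_name)
            self._items[item_name] = new_value


def next_value(domain, item_name):
    """Increment a counter with read + conditional put, retrying on conflict."""
    while True:
        current = domain.get(item_name)
        new = 0 if current is None else current + 1
        try:
            domain.conditional_put(item_name, new, expected_value=current)
            return new
        except ConflictError:
            continue  # another writer won the race; re-read and try again


def concurrent_demo(n_threads=4, per_thread=100):
    """Hand out numbers from several threads and return every value, sorted."""
    domain = InMemoryDomain()
    results = []
    results_lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            value = next_value(domain, "seq")
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

The retry loop is the important part: a writer that loses the race re-reads the current value and tries again instead of overwriting someone else's increment, so concurrent_demo() should hand out every number from 0 to 399 exactly once.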

Thanks largely to Mitch Garnaat's post about how to create Counters, I've been able to create a "Sequence" object for boto that lets you persist a sequence in SDB and use it reliably across multiple locations, threads, and processes. This new functionality is now in boto.

Using this new sequence object is relatively simple. First, if you already have a [DB] section in your boto.cfg, it's easy to set up a default domain for your sequences. The Sequence object looks first for a "sequence_db" key, and if that doesn't exist it falls back to "db_name", which the rest of the boto.sdb.db module uses as well. A sample config section would look like this:

[DB]
db_name = default
sequence_db = sequences
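The lookup order described above can be sketched with Python's standard configparser module. resolve_sequence_domain is a hypothetical helper, not boto's actual config code; it only illustrates the "sequence_db first, then db_name" fallback against a config section like the one shown.

```python
import configparser


def resolve_sequence_domain(config):
    """Hypothetical helper: pick the SDB domain for sequences from a [DB] section.

    Prefers "sequence_db" and falls back to "db_name"; returns None if neither
    is set (has_option also returns False when the [DB] section is missing).
    """
    if config.has_option("DB", "sequence_db"):
        return config.get("DB", "sequence_db")
    if config.has_option("DB", "db_name"):
        return config.get("DB", "db_name")
    return None


# Usage with the sample section from the post.
cfg = configparser.ConfigParser()
cfg.read_string("[DB]\ndb_name = default\nsequence_db = sequences\n")
domain = resolve_sequence_domain(cfg)  # "sequences"
```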

Next, it's time to fire up Python and start playing around.

>>> from boto.sdb.db.sequence import Sequence
>>> s = Sequence() # Note that we can pass in an optional name
>>> s.id # but if we don't it just uses a UUID
'1ce3eb7b-3fdd-4c60-b243-ec33019090bd'
>>> s.val # The value is set to the first value in our set
0
>>> s.next() # Let's get the next value in this set
1
>>> s2 = Sequence(s.id) # Let's load up this set in another object
>>> s2.val # The value should be the same, even if this was somewhere else
1
>>> s.next() # We increment our first object
2
>>> s2.val # And when we look at the second object it's also incremented
2
>>> s.delete()

So that's all fine and dandy if we're just using a simple sequence, but what if we want to do something more complicated, like a Fibonacci sequence? Lucky for us, this is built into our sequence module: just import fib from boto.sdb.db.sequence and create the sequence with Sequence(fnc=fib).

But what is this "fnc" argument, you ask? Quite simply, the sequence object lets you pass in a custom function that determines how to get the next value in the sequence. This function is passed both the current value and the previous value in the sequence. The Fibonacci function, which you could have written yourself, simply looks like this:

def fib(cv=1, lv=0):
    """The fibonacci sequence, this incrementer uses the
    last value"""
    if cv is None:
        cv = 1
    if lv is None:
        lv = 0
    return cv + lv

The important thing to remember here is that the function must return the first value in the sequence when both arguments are None. The first argument is the "current" value of the sequence, and the second is the "last" (previous) value, the one that came just before the current value.
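To see this (current, last) contract in action without touching SDB, the sketch below replays a sequence locally. It assumes, as described above, that the first value comes from calling the function with two None arguments and each later value from calling it with the current and previous values. simulate and powers_of_two are hypothetical names, and fib is repeated here so the block runs on its own.

```python
def fib(cv=1, lv=0):
    """The Fibonacci incrementer from the post."""
    if cv is None:
        cv = 1
    if lv is None:
        lv = 0
    return cv + lv


def powers_of_two(cv=None, lv=None):
    """Hypothetical custom incrementer: starts at 1 and doubles each step."""
    if cv is None:
        return 1  # first value when nothing has been stored yet
    return cv * 2


def simulate(fnc, steps):
    """Replay a sequence locally: first fnc(None, None), then fnc(current, last)."""
    current = fnc(None, None)
    last = None
    values = [current]
    for _ in range(steps):
        # The right-hand side is evaluated first, so `last` gets the old current.
        current, last = fnc(current, last), current
        values.append(current)
    return values
```

For example, simulate(fib, 5) yields [1, 1, 2, 3, 5, 8]: the second value is fib(1, None), which treats the missing previous value as 0.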

So this is great if you're dealing with integers, but what if you want to increment a string, a float, or any arbitrary sequence? Lucky for us, the cast type is determined automatically, so whatever types you put into your sequence are the types that come back out of it. For example, if you have a string sequence that you want to increment easily, you can use the "increment_string" function:

>>> from boto.sdb.db.sequence import increment_string
>>> s = Sequence(fnc=increment_string)
>>> s.val
'A'
>>> s.next()
'B'
>>> s.next()
'C'
>>> s.val = "Z"
>>> s.next()
'AA'
>>> s.delete()
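Because SimpleDB stores every attribute value as a string, the automatic casting mentioned above has to recover the type somehow. One plausible approach, sketched here as an assumption rather than boto's exact code, is to record the type of the sequence's first value (what the incrementer returns for None, None) and cast stored strings back to it. infer_item_type, to_storage, from_storage, count_up and half_steps are hypothetical names.

```python
def count_up(cv=None, lv=None):
    """Hypothetical integer incrementer starting at 0."""
    return 0 if cv is None else cv + 1


def half_steps(cv=None, lv=None):
    """Hypothetical float incrementer starting at 0.5."""
    return 0.5 if cv is None else cv + 0.5


def infer_item_type(fnc):
    """Assumed approach: the Python type of the sequence's first value."""
    return type(fnc(None, None))


def to_storage(value):
    """SimpleDB attribute values are strings, so serialize with str()."""
    return str(value)


def from_storage(raw, item_type):
    """Cast a stored string back to the sequence's original type."""
    return item_type(raw)
```

With this scheme, a sequence driven by half_steps would store "1.5" in SDB and hand back the float 1.5 when read.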

So what's the magic of this "increment_string" function? Let's take a look:

increment_string = SequenceGenerator("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

What's this SequenceGenerator stuff? Quite simply, you can pass in either a string or a list, and it will use that to determine what the next value in the sequence should be. You can also pass in an optional "rollover" argument; when it's True, the sequence wraps back around to its initial value instead of carrying over into a new character. So instead of going from "Z" to "AA", it would go back to "A":

>>> from boto.sdb.db.sequence import SequenceGenerator
>>> s = Sequence(fnc=SequenceGenerator("ABC", True))
>>> s.val
'A'
>>> s.next()
'B'
>>> s.next()
'C'
>>> s.next()
'A'
>>> s.delete()
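The SequenceGenerator behaviour in the two sessions above can be approximated in plain Python. This is a sketch under assumptions, not boto's implementation: make_generator is a hypothetical name, and it only handles single-character symbols. Without rollover, incrementing past the last symbol carries into the next position, like a spreadsheet column ("Z" to "AA"); with rollover, the final position wraps back to the first symbol.

```python
def make_generator(symbols, rollover=False):
    """Hypothetical re-creation of the SequenceGenerator behaviour.

    symbols: a string or list of single-character "digits", in order.
    Returns an incrementer with the (current, last) signature.
    """
    symbols = list(symbols)
    first, last = symbols[0], symbols[-1]

    def step(cv=None, lv=None):
        if not cv:
            return first  # None or "": start at the first symbol
        head, tail = cv[:-1], cv[-1]
        if tail != last:
            # Simple case: bump only the final character.
            return head + symbols[symbols.index(tail) + 1]
        if rollover:
            # Wrap the final character around instead of growing ("C" -> "A").
            return head + first
        # Carry: increment everything before the last character, then reset it.
        return step(head) + first

    return step


increment_letters = make_generator("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
```

For example, increment_letters("AZ") returns "BA", and make_generator("ABC", True)("C") returns "A", matching the rollover session.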

With this new SequenceGenerator and the Sequence object now available in boto.sdb.db, you should be able to keep just about any sort of sequence you can think of in SDB.
