Paging SDB results in boto

Boto is a great tool for querying SDB: it manages paging for you automatically, so you don't have to keep asking for the next set of results. In a web application, however, you have to handle paging yourself, because iterating over a large result set in a single request will eventually time out your connections. To solve this, you can use the paging support built into boto.

Every time you query using "db.select" in boto, you get back a result set. Most people probably think of this as just an iterator, since it does all the magic behind the scenes and only queries when you start iterating. It also stores the "next_token" internally so it can ask SDB for the next page of results. Normally you wouldn't even notice this attribute, but if you're dealing with a service that needs to return in a short amount of time, it can be quite useful.

Additionally, there are two important keyword arguments you can pass to "select" on any domain: max_items and next_token. The max_items keyword tells boto to stop after it has yielded that number of results, instead of handling the paging automatically for you. It's also important to add a matching LIMIT clause to your query. Otherwise boto will return in the middle of a page of results and you will lose the rest of that page!

Ok, now to the code:

>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.get_domain("default")
>>> rs = db.select("SELECT * FROM `default` LIMIT 10", max_items=10)

Notice that we set "LIMIT" and "max_items" both to 10.
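Because the two values have to match, one option is to derive both from a single page size. This is only a minimal sketch: the helper name `paged_select_args` is hypothetical and not part of boto, and it assumes the same `default` domain used above.

```python
def paged_select_args(domain_name, page_size):
    """Build the query string and keyword arguments for one page.

    Hypothetical helper, not part of boto: it only guarantees that the
    LIMIT clause and max_items are derived from the same number.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    query = "SELECT * FROM `%s` LIMIT %d" % (domain_name, page_size)
    return query, {"max_items": page_size}


query, kwargs = paged_select_args("default", 10)
# With a real domain object this would be used as:
#   rs = db.select(query, **kwargs)
```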

Also note that "rs" is the result set of your select query, but the query only runs once you start iterating, so rs.next_token should be blank for now:

>>> rs.next_token
>>> for i in rs:
...     print(i)

Your first 10 results will print out. Now rs.next_token is set:

>>> rs.next_token
u'r........'

Now you can pass that next_token back to the SAME select. It must be the EXACT same query for next_token to work.
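A minimal sketch of that step, assuming `db` is the domain object from above and that "select" accepts the max_items and next_token keywords described earlier. The helper name `fetch_page` is hypothetical and not part of boto:

```python
def fetch_page(domain, query, page_size, next_token=None):
    """Return (items, next_token) for one page of a select query.

    `domain` is assumed to be a boto SDB domain (any object with a
    compatible select() method works). `query` must be identical on
    every call and should carry a LIMIT equal to `page_size`.
    """
    rs = domain.select(query, max_items=page_size, next_token=next_token)
    items = list(rs)  # the query only runs once we iterate
    # After iterating, rs.next_token is expected to point at the next
    # page, or to be None when there are no more results.
    return items, rs.next_token


# Example usage with the domain from above:
#   query = "SELECT * FROM `default` LIMIT 10"
#   items, token = fetch_page(db, query, 10)
#   more, token = fetch_page(db, query, 10, next_token=token)
```

In a web application, the returned token can be handed to the client along with the page of results and sent back on the next request, so each request only ever fetches a single page.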
