Introduction

Amazon SimpleDB is a web service for storing, maintaining, and querying structured data sets in real time. All data is stored in Amazon's web service cloud, making SimpleDB very reliable, scalable, and flexible.

After reading this article, Rails developers will be able to quickly integrate SimpleDB as a storage backend for their projects.

Why Rails?

The Rails web application framework, apart from generally being a wonderful tool, offers out-of-the-box support for web service-based data stores with its ActiveResource sub-framework. Only a very thin adapter layer is necessary to bridge the ActiveResource API to SimpleDB. Rails gives you the unique opportunity to utilize SimpleDB just as any other RESTful resource provided by a Rails application.

This article assumes a basic understanding of SimpleDB and Rails and is based on Rails 2.0.2 (the latest shipping version). If you want to dive into Rails, see the Resources section at the end of this article.

Behind The Scenes

For this tutorial, we are going to use a Rails Plugin called AWS SDB Proxy acting as an adapter layer.

AWS SDB Proxy is an HTTP server built with WEBrick (a pure Ruby web server implementation that comes with Ruby's standard library). The proxy will listen for web service calls initiated by ActiveResource models and forward the requests to SimpleDB using the aws-sdb gem by Tim Dysinger.

URL mapping

ActiveResource uses the standard Rails RESTful routes to access web services. The following table illustrates the mapping of Rails' HTTP actions and URIs to SimpleDB operations performed by AWS SDB Proxy:

HTTP/REST                                   SimpleDB
GET /domain/resource?attribute=value[&...]  QUERY by exact attribute values
GET /domain/resource/query?query_string     QUERY by SimpleDB query string
GET /domain/resource/itemID                 GET ATTRIBUTES
POST /domain/resource/itemID                PUT ATTRIBUTES (create item)
PUT /domain/resource/itemID                 PUT ATTRIBUTES (replace)
DELETE /domain/resource/itemID              DELETE ATTRIBUTES (delete item)
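To make the mapping concrete, here is a minimal sketch of a dispatcher that follows the table row by row. It is not the plugin's actual code: the method name sdb_operation and the example paths are made up for illustration, and the sketch only returns the name of the SimpleDB operation instead of calling the service.

```ruby
# Hypothetical dispatcher mirroring the mapping table (not the plugin's code).
def sdb_operation(http_method, path)
  # Separate "/domain/resource/itemID" from an optional "?query" part.
  path_part, _query = path.split("?", 2)
  _domain, _resource, item = path_part.split("/").reject(&:empty?)

  case http_method.upcase
  when "GET"
    if item.nil?          then "QUERY by exact attribute values"
    elsif item == "query" then "QUERY by SimpleDB query string"
    else                       "GET ATTRIBUTES"
    end
  when "POST"   then "PUT ATTRIBUTES (create item)"
  when "PUT"    then "PUT ATTRIBUTES (replace)"
  when "DELETE" then "DELETE ATTRIBUTES (delete item)"
  else raise ArgumentError, "unsupported HTTP method: #{http_method}"
  end
end

puts sdb_operation("GET", "/my_new_domain/posts/abc123") # GET ATTRIBUTES
```

Note that in this sketch the path segment "query" is reserved, so an item could not use it as its id.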

Special Attributes

AWS SDB Proxy handles a couple of special attributes transparently. Here is the complete list:

id: Every record automatically gets assigned a unique id using a hash function (see below)

_resource: This attribute will always hold the name of the Rails model the item belongs to. This way AWS SDB Proxy can store multiple Rails models within one SimpleDB domain

created_at: The time the item was initially created (ISO 8601 format)

updated_at: The time the item was last modified (ISO 8601 format)
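As an illustration of how these attributes could be attached to a record, consider the following sketch. The helper with_special_attributes and its parameters are hypothetical and not taken from the plugin; it only shows the bookkeeping described in the list, with created_at kept from an existing record on update.

```ruby
require "time" # provides Time#iso8601

# Hypothetical helper: merges the special attributes into a record's
# attribute hash. Pass the stored record as `existing` when updating.
def with_special_attributes(attributes, resource:, id:, existing: nil, now: Time.now.utc)
  stamp = now.iso8601
  attributes.merge(
    "id"         => id,
    "_resource"  => resource, # name of the Rails model
    "created_at" => existing ? existing["created_at"] : stamp,
    "updated_at" => stamp
  )
end

post = with_special_attributes({ "title" => "Hello" }, resource: "Post", id: "abc123")
puts post["_resource"] # Post
```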

ID Hashing

Record ids are generated by applying a SHA-512 hash function to the request body combined with a timestamp and a configurable salt (set in config/aws_sdb_proxy.yml). With 512 bits of output, key collisions are extremely unlikely.
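The idea can be sketched with Ruby's standard Digest library. The exact way the plugin concatenates its inputs is not documented here, so the order and the timestamp format below are assumptions, and generate_id is a made-up name.

```ruby
require "digest/sha2"

# Assumed recipe: request body, timestamp and salt are simply concatenated.
# The plugin may combine them differently.
def generate_id(request_body, salt, timestamp = Time.now)
  Digest::SHA512.hexdigest("#{request_body}#{timestamp.to_f}#{salt}")
end

id = generate_id("<post><title>Hello</title></post>", "my-secret-salt")
puts id.length # 128 hex characters, i.e. 512 bits
```

Because the timestamp is part of the input, posting the same body twice still yields different ids.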

Pros & Cons

Over at RubyForge, a couple of work-in-progress projects aim to provide an ActiveRecord adapter for SimpleDB. In theory, that would make SimpleDB a drop-in replacement for a regular SQL RDBMS in any Rails project. In practice, it would take a lot of jumping through hoops to make that happen: SimpleDB is simply not an RDBMS and is targeted at completely different usage patterns.

ActiveResource, on the other hand, was made precisely with the integration of remote web service data in mind. Despite its somewhat limited feature set compared to ActiveRecord, I prefer to pursue this straightforward approach.

Getting Up and Running

To set up Amazon SimpleDB for Rails, follow these steps (I assume you have already created a Rails project and have your command line pointed to its directory):

Enter your Amazon Web Service credentials in the config/aws_sdb_proxy.yml file (optionally configure server ports and an individual salt used to generate primary keys with a hash function). Do this at least for the development environment.

Either use an existing SimpleDB domain in your account (you can list your domains with rake aws_sdb:list_domains), or create a new one with rake aws_sdb:create_domain DOMAIN=my_new_domain

Start the AWS SDB Proxy server with rake aws_sdb:start_proxy_in_foreground, which prints debug output to stdout (once you are confident in the configuration, you can use rake aws_sdb:start_proxy to start the server as a background daemon)
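Before starting the proxy, it can help to check that the configuration from the first step is complete. The sketch below is an assumption about the file layout: the two credential key names match the server.rb line quoted in the comments at the end of this article, while the per-environment nesting and the port and salt key names are guesses. The missing_keys helper is made up for illustration.

```ruby
require "yaml"

# Assumed layout of config/aws_sdb_proxy.yml. "port" and "salt" are
# hypothetical key names; check the file shipped with the plugin.
SAMPLE_CONFIG = <<~CONFIG
  development:
    aws_access_key_id: YOUR_ACCESS_KEY_ID
    aws_secret_access_key: YOUR_SECRET_ACCESS_KEY
    port: 8888
    salt: change-me
CONFIG

REQUIRED_KEYS = %w[aws_access_key_id aws_secret_access_key].freeze

# Returns the required keys that are missing or blank for one environment.
def missing_keys(yaml_text, environment)
  settings = YAML.safe_load(yaml_text).fetch(environment, {}) || {}
  REQUIRED_KEYS.select { |key| settings[key].to_s.strip.empty? }
end

puts missing_keys(SAMPLE_CONFIG, "development").inspect # []
```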

Using ActiveResource

To make a Rails model access SimpleDB, it must inherit from ActiveResource::Base. For the following examples we will use a Post model that could represent blog posts and thus create the following models/post.rb file:
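A minimal sketch of such a model file is shown here. It relies on ActiveResource deriving its path prefix from the path part of the site URL, so that requests go to /my_new_domain/posts/...; the original listing may have configured host and prefix separately.

```ruby
# models/post.rb
class Post < ActiveResource::Base
  # Proxy host and port, followed by the SimpleDB domain as the path prefix.
  self.site = "http://localhost:8888/my_new_domain"
end
```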

It assumes that you run your AWS SDB Proxy on localhost at port 8888 and uses a SimpleDB domain named my_new_domain (adjust this according to the configuration you entered in config/aws_sdb_proxy.yml).

As you can see, a new Post object gets created as usual. The AWS SDB Proxy auto-assigns an id and the additional attributes created_at and updated_at mimicking Rails' standard behaviour.

Please note that we could have assigned any attributes we wanted to that Post; SimpleDB does not enforce a schema and thus the proxy will happily accept any attributes we throw at it.

Note: Remember that all attributes will be coerced into strings for storage in SimpleDB. No matter what your original data type was, you will always get back a string representation of it when fetching records from SimpleDB.
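In practice this means converting values back yourself after fetching a record. The following sketch uses a made-up attribute hash to show the conversion, and a hypothetical pad_number helper for a related pitfall: since SimpleDB compares values lexicographically, numbers should be zero-padded before storage if they are meant to compare in numeric order.

```ruby
require "time" # provides Time.iso8601

# Illustrative record as it might come back: every value is a string.
fetched = { "title" => "Hello", "views" => "42", "created_at" => "2008-02-01T12:00:00Z" }

views      = Integer(fetched["views"])           # back to an Integer
created_at = Time.iso8601(fetched["created_at"]) # back to a Time

# Hypothetical helper: zero-pad numbers so string order matches numeric order.
def pad_number(number, width = 10)
  number.to_s.rjust(width, "0")
end

puts "10" < "9"                     # true: plain string comparison
puts pad_number(10) < pad_number(9) # false: numeric order restored
```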

Updating Records

Let's assign another attribute to the Post and save it to trigger an update operation:

>> p.body = 'Content is king'
=> "Content is king"
>> p.save
=> true

If you started AWS SDB Proxy in the foreground, you can see it forward the save operation to SimpleDB.

The first form queries for a single Post with a given id, whereas the second form queries for records with exact matches on every given attribute (:first tells the find method to return only the first of those).

SimpleDB offers more sophisticated query operations than ActiveResource, including lexicographical comparisons, intersection and union. You can pass in native SimpleDB query syntax using this form of find:

Martin Rehfeld is passionate about Ruby on Rails. He has published several Rails plugins and regularly gives talks at Rails-related events. If you like Martin's work, consider recommending him on Working With Rails.

Comments

good service, bad libraries and incomplete documentation

the gem has tons of problems. furthermore with the described method, complex queries are not supported (such as OR queries - yes, you can call that complex with simpledb). all in all i would like to see more examples and an updated gem. i think amazon should have enough resources to do that.

It seems that the plugin does not work because it is using an old constructor for the aws_sdb library. If you go into
vendor/plugins/aws_sdb_proxy/lib/aws_sdb_proxy/server.rb
and replace
SDB_SERVICE = AwsSdb::Service.new(Logger.new(nil),CONFIG['aws_access_key_id'],CONFIG['aws_secret_access_key'])
with
SDB_SERVICE = AwsSdb::Service.new({:access_key_id=>CONFIG['aws_access_key_id'],:secret_access_key=>CONFIG['aws_secret_access_key']})
it seems to fix some problems

Interesting article - though the example provided (posts on a blog) begs the question "How do you handle collections?".
Suppose each post is categorized. And you want to maintain attributes about your categories? Do you then store the 'category_id' in the post record? And if so, is there an easier way to fetch all of the category data than looping over an array of category ids?
I really like the concept behind SimpleDB - but I'm not sure of the best approach for implementing even a modestly complex data structure with normalized data elements.
Am I completely missing something obvious?

So, using the proxy clearly is valuable in that it enables the use of REST ... for a client Ruby app and many other apps that can do REST as well. Bravo.
In terms of #s of AMIs needed in a scaleable architecture, I'll need to know how many requests per second such a proxy can handle in a normal EC2 AMI? As apps scale up, we'll need 1 proxy for every N app servers, readers will like to be able to estimate what that number N is.
Good clear article ... well written, easy to follow.
FHW