
Amazon SimpleDB: A Simple Way to Store Complex Data

By Paul Tremblett, January 22, 2010

Simpler solutions are often better than their more complex counterparts

The presence of the last two letters in the name "Amazon SimpleDB" is perhaps unfortunate; it immediately evokes images of everything we have learned about databases. Unless, like me, you cut your teeth on a hierarchical database like IMS, that means relational databases and all of the baggage that comes with them: strictly defined fields, constraints, referential integrity, and having most of what you are allowed to do defined and controlled by a DBA -- hardly deserving of being described as simple. To allay any apprehensions that even thinking of such things might arouse, let me state that Amazon SimpleDB is not just another relational database. So just what is SimpleDB? The most effective way I have found to understand SimpleDB is to think about it in terms of something else we all use and understand -- a spreadsheet. Look at the spreadsheet in Figure 1.

Figure 1: Common spreadsheet.

Typically, you organize spreadsheets into worksheets. In the world of SimpleDB, the approximate counterpart of a worksheet is a "domain", which is why I've labeled the tabs at the bottom Domain1, Domain2, etc. instead of the more familiar Sheet1, Sheet2, etc. A worksheet contains a number of rows; a SimpleDB domain contains items. When you set up your spreadsheet, you usually create column headers whose names indicate the kind of data that appears in a given column. In SimpleDB, you would call the column headers attribute names.

But when you start putting data into individual cells, the similarity between a spreadsheet and SimpleDB ends. You can almost think of SimpleDB as a 3D spreadsheet, where every cell can contain multiple values. Each such value is expressed as a name-value pair called an "attribute". If you consider sets of attributes as tuples, you could describe SimpleDB as a "domain/item/attribute tuple space model."
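To make the "3D spreadsheet" picture concrete, here is a minimal in-memory sketch in Java. It is only an illustration of the domain/item/attribute shape described above, not the SimpleDB API: the class name DomainSketch and its put and get methods are hypothetical, and the item and attribute names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// In-memory illustration only; this is not the SimpleDB client API.
// One instance plays the role of one domain (one worksheet tab).
class DomainSketch {
    // item name (row) -> attribute name (column header) -> values in that cell
    private final Map<String, Map<String, Set<String>>> items = new LinkedHashMap<>();

    // Adds one value under the given attribute name; existing values are kept.
    void put(String itemName, String attributeName, String value) {
        items.computeIfAbsent(itemName, k -> new LinkedHashMap<>())
             .computeIfAbsent(attributeName, k -> new LinkedHashSet<>())
             .add(value);
    }

    // Returns every value stored in one "cell", or an empty set if there is none.
    Set<String> get(String itemName, String attributeName) {
        return items.getOrDefault(itemName, Map.of())
                    .getOrDefault(attributeName, Set.of());
    }

    public static void main(String[] args) {
        DomainSketch domain1 = new DomainSketch();
        domain1.put("item1", "color", "red");
        domain1.put("item1", "color", "blue");   // second value in the same cell
        domain1.put("item1", "size", "large");
        System.out.println(domain1.get("item1", "color")); // prints [red, blue]
    }
}
```

The nested maps stand in for the three levels: the object is the domain, the outer key is the item, and each attribute name maps to a set because a single cell may hold more than one value.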

Before rolling up your sleeves and getting started with SimpleDB, you will need a pay-as-you-go Amazon Web Services (AWS) account. When you create your account, you will be given an access ID and a secret key. You will need these to use the sample code I present in exploring SimpleDB.

SimpleDB Client Libraries

Interaction between a client and the SimpleDB engine takes the form of a web service. Application programmers prefer not to deal with the low-level details of web services but rather to use an API that shields them from the underlying infrastructure and reduces their effort to simply creating instances of Java objects and invoking their methods. A number of client libraries deliver such an API for SimpleDB in the major programming languages. The apparent winner in the Java world is Typica, which provides access to several other Amazon services in addition to SimpleDB. In this article, I use the Java Library for Amazon SimpleDB from Amazon. This does not imply any endorsement, nor should it be construed as a recommendation; it just happens to be the one I first started using. The right library for you will be the one that experimentation proves works best for you.

Creating and Listing Domains

CreateDomain.java (available here along with the source code and related files) takes three command line arguments. The first two are the AWS access ID and secret key respectively; the third is the name of the domain to be created. The essence of the program can be found in the following lines of code:

The code is so simple that a detailed explanation is unnecessary -- and the good news is that the rest of the code I present in this article is almost as easy to understand and use.
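For readers without the library at hand, the overall shape of such a program can be sketched as follows. This is not the CreateDomain.java listing and does not use the Amazon library: DomainService and InMemoryDomainService are hypothetical stand-ins that keep domain names in a local list, so the sketch runs without an AWS account. Only the three arguments (access ID, secret key, domain name) come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

class CreateDomainSketch {
    // Mirrors the three command-line arguments described in the article.
    static List<String> run(String accessId, String secretKey, String domainName) {
        DomainService service = new InMemoryDomainService(accessId, secretKey);
        service.createDomain(domainName);
        return service.listDomains();
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: CreateDomainSketch <accessId> <secretKey> <domainName>");
            return;
        }
        System.out.println(run(args[0], args[1], args[2]));
    }
}

// Hypothetical stand-in for a SimpleDB client; not the real library interface.
interface DomainService {
    void createDomain(String domainName);
    List<String> listDomains();
}

// In-memory fake: nothing here talks to AWS.
class InMemoryDomainService implements DomainService {
    private final List<String> domains = new ArrayList<>();

    InMemoryDomainService(String accessId, String secretKey) {
        // A real client would keep the credentials to sign each request.
        if (accessId.isEmpty() || secretKey.isEmpty()) {
            throw new IllegalArgumentException("credentials must not be empty");
        }
    }

    @Override
    public void createDomain(String domainName) {
        // Assumption of this sketch: creating an existing domain is a no-op.
        if (!domains.contains(domainName)) {
            domains.add(domainName);
        }
    }

    @Override
    public List<String> listDomains() {
        return new ArrayList<>(domains); // copy, so callers cannot mutate our state
    }
}
```

Hiding the service behind a small interface is a design choice of the sketch: the same calling code could later be pointed at a real client library without changing the argument handling.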

Once you have created the domain, you can use ListDomains.java (available here) to obtain a list of all of the domains that have been created for the account. The code is similar to the code we just saw in CreateDomain.java. It creates an instance of AmazonSimpleDB and passes it a ListDomainsRequest object as an argument. The list of domains is returned in a ListDomainsResponse object. To reinforce the fact that the response is really an XML message, I displayed the String that was returned when I invoked the ListDomainsResponse object's toXML() method. It looks like this:

Notice that the list of domains (in this case only one) is returned as a collection of <DomainName> nodes, each of which is a child node of the <ListDomainsResult> node. We retrieve the list of domains from the ListDomainsResponse object using the following code:
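The library's own accessor code is not reproduced here. As a library-independent sketch of the same idea, the <DomainName> nodes can also be pulled straight out of the XML with the DOM parser that ships with the JDK. The class name DomainNameExtractor is hypothetical, and the sample XML in main is hand-written to match the structure described above; it is not captured service output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomainNameExtractor {
    // Collects the text of every <DomainName> element in the response XML.
    static List<String> domainNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("DomainName");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (Exception e) {
            // Parser setup, I/O, and malformed XML all surface as one unchecked error.
            throw new IllegalArgumentException("could not parse response XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample shaped like the description above, not real output.
        String xml = "<ListDomainsResponse><ListDomainsResult>"
                + "<DomainName>Domain1</DomainName>"
                + "</ListDomainsResult></ListDomainsResponse>";
        System.out.println(domainNames(xml)); // prints [Domain1]
    }
}
```

In practice the client library does this unpacking for you; the sketch only shows that nothing more exotic than an XML document is coming back over the wire.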
