Introduction to Amazon SimpleDB

The best description of the Amazon SimpleDB service is given by Amazon itself here:

Amazon SimpleDB is a highly available and flexible non-relational data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest. Unbound by the strict requirements of a relational database, Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden. Behind the scenes, Amazon SimpleDB creates and manages multiple geographically distributed replicas of your data automatically to enable high availability and data durability. The service charges you only for the resources actually consumed in storing your data and serving your requests. You can change your data model on the fly, and data is automatically indexed for you. With Amazon SimpleDB, you can focus on application development without worrying about infrastructure provisioning, high availability, software maintenance, schema and index management, or performance tuning.

Amazon stresses that SimpleDB is optimized for high availability, thanks to distributed replicas of data, and for low administrative costs, as there are no schemas or indexes to manage. With SimpleDB, you don't have to define a schema for each database table before you can use it, and you don't have to change that schema before storing your data in a different way. Since there is no schema, there are no data types either: all data values are treated as variable-length character data. So, if you want to add a new field to an existing database, you just add the new field to the data items that require it, and nothing enforces that all data items contain the same fields. The drawback of having no schema is that you have to handle formatting and type conversions in your code, and this has a serious impact on queries.

The terminology used in SimpleDB is different from the usual DBMS one, so the following table shows how the different terms are related:

DBMS | Amazon SimpleDB
table | domain
row | item
column | attribute

The domain is analogous to a database table, but a domain has no schema, so the only parameter you can set is the name of the domain. All the data stored in a domain takes the form of name-value attribute pairs, and each attribute pair belongs to an item, which is similar to a table row. The attribute name is similar to a table column, but each item can have different attribute names, so you can store data with different layouts in the same domain, and add new fields to items without restructuring the table. A given attribute can also hold not just one value, but an array of values, so when you add an attribute name-value pair to an item, you can choose whether SimpleDB should replace the existing values stored under that attribute name, or add the new value to them.

Queries on SimpleDB are mostly done with a key-value approach, where you retrieve an item and its attributes from the item's name. But you can also use a SQL-style query language to issue queries over the scope of a single domain. When designing queries, you should be aware that all the data stored in SimpleDB is treated as plain string data (be careful with numerical data!), and that all values are automatically indexed.
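Because every value is compared as a string, numbers only sort and compare correctly if they are stored in a fixed-width form. The sketch below illustrates one common workaround, zero-padding with an optional offset for negative values. It is written in Python purely to show the idea and does not touch SimpleDB itself; the function names, the width of 10 digits and the offset are illustrative assumptions, not part of any SimpleDB API.

```python
# Minimal sketch: encode integers so that string order matches numeric order.
# encode_int, decode_int, the width and the offset are hypothetical choices.

def encode_int(value: int, width: int = 10, offset: int = 0) -> str:
    """Shift the value by `offset` and zero-pad it to a fixed width."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("value + offset must be non-negative")
    text = str(shifted)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def decode_int(text: str, offset: int = 0) -> int:
    """Reverse encode_int: parse the digits and remove the offset."""
    return int(text) - offset


# Plain decimal strings are ordered character by character...
raw = sorted(["9", "10", "200"])
# ...while padded strings are ordered like the numbers they represent.
padded = sorted(encode_int(n) for n in (9, 10, 200))

print(raw)
print(padded)
```

The same encoding has to be applied to the constants used in a query, otherwise a comparison such as "greater than 10" is evaluated against differently formatted strings.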

Before deciding to use SimpleDB in your next project, you should consider the following service limits:

max storage per domain: 10 GB

max attribute values per domain: 1 billion

max domains per account: 250

max attribute values per item: 256

max length of item name, attribute name, or value: 1 KB

max query execution time: 5 sec

max query results: 2500

max query response size: 1 MB

max comparisons per query: 20

The SimpleDB API is really simple, as it contains only the basic functionalities for working on a domain:

CreateDomain — Create a domain that contains your dataset.

DeleteDomain — Delete a domain.

ListDomains — List all domains.

DomainMetadata — Retrieve information about creation time for the domain, storage information both as counts of item names and attributes, as well as total size in bytes.

PutAttributes — Add or update an item and its attributes, or add attribute-value pairs to items that exist already. Items are automatically indexed as they are received.

BatchPutAttributes — For greater overall throughput of bulk writes, perform up to 25 PutAttributes operations in a single call.

DeleteAttributes — Delete an item, an attribute, or an attribute value.

BatchDeleteAttributes — For greater overall throughput of bulk deletes, perform up to 25 DeleteAttributes operations in a single call.

GetAttributes — Retrieve an item and all or a subset of its attributes and values.

Select — Query the data set in the familiar, “select target from domain_name where query_expression” syntax. Supported value tests are: =, !=, <, >, <=, >=, like, not like, between, is null, is not null, and every(). Example: select * from mydomain where every(keyword) = 'Book'. Order results using the SORT operator, and count items that meet the condition(s) specified by the predicate(s) in a query using the Count operator.

SimpleDB and Delphi

Browsing the SimpleDB site, you will notice that there are no frameworks or code samples for using SimpleDB from Delphi code. But Delphi XE2 already has a framework for accessing some Amazon AWS services, including SimpleDB: the Data.Cloud.CloudAPI and Data.Cloud.AmazonAPI units contain a set of classes that greatly simplify the usage of these web services.

Log in to the AWS portal, click on Security Credentials, and copy the Access Key ID and the Secret Access Key. You will need them later on to access SimpleDB.

Download the compiled OpenSSL library, and decompress the ZIP file into the folder where you will build the executable file. To run the given SimpleDB demo, you don't need to download OpenSSL, as it is already included in the ZIP file of the demo app.

After completing the preliminary steps, we are ready to start coding in Delphi. Start Delphi XE2, and let's build a sample app just like the one shown below:

First of all, drop a TAmazonConnectionInfo on the form, as this object will contain the parameters used to connect to the SimpleDB service. Then we add a Connect button, which opens a connection to the AWS services:

The only data required to open a connection to the SimpleDB service is the Access Key ID and the Secret Access Key that you previously copied from the AWS portal, so you copy this data into the AmazonConnectionInfo object, and then create an instance of the TAmazonTableService passing the AmazonConnectionInfo as a parameter. Please note that in this demo we will create an instance of the TAmazonTableServiceEx class, which is a derived class that extends the functionalities of the stock Delphi XE2 class, and it is included in this demo app.

Once the connection to SimpleDB is established, we can list the domains on our account with the RefreshDomainsList procedure:

The QueryTables method returns the names of domains in your SimpleDB profile.

The ListDomains operation lists all domains associated with the Access Key ID. It returns domain names up to the limit set by MaxNumberOfDomains. A NextToken is returned if there are more than MaxNumberOfDomains domains. Calling ListDomains successive times with the NextToken returns up to MaxNumberOfDomains more domain names each time.
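The NextToken mechanism described above boils down to a simple loop: keep calling the operation, passing back the token from the previous page, until no token is returned. The following Python sketch shows that loop against a stand-in service. Both iter_domains and make_fake_service are hypothetical helpers written for this illustration (the fake uses a list index as its token); they are not part of the Delphi classes or of the SimpleDB API.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: the domain names plus the token for the next page.
Page = Tuple[List[str], Optional[str]]


def iter_domains(list_domains: Callable[[int, Optional[str]], Page],
                 max_domains: int = 100) -> Iterator[str]:
    """Yield every domain name, following the token until it runs out."""
    next_token: Optional[str] = None
    while True:
        names, next_token = list_domains(max_domains, next_token)
        yield from names
        if next_token is None:
            break


def make_fake_service(all_names: List[str]) -> Callable[[int, Optional[str]], Page]:
    """Build an in-memory stand-in that pages through all_names."""
    def list_domains(max_domains: int, next_token: Optional[str]) -> Page:
        start = int(next_token) if next_token is not None else 0
        end = start + max_domains
        token = str(end) if end < len(all_names) else None
        return all_names[start:end], token
    return list_domains


fake = make_fake_service([f"domain{i}" for i in range(7)])
for name in iter_domains(fake, max_domains=3):
    print(name)
```

In real code the token should be treated as an opaque string and passed back unchanged.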

In every code sample of this demo, you will see that an instance of TCloudResponseInfo is created before calling the SimpleDB functions, and it is logged and freed at the end of the procedure. It is extremely important to check the result of operations, as many problems may arise when working with a remote DB (e.g. dropped connections), and it is also really useful during development to pinpoint errors as quickly as possible. In this demo, the result of the SimpleDB action is only added to the log at the bottom of the screen, but production-ready code should be designed so that it can recover from failed calls.

Please note that domain names do not need to be unique across accounts (unlike S3 bucket names, which must be globally unique), so you are not limited to names not yet taken by other SimpleDB users. Also consider that the minimum length of a domain name is three characters and the maximum length is 255 characters, and that the only valid characters are letters, numbers, '_', '-', and '.'. If you call CreateTable with the name of an already-existing domain, CreateTable will do nothing and will not return any error.
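Since the naming rules are easy to get wrong, it can help to check a name locally before sending the request. The following Python sketch encodes the rules stated above as a regular expression. The helper name is hypothetical, and "letters" is assumed here to mean the ASCII letters a-z and A-Z.

```python
import re

# 3 to 255 characters, drawn from ASCII letters, digits, '_', '-' and '.'.
# Restricting "letters" to ASCII is an assumption of this sketch.
_DOMAIN_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def is_valid_domain_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules."""
    return _DOMAIN_NAME.fullmatch(name) is not None


for candidate in ("ab", "my-domain_1.test", "bad name"):
    print(candidate, is_valid_domain_name(candidate))
```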

The CreateDomain operation creates a new domain. The domain name must be unique among the domains associated with the Access Key ID provided in the request. The CreateDomain operation might take 10 or more seconds to complete.

The inverse of creating a table is deleting an existing table, which erases all the items contained in the domain:

If you try to delete a domain that does not exist, DeleteTable will do nothing and no error will be returned.

Before adding CreateTable and DeleteTable to automated unit tests, please consider that these two functions are by far the most expensive, as they take a relatively long time to complete. Calling them in the setup and tear-down phases of unit tests will raise box usage charges.

The DeleteDomain operation deletes a domain. Any items (and their attributes) in the domain are deleted as well. The DeleteDomain operation might take 10 or more seconds to complete.

Now if we click on a domain name, we want to list the properties of the domain and the items contained in the domain.

However, should you try to compile this code, Delphi would stop with an error, complaining that GetAttributes is not a method of the TAmazonTableService class. In fact, this method is defined in the TAmazonTableServiceEx class, contained in this project, which extends the base Delphi XE2 class with 3 new methods:

same as SelectRowsXML, but returns the items matching the query as TCloudTableRow objects

While SelectRows is just an easier way to use the standard SelectRowsXML, both GetAttributes methods are essential for easily getting the attributes' values (you can get the same data by running a query on the item's name, but why run a query when there is ready-to-use functionality designed to do just that?). So, to use these 3 additional methods, add the Data.Cloud.AmazonAPIEx unit to your project, and create an instance of TAmazonTableServiceEx instead of TAmazonTableService.

GetAttributes returns all of the attributes associated with the item. Optionally, the attributes returned can be limited to one or more specified attribute name parameters. Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated, all copies of the data are updated. However, it takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change. If eventually consistent reads are not acceptable for your application, use ConsistentRead. Although this operation might take longer than a standard read, it always returns the last updated value.

Now that we know how to list the items in a domain, and how to retrieve the attributes of a given item, we can move on and modify the attributes of an item. The first action on the list of attributes is adding a new attribute:

To add a new attribute, or modify an existing one, you create an instance of TCloudTableRow (named NewRow in this code sample), then you add attribute name and value pairs, specifying through the ReplaceData parameter whether the new pair should replace an existing pair with the same name, or whether the new value should be added to the existing list of values for that attribute name. Once you have finished adding attribute name and value pairs to the row, you add it to the domain with the InsertRow method.

The PutAttributes operation creates or replaces attributes in an item. You specify new attributes using a combination of the Attribute.X.Name and Attribute.X.Value parameters. You specify the first attribute by the parameters Attribute.1.Name and Attribute.1.Value, the second attribute by the parameters Attribute.2.Name and Attribute.2.Value, and so on. Attributes are uniquely identified in an item by their name/value combination. For example, a single item can have the attributes { "first_name", "first_value" } and { "first_name", "second_value" }. However, it cannot have two attribute instances where both the Attribute.X.Name and Attribute.X.Value are the same. Optionally, the requester can supply the Replace parameter for each individual attribute. Setting this value to true causes the new attribute value to replace the existing attribute value(s). For example, if an item has the attributes { 'a', '1' }, { 'b', '2'} and { 'b', '3' } and the requester calls PutAttributes using the attributes { 'b', '4' } with the Replace parameter set to true, the final attributes of the item are changed to { 'a', '1' } and { 'b', '4' }, which replaces the previous values of the 'b' attribute with the new value. Conditional updates are useful for ensuring multiple processes do not overwrite each other. To prevent this from occurring, you can specify the expected attribute name and value. If they match, Amazon SimpleDB performs the update. Otherwise, the update does not occur.
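The Replace and conditional-update rules are easier to follow when traced on a small in-memory model. The Python sketch below is only a simplified simulation written for this purpose: an item is modelled as a dictionary of value sets, put_attributes is a hypothetical function rather than a real client call, and a failed condition is reported by returning False instead of raising a service error. It reproduces the { 'a', '1' }, { 'b', '2' }, { 'b', '3' } example from the text.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# An item maps each attribute name to its set of values
# (name/value combinations are unique within an item).
Item = Dict[str, Set[str]]


def put_attributes(item: Item,
                   attributes: Iterable[Tuple[str, str, bool]],
                   expected: Optional[Tuple[str, str]] = None) -> bool:
    """Apply (name, value, replace) triples to the item in place.

    If `expected` is given, the update only happens when the item
    currently holds that name/value pair. Simplified model only.
    """
    if expected is not None:
        exp_name, exp_value = expected
        if exp_value not in item.get(exp_name, set()):
            return False  # condition failed, nothing is written
    triples = list(attributes)
    # First drop the old values of every attribute flagged with replace...
    for name, _value, replace in triples:
        if replace:
            item.pop(name, None)
    # ...then add all the new values.
    for name, value, _replace in triples:
        item.setdefault(name, set()).add(value)
    return True


item: Item = {"a": {"1"}, "b": {"2", "3"}}
put_attributes(item, [("b", "4", True)])   # replaces both old values of 'b'
print(item)
put_attributes(item, [("b", "5", False)])  # adds a second value to 'b'
print(item)
```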

The inverse of adding an attribute to an item is removing an attribute from an item:

Obviously, no DB would be useful if you could not run queries on it, so it is time to see how to run queries on SimpleDB domains. First of all, please note that SimpleDB is not a relational database, so you can only run queries on a single domain at a time. Also, the following code sample uses the SelectRows method introduced by TAmazonTableServiceEx, as it does the XML parsing, so remember to add the Data.Cloud.AmazonAPIEx unit to your project. For a detailed explanation of the SQL-like syntax of SimpleDB, please refer to the Amazon SimpleDB Developer Guide.

The Select operation returns a set of Attributes for ItemNames that match the select expression. Select is similar to the standard SQL SELECT statement. Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated, all copies of the data are updated. However, it takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change. If eventually consistent reads are not acceptable for your application, use ConsistentRead. Although this operation might take longer than a standard read, it always returns the last updated value. The total size of the response cannot exceed 1 MB. Amazon SimpleDB automatically adjusts the number of items returned per page to enforce this limit. For example, even if you ask to retrieve 2500 items, but each individual item is 10 KB in size, the system returns 100 items and an appropriate next token so you can get the next page of results. For information on how to construct select expressions, see Using Select to Create Amazon SimpleDB Queries.

The specifications of the XML data format returned by Select are here. The code looks for Items, extracts the name of each item and then looks for the Attribute name and value pairs, and adds each of them as columns of the resulting row. The result is a list of TCloudTableRow.
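The parsing steps just described (find each Item, read its Name, then collect the Attribute name and value pairs) can be sketched in a few lines. The Python example below is an illustration of that logic, not the Delphi implementation shipped with the demo. The sample XML is hand-written to resemble a Select response and is an assumption, as are the helper names; namespaces are stripped so the code does not depend on the exact API version.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hand-written sample resembling a Select response (assumed layout).
SAMPLE = """<SelectResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
  <SelectResult>
    <Item>
      <Name>item1</Name>
      <Attribute><Name>color</Name><Value>red</Value></Attribute>
      <Attribute><Name>color</Name><Value>blue</Value></Attribute>
      <Attribute><Name>size</Name><Value>0042</Value></Attribute>
    </Item>
    <Item>
      <Name>item2</Name>
      <Attribute><Name>color</Name><Value>green</Value></Attribute>
    </Item>
  </SelectResult>
</SelectResponse>"""

Row = Tuple[str, Dict[str, List[str]]]


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def parse_select(xml_text: str) -> List[Row]:
    """Return (item name, {attribute name: [values]}) for each Item."""
    rows: List[Row] = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "Item":
            continue
        name = ""
        columns: Dict[str, List[str]] = {}
        for child in elem:
            tag = local(child.tag)
            if tag == "Name":
                name = child.text or ""
            elif tag == "Attribute":
                fields = {local(f.tag): (f.text or "") for f in child}
                columns.setdefault(fields.get("Name", ""), []).append(
                    fields.get("Value", ""))
        rows.append((name, columns))
    return rows


for item_name, attrs in parse_select(SAMPLE):
    print(item_name, attrs)
```

Keeping a list of values per attribute name preserves multi-valued attributes, which a flat name-to-value mapping would silently overwrite.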

The GetAttributesXML method adds support for the GetAttributes request:
