
Recently, I worked on an IoT-based project where we had to store time-series data coming from multiple sensors and show real-time data to an end user. Further, we needed to generate reports and analyze the gathered information.

To deal with the continuous data flowing from the sensors, we chose DynamoDB for storage. DynamoDB promises to handle large data with single-digit millisecond latency, at any scale. Since it’s a fully managed database service, we never had to worry about scaling, architecture, and hardware provisioning. Plus, we were using AWS IoT for sensors, so choosing a NoSQL database from AWS services like DynamoDB was the right decision.

Note that this is the first in a three-part series about DynamoDB. The next part, Querying and Pagination with DynamoDB, explains different ways of querying and the importance of choosing the right indexes and pagination. The last part, Partitioning Behavior of DynamoDB, digs deeper into DynamoDB with detailed analysis of partitioning behavior and strategies to cope with common problems such as the Hot Key Problem.

For this article, the first in the series, we’ll focus on architecture and basic concepts to get you started with DynamoDB.

The Basics

DynamoDB tables are made of items, which are similar to rows in relational databases, and each item can have more than one attribute. These attributes can be either scalar types or nested.

Everything starts with the Primary Key Index

Each item has one attribute as the Primary Key Index that uniquely identifies it. The Primary Key is made of two parts: the Partition Key (Hash Key) and an optional Sort Key (Range Key). DynamoDB doesn't magically spread the data across multiple servers to boost performance; it relies on partitioning to achieve that.

Partitioning is similar to the concept of sharding seen in MongoDB and other distributed databases, where data is spread across different database servers to distribute load and give consistently high performance. Think of partitions as shards: the Hash Key specified in the Primary Key determines the partition in which an item will be stored.

To determine the partition in which an item will be stored, the Hash Key is passed to a special hash function that spreads items evenly across all partitions, which is why it is called a Partition Key or Hash Key. The Sort Key, on the other hand, determines the order in which items are stored and allows DynamoDB to have more than one item with the same Hash Key.

The Sort Key, when present, combines with the Partition Key (Hash Key) to form the Primary Key Index, which uniquely identifies a particular item.

This is very useful for time-series data such as stock prices, where the price of a stock changes over time and you need to track those changes per stock. In such cases, the stock name can be the Partition Key and the date can be the Range Key, sorting the data chronologically.

Secondary indexes

Primary indexes identify items and allow us to store very large amounts of data without having to worry about performance or scaling, but you will soon realize that querying by anything other than the Primary Key becomes extremely difficult and inefficient.

Having worked mostly with relational databases, I found querying the most confusing aspect of DynamoDB. You don't have joins or views as in relational databases; denormalization helps, but only so much.

Secondary indexes in DynamoDB follow the same structure as the Primary Key Index, where one part is the Partition Key and the second part is the Sort Key, which is optional. Two types of secondary indexes are supported by DynamoDB: the Local Secondary Index and the Global Secondary Index.

Local Secondary Index (LSI):

The Local Secondary Index is a data structure that shares the Partition Key defined in the Primary Index, and allows you to define the Sort Key with an attribute other than the one defined in the Primary Index. The Sort Key attribute must be of scalar type.

While creating the LSI, you define attributes to be projected other than the Partition Key and the Sort Key, and the LSI maintains the projected attributes along with the Partition Key and Sort Key. The LSI data and the table data for each item are stored inside the same partition.

Global Secondary Index (GSI):

Often you will need to query data by an attribute other than the Partition Key. You can achieve this by creating a Global Secondary Index for that attribute. A GSI follows the same structure as the Primary Key, but it has a different Partition Key than the Primary Index and can optionally have one Sort Key.

Similar to the LSI, attributes to be projected need to be specified while creating the GSI. Both the Partition Key attribute and Sort Key attribute need to be scalar.

You should definitely look up the official documentation for the GSI and LSI to understand how indexes work.


Setup for DynamoDB

DynamoDB doesn't require special setup, as it is a web service fully managed by AWS; you just need API credentials to start working with it. There are two primary ways to interact with DynamoDB from Ruby: the AWS SDK for Ruby and Dynamoid.

Both libraries are quite good, and Dynamoid offers an Active Record-like interface. But to get an overview of how DynamoDB works, it's better to start with the AWS SDK for Ruby.

In your Gemfile:

gem 'aws-sdk', '~> 2'

First of all, you need to initialize a DynamoDB client, preferably via an initializer so as to avoid instantiating a new client for every request you make to DynamoDB.

AWS provides a downloadable version of DynamoDB, ‘DynamoDB local’, which can be used for development and testing. First, download the local version and follow the steps specified in the documentation to set up and run it on a local machine.

To use it, just specify an endpoint in the DynamoDB client initialization hash as shown below:
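A minimal initializer might look like the following sketch; the file path and region are assumptions, and DynamoDB Local ignores the credential values:

```ruby
# config/initializers/dynamodb.rb (hypothetical path)
require 'aws-sdk'

DYNAMODB_CLIENT = Aws::DynamoDB::Client.new(
  region: 'us-east-1',              # any region works against DynamoDB Local
  access_key_id: 'fake',            # DynamoDB Local does not validate credentials
  secret_access_key: 'fake',
  endpoint: 'http://localhost:8000' # point the SDK at the local instance
)
```

Dropping the endpoint key (and supplying real credentials) makes the same client talk to the hosted DynamoDB service instead.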

Put it all together with this simple example

Suppose you need to store the information of users along with shipping addresses for an ecommerce website. The users table will hold information such as first_name, last_name, email, profile_pictures, authentication_tokens, addresses, and much more.

In relational databases, a users table would hold the scalar fields such as first_name, last_name, and email, while addresses and authentication tokens would be placed in separate tables that reference the id of a user as a Foreign Key.

In DynamoDB, there is no concept of Foreign Keys and no joins. There are ways to reference data related to an item from another table as we do in relational databases, but they're not efficient. A better way is to denormalize the data into a single users table. As DynamoDB is a key-value store, each item in the users table would look as shown below:
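As a sketch, a denormalized user item could be a plain hash like this; the attribute names come from the list above, and the sample values are illustrative:

```ruby
# A denormalized user item: addresses and authentication tokens live inside
# the item itself rather than in separate tables (sample values are made up).
user_item = {
  'email'      => 'jane@example.com',  # Partition Key
  'first_name' => 'Jane',
  'last_name'  => 'Doe',
  'created_at' => '2017-01-15',
  'addresses'  => [
    { 'street' => '12 Main St', 'city' => 'Pune', 'zip' => '411001' }
  ],
  'authentication_tokens' => ['token-abc123']
}
```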

Make email the Partition Key of the Primary Key and leave the Range Key out, as each user will have a unique email id and we will definitely need to look up a user by a particular email id.

In the future, you might need to search users with first_name or last_name. This requirement makes first_name and last_name ideal candidates for the Range Key. Additionally, you may want to get users registered on a particular date or updated on a particular date, which can be found with created_at and updated_at fields, making them ideal for the Partition Key in the Global Secondary Index.

For now, we will create one Global Secondary Index (GSI), where created_at will be the Partition Key and first_name will be the Range Key, allowing you to run queries like:
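A query against that GSI might be built like this; the index name is an assumption, and the date and name values are examples:

```ruby
# Fetch users created on a given date whose first name sorts at or after 'A',
# via the hypothetical GSI (index name assumed).
query_params = {
  table_name: 'users',
  index_name: 'created_at_first_name_index',  # assumed GSI name
  key_condition_expression: 'created_at = :date AND first_name >= :name',
  expression_attribute_values: {
    ':date' => '2017-01-15',
    ':name' => 'A'
  }
}
# resp = DYNAMODB_CLIENT.query(query_params)
# resp.items would hold the matching users
```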

ActiveModel::Model adds callbacks, validations, and an initializer. The main purpose for adding it is to initialize the User model with parameters hash like Active Record does.

ActiveModel::Serialization provides serialization helper methods such as to_json, as_json, and serializable_hash for objects of the User class. After adding these two modules, you can specify the attributes related to the User model with the attr_accessor method. At this point, the User model looks like this:

You can create User objects with User.new, pass parameters hash, serialize, and deserialize it, but you cannot persist them in DynamoDB. To be able to persist the data, you will need to create a table in DynamoDB and allow the model to know about and access that table.

I prefer to create a migrate_table! class method holding the logic required for table creation. If the table already exists, it is recreated, and the application should wait until the table is created, as table creation on DynamoDB can take a few minutes.
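As a sketch, the create_table params backing such a migrate_table! might look like this; the index name, attribute types, and throughput figures are assumptions consistent with the GSI described earlier:

```ruby
# Table definition for the users table plus the created_at/first_name GSI
# (all names and capacity values are illustrative).
create_table_params = {
  table_name: 'users',
  key_schema: [
    { attribute_name: 'email', key_type: 'HASH' }  # Partition Key only
  ],
  attribute_definitions: [
    { attribute_name: 'email',      attribute_type: 'S' },
    { attribute_name: 'created_at', attribute_type: 'S' },
    { attribute_name: 'first_name', attribute_type: 'S' }
  ],
  provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 },
  global_secondary_indexes: [
    {
      index_name: 'created_at_first_name_index',  # assumed GSI name
      key_schema: [
        { attribute_name: 'created_at', key_type: 'HASH' },
        { attribute_name: 'first_name', key_type: 'RANGE' }
      ],
      projection: { projection_type: 'ALL' },
      provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 }
    }
  ]
}

# Inside migrate_table!, one would delete any existing table, then roughly:
#   client.create_table(create_table_params)
#   client.wait_until(:table_exists, table_name: 'users')
```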

The instance method save simply saves an item and returns true or false depending on the response. The instance_values method returns a hash of all the attr_accessor fields, which is passed to the item key as item_hash.

The return_values option inside the put_item request determines whether you want the saved item returned. Since we are only interested in whether the item was saved successfully, 'NONE' is passed.
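The save logic can be sketched as below; the client is injected as a parameter and instance_values is replaced with an explicit hash so the sketch has no Rails dependency (attribute names are from the running example):

```ruby
# Self-contained sketch of User#save. In the real model, item_hash would come
# from instance_values and the client from the initializer.
class User
  attr_accessor :email, :first_name

  def initialize(attributes = {})
    attributes.each { |name, value| public_send("#{name}=", value) }
  end

  # Returns true when the item was written, false when the request failed.
  def save(client)
    item_hash = { 'email' => email, 'first_name' => first_name }
    client.put_item(
      table_name: 'users',
      item: item_hash,
      return_values: 'NONE'  # we only need to know whether the write succeeded
    )
    true
  rescue StandardError
    false
  end
end
```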

Reading an item

Getting an item from DynamoDB by its Primary Key is similar to the way Active Record finds records by id in relational databases.

The #get_item method fetches a single item for a given Primary Key. If no item is found, the item element of the response is nil.
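The request params for such a lookup might look like this; the table and attribute names are from the running example:

```ruby
# The full Primary Key must be supplied to #get_item.
get_params = {
  table_name: 'users',
  key: { 'email' => 'jane@example.com' }
}
# resp = client.get_item(get_params)
# resp.item  # a hash of the item's attributes, or nil when nothing matches
```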

Updating an item

An item is updated with the #update_item method, which behaves like PostgreSQL's upsert (update or insert): it updates an item with the given attributes, and if no item with that Primary Key exists, a new item is created. This might sound similar to how #put_item works, but the difference is that #put_item replaces an existing item entirely, whereas #update_item updates an existing item's attributes.

While updating an item, you need to specify the Primary Key of that item and whether you want to replace or add new values to an existing attribute.

Look closely at how item_hash is formed. The attribute hash is passed to the update method as usual and is then processed to add two fields, value and action, in place of simple key => value pairs. This transforms a given simple hash into values compatible with the attribute_updates key.
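That transformation can be sketched as a small helper; the method name is hypothetical, and the 'PUT' action replaces the attribute's stored value:

```ruby
# Turn a plain attribute hash into the attribute_updates shape expected by
# #update_item.
def to_attribute_updates(attributes)
  attributes.each_with_object({}) do |(name, value), updates|
    updates[name] = { value: value, action: 'PUT' }
  end
end

update_params = {
  table_name: 'users',
  key: { 'email' => 'jane@example.com' },  # Primary Key of the item to update
  attribute_updates: to_attribute_updates('first_name' => 'Janet')
}
# client.update_item(update_params)
```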

Deleting an item

The #delete_item method deletes the item with the specified Primary Key. If the item is not present, it doesn't return an error.
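The params mirror those of #get_item; only the Primary Key is needed (names from the running example):

```ruby
delete_params = {
  table_name: 'users',
  key: { 'email' => 'jane@example.com' }
}
# client.delete_item(delete_params)  # succeeds even when no such item exists
```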

Conditional Writes

All DynamoDB operations can be categorized into two types: read operations, such as get_item, and write operations, such as put_item, update_item, and delete_item.

These write operations can be constrained with conditions; for example, a put_item call can be made to succeed only if an item with the same Primary Key does not already exist. All write operations support such conditional writes.

For example, if you want to create an item only if it isn’t present, you can add attribute_not_exists(attribute_name) as a value of the condition_expression key in the #put_item method params.
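Such a create-only-if-absent write could be sketched like this, checking for the email attribute of the running example:

```ruby
put_params = {
  table_name: 'users',
  item: { 'email' => 'jane@example.com', 'first_name' => 'Jane' },
  # write only when no item with this Primary Key already exists
  condition_expression: 'attribute_not_exists(email)'
}
# client.put_item(put_params)
# raises Aws::DynamoDB::Errors::ConditionalCheckFailedException when an item
# with that key is already present
```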

Query and Scan Operations

Batch operations are not meant for querying data, so DynamoDB provides the Query and Scan operations for fetching records.

Query: Lets you fetch records based on the Partition Key and the Range Key of the Primary or Secondary Indexes.

You need to pass a single value for the Partition Key and, optionally, a condition on the Range Key with normal comparison operators (such as =, >, and <) if you want to further narrow down the results.
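Tying this back to the earlier stock-price example, a Query over a date range might look like this; the table and attribute names are assumptions:

```ruby
# Fetch all prices for one stock within a date range. stock_name is the
# Partition Key and price_date the Range Key (both names assumed).
query_params = {
  table_name: 'stock_prices',
  key_condition_expression:
    'stock_name = :name AND price_date BETWEEN :from AND :to',
  expression_attribute_values: {
    ':name' => 'ACME',
    ':from' => '2017-01-01',
    ':to'   => '2017-01-31'
  }
}
# resp = client.query(query_params)
# resp.items holds the matching price records, ordered by the Range Key
```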

Scan: A Scan operation reads every item in a table or a secondary index, either in ascending or descending order.

Both the Query and Scan operations accept filters to further restrict the results returned.
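A filtered Scan could be sketched as below; note that the filter is applied after the items are read, so read capacity is still consumed for the full scan (names from the running example):

```ruby
scan_params = {
  table_name: 'users',
  filter_expression: 'begins_with(first_name, :prefix)',
  expression_attribute_values: { ':prefix' => 'Ja' }
}
# resp = client.scan(scan_params)
# resp.items holds only the users whose first_name starts with 'Ja'
```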

Conclusion

The fully managed NoSQL database service DynamoDB is a good fit if you don't want to worry about scaling while handling large amounts of data. But even though it promises the high performance of single-digit milliseconds at virtually any scale, you need to be careful when designing the table architecture and choosing the Primary Key and Secondary Indexes; otherwise, you can lose these benefits and costs can grow quickly.

The Ruby SDK v2 provides an interface for interacting with DynamoDB, and you can read more about its methods in the SDK documentation. The Developer Guide is also a perfect place to understand DynamoDB in depth.


Join the Discussion


Vyacheslav

Thanks for the article. I find the article's name confusing, though. First you make a statement that DynamoDB is slow and not scalable, and then the second part of the article is a basic example of how to use DynamoDB.

If you make a statement, can you please also provide facts that support this statement?

It’s very tempting to say that something is slow. But as a software developer, I’d like to hear what kind of problem you had and how you solved it, or how a particular tool was not able to solve it.

Parth Modi

Hi Vyacheslav,

I mentioned multiple times in the article that while using DynamoDB you don’t need to worry about scaling, and that DynamoDB promises the high performance of single-digit milliseconds at any scale. Please share which part of the article you find misleading, where I talk about DynamoDB being slow and not scalable.

I explained at the beginning of the article why I chose DynamoDB to solve the problem of handling massive data flowing through IoT devices. DynamoDB is working great so far, hence I don’t have anything to complain about regarding speed or scalability, yet!

Actually modeling relationships was the topic which made me interested in learning more about DynamoDB. Thanks.

Lucas Costa

Hi, great post! You stated that “Batch Write Item can perform any write operation such as #put_item, #update_item, or #delete_item for one or more tables”. But the docs you referred to are explicit on this: “Note: BatchWriteItem does not support UpdateItem requests”. Have you tried it?