I am Anders Karlsson, and I have been working in the RDBMS industry for many, possibly too many, years. In this blog, I write about my thoughts on RDBMS technology, happenings and industry, and also on any wild ideas around that I might think up after a few beers.

Wednesday, March 21, 2012

Amazon DynamoDB ... Is it any good?

As you might have noticed, I'm getting further away from MySQL here. This is just how things are, I guess; I just do much less work with MySQL these days. The first migration, some time back, was from MySQL to MongoDB. It was pretty successful: we still have some data in MySQL, but the bulk of the data is in MongoDB right now.

Running any database on Amazon (and we run all databases on Amazon or on the Amazon RDS service) may be costly, depending on how you utilize your resources. The recently announced Amazon DynamoDB is Amazon's NoSQL service offering, but it is not like MongoDB with a twist, far from it. If you have read what I have written about MongoDB, you know I have now and then complained about the lack of functionality, but to be honest, I have learnt to live with MongoDB and its shortcomings, and have started to like many of its JavaScript features (one thing I hate about it, and about JavaScript in general though, is the numeric datatype. It's just plain silly).

That said, we are now taking a shot at migrating again, this time to DynamoDB. In comparison with MongoDB, DynamoDB is incredibly simplistic; there are very few things you can do. To begin with, there is one "index" and one index only on each table. This index is either a unique hash key (that is what they call it) or a combination of a hash-key and a range-key (a unique composite key). I'll get into the gory details shortly. You cannot have secondary indexes, i.e. indexes on any attribute other than the "primary key" or whatever you want to call it.

You can then read data in one of three ways. Simple:

You read a single row by unique key access. If you have a composite key, provide both the hash-key and the range-key, else provide just the hash-key.

You scan the whole table.

If you have a composite key, access by the hash-key part and scan (you may filter, but in essence, this is still a scan) on the range key.

There is nothing else you can do, and note that unless doing a full table scan, you must always provide the hash-key, i.e. if you do not know the exact hash key for the row to get, you have to do a full table scan. There is just no other way.
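To make the three access paths concrete, here is a toy in-memory model in Python. It is only a sketch of the idea and not the DynamoDB API: the class `ToyTable` and its methods `put`, `get`, `scan` and `query` are names I made up for illustration, and the sample keys and rows are invented.

```python
from bisect import bisect_left, bisect_right, insort


class ToyTable:
    """Hypothetical stand-in for a table with a composite (hash-key, range-key) index."""

    def __init__(self):
        self._ranges = {}  # hash key -> sorted list of its range keys
        self._rows = {}    # (hash key, range key) -> row

    def put(self, hash_key, range_key, row):
        # Keep the per-hash-key list of range keys sorted for range lookups.
        if (hash_key, range_key) not in self._rows:
            insort(self._ranges.setdefault(hash_key, []), range_key)
        self._rows[(hash_key, range_key)] = row

    def get(self, hash_key, range_key):
        """Way 1: a single row by the full unique key (None if missing)."""
        return self._rows.get((hash_key, range_key))

    def scan(self, predicate=lambda row: True):
        """Way 2: visit every row; any filter is applied after the row is read."""
        return [row for row in self._rows.values() if predicate(row)]

    def query(self, hash_key, low, high):
        """Way 3: exact hash key, then range keys between low and high (inclusive)."""
        keys = self._ranges.get(hash_key, [])
        start, end = bisect_left(keys, low), bisect_right(keys, high)
        return [self._rows[(hash_key, k)] for k in keys[start:end]]


table = ToyTable()
table.put("user1", 10, {"msg": "a"})
table.put("user1", 20, {"msg": "b"})
table.put("user2", 15, {"msg": "c"})

one_row = table.get("user1", 20)                    # way 1: full key known
matches = table.scan(lambda r: r["msg"] == "c")     # way 2: hash key unknown
by_range = table.query("user1", 5, 15)              # way 3: hash key + range
```

Note that there is no method that finds a row by `msg` without walking the whole table, which is exactly the limitation described above.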

The supported datatypes aren't overly exciting either: Number, String, and Sets of Number and of String. The String type is UTF-8 and the Number is a signed number with 38 digits of precision. Other notable limits are a maximum of 64 KB per row, and that a scan will only scan up to a maximum of 1 MB of data. Note that there is no binary datatype (we have binary data in our MongoDB setup and use base64 encoding for that in DynamoDB).
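The base64 workaround is easy to sketch in Python. The helper names `encode_blob`, `decode_blob` and `approx_row_size` are my own, and the size estimate (UTF-8 bytes of attribute names plus values) is an assumption about how a row is measured, so treat it as a rough guard rather than the service's actual accounting.

```python
import base64

MAX_ROW_BYTES = 64 * 1024  # the per-row limit mentioned above


def encode_blob(data: bytes) -> str:
    """Turn binary data into an ASCII string that fits a String attribute."""
    return base64.b64encode(data).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Reverse of encode_blob."""
    return base64.b64decode(text)


def approx_row_size(row: dict) -> int:
    """Rough row size: UTF-8 bytes of names plus values (an assumption)."""
    return sum(
        len(name.encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in row.items()
    )


payload = bytes(range(256))
row = {"id": "abc", "blob": encode_blob(payload)}
fits = approx_row_size(row) <= MAX_ROW_BYTES
```

Keep in mind that base64 turns every 3 bytes into 4 characters, so 48 KB of binary data already fills a 64 KB row before any other attribute is counted.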

Pricing is interesting. What you pay for is throughput and storage, which is pretty different from what you may be used to. Throughput may be adjusted to what you need, and it's calculated in KB of row data per second, i.e. a table with rows of up to 1 KB in size and a requirement of 10 reads per second needs 10 units of read capacity (there is a similar throughput number for write capacity). Read more on pricing here.
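The capacity arithmetic can be written down as a tiny Python function. This is a back-of-the-envelope sketch: the function name `read_capacity_units` is mine, and rounding each row up to the next whole KB is my assumption about how partial kilobytes are counted, so check the pricing page before relying on it.

```python
import math


def read_capacity_units(max_row_bytes: int, reads_per_second: int) -> int:
    """Estimate read capacity as (row size in KB, rounded up) x reads per second.

    Rounding up to a whole KB, with a minimum of 1 KB, is an assumption.
    """
    kb_per_read = max(1, math.ceil(max_row_bytes / 1024))
    return kb_per_read * reads_per_second


# The example from the text: rows of up to 1 KB, 10 reads per second.
units = read_capacity_units(1024, 10)
```

With the same assumption, rows of up to 1500 bytes at 10 reads per second would need 20 units, which shows how quickly row size drives the bill.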

All in all, DynamoDB still has to prove itself, and in some situations it might turn out expensive, in other situations not so. I have one major gripe with DynamoDB before I close for this time: DynamoDB is not Open Source, nor is it a product you can buy, except as a service from Amazon. You want to run DynamoDB on a machine in your datacenter? Forget it. This annoys the h*ll out of me!

We are still testing, but so far I am reasonably happy with DynamoDB, despite the issues listed above. The lack of tools (no, there are no DynamoDB tools. At all. No backup tool, no import / export, nothing) means that a certain amount of app development is necessary to access it, even for the simplest of things. The missing backup in particular is a problem, but I am sure this will be fixed soon.

What this post is about is an ongoing migration to DynamoDB from MongoDB. We are not there yet, but we are on the way. As for the rationale, it's cost and performance. We hope to get more performance from DDB (SSD disks etc.) than from our current MongoDB setup, which is heavily sharded and hence costly and complex to manage.