I've just started looking into Amazon's DynamoDB. Obviously the scalability appeals, but I'm trying to get my head out of SQL mode and into NoSQL mode. Can the following be done (with all the scalability advantages of DynamoDB)?

I have a load of entries (say 5 to 10 million) indexed by some number. One of the fields in each entry will be a creation date. Is there an effective way for dynamo db to give my web app all the entries created between two dates?

A simpler question: can dynamo db give me all entries in which a field matches a certain number. That is, there'll be another field that is a number, for argument's sake let's say between 0 and 10. Can I ask dynamodb to give me all the entries which have value e.g. 6?

Do both of these queries need a scan of the entire dataset (which I assume is a problem given the dataset size?)

1 Answer

Is there an effective way for dynamo db to give my web app all the
entries created between two dates?

Yup, please have a look at the Primary Key concept within the Amazon DynamoDB Data Model, specifically the Hash and Range Type Primary Key:

In this case, the primary key is made of two attributes. The first
attribute is the hash attribute and the second one is the range
attribute. Amazon DynamoDB builds an unordered hash index on the hash
primary key attribute and a sorted range index on the range primary
key attribute. [...]
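
To make the quoted description a bit more concrete, here is a toy in-memory model of a hash and range key, written in Python. It is only a conceptual sketch and not how DynamoDB is implemented; the class `HashRangeTable`, the month-style hash values, and the ISO date strings used as range keys are all made up for illustration. The point it tries to show is that a between-two-dates lookup needs an exact hash key value and then only touches the matching slice of the sorted range keys.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model: unordered lookup by hash key, sorted range keys per hash key."""

    def __init__(self):
        self._keys = defaultdict(list)   # hash key -> sorted list of range keys
        self._items = defaultdict(list)  # hash key -> items in the same order

    def put(self, hash_key, range_key, item):
        # Insert at the position that keeps the range keys sorted.
        pos = bisect.bisect_right(self._keys[hash_key], range_key)
        self._keys[hash_key].insert(pos, range_key)
        self._items[hash_key].insert(pos, item)

    def query_between(self, hash_key, low, high):
        # Binary search for both ends, then return only the slice in between.
        keys = self._keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        stop = bisect.bisect_right(keys, high)
        return self._items.get(hash_key, [])[start:stop]


table = HashRangeTable()
table.put("2012-02", "2012-02-03", {"id": 1})
table.put("2012-02", "2012-02-10", {"id": 2})
table.put("2012-02", "2012-02-06", {"id": 3})
table.put("2012-03", "2012-03-01", {"id": 4})

# Entries in the "2012-02" partition created between the 1st and the 7th.
result = table.query_between("2012-02", "2012-02-01", "2012-02-07")
```

Note that the range lookup is always scoped to one hash key value, so the choice of hash attribute decides which date ranges can be answered with a single lookup.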

can dynamo db give me all entries in which a field matches a certain
number. [...] Can I ask dynamodb to give
me all the entries which have value e.g. 6?

This is possible as well, albeit only by means of the Scan API, which does indeed have to read every item in the table; see ScanFilter for details and Scanning Tables in Amazon DynamoDB for examples.
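
A rough sketch of such a scan in Python, assuming a boto3-style low-level client whose `scan` call accepts `FilterExpression` (the newer counterpart of the ScanFilter parameter mentioned above) and returns `LastEvaluatedKey` while more pages remain. The table name `entries`, the attribute name `score`, and the `FakeClient` stand-in are hypothetical; the fake simply replays canned pages so the loop can run without AWS access.

```python
def scan_matching(client, table_name, attribute, value):
    """Yield items whose numeric attribute equals value, following pagination."""
    params = {
        "TableName": table_name,
        "FilterExpression": "#a = :v",
        "ExpressionAttributeNames": {"#a": attribute},
        "ExpressionAttributeValues": {":v": {"N": str(value)}},
    }
    while True:
        page = client.scan(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        # Continue the scan where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


class FakeClient:
    """Hypothetical stand-in for a DynamoDB client; replays canned pages."""

    def __init__(self, pages):
        self._pages = pages
        self.requests = []

    def scan(self, **params):
        self.requests.append(dict(params))
        return self._pages[len(self.requests) - 1]


fake = FakeClient([
    {"Items": [{"id": {"N": "1"}, "score": {"N": "6"}}],
     "LastEvaluatedKey": {"id": {"N": "1"}}},
    {"Items": [{"id": {"N": "7"}, "score": {"N": "6"}}]},
])
matches = list(scan_matching(fake, "entries", "score", 6))
```

The filter only trims what is returned; the scan still reads every item in the table, which is why it gets expensive on a large dataset.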

Do both of these queries need a scan of the entire dataset (which I
assume is a problem given the dataset size?)

As mentioned, the first approach works with a Query while the second requires a Scan. "Generally, a query operation is more efficient than a scan operation" is good advice to get started, though the details are more complex and depend on your use case; see the section Scan and Query Performance within the Query and Scan in Amazon DynamoDB overview:

For quicker response times, design your tables in a way that can use
the Query, Get, or BatchGetItem APIs, instead. Or, design your
application to use scan operations in a way that minimizes the impact
on your table's request rate. For more information, see Provisioned Throughput Guidelines in Amazon DynamoDB.
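
One possible way to design a table so that the date question can use Query, sketched in Python under several assumptions: the hash attribute is a hypothetical month bucket named `bucket` (for example `2012-02`), the range attribute is a hypothetical ISO date string named `created`, and the request dictionaries follow the boto3 low-level client format. A date range that spans several months then becomes one Query per bucket.

```python
from datetime import date


def month_buckets(start, end):
    """Return 'YYYY-MM' labels for every month touched by [start, end]."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


def build_queries(table_name, start, end):
    """Build one Query parameter dict per month bucket (nothing is sent)."""
    return [
        {
            "TableName": table_name,
            "KeyConditionExpression": "#b = :b AND #c BETWEEN :lo AND :hi",
            "ExpressionAttributeNames": {"#b": "bucket", "#c": "created"},
            "ExpressionAttributeValues": {
                ":b": {"S": bucket},
                # ISO date strings sort in chronological order.
                ":lo": {"S": start.isoformat()},
                ":hi": {"S": end.isoformat()},
            },
        }
        for bucket in month_buckets(start, end)
    ]


queries = build_queries("entries", date(2011, 12, 20), date(2012, 2, 6))
# With boto3 installed and credentials configured, each dict could be sent as:
#   boto3.client("dynamodb").query(**params)
```

The bucket size is a trade-off: coarser buckets mean fewer queries per date range, but they also concentrate reads and writes on fewer hash key values.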

So, as usual when applying NoSQL solutions, you might need to adjust your architecture to accommodate these constraints.

Thanks - so essentially you can pick two values for efficient querying via the primary key. If you need to query on more than two values you must scan through all the entries?
–
Stuart Feb 6 '12 at 20:44

Is another option to use the query that DynamoDB supports on the primary keys to get a dataset into memory in your web app, then use (in our case) Python to filter through this data? This would obviously use a lot of memory in the web app, but wouldn't require multiple passes over the data in DynamoDB (I'm also thinking of cost here).
–
Stuart Feb 7 '12 at 8:23
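
A minimal sketch of the in-memory filtering idea from this comment, assuming the items have already been fetched with a Query and converted to plain Python dictionaries. The helper `filter_items` and the sample attribute names are hypothetical.

```python
def filter_items(items, **equals):
    """Keep only the items whose attributes equal all of the given values."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in equals.items())
    ]


# Pretend these came back from a Query on the primary key.
fetched = [
    {"id": 1, "created": "2012-02-03", "score": 6},
    {"id": 2, "created": "2012-02-04", "score": 3},
    {"id": 3, "created": "2012-02-06", "score": 6},
]
sixes = filter_items(fetched, score=6)
```

This keeps the DynamoDB side to a single Query, at the price of transferring and holding items in the web app that are then thrown away.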

Worth adding that you can now create secondary indexes in DynamoDB.
–
jarmod May 10 '13 at 0:53

@jarmod - Local Secondary Indexes are a highly welcome addition to DynamoDB's query model and always worth considering going forward. However, please note that local is a crucial limitation: a local secondary index is a data structure that maintains an alternate range key for a given hash key. While this covers many real-world scenarios, it doesn't apply to arbitrary non-primary-key field queries like those in the question at hand.
–
Steffen Opel May 10 '13 at 7:24
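
To illustrate what "local" means here, the following Python sketch builds a table definition in the boto3 `create_table` parameter format with one local secondary index. All names (`entries`, `bucket`, `created`, `score`, `by_score`) and the throughput numbers are hypothetical. The index keeps the table's hash attribute and only swaps in a different range attribute.

```python
def build_create_table_params():
    """Return create_table parameters with one local secondary index."""
    return {
        "TableName": "entries",
        "AttributeDefinitions": [
            {"AttributeName": "bucket", "AttributeType": "S"},
            {"AttributeName": "created", "AttributeType": "S"},
            {"AttributeName": "score", "AttributeType": "N"},
        ],
        # Table primary key: hash on bucket, range on created.
        "KeySchema": [
            {"AttributeName": "bucket", "KeyType": "HASH"},
            {"AttributeName": "created", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_score",
                # Same hash attribute as the table, alternate range attribute.
                "KeySchema": [
                    {"AttributeName": "bucket", "KeyType": "HASH"},
                    {"AttributeName": "score", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    }


params = build_create_table_params()
table_hash = params["KeySchema"][0]["AttributeName"]
index_hash = params["LocalSecondaryIndexes"][0]["KeySchema"][0]["AttributeName"]
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**params)
```

A query on `by_score` still needs an exact `bucket` value, so "all entries with score 6" across every bucket is not answered by this index alone.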