Tuesday, October 27, 2009

RDS: The End of SimpleDB?

The recent announcement of Amazon's Relational Database Service is generating a lot of buzz. And well it should. For people who require a relational database for their applications and have been rolling their own with EC2 and EBS, it offers a really nice option. Let AWS manage that database for you and focus more attention on your app. It also represents another inevitable step up the ladder from IaaS to PaaS for AWS and gives pretty good triangulation data about where cloud computing will be in a few years.

But does RDS also mean the end of SimpleDB? There have already been posts on the SimpleDB forum to that effect. I think the answer is "no" but it does illustrate what I think has been a misstep in the evolution of SimpleDB.

Let me start by saying that I love SimpleDB. I use it all the time. I have built a number of real applications and services with it and in my experience it "just works". I know there are some applications that just require a full-blown relational database but in my experience I've been able to do everything I need to do with SimpleDB. And I absolutely love the fact that it's just there as a service, doing whatever it needs to do to scale along with my app.

But it seems like SimpleDB has always been a bit of a red-headed stepchild at AWS. They haven't had a clear, consistent strategy for it. When people compared it to a relational database, rather than following the NoSQL philosophy they tried to make SimpleDB look more like a relational database. They deprecated its elegant set-based query language in favor of a SQL subset in hopes of attracting the relational crowd. But I think mainly what happens is that people focus on the "subset" aspect and are always pining for yet more SQL compatibility. I just don't think it won them many converts.

So, does RDS represent the end of SimpleDB? I really don't think so. The two offerings are very, very different. AWS needs to embrace that difference and communicate it more clearly. There are a lot of applications out there that can benefit from the lightweight, super-scalable, and easy-to-use qualities of SimpleDB. MySQL simply can't compete on those dimensions. I'm pretty sure AWS agrees but it would be nice to see some positive reinforcement from them soon, before their user base gets scared. I don't think that building RDS on the back of SimpleDB is what AWS had in mind.

It's interesting since RDS and SimpleDB are completely different things. As you said, the only reason to really use RDS would be to easily get, say, a Django installation up and running, or if I absolutely *need* the relational model.

But you know what - that doesn't really happen that often - at least in my line of work. There's been an explosion of highly scalable key-value stores out there - Cassandra, Project Voldemort, Ringo, Scalaris, CouchDB etc. etc. etc.

For a lot of custom applications I've often thought to myself "man, all I *really* need is BerkeleyDB for this" - key-value stores definitely have their place. I'm using SimpleDB to store account information for a huge userbase - so clearly there is something there.

That being said, it's nice knowing you *can* set up an RDBMS on AWS just as easily as spawning instances. RDS has its place as well.

I agree with you, Reza. But with all the hubbub around those highly visible key-value stores out there, AWS really hasn't made much noise with SimpleDB. They really seem to be downplaying that aspect and trying to make it look more like a SQL database. Maybe RDS will allow them to focus on being what they really are.

SDB never really attracted the "enterprise" crew though, and RDS is doing that, which seems to be what Amazon is trying for these days. I wish SDB did provide the ability to do the backup to a point in time that RDS does though.

Amazon needed a database-like offering for the sake of completeness. Something like a key-value store was easier for them to develop into a hugely scalable product - hence it could be brought to market faster.

While it works great for new apps at startups with savvy developers, such a paradigm shift requires a significant rewrite of existing applications, which I believe was something enterprises were not willing to do just yet, possibly in part due to the economy and having to cut costs.

I don't think SimpleDB will go away, but how much effort Amazon will dedicate to it from now on - I don't know.

I personally think it's very likely that its feature set will become frozen for some time, which should not impact its stability.

Each domain is limited to 10GB. That's true. By default you have a max of 100 domains per account, so that's a total of 1TB of data. For my applications, the size of an item in SimpleDB is quite small, around 500 bytes on average, so I can get around 20 million of those in a single domain and about 2 billion across my 100 domains. That's pretty good scale, I think. Trying to store 2 billion rows in MySQL would be non-trivial.
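
For anyone who wants to plug in their own item size, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes decimal units (1 GB = 10^9 bytes) and ignores any per-item or per-attribute storage overhead, and the function and constant names are made up for illustration.

```python
# Rough SimpleDB capacity estimate using the limits quoted above.
# Assumptions: decimal gigabytes, no per-item storage overhead.

DOMAIN_LIMIT_BYTES = 10 * 10**9   # 10 GB per domain
DOMAINS_PER_ACCOUNT = 100         # default limit per account


def estimate_capacity(avg_item_bytes):
    """Return (items per domain, items per account, total bytes per account)."""
    items_per_domain = DOMAIN_LIMIT_BYTES // avg_item_bytes
    items_per_account = items_per_domain * DOMAINS_PER_ACCOUNT
    total_bytes = DOMAIN_LIMIT_BYTES * DOMAINS_PER_ACCOUNT
    return items_per_domain, items_per_account, total_bytes


per_domain, per_account, total_bytes = estimate_capacity(500)
print(f"{per_domain:,} items per domain")
print(f"{per_account:,} items per account")
print(f"{total_bytes / 10**12:.0f} TB total")
```

With 500-byte items this gives 20 million items per domain and 2 billion per account; larger items shrink both numbers proportionally.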

I think that Amazon's main problem with SimpleDB is that they provided NO tools for looking at the data in your db. That is terrible when you are asking people to use a database that is very different from the one they have been using for years. I could never get a grasp on what my data was! Also, the libraries for db access, at least for PHP, were terrible and overly complicated. I couldn't even find examples for using the PHP library! In short, that is a bit much to ask of people. There is no doubt in my mind that RDB will be the future at AWS. SimpleDB may be a great concept, but Amazon did nothing to help people see that. And as for the argument that the only reason you need RDB is consistency (which btw SimpleDB is now achieving, ironically), what is wrong with that? How about math and autoincrement? How about types? BUT the question is not "why use RDB" if you don't need transactions and consistency -- the question is why use SimpleDB? Just my .02.