Friday, January 20, 2012

NodeUp 11 Commentary

Yesterday, I listened to the NodeUp podcast (http://nodeup.com/eleven) about databases and node.js. It was pretty epic. I wanted to write up my thoughts and summarize it as best I could. There are some great things going on in the node.js and database worlds, and I have my opinions about them too. Here we go...

1) Postgres

People like Postgres if they need a relational database. This is because the community is really cool and they add lots of features (every feature!). There's even a JSON-store feature Postgres is working on. Yammer is the only known (big) company using Postgres with node in production, but the node-postgres library is supposedly pretty decent.

2) Database people hate MongoDB, developers love it

Supposedly, 10Gen is the hermit of the database world. They don't hang out with other database people; instead they listen to customers/end-users more than to database engineers. The result is very developer friendly, but database people think it's technically inferior (??). People say "Mongo loses data", presumably because journaling used to be off by default. Still, Mongoose has become the de facto ORM of simple node websites, and MongoDB with node is obviously really popular. For the record, I LOVE MONGODB.

3) Redis

Redis has a lot in common with node. It's single threaded and really, really fast (because it's in memory, it's going to be faster than most other databases), and it has really good node support. The catch is that it doesn't scale horizontally very well, so there actually aren't a lot of good use cases for it. DShaw wrote a Redis store for socket.io; it's unclear exactly how far it can scale, but apparently pretty darn well. Redis also has a pubsub feature that lets you subscribe to channels by pattern (glob-style patterns rather than full regexes, as far as I know), which people think is really cool.
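To make the pattern-subscription idea a bit more concrete, here is a tiny in-memory sketch. It is not Redis and not the API of any Redis client library; the names `globToRegExp` and `PatternPubSub` are made up for illustration, and it only understands `*` and `?` in patterns.

```javascript
// Hypothetical in-memory illustration of pattern-based pubsub.
// This is NOT Redis and NOT a Redis client API.

// Convert a simple glob (only `*` and `?` are handled here) into a RegExp.
function globToRegExp(glob) {
  // Escape regex metacharacters other than `*` and `?`.
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp('^' + source + '$');
}

class PatternPubSub {
  constructor() {
    this.subscriptions = [];
  }

  // Register a handler for every channel whose name matches the glob.
  psubscribe(pattern, handler) {
    this.subscriptions.push({ pattern, regex: globToRegExp(pattern), handler });
  }

  // Deliver a message to all matching subscribers; returns how many got it.
  publish(channel, message) {
    let delivered = 0;
    for (const sub of this.subscriptions) {
      if (sub.regex.test(channel)) {
        sub.handler(sub.pattern, channel, message);
        delivered++;
      }
    }
    return delivered;
  }
}

const bus = new PatternPubSub();
const received = [];
bus.psubscribe('news.*', (pattern, channel, message) => {
  received.push(channel + ': ' + message);
});

const sportsDelivered = bus.publish('news.sports', 'goal');
const weatherDelivered = bus.publish('weather.today', 'rain');
```

One subscription on `news.*` picks up anything published to `news.sports`, `news.tech`, and so on, without the subscriber having to know the channel names up front.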

4) CouchDB

So the node community has been huge on CouchDB for a long time. Isaac S. uses it for npm and a lot of people use it for other things. Pros: the HTTP interface is super easy to use, and map reduces are incremental and therefore super fast. Cons: it scales terribly. Imagine 1,000,000 users, each with their own DB, doing master->master replication. It will blow up. A ton of data or a huge number of users doesn't make a good fit for CouchDB. That being said, if you have fewer users and less data (say, in the thousands of things) and want to experiment with master->master replication, for example from a mobile app to the web, it could be really cool. Still, it can actually be really slow.
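Here is a rough sketch of why "incremental" matters for map reduce. It is a toy, not how CouchDB is actually implemented (CouchDB keeps its views in on-disk B-trees); the class name `IncrementalCountView` and the `tags` field are assumptions made up for this example. The point is only that when one document changes, only that document gets re-mapped and the running totals are adjusted.

```javascript
// Hypothetical sketch of an incrementally maintained count-per-key view.
// A toy model of the idea, not CouchDB's real implementation.
class IncrementalCountView {
  constructor(mapFn) {
    this.mapFn = mapFn;              // doc -> array of keys to emit
    this.emittedByDoc = new Map();   // doc id -> keys emitted last time
    this.counts = new Map();         // key -> running count (the "reduce")
    this.mapCalls = 0;               // how many times the map function ran
  }

  // Add delta to the running count for key, dropping keys that reach zero.
  _adjust(key, delta) {
    const next = (this.counts.get(key) || 0) + delta;
    if (next === 0) {
      this.counts.delete(key);
    } else {
      this.counts.set(key, next);
    }
  }

  // Insert or update one document; only this document is re-mapped.
  upsert(doc) {
    // Undo whatever this document contributed previously.
    for (const key of this.emittedByDoc.get(doc._id) || []) {
      this._adjust(key, -1);
    }
    this.mapCalls++;
    const keys = this.mapFn(doc);
    this.emittedByDoc.set(doc._id, keys);
    for (const key of keys) {
      this._adjust(key, 1);
    }
  }

  count(key) {
    return this.counts.get(key) || 0;
  }
}

const view = new IncrementalCountView((doc) => doc.tags || []);
view.upsert({ _id: 'a', tags: ['node', 'couchdb'] });
view.upsert({ _id: 'b', tags: ['node'] });
view.upsert({ _id: 'a', tags: ['riak'] }); // update: only doc 'a' is re-mapped
```

Three writes cost three map calls no matter how many documents are already in the view, which is the property that makes the incremental approach fast for reads after small changes.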

5) Riak

Riak does map reduces really, really slowly. They were joking that they should be called "work orders" instead of jobs. Someone (maybe it was Voxer?) is using it heavily in production; they converted their super slow CouchDB instance to Riak and found that it worked way better. Riak is a huge key/value store that is highly available and easy to scale horizontally. It's pretty simple and doesn't have a lot of fancy features, but it works well if you're trying to keep it simple. It has an HTTP interface (like CouchDB). If you're trying to be fancy and do a lot of complex map reduces, your life will suck and it would be faster to use carrier pigeons.

6) The future

A lot of people are excited about building databases in JavaScript. Isaac S. noted some technical limitations of JavaScript, but everyone else was arguing that there are limitations in any language and the ones JavaScript has (not good with large numbers, etc.) may prove solvable or less important. It seems the idea is that building databases isn't that hard and people (with some helpful tools) will likely continue to build custom databases for their own needs. Awesome.
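As a quick illustration of the "not good with large numbers" point: JavaScript numbers are 64-bit floats, so integers above 2^53 stop being exact. The snippet below is just a sketch of that limitation. The last line uses `BigInt`, which only exists in newer JavaScript engines, so treat it as one possible way around the problem rather than something the podcast discussed.

```javascript
// JavaScript numbers are IEEE 754 doubles: integers are exact only up to 2^53.
const limit = Math.pow(2, 53);

// Adding 1 past the limit is silently lost to rounding.
const plusOneIsLost = limit + 1 === limit;

// Number.isSafeInteger reports whether an integer can be represented exactly
// and unambiguously.
const limitIsSafe = Number.isSafeInteger(limit);
const belowLimitIsSafe = Number.isSafeInteger(limit - 1);

// In engines that support BigInt, arbitrary-precision integers avoid the issue.
const exact = BigInt(limit) + 1n;

console.log(plusOneIsLost, limitIsSafe, belowLimitIsSafe, exact.toString());
```

For a database this matters for things like 64-bit IDs or counters, which either need a big-integer type or have to be stored as strings.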

2 comments:

Your 4th opinion that "A ton of data or a huge number of users don't make a good fit for couchDB" seems to precisely contradict major arguments on why to use a NoSQL solution like CouchDB. See http://www.couchbase.com/why-nosql/nosql-database for more info.

Joe, I know it's like 6 months later, but I'm looking into various NoSQL DBs again and I realized there's a big difference between Couchbase Server and CouchDB, which I think your link was talking about.

It appears there was a split in the community. One side (Couchbase Server) wanted to make it easier to scale and to lower latency, and the other side (CouchDB) wanted to continue to support a REST API and couchapps. Both are totally worthy goals, and both can obviously scale and be very fast, but they fit different use cases.

Here is a good overview of the differences: http://www.couchbase.com/couchdb

CouchDB does scale, but scaling means different things to different people. There are trade-offs. From what I understand, if you're using it to back an app server directly, it's pretty straightforward. But when you want to have many distributed local databases that sync with a global server (or servers) AND have the ability to share information between users, etc., the permissions side of things can get very complicated. On top of that, you start maintaining multiple databases per user, which is a bit messy. So yes, it scales, but apparently so can the complexity.

It's all a very fascinating area, though. If you know more about it, I'd love to hear more.