NoSQL Berlin Meetup Notes

Posted by phillip
Sat, 31 Oct 2009 19:39:00 GMT

“The world is diverse. Act accordingly.”
—Prof. Dr. Stefan Edlich, in his talk on object databases.

Where do you store your data? In a relational database, of course. It’s so convenient to use the persistence store we are used to, which has been there for us since the day we started programming. But in the spirit of using the right tool for the job (and making our lives easier), it pays off to know other persistent stores: those which aren’t based on the RDBMS/SQL paradigm. They promise to be better suited for some of the problems we face day to day, among them mapping the real world to persistent storage, scaling, and reliability.

The NoSQL meetup in Berlin gave a great overview of this active and growing scene, and shed some light on the characteristics of the main tools. Here are some rough notes from the meetup. For the full monty, all video and slides of the talks are available at the NoSQL Berlin website. Thanks guys for the perfect organization!

Consistency in Key-Value Stores (Monika Moser)

The only talk which wasn’t about a specific database. It gave an introduction to the problems and solutions we face when working with many database servers (nodes). Since written data has to be distributed across many physical machines, there is a noticeable delay until every node has received the updated data: the replication lag. Only once the replication lag has passed do all nodes contain the same (= consistent) data.

Two types of consistency were distinguished: Strong consistency (updated data is immediately available to all processes in the system) and eventual consistency (at some point in time all processes will get the update).

Strong consistency is usually expensive to implement on larger systems and isn’t always necessary, so eventual consistency is often acceptable. Depending on the use case, one can go for one of these subtypes of eventual consistency:

“read your writes” consistency

The process that wrote the data will always get the latest data. Other processes may still get old data for some time.

session consistency

A special case of the above: only the session that wrote the data is guaranteed to get the latest data immediately.

monotonic read consistency

After a process has read the new data, all of its following reads get it too. So once a process has seen the new data, the old data doesn’t appear to it again.
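To make the eventual consistency variants above a bit more tangible, here is a minimal Python sketch. It is not taken from the talk: the names `Node`, `Session` and `replicate` are made up for illustration, and the setup (one primary, one lagging replica, version numbers per key) is an assumption chosen to keep things small. A session remembers the newest version it has written or read and falls back to the primary whenever the replica is behind, which gives it “read your writes” and monotonic reads, while other sessions may still see old data.

```python
class Node:
    """A storage node holding versioned values; version 0 means 'nothing yet'."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def apply(self, key, version, value):
        # Only move forward: ignore updates older than what this node holds.
        if version > self.read(key)[0]:
            self.data[key] = (version, value)


class Session:
    """Hypothetical client session that tracks the newest version it knows."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.seen = {}  # key -> newest version written or read in this session

    def write(self, key, value):
        # Writes go to the primary only; the replica catches up later.
        version = self.primary.read(key)[0] + 1
        self.primary.apply(key, version, value)
        self.seen[key] = version

    def read(self, key):
        version, value = self.replica.read(key)
        if version < self.seen.get(key, 0):
            # The replica lags behind what this session already knows,
            # so ask the primary instead of returning stale data.
            version, value = self.primary.read(key)
        self.seen[key] = max(self.seen.get(key, 0), version)
        return value


def replicate(primary, replica):
    """Simulate the end of the replication lag: copy everything over."""
    for key, (version, value) in primary.data.items():
        replica.apply(key, version, value)


if __name__ == "__main__":
    primary, replica = Node(), Node()
    writer = Session(primary, replica)
    reader = Session(primary, replica)

    writer.write("color", "blue")
    print(writer.read("color"))  # blue (the writer reads its own write)
    print(reader.read("color"))  # None (the replica has not caught up yet)

    replicate(primary, replica)
    print(reader.read("color"))  # blue (eventually consistent)
```

A real store would of course not route every lagging read to a single primary; the point is only that the guarantee is scoped to the session, not to the whole system.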

Monika went on to describe the CAP theorem (choose 2 of 3 for your storage setup: partition tolerance, availability, consistency), the reasons strong consistency is expensive, and the Paxos algorithm (a good trade-off between fault tolerance and consistency). See the slides and video for details!

My personal summary of the talk: good overview with lots of pointers to further info. And I’ll care about the details I didn’t grasp when I first need them.

Redis, Fast and Furious (Mathias Meyer)

Redis is awesome, I heard someone say.

Oh … Redis is also like memcached, but with extra features: persistence, additional commands (incrementing values, sets, push/pop on lists, sorting), and a simple text-based protocol. It is also slower than memcached, but not by so much that you would care.

According to Mathias, Redis is put to good use when storing statistical data (as long as it fits in memory!) and implementing worker queues.
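The two use cases can be sketched with three commands: `LPUSH` and `RPOP` give you a first-in, first-out job queue, and `INCR` gives you a counter. The Python below does not talk to Redis at all. `TinyStore` is a made-up in-memory stand-in that mimics the behavior of those three commands, so the example runs without a server; the key names (`jobs`, `pageviews:…`) and `run_worker` are assumptions for illustration.

```python
from collections import defaultdict, deque


class TinyStore:
    """Hypothetical in-memory stand-in mimicking INCR, LPUSH and RPOP."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.lists = defaultdict(deque)

    def incr(self, key):
        # Like INCR: a missing key starts at 0, the new value is returned.
        self.counters[key] += 1
        return self.counters[key]

    def lpush(self, key, value):
        # Like LPUSH: add to the head of the list, return the new length.
        self.lists[key].appendleft(value)
        return len(self.lists[key])

    def rpop(self, key):
        # Like RPOP: remove from the tail, or return None if the list is empty.
        items = self.lists[key]
        return items.pop() if items else None


def run_worker(store, queue):
    """Drain the queue and count how often each page was requested."""
    while True:
        job = store.rpop(queue)
        if job is None:
            break
        store.incr("pageviews:" + job)


if __name__ == "__main__":
    store = TinyStore()
    for page in ["/home", "/about", "/home"]:
        store.lpush("jobs", page)  # producers push on the left ...
    run_worker(store, "jobs")      # ... the worker pops on the right
    print(dict(store.counters))
```

With a real Redis server, a worker would typically block while waiting for new jobs instead of stopping when the queue is empty.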

Peer-to-peer Applications with CouchDB (Jan Lehnardt)

Jan contradicted himself on the first slide. It read: “Relax.” Then he started a 10_000 WPM (words per minute) presentation that still managed to revive my interest in CouchDB. The presentation was about “what can it do” rather than “how to do it”. Good choice to go this way.

a nice explanation: CouchDB is built “of the Web”. REST, JSON and HTTP are core technologies of the database.

Learning curve: store full documents (as JSON), not relations. No data normalization into tables => make developers happy, not computers.

meant to be robust: append-only design for the database file. After a crash, old data is not damaged.

scales out (horizontally). Does master-master replication. Distributing one database across a cluster is not built in, but CouchDB is prepared for it (use couchdb-lounge). A scaled CouchDB cluster then looks like a single DB from the outside.

scales down (runs on small devices). Own your own data, take it with you on your device.

incremental map-reduce: after updates, only the affected documents get reindexed

as with any document-oriented database, store full documents as JSON, not relations. Good tip in the Q&A: a document is something that will be updated and used as a whole. “Put stuff into separate documents when it is updated separately”. There’s no clear guideline, however; it depends on the use case.

RESTful HTTP: “text-based protocol is not slower than binary” / “all HTTP infrastructure and tools can be used”

BBC uses CouchDB in production, after a survey/comparison of storage solutions.
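Two of the points above, whole documents instead of relations and incremental map-reduce, can be sketched together in a few lines of Python. This is only an illustration of the idea, not CouchDB’s API or its actual index structure (CouchDB views are normally written in JavaScript). `IncrementalView`, `comments_per_author` and the document fields other than `_id` are made up for this example. The sketch caches the map output per document, so after an update only the changed document is mapped again.

```python
class IncrementalView:
    """Hypothetical view index that caches map results per document."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # document id -> list of (key, value) rows
        self.map_calls = 0     # how often the map function has been run

    def update(self, doc):
        # Re-run the map function for this one document only; the cached
        # rows of all other documents stay untouched.
        self.map_calls += 1
        self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))

    def reduce_sum(self):
        # Simplified reduce step: sum the cached values per key.
        totals = {}
        for rows in self.rows_by_doc.values():
            for key, value in rows:
                totals[key] = totals.get(key, 0) + value
        return totals


def comments_per_author(doc):
    """Map function: emit (author, 1) for every comment inside a post."""
    for comment in doc.get("comments", []):
        yield comment["author"], 1


if __name__ == "__main__":
    # A post is stored as one whole document, comments included,
    # instead of being normalized into separate tables.
    posts = [
        {"_id": "post-1", "title": "NoSQL Berlin",
         "comments": [{"author": "anna", "text": "Nice notes"}]},
        {"_id": "post-2", "title": "Redis",
         "comments": [{"author": "ben", "text": "+1"},
                      {"author": "anna", "text": "Agreed"}]},
    ]
    view = IncrementalView(comments_per_author)
    for post in posts:
        view.update(post)

    # One document changes, so only that document is mapped again.
    posts[0]["comments"].append({"author": "ben", "text": "Thanks"})
    view.update(posts[0])
    print(view.map_calls, view.reduce_sum())
```

Whether the comments belong inside the post document or in documents of their own is exactly the Q&A question above: it depends on whether they are updated together with the post.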

On object databases (the topic of Stefan Edlich’s talk): no convincing answer why OO databases haven’t entered the mainstream while OO programming has, even though it sounds like such a good idea. My impression was that they are a great tool for specific uses (high performance, huge scale), but exotic, commercial solutions with a high up-front investment.

Somehow I wonder if document-oriented databases will make it, when object-oriented DBs haven’t …

A talk not held: Neo4j – The Benefits of Graph Databases

There was no talk on graph databases, which are designed to store nodes and the relationships between them, as in social networks (nodes = people, relationships = connections/friendships). Slideshare to the rescue: it has Neo4j – The Benefits of Graph Databases. There was also a talk on Neo4j at NoSQLEast.
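To illustrate the nodes-and-relationships model from the social network example, here is a tiny Python sketch. It has nothing to do with Neo4j’s actual API: the `Graph` class, its methods and the names used are made up. It only shows why the model is a natural fit: a question like “who are the friends of my friends?” becomes a short walk along relationships instead of a series of joins.

```python
from collections import defaultdict


class Graph:
    """Hypothetical in-memory graph: people as nodes, friendships as edges."""

    def __init__(self):
        self.edges = defaultdict(set)  # person -> directly connected people

    def connect(self, a, b):
        # A friendship goes both ways, so store it in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def friends_of_friends(self, person):
        # Walk two steps along the relationships, then drop the person
        # and the people they are already directly connected to.
        direct = self.edges.get(person, set())
        result = set()
        for friend in direct:
            result |= self.edges[friend]
        return result - direct - {person}


if __name__ == "__main__":
    graph = Graph()
    graph.connect("alice", "bob")
    graph.connect("bob", "carol")
    graph.connect("carol", "dave")
    print(graph.friends_of_friends("alice"))  # {'carol'}
```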