CouchDb at Webtuesday Zurich

Wednesday, June 13. 2007

From the city centre of Zurich, it takes an average of 7 minutes to reach the major connecting points for travelling (main station, airport, motorways). There’s no other city in Europe that is as fast to get out of. People argue whether this is a good or a bad thing.

The Webtuesday people I met here in Zurich come from a couple of independent web-development shops and work on some of the top 20 Swiss websites. From the feedback, the talk was well received; overall, they found it interesting to learn about a new technology they could actually use. I’ll be putting up the slides for the talk soon.

What I found most interesting were the discussions after the talk, when a couple of guys spent some time trying to apply CouchDb to the problems they currently face. The first point of interest is that they could take CouchDb right now and create a working system, even though CouchDb is not yet recommended for production use. The other thing I find remarkable is that, because the problems they face are very different, each had a different idea for tweaking CouchDb to be an even better fit or to make it work for other, bigger problems as well.

Harry faces a lot of data. The way to scale a lot of data is to partition it smartly across multiple machines. He wants CouchDb to do that, and he’s lucky: while the actual implementation is not done yet, CouchDb is prepared to do exactly that transparently for you. Harry also wants CouchDb’s replication feature to be smart about partitions (instead of only databases), so that he can replicate only certain partitions to specified machines.
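
To make Harry’s idea a bit more concrete, here is a minimal Python sketch of hash-based partitioning with a per-partition placement table. None of this is CouchDb code or a CouchDb API: the function names, the node names and the PLACEMENT table are made up for illustration, and hashing the document id is just one common way to assign partitions.

```python
import hashlib
from typing import Dict, List


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Map a document id to a partition number in [0, num_partitions)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical placement table: which machines hold which partition.
# Replicating "only certain partitions to specified machines" would
# amount to editing this table.
PLACEMENT: Dict[int, List[str]] = {
    0: ["node-a", "node-b"],
    1: ["node-b"],
    2: ["node-c"],
    3: ["node-a", "node-c"],
}


def route(doc_id: str, placement: Dict[int, List[str]] = PLACEMENT) -> List[str]:
    """Return the machines that should store the given document."""
    partition = partition_for(doc_id, len(placement))
    return placement.get(partition, [])
```

Calling route("some-doc-id") returns the list of machines for that document’s partition. The same id always lands in the same partition, which is what lets a client or a proxy find the data again without a central lookup.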

Tristan takes it a bit further and wants to be able to rank data on the document level according to importance. If you then create a replicated cluster of CouchDb machines, the most important documents are replicated through the entire cluster, less important data is replicated only two or three times, and unimportant data isn’t replicated at all. I can’t tell if that’s easy to implement, but it certainly sounds doable. The resulting system would act very much like Google’s GFS. There’s another thing Tristan came up with, but I’ll save that for a later post.
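
A rough Python sketch of how such a policy might look. Everything here is an assumption made for illustration: the importance score between 0 and 1, the two thresholds, and the way nodes are picked (ranking them by a hash of document id plus node name, a technique usually called rendezvous hashing). It only shows the idea and says nothing about how CouchDb would implement it.

```python
import hashlib
from typing import List


def replica_count(importance: float, cluster_size: int) -> int:
    """Translate an importance score (assumed 0.0 to 1.0) into a number of copies."""
    if importance >= 0.8:           # hypothetical threshold: "most important"
        return cluster_size         # one copy on every machine
    if importance >= 0.3:           # hypothetical threshold: "less important"
        return min(3, cluster_size)
    return 1                        # only the original copy, no replication


def choose_nodes(doc_id: str, importance: float, nodes: List[str]) -> List[str]:
    """Pick which machines hold a document, deterministically per document."""
    count = replica_count(importance, len(nodes))
    # Rank nodes by a hash of (doc_id, node); different documents get
    # different rankings, so the load spreads across the cluster.
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha1(f"{doc_id}:{node}".encode("utf-8")).hexdigest(),
    )
    return ranked[:count]
```

With a five-machine cluster, a document scored 0.9 ends up on all five machines, one scored 0.5 on three of them, and one scored 0.1 on a single machine.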

Toby and Harry are also in need of a reliable store for binary data with some metadata. CouchDb is a natural fit because of its attachment support for documents. What they could use is a mod_rewrite-like system that maps pretty URLs to CouchDb resource URLs inside CouchDb, effectively creating user-friendly aliases. At some point Damien and I were looking into using yaws instead of the built-in HTTP server, and I hope (not having a net connection right now) that they have something like that in place already. What they would actually need is for CouchDb to be smart about storing attachments. Again, there’s lots of data and the possibility of duplicates is significant. Storing multiple copies of the same data on a single machine and then replicating them to other machines is not a good idea. They want CouchDb to detect automatically whether an attachment is already in a database and, if so, store only a reference to it when more versions come in. With a small middleware script this can be done easily, but they want to deploy CouchDb directly on the web, distributing binary files without any middleware layer that makes things complicated.
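
The duplicate detection they ask for is essentially content-addressed storage: identify each attachment by a hash of its bytes and keep only references to it. Here is a small in-memory Python sketch of that idea, roughly what the middleware script mentioned above might do. The class and method names are hypothetical and this is not how CouchDb stores attachments.

```python
import hashlib
from typing import Dict, Tuple


class DedupAttachmentStore:
    """Toy attachment store that keeps each distinct blob only once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}            # content hash -> data
        self._refs: Dict[Tuple[str, str], str] = {}   # (doc id, name) -> content hash

    def put(self, doc_id: str, name: str, data: bytes) -> str:
        """Attach data to a document; identical data is stored only once."""
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        # The document itself only holds a reference to the content.
        self._refs[(doc_id, name)] = digest
        return digest

    def get(self, doc_id: str, name: str) -> bytes:
        """Fetch an attachment by following its reference."""
        return self._blobs[self._refs[(doc_id, name)]]

    def blob_count(self) -> int:
        """Number of distinct blobs actually stored."""
        return len(self._blobs)
```

If two documents attach the same file, put() returns the same hash both times and blob_count() stays at 1, so only one copy would have to be replicated to other machines.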

Urban Dominic Müller, of brainfuck fame, is specifically interested in how the lookup index of views works, ideally being able to tweak things there.

And one last thing: when somebody asked how to implement something and the solution wasn’t obvious, they tried to apply a pattern they use in a traditional database, which is not directly portable to CouchDb. This is intended! But there’s a need for a cookbook-like resource that tells you how to use CouchDb when you are stuck. I hope to get that into the wiki.

This is all very interesting stuff and really needed to bring CouchDb forward. If you have something to say about how to improve CouchDb to make it a better fit for your problem, please do let us know. We can’t promise your pet-feature gets in, but we’re actively looking for any input we can get. Thanks in advance!

Also, thanks a lot to Christian, Harry and Martin for organizing this (and the beer). Special thanks to Liip for sponsoring my train ride, to namics for providing a place to meet and, finally, to Silvan for providing dinner and a bed. The trip was well worth it!

Oh, and this talk was the first time I publicly talked about The Couch Book. I’m working on an eBook for all things CouchDb and I hope it will be ready when CouchDb becomes stable.
