There are so many ways to have a play with CouchDB. This time I thought about using CouchDB as a TileCache storage. Sounds easy, so it was.

What is a tilecache

Everyone knows Google Maps and its small images, called tiles. Rendering those tiles for the whole world for every zoom level can be quite time consuming, therefore you can render them on demand and cache them once they are rendered. This is the business of a tilecache.

You can use the tilecache as a proxy to a remote tile server as well, that’s what I did for this benchmark.

Coding

The implementation looks quite similar to the memcache one. I haven’t implemented locking as I was just after something working, not a full-fledged backend.

When I finished coding, it was time to find out how it performs. That should be easy, as there’s a tilecache_seeding script bundled with TileCache to fill the cache. So you fill the cache, then you switch the remote server off and test how long it takes if all requests are hits without any fails (i.e. all tiles are in your cache and don’t need to be requested from a remote server).

The two contestants for the benchmark are the CouchDB backend and the one that stores the tiles directly on the filesystem.

Everyone loves numbers

We keep it simple and measure the time for seeding with time. How long will it take to request 780 tiles? The first number is the average (in seconds), the one in brackets the standard deviation.

Filesystem:

real 0.35 (0.04)
user 0.16 (0.02)
sys 0.05 (0.01)

CouchDB:

real 3.03 (0.18)
user 0.96 (0.05)
sys 0.21 (0.03)

Let’s say CouchDB is 10 times slower that the file system based cache. Wow, CouchDB really sucks! Why would you use it as tile storage? Although you could:

easily store metadata with every tile, like a date when it should expire.

keep a history of tiles and show them as « travel through time layers » in your mapping application

easy replication to other servers

You just don’t want such a slow hog. And those CouchDB people try to tell me that CouchDB would be fast. Pha!

Really??

You might already wonder, where the details are, the software version numbers, the specification of the system and all that stuff? These things are missing with a good reason. This benchmark just isn’t right, even if I would add these details. The problem lies some layers deeper.

This benchmark is way to far away from a real-life usage. You would request much more tiles and not the same 780 ones with every run. When I was benchmarking the filesystem cache, all tiles were already in the system’s cache, therefore it was that fast.

Simple solution: clear the system cache and run the tests again. Here are the results after as echo 3 > /proc/sys/vm/drop_caches

Filesystem:

real 8.36 (0.71)
user 0.29 (0.04)
sys 0.18 (0.03)

CouchDB:

real 6.64 (0.15)
user 1.13 (0.07)
sys 0.29 (0.06)

Wow, the CouchDB cache is faster than the filesystem cache. Too nice to be true. The reason is easy: loading the CouchDB database file, thus one file access on the disk, is way faster that 780 accesses.

Does it really matter?

Let’s take the first benchmark, if CouchDB would be that much slower, but isn’t it perhaps fast enough? Even with those measures (ten times slower than the filesystem cache) it would mean your cache can take 250 requests per second. Let’s say a user requests 9 tiles per second it would be about 25 users at the same time. With every user staying 2 minutes on the map it would mean 18 000 users per day. Not bad.

Additionally you gain some nice things you won’t have with other caches (as outlined above). And if you really need more performance you could always dump the tiles to the filesystem with a cron job.