That Syncing Feeling – Safety Is Expensive

Recently I’ve been thinking about storing records on disk quickly. For my general use case, an RDBMS isn’t quite fast enough. My first thought, probably like that of many a NoSQL person before me, is: how fast can I go if I give up ACID?

Even if I’m intending to do the final implementation in C++, I’ll often experiment in Perl first; the same C libraries are usually available from Perl anyway.

First up: how about serialising records with Storable into an on-disk hash table such as Berkeley DB?

(Aside: I’m probably going to appear obsessed with benchmarking now, but really I’m just sticking a finger in the air to get an idea about how various approaches perform. I can estimate 90cm given a metre stick. I don’t need a more precise way to do a rough estimate.)
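The shape of the experiment is roughly this: serialise each record with Storable and push it into a Berkeley DB file via the BerkeleyDB module. What follows is a minimal sketch rather than the actual benchmark code; the file name and record structure are invented for illustration.

    use strict;
    use warnings;
    use BerkeleyDB;
    use Storable qw(freeze);

    # Open (or create) an on-disk hash; 'records.db' is a made-up name.
    my $db = BerkeleyDB::Hash->new(
        -Filename => 'records.db',
        -Flags    => DB_CREATE,
    ) or die "Cannot open records.db: $BerkeleyDB::Error";

    # Serialise each record with Storable and store it under its key.
    for my $id (1 .. 100_000) {
        my $record = { id => $id, payload => "record $id" };
        $db->db_put($id, freeze($record));
    }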

The Results

Conclusion

Unsurprisingly, syncing is expensive: it adds 400% overhead. However, even with the sync, we’re still able to store 5.5 million records an hour (a little over 1,500 a second). Is that fast enough for me? (I do need some level of reliability.) It might well be.
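For clarity, the “sync” here means flushing Berkeley DB’s cache to disk after every write, so a crash loses at most the record in flight. Reusing the handle from the sketch above, the durable variant is just one extra call; db_sync is the real BerkeleyDB.pm method, everything else is illustrative.

    # Durable variant: flush Berkeley DB's cache to disk after the write,
    # so a crash loses at most the record in flight. This extra flush is
    # where the roughly 400% overhead comes from.
    $db->db_put($id, freeze($record));
    $db->db_sync();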

Berkeley DB is fast. It adds only 170% overhead on top of the serialisation itself. I’m impressed.

In case anyone is interested, I ran a more comprehensive set of benchmarks.


4 Responses

* If you don’t need the full power of Storable, try JSON::XS: in my benchmarks it is faster than Storable, and the end result is human-readable and reusable from other languages (see the sketch after this list);
* Berkeley DB is not the fastest disk DB in town. I would also experiment with Tokyo Cabinet (http://fallabs.com/tokyocabinet/);
* Also interesting to try would be Redis. Given that it runs in a separate process, you should be able to get decent performance with pipelined commands.
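For anyone who wants to check the JSON::XS claim themselves, a comparison takes only a few lines with the core Benchmark module. A rough sketch (the record structure is invented, and this measures serialisation only, not disk writes):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use JSON::XS  qw(encode_json);
    use Storable  qw(freeze);

    my $record = { id => 42, payload => 'a record', created => time() };

    # A negative count means "run each sub for about 3 CPU seconds",
    # then print a table comparing the rates.
    cmpthese(-3, {
        storable => sub { my $frozen = freeze($record) },
        json_xs  => sub { my $json   = encode_json($record) },
    });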

I haven’t done the like-for-like RDBMS comparison. However, I do have an RDBMS-based system that I’m looking to replace; it can handle a peak of about 120 updates/sec on hardware significantly better than my laptop’s, and it’s also doing a lot more work than the benchmark demonstrates. If I added the rest of the updates and the required indices, I suspect I’d get about 500 updates/sec with Berkeley DB, syncing after every request. That probably isn’t quite fast enough for my needs.