Historical data: LevelDB versus MySQL

So, I’ve got me a Cubietruck with a 60 GB SSD attached to it. On the SSD there’s a LevelDB database with historical data for 121 device values. The total number of keys stored in this database is about 1,050,000. It takes about 8 minutes to fill this database from scratch with an MS SQL Server table as ‘source’.

That means I can put (write) more than 2,000 entries into the database per second. That will do. Just kidding – that’s more than I’ll ever need, of course. But what about the get (read) performance? Being able to read only 2,000 values per second from a database is not that much, so I hope the read performance is even better. A small test showed me that LevelDB could produce about 4,500 values per second. Would it be possible to improve this? A lot, please?
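The write figure is simple arithmetic on the numbers above – roughly a million keys in roughly eight minutes:

```javascript
// Rough throughput math behind the numbers above; the values are the
// approximate figures from the post itself.
const totalKeys = 1050000;   // keys in the LevelDB database
const fillSeconds = 8 * 60;  // ~8 minutes to fill it from the SQL Server source

const writesPerSecond = Math.round(totalKeys / fillSeconds);
console.log(writesPerSecond); // ≈ 2188, i.e. "more than 2,000 entries per second"
```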

So I decided to do some testing, so that I won’t end up with a user interface with charts that take ages to load. Therefore I wrote a small script that queries the database to retrieve everything for each of the device values in the database. The result set, retrieved with LevelUP’s createReadStream() function, can have a size in the range of 4 to ~50,000 values. By counting the number of returned values and the time needed to get them (and push them to an array for further processing), I could get an indication of how fast LevelDB really is, based on a real database (not just a bunch of “123456abcdef” keys…) and on result set sizes ranging from ‘very small’ to ‘very large’. As in real life 😉

And why not do the same test with MySQL and see how that one performs as well? With a MySQL installation ‘out of the box’ (so no performance-enhancing tricks) I created a database with a single table in it. It contains the same data as the LevelDB version: device value ID, timestamp and value (and some more). And an index of course:
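The actual DDL isn’t shown here, but based on the description (device value ID, timestamp, value, plus an index for per-device range queries) the table could look something like this – the table and column names are my assumptions, not the post’s:

```sql
-- Hypothetical schema matching the description above.
CREATE TABLE history (
  deviceValueId INT      NOT NULL,
  ts            DATETIME NOT NULL,
  value         DOUBLE,
  PRIMARY KEY (deviceValueId, ts)
);
-- The composite primary key doubles as the index for
-- "everything for one device value" range scans.
```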

Filling this table with the same data as the LevelDB database took a bit longer: 56 minutes vs. 04:19 for LevelDB… I hope this isn’t indicative of the read performance 😉

Well, here are the numbers:

The numbers presented show the average number of rows per second each database can deliver in the specified size range of the result set. An example for LevelDB: 100 createReadStream() calls resulted in a total of 62,606 values in 13,122 milliseconds, i.e. an average of 4,771 values/second. MySQL: the same number of query() calls resulted in a total of 62,606 values in 5,996 milliseconds, i.e. an average of 10,441 values/second.
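For clarity, the averaging is nothing fancy – total values divided by total elapsed time, computed directly from the totals above:

```javascript
// Average throughput: total values divided by total time (in seconds).
// The input numbers are the totals from the post.
const perSecond = (values, ms) => Math.round(values / (ms / 1000));

console.log(perSecond(62606, 13122)); // LevelDB: ≈ 4771 values/second
console.log(perSecond(62606, 5996));  // MySQL:   ≈ 10441 values/second
```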

The size ranges with which I tested may seem a bit strange, but that’s because I just didn’t have any historical data for a single device in the 5k…10k range – the gap between historical data that’s stored temporarily and the historical data that’s stored forever.

The numbers show a clear winner: MySQL. I guess that using a ‘traditional’ RDBMS for my historical data is not such a bad idea after all…

This is not what I expected actually. All those hyper-new database engines must have some benefits – cause what other reason would make them so popular lately? Well, I didn’t find them… at least not with my hardware, my historical data characteristics and my priorities … and since every second counts, I’m going for MySQL!

6 Responses to Historical data: LevelDB versus MySQL

This is not a fair comparison. LevelDB is an in-process DB, which in the case of node.js means utilizing 1 core. MySQL runs as a separate multithreaded process utilizing all available cores. A better comparison would be LevelDB vs SQLite, or MongoDB vs MySQL. The reason these NoSQL databases are so popular is that they do not require a predefined data model, and their eventually consistent characteristics enable the creation of distributed databases without distance constraints.

In my case it is. This is not a contest, it’s about choosing a tool. 😉

What’s wrong with comparing 2 different ways to store my historical data, looking at the results and picking the one that best fits my priorities? I shouldn’t do that at all? Right. Despite the differences between the 2, for me the goal is the same: storing historical data. Nothing more, nothing less.

Comparing tools is never unfair, especially from ‘the user perspective’. Just like it’s fair to compare a hammer with a chainsaw when you’re trying to find the best tool to destroy something. I (“the user”) don’t care how they work, I just need the best tool for the job.

You are absolutely right in your situation, however I was referring to your rather blunt statement “All those hyper-new database engines must have some benefits – cause what other reason would make them so popular lately? Well, I didn’t find them… “.

The reason for that is that you’ve totally missed the point of my post. Maybe you should take the time to really read it, because I don’t think you did.

I don’t care whether it’s in-process or not, because I’ll use any DB I want to. Even dBase III, if that would perform much better than the rest… 😉 But for some reason I chose the solution that works (surprise, surprise) best in my situation. There is no such thing as one size fits all in this, because the choices one can make are there for a reason. I used LevelDB as-is, just like MySQL. BTW, I don’t have 12 CPUs @ 2.67 GHz and I sure don’t have 144 GB of RAM. Talking about comparing? I don’t know why I should have a look at the numbers you linked to, because they don’t tell me anything about my own situation.

This is Home Automation related, I guess you missed that one too. Please, take the time to place things in perspective before you start judging and posting irrelevant benchmarks. All I need is the fastest DB engine for my setup. And if you’re trying to convince me that LevelDB should be the one I should choose based on the results, well, in that case you’re just ignoring the facts I presented. And I can’t think of any reason why I (or you) should ignore those facts. Because you don’t question those facts… or do you?