Now that I’ve addressed some new NewSQL entrants, namely NuoDB and GenieDB, it’s time to circle back to some more established ones. First up are my clients at Tokutek, about whom I recently wrote:

Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.

That’s all been true since I first wrote about Tokutek and TokuDB in 2009. However, TokuDB’s technical details have changed. In particular, Tokutek has deemphasized the ideas that:

Vaguely justified the “fractal” metaphor, namely …

… the stuff in that post about having one block each sized for each power of 2, …

… which seem to be a form of what is more ordinarily called “cache-oblivious” technology.

Rather, Tokutek’s new focus for getting the same benefits is to provide a separate buffer for each node of a B-tree. In essence, Tokutek is taking the usual “big blocks are better” story and extending it to indexes. TokuDB also uses block-level compression. Notes on that include:

It’s LZMA.

It’s expensive to write, cheap to read.

5X compression is common, 9X happens, and higher figures yet happen in a few edge cases.

LZMA detects and compresses repeated values, so it has some of the benefits of tokenization.

However, TokuDB has to decompress data before operating on it.
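To make the compression notes concrete, here is a minimal sketch of block-level LZMA compression using Python’s standard `lzma` module. The row format, the `compress_block` and `read_block` helpers, and the preset are all assumptions for illustration; this is not TokuDB’s on-disk format.

```python
import lzma

def compress_block(rows: list[bytes]) -> bytes:
    """Compress a whole block of rows at once (the expensive, write-side step)."""
    raw = b"\n".join(rows)
    return lzma.compress(raw, preset=6)  # preset is an arbitrary choice here

def read_block(blob: bytes) -> list[bytes]:
    """Decompress the entire block before any row can be examined."""
    return lzma.decompress(blob).split(b"\n")

# Hypothetical rows with heavily repeated column values.
rows = [f"{i},US,active,free".encode() for i in range(10_000)]
blob = compress_block(rows)

raw_size = len(b"\n".join(rows))
print(f"raw={raw_size} bytes, compressed={len(blob)} bytes, "
      f"ratio={raw_size / len(blob):.1f}x")
```

The repeated `US,active,free` values are what LZMA exploits, which is the tokenization-like benefit noted above. The ratio this prints depends entirely on the made-up data and says nothing about the 5X or 9X figures.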

Somewhat like NuoDB, Tokutek talks in terms of sending messages to blocks. The TokuDB durability story involves streaming messages to disk and also checkpointing all dirty blocks to disk every minute or so. Further, TokuDB has an online schema change approach based on broadcasting messages about various column operations (delete, add with default value, etc.).
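The “buffer per node” and “messages to blocks” ideas can be illustrated with a toy in-memory tree. This is only a sketch under my own assumptions: the `Node` class, the message tuples, and the `capacity` setting are hypothetical, and it does not model durability, checkpointing, or broadcast schema-change messages.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    pivots: list = field(default_factory=list)    # split keys; empty for a leaf
    children: list = field(default_factory=list)  # len(pivots) + 1 children if interior
    buffer: list = field(default_factory=list)    # pending (op, key, value) messages
    rows: dict = field(default_factory=dict)      # key -> value, used by leaves only
    capacity: int = 4                             # hypothetical buffer size

    def is_leaf(self):
        return not self.children

    def _child_for(self, key):
        return self.children[bisect_right(self.pivots, key)]

    def send(self, msg):
        """Leaves apply a message at once; interior nodes buffer it."""
        if self.is_leaf():
            op, key, value = msg
            if op == "upsert":
                self.rows[key] = value
            elif op == "delete":
                self.rows.pop(key, None)
            return
        self.buffer.append(msg)
        if len(self.buffer) > self.capacity:
            self.flush()

    def flush(self):
        """Push all buffered messages one level down, in arrival order."""
        pending, self.buffer = self.buffer, []
        for msg in pending:
            self._child_for(msg[1]).send(msg)

    def get(self, key):
        """Read the value below, then overlay this node's newer pending messages."""
        if self.is_leaf():
            return self.rows.get(key)
        result = self._child_for(key).get(key)
        for op, k, value in self.buffer:
            if k == key:
                result = value if op == "upsert" else None
        return result

root = Node(pivots=[100], children=[Node(), Node()], capacity=2)
root.send(("upsert", 5, "a"))
root.send(("upsert", 150, "b"))
print(root.get(5), root.children[0].rows)  # answered from the root buffer; leaf untouched
```

The point of the design is that a write touches only the top of the tree, and many messages move down together in one flush, so the cost of reaching a leaf is shared across a batch of updates.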

Beyond that:

Like most other RDBMS vendors I talk with, Tokutek goes for MVCC (Multi-Version Concurrency Control), if for no other reason than to obviate a need for read locks.

TokuDB doesn’t have much in the way of a scale-out story. But as for any other NewSQL vendor of whom that’s true — e.g. Akiban — expect that to change. And even if it doesn’t, one could use TokuDB in conjunction with a transparent sharding tool such as dbShards.

For more technical detail, Tokutek offers a web page with several detailed slide decks and so on.
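As for the MVCC point above, the reason it removes the need for read locks can be shown with a toy versioned store. This is a single-threaded sketch under my own assumptions (the `MVCCStore` class, one write per transaction, a simple counter as the clock), not a description of TokuDB’s implementation.

```python
import itertools

class MVCCStore:
    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # hypothetical logical clock
        self._last_commit = 0

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        """A reader remembers the latest commit timestamp it is allowed to see."""
        return self._last_commit

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("k", "v1")
snap = store.snapshot()
store.write("k", "v2")  # a later writer does not disturb the earlier reader
print(store.read("k", snap), store.read("k", store.snapshot()))
```

Because the reader holding `snap` keeps seeing the old version, it never has to block the writer or be blocked by it.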

And finally, Tokutek company basics include:

15-16 employees.

A few more paying customers than those logoed on its website.

Free customers beyond that. (TokuDB is free under 50 GB.)

Notwithstanding the meaninglessness of the phrase, “Fractal Tree indexing” is Tokutek’s story and it’s sticking to it.

Comments

It’s also worth noting that TokuDB has horrendous bugs in their mutex code. On a server handling 1,100 clients simultaneously (each doing small batches of upserts), we were seeing MySQL fall over every 90 seconds or so, always in the TokuDB mutex code.

Even with only two connections (one doing reads from one table and upserts into another table, the other loading a mysqldump), we’ve seen the mutex code kill mysqld.

As promising as their performance characteristics are, they’re just way too unstable to rely on right now.

I’m sorry that you seem to have hit a bug that affects your ability to use thousands of concurrent connections. Thanks for sharing your feedback so we can resolve any open issues.

In terms of our ability to handle large thread counts, we run TPC-C and Sysbench with up to 1024 connections as part of our ongoing development process, run some tests with 2048, and have not found this particular problem. Our published Sysbench benchmark goes to 1024 and the results can be found on our benchmarks page (http://www.tokutek.com/resources/benchmark-results/benchmarks-vs-innodb-hdds/). Towards the end of the page you’ll find the command line we used, which would allow anyone to reproduce the tests in their own environments.

If you could provide us with a reproducible case that we can use to debug our software, we’ll get to work on it.

With regard to the pricing issue, the shift from capacity-based pricing to server-based pricing was a business decision we took to better align ourselves with pricing in the MySQL ecosystem. TokuDB continues to be free for development and so far the reaction to the new server-based pricing has been great.