Two different vendors recently tried to inflict benchmarks on me. Both were YCSBs, so I decided to look up what the YCSB (Yahoo! Cloud Serving Benchmark) actually is. It turns out that the YCSB:

Was developed by — you guessed it! — Yahoo.

Is meant to simulate workloads that fetch web pages, including the writing portions of those workloads.

Was developed with NoSQL data managers in mind.

Bakes in one kind of sensitivity analysis — latency vs. throughput.

Is implemented in extensible open source code.

That actually sounds pretty good, especially the extensibility part;* it’s likely that the YCSB can be useful in a variety of product selection scenarios. Still, as recent examples show, benchmark marketing is an annoying blight upon the database industry.

*With extensibility you can test your own workloads and do your own sensitivity analyses.

A YCSB overview page features links both to the code and to the original explanatory paper. The clearest explanation of the YCSB I found there was:

Each operation against the data store is randomly chosen to be one of:

Insert: Insert a new record.

Update: Update a record by replacing the value of one field.

Read: Read a record, either one randomly chosen field or all fields.

Scan: Scan records in order, starting at a randomly chosen record key. The number of records to scan is randomly chosen.
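That operation mix is configurable per workload. As a rough illustration (the proportions below are made up, not YCSB's actual workload definitions), the driver's core loop amounts to a weighted random choice among those four operations:

```python
import random

# Illustrative operation mix; real YCSB sets these proportions in a
# workload properties file (e.g. workload A is 50% reads / 50% updates).
MIX = [
    ("insert", 0.05),
    ("update", 0.25),
    ("read",   0.65),
    ("scan",   0.05),
]

def choose_operation(rng=random):
    """Pick the next operation by walking the cumulative weights."""
    r = rng.random()
    cumulative = 0.0
    for op, weight in MIX:
        cumulative += weight
        if r < cumulative:
            return op
    return MIX[-1][0]  # guard against floating-point rounding

counts = {op: 0 for op, _ in MIX}
for _ in range(10_000):
    counts[choose_operation()] += 1
print(counts)  # roughly proportional to the weights above
```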

As was anyway obvious from the benchmark’s purpose, there’s nothing about joins, distributed transactions, or other hallmarks of OLTP (OnLine Transaction Processing).

NuoDB generated some mediocre YCSB results, then made a big fuss because NuoDB got those numbers while operating through SQL. Blech. I guess they proved that NuoDB’s SQL parsing/execution layer is better than the worst thing one can imagine an undergraduate writing as a homework project, but otherwise little substance was demonstrated.

Aerospike’s YCSB story isn’t as bad. Aerospike seems to have used the benchmark pretty much the way it was intended, and produced numbers that look better than NuoDB’s. Still, a few vertical markets aside, why does it matter how far under 10 milliseconds latency can get?* Further, Aerospike managed a 60 GB database with 30 GB of RAM per server, which is an awkward fit with its “You don’t need to put everything in RAM because we’re so fast on flash memory” positioning.

*If you really do care about that, maybe your app shouldn’t be making so many round trips.

So once again, I stand by my position that benchmark marketing is an annoying waste of everybody’s time.

Comments

On the latency note, VoltDB prior to 3.0 used to get 5-10ms of latency per transaction in a typical cluster configuration. This became a sales problem for any customer unwilling or unable to embrace asynchronous client development. Many people evaluating VoltDB would ask us why it would only run 200 txns per second, not realizing how very idle the server was. For 3.0, we re-architected how transactions are ordered and processed and now latency is often well under 1ms. Old and new customers seem pleased. Now that doesn’t help much with the HFT crowd, as they often have a 100us or so budget for their whole stack.

On the benchmarketing side, I agree there’s a lot of sleight of hand in these kinds of things, but prior to NuoDB’s launch, I was very interested to see what kind of benchmark they would choose to run on their system. Even if you ignore the actual numbers, choosing a tweaked YCSB with 95% reads says something about what they think the system will and won’t be good at. How much they say about fault tolerance or durability is also interesting.

As for latency — interesting. I’ll confess that it never occurred to me that heavily multi-user applications would let users bottleneck each other if it could be avoided. That would seem like Concurrency 101.

A frustrating problem we had with 5 ms latency involved evaluating users writing semi-trivial apps to test out VoltDB.

The evaluating user would write a synchronous app prototype with one server and a local client and get 5,000 tps. Then they would run the same client code against a full cluster and get 200 tps. Explaining over and over that this was due to latency grew tiresome. 200 tps is actually worse than MySQL does at synchronous workloads.
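The 5,000-vs-200 gap is just arithmetic: a synchronous client has one request in flight at a time, so its throughput is capped at concurrency divided by latency (Little's law). A quick sketch, assuming the local prototype's 5,000 tps implies a roughly 0.2 ms round trip:

```python
def max_synchronous_tps(latency_seconds, concurrent_clients=1):
    """Upper bound on throughput when each client waits for one
    request to complete before issuing the next
    (Little's law: throughput = concurrency / latency)."""
    return concurrent_clients / latency_seconds

# Local single-node prototype, ~0.2 ms round trip:
print(round(max_synchronous_tps(0.0002)))  # 5000
# Same synchronous client against a cluster with ~5 ms latency:
print(round(max_synchronous_tps(0.005)))   # 200
# The fix: pipeline requests (async) or run many concurrent clients.
print(round(max_synchronous_tps(0.005, concurrent_clients=500)))  # 100000
```

This is also why making the app asynchronous recovers the throughput: the server was never the bottleneck, the idle wait between requests was.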

If we could keep talking to them long enough to make their app asynchronous, they could see 6 and 7 figure TPS numbers pretty easily. With 3.0, these apps still run better when asynchronous, but the synchronous numbers are now as good or better than anybody else’s.

In addition to evaluators, we also see people building applications that are latency-sensitive without being HFT-sensitive. Digital ad-tech is a great example of this. When drawing an ad for the user, 5 ms is a big chunk of your page-display budget; 1 ms is not a big deal.

“If you really do care about that, maybe your app shouldn’t be making so many round trips.”

My colleague, Ryan Betts, pointed out that this is one of the ways VoltDB stored procedures are so valuable. Some of our digital ad-tech customers put a fair bit of logic in a stored procedure, significantly reducing round-trips to the app-server and thereby reducing total latency in pushing out an ad. Things like real-time bidding can be done entirely inside a single VoltDB procedure, with serializable consistency.

You mention that “Aerospike managed a 60 GB database with 30 GB of RAM per server,” but that’s not quite true. My company, Thumbtack Technology, published this report, and the database sizing decisions were our own. You’re right that 30 GB of RAM seems like a lot for 60 GB of disk when testing Aerospike, but it’s also absolutely too small for running the same test on Couchbase. There are a lot of little compromises that need to be made when trying to create a useful baseline for meaningful comparison across databases, and this is one of them.

Of course, we do have a lot of information about how each of the databases performs in isolation, using configurations that might fit better with their individual positioning. The numbers we published reflected our best attempt at an apples-to-apples comparison between databases. This is a judgment call in many ways, but we settled on configurations that yielded performance similar to what we were seeing in more individually tuned scenarios.

I’m almost certain that all the vendors in our study have ideal configurations that would better serve their own marketing agendas. But I’d argue that the report’s results are truly useful in that they provide independent insight into which kinds of databases do what kinds of things well. We have clients approaching us all the time for this advice, and they definitely do not have this knowledge already.

“If you really do care about [10 milliseconds latency], maybe your app shouldn’t be making so many round trips.”

It’s funny how much I agree with this comment. Yet those crazy app developers keep demanding more database power to create richer user experiences, and that is what drives the thirst for better databases.

App developers have a point, and we should make them happy — even if, with a bit of thinking, they could make their application more efficient.

People _like_ little widgets with recent activity by their friends. They _like_ a personalized view of their “timeline”. Geography based “daily deals”. A live view of the “likes and dislikes”. How many times an item has been referenced in social media. _And_ snappy response times.

These cute little widgets require more and more database lookups, touching more data than the competition. At one company I observed, a simple page required 70 individual database fetches. Cache – memcache – helps, but these “crazy app developers” push that envelope. I’ve seen multiple companies with more than 4 TB of memcache – lots of servers, lots of power, care and feeding.

Brian, are you suggesting that those 70 lookups have to be made in series? If so, why?

Srini V. Srinivasan on
January 18th, 2013 11:25 pm

“But why is 5 ms a long time in digital advertising? Yes, response time is critical — but even 50-100 ms total on a page shouldn’t be that terribly big of a deal … ”

Curt, this is a very good question. With real-time bidding (RTB) taking over a large part of the display advertising space, there is usually a 50ms budget to run the real-time auction to decide a winning display ad.

In this 50 ms, the bidders have to look up information about the user (things like what actions this user has done in the past few seconds can even be used to determine bid values, for example). Using the user specific data from the database, these RTB algorithms typically perform some kind of intelligent determination to decide whether to bid or not and how much to bid. If the bidding process takes too long, the bid is lost anyway.

The user profile database lookup is only one part of the bidding calculation and the budget is usually for a few of these (some can be done in parallel, some cannot). 20ms is definitely too long. 10ms may be sufficient in some cases but 5ms is a comfortable number for meaningful bid algorithms to be run on real-time data.

Note that 5ms is not the average latency – usually 99.9% of requests need to come in under 5ms to guarantee bid times of under 50ms.
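One way to see why the percentile matters more than the average: when several lookups sit in a bid's critical path, the chance that at least one of them misses its target compounds. A small sketch (the five-lookup count is illustrative, following the "budget is usually for a few of these" remark above):

```python
def prob_all_within_budget(p_single_ok, sequential_lookups):
    """Probability that every lookup in the chain meets its latency
    target, assuming each one does so independently with probability
    p_single_ok."""
    return p_single_ok ** sequential_lookups

# Five lookups in the critical path, each meeting a 5 ms target
# 99.9% of the time:
p = prob_all_within_budget(0.999, 5)
print(f"{p:.4f}")  # 0.9950
print(f"{1 - p:.2%} of bids see at least one slow lookup")
```

So even a 99.9% per-lookup target leaves roughly half a percent of bids with at least one slow lookup; a looser percentile would eat the 50 ms auction budget far more often.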

What makes the 5ms latency a demanding task is the need for the database to handle a high rate of updates while still returning the latest data on each read. Making the wrong decision on stale data could mean throwing money down the drain. Not being able to bid at all means no money is being made.

After the winning bid happens, the system probably has 100ms or so to deliver the ads themselves and render these on the browser. Otherwise, the user will be gone by then.

Just multiply this process by the billions and billions of impressions by internet users every day and you get the picture as to the magnitude of this problem.

I should look into this real-time-bidding stuff in more detail. It’s a long-time interest of mine. Indeed, one of the closest things I have to an academic publication was a talk at a conference on auctions and bidding.

Curt,
I am playing around with nuodb because their sales force has been visiting… so not a nuodb defender by any means. I am curious about your description of their benchmark (YCSB) as mediocre. One million TPS sounds impressive on the face of it, although I am skeptical since it sounds like they might have just tested their cache over 24 hosts, basically. Not quite enough detail on their brochure. Could you elaborate at all?
Thanks
Allen

It’s been long enough that I’ve happily forgotten the details, but IIRC the TPS/node wasn’t that high.

joe on
September 21st, 2014 7:00 am

Any benchmark comparing MonetDB, HyperDex, Aerospike, DynamoDB, Voldemort, VoltDB or ExtremeDB?
All claim to be faster than Cassandra or Redis but I would like to see some comparison against each other.