But how did that happen? There's an interesting story there. Let's tell that story and benchmark with Rails Ruby Bench, shall we? (Don't care about the story? Scroll down to the end for graphs of the speed differences.)

A Beginning and Some Dueling Banjos

It begins with Vladimir Makarov proposing open addressing for Ruby's hash tables and including a patch. Open addressing is a better match for modern multilevel CPU caches than Ruby's previous method. That was very nice of him. Thank you, Vladimir! (Here's his explanation of the hash table changes.)

Is that the end of the story? Not so much.

Koichi points out that his very first patch wasn't perfect, and increased memory usage in some cases (true.) Nobu and Yura Sokolov (funny_falcon) point out some other minor problems. Feedback happens, especially with a large patch, or one that touches very common functionality like hash tables.

Vladimir responded, more back-and-forth ensued, and funny_falcon continued to engage more and talk about how he'd have done it (he didn't think open addressing was necessary, for instance, and that he could get similar results without it.) Vladimir responded to him. There was a highly-technical argument, mostly good-natured, going strong. And eventually less good-natured. It's easy for tempers to run hot in technical discussions -- I do the same thing, and they clearly understood what was going on. Isn't it wonderful to watch engineers doing what they feel passionate about, showing that they care but also acknowledging that we all want the same thing? I love watching that.

If you have time, read through the whole thread. The back-and-forth is wonderful, and highly educational -- "you should use quadratic probing," "here's the wikipedia article for...," "I disagree that this should be int32," "test large inserts, does the time grow linearly?" It's not just a great deep dive into hash tables. It's a great study in passionate disagreement between highly skilled engineers.

It also involved Vladimir and Yura proposing and counter-proposing patches with different good and bad points, back and forth, and critiquing each other's code constantly. Who had the better hash table implementation?

Eventually Shyouhei and Koichi (prominent core Ruby committers) looked over the results and checked for errors. The patches continued to improve, and the edge cases kept getting fixed. Either Yura's or Vladimir's patch might win. Each had taken tricks from the other.

And it appears that Vladimir enjoyed his time modifying Ruby - he's recently put together a whole new Ruby VM, still in early development, that significantly improves overall speed! Unfortunately, it's not ready to run Rails yet so I can't measure it with Rails Ruby Bench. Soon, perhaps?

How To Measure?

I thought, "I'll check how much faster the patch is using my Discourse-based Ruby benchmark!" Koichi, like the built-in Discourse benchmarks, tends to microbenchmark by testing the same URLs many times, while my benchmark tries to simulate a realistic, varied, highly-concurrent workload.

It works, though! As before, this benchmark uses a multithreaded load-test program, a vigorously multiprocess and multithreaded Puma and Discourse, running flat-out on an EC2 m4.2xlarge dedicated instance.

I've doubled the number of HTTP requests per run from 1500 to 3000. With newer Ruby and Rails versions, the benchmark runs quite quickly, and some randomness and slowdowns that were "in the noise" are now big enough to see in my graphs. Running more requests is giving me more predictable results in return for a bit more CPU time.

How Fast?

Like my previous articles, I used a mixed grouping of Discourse HTTP requests. The per-request speedup is subtle enough to be hard to see:

Yes, all the right-hand bars are a little lower at every percentile. No, not much lower. so: new-style hash tables improve each request, but not by a huge amount.

I can see the difference in the full thread runtimes better. Perhaps you can too:

This is how long it takes for 100 consecutive requests. The per-request speedup adds up. These are the shortest, median and longest times to process 100 requests during a benchmark run. Left side are old-style closed addressing hash results, right are open-addressing.

The median request with closed addressing (old-style) hash tables takes 0.134 seconds, and the 90th percentile takes 0.371 seconds (see below.) With open addressing, that's 0.127 median and 0.355 for the 90th percentile. In both cases, that's about a 5% speedup -- not for just the hash operations, but for the entire Rails request time. That's not bad.

The median run with closed addressing hash tables takes 17.312 seconds, and the 90th percentile takes 17.95 seconds. With open addressing it's 16.577 median and 17.417 for the 90th percentile. That's also around 5% speedup, give or take.

Metric

Closed

Open

Median Req

0.134

0.127

90th Pct Req

0.371

0.355

Median Full Run

17.312

16.577

90th Pct Full Run

17.950

17.417

(As always, feel free to request my JSON data files, or to spin up an m4.2xlarge dedicated instance with the benchmark code and try for yourself.)

Conclusions

I started investigating this because I wanted to make sure my benchmark worked and made sense when checking out new Ruby optimizations. So I tried out the Ruby 2.4.0 hash table changes - a case where they really cared about the answer, "does this make a difference for Rails applications?" The short answer is "yes -- these hash table changes speed up a real Rails app by about 5% overall." Which is pretty serious!

The bug report and its story are, of course, a whole saga unto themselves.