Ruby 1.9 Internals: Ordered Hash

With the first stable release of Ruby 1.9 (1.9.1) out the door, it is time to start thinking about migrating your code to the newer runtime. Performance is much improved, and plenty of new features are waiting to be taken advantage of. One of these is the new ordered Hash, which, unlike the Hash in Ruby 1.8.x, preserves insertion order.
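As a quick illustration of what "preserves insertion order" means in practice, here is a minimal sketch (the keys and values are made up for the example). On Ruby 1.8 the iteration order of the same hash is not guaranteed.

```ruby
# Build a hash by inserting keys in a deliberately non-sorted order.
h = {}
h[:c] = 3
h[:a] = 1
h[:b] = 2

# On Ruby 1.9 and later, iteration follows insertion order.
p h.keys    # => [:c, :a, :b]

# Re-assigning an existing key keeps its original position.
h[:c] = 30

# Deleting and re-inserting a key moves it to the end.
h.delete(:a)
h[:a] = 10
p h.to_a    # => [[:c, 30], [:b, 2], [:a, 10]]
```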

Of course, the question you're asking is, how does it do that? What about performance? Let's take a look inside.

The C behind the Ruby

For a good, albeit incomplete and now somewhat outdated, primer on Ruby internals, make sure to read through the Ruby Hacking Guide (I hope we'll get a full English version for Ruby 1.9). The source code reveals that the data layer of a Ruby Hash is, in fact, encapsulated in an st_table struct, which in turn points to an array of bins, each holding a pointer to a chain of st_table_entry objects that store the actual keys and values. Add a hashing function on top and you've got yourself a Ruby 1.8 Hash.

However, this structure has no notion of order, which means that a separate structure has to be maintained if your application needs ordering, duplicating memory and adding overhead. That is why the Dictionary class in Facets, if you've ever used it, simply keeps an adjacent array of keys (every insert pushes the key onto the end of that array). It's dirty, but it works. What about Ruby 1.9?

It turns out that in Ruby 1.9, a Hash is also a circular doubly linked list! Every st_table_entry now contains two extra pointers for moving forwards and backwards through the entries in insertion order. Hence, aside from the memory for those two pointers and the extra pointer manipulation on insert and delete, no other overhead is incurred.
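To make the idea concrete, here is a toy model of it in Ruby rather than C. It is a sketch under stated assumptions, not the interpreter's actual code: the class name OrderedTable, the Entry fields, and the fixed bin count are all illustrative. Each entry sits in a bucket chain for lookup and, at the same time, in a circular doubly linked list that records insertion order.

```ruby
# Toy model of a hash table whose entries are also chained into a
# circular doubly linked list. All names are illustrative, not Ruby's C API.
class OrderedTable
  # next_in_bin chains bucket collisions; fore/back chain insertion order.
  Entry = Struct.new(:key, :value, :next_in_bin, :fore, :back)

  def initialize(num_bins = 11)
    @bins = Array.new(num_bins) # each slot heads a collision chain
    @head = nil                 # oldest entry; @head.back is the newest
  end

  def [](key)
    entry = find(key)
    entry && entry.value
  end

  def []=(key, value)
    entry = find(key)
    if entry
      entry.value = value       # existing key keeps its position
      return value
    end
    idx = bin_index(key)
    entry = Entry.new(key, value, @bins[idx], nil, nil)
    @bins[idx] = entry
    if @head
      # Splice in just before head, i.e. at the tail of the circle.
      entry.fore = @head
      entry.back = @head.back
      @head.back.fore = entry
      @head.back = entry
    else
      @head = entry.fore = entry.back = entry
    end
    value
  end

  def delete(key)
    idx = bin_index(key)
    prev = nil
    entry = @bins[idx]
    while entry && !entry.key.eql?(key)
      prev = entry
      entry = entry.next_in_bin
    end
    return nil unless entry

    # Unlink from the collision chain.
    if prev
      prev.next_in_bin = entry.next_in_bin
    else
      @bins[idx] = entry.next_in_bin
    end
    # Unlink from the ordering list (identity checks, not Struct#==).
    if entry.fore.equal?(entry)
      @head = nil               # it was the only entry
    else
      entry.back.fore = entry.fore
      entry.fore.back = entry.back
      @head = entry.fore if @head.equal?(entry)
    end
    entry.value
  end

  # Traversal just follows the fore pointers; the bins are never scanned.
  def each
    return unless @head
    entry = @head
    loop do
      yield entry.key, entry.value
      entry = entry.fore
      break if entry.equal?(@head)
    end
  end

  def keys
    result = []
    each { |k, _| result << k }
    result
  end

  private

  def find(key)
    entry = @bins[bin_index(key)]
    entry = entry.next_in_bin while entry && !entry.key.eql?(key)
    entry
  end

  def bin_index(key)
    key.hash % @bins.size
  end
end

t = OrderedTable.new(2) # tiny table to force bucket collisions
t[:c] = 3
t[:a] = 1
t[:b] = 2
t.delete(:a)
t[:a] = 10
p t.keys # => [:c, :b, :a]
```

Note that lookup never touches the fore/back pointers, which is why fetch cost is unchanged, while insert and delete each pay for a few extra pointer assignments.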

Hash performance in 1.8 and 1.9

The Ruby mailing list archive is full of heated discussions on whether a Ruby Hash should be ordered: simplicity vs. performance vs. least surprise. And strictly speaking, the new Ruby Hash is slower on insertion and deletion, because both require more pointer manipulation. However, value fetch has not changed, and traversal is now much faster (since we can just follow the pointers):

# Ruby 1.8.7
                   user     system      total        real
Hash insert       0.350      0.610      0.960  (0.990365)
Hash access       0.770      0.010      0.780  (0.801897)
Dict insert       1.000      1.240      2.240  (2.333807)
Dict access       1.100      0.020      1.120  (1.304059)
RBTree insert     4.820      1.000      5.820  (6.293545)
RBTree access     5.180      0.110      5.290  (6.060176)

# Ruby 1.9.1
                   user     system      total        real
Hash insert       0.650      0.150      0.800  (0.828039)
Hash access       0.650      0.000      0.650  (0.683463)

The numbers are somewhat obscured by the fact that the Ruby 1.9 interpreter is faster to begin with, so we can't attribute the gains to the Hash implementation alone. Still, I think they show that the right decision was made: a more powerful Hash in 1.9 that, in this benchmark, is also faster overall.
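For readers who want to reproduce the Hash rows of the table on their own interpreter, a measurement along these lines should do. This is a sketch, not the script that produced the numbers above: the workload size n and the use of integer keys are assumptions, so absolute timings will differ.

```ruby
require 'benchmark'

n = 200_000 # hypothetical workload size; the original size is not stated
hash = {}

Benchmark.bm(12) do |bm|
  # Insert n integer keys, then read each one back.
  bm.report('Hash insert') { n.times { |i| hash[i] = i } }
  bm.report('Hash access') { n.times { |i| hash[i] } }
end
```

Running the same file under each interpreter gives directly comparable rows, although the difference still mixes Hash changes with general interpreter speedups.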

Red-Black Trees

One interesting alternative to consider is a red-black tree, a data structure commonly used to implement associative arrays, with the nice property of O(log n) worst-case time for lookup, insertion, and deletion. Unlike a plain binary search tree, a red-black tree enforces additional constraints that keep it approximately balanced after every operation, meaning there is no degenerate worst case of the kind you get when an unbalanced tree collapses into a linked list. It is a nontrivial structure, but thankfully it's also available as a Ruby 1.8 gem: rbtree.
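To see the worst case that balancing protects against, consider the sketch below. It is a naive, unbalanced binary search tree written for illustration (it is not the rbtree gem, and Node, bst_insert, and height are made-up names). Feeding it already-sorted keys produces a tree as deep as the number of keys, so every lookup degrades to a linear scan.

```ruby
# Naive binary search tree with no rebalancing (illustrative only).
Node = Struct.new(:key, :left, :right)

# Insert key below root and return the (possibly new) root.
def bst_insert(root, key)
  fresh = Node.new(key, nil, nil)
  return fresh if root.nil?

  node = root
  loop do
    side = key < node.key ? :left : :right
    child = node[side]
    if child.nil?
      node[side] = fresh
      break
    end
    node = child
  end
  root
end

# Height via a level-by-level walk, so deep trees cannot overflow the stack.
def height(root)
  level = root ? [root] : []
  depth = 0
  until level.empty?
    depth += 1
    level = level.flat_map { |n| [n.left, n.right].compact }
  end
  depth
end

sorted = (1..1000).to_a
degenerate = sorted.inject(nil) { |root, k| bst_insert(root, k) }
shuffled = sorted.shuffle.inject(nil) { |root, k| bst_insert(root, k) }

p height(degenerate) # => 1000
p height(shuffled)   # much smaller, but varies from run to run
```

A red-black tree holding n keys is guaranteed a height of at most 2 * log2(n + 1), about 20 for these 1000 keys, regardless of insertion order.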
