New Ruby programmers often mistakenly believe that hashes should be used everywhere for everything. They grow attached to hashes and use them in many places they shouldn’t, creating and passing hashes when a proper Plain Old Ruby Object would be much better. Eventually, they begin to wish hashes behaved more like objects. This is a horrible idea, as we will see in a short while.

I love hashes and I love objects. You can store values in hashes and store logic in your objects. To understand why mixing the two is a problem, we need to do some digging. Hashes in Ruby are dumb data structures. If you mistype a key, there are no warnings or errors. There’s no easy way to do custom setters or getters on a hash. Hashes are fast, and they’re flexible. Hashes work fine for passing data, but work poorly for storing it in a controlled and structured manner. For the uninitiated, here’s what I’m talking about:
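A minimal sketch of the contrast, assuming a hypothetical `Person` class with a single `spelling` attribute (the class and the values are made up for illustration):

```ruby
# A hash happily accepts a misspelled key.
person_hash = {}
person_hash[:spellning] = "richard"   # typo, but no warning or error
person_hash[:spelling]                # => nil, the mistake only surfaces later

# Hypothetical plain old Ruby object with one attribute.
class Person
  attr_accessor :spelling
end

person = Person.new
begin
  person.spellning = "richard"        # same typo, this time on an object
rescue NoMethodError => e
  puts "Caught early: #{e.class}"     # prints "Caught early: NoMethodError"
end
```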

This error is extremely valuable: it gives us feedback about our mistake early. With the hash, we wouldn’t get an error until we try to access the value of hash[:spelling] later, only to find it returning nil. Then we have to hunt down the line that caused the error, and when I’m tired and hungry, that’s frustrating. Using a Ruby object, we get this feedback at the cause of the error rather than somewhere later down the line.

At some point in time, you’ve likely said: “Hmm…hashes look like objects. Wouldn’t it be great if I could access and set values on one like an object?” Hopefully, when this happened, you found OpenStruct, which lets you do basically the same things as a hash, but with the accessors of a user-defined object:

require 'ostruct'
foo = OpenStruct.new
foo.spellning = "richard"

Okay, so we get no errors, but it “feels” like an object. OpenStruct can be a convenient way to pass around data, but it has similar limitations to a Hash. Even better, OpenStruct has hash-like operators, so we can use it almost like a hash. The key word here is “almost”.
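As a rough sketch of where the “almost” bites (the `name` attribute and its value are made up for illustration), compare what each one responds to:

```ruby
require 'ostruct'

person = OpenStruct.new(name: "richard")
hash   = { name: "richard" }

# Hash-like access works on both.
person[:name]    # => "richard"
hash[:name]      # => "richard"

# OpenStruct also answers to string keys and to dot access.
person["name"]   # => "richard"
person.name      # => "richard"

# The Hash manipulation and meta methods are not defined on OpenStruct.
missing = [:merge!, :empty?, :keys].reject { |m| person.respond_to?(m) }
puts missing.inspect   # prints [:merge!, :empty?, :keys]
```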

Wow, okay, that’s a lot of differences. An open struct behaves more like an object than a data store. It is missing manipulation methods like merge! and meta information methods like empty?. This makes sense: when was the last time you merged a user object?

Value Objects

I lump both Hash and OpenStruct together as value objects, because they’re good for transporting values, but they don’t act like a typical user-defined object. They’re not good for persisting data and encapsulating complex logic. For example, a user’s name should always start with a capital letter. This is easy for objects:
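One way to sketch that rule, assuming a hypothetical `User` class whose only job is to hold a name:

```ruby
class User
  attr_reader :name

  def initialize(name)
    self.name = name
  end

  # Custom setter: upcase the first letter on the way in,
  # leaving the rest of the string untouched.
  def name=(value)
    @name = value.to_s.sub(/\A[a-z]/) { |c| c.upcase }
  end
end

user = User.new("richard")
puts user.name   # prints "Richard"
```

The rule lives in one place, so every caller gets it for free. With a hash, each caller would have to remember to apply it.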

Using Hashes

Likely, you’re already familiar with using hashes to transport data:

def link_to(name, path, options = {}) # ...

Here, options is a hash. It makes sense to pass in a variety of different configuration options without having to specify them all as ordered arguments. However, since a hash is so flexible, we need to do additional error checking, such as ensuring that a critical key is present, or that its value isn’t unexpected (e.g. someone passed in a number when you expected a string).
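That extra checking might look something like the sketch below. Note that `build_link` is a hypothetical helper written for this example, not the Rails implementation of link_to, and treating `:class` as the critical key is an assumption made only for illustration:

```ruby
# Hypothetical helper, NOT the Rails implementation of link_to.
def build_link(name, path, options = {})
  # fetch raises KeyError when the critical key is missing.
  css_class = options.fetch(:class)

  # Guard against an unexpected value type.
  unless css_class.is_a?(String)
    raise ArgumentError, "expected :class to be a String, got #{css_class.class}"
  end

  %(<a href="#{path}" class="#{css_class}">#{name}</a>)
end

puts build_link("Home", "/", class: "nav")
# prints <a href="/" class="nav">Home</a>
```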

Using OpenStructs

When to use an OpenStruct is less obvious. If you are interacting with a library that expects an object as input, you can fake one by using an OpenStruct.
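For instance, assuming a hypothetical library method `greeting_for` that only cares that its argument responds to `name` and `email`:

```ruby
require 'ostruct'

# Hypothetical library method that expects an object responding to
# #name and #email.
def greeting_for(user)
  "Hello #{user.name} <#{user.email}>"
end

# No class needed: the OpenStruct stands in for a real user object.
fake_user = OpenStruct.new(name: "Richard", email: "richard@example.com")
puts greeting_for(fake_user)   # prints "Hello Richard <richard@example.com>"
```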

I find open structs useful when in the console and experimenting with new code, and sometimes I use them to test interfaces in code I write. Honestly though, I generally don’t use them much. Usually, when I think I want to use an open struct, what I really want is a plain old Ruby object. It is much easier to manipulate the data in a hash than in an OpenStruct, because hashes have all those meta methods, and hashes are more lightweight (OpenStruct creates and stores a hash under the hood).

It’s worth noting that OpenStruct is pretty slow compared to a regular hash:
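If you want to measure it yourself, here is a rough timing sketch. The iteration count and the `seconds_for` helper are my own choices for this example, and the numbers will vary by machine and Ruby version, so treat the result as a ballpark rather than a proper benchmark:

```ruby
require 'ostruct'

# Small monotonic-clock timing helper (hypothetical, for this sketch only).
def seconds_for
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 50_000

# Create a hash and read one value from it.
hash_time = seconds_for do
  iterations.times do
    h = { name: "richard" }
    h[:name]
  end
end

# Create an OpenStruct and read the same value through its accessor.
ostruct_time = seconds_for do
  iterations.times do
    o = OpenStruct.new(name: "richard")
    o.name
  end
end

puts format("Hash:       %.4fs", hash_time)
puts format("OpenStruct: %.4fs", ostruct_time)
```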

Because of this speed disparity and the confusion of interface, I recommend staying away from using OpenStruct in production code. Check out the OpenStruct source code (it is in Ruby) to see how it’s implemented. Bonus points if you can guess why it’s so much slower.

What about Hashie?

It seems that, at this point, it would make sense to create an OpenStructHash object that behaved like an open struct and a hash at the same time. This is exactly what Hashie does (specifically Hashie::Mash). Unfortunately, it’s a really bad idea. I’ve tried to use Hashie on several projects and always walked away frustrated and angry. I’ve used other people’s code with Hashie deeply embedded, and it’s always been a sore spot. But why?

Hashie tries to be two things at the same time. It has all the manipulation methods of a Hash and all the accessor methods of an OpenStruct. This means your object now has a massive method surface area and an identity crisis.

This isn’t so bad if you’re using it as a simple hash, but then you don’t need the extra methods…just use a hash. Having this advanced pseudo-object creates problems. For example, how does it behave when it interacts with other objects? Let’s say I want the values in my hashie object to take precedence in a merge, so I pass it into the Hash#merge method:

Well, that stinks. Did you notice anything else? Hashie::Mash lets you access the hash with a string or a symbol for convenience. This produced a weird result here, where some of the keys in result are strings and some are symbols.

puts result.inspect # => {:job=>"programmer", "name"=>"schneems"}

This is really weird. If we merge! a hash twice, we expect to get the same result:

WAIT, now we’ve got the same value with two different keys (:job and "job"). This isn’t really a “bug” so to speak. Hashie does its best to do the right thing, but in this case it can’t, because it doesn’t have enough information. There are more issues than just merge, but they’re not as easy to show in a few lines of code.
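You can see the mechanics without installing Hashie at all. The sketch below uses a plain string-keyed hash as a stand-in for the Mash, on the assumption (suggested by the "name" key in the output above) that a Mash holds its keys as strings. It is not Hashie’s actual code:

```ruby
# Stand-in for the contents of a Hashie::Mash (assumed to use string keys).
mash_like = { "name" => "schneems", "job" => "programmer" }
hash      = { job: "programmer" }

result = hash.merge(mash_like)

# Hash#merge only overwrites keys that are eql?, and :job is not "job",
# so both survive side by side.
puts result.keys.inspect   # prints [:job, "name", "job"]
puts result.class          # prints Hash
```

The plain Hash doing the merging has no idea that the other side considers :job and "job" to be the same key.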

Hashie - Bad goes to Worse

Having multiple access modes to a hash (string and symbol) is really convenient, so some may use hashie for this task. In Rails, HashWithIndifferentAccess does this chore, and it’s really helpful. The “oh crap I used a string and I meant a symbol” mistake is a common and painful one with hashes. However, it rarely stops there with Hashie.

Most people use hashie either for configuration objects, where they can’t be bothered to define the config attributes properly, or as a way to “cheaply” build objects from the JSON response of an API. If you poke around, maybe some of your favorite API wrapper libraries use Hashie.

Both of these are horrible choices. For the config case, you now open up your users to a multitude of misconfiguration options (misspelling, no input validation, etc.). You can build validation into a Hashie::Mash object, but it’s not simple:

Ughhhh. You can also override def []=(key, value), but then what if someone passes in a hash at initialization? Well, you have to override that too. Hashie has some internal helper methods for these cases, but…why not just use a class? Help your consumers with meaningful error messages and behavior. If you want them to interact with an object, return an object. If you want to return a hash, give them a hash. Giving them a pseudo object that behaves as both opens up weird edge cases and confusion for your consumers. It’s much easier to write a plain class.
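A minimal sketch of such a class, assuming a hypothetical `AppConfig` with a single `timeout` setting (both names and the default are invented for illustration):

```ruby
# Hypothetical configuration object; attribute names are for illustration only.
class AppConfig
  attr_reader :timeout

  # Keyword arguments reject misspelled options with an ArgumentError.
  def initialize(timeout: 30)
    self.timeout = timeout
  end

  # Validation lives in the setter, so it applies at initialization and later.
  def timeout=(value)
    unless value.is_a?(Integer) && value.positive?
      raise ArgumentError, "timeout must be a positive Integer, got #{value.inspect}"
    end
    @timeout = value
  end
end

config = AppConfig.new(timeout: 10)
puts config.timeout   # prints 10
```

A misspelled option like `AppConfig.new(timeuot: 10)` or a bad value like `config.timeout = "soon"` fails right where the mistake is made.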

It gets worse (again) - Memory Overload

Let’s say, for some reason, you love this weird edge-casey nature and undecided pseudo-object behavior. You choose to use Hashie for a really popular project, let’s hypothetically call it omniauth. The insanely open behavior that you crave so much comes at a very high cost: large numbers of short-lived objects used internally by hashie.

My replacement uses a custom object that inherits from OpenStruct from the standard lib, and we can see it creates fewer objects: 1337 (super l33t) versus 4615. The change also had a measurable impact on speed. I’m still tweaking, but initial benchmarks indicate a 5% increase in speed in the total request. This is a 5% increase on TOTAL request time, i.e. the app got 5% faster…not just omniauth.

Alternatives

The easiest way to quit smoking is to never start. If you’ve inherited a hashie-addicted project, what can you do? In Omniauth, I removed hashie, let all the tests fail, then worked on one test at a time till they were all green. In this case, Omniauth is really popular, so we can’t just change the interface without a proper deprecation warning. Ideally, in the future, we can isolate how the code is used and replace it with some stricter (and therefore easier to reason about) interfaces that are even faster.

If you really need to take arbitrary values, consider a plain ole’ Ruby Hash. If you really need the method access using the dot syntax, use a Struct, an OpenStruct, or even write a custom PORO. If you’re using hashie in an object that also wraps up logic, get rid of hashie, and keep the logic. Subclassing Hash is pretty much evil. It’s a proven fact(TM) that subclassing hashes causes pain and performance problems so don’t do it.
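As a small sketch of the Struct option (the `Profile` name and its attributes are made up for illustration), you get dot access and a fixed set of attributes in one line:

```ruby
# Hypothetical attributes, for illustration only.
Profile = Struct.new(:name, :job)

profile = Profile.new("schneems", "programmer")
puts profile.name          # prints "schneems"
profile.job = "writer"     # setters are generated too

begin
  profile.spellning = "oops"   # not a declared attribute
rescue NoMethodError => e
  puts "Caught: #{e.class}"    # prints "Caught: NoMethodError"
end
```

Unlike an OpenStruct, a Struct only knows the attributes you declared, so typos fail loudly.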

While I’ve ripped on Hashie a good amount, it’s a good, fun library to play with, and you can learn quite a bit about metaprogramming through the code. I recommend you check it out, but whatever you do…don’t ever put it in production.
