Symbol GC in Ruby 2.2

What is symbol GC and why should you care? Ruby 2.2 was just released and, in addition to incremental GC, one of the other big features is Symbol GC. If you’ve been around the Ruby landscape, you’ve heard the term “symbol DoS”. A symbol denial of service attack occurs when a system creates so many symbols that it runs out of memory. This is because, prior to Ruby 2.2, symbols lived forever. For example in Ruby 2.1:
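A minimal sketch of how this can be observed. The method name symbol_counts and the sym_ prefix are illustrative assumptions, not taken from any library. On Ruby 2.1 the :after_gc count would be expected to stay roughly 100,000 above :before, while on 2.2 and later it should fall back toward the baseline (exact numbers depend on GC timing):

```ruby
# Count live symbols before creating new ones, after creating them,
# and after dropping every reference and forcing a GC run.
def symbol_counts(n = 100_000)
  GC.start
  before = Symbol.all_symbols.size

  # Hold the symbols in an array so none can be collected mid-loop.
  symbols = Array.new(n) { |i| "sym_#{i}".to_sym }
  after_create = Symbol.all_symbols.size

  symbols = nil # drop the only reference to the new symbols
  GC.start
  after_gc = Symbol.all_symbols.size

  { before: before, after_create: after_create, after_gc: after_gc }
end

counts = symbol_counts
puts counts
```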

Here we create 100,000 symbols and they’re still around, even though we’ve run GC and no variables reference those objects. This could easily be a problem if you wrote some code that accepted a user parameter and calls to_sym on it:

def show
  step = params[:step].to_sym
end

In this case, someone could make many requests to example.com/step= and, since your application never clears out symbols, your program would eventually run out of memory and crash. This may sound like a fabricated example, but it was similar to code I actually had committed in my Wicked gem (don’t worry, it’s fixed now). It’s not an isolated case either:

Since the symbols we create aren’t referenced by any other object or variable, they can be safely collected. This helps in preventing us from accidentally creating a scenario where a program creates and retains so many objects that it crashes. However, Ruby doesn’t garbage collect ALL symbols.

WAT?

#not_all_symbols

Prior to Ruby 2.2, we couldn’t collect symbols because they were used internally by the Ruby interpreter. Basically, each symbol has a unique object ID. For example, :foo.object_id always needed to be the same value for the duration of the program execution. This is due to the way rb_intern works.

In C-Ruby, when you create a method, Ruby stores a unique ID in a method table.

Slide from Nari’s talk on Symbol GC

Later, when you call the method, Ruby will look up the symbol of the method name, and then get the ID of that symbol. The ID of the symbol is used to point at the static memory of the function in C. The function in C is then called and that’s how Ruby executes methods.

If we garbage collected a symbol and that symbol was used to reference a method, then that method is no longer callable. That would be bad.

To get around this problem Narihiro Nakamura introduced the idea of an “Immortal Symbol” in the C World and a “Mortal symbol” in the Ruby world.

Basically, all symbols created dynamically while Ruby is running (via to_sym, etc.) can be garbage collected because they are not being used behind the scenes inside the Ruby interpreter. However, symbols created as a result of creating a new method or symbols that are statically inside of code will not be garbage collected. For example :foo and def foo; end both will not be garbage collected, however "foo".to_sym would be eligible for garbage collection.

There are gotchas with this approach: it’s still possible to have a DoS if you’re accidentally creating methods based on user input.

define_method(params[:step].to_sym) do
  # ...
end

Because define_method calls rb_intern behind the scenes, even though we are passing in a dynamically defined (i.e. to_sym) symbol, it will be converted to an immortal symbol so it can be used for method lookup. Hopefully, you wouldn’t be doing that anyway, but it’s still good to point out dangerous bits in Ruby.
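A rough way to see this for yourself, sketched under a few assumptions: StepHolder and symbol_alive? are hypothetical names, and the random suffixes only make sure the names were not already in the symbol table. The symbol used as a method name should still be present after GC; the plain to_sym symbol will typically be gone on Ruby 2.2+, though GC timing makes that second result less predictable:

```ruby
require "securerandom"

# Hypothetical helper: is a symbol with this name currently in the symbol table?
# Names are compared as strings so the check itself never interns anything.
def symbol_alive?(name)
  Symbol.all_symbols.any? { |sym| sym.to_s == name }
end

class StepHolder; end # stand-in for a class that receives dynamic methods

method_name = "step_#{SecureRandom.hex(8)}"
plain_name  = "plain_#{SecureRandom.hex(8)}"

# The dynamic symbol becomes a method name, so Ruby has to keep it.
StepHolder.send(:define_method, method_name.to_sym) { nil }

# Created dynamically and never used as a method or variable name.
plain_name.to_sym

GC.start

puts "method symbol alive: #{symbol_alive?(method_name)}"
puts "plain symbol alive:  #{symbol_alive?(plain_name)}"
```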

In addition to avoiding randomly defining methods based on user input, also watch out for creating variables based on user input. In the following line, even though the variable is set to nil, its name uses a symbol behind the scenes that will never get garbage collected:

self.instance_variable_set( "@step_#{ params[:step] }".to_sym, nil )

To be truly safe, you should periodically check Symbol.all_symbols.size after running GC.start to ensure that the symbol table isn’t growing. Moving into the future, hopefully some good standards around what is and isn’t safe to do with symbols will become more general knowledge. If you find another really common gotcha, reach out to me on Twitter and I’ll try to keep this section updated.
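One way such a check might look, as a sketch rather than a recommended tool: symbol_growth is a hypothetical helper that wraps a suspicious code path and reports how many symbols are still in the table after a GC run. The step names here are made up for illustration:

```ruby
# Hypothetical helper: number of symbols still in the symbol table
# after the block has run and GC has been forced.
def symbol_growth
  GC.start
  before = Symbol.all_symbols.size
  yield
  GC.start
  Symbol.all_symbols.size - before
end

holder = Object.new

# Instance variable names are interned, so these symbols are retained.
retained_by_ivars = symbol_growth do
  500.times { |i| holder.instance_variable_set("@step_#{i}", nil) }
end

# Plain dynamic symbols with no other use are eligible for collection.
retained_by_to_sym = symbol_growth do
  500.times { |i| "other_step_#{i}".to_sym }
end

puts "instance_variable_set: #{retained_by_ivars} symbols retained"
puts "to_sym only:           #{retained_by_to_sym} symbols retained"
```

In a real application the same helper could wrap a request handler in a test, with a failure raised when the growth keeps climbing across repeated calls.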

I Feel the Need for Speed

In addition to security, the biggest reason you should care about this feature is speed. There’s a ton of code written around turning symbols into strings to avoid accidentally allocating symbols from user input. Generally when you put the words “ton” and “code” together, the results aren’t fast.

The most common example of avoiding Symbol allocations is Rails’ (ActiveSupport’s) HashWithIndifferentAccess. Since I wrote about subclasses of Hash like Hashie being slow, you may not be surprised to find that this behavior in Rails comes with a huge performance penalty.

We see that an indifferent access hash with a string key is about half the speed of a regular hash with symbol keys. We also see that using a symbol to access the value in an indifferent access hash is a whopping 5 times slower than using a regular hash with symbol keys. I wrote about how string key performance in Ruby 2.2 is getting a big improvement; however, accessing a hash with a symbol is still the fastest and, some might argue, the most aesthetically pleasing way to access a hash. Now with Ruby 2.2, we could use symbol keys in parameters in Rails. If we made that switch, we wouldn’t have to worry about security, and we wouldn’t have to incur the overhead of the HashWithIndifferentAccess tax.

Note: You should do benchmarking at the application level before making any big performance changes, especially whenever it requires an API deprecation. Don’t ever submit a performance patch with the justification that “some blog said it was faster” even if that blog is mine. Always verify claims with a case by case benchmark.
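As a starting point for that kind of verification, here is a small sketch of a micro-benchmark. StringKeyedHash is a hypothetical stand-in that converts every key to a String; it is not ActiveSupport’s implementation, and time_it is an illustrative helper. The timings it prints will vary by machine and Ruby version, so treat them as a prompt for measuring your own code, not as results:

```ruby
# Hypothetical stand-in for an indifferent-access hash: every key is
# converted to a String on write and on read. This is NOT ActiveSupport's
# implementation, only an illustration of the conversion overhead.
class StringKeyedHash < Hash
  def []=(key, value)
    super(key.to_s, value)
  end

  def [](key)
    super(key.to_s)
  end
end

# Time `iterations` runs of the block using a monotonic clock.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

plain = { step: "first" }
indifferent = StringKeyedHash.new
indifferent[:step] = "first"

iterations = 200_000
results = {
  "Hash with symbol key"            => time_it(iterations) { plain[:step] },
  "StringKeyedHash with string key" => time_it(iterations) { indifferent["step"] },
  "StringKeyedHash with symbol key" => time_it(iterations) { indifferent[:step] }
}

results.each { |label, seconds| puts format("%-34s %.4fs", label, seconds) }
```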

Recap

Symbol GC saves your butt from DoS attacks and allows you the flexibility of using symbols wherever you want. Coupled with Ruby 2.2’s host of other performance features, including incremental GC and string de-duplication for Hash keys, there’s no reason not to upgrade right away. Install locally:

Replies

This has been one of the weakest points of Ruby for any interactive app, and clearly one of the most common weak points of any Rails app out there. (It has also been one of my favourite Ruby tales for newcomers.)

This "bug" has actually changed the respect we have towards Symbols, and even the way/place we use them, making us pick Strings in a lot of cases or at least forcing us to add dumb validations here and there.

Personally, I celebrate Symbol GC even more than named params or even some of the performance improvements.