How One Ruby Method Let Us Delete Thousands of Lines of Code

Posted on Saturday, Mar 05, 2016

At VTS we have a saying: code is a liability. As an application grows, legacy code accumulates. This code might never be executed in your production application, but you still incur the cost of maintaining it. Unfortunately, knowing which code is legacy and safe to remove can be difficult in a complex application. To determine which code was actually being hit, we opted to use a gem called coverband. Coverband is a Ruby gem that measures production code coverage. When enabled, it logs every file and line number that has been hit to Redis. Coverband can be used as a Rack middleware, which means that, once it was configured, we were able to trace the execution path of requests on our production servers.

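Up until very recently, coverband worked by using a method on the Kernel module called set_trace_func. The method set_trace_func takes a proc as its argument, and that proc receives up to six parameters: the event name, filename, line number, object id, binding, and the name of a class. Once registered, the proc is invoked whenever one of the following events occurs: c-call (a C-language routine is called), c-return (a C-language routine returns), call (a Ruby method is called), class (a class or module definition starts), end (a class or module definition finishes), line (code on a new line is executed), raise (an exception is raised), and return (a Ruby method returns).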

From a very high level, and ignoring many implementation details, coverband's set_tracer method is called on initialization and calls set_trace_func on the current thread, passing in a proc. The proc itself calls the add_file method, which logs the file and line to Redis. Over time you build up a set of key-value pairs in Redis where the key is the filename and the value is an array of line numbers. You can then generate a SimpleCov-style report and get a nice output that makes it easy to see which lines of code are covered in your application!
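To make the mechanism concrete, here is a minimal sketch of the idea. It is not coverband's actual implementation. The method names set_tracer and add_file are borrowed from the description above; the TinyCoverage class, the unset_tracer method, and the in-memory Hash standing in for Redis are assumptions made purely for illustration.

```ruby
require 'set'

# Hypothetical, simplified stand-in for a production coverage collector.
# A Hash of Sets plays the role that Redis plays in the description above.
class TinyCoverage
  attr_reader :lines_by_file

  def initialize
    @lines_by_file = Hash.new { |hash, file| hash[file] = Set.new }
  end

  # Install the trace hook; only "line" events are recorded.
  def set_tracer
    set_trace_func proc { |event, file, line, _id, _binding, _classname|
      add_file(file, line) if event == 'line'
    }
  end

  # Remove the trace hook again (assumed helper, not from the article).
  def unset_tracer
    set_trace_func(nil)
  end

  # Record that a given line of a given file was executed.
  def add_file(file, line)
    @lines_by_file[file] << line
  end
end

def covered_method
  :hit
end

def dead_method
  :never_hit
end

coverage = TinyCoverage.new
coverage.set_tracer
covered_method
coverage.unset_tracer

coverage.lines_by_file.each do |file, lines|
  puts "#{file}: #{lines.to_a.sort.inspect}"
end
```

Running it should list the line inside covered_method (along with the lines of the calling script that ran while tracing was on), while the line inside dead_method never shows up. That absence is the signal we were after.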

But how does set_trace_func actually work? Seeing set_trace_func in action can feel a bit like that classic “Ruby magic”. The most confusing part is how Ruby knows when to call the proc. To understand this, we have to take a little detour into how Ruby actually runs programs.

Tokenization

Let’s take the following very simple program:

2+2

When you run the program, Ruby first tokenizes the input. During tokenization, Ruby steps through the characters one at a time and groups them together into tokens. Tokenization is the first step in turning the text into an actual program. We can see how Ruby does its tokenization using the built-in Ripper library:
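A minimal way to try this yourself is Ripper.lex, which ships with Ruby's standard library. The sketch below prints only the first three fields of each token, because newer Ruby versions append a fourth field (the lexer state) that is not covered here.

```ruby
require 'ripper'

# Tokenize the example program; each token comes back as an array.
tokens = Ripper.lex("2+2")

# Print position, token type, and source text, one token per line.
tokens.each { |token| p token.first(3) }
# [[1, 0], :on_int, "2"]
# [[1, 1], :on_op, "+"]
# [[1, 2], :on_int, "2"]
```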

Each line represents a single token. The first value is an array consisting of the line and column number. The second value is the token type, which corresponds to a token in Ruby's C parser code. For instance, the :on_int token corresponds to the tINTEGER token found in Ruby’s source code. The third value is the string of characters in the Ruby code that corresponds to the token.

Tokenization is responsible only for turning characters into tokens; it is not responsible for determining whether those characters are actually valid Ruby. That is the responsibility of the parser.

Parsing

After the tokenization step, Ruby parses the tokens, grouping them into sentences and phrases as defined by Ruby’s grammar rules. Ruby uses the LALR (Look-Ahead Left Reversed Rightmost Derivation) algorithm to parse the code and generate an AST (abstract syntax tree). I will not detail the algorithm here, but you can see a textual representation of the AST Ruby generates for our code by running the following program:

require 'ripper'
require 'pp'

code = <<CODE
2 + 2
CODE
pp Ripper.sexp(code)

Output:

[:program, [[:binary, [:@int, "2", [1, 0]], :+, [:@int, "2", [1, 4]]]]]

Ruby has parsed the stream of tokens into an AST, a description of the actual program. The AST is a tree data structure which represents the structure of the program code. The textual output might look a bit confusing, so here is a corresponding graphical output:

We start here with the top-level program node. Every AST that Ruby generates begins with this node. The + operation is contained in a binary node, as in a binary operation. You can see that walking the AST top down and then left to right yields a description of the actual program.
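If you would like to see that top-down, left-to-right walk in text form, the following sketch prints the tree with one node per line. The describe helper is a hypothetical name introduced for this example; it only relies on Ripper.sexp and handles just enough node shapes for simple expressions like ours.

```ruby
require 'ripper'

# Walk a Ripper s-expression depth first and collect one indented line per node.
def describe(node, depth = 0, out = [])
  indent = '  ' * depth
  case node
  when Array
    if node.first.is_a?(Symbol) && node.first.to_s.start_with?('@')
      # Leaf token such as [:@int, "2", [1, 0]]: show its type and source text.
      out << "#{indent}#{node[0]} #{node[1].inspect}"
    elsif node.first.is_a?(Symbol)
      # Parser node such as [:binary, left, :+, right]: show it, then its children.
      out << "#{indent}#{node[0]}"
      node.drop(1).each { |child| describe(child, depth + 1, out) }
    else
      # A plain list of statements: visit each one at the same depth.
      node.each { |child| describe(child, depth, out) }
    end
  when Symbol
    # Bare symbols inside a node, such as the :+ operator.
    out << "#{indent}#{node}"
  end
  out
end

puts describe(Ripper.sexp("2 + 2"))
# program
#   binary
#     @int "2"
#     +
#     @int "2"
```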

Compiling to YARV Instructions

As of Ruby 1.9, Ruby uses a virtual machine called YARV (Yet Another Ruby Virtual Machine). This means there is an additional compile step after generating an AST. Ruby recursively iterates over the AST from the top down and compiles each node into YARV instructions. We can see the compiled YARV instructions for our example using built-in library tools:
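One way to see the instructions is RubyVM::InstructionSequence, which is specific to MRI. Treat this as a sketch: the exact listing depends on your Ruby version. In particular, Ruby 2.5 and later no longer emit a separate trace instruction and instead mark line events on the instructions themselves, so on a modern Ruby you should expect to see the putobject and opt_plus instructions discussed in this section, but not a leading trace line.

```ruby
# Compile the example program to YARV instructions and print a readable listing.
# RubyVM is MRI-only, so this will not work on JRuby or TruffleRuby.
disasm = RubyVM::InstructionSequence.compile("2+2").disasm
puts disasm
```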

First we start with the trace instruction. The trace instruction is exactly what allows us to use set_trace_func. It indicates that a new event has occurred. In this case, the event is line.

Next YARV pushes the object 2 on the stack as a receiver.

Then another object 2 is pushed onto the stack as an argument.

The final instruction, opt_plus, says “send the plus message to the receiver (the first 2 object pushed onto the stack) with one argument (the next 2 object on the stack)”. ARGS_SIMPLE indicates that the arguments are simple values.

Now let’s confirm that the trace YARV instruction corresponds to the events hooked into by set_trace_func:
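A small experiment along these lines might look like the sketch below. It collects events into an array and switches tracing off again right after the addition, so the output stays short; the events variable and the output format are choices made for this example.

```ruby
events = []

# Record every event Ruby reports while the hook is installed.
set_trace_func proc { |event, file, line, id, _binding, classname|
  events << format('%8s %s:%d %s %s', event, File.basename(file.to_s), line, id, classname)
}
sum = 2 + 2
# Passing nil removes the hook again.
set_trace_func(nil)

puts events
```

The first two entries are the ones that matter here. Any entries after them come from the final call that turns tracing off.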

Curiously, we see two events here. This is because set_trace_func actually traces itself! So we see the first event, c-return (a return from a C function call, in this case the built-in set_trace_func method), and then the line event we expected to see.

The important thing to remember is this: when Ruby executes your program, it first translates the source code into a stream of tokens. Ruby then parses the stream of tokens into an AST. Finally, each node in the AST is compiled into a snippet of YARV instructions. These YARV instructions include calls to trace, which allow us to hook into internal Ruby events.

This is just one of the many beauties of Ruby. The dynamic nature of the language allows us to do things which would be impossible in other languages, and which can generate real business value. In this case, understanding one simple built-in method allowed us to delete thousands of lines of unused code from our production application, as well as the corresponding tests. For us at VTS, it means we have a faster test suite and fewer lines of code to maintain.

For more information, I would highly recommend checking out the coverband gem for yourself. For more on how Ruby tokenizes, parses, and compiles programs, I would recommend Pat Shaughnessy’s “Ruby Under a Microscope”, a spectacular dive into Ruby’s internals.

In a follow-up blog post I would like to detail how we were able to speed up coverband by swapping out set_trace_func for a newer Ruby class called TracePoint, which provides the same functionality as set_trace_func but allows us to hook into specific events. Thanks for reading!