Refactoring Ruby programmatically.

When you join a new team, one of the first things you'll figure out is their preferred coding style. They probably have a linter like rubocop or flake8 to delegate style arguments to computers, which are the supreme pedants. Sometimes, though, you'll find reasons to change a repository’s coding style or to merge in another codebase with different style choices.

At a certain scale, you’ll probably just fix things by hand, but for projects that span thousands of files, no amount of caffeine can mask the pain.

Linting is just one example of a broader category of problems: how do you refactor large codebases? Although a common answer is simply not to refactor at scale, that tends to cause codebases to degrade rapidly over time. It can be better!

Most languages come with libraries to represent source code as s-expressions, which you can then transform and turn back into modified source code. For Ruby, two libraries that do just that are:

Imagine we have a method incr which used to require two parameters, but most invocations incremented by 1, so we added 1 as the default value for the second parameter. Now we want to rewrite all calls to incr to only pass a second value if it is different from the default.
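To make that concrete, here is a hypothetical sample input. The method body and the specific calls are made up for illustration. The first call passes the default explicitly and is the one we want rewritten, while the other two should come through unchanged.

```ruby
# refactor.input (hypothetical sample)

# The second parameter now defaults to 1.
def incr(value, amount = 1)
  value + amount
end

incr(3, 1) # passes the default explicitly; should become incr(3)
incr(4, 2) # non-default increment; should be left alone
incr(5)    # already relies on the default
```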

Assuming you have the example input above in a file named refactor.input and you name the script refactor.rb, then you can run it using:

ruby refactor.rb < refactor.input
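For reference, a minimal sketch of what refactor.rb could contain at this stage. It assumes the ruby_parser gem for parsing and the ruby2ruby gem for turning s-expressions back into source. The second gem and the refactor helper name are assumptions of this sketch, not something the text above pins down. The rewrite function is deliberately a no-op for now.

```ruby
# refactor.rb (sketch): parse Ruby source, rewrite it, generate Ruby again.

# No-op for now: hand the s-expression back unchanged.
def rewrite(expr)
  expr
end

# Round-trips source through the parser and the code generator.
# The gems are loaded on first use, so rewrite can be exercised without them.
def refactor(code)
  require "ruby_parser" # provides RubyParser
  require "ruby2ruby"   # provides Ruby2Ruby (assumed code generator)

  sexp = RubyParser.new.process(code)  # source string -> s-expression
  Ruby2Ruby.new.process(rewrite(sexp)) # s-expression -> source string
end
```

To match the command above, the script would end with a line such as puts refactor($stdin.read), so that it reads the input file from standard input and prints the regenerated code.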

This is actually pretty cool, because we’re taking in some code, parsing it, and then recombining it, but the really fun part is what comes next: modifying it!

Astute readers will notice the output version has some extraneous parentheses. I’m skimming over that because it’s equivalent Ruby code, but it’s a bit annoying, and perhaps an astute reader will propose a non-regex-based solution.

The rewrite function gets called by ruby_parser on the top-level s-expression, from which you
can recursively explore all the program's s-expressions. To explore the structure of individual
s-expressions a bit, consider the input:

incr(3, 1)

Which is represented by a Ruby object whose structure is:

s(:call, nil, :incr, s(:lit, 3), s(:lit, 1))

In order, these values are:

:call is the kind of s-expression, in this case one for invoking a function (some other common kinds are :block, :lasgn and :defn),

the second value, nil, is the receiver of the call, which is nil here because incr is invoked without an explicit receiver,

:incr is the name of the function invoked,

the remaining values are the arguments passed to the invoked function.
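Those positions can be pulled apart with ordinary array destructuring. The sketch below uses a plain nested array as a stand-in for the parsed object, which works because ruby_parser's Sexp class is a subclass of Array. The describe_call helper is a hypothetical name introduced only for this example, and the second slot is labeled as the call's receiver, which is nil when there is none.

```ruby
# Splits a :call s-expression into its named parts.
# Works on plain arrays and on ruby_parser's Sexp, which subclasses Array.
def describe_call(sexp)
  kind, receiver, name, *args = sexp
  raise ArgumentError, "not a :call: #{kind.inspect}" unless kind == :call

  { receiver: receiver, name: name, args: args }
end

# incr(3, 1) written out by hand as nested arrays.
call = [:call, nil, :incr, [:lit, 3], [:lit, 1]]
parts = describe_call(call)
puts parts[:name]        # prints "incr"
puts parts[:args].length # prints "2"
```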

Reminding ourselves of our original problem statement: can we remove the second parameter of calls to the :incr function if they specify the same value as the default parameter? Yup, we now know enough to write that function:

If we’re calling incr and the second parameter is the new default parameter, a lit of value 1, then we should remove it.

Recursively descend into the contents of each s-expression. Otherwise you’ll only see the top-level :block s-expression, which is pretty boring.
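Putting those two notes together, a sketch of rewrite might look like the following. It only relies on Array behavior, so it can be tried on plain nested arrays as well as on the Sexp objects ruby_parser returns (Sexp subclasses Array). The default_increment? helper is a hypothetical name, and the function mutates the tree in place.

```ruby
# True when expr is the literal 1, i.e. [:lit, 1] or s(:lit, 1).
def default_increment?(expr)
  expr.is_a?(Array) && expr[0] == :lit && expr[1] == 1
end

# Drops the second argument from incr(x, 1) calls anywhere in the tree.
def rewrite(expr)
  return expr unless expr.is_a?(Array) # Sexp is an Array subclass

  # A two-argument call looks like [:call, receiver, :incr, x, y].
  if expr[0] == :call && expr[2] == :incr &&
     expr.length == 5 && default_increment?(expr[4])
    expr.pop # remove the redundant default argument
  end

  # Recurse into children so nested calls are rewritten too.
  expr.each { |child| rewrite(child) }
  expr
end
```

The length check keeps the rewrite from firing on calls with some other number of arguments, where a trailing literal 1 would not be the increment.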

Stepping back, I think this is pretty awesome! We’re now programmatically rewriting code. We can use this to maintain even large codebases without doing huge amounts of manual toil.

Let’s try it again, doing something a bit more ambitious. Imagine we’ve hired a bunch of Python programmers onto our team who keep writing Python-style for loops instead of learning Ruby's each idiom, and that we want to rewrite those loops to use each.
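Here is a rough sketch of such a rewrite. It assumes that ruby_parser represents a loop like for x in xs as s(:for, iterable, s(:lasgn, :x), body) and an each block as s(:iter, s(:call, iterable, :each), s(:args, :x), body). Those shapes can differ between parser versions, so it's worth printing the parsed output for a small sample before relying on them. The node_like helper is a hypothetical name, and loops that destructure into several variables are left untouched.

```ruby
# Builds a node of the same class as template: Array here, Sexp under ruby_parser.
def node_like(template, *items)
  template.dup.clear.push(*items)
end

# Rewrites, in place, `for var in iterable` loops into `iterable.each do |var|`.
def rewrite(expr)
  return expr unless expr.is_a?(Array) # Sexp is an Array subclass

  expr.each { |child| rewrite(child) } # handle nested loops first

  # Only the single-variable form: [:for, iterable, [:lasgn, var], body]
  target = expr[2]
  if expr[0] == :for && target.is_a?(Array) &&
     target[0] == :lasgn && target.length == 2
    iterable = expr[1]
    body = expr[3..-1] # empty when the loop has no body
    call = node_like(expr, :call, iterable, :each)
    args = node_like(expr, :args, target[1])
    expr.replace([:iter, call, args, *body])
  end

  expr
end
```

One behavioral difference to keep in mind: a for loop does not open a new scope, so the loop variable and anything assigned in the body remain visible afterward, whereas variables first assigned inside an each block are local to it.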

A bit messier, but also a pretty neat demonstration of what you can do once you start playing around with this technique. For example, you could imagine only doing this if the complexity of the refactored for loop is low enough.
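As one possible reading of "low enough", the sketch below counts the nodes in a loop body and compares the count against a threshold. Node count is just one crude proxy for complexity, and the names node_count, simple_loop? and MAX_BODY_NODES, as well as the threshold value, are hypothetical. It also assumes the loop body sits at index 3 of the :for s-expression.

```ruby
# Counts every node in an s-expression, the expression itself included.
def node_count(expr)
  return 0 unless expr.is_a?(Array) # symbols, literals and nil are not nodes

  1 + expr.sum { |child| node_count(child) }
end

MAX_BODY_NODES = 10 # hypothetical threshold; tune for your codebase

# True when a :for node's body is small enough to rewrite automatically.
def simple_loop?(for_expr)
  body = for_expr[3] # nil when the loop has no body
  node_count(body) <= MAX_BODY_NODES
end
```

A rewrite could call simple_loop? before touching a loop and leave anything larger for a human to review.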

Most importantly, I think this is a good reminder to avoid falling into the
"I'll just work through it" mindset for large migrations, which I believe can
become the limit on your company's overall throughput.