Expression-oriented programming (also known as functional or side-effect-free programming, although the three terms are related rather than synonymous) is a wonderful way to make calculations easier to understand and maintain. However, sometimes deeply nested function calls, or a mixture of function calls and method invocations, can make an expression difficult to understand at a glance. Here is a tip for refactoring your expressions so they are easier to read.

Expressions naturally form a tree, with values at the leaves and function calls or method invocations at each node. In this post, I’ll be talking about the simplest form of expression, a pipeline. A pipeline is an expression that does not branch: a value (or often collection of values) is transformed by two or more function calls or method invocations in succession. Here’s a slightly obfuscated example of a pipeline working with collections from one of our Rails applications:
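The original example isn't reproduced here, but a hypothetical stand-in shows the shape (every name below is invented for illustration): a collection is filtered, transformed, and folded down to a single value in succession.

```ruby
# Hypothetical stand-in for the obfuscated Rails pipeline.
orders = [
  { customer: 'a', total: 10 },
  { customer: 'b', total: 20 },
  { customer: 'a', total: 5 }
]

# select filters, map transforms, inject ("reduce") folds to one value.
grand_total = orders.
  select { |order| order[:total] > 0 }.
  map { |order| order[:total] }.
  inject(0) { |sum, total| sum + total }

grand_total # => 35
```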

While the details don’t make much sense out of context, the overall pattern ought to be familiar as an example of the MapReduce pattern (without the distributed processing, of course).

Pipelines read from right to left, from left to right, or a mixture of both. For example, this set of three nested function calls reads from right to left:

sum_numbers.call(square_numbers.call(odd_numbers.call(1..100)))

If I try to read it from left to right, it sounds like a caricature of speech: “The sum of the squares of the odd numbers from one to one hundred.” You can’t figure it out unless you build an abstract syntax tree in your head and then evaluate it with a stack machine. Having to emulate a computer to figure out what something means is not a good sign. It reads much more easily from right to left: “Take the numbers from one to one hundred. Select the odd ones. Square them. And finally, take the sum.”
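To make the nested example self-contained, here are plausible definitions for the three procs (the post doesn't show them, so treat these as assumptions about what they do):

```ruby
# Assumed definitions for the three pipeline stages.
odd_numbers    = lambda { |range| range.select { |n| n.odd? } }
square_numbers = lambda { |list| list.map { |n| n * n } }
sum_numbers    = lambda { |list| list.inject(0) { |sum, n| sum + n } }

result = sum_numbers.call(square_numbers.call(odd_numbers.call(1..100)))
result # => 166650
```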

Popular languages like Ruby make it easy to write expressions that read directly from left to right. Here’s an example that works in Ruby 1.9 (or in Ruby 1.8 with Symbol#to_proc added):

(1..100).select(&:odd?).map { |n| n*n }.inject(&:+) # => 166650

Object orientation’s emphasis on nouns at the expense of verbs has its issues. But when a computation really is a step-wise transformation of data, I find that chaining methods makes code a lot easier to understand than nesting functions. On the other hand, I prefer nesting functions when the expression has more of a tree form.

But whichever direction you prefer, I find it very difficult to read code that mixes directions in the same expression:
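Here is a sketch of such a mixed-direction expression, assuming a hypothetical factorial proc:

```ruby
# Assumed helper: a factorial proc, since integers don't answer #factorial.
factorial = lambda { |n| (1..n).inject(1) { |product, i| product * i } }

# Reads left-to-right (5.succ), then right-to-left (factorial.call),
# then left-to-right again (.succ.odd?):
result = factorial.call(5.succ).succ.odd?
result # => true (6! is 720, its successor 721 is odd)
```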

My first reaction was to think that adding factorial as a method was an idea from another planet[1]: why should integers know how to answer their own factorials? This seemed like a classic case of a function that should not be an object method. Nevertheless, having calculations be methods instead of functions lets you write a certain type of expression consistently from left to right (5.succ.factorial.succ.odd?) instead of mixing directions (factorial.call(5.succ).succ.odd?).

All the same, there are good reasons why we don’t overload numeric classes with every possible calculation and formula. So what can we do? How about:
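A minimal sketch of an Object#into helper; the exact original definition isn't reproduced here, so treat this as one plausible implementation:

```ruby
class Object
  # Feed the receiver into a callable (or a block): a minimal sketch.
  def into(callable = nil, &block)
    (callable || block).call(self)
  end
end

# Assumed helper for the example below.
factorial = lambda { |n| (1..n).inject(1) { |product, i| product * i } }

result = 5.succ.into(factorial).succ.odd?
result # => true
```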

I read this as “Start with five, get its successor, put that into the factorial proc, take the result’s successor, and answer whether it is odd.” The whole thing reads in one consistent style: you aren’t mixing left-to-right method chaining with right-to-left function nesting. I wouldn’t go crazy with Object#into in a program, but if you have an expression that is predominantly chaining methods, Object#into can make it consistent and improve its readability.

Function Composition

There is more than one way to skin a cat. If f(g(h(value))) is too constricting, we can compose functions instead of nesting them. So we can write:
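A sketch, assuming we open Proc ourselves to define * and | (as an aside, Ruby 2.6 later added the built-in Proc#<< and Proc#>> with the same right-to-left and left-to-right semantics):

```ruby
class Proc
  # f * g applies g first, then f: right-to-left, like mathematical composition.
  def *(other)
    lambda { |x| call(other.call(x)) }
  end

  # f | g applies f first, then g: left-to-right, like a shell pipeline.
  def |(other)
    lambda { |x| other.call(call(x)) }
  end
end

minus1  = lambda { |n| n - 1 }
squared = lambda { |n| n * n }
plus1   = lambda { |n| n + 1 }

rtl = (plus1 * squared * minus1).call(5)  # plus1(squared(minus1(5))) => 17
ltr = (minus1 | squared | plus1).call(5)  # same computation, read forwards => 17
```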

That saves us from writing 5.into(minus1).into(squared).into(plus1) if we find three instances of “into” a little noisy. Composing functions with * lets us maintain right-to-left order, and composing functions with | lets us create left-to-right order when we are building a “pipeline” of expressions.

Summary

In the end, this is a very trivial idea: When an expression can be written so that it reads consistently from left-to-right or consistently from right-to-left, do so. The code will be easier to read.

[1] Uh, yes, I am familiar with Smalltalk. I’m thinking that my opinion of my ability to make a joke far exceeds my actual ability: the phrase is meant as a pun on Edgar Rice Burroughs’s Barsoomian tales, featuring the Warlord John Carter. But all that being said, regardless of how OO you want to get, I am not convinced that objects are responsible for every operation that can possibly be performed on them.

So to play Smalltalk's advocate: why *shouldn't* we "overload numeric classes with every possible calculation and formula"? It drives me nuts that in Ruby, for example, numbers know how to square themselves (10 ** 2) but not how to take their own square root (Math.sqrt(100)).

Purely from a pragmatic software engineering standpoint: when you have so many different data types (small integers, large integers, floats, scaled decimals, complex numbers, amounts of money...) which respond to the same set of operations, it seems foolish not to take advantage of method dispatch to allow a different/optimized implementation of each operation for each type of numeric value.

Things definitely change in a multiple-dispatch world; if we're talking about CLOS or Dylan, the design space looks quite different. And if we're talking about Java, which is IMO broken by not having open classes, then we're in trouble. But at least in the context of Ruby, Smalltalk, C# and Objective-C, it seems that maybe we agree :).

I'm a little suspicious of a concept like the "intellectual surface area" of Integer, though, because I think that is (and should be) a moving target depending on what packages you have loaded. The more relevant surface area seems to me to be at the package level: if I load a package, I have to know what new classes and methods it introduces, regardless of the class it adds them to. If I don't load that package (say I don't load a math package because I don't need factorial or sqrt) then I don't need to know about those methods at all.

I agree 100% with your suggestion that the important thing is which packages you load.

It is trivial to build a 'package' in Ruby that opens Integer and adds math methods to it. That gives you some of the advantages you mentioned, such as leveraging implementation optimizations, without turning the standard library into a Swiss Army knife of capabilities.
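For instance, a hypothetical math 'package' (module and method names invented for illustration) might open Integer like so:

```ruby
# A hypothetical math "package": opening Integer to add methods,
# rather than bloating the standard library itself.
module IntegerMath
  def sqrt
    Math.sqrt(self)
  end

  def factorial
    (1..self).inject(1) { |product, i| product * i }
  end
end

class Integer
  include IntegerMath
end

100.sqrt     # => 10.0
5.factorial  # => 120
```

Code that doesn't require the package never sees these methods; code that does gets them via ordinary method dispatch.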

I have an unfinished post where I laud open classes while lamenting their "globality." I would love to be able to decorate Hash, Integer, Object, Proc, and lots of other things but only in the context of a specific class.

Take adding * to Proc for composition. What happens if my code does that but someone else's code adds * to Proc meaning produce the Cartesian product of the results of the two Procs?

I wish that it was possible for both pieces of code to happily co-exist.

It seems to me that this issue is solved fairly neatly in Scala via their "implicit conversion" feature.

What happens if my code does that but someone else's code adds * to Proc meaning produce the Cartesian product of the results of the two Procs?

Your code would define an implicit conversion to a "RegProc" which has a "*-for-composition" operator, and Avi's code would have the other implicit conversion to an "AviProc" whose * means the Cartesian product.

I wish that it was possible for both pieces of code to happily co-exist.

At least in this case, they could co-exist without further ado in separate parts of your application. In order to get them to coexist in the same scope, a little more work might be required (changing half of the call sites to disambiguate the intent).

The Smalltalker's answer would be that using * is a bad idea in both of those cases - much better to be explicit and use #compose: in the one case and #cartesianProduct: in the other. And if you end up with two packages that both define #cartesianProduct: in conflicting ways on the same class, well, then you have a bigger problem than namespacing will solve.

The issue of * vs. "compose" is interesting, but a much larger discussion of readability and trade-offs. Let's stick to the semantics. Even if both methods are named "compose" or "cartesianProduct," you can have two people write them such that they are broadly the same but differ in minor details, causing conflicts.

This will certainly be the case for anything non-trivial. For example, Ruby’s Symbol#to_proc allows you to use a symbol representing a method of any arity; however, all of the documented examples I found concerned methods taking no parameters.

It is included as part of Ruby on Rails and will be in Ruby 1.9, but otherwise if you want it you must copy and paste or roll your own.
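A roll-your-own version that does handle extra arguments is short; this is a sketch of the widely circulated implementation, not any one canonical source:

```ruby
# A roll-your-own Symbol#to_proc that handles methods of any arity,
# not just zero-argument ones.
class Symbol
  def to_proc
    lambda { |receiver, *args| receiver.send(self, *args) }
  end
end

(1..4).map(&:to_s)  # => ["1", "2", "3", "4"]  (no extra arguments)
(1..4).inject(&:+)  # => 10                    (one extra argument per call)
```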

What happens if someone rolls their own based on the informal examples and documents on the web? Their version would be incomplete and clash with anyone using &:merge or &:+.

We can blame them for a buggy implementation, but that bothers me: their implementation works for all of the code they wrote. How are we supposed to coördinate everyone's requirements for a method in an open class?

I don't have an answer, but I do stand by the thought that this is a problem with open classes: their global nature. Most of the time when I extend an open class, I am making changes that are really private to my use.