Wondrous oddities: R's function-call semantics

Every so often, I am going to write about wondrous oddities – obscure programming-language features that are so cool they deserve wider notice. Today, in the first installment, I want to show you the function-call semantics of R, a great system for statistical computing.

You might not expect a statistics system to have a first-class programming language at its heart, but if you think about it, it does make sense. The R language, actually a dialect of the S language, is described as “a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.” All true. It gives me the feeling of an infix Lisp or Scheme whose syntax is slanted toward mathematics and vector operations. The language has an object layer, too, but that’s not why we are here.

Flexible argument binding

Here is a simple function of two arguments:

f <- function(tens, ones = tens)
  ones + 10 * tens

The function f has two formal arguments, tens and ones, the second of which has a default value, defined to be tens, referring back to the first argument. R lets you call the function like so, passing in arguments by position:

f(3, 4) # 34

But you can also specify arguments by name, in any order:

f(tens=3, ones=4) # 34
f(ones=4, tens=5) # 54

And, if you leave off the ones argument, it will get its value from tens because of its default definition:

f(3) # 33
f(tens=2) # 22

At this point, you’re probably thinking that this is nice and all, but not “wondrous oddity” material. Hold that thought for a moment.

Moving on, you can mix positional and named arguments and even shuffle the argument ordering:

f(tens=2, 6) # 26
f(6, tens=2) # 26
f(ones=9, tens=8) # 89

You can even abbreviate arguments:

f(tens=2, o=6) # 26
f(t=3, ones=9) # 39
f(o=9, t=4) # 49

To explore the full abbreviation semantics, we need a more complex function:
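A minimal sketch of such a function, assuming four formal arguments named ones, tens, hundreds, and thousands. The names ones and hundreds and all four default values are assumptions, chosen only to be consistent with the g(t=0, thousands=9) result shown below. Because t is a prefix of both tens and thousands, the abbreviation is ambiguous on its own, and R rejects the call:

```r
# Hypothetical definition: argument names and defaults are assumed for illustration.
g <- function(ones = 1, tens = 2, hundreds = 3, thousands = 4)
  ones + 10 * tens + 100 * hundreds + 1000 * thousands

g()        # 4321
g(h = 0)   # 4021; "h" is a prefix of hundreds only

# "t" is a prefix of both tens and thousands, so this call fails.
ambiguous <- tryCatch(g(t = 0), error = function(e) "error")
ambiguous  # "error"
```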

But R is smart enough not to consider an abbreviation ambiguous if the ambiguity goes away when other arguments are matched exactly:

g(t=0, thousands=9) # 9301

Before we move on, let’s review R’s argument-binding features:

you can pass arguments by position or by name

you can omit arguments that have defaults

you can abbreviate argument names

you can use any combination of the above features, provided the combination results in no ambiguity

Lazy argument evaluation

Unlike most programming languages, R evaluates bound arguments lazily, meaning that the expressions you pass as arguments are not converted into values until needed. This lets you create functions that act like control structures. For example, the following function acts like an if-then-else control structure:
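A minimal sketch of such a function; the name my_if and its argument names are hypothetical. Because the branch that is not selected is never forced, it can safely contain an expression that would otherwise raise an error:

```r
# Hypothetical helper: both branches are ordinary, lazily evaluated arguments.
my_if <- function(cond, yes, no) {
  if (cond) yes else no
}

# The stop() calls sit in the branch that is not chosen, so they never run.
my_if(TRUE, "kept", stop("never evaluated"))    # "kept"
my_if(FALSE, stop("never evaluated"), "other")  # "other"
```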

Another benefit of R’s lazy argument evaluation is that you can provide mutually recursive defaults, which is a great way to implement adaptive interfaces. For example, here is a function that computes a point’s representation in both Cartesian and polar coordinate systems. You can specify the input point in either system, and the function adapts automatically:
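One way such a function might look, as a sketch; the name coords and the argument names x, y, r, and theta are assumptions. Each Cartesian default is written in terms of the polar arguments and vice versa, so whichever pair the caller supplies is used to compute the other:

```r
# Hypothetical function: each pair of defaults is defined via the other pair.
coords <- function(x = r * cos(theta), y = r * sin(theta),
                   r = sqrt(x^2 + y^2), theta = atan2(y, x)) {
  # Forcing the arguments here evaluates only the defaults that are needed.
  c(x = x, y = y, r = r, theta = theta)
}

from_cartesian <- coords(x = 1, y = 1)       # r and theta are derived
from_polar <- coords(r = 2, theta = pi / 2)  # x and y are derived
```

Calling coords() with no arguments at all leaves nothing to start from, and R reports a recursive default argument reference.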

Notice how there was no need for me to test the arguments to see how the function was called. All I did was define each set of argument defaults in terms of the other set of arguments. R can figure out the rest based on how the function is called. That’s programmer friendly.

One more benefit: if you don’t use an argument, you don’t have to pay for R to evaluate it.

Split-horizon scoping

R’s scoping rules give passed arguments and default values different perspectives – split horizons, if you will. Passed arguments see what was visible at the time of the call. No biggie here; every language works this way. Default values, on the other hand, see what is inside of the function as it evaluates. That means defaults have access to bound arguments and local variables, which means you can write functions whose defaults rely upon values computed in the function body.
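The two perspectives can be seen side by side in a small sketch; the function show_scope and the variable x are hypothetical names. The passed argument resolves x where the call is made, while the default resolves x inside the function, after the body has assigned it:

```r
x <- "caller"

# Hypothetical function: the default for `defaulted` refers to a name
# that the body assigns before the default is ever evaluated.
show_scope <- function(passed, defaulted = x) {
  x <- "function body"
  c(passed = passed, defaulted = defaulted)
}

result <- show_scope(x)
result[["passed"]]     # "caller"
result[["defaulted"]]  # "function body"
```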

This is a great feature that combines with R’s lazy argument evaluation to eliminate argument-handling logic. For example, a lot of R’s library code takes advantage of the following idiom:
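A sketch of the idiom, under stated assumptions: the argument names vals, ymin, and ymax come from the description that follows, but the body, the local variable clean, and the returned limits are illustrative guesses rather than the original listing. The defaults refer to clean, which exists only once the body has started running:

```r
# Hypothetical body: the defaults depend on `clean`, a local variable.
myplot <- function(vals, ymin = min(clean), ymax = max(clean)) {
  clean <- vals[is.finite(vals)]     # drop NA, NaN, and infinite values
  plot(clean, ylim = c(ymin, ymax))  # defaults are evaluated here, not earlier
  invisible(c(ymin, ymax))           # return the limits actually used
}
```

Calling myplot(c(3, 1, NA, 4)) scales the plot to the finite data, while myplot(c(3, 1, NA, 4), ymin = 0) pins the lower bound and leaves the upper bound at its default.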

The myplot function plots the values you pass it in vals. By default the function scales the plot to show all of the values. If you want, however, you can constrain the vertical extent of the plot by passing in ymin and/or ymax arguments. Note the refreshing lack of logic to handle the arguments. The code just gets down to business.

For comparison, here is a Ruby version of the function. When it comes to this kind of thing, Ruby is better than most mainstream languages, but it still makes us do about twice the work that R does:

To recap, R’s scoping rules, when combined with lazy argument evaluation, let you shave away tedious argument tests and placeholder defaults such as nil. Instead, you can focus on the core logic, letting R take care of the argument-handling burdens. The win might seem small, but when you write a lot of code, the clarity and code reduction add up.

That’s it

So there you have it: a surprisingly sophisticated function-call semantics that does away with argument-handling tedium. That you’ll find it in a statistics system and not in a mainstream programming language makes it a wondrous oddity.