The magrittr abstraction

Once "library(magrittr)" is loaded we can treat the expression:
7 %>% sqrt()
as if the programmer had written:
sqrt(7)
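The equivalence can be checked directly. The following is a minimal sketch, assuming only that the magrittr package is installed; the variable names piped and direct are illustrative.

```r
library(magrittr)

piped  <- 7 %>% sqrt()   # pipe form
direct <- sqrt(7)        # direct call form

# Compare the two results; they should be the same value.
print(identical(piped, direct))
```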

That is the abstraction of magrittr into terms one can reason about and plan over: you think of x %>% f() as a synonym for f(x). This is an abstraction because magrittr is not in fact implemented as a macro-style source-code rewrite, but in terms of function argument capture and delayed evaluation. And as Joel Spolsky famously wrote:

All non-trivial abstractions, to some degree, are leaky.

The magrittr pipe is non-trivial (in the sense of doing interesting work) because it works as if it were a syntax replacement even though you can use it in more places than you could ask for such a syntax replacement. The upside is: magrittr makes two statements behave nearly equivalently. The downside is: we expect this to fail in some corner cases. This is not a criticism; it is as Bjarne Stroustrup wrote:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.
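One small corner case of the kind mentioned above can be sketched with a function that inspects the text of its own argument. The helper arg_text() is hypothetical and written only for this illustration. In the magrittr releases I am aware of, the piped call sees the pipe's placeholder rather than the original expression, so the two forms are not interchangeable for functions that capture their arguments.

```r
library(magrittr)

# Hypothetical helper: returns the source text of its argument
# using base R's substitute() and deparse().
arg_text <- function(x) deparse(substitute(x))

direct <- arg_text(7)        # the function sees the literal argument
piped  <- 7 %>% arg_text()   # the function sees what the pipe passes in its place

print(direct)
print(piped)

# If the pipe were a pure source-code rewrite these would be identical.
print(identical(direct, piped))
```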

The tidyeval/rlang abstraction

The package dplyr 0.6.* brings in a new package called rlang to supply a capability called tidyeval. Among the abstractions it supplies are operators for quoting and un-quoting variable names. This allows code like the following, where a dplyr::select() takes a variable name from a user-supplied variable (instead of taking it explicitly from the text of the dplyr::select() statement, as is usual).
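A minimal sketch of such code, assuming a dplyr release with tidyeval support is installed (0.6.* or later) and using the built-in mtcars data frame; the name result is illustrative:

```r
suppressPackageStartupMessages(library(dplyr))

# quo() captures the bare name disp (together with its environment).
varName <- quo(disp)

# !! un-quotes varName inside select(), so the disp column is selected.
result <- mtcars %>% select(!!varName) %>% head()
print(result)
```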

Notice in the above example we had to specify the abstract varName by calling quo() on a free variable name (disp) and did not take the value from a string. tidyeval is working hard to supply a parametrizable non-standard interface, and it doesn’t look like a standard interface is the central goal. That is: the following is not intended to work:

This is unfortunate, as the main reason you want to parameterize over variable names is that the names are coming from somewhere else, and are likely supplied as strings, not as quosures (which themselves carry details of environment, meaning they are more like bound variables than free variables). I am sure you can convert a string into a column reference in rlang/tidyeval, but it doesn’t seem to be the central use case (or is at least not held out as such in the help and examples).
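One way such a conversion might look is sketched below. It assumes rlang::sym() is available in the installed rlang (it is in the current releases I am aware of, though it may not reflect the version discussed here); the names colName, colSym, and result are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

colName <- "disp"               # column name arriving as a string
colSym  <- rlang::sym(colName)  # convert the string to a symbol

# Un-quote the symbol so select() treats it as a column reference.
result <- mtcars %>% select(!!colSym) %>% head()
print(result)
```

Note that the extra conversion step sits between the string and the select() call; the string is not consumed directly.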

An issue

dplyr issue 2726 (reproduced below) discusses a very important and interesting issue.

At a cursory glance the two discussed expressions and the work-around may seem alien, artificial, or even silly:

(function(x) select(mtcars, !!enquo(x)))(disp)

(function(x) mtcars %>% select(!!enquo(x)))(disp)

(function(x) { x <- enquo(x); mtcars %>% select(!!x)})(disp)

However, this is actually a very crisp and incisive example. In fact, if rlang/tidyeval were a system up for public revision (such as an RFC or some such proposal) you would expect the equivalence of the above to be part of an acceptance suite.

The first expression looks very much like the rlang/tidyeval package examples and is the “right way” in rlang/tidyeval to send in a column name parametrically. It is in the style preferred by the new package, so by the package’s standards it cannot be considered complicated, perverse, or verbose. The second expression differs from the first only by the application of the “magrittr invariant” that “x %>% f() is to be considered equivalent to f(x)”.

The outcome is that the first expression currently executes as expected, and the second expression errors out. This can be considered surprising, as it is not something anticipated in the documentation or recipes for building up tidy expressions. This is a leak in the combined abstractions, something we are told to back away from as it doesn’t work.

The proposed work-around (expression 3) is helpful, but itself demonstrates another leak in the mutual abstractions. Think of it this way: suppose we had started with expression 3 as working code. We would by referential transparency expect to be able to refactor the code and replace x with its value and move from this third working example to the second expression (which happens to fail).

To summarize: expressions 1 and 3 are equivalent. They differ by two refactoring steps (introduction/removal of pipes, and introduction/removal of a temporary variable). But we cannot demonstrate the equivalence by chaining the two named transformations (going from 1 to 2 to 3, or from 3 to 2 to 1), as the intermediate expression 2 is apparently not valid.
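The three expressions can be compared side by side with a small harness. This is a sketch only: the helper status() is hypothetical, expression 3 is written with an explicit enquo() assignment to the temporary variable (an assumption about the exact form of the work-around), and the outcome for expression 2 depends on the installed dplyr and magrittr versions, so nothing is asserted about it here.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical helper: run a zero-argument function and report
# "ok" if it completes or "error" if it signals an error.
status <- function(thunk) {
  tryCatch({ thunk(); "ok" }, error = function(e) "error")
}

outcomes <- c(
  # Expression 1: direct call.
  expr1 = status(function() (function(x) select(mtcars, !!enquo(x)))(disp)),
  # Expression 2: the same call with the pipe introduced.
  expr2 = status(function() (function(x) mtcars %>% select(!!enquo(x)))(disp)),
  # Expression 3: pipe plus a temporary variable holding the quosure.
  expr3 = status(function() (function(x) { x <- enquo(x); mtcars %>% select(!!x) })(disp))
)
print(outcomes)
```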

The wrapr::let version of the issue author’s desired expression 2 is:

(function(x) let(c(X = x), mtcars %>% select(X)))('disp')
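The same expression can be unpacked into a runnable sketch, assuming the wrapr and dplyr packages are installed. The function name selectByName is hypothetical and only wraps the anonymous function above for readability.

```r
suppressPackageStartupMessages(library(dplyr))
library(wrapr)

# let() substitutes the name held in x for the placeholder X
# before the pipeline is evaluated.
selectByName <- function(x) {
  let(c(X = x), mtcars %>% select(X))
}

# The column name is supplied as an ordinary string.
result <- selectByName('disp')
print(head(result))
```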

Conclusion

wrapr::let() is a useful abstraction:

It directly takes strings as variable names (the most common source of parametric variable names).

It is a macro-like replacement and easy to teach as a code re-writing abstraction.

It has a small interaction surface, and plays well with delayed evaluation packages such as magrittr and dplyr 0.5.0.