Wednesday, July 30, 2014

More Readable Code with Pipes in R

Several blog posts have mentioned the 'magrittr' package, which allows arguments to be passed to functions in a pipe-style fashion (David Smith).

This stylistic option has several advantages:

1. Fewer nested parentheses are required
2. Function calls now read from left to right, in the order they are applied
3. Organizational style of the code may be improved

The library introduces a new operator, %>%, which tells R to take the value on its left and pass it as the first argument to the function on its right. Let us see this in action with some text functions.
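As a minimal sketch of that rule (assuming 'magrittr' is installed; the variable names piped, nested and chained are only illustrative), a piped call can be compared directly with the nested call it stands for:

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y): the left-hand value becomes the first argument
piped  <- "I've had worse." %>% substr(1, 4)
nested <- substr("I've had worse.", 1, 4)
identical(piped, nested)
# [1] TRUE

# A chain reads left to right, while the nested form reads inside out
chained <- c(1, 4, 9) %>% sqrt %>% sum
chained == sum(sqrt(c(1, 4, 9)))
# [1] TRUE
```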

library(magrittr)

# Let's play with some strings
str1 <- "A scratch? Your arm's off."
str2 <- "I've had worse."

str1 %>% substr(3, 9)
# [1] "scratch"

str1 %>% strsplit('?', fixed = TRUE)
# [[1]]
# [1] "A scratch"        " Your arm's off."

# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."

# Let's see how pipes might work with drawing random variables.
# I like to define a function that takes an element-by-element maximum.
vmax <- function(x, maximum = 0) x %>% cbind(maximum) %>% apply(1, max)
-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5

# This is identical to defining the function as:
vmax <- function(x, maximum = 0) apply(cbind(x, maximum), 1, max)
vmax(-5:5)

# Notice that the latter formulation uses the same number of parentheses
# and may be more readable.

# However, recently I was drawing data for a simulation in which I wanted to
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0, and then randomize their order.
Nitem <- 100
ctmean <- 1
ctsd <- .5
draws <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem)

# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)], ctmean, ctsd)), Nitem)

# Both functional sequences are ugly, though I think I prefer the first, which
# I can easily read as seq is passed to qnorm, passed to vmax, passed to sample.

# A few things to note with the %>% operator. If you want to send the value to
# an argument which is not the first or is a named value, use the '.'
mydata <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem) %>%
  data.frame(index = 1:Nitem, theta = .)

# Also note that the operator binds more tightly than you might expect:
# it is evaluated before arithmetic operators such as +.
# Thus:
1 + 8 %>% sqrt
# [1] 3.828427

# Rather than
(1 + 8) %>% sqrt
# [1] 3
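The last two notes can be checked with a short sketch. It assumes 'magrittr' is installed, and theta_demo, a, b and df are hypothetical names used only for illustration, not part of the simulation above:

```r
library(magrittr)

# %>% is applied before +, so only the 8 is piped into sqrt
a <- 1 + 8 %>% sqrt        # same as 1 + sqrt(8)
b <- (1 + 8) %>% sqrt      # same as sqrt(1 + 8)
all.equal(a, 1 + sqrt(8))
# [1] TRUE
all.equal(b, 3)
# [1] TRUE

# The dot marks where the left-hand value goes when it is not the first argument
theta_demo <- c(0.2, 0.5, 0.9)   # hypothetical values
df <- theta_demo %>% data.frame(index = 1:3, theta = .)
names(df)
# [1] "index" "theta"
```

When in doubt about how an expression will be grouped, wrapping the left-hand side in parentheses makes the intent explicit.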