Year: 2014

If you use R in a production environment, you have most likely experienced that some circumstances change in ways that will make your R scripts run into trouble. Many things can go wrong; package updates, external data sources, daylight savings time, etc. There is a general increasing focus on this within the R community and words like “reproducibility”, “portability” and “unit testing” are buzzing big time. Many really neat solutions are already helping a lot: RStudio’s Packrat project, Revolution Analytic’s “snapshot” feaure, and Hadley Wickham’s testthat package to name a few. Another interesting package under development is Edwin de Jonge’s “validate” package.

I found myself running into quite a few annoying “runtime” moments, where some typically external factors break R software, and more often than not I spent just too much time tracking down where the bug originated. It made me think about how best to ensure that vulnarable statements behaves as expected and how to know exactly where and when things go wrong. My coding style is heaviliy influenced by the magrittr package’s pipe operator, and I am very happy with the workflow it generates:

It’s like a recipe. But the problem is that I found no existing way of tagging potentially vulnarable steps in the above process, leaving the choice of doing nothing, or breaking it up. So I decided to make “ensurer”, so I could do:

Now, I don’t have a blog, but Tal Galili has been so kind to accept the ensurer vignette as a post for r-bloggers.com. I hope that ensurer can help you write better and safer code; I know it has helped me. It has some pretty neat features, so read on and see if you agree!

Introduction

Testing is a crucial component in ensuring that the correct analyses are deployed. However it is often considered unglamorous; a poor relation in terms of the time and resources allocated to it in the process of developing a package. But with the increasing popularity and commercial application of R it testing is a subject that is gaining significantly in importance.

At the time of writing there are 5987 packages on CRAN. Due to the nature of CRAN and the motivations of contributors the quality of packages can vary greatly. Some are very popular and well maintained, others are essentially inactive with development having all but ceased. As the number of packages on CRAN continues to grow, determining which packages are fit for purpose in a commercial environment is becomming an increasingly difficult task. There have been numerous articles and blog posts on the subject of CRAN’s growth and the quality of R packages. In particular, Francis Smart’s R-bloggers post entitled Does R have too many packages? highlights five perceived concerns with the growing number of R packages. I would like to expand on one of these themes in particular, namely the “inconsistent quality of individual packages”.

There are many ways in which a package can be assessed for quality. Popularity is clearly one: if lots of people use it then it must be quite good! But popular packages tend to also have authors that actively develop their packages and fix bugs as users identify them. Development activity is therefore another factor; the length of time that a package has existed for; the package dependency tree and the number of reverse ‘Depends’, ‘Imports’ and ‘Suggests'; the number of authors and their reputation; and finally there is testing. Francis briefly mentions testing in his post noting that “testing is still largely left up to the authors and users”. In other words there is no requirement for an author to write tests for their package and often they don’t!

R 3.1.2 (codename “Pumpkin Helmet“) was released last week. You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Upgrading to R 3.1.2 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code:

CHANGES IN R 3.1.2:

[…] improvements for the log-Normal distribution function, improved axis controls for histograms, a fix to the nlminb optimizer which was causing rare crashes on Windows (and traced to a bug in the gcc compiler), and some compatibility updates for the Yosemite release of OS X on Macs.

This is a guest post by Stefan Milton, the author of the magrittr package which introduces the %>% operator to R programming.

Preface (by Tal Galili)

I was first introduced to the %>% (a.k.a: pipe) operator in R, thanks to Hadley Wickham’s (fascinating) dplyr tutorial (link to the workshop’s material) at useR!2014. After several discussions during the conference (including one very influential conversation with Rstudio’s Joe Cheng), I got convinced that the pipe operator is one (if not THE) mostimportantinnovation introduced, this year, to the R ecosystem.

Soon after, I contacted Stefan Milton (the author of the magrittr package), asking him to write about his implementation of the pipe operator. Stefan generously agreed, and what follows is what he had to share with the rest of us.

magrittr: The Difficult Crossing – by Stefan Milton Bache

Background

It has only been 7 months and a bit since my initial magrittr commit to GitHub on January 1st. It has had more success than I had anticipated, and it appears that I was not quite alone with a frustration which caused me to start the magrittr project. I am not easily frustrated with R, but after a few weeks working with F# at work, I felt it upon returning to R: I had gotten used to writing code in a different way — all nicely aligned with thought and order of execution. The forward pipe operator |> was so addictive that being unable to do something similar in R was more than mildly irritating. Reversing thought, deciphering nested function calls, and making excessive use of temporary variables almost became deal breakers! Surprisingly, I had never really noticed this before, but once I did my returning to R became a difficult crossing.

An amazing thing about R is that it is a very flexible language and the problem could be solved. The |> operator in F# is indeed very simple: it is defined as let (|>) x f = f x. However, the usefulness of this simplicity relies heavily on a concept that is not available in R: partial application. Furthermore, functions in F# almost always adhere to certain design principles which make the simple definition sufficient. Suppose that f is a function of two arguments, then in F# you may apply f to only the first argument and obtain a new function as the result — a function of the second argument alone. This is partial application, and works with any number of arguments, but application is always from left to right in the argument list. This is why the most important argument (and the one most likely to be a left-hand side object in the pipeline) is almost always the last argument, which in turn makes the simple definition of |> work. To illustrate, consider the following example:

some_value |> some_function other_value

Here, some_function is partially applied to other_value, creating a new function of a single argument, and by the simple definition of |>, this is applied to some_value.

It was clear to me that because R is lacking native partial application and conventions on argument order, no simple solution would be satisfactory, although definitely possible, see e.g. here or here. I wanted to make something that would feel natural in R, and which would serve the main purpose of improving cognitive performance of those writing the code, and of those reading the code.

It turned out that while I was working on magrittr’s %>% operator, Hadley Wickham and Romain Francois was implementing a similar %.% operator in their dplyr package which they announced on January 17. However, it was not quite as flexible, and we thought that piping functionality was better placed in its own more light-weight package. Hadley joined the magrittr project, and in dplyr 2.0 the %.% operator was deprecated — instead%>% was imported from magrittr.

R 3.1.1 (codename “Sock it to Me“) was released today! You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Upgrading to R 3.1.1 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code:

After running “updateR()”, the function will detect that R is available for you, and will download+install it (etc.).

Note that the latest installr version (0.15.3) was released just less than a month ago to CRAN, and it is recommended to upgrade to it, since it has more updated URLs to some software.I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

This week I presented in the useR!2014 my package dendextend (also on github), for easily manipulating, visualizing, and comparing dendrograms. Put simply, it is a package designed to easily create figures like these:

I highly welcome features suggestions and bug reports (or just “wow, this is awesome”) sent to my e-mail (tal.galili AT gmail.com), you can also leave a comment or use the github issue page.

A sidenote on useR!2014: this year’s useR conference was wonderful! I enjoyed the many talks, sessions, posters, and especially the so many wonderful R users I got to meet (and I will not try to list all of you – but you know who you are, and how much I enjoyed seeing you!). As corny as it may sound – we, the people who use R, are truly a community. There is a lot to be said about getting to meet so many people who share my own passion for statistical programming, open source, collaboration, open science, and a better future in general. Gladly, you can get a sense of what happened there by having a look at the twitter hashtag #useR2014. Several great R bloggers already started writing about it, you can see their posts here: 1, 2, 3, 4, 5. And I hope more posts will follow. I hope to see you in next year’s useR!2015!

Upgrading to R 3.1.0

If you are using Windows, it might take another 24 hours until you could update R. For convenience, you can upgrade to the latest version of R using the installr package. Simply run the following code:

After running “updateR()”, the function will detect that R is available for you, and will download+install it (etc.).

Note that the latest installr version (0.14.0) was released a week ago to CRAN, and it is recommended to upgrade to it, since it is now more robust for various extreme cases of upgrading R.I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

CHANGES IN R 3.0.3:

NEW FEATURES

I am happy to officially announce a new website called R-users.com. The idea of the site is that community members will invite other R users to join them in their R projects, conferences, and work places.

This site is a “job board” for R users, hosting various “call to action” to R-users, to do stuff such as:

For example, I am the author of the R package “installr” for easily updating R on windows. However, I would love for someone who is a mac/linux user to expend my package for non-Windows users. Hence, I created a new “job”, inviting help on this project, which you may see in this link.

If you also wish to post your own “R job” for other R-users to see, here is a very short presentation on how to do it: