As you can imagine, when giving an on-site course, a reasonable question is what version of R is required for the course. We always have an RStudio cloud back-up, but it’s nice for participants to run code on their own laptop. If participants bring their own laptop, it’s trivial for them to update R. But many of our clients are financial institutions or government bodies, where an upgrade is a non-trivial process.

So, what version of R is required for a tidyverse course? For the purposes of this blog post, we will store the list of packages we are interested in as a character vector called pkgs.

The code below will work with any packages of interest. In fact, you can set pkgs to all R packages on CRAN; it just takes a while.
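The original package list isn't reproduced here, so as a stand-in, here is a minimal sketch that assumes the packages of interest are the tidyverse meta-package plus its core members. The exact contents of the vector are an assumption, not necessarily the list used for the analysis.

```r
# Hypothetical list of packages of interest (an assumption, not
# necessarily the vector used in the original analysis)
pkgs <- c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble",
          "stringr", "forcats", "tidyverse")

# To analyse every CRAN package instead (slow, needs a network connection):
# pkgs <- rownames(available.packages())
```

The commented-out line relies on the fact that the row names of the matrix returned by available.packages() are the package names.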

Package descriptions

In R, there is a handy function called available.packages() that returns a matrix of details corresponding to packages currently available at one or more repositories. Unfortunately, the format isn’t initially amenable to manipulation. For example, consider the readr package

So I changed the matrix to a data frame/tibble, which made selecting easier.
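As a rough sketch of that conversion, the example below builds a tiny stand-in for the matrix that available.packages() returns, so it runs without a network connection, and then converts it to a data frame. The stand-in matrix and the object names ap and desc are assumptions for illustration; the Depends and Imports strings for readr, and the R requirement for tibble, are the ones quoted in this post. Base R is used here, though tibble::as_tibble() would do the same job.

```r
# Small stand-in for the character matrix returned by available.packages().
# With internet access you would use: ap <- available.packages()
ap <- matrix(
  c("readr",  "R (>= 3.0.2)", "Rcpp (>= 0.12.0.5), tibble, hms, R6",
    "tibble", "R (>= 3.1.0)", NA),   # tibble's Imports left as NA in this toy
  ncol = 3, byrow = TRUE,
  dimnames = list(c("readr", "tibble"), c("Package", "Depends", "Imports"))
)

# Convert the matrix to a data frame so columns can be selected by name
desc <- as.data.frame(ap, stringsAsFactors = FALSE)

# Pick out the row for readr
readr_desc <- desc[desc$Package == "readr", ]
readr_desc$Depends
```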

Looking at readr_desc, we see that it has a minimum R version

readr_desc$Depends
## [1] "R (>= 3.0.2)"

but due to the format, it would be difficult to compare to R versions. Also, the list of imports

readr_desc$Imports
## [1] "Rcpp (>= 0.12.0.5), tibble, hms, R6"

has a similar problem. For example, with the data in this format, it would be difficult to select packages that depend on tibble.
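To make the version problem concrete, here is a small sketch of why the raw strings are awkward and how base R's package_version() helps. The variable names depends and r_min are made up for this example.

```r
# Comparing version numbers as plain strings gives the wrong answer,
# because "1" sorts before "2" character by character:
"3.10.0" < "3.2.0"                                     # TRUE (misleading)

# package_version() compares component by component instead:
package_version("3.10.0") > package_version("3.2.0")   # TRUE (correct)

# Pull the version number out of a Depends entry such as "R (>= 3.0.2)"
depends <- "R (>= 3.0.2)"
r_min <- package_version(
  regmatches(depends, regexpr("[0-9]+(\\.[0-9]+)+", depends))
)

# The extracted version can now be compared properly
r_min >= "3.0.0"
```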

Tidy package descriptions

We currently have five columns

Imports, Depends, Suggests, Enhances, LinkingTo

each entry in these columns contains multiple packages, with possible version numbers. To tidy the data set I’m going to create four new columns:

depend_type: one of Imports, Depends, Suggests, Enhances or LinkingTo

depend_package: the package name

depend_version: the package version

depend_condition: something like equal to, less than or greater than

The hard work is done by the function clean_dependencies(), which is at the end of the blog post. It essentially just does a bit of string manipulation to separate out the columns. The function works per package, so we iterate over packages using map_df()
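The clean_dependencies() function itself isn't shown in this section, so the following is a hypothetical stand-in rather than the original implementation. It is a minimal sketch of the string manipulation described above, written in base R, with lapply() plus rbind() playing the role of map_df(). The names tidy_field(), clean_dependencies_sketch() and the toy desc data frame are all assumptions.

```r
# Toy package descriptions; only the readr strings and tibble's R
# requirement come from the post
desc <- data.frame(
  Package = c("readr", "tibble"),
  Depends = c("R (>= 3.0.2)", "R (>= 3.1.0)"),
  Imports = c("Rcpp (>= 0.12.0.5), tibble, hms, R6", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one field, e.g.
# "Rcpp (>= 0.12.0.5), tibble, hms, R6", into one row per dependency
tidy_field <- function(package, depend_type, field) {
  if (length(field) != 1 || is.na(field) || !nzchar(field)) return(NULL)
  entries <- trimws(strsplit(field, ",")[[1]])
  entries <- entries[nzchar(entries)]
  if (length(entries) == 0) return(NULL)
  has_version <- grepl("\\(", entries)
  # Text inside the brackets, e.g. ">= 0.12.0.5", or NA if there is none
  inside <- ifelse(has_version,
                   sub(".*\\((.*)\\).*", "\\1", entries),
                   NA_character_)
  data.frame(
    package          = package,
    depend_type      = depend_type,
    depend_package   = trimws(sub("\\(.*", "", entries)),
    depend_condition = trimws(sub("[0-9].*", "", inside)),
    depend_version   = trimws(sub("^[^0-9]*", "", inside)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical per-package function: tidy every dependency column present
clean_dependencies_sketch <- function(pkg, desc) {
  types <- intersect(
    c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances"),
    names(desc)
  )
  row <- desc[desc$Package == pkg, ]
  parts <- lapply(types, function(type) tidy_field(pkg, type, row[[type]]))
  do.call(rbind, Filter(Negate(is.null), parts))
}

# Iterate over packages and stack the results into one data frame
# (the post uses purrr's map_df() for this step)
tidy_desc <- do.call(
  rbind,
  lapply(desc$Package, clean_dependencies_sketch, desc = desc)
)
tidy_desc
```

With the data in this shape, selecting the packages that depend on tibble is a one-line filter on depend_package.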

and we can see the minimum R version the package authors have indicated for their package. However, this isn’t necessarily the minimum version actually required. Each package imports a number of other packages (e.g. readr imports 4 packages), and each of those has its own minimum R version.
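One way to work out the version that is really needed is to walk the dependency tree and take the largest R version demanded anywhere in it. The sketch below does this on a toy tidy table. It assumes that only Depends, Imports and LinkingTo count as hard requirements and that every R condition is of the ">=" kind. The function names all_dependencies() and true_min_r() are hypothetical.

```r
# Toy tidy dependency table. Only the R requirements for readr (3.0.2)
# and tibble (3.1.0) come from the post; other rows are left out.
deps <- data.frame(
  package        = c("readr", "readr", "tibble"),
  depend_type    = c("Depends", "Imports", "Depends"),
  depend_package = c("R", "tibble", "R"),
  depend_version = c("3.0.2", NA, "3.1.0"),
  stringsAsFactors = FALSE
)

# All packages reachable from `pkg` through hard dependencies
all_dependencies <- function(pkg, deps) {
  hard <- deps[deps$depend_type %in% c("Depends", "Imports", "LinkingTo") &
                 deps$depend_package != "R", ]
  seen <- character(0)
  todo <- pkg
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (current %in% seen) next   # already visited
    seen <- c(seen, current)
    todo <- c(todo, hard$depend_package[hard$package == current])
  }
  seen
}

# Largest R version demanded anywhere in the dependency tree
true_min_r <- function(pkg, deps) {
  tree <- all_dependencies(pkg, deps)
  r_versions <- deps$depend_version[deps$package %in% tree &
                                      deps$depend_package == "R"]
  r_versions <- r_versions[!is.na(r_versions)]
  if (length(r_versions) == 0) return(NA_character_)
  as.character(max(package_version(r_versions)))
}

true_min_r("readr", deps)   # "3.1.0", not the declared "3.0.2"
true_min_r("tibble", deps)  # "3.1.0"
```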

The largest difference in R versions is for readr (which feeds into the tidyverse). readr claims to only need R version 3.0.2, but a bit more investigation shows that readr depends on the tibble package, which requires R 3.1.0. That said, it is worth noting that 3.1.0 is fairly old!

Take away lessons

The takeaway message is that dependencies matter. A single change affects everything in the package dependency tree. The other lesson is that the tidyverse team have been very careful about their dependencies. In fact, all of their packages are checked on R 3.1, 3.2, ..., devel.

Simplifications: skipping package versions

In this analysis, we've completely ignored package version numbers and always assumed we need the latest version of a package. This clearly isn't correct. To do this analysis properly, we would need the historical DESCRIPTION files for each package and would use those to determine the required versions.

Thanks to Jim Hester who spotted an error in a previous version of this post.

2 thoughts on “What R version do you really need for a package?”

Last Thursday I was giving a tidyverse training course for people from a large company. It turned out that they had R 3.2.0 installed and could not update it without thousands of permissions from hundreds of departments.

I asked them to do it anyway, but now, thanks to the above post, I know that their 3.2.0 version was sufficient for the material I use in my course, so they didn’t have to update their software. If only you had posted this a week earlier, they would have saved time and paper 😉