Using R — Installing Packages

One of the reasons to use R for analysis and visualization is the rich ecosystem of ‘packages’ contributed by others. In most cases, just as with smartphones, “There’s a package for that.” If you want to be efficient you need to embrace other people’s work and in the case of R that means installing packages. This post walks you through the basics of package installation and use and gives some tips on workarounds when a package won’t install.

Simple example

For the impatient, let’s start off with a simple example. In this example (on Ubuntu Linux) we’ll run R as the superuser so that packages will be installed in the default location. We will install the “geonames” package and then show off the new functionality we just added.
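A minimal sketch of that session, assuming R was started from the shell with sudo R and that a CRAN mirror is reachable. The mirror URL is an assumed choice, not something this post prescribes:

```r
# Assumed mirror; leave out `repos` to be prompted for one instead.
pkg <- "geonames"

# Install only if the package is not already present.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Attach the package so its functions are available in this session.
library(pkg, character.only = TRUE)
```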

If you don’t run R as superuser you won’t have permission to write packages into the site-library and you will be prompted to create a personal library. You can specify the library, repository and a few other options by passing arguments to the install.packages() function. Use ?install.packages to learn more.
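Here is a sketch of installing into a personal library without superuser rights. It assumes the R_LIBS_USER environment variable is set (R normally sets it at startup) and again uses an assumed mirror URL:

```r
# Personal library location that R would offer to create.
user_lib <- Sys.getenv("R_LIBS_USER")
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# lib   = where the package is written
# repos = which mirror it is downloaded from (assumed choice)
install.packages("geonames",
                 lib   = user_lib,
                 repos = "https://cloud.r-project.org")

# Load the package from that library explicitly.
library(geonames, lib.loc = user_lib)
```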

So what extra functionality does this new “geonames” package bring? You’ll have to do a little reading to figure out the details but for now just paste these lines into your R session:
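The lines below are a hedged stand-in rather than the original listing. They assume the GNfindNearByWeather() function exported by “geonames” and a GeoNames account, which the web service requires these days. The username and the coordinates are placeholders:

```r
library(geonames)

# "your_username" is a placeholder for a free GeoNames account name.
options(geonamesUsername = "your_username")

# Latest weather observation nearest to a latitude/longitude pair.
# The coordinates are an arbitrary illustration (roughly Seattle, WA).
obs <- GNfindNearByWeather(lat = 47.6, lng = -122.3)

# Inspect whatever fields the service returned.
str(obs)
```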

Who’d-a-thunk that R could so easily be turned into a real time weather system?

CRAN

In order to install and make use of packages you first have to find them. Luckily, most (but not all) R packages are organized and available from CRAN, the Comprehensive R Archive Network. Just click on the Packages link to see the full list of contributed packages. Packages are listed alphabetically with a short description. Unfortunately, there is no rating system but you can get a quick sense of quality by clicking on a package link and looking at the “Published” date and especially any “Reverse dependencies” listed at the bottom of the package page. Reading the documentation and looking at the number of releases in the “Old sources” is also very helpful.
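The same information can be pulled into an R session. This is a sketch using base R’s available.packages() and tools::package_dependencies(); the mirror URL is an assumption and “sp” is only an example package:

```r
# Assumed mirror; any CRAN mirror URL works here.
repo <- "https://cloud.r-project.org"

# One row per package currently on CRAN, with DESCRIPTION fields as columns.
db <- available.packages(repos = repo)
db["sp", c("Package", "Version", "Depends")]

# Packages that depend on, import, or link to "sp" (its reverse dependencies).
rev_deps <- tools::package_dependencies("sp", db = db, reverse = TRUE)[["sp"]]
length(rev_deps)
```

A long list of reverse dependencies is a reasonable hint that other package authors trust the package.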

CRAN also maintains a set of Task Views that identify all the packages associated with a particular task. The maintainers of these views do a generally excellent job of staying on top of their area of interest and giving a detailed summary of which packages do what. If one of the task views is a perfect match you can have R install every package from that view using the “ctv” package. Yes, “ctv” is a package to automate package installation. See the section below on “Installing older versions” if you have trouble installing “ctv”.
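Installing a task view might look like the following sketch. The “Spatial” view is just an illustrative choice, and the mirror URL is an assumption:

```r
install.packages("ctv", repos = "https://cloud.r-project.org")
library(ctv)

# See which task views exist.
views <- available.views()

# Install one view; coreOnly = TRUE restricts this to the packages
# the view's maintainer has marked as core, which keeps the install short.
install.views("Spatial", coreOnly = TRUE)
```

Dropping coreOnly = TRUE installs every package in the view, which can take a long time.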

Installing packages

The basics of package installation are given in chapter 6 of R Installation and Administration. There are two ways to do a command line installation of packages: from the R command line and from the shell command line.

> install.packages() # at the R prompt

Within R you can use install.packages() as demonstrated in the example above. This will always attempt to install the latest version of packages it knows about.

$ R CMD INSTALL # at the shell prompt

You can also invoke R from the command line. This is useful for some packages when install.packages() doesn’t work or for packages that are not part of CRAN. More information is available with R CMD INSTALL --help. To install packages this way you must first download the package source to your local machine. Here is a quick demonstration:
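The sketch below drives the same two steps from R so that they can be scripted. The helper name install_from_tarball is hypothetical and the URL in the usage line is a placeholder, not a real package:

```r
# Hypothetical helper: download a source tarball, then hand it to R CMD INSTALL.
install_from_tarball <- function(url, destdir = tempdir()) {
  tarball <- file.path(destdir, basename(url))
  download.file(url, destfile = tarball, mode = "wb")

  # Same as typing `R CMD INSTALL <tarball>` at the shell prompt.
  r_bin  <- file.path(R.home("bin"), "R")
  status <- system2(r_bin, c("CMD", "INSTALL", shQuote(tarball)))

  # TRUE when R CMD INSTALL exited successfully.
  invisible(status == 0)
}

# Usage (placeholder URL):
# install_from_tarball("https://example.org/src/somepackage_1.0.tar.gz")
```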

Installing older versions

If you have total control over your system and always keep it at the bleeding edge then you will have no problem installing the latest and greatest versions of R packages. However, if your version of R is older (perhaps you are running R on a webserver with CentOS?) then some of the more recent releases of packages will not work and install.packages() will generate messages like:

Warning message:
In install.packages(c("sp")) : package ‘sp’ is not available

This is when you have to poke around in the “Old sources” link on the CRAN page for that package and use trial-and-error to find an older version of the package that will work with your version of R. You should start by determining what version of R you have:

$ R --version
R version 2.8.1 (2008-12-22)
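The same check can be made from inside R. This is a small sketch using only base R:

```r
# Human-readable version string, like the shell output above.
R.version.string

# Version object that can be compared against a version given as a string.
getRversion()

# TRUE when the running R is at least the version shown above.
getRversion() >= "2.8.1"
```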

Given that our version of R was released at the end of 2008, any version of the “sp” package released in 2008 should definitely work. At least some of the 2009 releases should also work. Perusing the sp archive, we might try installing version 0.9-37, the last of the 0.9-3x series, which was released in May of 2009:
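One way to attempt that from within R is sketched below. It assumes CRAN’s usual archive layout under src/contrib/Archive/, and archive_url is a hypothetical helper name. The install line is left commented out because a release this old is unlikely to build on a current R:

```r
# Hypothetical helper: build the URL of an archived source tarball on CRAN.
archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

url <- archive_url("sp", "0.9-37")
url

# repos = NULL tells install.packages() this is a file or URL, not a package name.
# install.packages(url, repos = NULL, type = "source")
```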

Over time, your package library will contain more and more packages. Or perhaps system administrators or other users have also installed packages. It’s good to know what’s installed and at what version. This is where the location of the package library comes in handy. If you poke around you will find out that most packages come with a DESCRIPTION file that contains that information. To see all the package versions on our Ubuntu system we could just type:
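Reading those DESCRIPTION files directly can be sketched as follows. Rather than hard-coding the Ubuntu site-library path, this assumes every library of interest is listed in .libPaths():

```r
# Every installed package directory holds a DESCRIPTION file.
desc_files <- Sys.glob(file.path(.libPaths(), "*", "DESCRIPTION"))

# read.dcf() parses the "Field: value" format used by DESCRIPTION files.
versions <- vapply(desc_files,
                   function(f) read.dcf(f, fields = "Version")[1, 1],
                   character(1))

# Label each version with the package directory it came from.
names(versions) <- basename(dirname(desc_files))
head(versions)
```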

R also has a built-in way of getting this information. The main fields in DESCRIPTION files are accessible through the installed.packages() function (note the spelling, which differs from install.packages()). It returns a matrix of information with packages as row names and fields as column names, and further fields can be requested with its fields argument. The following example shows how to access this information programmatically from within R:
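A minimal sketch, using only base packages as examples so that it runs on any installation:

```r
ip <- installed.packages()

# Rows are packages, columns are DESCRIPTION fields.
dim(ip)
colnames(ip)

# Look up a few packages by row name.
ip[c("stats", "utils"), c("Package", "Version", "LibPath")]

# Contributed packages only: base and recommended packages have a Priority set.
contrib <- ip[is.na(ip[, "Priority"]), c("Package", "Version"), drop = FALSE]
head(contrib)
```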

Special Cases

ncdf

The ncdf package requires that NetCDF — including the development libraries — first be installed on your system. Unfortunately, the NetCDF libraries and include files are not installed in a uniform location across Unix systems. This is a case where we need to pass configuration arguments to R CMD INSTALL. Here is what ended up working on Ubuntu 10.04 LTS:
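The sketch below is not the exact command that worked, only an illustration of passing configuration arguments. The directory paths are hypothetical, the tarball name is a placeholder, and the two flag names are an assumption that should be checked against the INSTALL notes shipped inside the package source:

```r
# Hypothetical locations; find the real ones with e.g. `locate netcdf.h`.
netcdf_inc <- "/usr/include"
netcdf_lib <- "/usr/lib"

# Assumed flag names for the ncdf configure script; verify before relying on them.
cfg <- sprintf("--with-netcdf_incdir=%s --with-netcdf_libdir=%s",
               netcdf_inc, netcdf_lib)
cfg

# "ncdf_x.y.z.tar.gz" stands for a locally downloaded source tarball.
# configure.args plays the role of --configure-args in R CMD INSTALL.
# install.packages("ncdf_x.y.z.tar.gz", repos = NULL, type = "source",
#                  configure.args = cfg)
```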