March 26, 2011

I’m currently writing some lecture notes on R and I used the phrase “a R package” without thinking. Since the word following the article “a” was a consonant, I automatically went for “a” instead of “an”. The problem is that “R” sounds likes a vowel, so “a R package” grates on the listener. The correctrule is to use “an” when the word following the article “sounds like a vowel”.

March 23, 2011

In some work that I am currently involved in, we have to decide which GUI engine we should use. As an obvious starter, we decided to have a look at what other people are using in their packages. While cran helpfully displays all the R packages that are available, it doesn’t (I don’t think), give a nice summary of the package dependencies. After clicking on a few dozen packages and examining their dependencies, I decided that a quick script was in order.

General idea

For each package, scrap the associated web-page and retrieve its dependencies.

For example, ADaCGH has a large number of packages under the “DEPENDS” section.

Pre-processing

To make life easier, I made a few simplifications to the data:

any dependencies on R, MASS, stats, methods and utils were removed when plotting;

I removed any bioconductor and omega hat packages;

version numbers in the DEPENDS section were ignored.

It should be stressed that I’m only picking up what is listed in the DEPENDS section. For example, suppose a package depends on both”ggplot2″ and “plyr”. Since “ggplot2” depends on “plyr” the package author may only list “ggplot2”

Results

The top six packages based on the DEPENDS section are:

lattice – 165 times

survival – 107

mvtnorm – 103

tcltk – 76

graphics – 76

grid – 60

You could argue that I should remove “graphics” by the same arbitrary criteria I used when removing “MASS”. The total number of packages that are referred to in the DEPENDS section is just over 782 (out of a possible 3000 packages). The following graph plots the package name against the number of times it appears in the DEPENDS section of another package. There is a clear exponential decay highlighting a few key packages.

In fact the top 40 packages, account for 50% of all dependencies, and that’s after the dependencies on R, utils, methods,.. were removed.

I also constructed a graphical network using cytoscape. However, it’s quite large (~2MB). You can download the network separately. To construct this network, I only used packages that had three or more dependencies. There were a dozen or so smaller graphs that I pruned.

R Details

To scape the web-pages I used regular expressions. Yes, I know you shouldn’t use regular expressions for parsing html, and should use a proper html parser, but

the web-pages were all well formed since they were generated from the package DESCRIPTION file

November 6, 2010

Part of the reason R has become so popular is the vast array of packages available at the cran and bioconductor repositories. In the last few years, the number of packages has grown exponentially!

This is a short post giving steps on how to actually install R packages. Let’s suppose you want to install the ggplot2 package. Well nothing could be easier. We just fire up an R shell and type:
> install.packages("ggplot2")

In theory the package should just install, however:

if you are using Linux and don’t have root access, this command may not work. If you use Ubuntu then this isn’t a problem.

if you have limited space in /home/ then you may want to install the package in the non-default directory.

you will be asked to select your local mirror, i.e. which server should you use to download the package.

if you have a proxy, then you may run into problems.

Installing packages in a different directory

First, you need to designate a directory where you will store the downloaded packages. On my machine, I use the directory /data/Rpackages/ After creating a package directory, to install a package we use the command:
> install.packages("ggplot2", lib="/data/Rpackages/")
> library(ggplot2, lib.loc="/data/Rpackages/")

It’s a bit of a pain having to type /data/Rpackages/ all the time. To avoid this burden, we create a file .Renviron in our home area, and add the line R_LIBS=/data/Rpackages/ to it. This means that whenever you start R, the directory /data/Rpackages/ is added to the list of places to look for R packages and so:

> install.packages("ggplot2")
> library(ggplot2)

just works!

Setting the repository

Every time you install a R package, you are asked which repository R should use. To set the repository and avoid having to specify this at every package install, simply: