Math, programming, ecology, and more

Category Archives: R

We have a growing interest in using our favorite tools (R and Mathematica) to build web interfaces to interactively explore and visualize data. Our last 5 posts have involved interactive tools, namely Mathematica’s computable document format and R’s new Shiny package.

There is a new kid on the block for interactive visualization tools in R, healthvis. I have not yet taken healthvis for a spin, but the survival example in the introductory blog post inspired me to create a Shiny app to visualize the results of a survival analysis conducted for my dissertation.

The people behind the wonderful RStudio, which I gushed about in a previous post, have developed a new package, Shiny, that makes it easy to develop interactive web applications with R. Shiny is not the first package to facilitate building web apps with R (see here for a comparison of Shiny and gWidgetsWWW2.rapache), but it is arguably the easiest to learn. Shiny has an enthusiastic and engaged user community, and the people at RStudio are very responsive to questions posted to the mailing list.

I write sloppy R scripts. It is a byproduct of working with a high-level language that allows you to quickly write functional code on the fly (see this post for a nice description of the problem in Python code) and the result of my limited formal training in computer programming. The lack of formal training makes scientists self-conscious about the bits of code that they cobble together to solve research problems, but a professional software engineer reassuringly points out that most software runs on messy code. Although sharing sloppy code is better for research progress than not sharing any code at all, you can make the code sharing experience better by picking up some good programming habits. Even if you don’t intend to share your code, which is arguably bad for science, adopting good programming habits should improve your workflow by making old bits of code more easily understandable and re-usable.

The Department of Biostatistics at Vanderbilt University provides a nice list of programming tips for statisticians. For R-specific recommendations, see Google’s R Style Guide. Hadley Wickham provides his own recommendations, which are generally—but not always—aligned with Google’s R Style Guide. I previously thought that the best way to improve your code was by adding comments, but I hadn’t thought about how copious comments may actually make your code less readable. I intend to adopt nearly all of the recommendations in Google’s R Style Guide. Well, I will immediately adopt the style guidelines for any code that makes a public appearance on this blog, but changes to my usual programming habits will likely be more gradual.

RStudio™ is a free and open source integrated development environment (IDE) for R. You can run it on your desktop (Windows, Mac, or Linux) or even over the web using RStudio Server.

For a glimpse of the beauty of RStudio, take 2 minutes to watch the screencast found on the RStudio home page. If you watch the screencast, you will also get an overview of the functionality of RStudio.

I have resisted learning the popular R graphics package, ggplot2. I dismissed ggplot2 as primarily useful for exploratory graphics and rationalized my avoidance of ggplot2 by assuming that it would require just as many (or more) lines of code as the R base package to whip the default plots into publication-quality figures. The few times that I poked at ggplot2 I quickly retreated to the cozy confines of the base package (see here, here, and here for tips on creating figures with base graphics).

Well, the tipping point for me came from an unlikely source—a web interface for ggplot2. Watch the demo video below to get a taste of the power of ggplot2 through the web.

Ecologists commonly collect data representing counts of organisms. Generalized linear models (GLMs) provide a powerful tool for analyzing count data. [As mentioned previously, you should generally not transform your data to fit a linear model and, particularly, do not log-transform count data.] The starting point for count data is a GLM with Poisson-distributed errors, but not all count data meet the assumptions of the Poisson distribution. Thus, we need to test whether the variance is greater than the mean (overdispersion) or whether the number of zeros is greater than expected (zero inflation). Below, we will walk through the basic steps to determine which GLM to use to analyze your data.
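As a rough sketch of those two checks, the code below fits a Poisson GLM to simulated counts and then compares the variance to the mean and the observed zeros to the expected zeros. The data, the variable names (x, y, fit_pois), and the choice to simulate from a negative binomial are my assumptions for illustration and are not taken from the original analysis.

```r
# Simulated counts (hypothetical data, used only for illustration)
set.seed(42)
n <- 200
x <- runif(n)
# Negative binomial counts are overdispersed relative to the Poisson
y <- rnbinom(n, mu = exp(0.5 + 1.2 * x), size = 0.8)

# Starting point: GLM with Poisson-distributed errors
fit_pois <- glm(y ~ x, family = poisson)

# Dispersion statistic: Pearson chi-square / residual degrees of freedom.
# Values well above 1 suggest that the variance exceeds the mean.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Zeros: compare the observed number of zeros with the number
# expected under the fitted Poisson model
observed_zeros <- sum(y == 0)
expected_zeros <- sum(dpois(0, lambda = fitted(fit_pois)))

print(c(dispersion = dispersion,
        observed_zeros = observed_zeros,
        expected_zeros = expected_zeros))
```

If the dispersion statistic is well above 1, a quasi-Poisson or negative binomial GLM is a common next step. If the observed zeros clearly outnumber the expected zeros, a zero-inflated model may be worth considering.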

The next step is not necessary, but makes the subsequent code more readable.

trt <- levels(dataset$Trt)
sex <- levels(dataset$Sex)

The following example is silly because you would rarely want to split your data as shown in this example, but (hopefully) it clearly illustrates the general idea of using paste( ) to create dynamic file names when writing files.
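A minimal sketch of the idea might look like the following. The contents of dataset, the Count column, the file naming pattern, and the use of a temporary directory are all assumptions made for illustration; only the Trt and Sex columns come from the snippet above.

```r
# Hypothetical data frame standing in for `dataset`
dataset <- data.frame(
  Trt   = factor(rep(c("control", "treated"), each = 4)),
  Sex   = factor(rep(c("F", "M"), times = 4)),
  Count = c(3, 5, 2, 7, 8, 6, 9, 4)
)

trt <- levels(dataset$Trt)
sex <- levels(dataset$Sex)

out_dir <- tempdir()      # write to a temporary directory for this sketch
written <- character(0)   # keep track of the files that were written

for (i in seq_along(trt)) {
  for (j in seq_along(sex)) {
    # Subset one treatment-by-sex combination
    subset_ij <- dataset[dataset$Trt == trt[i] & dataset$Sex == sex[j], ]
    # Build the file name from the factor levels, e.g. "control_F.csv"
    file_name <- paste(trt[i], "_", sex[j], ".csv", sep = "")
    file_path <- file.path(out_dir, file_name)
    write.csv(subset_ij, file = file_path, row.names = FALSE)
    written <- c(written, file_path)
  }
}

basename(written)
```

Storing the factor levels in trt and sex first keeps the call to paste( ) short, which is the readability benefit mentioned above.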

As a general rule, you should not transform your data to try to fit a linear model. But proportions can be tricky. If the proportion data do not arise from a binomial process (e.g., proportion of a leaf consumed by a caterpillar), then transformation is still the best option. In an excellent paper, David Warton* and Francis Hui propose that the conventional transformation for proportion data (i.e., arcsine square root) is asinine, particularly if you have binomial data, and that the logit transformation is preferable for non-binomial proportion data.

The objective of this post is simply to demonstrate how to transform the axes of plots in R, but the context of the example is the logit transformation of non-binomial proportion data. First, we need to generate some data.
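One way to sketch this in base R is to plot the logit-transformed values and then label the axis on the original proportion scale. The simulated data, the tick positions, and the variable names below are assumptions for illustration, not the data used in the original post.

```r
set.seed(1)
# Hypothetical non-binomial proportions (e.g., proportion of a leaf consumed)
x <- seq(1, 10, length.out = 50)
p <- plogis(-2 + 0.4 * x + rnorm(50, sd = 0.5))

# Logit transformation: log(p / (1 - p))
logit_p <- qlogis(p)

# Tick labels are chosen on the proportion scale
# and positioned on the logit scale
ticks <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)

pdf(file = tempfile(fileext = ".pdf"))  # any graphics device would do
plot(x, logit_p, yaxt = "n", xlab = "x", ylab = "Proportion (logit scale)")
axis(2, at = qlogis(ticks), labels = ticks, las = 1)
invisible(dev.off())
```

Suppressing the default axis with yaxt = "n" and redrawing it with axis( ) keeps the points on the transformed scale while the labels remain readable as proportions.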

In a previous post, I showed how to keep text and symbols at the same size across figures that have different numbers of panels. The figures in that post were ugly because they used the default panel spacing associated with the mfrow argument of the par( ) function. Below I will walk through how to adjust the spacing of the panels when using mfrow.
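As a hedged sketch of the general approach, panel spacing under mfrow can be controlled with the mar (inner margin) and oma (outer margin) arguments of par( ). The specific margin values and the placeholder plots below are assumptions chosen for illustration.

```r
pdf(file = tempfile(fileext = ".pdf"), width = 6, height = 6)

par(
  mfrow = c(2, 2),        # 2 x 2 grid of panels
  mar   = c(2, 2, 1, 1),  # margins of each panel: bottom, left, top, right (in lines)
  oma   = c(3, 3, 1, 1)   # outer margins around the whole figure, for shared labels
)

# Placeholder panels
for (k in 1:4) {
  plot(1:10, (1:10)^(k / 2), xlab = "", ylab = "")
}

# Shared axis labels written into the outer margins
mtext("x", side = 1, outer = TRUE, line = 1)
mtext("y", side = 2, outer = TRUE, line = 1)

# Record the settings that were in effect before closing the device
spacing <- par("mfrow", "mar", "oma")
invisible(dev.off())
```

Shrinking mar pulls the panels closer together, while oma reserves space for labels that apply to the whole figure.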

In R, there are a couple of packages that allow you to create multi-panel figures (see examples here and here), but, of course, you can also make multi-panel figures in the base package*. Below I provide a simple example for creating a multi-panel figure in the R base package with the focus on making the text and symbols the same size in all of your figures, which is a desirable trait for a set of figures that will appear in the same manuscript.
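The sketch below shows one way to do this, which may differ from the approach taken in the full post. Base graphics reduce the cex parameter when mfrow defines a multi-panel layout, so resetting cex after setting mfrow keeps text and symbols at the size of a single-panel figure. The device size and the placeholder plots are assumptions for illustration.

```r
pdf(file = tempfile(fileext = ".pdf"), width = 6, height = 6)

par(mfrow = c(2, 2))
cex_after_mfrow <- par("cex")  # base graphics shrink cex in multi-panel layouts

par(cex = 1)                   # reset so text and symbols match a single-panel figure
cex_after_reset <- par("cex")

# Placeholder panels
for (k in 1:4) {
  plot(1:10, 1:10, pch = 16, xlab = "x", ylab = "y", main = paste("Panel", k))
}

invisible(dev.off())

print(c(after_mfrow = cex_after_mfrow, after_reset = cex_after_reset))
```

The reset has to come after mfrow is set, because changing the layout resets cex.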