Plotting grouped data vs time with error bars in R

This is my first post since joining R-bloggers. I’m quite excited to be part of this group, and I apologize if I bore experienced R users with my basic posts on learning R or offend programmers with my inefficient, sloppy coding. Hopefully writing for this excellent community will improve my R skills while helping other novice R users.

I have a dataset from my dissertation research where I repeatedly counted salamanders on some plots and removed salamanders from other plots. I was interested in plotting the captures over time (sensu Hairston 1987). As with all statistics and graphs in R, there are a variety of ways to create the same or similar output. One challenge I always have is dealing with dates. Another is drawing error bars on plots. Here I faced both at once: plotting the average captures per night or month ± SE for two groups (reference and depletion plots, 5 of each on each night) vs. time. This is the type of thing that could be done in 5 minutes by hand on graph paper, but it took me a while in R and I’m still tweaking various plots. Below I explore a variety of ways to handle and plot these data. I hope it’s helpful for others. Chime in with comments if you have any suggestions or similar experiences.

You’ll notice that when importing a text file created in Excel with the default date format, R treats the date variable as a factor within the data frame (in R versions before 4.0; newer versions import it as character). Either way, we need to convert it to a date class that R can recognize. Two built-in options are as.Date and as.POSIXct. as.Date stores calendar dates only, while as.POSIXct stores date-times, so the two are similar but not fully interchangeable. I chose POSIXct. To get the data into POSIXct format in this case I use the strptime function as seen below. I also create new columns for day, month, and year in case they become useful for aggregating or summarizing the data later.
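As a minimal sketch of that conversion, assuming a made-up data frame `raw` with the columns `date`, `trt`, and `count`, and assuming the dates were exported as month/day/year text (adjust the `format` string to match your own file):

```r
# Hypothetical raw data: dates arrive as text in an m/d/Y style
raw <- data.frame(
  date  = c("5/14/2008", "6/2/2008", "9/21/2009"),
  trt   = c("Reference", "Depletion", "Reference"),
  count = c(12, 7, 9),
  stringsAsFactors = TRUE   # mimics older R, where text columns became factors
)

# factor -> character -> POSIXlt (strptime) -> POSIXct
# The time zone is fixed so the result does not depend on the local machine
raw$date <- as.POSIXct(strptime(as.character(raw$date),
                                format = "%m/%d/%Y", tz = "UTC"))

# Pull out the pieces of the date for later aggregation
raw$year  <- as.numeric(format(raw$date, "%Y"))
raw$month <- as.numeric(format(raw$date, "%m"))
raw$day   <- as.numeric(format(raw$date, "%d"))

str(raw)
```

Using format() to extract the year, month, and day avoids splitting the date string by hand.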

As you can see, the date is now stored as POSIXct, which prints in the form YYYY-MM-DD.

Now I use a custom function to summarize each night of counts and removals. I forgot offhand how to call a custom function stored elsewhere, so I lazily pasted it into my script. I found this nice little function online, but I apologize to the author because I don’t remember where I found it.
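For readers who want something to experiment with, here is a rough sketch of that kind of summary helper. It is not the function I found online: the name `summarize_se`, its arguments, and the demo data are all made up for illustration. A function saved in its own file can be loaded with source(), as in the commented line (the file name is hypothetical).

```r
# source("summarize_se.R")  # hypothetical file name; loads a function stored elsewhere

# Hypothetical helper: n, mean, SD, and SE of `measure` within each combination of `groups`
summarize_se <- function(data, measure, groups) {
  f <- as.formula(paste(measure, "~", paste(groups, collapse = " + ")))
  out <- aggregate(f, data = data,
                   FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
  # aggregate() returns the three statistics as one matrix column; flatten it
  out <- do.call(data.frame, out)
  names(out) <- c(groups, "n", "mean", "sd")
  out$se <- out$sd / sqrt(out$n)   # standard error of the mean
  out
}

# Made-up counts: two nights, two plots per treatment per night
demo <- data.frame(
  date  = rep(c("2008-05-14", "2008-06-02"), each = 4),
  trt   = rep(rep(c("Reference", "Depletion"), each = 2), times = 2),
  count = c(10, 14, 6, 8, 9, 11, 3, 5)
)

nightly <- summarize_se(demo, measure = "count", groups = c("date", "trt"))
print(nightly)
```

The result has one row per night and treatment, which is the shape the plotting functions below expect.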

As you can see, this is not attractive, but it does show that I generally caught more salamanders in the reference plots (red line). It also makes it apparent that a line graph is a bit messy for these data and might even be misleading, because the data were not collected continuously or at even intervals (collection depended on weather and season). So, I tried a plot of just the points using the scatterplot function from the “car” package.

> ### package car scatterplot by groups ###
> library(car)
>
> # Plot
> scatterplot(count ~ date + trt, data = gCount,
+ smooth = FALSE, grid = FALSE, reg.line = FALSE,
+ xlab="Date", ylab="Mean number of salamanders per night")

This was nice because it was very simple to code. It includes points from every night but I would still like to summarize it more. Before I get to that, I would like to try having breaks between the years. The lattice package should be useful for this.

> library(lattice)
>
> # Add year, month, and day to dataframe
> chardate <- as.character(gCount$date)
> splitdate <- strsplit(chardate, split = "-")
> gCount$year <- as.numeric(unlist(lapply(splitdate, "[", 1)))
> gCount$month <- as.numeric(unlist(lapply(splitdate, "[", 2)))
> gCount$day <- as.numeric(unlist(lapply(splitdate, "[", 3)))
>
> # Plot
> xyplot(count ~ trt + date | year,
+ data = gCount,
+ ylab="Daily salamander captures", xlab="date",
+ pch = seq(1:ntrt),
+ scales=list(x=list(alternating=c(1, 1, 1))),
+ between=list(y=1),
+ par.strip.text=list(cex=0.7),
+ par.settings=list(axis.text=list(cex=0.7)))

Obviously there is a problem with this. I am not getting proper overlaying of the two treatments. I tried adjusting the equation (e.g. count ~ month | year*trt), but nothing was that enticing and I decided to go back to other plotting functions. The lattice package is great for trellis plots and visualizing mixed effects models.
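One option I did not try above is the `groups` argument of xyplot, which overlays the levels of a variable within each panel. Here is a minimal sketch, assuming made-up data with the same `count`, `date`, `trt`, and `year` columns; the symbols and layout are arbitrary choices.

```r
library(lattice)

# Made-up nightly counts for two treatments over two years
set.seed(1)
demo <- data.frame(
  date = rep(as.Date(c("2008-05-14", "2008-06-02", "2008-09-20",
                       "2009-05-10", "2009-06-15", "2009-09-25")), times = 2),
  trt  = rep(c("Reference", "Depletion"), each = 6)
)
demo$count <- rpois(nrow(demo), lambda = ifelse(demo$trt == "Reference", 12, 6))
demo$year  <- factor(format(demo$date, "%Y"))

# One panel per year; `groups` overlays the two treatments within each panel.
# Symbols are set through par.settings so that the key matches the points.
p <- xyplot(count ~ date | year, data = demo, groups = trt,
            par.settings = list(superpose.symbol = list(pch = c(1, 16), col = "black")),
            scales = list(x = list(relation = "free")),  # each year gets its own date axis
            auto.key = list(points = TRUE, columns = 2),
            xlab = "Date", ylab = "Daily salamander captures")
print(p)
```

Conditioning on year gives the break between years, and the free x scale stops each panel from spanning both years.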

I now decided to summarize the data by month rather than by day and add standard error bars. This goes back to using the base plot function.

Now that’s a much better visualization of the data, and that’s the whole goal of a figure for publication. The only thing I might change is to plot by year with Month-Year labels (format = "%b %Y"). I might add a legend, but with only two treatments I might just include the info in the figure description.
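A minimal sketch of this kind of monthly mean ± SE plot in base graphics, using arrows() for the error bars. The `monthly` data frame and its values are made up for illustration; in practice the means and standard errors would come from a summary step like the one described earlier.

```r
# Made-up monthly summaries (mean and SE of captures) for two treatments
monthly <- data.frame(
  month = rep(as.Date(c("2008-05-01", "2008-06-01", "2008-09-01", "2009-05-01")),
              times = 2),
  trt   = rep(c("Reference", "Depletion"), each = 4),
  mean  = c(12, 10, 11, 13, 9, 6, 5, 4),
  se    = c(1.5, 1.2, 1.8, 1.1, 1.4, 1.0, 0.9, 0.8)
)
ref <- monthly[monthly$trt == "Reference", ]
dep <- monthly[monthly$trt == "Depletion", ]

# Empty plot sized to hold every error bar; the date axis is drawn by hand
ylim <- range(c(monthly$mean - monthly$se, monthly$mean + monthly$se))
plot(monthly$month, monthly$mean, type = "n", xaxt = "n", ylim = ylim,
     xlab = "Month", ylab = "Mean captures per night (+/- SE)")
axis.Date(1, at = ref$month, format = "%b %Y")   # Month-Year labels

# arrows() with angle = 90 and code = 3 draws a flat cap on both ends
add_series <- function(d, pch) {
  arrows(d$month, d$mean - d$se, d$month, d$mean + d$se,
         angle = 90, code = 3, length = 0.04)
  points(d$month, d$mean, pch = pch)
}
add_series(ref, pch = 16)   # filled circles: reference plots
add_series(dep, pch = 1)    # open circles: depletion plots
legend("topright", legend = c("Reference", "Depletion"), pch = c(16, 1), bty = "n")
```

Computing `ylim` from mean ± SE before plotting keeps the error bars from being clipped at the edges of the plot.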

Although that is probably going to be my final product for my current purposes, I wanted to explore a few other graphing options for visualizing this data. The first is to use box plots. I use the add = TRUE option to add a second group after subsetting the data.

> ### Boxplot ###
> # Ref: http://personality-project.org/r/r.plottingdates.html
>
> # as.POSIXlt(date)$mon # gives the month as a number from 0 (January) to 11 (December)
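A minimal sketch of the add = TRUE approach, assuming made-up data grouped by month rather than by night. The `at` and `boxwex` offsets are arbitrary choices that place the two treatments side by side.

```r
# Made-up captures: three months, five plots per treatment per month
set.seed(2)
demo <- data.frame(
  month = factor(rep(c("May", "Jun", "Sep"), each = 10),
                 levels = c("May", "Jun", "Sep")),
  trt   = rep(rep(c("Reference", "Depletion"), each = 5), times = 3)
)
demo$count <- rpois(nrow(demo), lambda = ifelse(demo$trt == "Reference", 12, 6))

ref <- subset(demo, trt == "Reference")
dep <- subset(demo, trt == "Depletion")
pos <- seq_len(nlevels(demo$month))

# The first group sets up the axes; its boxes are shifted left of each position
boxplot(count ~ month, data = ref, at = pos - 0.2, boxwex = 0.3,
        xlim = c(0.5, max(pos) + 0.5), ylim = range(demo$count),
        xaxt = "n", xlab = "Month", ylab = "Captures per night")
# add = TRUE draws the second group onto the existing plot, shifted right
boxplot(count ~ month, data = dep, at = pos + 0.2, boxwex = 0.3,
        add = TRUE, xaxt = "n", col = "grey80")
axis(1, at = pos, labels = levels(demo$month))
legend("topright", legend = c("Reference", "Depletion"),
       fill = c("white", "grey80"), bty = "n")
```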

Clearly this is a mess and not useful. But you can imagine that with some work and summarizing by month or season it could be a useful and efficient way to present the data. Next I tried the popular package ggplot2.

+ theme_bw() + # make the theme black-and-white rather than grey (do this before font changes, or it overrides them)
+ theme(legend.position = c(.2, .9), # position the legend inside the plot; this must go after theme_bw()
+ panel.grid.major = element_blank(), # switch off major gridlines
+ panel.grid.minor = element_blank(), # switch off minor gridlines
+ legend.title = element_blank(), # switch off the legend title
+ legend.key = element_blank()) # switch off the rectangle around symbols in the legend
>

This plot could work with some fine-tuning, especially of the legend(s), but you get the idea. It wasn’t as easy for me as the plot function, but ggplot2 is quite versatile and probably a good package to have in your back pocket for complicated graphing.
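For anyone who wants a complete, self-contained ggplot2 example to start from, here is a minimal sketch of a mean ± SE plot by treatment. The `monthly` data frame and its values are made up, and the geoms and scales are one reasonable choice rather than the exact ones I used.

```r
library(ggplot2)

# Made-up monthly summaries (mean and SE of captures) for two treatments
monthly <- data.frame(
  month = rep(as.Date(c("2008-05-01", "2008-06-01", "2008-09-01", "2009-05-01")),
              times = 2),
  trt   = rep(c("Reference", "Depletion"), each = 4),
  mean  = c(12, 10, 11, 13, 9, 6, 5, 4),
  se    = c(1.5, 1.2, 1.8, 1.1, 1.4, 1.0, 0.9, 0.8)
)

p <- ggplot(monthly, aes(x = month, y = mean, shape = trt)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 10) +  # width is in days
  geom_point(size = 3) +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Month", y = "Mean captures per night (+/- SE)") +
  theme_bw() +                           # set the base theme first
  theme(legend.position = c(0.2, 0.9),   # newer releases prefer legend.position.inside
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.title = element_blank())
print(p)
```

Because theme_bw() replaces the whole theme, the theme() call has to come after it or its settings are lost.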