
I was tinkering around in R to see if I could plot better-looking heatmaps when I ran into an issue with how values are represented in plots that have a user-specified, restricted range. When I plot heatmaps, I usually want to restrict the range of the data being plotted in a specific way. In this case, I was plotting classifier accuracy across frequency and time for Fourier-transformed EEG data, and I wanted the lowest value of the heatmap’s color scale to be chance level (here, 50%).

See the gray areas? Any data points that fall outside my specified range (see the limits argument in scale_fill_viridis) are being converted to NA, and NA values are plotted in gray by default. These values are obviously not missing in the actual data, so how can we make sure they get plotted? The solution comes from the scales package, which is installed along with ggplot2. Whenever you create a plot with specified limits, include the argument oob = squish (oob stands for "out of bounds") in the same scale call where you set the limits, and make sure the scales package is loaded. For example:

scale_fill_viridis(name = "", limits = c(0.5, 0.75), oob = squish)

With oob = squish, values that fall below the specified range are drawn with the color of the minimum limit, and values that fall above the range are drawn with the color of the maximum limit (which seems to be what Matlab does by default). The resulting heatmap looks like this:
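To see the difference on plain numbers rather than colors, here is a minimal sketch comparing squish with censor, the out-of-bounds handler that ggplot2's continuous scales use by default. The accuracy values are made up for illustration; only the limits match the example above.

```r
library(scales)

# Hypothetical accuracy values, two of which fall outside the limits
accuracy <- c(0.40, 0.50, 0.60, 0.75, 0.90)
limits   <- c(0.5, 0.75)

# censor() replaces out-of-range values with NA (hence the gray tiles)
censored <- censor(accuracy, range = limits)

# squish() clamps out-of-range values to the nearest limit instead
squished <- squish(accuracy, range = limits)

print(censored)
print(squished)
```

The censored vector has NA in the first and last positions, while the squished vector keeps every value and pins those two to 0.5 and 0.75.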

Plot the means

# clear the environment
rm(list=ls())
# read in Data Set #2
data <- read.csv("RClub_DataSet2_11.3.15.csv", header = TRUE)
# Let's take a look at the data - do we see a linear trend?
# Since we're plotting means rather than the raw data, we first need to calculate them so we can feed them into the plot.
# To do that, we'll use a few functions from the dplyr package. Check out this tutorial: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
library(dplyr)

Factorial ANOVA

First, get a plot of your means, so you can see what’s going on.

# I'm choosing to plot with instruction type across the x-axis and grouped by age. It would also be fine to do it the other way around.
library(dplyr)
groups <- group_by(data, instructions, age) # this just prepares it for us to calculate everything within each combination of instructions and age
plot.data <- summarise(groups,
                       mean = mean(score, na.rm = TRUE),
                       sd = sd(score, na.rm = TRUE),
                       n = n(),
                       se = sd / sqrt(n),
                       ci = qt(0.975, df = n - 1) * se)
plot.data # take a peek
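The summary table is what feeds the plot. Below is a minimal sketch of one way to draw it with ggplot2, with instruction type on the x-axis and lines grouped by age. The toy data frame, its values, and the names toy, toy.means, dodge, and p are all made up for illustration so the example runs without the original CSV; the plot layout is an assumption, not necessarily what the original plot looked like.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 5 scores in each instructions x age cell
toy <- expand.grid(id = 1:5,
                   instructions = c("A", "B"),
                   age = c("young", "old"))
toy$score <- 50 + 5 * (toy$instructions == "B") + 3 * (toy$age == "old") +
  rep(c(-2, -1, 0, 1, 2), times = 4)

# Same summary steps as above, applied to the toy data
toy.groups <- group_by(toy, instructions, age)
toy.means <- summarise(toy.groups,
                       mean = mean(score, na.rm = TRUE),
                       sd = sd(score, na.rm = TRUE),
                       n = n(),
                       se = sd / sqrt(n),
                       ci = qt(0.975, df = n - 1) * se)

# Nudge the two age groups apart so their error bars don't overlap
dodge <- position_dodge(width = 0.2)

# Means as points joined by lines, with 95% confidence intervals as error bars
p <- ggplot(toy.means, aes(x = instructions, y = mean, colour = age, group = age)) +
  geom_line(position = dodge) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci),
                width = 0.1, position = dodge)
print(p)
```

Non-parallel lines in a plot like this are the visual hint of an interaction between the two factors.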

# The best way to run this is actually with the lm() command, not aov(). It stands for "linear model". ANOVAs, regressions, t-tests, etc. are all examples of the general linear model, so you can use this one command to do pretty much any of them in R.
# aov() works, and it will generate exactly the same source table for you (the math is all identical), but lm() gives you more useful output.
model <- lm(score ~ instructions*age , data=data)
# When you specify an interaction with *, R automatically assumes you want the main effects as well.
# So "instructions*age" is shorthand for "instructions + age + instructions:age", where ":" denotes the interaction term on its own.
anova(model) # to get a source table
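If you want to check how R expands the * shorthand, you can inspect the terms of the formula directly. This is a small sketch using only base R; the object names (full, spelled.out, and so on) are made up for illustration, and no data are needed because terms() works on the formula itself.

```r
# The shorthand and the spelled-out version of the same model formula
full        <- score ~ instructions * age
spelled.out <- score ~ instructions + age + instructions:age

# Pull out the term labels R will actually fit for each formula
labels.full        <- attr(terms(full), "term.labels")
labels.spelled.out <- attr(terms(spelled.out), "term.labels")

print(labels.full)
identical(labels.full, labels.spelled.out)
```

Both formulas give the same three terms: the two main effects and the instructions:age interaction.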