Saturday, February 9, 2013

Finding a jump in data

The other day I saw a plot with some data with a clear jump in it. So I wondered if it were possible to determine the position of that jump. Data with a jump is easy enough made:library(ggplot2)n <- 100mydata <- data.frame(x=runif(n))mydata$y <- rnorm(n,0,1) +ifelse(mydata$x>.5,0,1)qplot(x=mydata$x,y=mydata$y)

Now to find where that jump was. After some consideration, the most simple way seemed appropriate. Just take any possible point and examine the corrected sums of squares of the parts left and right of the break.mydata <- mydata[order(mydata$x),]ss <- function(x) sum((x-mean(x))^2)mids <- (mydata$x[-1]+mydata$x[-nrow(mydata)])/2sa <- sapply(mids,function(x) ss(mydata$y[mydata$x<x])+ss(mydata$y[mydata$x>x]))qplot(x=mids,y=sa, ylab='Sum of squares',xlab='Position of break')

That was very simple. But still too complex. I realized I was recreating a tree with only one node. There is a way to get that first node more quickly:library(tree)tree(y~x,data=mydata,)

As above there are several packages devoted to the problem of detecting jumps in data. The one that springs to mind is strucchange for changes in regression. The advantage of this package is that you can enter many explanatory variables, just as you would to the lm function.

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.