Sunday, April 27, 2014

Project Tycho, Correlation between states

In this fourth post on Measles data I want to have a look at correlation between states. As described before, the data is from Project Tycho, which contains data from all weekly notifiable
disease reports for the United States dating back to 1888. These data are freely available to anybody interested.

New variable

In previous posts it became clear there is in general a yearly cycle. However, the minimum in this cycle is in summer. This means for yearly summary it might be best not to use calender years, but rather something which breaks during summer. My choice is week 37.with(r6[r6$WEEK>30 & r6$WEEK<45,], aggregate(incidence,by=list(WEEK=WEEK),mean)) WEEK x1 31 0.0167574402 32 0.0137761823 33 0.0113133914 34 0.0087832595 35 0.0073486036 36 0.0068439307 37 0.0065284678 38 0.0070781719 39 0.00865254610 40 0.01678420511 41 0.01339237512 42 0.01615880513 43 0.01839163214 44 0.021788221r6$cycle <- r6$YEAR + (r6$WEEK>37)

Plot

States over time

Since not all states have complete data, it was decided to use state-year combinations with at least 40 observations (weeks). As can be seen there is some correlation between states, especially in 1945. If anything, correlation gets weaker past 1955.library(ggplot2)ggplot(with(r6,aggregate(incidence, list(cycle=cycle, State=State), function(x) if(length(x)>40) sum(x) else NA)), aes(cycle, x,group=State)) + geom_line(size=.1) + ylab('Incidence registered Measles Cases Per Year') + theme(text=element_text(family='Arial')) + scale_y_log10()

Between states

I have seen too many examples of people rebuilding maps based on traveling times or distances. Now I want to do the same. Proper (euclidean) distance of the states would make the variables the year/week combinations, which gives all kind of scaling issues. What I did is to use correlation and transform that into something distance like. ftime is just a helper variable, so I am sure the reshape works correctly.r6$ftime <- interaction(r6$YEAR,r6$WEEK)xm <- reshape(r6, v.names='incidence', idvar='ftime', timevar='State', direction='wide', drop=c('abb','Cases','pop'))

MDS is most nice to look at. I will leave comparisons to the US map to those who actually know all these state's relative locations.library(MASS)mdsx <- isoMDS(dd)par(mai=rep(0,4))plot(mdsx$points, type = "n", axes=FALSE, xlim=c(-1,1), ylim=c(-1,1.1))text(mdsx$points, labels = dimnames(cc)[[1]])

No comments:

Post a Comment

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.