Looking at Measles Data in Project Tycho

Project Tycho includes data from all weekly notifiable disease reports for the United States dating back to 1888. These data are freely available to anybody interested. I wanted to play around with the data a bit, so I registered.

Measles

Measles is part of the level 2 data. These are standardized data for immediate use and include a large number of diseases, locations, and years. They are not complete, because standardization is ongoing. The data are retrieved as case counts, and while I know I should convert to cases per 100 000, I have not done so here.

The data come in wide format, so the first step is conversion to long format. The Northern Mariana Islands variable was read in as logical, presumably because it contains only missing values, so I removed it. In addition, data through 1927 seemed very sparse, so only the years after 1927 are kept.

    # Read the export; '-' marks a missing value, the first two lines are skipped
    r1 <- read.csv('MEASLES_Cases_1909-1982_20140323140631.csv', na.strings='-', skip=2)
    # Drop the Northern Mariana Islands column
    r1 <- subset(r1, select=-NORTHERN.MARIANA.ISLANDS)
    # Wide to long: one row per year, week and state
    r2 <- reshape(r1, varying=names(r1)[-c(1,2)], v.names='Cases', idvar=c('YEAR', 'WEEK'), times=names(r1)[-c(1,2)], timevar='State', direction='long')
    r2$State <- factor(r2$State)
    # Keep only the years after 1927
    r3 <- r2[r2$YEAR>1927,]
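To make the wide-to-long step easier to follow, here is a minimal sketch of the same reshape() call on a tiny made-up table. The data frame `wide`, its two state columns and all numbers are invented for illustration; only the column layout (YEAR, WEEK, then one column per state) follows the description above.

```r
# Toy wide-format table mimicking the layout described above (values are made up)
wide <- data.frame(
  YEAR    = c(1930, 1930),
  WEEK    = c(1, 2),
  ALABAMA = c(5, NA),
  ALASKA  = c(NA, 3)
)

# Every column except YEAR and WEEK holds the counts for one state
state_cols <- names(wide)[-c(1, 2)]

# Stack the state columns into one 'Cases' column, with 'State' naming the source column
long <- reshape(
  wide,
  varying   = state_cols,
  v.names   = "Cases",
  idvar     = c("YEAR", "WEEK"),
  times     = state_cols,
  timevar   = "State",
  direction = "long"
)
rownames(long) <- NULL
long$State <- factor(long$State)

print(long)
```

Each year and week combination now appears once per state, which is the shape most plotting and aggregation functions expect.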

Occurrence within a year by week

Winter and spring seem to be the periods in which most cases occur. The curve is quite smooth, with a few small fluctuations. The transition between week 52 and week 1 is a bit steep, which may be because I removed week 53 (present in only some of the years).
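The code behind the weekly curve is not shown here. As a rough sketch of how such weekly totals could be computed, the example below drops week 53 and sums cases per week. The data frame `toy` is a made-up stand-in for r3, and the use of aggregate() is an assumption of this sketch, not necessarily what was used for the plot.

```r
# Toy stand-in for r3 (made-up numbers): one row per year, week and state
toy <- data.frame(
  YEAR  = c(1930, 1930, 1930, 1931, 1931, 1931),
  WEEK  = c(1, 52, 53, 1, 52, 1),
  State = c("ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALABAMA", "ALASKA"),
  Cases = c(10, 4, 2, 7, NA, 1)
)

# Drop week 53, which exists only in some years
no53 <- toy[toy$WEEK != 53, ]

# Total cases per week over all years and states.
# The formula interface drops rows with missing Cases before summing.
by_week <- aggregate(Cases ~ WEEK, data = no53, FUN = sum)

print(by_week)
```

Because rows with missing counts are dropped before summing, a week with no reported counts at all would simply be absent from `by_week` rather than showing up as 0.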

A more detailed look

Trying to understand why the week plot was not smooth, I made the same plot with year facets. This revealed a striking number of zeros, which are an artefact of the processing method (remember that sum(c(NA,NA), na.rm=TRUE) returns 0). I do not know whether the data distinguish between 0 and '-'. There are 872 occurrences of 0, which suggests that 0 is used. On the other hand, weeks 6 and 9 of 1980 in Arkansas each have one case while the other weeks from 1 to 22 are '-', which suggests that 0 is not used. My feeling for that period is that registration became lax after measles was under control, and that getting reliable data from the underlying documentation is a laborious task.
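The zero artefact is easy to reproduce. The sketch below shows the behaviour of sum() on all-missing input and one possible way around it; the helper `sum_or_na` is a hypothetical name introduced here and is not part of the analysis above.

```r
# Summing a vector that contains only missing values yields 0, not NA
all_missing <- c(NA, NA)
print(sum(all_missing, na.rm = TRUE))

# Hypothetical helper: return NA when every value is missing,
# otherwise the usual sum with missing values ignored
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

print(sum_or_na(c(NA, NA)))
print(sum_or_na(c(NA, 3, 4)))
```

Using such a helper as the aggregation function would keep "nothing reported" apart from "zero cases reported" in the totals, although it cannot tell whether a '-' in the source was meant as a zero.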