analyze the behavioral risk factor surveillance system (brfss) with r and monetdb

experimental. the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year, making it the largest telephone survey in the world. state health departments perform the actual data collection (according to a nationally-standardized protocol and a core set of questions), then forward all responses to the centers for disease control and prevention (cdc) office of surveillance, epidemiology, and laboratory services (osels), where the nationwide, annual data set gets constructed. independent administration by each state allows them to tack on their own questions that other states might not care about. that way, florida could exempt itself from all the risky frostbite behavior questions. in addition to providing the most comprehensive behavioral health data set in the united states, brfss also ekes out my worst acronym in the federal government award – onchit a close second.

annual brfss data sets have grown rapidly over the past three decades: the 1984 data set contained only 12,258 respondents from 15 states, all states were participating by 1994, and the 2011 file has surpassed half a million interviews. if you’re examining trends over time, do your homework and review the brfss technical documents for the years you’re looking at (plus any years in between). for starters, the cdc switched to sampling cellphones in their 2011 methodology.

unlike many u.s. government surveys, brfss is not conducted for each resident at a sampled household (phone number). only one respondent per phone number gets interviewed. if you have other questions, the brfss frequently asked questions page is a good place to start.

all brfss files are available in sas transport format, so if you’re sittin’ pretty on 16 gb of ram, you could potentially read a single year with read.xport and create a taylor-series linearization survey object using the survey package. but the download and importation script builds an ultra-fast monet database (click here for speed tests, installation instructions) on your local hard drive. after that, these scripts are shovel-ready. consider importing all brfss files my way – let it run overnight – and during your actual analyses, code will run a lot faster. the brfss generalizes to the u.s. adult (18+) non-institutionalized population, but if you don’t have a phone, you’re probably out of scope. this new github repository contains four scripts:

1984 – 2011 download all microdata.R

create the batch (.bat) file needed to initiate the monet database in the future
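the single-year path mentioned above (read.xport plus the survey package) can be sketched with a tiny invented data frame standing in for one year of microdata. the variable names x_psu, x_ststr, and x_llcpwt follow the 2011 codebook naming, but check the documentation for your year – and the numbers below are made up purely for illustration:

```r
# a minimal taylor-series design sketch with made-up data --
# for real use, replace the data.frame with read.xport( "your file.xpt" )
library(survey)

# eight fake respondents standing in for one year of brfss microdata
brfss <- data.frame(
	x_psu = 1:8 ,						# primary sampling unit
	x_ststr = rep( 1:2 , each = 4 ) ,	# sampling stratum
	x_llcpwt = rep( 100 , 8 ) ,			# final weight (equal, for simplicity)
	smoker = c( 1 , 0 , 0 , 1 , 0 , 0 , 1 , 0 )
)

# construct the complex sample design object
brfss.design <-
	svydesign(
		id = ~x_psu ,
		strata = ~x_ststr ,
		weights = ~x_llcpwt ,
		data = brfss ,
		nest = TRUE
	)

# weighted share of smokers (3 of 8 with equal weights)
svymean( ~smoker , brfss.design )
```

with half a million records per recent year, this in-memory approach is exactly what strains your ram – hence the monet database route below.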

if you’re just scroungin’ around for a few statistics, the cdc’s web-enabled analysis tool (weat) might be all your heart desires. in fact, on slides seven, eight, and nine of my online query tools video, i demonstrate how to use this table creator. weat’s more advanced than most web-based survey analysis tools – you can even run a regression. but only seven (of eighteen) years can currently be queried online.

since data types in sql are not as plentiful as they are in the r language, the definition of a monet database-backed complex design object requires that a cutoff be specified between the categorical variables and the linear ones. that cut point gets defined using the check.factors argument in the sqlsurvey() and sqlrepsurvey() function calls. check.factors defaults to ten, but can be raised or lowered as needed. here’s how it works:
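roughly speaking, a variable with only a handful of distinct values (at or below the cutoff) gets treated as categorical, and everything else as linear. the snippet below is a plain-r illustration of that rule, not the sqlsurvey package’s actual internals, and the variable names are invented:

```r
# illustrate the check.factors cutoff in plain r --
# this mimics the rule, it is not the sqlsurvey package's code
check.factors <- 10

# a made-up variable with three distinct values (e.g. smoking status)
smoke3 <- c( 1 , 2 , 3 , 2 , 1 , 3 )

# a made-up variable with eleven distinct values (e.g. weight in pounds)
pounds <- c( 150 , 162 , 175 , 143 , 181 , 158 , 199 , 167 , 172 , 188 , 155 )

# three distinct values, at or below the cutoff --> treated as categorical
length( unique( smoke3 ) ) <= check.factors		# TRUE

# eleven distinct values, above the cutoff --> treated as linear
length( unique( pounds ) ) <= check.factors		# FALSE
```

if a variable like age lands on the wrong side of the default, raise or lower check.factors (or recode the variable) before constructing the design object.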

confidential to sas, spss, stata, and sudaan users: when statistical languages are plotted on cartesian coordinates, what-you-paid-for versus what-you-get is best represented as y = 1/x. time to transition to r. 😀