RNHANES provides an easy way to download and analyze data from NHANES, the National Health and Nutrition Examination Survey conducted by the Centers for Disease Control and Prevention (CDC).

The included analysis tools focus on the laboratory data, but the package can still be used to search for and download other types of data in NHANES.

library(RNHANES)

NHANES

NHANES is a national survey that covers demographics, health, nutrition, and environmental chemical exposures. In its modern, continuous form, NHANES began in 1999 and is administered in two-year cycles.

The released data is split into demographic, dietary, examination, laboratory, and questionnaire data for each survey cycle. RNHANES is designed primarily to work with demographic and laboratory data.

There is one demographic data file for each survey cycle. There is a collection of laboratory data for each cycle, split into files related to different analyte groups.

Searching

First, we need to figure out what data we want to analyze and where we can find it in NHANES.

To find the data you're interested in, you can search either by file or by variable. First, use RNHANES to download a list of NHANES files and the comprehensive variable list. This data isn't bundled with the package because it is sometimes updated to fix errors or add new data. Downloading the lists lets you get the most recent versions.

files <- nhanes_data_files()
variables <- nhanes_variables()

Use nhanes_search to search within file and variable lists. You can restrict the searches by specifying conditions on any of the columns in the list.
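As a rough sketch of what such searches might look like, the calls below filter the file list and the variable list by keyword and by conditions on columns. The column names used in the conditions (component, cycle) are assumptions about the downloaded lists, so check names(files) and names(variables) before relying on them.

```r
library(RNHANES)

# Download the current file and variable lists (requires network access)
files <- nhanes_data_files()
variables <- nhanes_variables()

# Keyword search over the file list
phenol_files <- nhanes_search(files, "environmental phenols")

# Keyword search restricted by conditions on columns of the list.
# The column names `component` and `cycle` are assumed; verify with names(files).
pesticide_files <- nhanes_search(files, "pesticides",
                                 component == "laboratory",
                                 cycle == "2003-2004")

# Search the variable list for an analyte by name
triclosan_vars <- nhanes_search(variables, "triclosan")
```

The returned rows tell you which file name and cycle to pass to the download step.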

Downloading

Once you've identified the data you need for your analysis, the next step is to download the appropriate data files from NHANES through the nhanes_load_data function. This function has a lot of options, so let's start simple and go through them.

Downloading one file

The most basic way to download data is to specify the name and cycle year of one data file.

nhanes_load_data("EPH_E", "2007-2008")

You can leave off the trailing suffix (e.g. the "_E" in "EPH_E") on the file name and it will be filled in for you.

nhanes_load_data("EPH", "2007-2008")

To save time, nhanes_load_data downloads the files and saves them so they don't need to be redownloaded every time you run your script. By default, it saves the files to a temporary directory. You can optionally set where you want the files to be downloaded to.

nhanes_load_data("EPH", "2007-2008", destination = "./nhanes_data")

So far, we've been downloading the data without its accompanying demographic information, which includes variables such as age and gender as well as the survey weights. This information is stored in a separate file for each cycle. RNHANES can automatically download the correct demographics file and merge it with your data.
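A minimal sketch of this, assuming the demographics argument of nhanes_load_data behaves as described in the package documentation:

```r
library(RNHANES)

# Load the environmental phenols file together with the demographics
# file for the same cycle (requires network access)
eph <- nhanes_load_data("EPH", "2007-2008", demographics = TRUE)

# Inspect the merged columns; demographic variables and survey weights
# should appear alongside the laboratory measurements
names(eph)
```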

Some categorical fields in NHANES are stored as numeric codes. RNHANES can decode these fields, replacing the codes with their text descriptions.
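The following sketch shows how decoding might be requested, assuming the recode argument of nhanes_load_data. RIAGENDR, the NHANES gender variable, is used only as an illustrative coded field.

```r
library(RNHANES)

# Load data with demographics and ask RNHANES to replace numeric codes
# with their text descriptions (requires network access)
eph_decoded <- nhanes_load_data("EPH", "2007-2008",
                                demographics = TRUE,
                                recode = TRUE)

# Tabulate a coded demographic field to see the decoded labels
table(eph_decoded$RIAGENDR)
```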

Downloading multiple files

You can also download multiple files from NHANES at once to simplify your code. There are several ways to do this. The first is to pass a vector of file names and a matching vector of cycle years. The result will be a list containing a data frame for each requested file.
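For instance, a call along these lines would request two laboratory files from the same cycle. The choice of files (EPH_E and PFC_E) is only illustrative.

```r
library(RNHANES)

# Request two files; file names and cycle years are matched by position
# (requires network access)
dat <- nhanes_load_data(c("EPH_E", "PFC_E"),
                        c("2007-2008", "2007-2008"))

# One data frame per requested file
length(dat)
sapply(dat, nrow)
```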

This looks a little awkward with only two analytes, but becomes more useful if you have a lot of analytes you want to analyze.

nhanes_quantile transparently handles data that was loaded from multiple files and cycle years. The variable triclosan is a list of data frames; let's compute quantiles for triclosan in each one.

In this case, you have to supply a data frame that specifies the columns to look at for each file name and cycle year.

This is a good example because for the 2003-2004 cycle, the triclosan column appears to be misnamed: it is "URDTRS", when the naming convention in the rest of the file is to have column names start with "URX".
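The sketch below illustrates the idea for two cycles. It makes several assumptions that should be checked against the package documentation and each file's codebook: that the specification data frame uses the column names file_name, cycle, column, comment_column, and weights_column; that "L24EPH_C" is the 2003-2004 environmental phenols file; and that the comment and weights column names shown are the right ones for each file.

```r
library(RNHANES)

# Load the environmental phenols files for two cycles, with demographics
# (requires network access); the result is a list of data frames
triclosan <- nhanes_load_data(c("L24EPH_C", "EPH_E"),
                              c("2003-2004", "2007-2008"),
                              demographics = TRUE)

# One row per file, naming the triclosan column in that file.
# The layout of this data frame and the comment/weights values are assumptions.
triclosan_columns <- data.frame(
  file_name      = c("L24EPH_C", "EPH_E"),
  cycle          = c("2003-2004", "2007-2008"),
  column         = c("URDTRS", "URXTRS"),      # 2003-2004 uses the URD prefix
  comment_column = c("URDTRSLC", "URDTRSLC"),
  weights_column = c("WTSC2YR", "WTSB2YR"),
  stringsAsFactors = FALSE
)

# Compute the median and 95th percentile for each file and cycle
nhanes_quantile(triclosan, triclosan_columns, quantiles = c(0.5, 0.95))
```

Keeping the per-file column names in one data frame means an irregular name like "URDTRS" is recorded once, next to the cycle it applies to.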