EXTENDING THE ACS: DATA-DRIVEN OUTREACH

by STAR YING, COMMERCE DATA SERVICE

February 2016

As part of the Commerce Data Usability Project, the United States Census Bureau, in collaboration with the Commerce Data Service, has created a tutorial that processes bulk five-year releases of the American Community Survey, merges them with an external dataset, and maps the result as a way of refining outreach. If you have questions, feel free to reach out to the Commerce Data Service at DataUsability@doc.gov.

introduction

Every month, hundreds of thousands of households fill out the American Community Survey (ACS), one of the largest surveys conducted by the U.S. Census Bureau and an ambitious program to better understand the American experience. Covering a sample of over 3 million respondents each year, the aggregate summary data is published with details on hundreds of attributes, ranging from language spoken to average commute time to public assistance usage. The data is provided in standard geographic levels of aggregation (e.g. Census tracts, counties and states), which enables the ACS to be joined to external data sources that are similarly aggregated. Given its versatility and robustness, the data is a mainstay of social research and enables organizations to be more data-driven, whether they are non-profits, corporations, or government agencies.

case study: non-profit

Imagine a non-profit that has a national mandate to provide aid and services specifically to financially distressed neighborhoods with girls under the age of 5 who live in households that collect SNAP benefits and have remained in the same home over the past year. As it turns out, ACS five-year estimates allow an analyst at a non-profit to focus on geographic units that are far smaller than a county, such as Census tracts. And because Census tracts are a common unit of analysis, additional datasets can be merged to improve the diversity of data and better characterize a local neighborhood.

For this example, we used the Consumer Complaint Database produced by the Consumer Financial Protection Bureau and transformed the counts of complaints from each zipcode into Census tract approximations following the Census tract to ZCTA relationship file. In the interactive graph below, we show the distribution of the number of financial complaints per Census tract, limited to a maximum of 60 complaints, and can see that few Census tracts receive more than ten financial complaints. Thus, we can limit our focus to those areas.
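As a rough illustration of how such a distribution could be tabulated, the sketch below counts how many tracts fall at each complaint level. It assumes that "limited to a maximum of 60" means values above 60 are clipped to 60, and that apportioned counts are rounded to whole complaints. The function name, tract identifiers and counts are hypothetical.

```python
import json
from collections import Counter


def complaint_distribution(tract_counts, cap=60):
    """Return {complaints: number_of_tracts}, clipping counts above `cap`."""
    # Assumption: apportioned counts may be fractional, so round them first.
    clipped = (min(int(round(count)), cap) for count in tract_counts.values())
    return dict(sorted(Counter(clipped).items()))


# Hypothetical tract identifiers and complaint counts, for illustration only.
sample = {
    "01001020100": 2,
    "01001020200": 0,
    "01001020300": 14,
    "01001020400": 2,
    "01001020500": 75,  # clipped to the cap of 60
}

distribution = complaint_distribution(sample)
print(json.dumps(distribution))
```

A mapping like this is small enough to write out as JSON and feed to a charting library.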

getting the data

So how do we get ACS data? The American Community Survey releases datasets on one, three and five-year aggregates at varying levels of geographic specificity. With 20,000+ attributes collected on the American experience, it can be challenging to explore and search the attributes in the ACS data repository. For a cursory glance at the available data, there is the US Census Bureau's American FactFinder. For dynamic access, the ACS five-year release is available through one of the US Census Bureau's APIs. Here, we show how to access the ACS five-year estimates using a third method that is easier for bulk analysis of Census tract-level data. Due to the size of the ACS, the US Census Bureau breaks the ACS five-year release into five parts:

You can see from the information above that a bulk download of ACS five-year aggregates separates the metadata from the actual estimates. The process to get any specific estimate for the nation at the tract level is roughly:

Extract the variable name from Variable Inventory

Find the table number containing variable name from the SAS program

Pull the actual estimate from the correct ACS estimates file

Match the relevant logical record number to find the correct geography from the Geography Files

Rinse, repeat for each state
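The last few steps above amount to a join between an estimates file and a geography file on the logical record number. The sketch below shows that join on tiny in-memory stand-ins. The column layouts are simplifying assumptions (the real geography files carry many more columns), the helper names are hypothetical, and every value is made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for one estimates file. Assumed layout: six identifier
# columns (FILEID, FILETYPE, STUSAB, CHARITER, SEQUENCE, LOGRECNO) followed by
# the estimate values.
ESTIMATES = io.StringIO(
    "ACSSF,2013e5,al,000,0001,0001766,120,58\n"
    "ACSSF,2013e5,al,000,0001,0001767,95,41\n"
    "ACSSF,2013e5,al,000,0001,0000001,4800,2300\n"
)

# Simplified stand-in for a geography file, reduced to STUSAB, LOGRECNO, GEOID.
# Only tract-level records are listed here.
GEOGRAPHY = io.StringIO(
    "al,0001766,14000US01001020100\n"
    "al,0001767,14000US01001020200\n"
)


def read_geography(handle):
    """Map (state, logical record number) to a geographic identifier."""
    return {(row[0], row[1]): row[2] for row in csv.reader(handle)}


def extract_estimate(handle, geography, value_position):
    """Return {GEOID: estimate} for one variable.

    `value_position` is the 0-based offset of the variable among the value
    columns; in practice it would be looked up from the metadata files.
    """
    leading_columns = 6  # identifier columns before the first estimate
    result = {}
    for row in csv.reader(handle):
        key = (row[2], row[5])
        if key in geography:  # skip records for other geographic levels
            result[geography[key]] = float(row[leading_columns + value_position])
    return result


geography = read_geography(GEOGRAPHY)
estimates = extract_estimate(ESTIMATES, geography, value_position=1)
print(estimates)
```

Repeating the same two calls for each state's files, and concatenating the results, would give a national tract-level table for the chosen variable.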

In the Code section below, we include Python snippets that process any specified list of variables from the Variable Inventory. We use the output to show how to find areas of particular interest.

visualizing neighborhoods

So where are they located? Back to our non-profit example. Recall the measures that we are interested in:

girls under five,

who lived in the same house one year ago,

lived in a household that collected SNAP benefits over the past year,

and reside in a tract with 10 or more financial complaints.

initial effort

How does an analyst determine where a non-profit should focus its efforts? Below we highlight Census tracts in the state of Alabama that are in the top 30th percentile for each ACS estimate or have more than ten financial complaints. Each pane shows a different measure of interest, highlighting seemingly different areas for outreach.
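One way to flag tracts for a single pane is sketched below. It interprets "top 30th percentile" as the highest-valued 30 percent of tracts for a measure, which is an assumption, and it keeps all ties at the cutoff. The function name, tract labels and values are hypothetical.

```python
def top_share_flags(values, share_percent=30):
    """Flag keys whose value falls within the top `share_percent` of values."""
    ranked = sorted(values, key=values.get, reverse=True)
    # Integer arithmetic avoids floating-point surprises in the cutoff index.
    keep = max(1, len(ranked) * share_percent // 100)
    cutoff = values[ranked[keep - 1]]
    # Ties at the cutoff value are all flagged.
    return {key: values[key] >= cutoff for key in values}


# Hypothetical estimates of girls under five for ten tracts.
girls_under_five = {
    "t01": 12, "t02": 40, "t03": 7, "t04": 55, "t05": 21,
    "t06": 33, "t07": 5, "t08": 48, "t09": 18, "t10": 9,
}
# Hypothetical apportioned complaint counts for the same tracts.
complaints = {
    "t01": 3, "t02": 11, "t03": 0, "t04": 25, "t05": 10,
    "t06": 2, "t07": 1, "t08": 4, "t09": 14, "t10": 0,
}

acs_flags = top_share_flags(girls_under_five)
complaint_flags = {tract: count > 10 for tract, count in complaints.items()}

print(sorted(t for t, flagged in acs_flags.items() if flagged))
print(sorted(t for t, flagged in complaint_flags.items() if flagged))
```

Each dictionary of flags corresponds to one pane, which is why the highlighted areas can differ from measure to measure.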

unified view

Now, how can we highlight only those tracts that are ranked highest across all of the criteria? By averaging the rank of each tract across all measures, we can map them again to find candidates for outreach. This data-driven methodology is simple yet holds the potential for large returns. Below we show that map nationally and list the top ten Census tracts for any view.
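A minimal sketch of the rank-averaging idea follows. It assumes that a larger value means a higher priority on every measure and that ties are broken by tract identifier; the tutorial does not say how ties are handled. The function names, measure names and values are hypothetical.

```python
def rank_descending(values):
    """Rank keys so the largest value gets rank 1 (ties broken by key)."""
    ordered = sorted(values, key=lambda key: (-values[key], key))
    return {key: position for position, key in enumerate(ordered, start=1)}


def average_rank(measures):
    """Average each tract's rank across all measures; lower is better."""
    # Only tracts present in every measure can be ranked on all of them.
    tracts = set.intersection(*(set(values) for values in measures.values()))
    ranks = {
        name: rank_descending({tract: values[tract] for tract in tracts})
        for name, values in measures.items()
    }
    return {
        tract: sum(rank[tract] for rank in ranks.values()) / len(ranks)
        for tract in tracts
    }


# Hypothetical measures for four tracts labelled A to D.
measures = {
    "girls_under_five": {"A": 50, "B": 20, "C": 35, "D": 10},
    "same_house": {"A": 40, "B": 45, "C": 30, "D": 5},
    "snap": {"A": 25, "B": 10, "C": 30, "D": 2},
    "complaints": {"A": 20, "B": 3, "C": 15, "D": 1},
}

scores = average_rank(measures)
top_ten = sorted(scores, key=lambda tract: (scores[tract], tract))[:10]
print(top_ten)
```

Because ranks rather than raw values are averaged, measures on very different scales (counts of children versus counts of complaints) contribute equally to the final ordering.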

getting started

In this tutorial, we will illustrate how to pull ACS bulk data, convert zipcode-level data to the Census tract level, and output the result to GeoJSON using the Python programming language. By the end of the notebook you will have data processed for the visualizations you see above.

To get started quickly, the code for this tutorial can be found at the following GitHub repo.

download data

Before executing the code, we have to acquire the data. Here are the inputs required for this tutorial:

Processing CFPB Financial Complaints Database

Now we process the CFPB Financial Complaints Database. This involves loading the downloaded CSV file, counting complaints at the zipcode level and apportioning the counts to the Census tract level. We write out a JSON of the distribution of financial complaints to see what the upper limit is.
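The counting and apportioning steps might look roughly like the sketch below. It assumes the complaints CSV has a "ZIP code" column and that the relationship file provides, for each ZCTA and tract pair, the percentage of the ZCTA's population living in that tract (named ZPOPPCT here). Weighting by population share is one plausible apportioning rule, not necessarily the one used in the tutorial's code. All rows are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical extract of the complaints CSV (column names are assumptions).
COMPLAINTS_CSV = io.StringIO(
    "Product,ZIP code\n"
    "Mortgage,36067\n"
    "Credit card,36067\n"
    "Mortgage,36067\n"
    "Debt collection,36066\n"
)

# Hypothetical extract of a ZCTA-to-tract relationship file.
RELATIONSHIP_CSV = io.StringIO(
    "ZCTA5,GEOID,ZPOPPCT\n"
    "36067,01001020100,25.0\n"
    "36067,01001020200,75.0\n"
    "36066,01001020200,100.0\n"
)


def count_by_zip(handle, zip_column="ZIP code"):
    """Count complaints per zipcode, skipping rows without one."""
    counts = Counter()
    for row in csv.DictReader(handle):
        zip_code = row[zip_column].strip()
        if zip_code:
            counts[zip_code] += 1
    return counts


def apportion_to_tracts(zip_counts, handle):
    """Split each zipcode's count across tracts by population share."""
    tract_counts = defaultdict(float)
    for row in csv.DictReader(handle):
        share = float(row["ZPOPPCT"]) / 100.0
        tract_counts[row["GEOID"]] += zip_counts.get(row["ZCTA5"], 0) * share
    return dict(tract_counts)


zip_counts = count_by_zip(COMPLAINTS_CSV)
tract_counts = apportion_to_tracts(zip_counts, RELATIONSHIP_CSV)
print(tract_counts)
```

Note that the apportioned values are fractional approximations, which is why the tract-level figures are best read as estimates rather than exact complaint counts.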

Next, we take a directory of shapefiles and append the additional information from the JSON file to each geography. To save disk space, we mount a virtual drive for each archive and write the resulting GeoJSON for each state.
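The append step can be sketched on plain GeoJSON-style dictionaries, as below. Reading the shapefile archives themselves is left out because it depends on a geospatial library. The "GEOID" property name, the helper name and all values are assumptions made for illustration.

```python
import json


def append_properties(feature_collection, extra_by_geoid, id_field="GEOID"):
    """Merge extra attributes into each feature's properties, matched by ID."""
    for feature in feature_collection["features"]:
        geoid = feature["properties"].get(id_field)
        # Features without a match are left unchanged.
        feature["properties"].update(extra_by_geoid.get(geoid, {}))
    return feature_collection


# Hypothetical two-tract feature collection with toy geometries.
tracts = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"GEOID": "01001020100"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"GEOID": "01001020200"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]],
            },
        },
    ],
}

# Hypothetical attributes loaded from the processed JSON file.
extra = {"01001020100": {"complaints": 0.75, "girls_under_five": 58}}

merged = append_properties(tracts, extra)
geojson_text = json.dumps(merged)
print(geojson_text[:60])
```

Writing `geojson_text` to one file per state would reproduce the per-state output described above.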

Drawing a Word Cloud for ACS

We use the wordcloud package in Python to generate a word cloud from the descriptions in the ACS Variable Inventory. Using the original image, we color shift the white to another color and then overlay a color mask to get the final image.
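A word cloud is driven by word frequencies, so the preparatory step can be sketched with the standard library alone. The descriptions below are hypothetical paraphrases rather than actual Variable Inventory entries, and the stopword list is a minimal assumption. A mapping like the one produced here is the kind of input the wordcloud package can draw from, for example through its `WordCloud.generate_from_frequencies` method.

```python
import re
from collections import Counter

# Minimal, assumed stopword list; a real one would be longer.
STOPWORDS = {"the", "of", "in", "and", "for", "to", "a", "by"}


def word_frequencies(descriptions):
    """Count lowercase alphabetic words across descriptions, minus stopwords."""
    words = re.findall(r"[a-z]+", " ".join(descriptions).lower())
    return Counter(word for word in words if word not in STOPWORDS)


# Hypothetical variable descriptions, for illustration only.
descriptions = [
    "Sex by age: female: under 5 years",
    "Geographical mobility in the past year by sex",
    "Receipt of food stamps/SNAP in the past 12 months",
]

frequencies = word_frequencies(descriptions)
print(frequencies.most_common(3))
```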