The census most people think of is the regular decennial census. A census attempts to record information for every member of the population, and governments have conducted them in various forms for thousands of years. The number, type, and structure of the questions asked in the US decennial census change with each administration of the census. That makes comparisons between years difficult for some of the more nuanced questions around race, ethnicity, occupation, and household status, as perceptions, attitudes, and in some cases laws have changed. The most recent census (2010) asked no questions about income or employment status, limiting the data available to age, gender, race, ethnicity, household composition, and housing unit status.

A sample survey, such as the American Community Survey (ACS), questions only a portion of the population and weights the responses to estimate the actual value in the whole population. If everyone responded to the census, it would be the most accurate measure of the population, but a response rate of 100% is impossible, and the pool of non-respondents tends to overrepresent particular demographics (like paranoid Michele Bachmann types). With a high number of refusals, a regular sampling of the population can be more accurate than the census, because groups that a census would otherwise miss can be captured and estimated in a sample.

The ACS also asks a far more detailed set of questions than the decennial census does. Besides household, family, and individual income, which are the bread and butter of any sales operation, there are detailed questions on family composition, education, and occupation, much more detailed gender and race breakdowns, and data on commuting patterns, immigration status, and geographic mobility, along with various combinations of these demographic characteristics.

In both cases, the Census Bureau reports summary data at various levels of aggregation. The basic unit is the individual, but for obvious reasons, individual records aren’t made public until 72 years after they are collected. The Census uses a hierarchy of reporting that starts at the census block and moves up through block groups, tracts, counties, and states to the nation as a whole. The common currency of Census data is the census tract.
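One place the hierarchy shows up concretely is in the GEOID codes attached to Census geographies. A block GEOID is 15 digits: state (2), county (3), tract (6), and block (4), with the first digit of the block number identifying the block group. Here is a minimal sketch of splitting one apart. The function name and the sample GEOID are my own, made up for illustration, and the sample may not correspond to a real block.

```python
def split_block_geoid(geoid: str) -> dict:
    """Split a 15-digit census block GEOID into its nested parent GEOIDs."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("a block GEOID must be exactly 15 digits")
    return {
        "state": geoid[:2],         # 2-digit state FIPS code
        "county": geoid[:5],        # state + 3-digit county code
        "tract": geoid[:11],        # county + 6-digit tract code
        "block_group": geoid[:12],  # tract + first digit of the block number
        "block": geoid,             # tract + 4-digit block number
    }


# Hypothetical GEOID, for illustration only ("06" is California's state code).
parts = split_block_geoid("060530001001000")
print(parts["tract"])
```

Because each level's code is a prefix of the level below it, rolling block data up to tracts or counties is just a matter of grouping on the first 11 or 5 characters.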

Tracts, blocks, and block groups generally follow existing state and local boundaries, while the Census also uses areas such as Census Designated Places and Metropolitan Statistical Areas that don’t correspond exactly to local government boundaries but represent a relatively significant concentration of the population.

The Census also uses ZIP Code Tabulation Areas (ZCTAs). These are often confused with actual USPS ZIP code boundaries, but there is a significant difference between them. The US Postal Service assigns a delivery address to a delivery route, designated by the ZIP code. As such, the address is a point with an associated attribute (the ZIP code). If you aggregate all the delivery addresses with the same ZIP code, you come up with the area covered by that particular ZIP code. Where there are no delivery addresses, the boundary between one ZIP code and another can’t be calculated and must be inferred. That is generally what commercial ZIP code data vendors do: they estimate the boundaries between ZIP codes where there are no delivery addresses (natural land and water features, expanses of uninhabited land, etc.).

A ZCTA, by contrast, is built by taking the most prevalent ZIP code in each census block and assigning it to the whole block. Blocks that share the same assigned ZIP code are then aggregated to create the ZCTA. In this way, the basic unit of the ZCTA is really the census block, and its boundaries follow those of the blocks. It’s possible for a ZIP code not to be the most prevalent one in any block, in which case it has no ZCTA but still exists as a valid ZIP code. Likewise, with constant ZIP code changes, a ZCTA could carry a ZIP code that is no longer valid.
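To make that concrete, here is a rough Python sketch of the block-assignment logic described above. It is a simplification, not the Census Bureau's actual procedure: the function name, the block IDs, and the address list are all made up for illustration, and the tie-breaking rule (lowest ZIP code wins) is my own assumption.

```python
from collections import Counter, defaultdict


def build_zctas(addresses):
    """Group census blocks into ZCTAs by each block's most prevalent ZIP code.

    `addresses` is an iterable of (block_id, zip_code) pairs, one per
    delivery address.
    """
    # Count how many addresses each ZIP code has within each block.
    zips_by_block = defaultdict(Counter)
    for block_id, zip_code in addresses:
        zips_by_block[block_id][zip_code] += 1

    # Assign each whole block to its most prevalent ZIP code.
    zctas = defaultdict(set)
    for block_id, counts in zips_by_block.items():
        # Assumption: ties go to the lowest ZIP code, so results are stable.
        top_zip = max(sorted(counts), key=counts.get)
        zctas[top_zip].add(block_id)
    return dict(zctas)


# Illustrative data: three hypothetical blocks, three ZIP codes.
addresses = [
    ("B1", "93940"), ("B1", "93940"), ("B1", "93950"),
    ("B2", "93950"), ("B2", "93950"),
    ("B3", "93940"), ("B3", "93953"), ("B3", "93940"),
]
zctas = build_zctas(addresses)
print(zctas)
```

In this toy data, 93953 has a delivery address in block B3 but is never the most prevalent ZIP code in any block, so it ends up with no ZCTA at all even though it is still a valid ZIP code.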

Because accuracy in a sample survey grows as the number of samples increases, the Census Bureau releases three types of ACS estimates: 1-year, 3-year, and 5-year. The multi-year estimates pool the samples taken over those years. The trade-off between them is essentially currency for accuracy and completeness. The 1-year is the most current but the least accurate, since it is built from a relatively small number of respondents. The 5-year is the most accurate but the least current. The 3-year balances the two.
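A back-of-the-envelope sketch shows why pooling years helps. It assumes simple random sampling and the same number of respondents every year, which is far simpler than the ACS's real weighting, and the proportion and sample size below are invented for illustration. The one real detail is the z-value of 1.645, which matches the 90% confidence level the ACS uses for its published margins of error.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.645) -> float:
    """Margin of error for an estimated proportion p from n responses.

    Assumes simple random sampling; z = 1.645 gives a 90% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical numbers: 30% of households have some characteristic,
# and 200 households in the area respond each year.
p = 0.30
responses_per_year = 200

moe_by_years = {
    years: margin_of_error(p, responses_per_year * years)
    for years in (1, 3, 5)
}
for years, moe in moe_by_years.items():
    print(f"{years}-year estimate: +/- {moe:.1%}")
```

Under these assumptions the margin of error shrinks with the square root of the sample size, so the 5-year estimate is a bit more than twice as tight as the 1-year, at the cost of describing a five-year window instead of a single year.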

As in all things, you need to be clear about your goals and the level of accuracy you need to accomplish them. For most tasks, diving into the full summary data for the 2010 Census is unnecessary, as headline demographics for each census tract are released in the Demographic Profile (DP1) table. You can find most of what you need there, in a simple shapefile format. The summary files are arcane and difficult to process, but they unlock some of the more complex demographic information the DP1 table leaves out. If you don’t need that complexity, the DP1 shapefile is what you’ve been looking for. There is no equivalent in the ACS data that I’ve found.

So that was a lot of technical information I’ve thrown at you all at once, but I hope it’s been helpful.

Here’s a picture of the beach in Monterey to help you recover your sanity:

**Note: The preceding post was edited to properly render “zip” as “ZIP”, since it’s an acronym for Zone Improvement Plan. I also changed the capitalization style to no longer capitalize “census block.”