
The 2010 census has been rolling out since February with New York state getting the first of its data on March 24 and more data releases during the summer. Yet, almost every day reporters, redistricting specialists and even other demographers ask when data to answer questions such as the following will be released:

When will we know the number of immigrants in New York City and in various neighborhoods throughout the city?

How many Hispanic citizens of voting age live in Washington Heights?

Has the median income in Jackson Heights grown or declined since 2000?

How many veterans from the Iraq and Afghanistan wars live in Staten Island? In the Bronx? On the Upper West Side?

Which recent college graduates are ending up with jobs? How much do they make? How many are living at home with their parents?

Has the number of people working in finance increased?

The answer is simple and surprising: "All the information you need was already released on Dec. 14 last year."

As you may recall, the census form you filled out last spring had just a few basic questions. So the data released from that form can include only: number of people, dwelling owned with or without a mortgage or rented with or without cash rent, relationship to householder, sex, age, and race or races, including principal tribe or group, if Native American, Asian or Pacific Islander. The other data -- indeed the data that in many ways is the most interesting and heavily used -- comes from the American Community Survey, or ACS, which is actually "the rest of the census" and was released in December.

The New Census

The census has been radically redesigned, and many census users have yet to catch up. Every census from 1940 through 2000 included a "short form" given to everyone to get a full enumeration of the population, as dictated by law. It asked only for very basic demographic information. The Census Bureau also administered a "long form" questionnaire that went to only a sample of persons or households and included a much larger set of social and demographic questions.

In 2000 the short form included only eight questions for the householder and six for every person living in the household. These were the same basic questions that would be asked in 2010 about sex, age, relationship to the householder, race and Hispanic status.

About one in six households received the "long form," which included some 53 questions. In addition to the basic questions in every census form, this survey also contained questions on a whole panoply of characteristics, including housing, moving in the last five years, employment, detailed income sources, military service, disability status, ancestry, place of birth and education.

Until 2010, the long form and short form were distributed in tandem, but then the government split them. In 2005, the Census Bureau began to collect data for the American Community Survey, which is very similar to the old long form. That survey gets responses from about 2 million households and residents of group quarters (prisons, dormitories, institutions) every year, and tracks the same sort of data that was produced by the long form sample. The survey takes place all year, and interviews, where necessary, are conducted by permanent staff -- not the temporary workers who interview for the decennial census.

The Census Bureau releases three sets of data each year from the community survey: the one-year, the three-year and the five-year files. The one-year file is released for areas of at least 65,000 in population, the three-year file for areas of at least 20,000 and the five-year file for all of the areas for which the long form was released (block groups, tracts and higher). The one-year and three-year files cover only the more populous areas because the ACS is a sample, and its reliability depends on sample size.
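The population cutoffs above amount to a simple lookup. The sketch below is illustrative only: the function name `available_files` is invented here, and it encodes nothing beyond the thresholds quoted in this article.

```python
def available_files(population):
    """Return the ACS files published for an area of the given population.

    Hypothetical helper; the cutoffs are the ones stated in the article.
    """
    files = []
    if population >= 65_000:
        files.append("one-year")
    if population >= 20_000:
        files.append("three-year")
    # The five-year file covers every area the old long form covered.
    files.append("five-year")
    return files


print(available_files(4_000))      # a typical census tract
print(available_files(30_000))     # a mid-sized town
print(available_files(1_400_000))  # a large county
```

A census tract, which usually holds a few thousand people, therefore shows up only in the five-year file.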

So on Dec. 14, 2010, the Census Bureau released the first five-year file from the American Community Survey, including all the data that used to be released from the census long form. This totaled 32,000 columns with 675,000 rows of data. These data are comparable to what was compiled from the 2000 Census and allow one to answer many questions at the neighborhood level, where data are needed beyond the few questions in the 2010 census.

Using the Data

There are some differences between the American Community Survey and the old census long form. The first five-year file includes data from 2005 through 2009, while the long form data were collected on the same date as the census. So now relatively current data are available and updated every year. The multiyear span can be problematic for users -- for instance, the five-year file includes data from both before and after the financial crisis of 2007-08. On the other hand, the benefit is that even for small areas data are never more than a few years out of date. For larger areas the one-year file (from 2009) and the three-year file (for 2007-2009) also are available. So we now have detailed social and economic data for a large sample available in a much more timely fashion every year.

As was the case with the long form, the community survey is a sample and so is subject to sampling error. That error is usually reported as a confidence interval: a percent or number, plus or minus another number that defines the interval. For instance: Bush 51, Gore 49, plus or minus 3 percentage points. The true figure should then lie somewhere between 54 percent Bush, 46 percent Gore and 48 percent Bush, 52 percent Gore. This would hold 90 percent, 95 percent or 99 percent of the time, depending upon the confidence level chosen.
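The polling arithmetic can be sketched in a few lines. This is a textbook illustration, not the bureau's procedure: it assumes a simple random sample, which the community survey is not, and the sample size of 1,000 and both function names are made up for the example.

```python
from statistics import NormalDist


def interval(estimate, margin):
    """Lower and upper ends of 'estimate plus or minus margin'."""
    return estimate - margin, estimate + margin


def margin_for_share(share, sample_size, level=0.95):
    """Margin of error for a proportion under simple random sampling.

    Assumption: textbook formula z * sqrt(p * (1 - p) / n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * (share * (1 - share) / sample_size) ** 0.5


print(interval(51, 3))  # (48, 54)

# The same poll result gets a wider margin as the confidence level rises.
for level in (0.90, 0.95, 0.99):
    margin = margin_for_share(0.51, 1000, level)
    print(f"{level:.0%} level: plus or minus {100 * margin:.1f} points")
```

The higher the confidence level chosen, the wider the interval around the same estimate.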

Unfortunately, the Census Bureau's approach to estimating the confidence interval is flawed in a number of ways and vastly overestimates the potential error in many instances. For example, if no one is found from a given group, say Native Americans or Italians or Colombians in a given tract in Brooklyn, the confidence interval is arbitrarily set to some number (often 75) and is noted in the data. Taken literally, this would mean that the number of people from that group in the tract could be zero plus or minus 75, which allows for a negative population count. The Census Bureau now has a note that such absurdities should not be taken seriously, but this implies that the method the bureau used to compute confidence intervals is wrong.
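The absurdity is easy to reproduce. In the sketch below the estimate of zero and the margin of 75 come from the example above; the function name and the `floor_at_zero` option are assumptions added for illustration, showing one common-sense reading rather than anything the bureau prescribes.

```python
def bounds(estimate, margin, floor_at_zero=False):
    """Return (low, high) for 'estimate plus or minus margin'.

    floor_at_zero is a hypothetical option: a head count cannot be
    negative, so a reader might clip the lower bound at zero.
    """
    low = estimate - margin
    if floor_at_zero:
        low = max(low, 0)
    return low, estimate + margin


print(bounds(0, 75))                      # (-75, 75)
print(bounds(0, 75, floor_at_zero=True))  # (0, 75)
```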

The bureau also used this flawed method in deciding whether to release tables for specific areas from the one-year and three-year files, something it calls "filtering." For example, for years, the bureau did not release foreign place of birth for people living in the Bronx or Staten Island. There are many, many examples of filtered data, which makes the data less useful. (A memo I wrote to the Census Bureau on the topic of its estimation of confidence intervals and their misuse, along with the agency's response, is available here.)

At this point, the Census Bureau is mulling over what to do. In the meantime, one should realize that many of the reported confidence intervals are too large. That makes it difficult to correctly estimate the confidence interval when a user wants to combine data from individual neighborhoods to come up with statistics for larger areas.
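To see how the problem compounds, consider one widely used approximation for adding up estimates, which the bureau's own user guidance describes: the combined margin of error is the square root of the sum of the squared margins. The sketch below applies it to ten made-up tracts, each reporting no members of a group with a margin of 75. The function name is hypothetical, and the approximation ignores any correlation between areas.

```python
import math


def combined_margin(margins):
    """Approximate margin of error for a sum of estimates.

    Root-sum-of-squares approximation; treats the areas as independent.
    """
    return math.sqrt(sum(m * m for m in margins))


# Ten hypothetical tracts, each with an estimate of 0 and a margin of 75.
tract_margins = [75] * 10
print(round(combined_margin(tract_margins), 1))  # 237.2
```

Ten tracts in which nobody from the group was found would, on this arithmetic, yield a combined estimate of zero plus or minus about 237 people.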

A second issue with the American Community Survey is that its results at the county level are forced to conform to the system used by the Census Bureau for yearly population estimates. This means that the actual number reported for a given location in the community survey may conflict with the census counts. This problem, of course, afflicts New York City, where the American Community Survey and census estimates are much higher -- about 250,000 people -- than the census counts. However, the proportions of a given group or category in the community survey should be correct, even if the totals are incorrect. Direct comparisons of totals in the survey files with those in the census files, though, can lead to anomalies.
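One way a reader might act on this is to take the share from the survey and apply it to the census count, rather than comparing raw totals. The figures and function names below are invented for illustration; this is a sketch of the arithmetic, not a method endorsed by the bureau.

```python
def share(group_count, total_count):
    """Proportion of the total that belongs to the group."""
    return group_count / total_count


def rescaled_count(survey_group, survey_total, census_total):
    """Apply the survey's proportion to the census head count."""
    return share(survey_group, survey_total) * census_total


# Hypothetical area: the survey total runs higher than the census count.
survey_group, survey_total = 2_000, 10_000
census_total = 9_500

print(share(survey_group, survey_total))                         # 0.2
print(round(rescaled_count(survey_group, survey_total, census_total)))  # 1900
```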

In sum, despite some differences between the American Community Survey and the old census long form data, the survey is in most ways superior. Not only are the data released more often and in a more timely fashion, but the numbers may be more accurate, since the survey is conducted by a permanent interview staff. So the next time you cannot find the data you want in the census report, look for it in the American Community Survey.

Andrew A. Beveridge has taught sociology at Queens College since 1981, done demographic analyses for the New York Times since 1993, and been in charge of Gotham Gazette's demographics topic page since 2000. The opinions expressed are his alone.

Gotham Gazette is published by Citizens Union Foundation and is made possible by support from the Robert Sterling Clark Foundation, the John S. and James L. Knight Foundation, the Altman Foundation, the Fund for the City of New York and donors to Citizens Union Foundation. Please consider supporting Citizens Union Foundation's public education programs. Critical early support to Gotham Gazette was provided by the Charles H. Revson Foundation, Rockefeller Brothers Fund and the Alfred P. Sloan Foundation.