Principal Investigator(s): United States Department of Commerce. Bureau of the Census; Inter-university Consortium for Political and Social Research

Summary:

Prepared by the Inter-university Consortium for Political
and Social Research, this data collection consists of selected subsets
extracted from the Census of Population and Housing, 2000 [United
States]: Summary File 1, Advance National (ICPSR 3325). Summary File 1
data contain information compiled from the questions asked of all
people and of every housing unit enumerated in Census 2000: questions
covering sex, age, race, Hispanic or Latino origin, type of living
quarters (household/group quarters), household relationship, housing
unit vacancy status, and housing unit tenure (owner/renter). The
information is presented in 286 tables, which are tabulated for every
case, i.e., every geographic unit represented in the data. There is
one variable per table cell, plus additional variables with geographic
information. All cases in the summary file data are classified by
levels of observation, known as "summary levels," in the Census
Bureau's nomenclature. These levels of observation served as the
selection criteria for the subsets. Each subset comprises all of the
cases in one of five summary levels: the nation (summary level 010),
states (summary level 040), counties (summary level 050), places
(summary level 160), and five-digit ZIP code tabulation areas (summary
level 860). Three files are supplied for each subset except the
last: a single, relatively large file that contains all of
the tables in the data, plus two smaller files, each of which contains
approximately one half of the tables. For the five-digit ZIP code
tabulation areas, there is only one file, which contains all of the
tables.
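
The file arrangement described above can be summarized in a few lines of Python. This is only an illustrative sketch: the dictionary and the function name `files_supplied` are hypothetical and are not part of the collection, and the labels "complete", "first half", and "second half" are descriptive tags rather than actual file names.

```python
# Summary levels named in the description, keyed by their three-digit codes.
SUMMARY_LEVELS = {
    "010": "nation",
    "040": "states",
    "050": "counties",
    "160": "places",
    "860": "five-digit ZIP code tabulation areas",
}


def files_supplied(sumlev):
    """Return descriptive labels for the files supplied for one subset.

    Hypothetical helper: the labels are tags, not real file names.
    """
    if sumlev not in SUMMARY_LEVELS:
        raise KeyError(f"summary level {sumlev!r} is not in this collection")
    if sumlev == "860":
        # The ZIP code tabulation area subset has one file with all tables.
        return ["complete"]
    # Every other subset has one complete file plus two partial files.
    return ["complete", "first half", "second half"]


for code, name in SUMMARY_LEVELS.items():
    print(code, name, files_supplied(code))
```

Keeping the summary level codes as strings preserves the leading zero in codes such as 010 and 040.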

Study Description

Citation

United States Department of Commerce, Bureau of the Census, and Inter-university Consortium for Political and Social Research. CENSUS OF POPULATION AND HOUSING, 2000 [UNITED STATES]: SELECTED SUBSETS FROM SUMMARY FILE 1. ICPSR ed. Washington, DC: U.S. Dept. of Commerce, Bureau of the Census, and Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producers], 2002. Ann Arbor, MI: Inter-university Consortium for Political and Social Research, [distributor], 2002. http://doi.org/10.3886/ICPSR13285.v1

(1) The original Summary File 1, Advance National data
comprise 40 files. There is one column-delimited file that contains
geographic identifiers (the geographic header record file or "Geo"
file), plus 39 comma-delimited table files, each with a subset of
tables in the data. Initial steps in the production of the subsets
for this collection involved sorting the Geo file and the 39 table
files in ascending order of the common identification variable
LOGRECNO, reformatting the Geo file as a comma-delimited file, and
stripping the first five identification variables from each of the 39
table files (FILEID, STUSAB, CHARITER, CIFSN, and LOGRECNO). Next, the
reformatted Geo file was merged with the stripped table files, end to
end, so that corresponding records in the Geo and table files were
joined as a single record in the merged file. Finally, each subset was
generated by extracting from the merged file all cases with a given
value for SUMLEV, the variable that identifies the summary
level. Separate subsets were generated for summary levels 010, 040,
050, 160, and 860.

(2) To allow for compatibility with SPSS (as of August 2002), subsets
with a record length greater than the SPSS limit of 32,767 were
"split" into two files, each with a record length less than the
limit. Three files are supplied for each of these subsets: a "first
half" file containing the Geo variables and tables P1-PCT12E, a
"second half" file containing the Geo variables and tables
PCT12F-H16I, and a complete file (for non-SPSS use) that contains the
Geo variables and all of the tables, P1-H16I.

(3) Each subset contains all of the geographic component iterations in
its summary level, if any.

(4) The implied decimal places of variables INTPTLAT (latitude) and
INTPTLON (longitude) were made explicit in the subsets. In addition,
the values of all Geo variables were enclosed in quotes, except for
variables AREALAND, AREAWATR, POP100, HU100, INTPTLAT, and INTPTLON.

(5) The data definition statements were tested with SAS 8, SPSS 10,
and Stata/SE 7.0.

(6) The codebook is provided by the principal investigator as a
Portable Document Format (PDF) file. The PDF file format was developed
by Adobe Systems Incorporated and can be accessed using PDF reader
software, such as the Adobe Acrobat Reader. Information on how to
obtain a copy of the Acrobat Reader is provided on the ICPSR Web site.

(7) The codebook documents data collection procedures, concepts, and
individual variables in the original Summary File data as well as the
ICPSR-produced subsets, but not the layout and structure of the
subsets. That information is contained in the data dictionary files
provided with this collection. In particular, the "Data Structure and
Segmentation" section in chapter 2 of the codebook and the variable
locations shown in chapter 7 do not apply to the subsets. Every subset
file record begins with the Geo variables in their original order. In
a complete subset file, the Geo variables are followed by the 6th
through last variables in table file 1, then the 6th through last
variables in table file 2, and so on up to the 6th through last
variables in table file 39. Each "first half" file is a subset of a
complete file: it begins with the first variable in the Geo file and
ends with the last variable in table file 20. In a "second half" file,
the Geo variables are followed by the 6th through last variables in
table file 21, then the 6th through last variables in table file 22,
and so on up to the 6th through last variables in table file 39.
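
The production steps in note (1) can be illustrated with a short Python sketch. It is not the procedure ICPSR actually ran, only a minimal reconstruction under stated assumptions: the Geo file is taken to be already reformatted as comma-delimited, the Geo layout is reduced to five made-up columns (FILEID, STUSAB, SUMLEV, LOGRECNO, and a name), only two table files are used instead of 39, and all data values are invented. The function names `read_rows` and `merge_and_extract` are hypothetical.

```python
import csv
import io

# Each table file starts with FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO.
ID_FIELDS = 5
TABLE_LOGRECNO = 4  # position of LOGRECNO in a table file record

# Simplified, made-up Geo layout: FILEID, STUSAB, SUMLEV, LOGRECNO, NAME.
GEO_SUMLEV = 2
GEO_LOGRECNO = 3

GEO_TEXT = (
    "SF1,US,040,0000002,Alabama\n"
    "SF1,US,010,0000001,United States\n"
    "SF1,US,040,0000003,Alaska\n"
)
TABLE_1 = (
    "SF1,US,000,01,0000003,10,11\n"
    "SF1,US,000,01,0000001,100,110\n"
    "SF1,US,000,01,0000002,50,55\n"
)
TABLE_2 = (
    "SF1,US,000,02,0000002,8\n"
    "SF1,US,000,02,0000003,9\n"
    "SF1,US,000,02,0000001,7\n"
)


def read_rows(text):
    """Parse comma-delimited text into a list of records (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


def merge_and_extract(geo_rows, table_files, sumlev):
    """Sort, strip, merge end to end, then keep one summary level."""
    # Sort every file by LOGRECNO. The values are zero-padded strings of
    # equal length in this sketch, so string order matches numeric order.
    geo_sorted = sorted(geo_rows, key=lambda r: r[GEO_LOGRECNO])
    tables_sorted = [
        sorted(rows, key=lambda r: r[TABLE_LOGRECNO]) for rows in table_files
    ]
    for rows in tables_sorted:
        if len(rows) != len(geo_sorted):
            raise ValueError("table file and Geo file differ in record count")

    merged = []
    for i, geo in enumerate(geo_sorted):
        record = list(geo)
        for rows in tables_sorted:
            row = rows[i]
            if row[TABLE_LOGRECNO] != geo[GEO_LOGRECNO]:
                raise ValueError("LOGRECNO mismatch between Geo and table file")
            # Drop the five identification variables, keep the table cells.
            record.extend(row[ID_FIELDS:])
        merged.append(record)

    # Extract all cases with the requested SUMLEV value.
    return [r for r in merged if r[GEO_SUMLEV] == sumlev]


states = merge_and_extract(
    read_rows(GEO_TEXT), [read_rows(TABLE_1), read_rows(TABLE_2)], "040"
)
for record in states:
    print(",".join(record))
```

Each merged record keeps the Geo variables first, followed by the cells of table file 1 and then table file 2, which mirrors the variable order that note (7) describes for a complete subset file.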

Methodology

Data Source:

self-enumerated questionnaires

Version(s)

Original ICPSR Release: 2002-10-02

Version History:

2006-01-18 File CB13285.ALL.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.