This chapter serves as a guide for data users to both the file and the technical documentation. Novice users trying to understand how to use the documentation and the file should read this chapter first. Please pay particular attention to the section titled Data Structure and Segmentation. This structure is a new approach for Census 2000.

Users of the DVD/CD-ROM can access the file information in two ways. The DVD/CD-ROM contains software that aggregates user-defined areas, allows for multiple geographic selections, and creates customized reports. (Note: ASCII CD-ROMs prepared upon release of individual state files do not contain supporting software. Software is only available on the DVD/CD-ROM products created after all files have been released.)

Users can also utilize off-the-shelf standard software packages to manipulate the data. The data on the DVD/CD-ROM are in a standard proprietary format that can easily be imported into other software packages.

Flat ASCII files by state are available for downloading via FTP from the American FactFinder Web site or from ftp://ftp2.census.gov/census2000/. They are also available as an on-demand CD-ROM product. In ASCII products, the geographic header record file contains fixed fields, while the data portion, including the geographic links, is in comma-delimited format.

File names follow a predefined structure. For Summary File 3 (SF 3), all geoheader files are named stgeor.uf3. The st is the United States Postal Service (USPS) two-character abbreviation for the state; US (us) is used for national files. The geo portion of the name is a constant. The r indicates the release of the product and is used only after the initial file release: in any subsequent release, r is replaced by an alphabetic sequence letter (a, b, etc.). For example, the state geoheader file for South Dakota is named sdgeo.uf3. If this file were re-released, it would be named sdgeoa.uf3. The extension .uf3 is used for both the state files and the national file.

Data files are named stseqr, where st is the USPS state code, seq is the file sequence number, and r is the file re-release indicator. For example, file sd00010.uf3 is the tenth segment file (00010) in the South Dakota (sd) file set. The absence of the r field indicates the first release of the data. The extension .uf3 identifies an SF 3 file and is used for both the US and state files.
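The naming rules above can be sketched in a few lines of Python. This is only an illustration: the function name sf3_file_name and its parameters are hypothetical, and the placement of the re-release letter in data file names is an assumption read from the stseqr pattern (the text gives a re-release example only for the geoheader file).

```python
from typing import Optional


def sf3_file_name(state: str, seq: Optional[int] = None, release: str = "") -> str:
    """Build an SF 3 file name following the pattern described in the text.

    state   -- USPS two-character abbreviation, or "us" for the national file
    seq     -- file sequence number (1-76), or None for the geoheader file
    release -- "" for the initial release, otherwise one letter ("a", "b", ...)
    """
    st = state.lower()
    if len(st) != 2 or not st.isalpha():
        raise ValueError("state must be a two-letter abbreviation")
    if release and not (len(release) == 1 and release.isalpha()):
        raise ValueError("release must be empty or a single letter")

    if seq is None:
        middle = "geo"              # constant part of the geoheader name
    elif 1 <= seq <= 76:
        middle = f"{seq:05d}"       # five-digit, zero-padded sequence number
    else:
        raise ValueError("seq must be between 1 and 76")

    # Assumption: the re-release letter follows the sequence number,
    # as the stseqr pattern suggests.
    return f"{st}{middle}{release.lower()}.uf3"


print(sf3_file_name("SD"))               # geoheader file, first release
print(sf3_file_name("SD", release="a"))  # geoheader file, first re-release
print(sf3_file_name("SD", seq=10))       # tenth data file, first release
```

The three calls reproduce the South Dakota names used as examples in the text.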

The geographic header record, Figure 2-5 at the end of this chapter, defines each field and provides its data dictionary reference name, size, starting position, and data type. A slightly different presentation of the header record appears in the identification section of the Data Dictionary (Chapter 7). In Figure 2-5, the information in each summary level column is a guide to the presence or absence of additional geographic information on that specific summary level. For example, in the column for summary level 040, an x appears for the first 11 fields, indicating that there will be information for those fields. In the county field, there is no x, indicating that there is no code for county in summary level 040. Since 040 is the summary level for state, this is perfectly logical.

In another example, we note the elementary school district field in the geographic header under Special Area Codes. In searching through the various summary levels of the header record, we see that the information (designated by an x in the field) is available only for summary level 750 (blocks within a hierarchy) and summary level 775 (blocks within a hierarchy for Puerto Rico).

The smallest level of geography available for SF 3 is the block group, although a smaller level of geography, the block, is available in Summary File 1 and the Redistricting file. Figure 2-3 at the end of this chapter provides an example of the various geographic hierarchies used, building from the block. Take some time to review this chart to become familiar with the different hierarchies.

Begin reading the schematic from the bottom at the blocks entry. By following the lines you can see the hierarchy very quickly. For example, follow blocks to block groups, to census tracts, to counties. This path indicates that census tracts and their sublevels in the hierarchy are uniquely identified within a county and do not cross county boundaries.

Follow blocks to the school district hierarchy. This path tells you that school districts can cross county, place, and other sub-state boundaries, but do not cross state lines.

Figure 2-4 at the end of this chapter presents similar information for the American Indian areas/Alaska Native area/Hawaiian home land hierarchy. Again, read the schematic from the bottom, beginning with the lowest level of geography.

File identification (FILEID), state/U.S. abbreviation (STUSAB), summary levels (SUMLEV), and the geographic component codes (GEOCOMP) are critical elements in identifying the geographic level for each record. The STUSAB field identifies the highest level of geography for the file. In state files, it identifies the individual state. For SF 3 files, the following FILEID and STUSAB codes are used:

The Summary Level Sequence Chart (Chapter 4) identifies each geographic level and provides the code that is in the SUMLEV field. It is easy to determine the code for the desired geography if you remember that the last geographic area type listed in the sequence identifies the geography of the summary level; the prior codes simply identify the hierarchy. See the example below:

140 State-County-Census Tract

In summary level 140, the record contains data for a census tract within a county within a state. Census tracts are uniquely numbered within a county and do not cross county boundaries. Since counties do not cross state boundaries, this is a simple application. Thus, summary level 140 provides data for a complete census tract.

When reading the Summary Level Sequence Chart, it is important to recognize that dashes (-) separate the individual hierarchies while slashes separate different types of geography (such as place/remainder) within the same hierarchy.
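As a rough illustration of the reading rule above, the following Python sketch splits a chart entry on dashes and slashes and returns the last level, which names the geography being summarized. The function names are hypothetical, the second sample entry is illustrative and should be checked against Chapter 4, and the sketch assumes that no geography name itself contains a dash or a slash.

```python
def parse_summary_level(entry):
    """Split an entry such as '140 State-County-Census Tract'.

    Returns (code, hierarchy), where hierarchy is a list of levels and each
    level is a list of the geography types separated by slashes.
    """
    code, _, sequence = entry.strip().partition(" ")
    hierarchy = [
        [alt.strip() for alt in level.split("/")]
        for level in sequence.split("-")
    ]
    return code, hierarchy


def summarized_geography(entry):
    """Return the last level in the sequence: the geography of the record."""
    _, hierarchy = parse_summary_level(entry)
    return hierarchy[-1]


print(parse_summary_level("140 State-County-Census Tract"))
# Illustrative entry with a slash; verify against the chart in Chapter 4.
print(summarized_geography("070 State-County-County Subdivision-Place/Remainder"))
```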

The segmentation information discussed below applies to the ASCII version of the CD-ROM/DVD files, the FTP files downloaded from American FactFinder, and any tape-to-CD files that are custom created by the Census Bureau.

Some definitions need clarification. The data for an individual state are known as the file set. This is the package that an individual CD-ROM or state FTP directory contains.

It is easiest to think of the file set as a logical file. However, this logical file consists of 77 physical files: the geographic header file and file01 through file76. This file design is a change from census files from earlier decades, made necessary by the larger size of the tables. Because the files are smaller, users can work with only the file containing the table they need. Figure 2-2 provides the file/table details.

A unique logical record number (LOGRECNO in the geographic header) is assigned to all files for a specific geographic entity. This is done so all records for that specific entity can be linked together across files. Besides the logical record number, other identifying fields are also carried over from the geographic header file to the table files. These are file identification (FILEID), state/U.S. abbreviation (STUSAB), characteristic iteration (CHARITER), and characteristic iteration file sequence number (CIFSN).
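The linkage described above can be sketched as a join on LOGRECNO. The Python example below is a minimal illustration only: the sample rows are made up (they are not census data), the FILEID value and the field widths are placeholders, and the order of the link fields is assumed to be FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO.

```python
import csv
import io

# Assumed order of the link fields at the start of every table-file record.
LINK_FIELDS = ["FILEID", "STUSAB", "CHARITER", "CIFSN", "LOGRECNO"]


def index_by_logrecno(lines):
    """Map each LOGRECNO to the data cells of that record in one table file."""
    index = {}
    for row in csv.reader(lines):
        link = dict(zip(LINK_FIELDS, row[:len(LINK_FIELDS)]))
        index[link["LOGRECNO"]] = row[len(LINK_FIELDS):]
    return index


# Made-up sample records standing in for two table files of one file set.
file01 = io.StringIO(
    "uSF3,SD,000,01,0000001,10,20\n"
    "uSF3,SD,000,01,0000002,30,40\n"
)
file02 = io.StringIO(
    "uSF3,SD,000,02,0000001,5\n"
    "uSF3,SD,000,02,0000002,6\n"
)

cells01 = index_by_logrecno(file01)
cells02 = index_by_logrecno(file02)

# Join the two files: keep entities present in both, concatenating their cells.
combined = {
    logrecno: cells01[logrecno] + cells02[logrecno]
    for logrecno in cells01
    if logrecno in cells02
}
print(combined)
```

The same index could be built for the geographic header file, so that each set of cells can be matched to its geographic identifiers.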

See Figure 2-1 below for geographic header information for FILE01 through FILE76.

Figure 2-1. File Set Structure Schematic

Record    Geographic header record file   File 01              File 02              Files 03-76

Record 1  FILEID                          FILEID               FILEID               Link fields shown on Files 01
          STUSAB                          STUSAB               STUSAB               and 02 are repeated for all
          CHARITER                        CHARITER             CHARITER             files
          CIFSN                           CIFSN                CIFSN
          LOGRECNO (Record 1)             LOGRECNO (Record 1)  LOGRECNO (Record 1)
          Remainder of geographic header  Tables P1-P14        Tables P15-P24       See Figure 2-2 for distribution
          record for geographic area x    (248 cells)          (218 cells)          of the tables across files

Record 2  FILEID                          FILEID               FILEID               Link fields shown on Files 01
          STUSAB                          STUSAB               STUSAB               and 02 are repeated for all
          CHARITER                        CHARITER             CHARITER             files
          CIFSN                           CIFSN                CIFSN
          LOGRECNO (Record 2)             LOGRECNO (Record 2)  LOGRECNO (Record 2)
          Remainder of geographic header  Tables P1-P14        Tables P15-P24       See Figure 2-2 for distribution
          record for geographic area y    (248 cells)          (218 cells)          of the tables across files

Record 3  FILEID                          FILEID               FILEID               Link fields shown on Files 01
          STUSAB                          STUSAB               STUSAB               and 02 are repeated for all
          CHARITER                        CHARITER             CHARITER             files
          CIFSN                           CIFSN                CIFSN
          LOGRECNO (Record 3)             LOGRECNO (Record 3)  LOGRECNO (Record 3)
          Remainder of geographic header  Tables P1-P14        Tables P15-P24       See Figure 2-2 for distribution
          record for geographic area z    (248 cells)          (218 cells)          of the tables across files

The geographic header record is standard across all electronic data products from Census 2000. It is in a fixed field format as described in the data dictionary. However, when geographic header fields are used to provide geographic linkage across files in files 01 through 76, they are in the same format as the rest of the file: comma delimited. Some header fields that appear in all 77 files (geographic header and 76 table files) are not used. For example, the characteristic iteration (CHARITER) field is only used in Summary Files 2 and 4. In the SF 1 and SF 3 files, it is always coded as 000.
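A short Python sketch of how one comma-delimited table-file record might be split and checked is given here. It is an illustration under stated assumptions: the function names are hypothetical, the link fields are assumed to lead each record in the order FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO, and the sample record is made up (its FILEID value and LOGRECNO width are placeholders). The expected cell count of 59 is the number of data items listed for file 62 in Figure 2-2.

```python
import csv

# Assumed order of the link fields at the start of each record.
LINK_FIELDS = ("FILEID", "STUSAB", "CHARITER", "CIFSN", "LOGRECNO")


def split_record(line):
    """Split one comma-delimited record into (link fields, data cells)."""
    row = next(csv.reader([line]))
    if len(row) < len(LINK_FIELDS):
        raise ValueError("record is shorter than the link fields")
    link = dict(zip(LINK_FIELDS, row[:len(LINK_FIELDS)]))
    cells = row[len(LINK_FIELDS):]
    return link, cells


def check_record(line, expected_cells):
    """Return a list of problems found in one SF 3 table-file record."""
    link, cells = split_record(line)
    problems = []
    if link["CHARITER"] != "000":
        problems.append("CHARITER should be 000 in SF 3")
    if len(cells) != expected_cells:
        problems.append(
            f"expected {expected_cells} cells, found {len(cells)}"
        )
    return problems


# Made-up record for file 62, which Figure 2-2 lists with 59 data items.
sample = "uSF3,SD,000,62,0000001," + ",".join(["0"] * 59)
print(check_record(sample, expected_cells=59))   # no problems expected
print(check_record(sample, expected_cells=248))  # cell count mismatch
```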

Figure 2-2. File/Table Segmentation

File name     Number of    Starting        Ending
(CIFSN)       data items   matrix number   matrix number

stgeo.uf3
st00001.uf3   248          P1              P14
st00002.uf3   218          P15             P24
st00003.uf3   241          P25             P37
st00004.uf3   227          P38             P46
st00005.uf3   220          P47             P50
st00006.uf3   250          P51             P67
st00007.uf3   213          P68             P91
st00008.uf3   245          P92             P138
st00009.uf3   203          P139            P145C
st00010.uf3   245          P145D           P145H
st00011.uf3   235          P145I           P146F
st00012.uf3   246          P146G           P147I
st00013.uf3   241          P148A           P149D
st00014.uf3   245          P149E           P150I
st00015.uf3   239          P151A           P154D
st00016.uf3   240          P154E           P159G
st00017.uf3   239          P159H           P160E
st00018.uf3   164          P160F           P160I
st00019.uf3   247          PCT1            PCT8
st00020.uf3   204          PCT9            PCT15
st00021.uf3   222          PCT16           PCT17
st00022.uf3   235          PCT18           PCT19
st00023.uf3   233          PCT20           PCT24
st00024.uf3   233          PCT25           PCT27
st00025.uf3   221          PCT28           PCT32
st00026.uf3   106          PCT33           PCT34
st00027.uf3   221          PCT35           PCT37
st00028.uf3   162          PCT38           PCT43
st00029.uf3   205          PCT44           PCT48
st00030.uf3   224          PCT49           PCT51
st00031.uf3   205          PCT52           PCT56
st00032.uf3   243          PCT57           PCT61
st00033.uf3   243          PCT62A          PCT63C
st00034.uf3   234          PCT63D          PCT64H
st00035.uf3   231          PCT64I          PCT66C
st00036.uf3   233          PCT66D          PCT67E
st00037.uf3   223          PCT67F          PCT68C
st00038.uf3   245          PCT68D          PCT68H
st00039.uf3   247          PCT68I          PCT69I
st00040.uf3   243          PCT70A          PCT70I
st00041.uf3   245          PCT71A          PCT71E
st00042.uf3   196          PCT71F          PCT71I
st00043.uf3   240          PCT72A          PCT72B
st00044.uf3   240          PCT72C          PCT72D
st00045.uf3   240          PCT72E          PCT72F
st00046.uf3   240          PCT72G          PCT72H
st00047.uf3   215          PCT72I          PCT73A
st00048.uf3   190          PCT73B          PCT73C
st00049.uf3   190          PCT73D          PCT73E
st00050.uf3   190          PCT73F          PCT73G
st00051.uf3   190          PCT73H          PCT73I
st00052.uf3   231          PCT74A          PCT75C
st00053.uf3   236          PCT75D          PCT75G
st00054.uf3   234          PCT75H          PCT76D
st00055.uf3   145          PCT76E          PCT76I
st00056.uf3   127          H1              H18
st00057.uf3   249          H19             H26
st00058.uf3   216          H27             H44
st00059.uf3   250          H45             H68
st00060.uf3   248          H69             H86
st00061.uf3   250          H87             H104
st00062.uf3   59           H105            H121
st00063.uf3   171          HCT1            HCT3
st00064.uf3   115          HCT4            HCT4
st00065.uf3   143          HCT5            HCT5
st00066.uf3   248          HCT6            HCT7
st00067.uf3   219          HCT8            HCT14
st00068.uf3   214          HCT15           HCT17
st00069.uf3   220          HCT18           HCT23
st00070.uf3   248          HCT24           HCT31C
st00071.uf3   246          HCT31D          HCT36D
st00072.uf3   246          HCT36E          HCT40I
st00073.uf3   243          HCT41A          HCT43I
st00074.uf3   224          HCT44A          HCT44G
st00075.uf3   247          HCT44H          HCT47F
st00076.uf3   96           HCT47G          HCT48I

Note: st represents the United States Postal Service two-character alphabetic state abbreviation.
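One way to use the segmentation in Figure 2-2 is to look up which file holds a given table. The Python sketch below is illustrative only: the names matrix_key, SEGMENTS, and file_for_table are hypothetical, SEGMENTS holds just a few rows copied from Figure 2-2 rather than the full table, and the ordering rule (P, then PCT, then H, then HCT; by number; then by letter suffix) is an assumption inferred from the sequence of the figure.

```python
import re

# Assumed ordering of table families, inferred from the sequence in Figure 2-2.
PREFIX_ORDER = {"P": 0, "PCT": 1, "H": 2, "HCT": 3}
_NAME = re.compile(r"^(PCT|P|HCT|H)(\d+)([A-Z]?)$")


def matrix_key(name):
    """Turn a matrix name such as 'P145D' into a sortable tuple."""
    match = _NAME.match(name.upper())
    if match is None:
        raise ValueError(f"unrecognized matrix name: {name!r}")
    prefix, number, suffix = match.groups()
    return (PREFIX_ORDER[prefix], int(number), suffix)


# Excerpt of Figure 2-2: (file sequence number, starting matrix, ending matrix).
SEGMENTS = [
    (8, "P92", "P138"),
    (9, "P139", "P145C"),
    (10, "P145D", "P145H"),
    (11, "P145I", "P146F"),
    (12, "P146G", "P147I"),
    (55, "PCT76E", "PCT76I"),
    (56, "H1", "H18"),
]


def file_for_table(name):
    """Return the file sequence number holding the table, or None if the
    table falls outside the excerpt in SEGMENTS."""
    key = matrix_key(name)
    for seq, start, end in SEGMENTS:
        if matrix_key(start) <= key <= matrix_key(end):
            return seq
    return None


print(file_for_table("P145E"))   # within P145D-P145H
print(file_for_table("H5"))      # within H1-H18
print(file_for_table("P1"))      # not covered by the excerpt
```

Extending SEGMENTS with the remaining rows of Figure 2-2 would cover the whole file set.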

The User Updates chapter (Chapter 9) informs data users about corrections, errata, and related explanatory information. These updates provide information about unique characteristics, changes, or corrections. Often this information becomes available too late to be reflected in the tables (matrices) or related documentation.

Census 2000 Notes and Errata, which contains user updates for individual files as well as the corrected counts issued by the Count Question Resolution program, is available on the Web at http://www.census.gov/prod/cen2000/notes/errata.pdf. User updates are also included in the bi-weekly electronic newsletter, Census Product Update. To receive the newsletter by e-mail, register at http://www.census.gov/mp/www/cpu.html; contact the Customer Services Center, Marketing Services Office, U.S. Census Bureau, at 301-763-4636; or send e-mail to webmaster@census.gov.

The User Updates chapter is included so that updated information provided from the Web site or from Customer Services can be filed in a standard location.