Excerpts from the User's Guide for the Oversample of Black-Headed Households, 1910 United States Census of Population

[NOTE: The Black Oversample has been reformatted using IPUMS codes. Variable-specific information is available in the Data Dictionary in IPUMS-98 Volume 1: User's Guide.]

Introduction

In order to facilitate analysis of the status of African Americans around the turn of the century in the U.S., a sample of African American-headed households was taken from the 1910 census manuscripts. This sample complements the 1/250 sample of the 1910 census manuscripts gathered and produced at the University of Pennsylvania. The African American oversample was collected by the same personnel using the same procedures as the general 1/250 sample (Strong et al., 1989). The oversample of African Americans was funded by a grant to Douglas Ewbank (NICHD 1-R01-HD18651); the creation of the Public Use Tape was funded by grants to S. Philip Morgan from the Research Foundation of the University of Pennsylvania and by NICHD (NICHD 1-R01- 0-25856).

Sampling of the 1910 census manuscripts was carried out using a computer-generated reel, sequence, page and line number. If the person at this microfilm location was the head of a household (and, in the case of the oversample, an African American household head), the household was included in the sample. Ewbank drew the African American oversample in order to make infant mortality estimates by state. The oversample was drawn only from counties in which at least 10 percent of the population was African American (negro, black or mulatto), and only from states where a reasonably large number of counties had this proportion of African Americans. (The restriction to counties at least 10 percent African American was imposed to reduce the cost of searching sampling points that only infrequently produced usable, i.e. African American-headed, households.) In addition, states with large concentrations of Blacks (Alabama, Georgia, South Carolina, Mississippi) were not oversampled, because the 1/250 1910 PUS already provides sufficient cases for most analyses.
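The selection rule just described can be sketched as a small predicate. This is a minimal illustration only: the `SampledPerson` record, its field names and both functions are hypothetical (they are not part of the PUS files or the original sampling software), and the state list is the one given under "Sampling areas".

```python
from dataclasses import dataclass

# States from which the oversample was drawn (see "Sampling areas").
OVERSAMPLE_STATES = {
    "Maryland", "Virginia", "North Carolina", "Florida", "Kentucky",
    "Tennessee", "Arkansas", "Louisiana", "Texas",
}


@dataclass
class SampledPerson:
    """Hypothetical record for the person at a sampled reel/sequence/page/line."""
    is_household_head: bool
    head_is_african_american: bool
    county_african_american_share: float  # proportion of county population, 0.0-1.0
    state: str


def county_eligible(person: SampledPerson) -> bool:
    # Oversample counties: at least 10 percent African American, in a listed state.
    return (person.state in OVERSAMPLE_STATES
            and person.county_african_american_share >= 0.10)


def include_household(person: SampledPerson, oversample: bool) -> bool:
    """True if the household at this sampling point enters the sample."""
    if not person.is_household_head:
        return False  # sampling points that do not land on a head yield nothing
    if not oversample:
        return True   # general 1/250 sample: any household head qualifies
    return person.head_is_african_american and county_eligible(person)
```

For example, an African American head in a Virginia county that is 25 percent African American is accepted for the oversample, while the same person in Georgia is rejected because Georgia was excluded.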

The oversample can be combined with the 1/250 PUS by weighting households (or individuals) differentially by county of enumeration, as described under "Weighting strategy" below.

Number of African American households in the 1910 PUS and in the oversamples

The tables below show the distribution of households by race of head and "sample area". The sample area indicates whether the household was located in a county that was included only in the 1/250 sample (1/250), in both the 1/250 sample and oversample 1 (Over 1), or in both the 1/250 sample and oversample 2 (Over 2). There are three tables. The first shows the frequencies in the African American oversample, that is, the households (N = 5,533) added by the oversamples. The second shows the frequencies in the 1/250 PUS; these households are available from the earlier-released, nationally representative sample. The third shows the frequencies in the combined data sets.

Table 1
Frequencies of Households in the Oversample

                         African American         Others
Code  Race of Head     1/250   Over 1   Over 2     1/250    Total
  -3  Unknown
  -2  Illegible
  -1  Blank
   0  White
   1  Black                     3,160    1,390              4,550
   2  Mulatto                     710      273                983
   3  Indian
   4  Japanese
   5  Chinese
   6  Hawaiian
   7  Other
      Overall                   3,870    1,663              5,533

Table 2
Frequencies of Households Only in the 1/250 PUS

                         African American         Others
Code  Race of Head     1/250   Over 1   Over 2     1/250    Total
  -3  Unknown                                          1        1
  -2  Illegible                                       13       13
  -1  Blank                                           38       38
   0  White                                       78,721   78,721
   1  Black            4,244    2,587      623              7,454
   2  Mulatto            893      666      152              1,711
   3  Indian                                         274      274
   4  Japanese                                       232      232
   5  Chinese                                        160      160
   6  Hawaiian                                        45       45
   7  Other                                          165      165
      Overall          5,137    3,253      775    79,649   88,814

Table 3
Frequencies of Households in the Combined PUS and Oversample

                         African American         Others
Code  Race of Head     1/250   Over 1   Over 2     1/250    Total
  -3  Unknown                                          1        1
  -2  Illegible                                       13       13
  -1  Blank                                           38       38
   0  White                                       78,721   78,721
   1  Black            4,244    5,747    2,013             12,004
   2  Mulatto            893    1,376      425              2,694
   3  Indian                                         274      274
   4  Japanese                                       232      232
   5  Chinese                                        160      160
   6  Hawaiian                                        45       45
   7  Other                                          165      165
      Overall          5,137    7,123    2,438    79,649   94,347
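Because the oversample adds only African American-headed households, each cell of Table 3 is the sum of the corresponding cells of Tables 1 and 2. A minimal sketch of that check, using only the non-empty African American cells (cells for other races come only from the PUS and carry over unchanged):

```python
from collections import Counter

# Non-empty African American cells from Tables 1 and 2,
# keyed by (race of head, sample area).
oversample = Counter({
    ("Black", "Over 1"): 3160, ("Black", "Over 2"): 1390,
    ("Mulatto", "Over 1"): 710, ("Mulatto", "Over 2"): 273,
})
pus = Counter({
    ("Black", "1/250"): 4244, ("Black", "Over 1"): 2587, ("Black", "Over 2"): 623,
    ("Mulatto", "1/250"): 893, ("Mulatto", "Over 1"): 666, ("Mulatto", "Over 2"): 152,
})

# Each cell of the combined table is the PUS count plus the oversample count.
combined = pus + oversample

for key in sorted(combined):
    print(key, combined[key])
print("African American households, combined:", sum(combined.values()))
```

The last line reports 14,698 African American households; adding the 79,649 other households reproduces the combined total of 94,347.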

Sampling areas

Items H05 to H07 in the codebook show the states from which this oversample was taken: Maryland, Virginia, North Carolina, Florida, Kentucky, Tennessee, Arkansas, Louisiana and Texas. The four states with the largest Black populations were South Carolina, Alabama, Mississippi and Georgia, and these were excluded from the oversample. Counties with at least 10 percent African American population in Maryland and Kentucky were sampled using a 0.01 sampling fraction. Such counties in the other states (Virginia, North Carolina, Florida, Tennessee, Arkansas and Texas) were sampled using a 0.005 sampling fraction. Louisiana has some counties sampled at 0.005 and some at 0.01 because it was used to test optimal sampling fractions.

Table 4 shows, by state, the sample produced by combining the 1910 PUS with the oversamples provided on this tape. Maryland is represented by 191 African American-headed households in the 1910 PUS. These households are in counties that were also included in oversample 2, which provides an additional 489 households. Virginia has 13 African American-headed households in the 1910 PUS in counties that were not oversampled (counties less than 10 percent African American) and 574 in counties that were oversampled; the oversample provides an additional 528 households. In total, oversample 1 adds 3,870 households and oversample 2 adds 1,663.

Table 4
Number of Households in 1910 P.U.S. and Oversample, by State
(In the 1/250 P.U.S., the first three columns count African American-headed households by type of county; "Others" counts households with heads of all other races.)

                              1/250 P.U.S.                 Oversample
                         Not     Over     Over              Over     Over
State or Region     oversamp   samp 1   samp 2   Others   samp 1   samp 2    Total
Missing/military           4                        209                        213
North East                76                      6,436                      6,512
New York                 157                      9,014                      9,171
New Jersey                82                      2,278                      2,360
Pennsylvania             203                      6,813                      7,016
Mid-West                 622                     28,548                     29,170
Delaware                  25                        180                        205
Maryland                                   191    1,011               489    1,691
D.C.                      84                        248                        332
Virginia                  13      574             1,216      528             2,331
West Virginia             54                        990                      1,044
North Carolina            14      573             1,229      695             2,511
South Carolina           766                        551                      1,317
Georgia                1,034               106    1,205                      2,345
Florida                           297               398      401             1,096
Kentucky                  21               235    1,827               565    2,648
Tennessee                 34      410             1,535      546             2,525
Alabama                  808                        974                      1,782
Mississippi              898                        644                      1,542
Arkansas                  11      388               906      441             1,746
Louisiana                         405      243      833      512      609    2,602
Oklahoma                 125                      1,400                      1,525
Texas                     36      606             2,721      747             4,110
West                      70                      8,483                      8,553
ALL                    5,137    3,253      775   79,649    3,870    1,663   94,347
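Table 4 also allows a rough check on the sampling fractions. Within a group of oversampled counties, the 1/250 PUS selected African American-headed households at .004, so .004 multiplied by (oversample count / PUS count) approximates the oversample's own fraction. The sketch below applies this to a few rows; it is an illustrative consistency check, not a procedure from the original guide, and the estimates carry sampling noise.

```python
# 1/250 PUS sampling fraction for African American-headed households.
PUS_FRACTION = 1 / 250

# (state, oversample group) -> (PUS households, oversample households), from Table 4.
TABLE4_ROWS = {
    ("Maryland", 2): (191, 489),
    ("Kentucky", 2): (235, 565),
    ("Louisiana", 2): (243, 609),
    ("North Carolina", 1): (573, 695),
    ("Louisiana", 1): (405, 512),
}


def implied_fraction(pus_n: int, over_n: int) -> float:
    """Rough oversample fraction: .004 scaled by the oversample-to-PUS ratio."""
    return PUS_FRACTION * over_n / pus_n


for (state, group), (pus_n, over_n) in TABLE4_ROWS.items():
    print(f"{state:<15} oversample {group}: {implied_fraction(pus_n, over_n):.4f}")
```

Maryland and Kentucky come out close to .01, North Carolina close to .005, and Louisiana's two county groups close to .005 and .01 respectively.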

Weighting strategy

The original 1/250 PUS file was a self-weighted sample of all households in the U.S., whereas this sample is an oversample of only some counties. The variable H26 is on both this tape and the 1910 PUS tape and indicates which households (regardless of the race of the household head) were in counties that were oversampled. Note that this variable is not a weighting variable and that, in the case of the 1910 PUS, it does not imply that the household (or individual) was chosen in the oversample. This variable can, however, be used to assign appropriate weights, as described below.

As stated above, the oversample was taken with two different sampling fractions: 0.005 and 0.01. Each household is assigned a value indicating whether it came from a county that was oversampled and, if so, which oversample covered that county. The main PUS naturally contains black-headed households from oversampled counties that were not themselves selected in either oversample. Nevertheless, the same weight (based on the inverse of the sampling fractions) should be assigned to all African American households in a given county, regardless of which sample produced the household record. Table 5 shows the frequencies of households in the main sample (by race) and in the oversamples, by the CNTYWT variable.

Table 5
Frequency of Households in the PUS and Oversample, by Race and County Weight

            African American          Others
CNTYWT     1/250   Over 1   Over 2     1/250    Total
     1     5,137                      71,207   76,344
     2     3,253    3,870              5,882   13,005
     3       775             1,663     2,560    4,998
   ALL     9,165    3,870    1,663    79,649   94,347

Table 6
Sampling Fractions

            (1)      (2)      (3)       (4)
CNTYWT    1/250   Over 1   Over 2   Overall
     1     .004                        .004
     2     .004     .005               .009
     3     .004              .010      .014

If the household head is not African American, a weight of 1.0 should be assigned (regardless of the value of CNTYWT). If the household head is African American, the overall sampling fractions for counties coded 1, 2 and 3 are .004, .009 and .014 respectively (Table 6, column 4), and the weights should be set by the ratio of .004 to these sampling fractions, adjusted by some constant 'x'.
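The rule in the preceding paragraph can be written as a short function. This is a minimal sketch, assuming each household record carries a flag for an African American head and its CNTYWT code (1, 2 or 3); the function and parameter names are hypothetical, and `x` is the scaling constant mentioned above, left at 1.0 by default.

```python
# 1/250 PUS fraction plus the oversample fraction added for each CNTYWT (Table 6).
PUS_FRACTION = 0.004
OVERSAMPLE_FRACTION = {1: 0.0, 2: 0.005, 3: 0.010}


def overall_fraction(cntywt: int) -> float:
    """Overall sampling fraction for African American households (column 4)."""
    return PUS_FRACTION + OVERSAMPLE_FRACTION[cntywt]


def household_weight(head_is_african_american: bool, cntywt: int,
                     x: float = 1.0) -> float:
    """Relative household weight; x is the scaling constant from the text."""
    if not head_is_african_american:
        return 1.0  # non-African American heads: 1.0 regardless of CNTYWT
    # Proportional to the inverse of the overall fraction, expressed relative
    # to the 1/250 fraction so that CNTYWT 1 receives exactly x.
    return x * PUS_FRACTION / overall_fraction(cntywt)
```

With the default `x`, an African American household gets about 1, .444 and .286 for CNTYWT 1, 2 and 3; any other household gets 1.0.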

If the weight for CNTYWT 1 is set to 1, the weight for CNTYWT 2 is .004/.009 = .444 and the weight for CNTYWT 3 is .004/.014 = .286. These ratios of weights must be preserved in order to maintain "equal probability of selection", but the weights can be rescaled by different factors to achieve different aims. To produce a nationally representative sample of African Americans (in which the mean weight is 1.0) from the 1910 PUS and these oversamples, weights of 1.633, .726 and .466 should be assigned to CNTYWT values 1, 2 and 3 respectively. To produce a nationally representative sample of all races, these weights must be adjusted so that the weighted N equals the observed N of the 1910 PUS, i.e. so that the 14,698 African American households in the combined file represent the 9,165 observed in the 1/250 PUS; this amounts to multiplying the above weights by 9,165/14,698 = .6236. Weighting could also be done at the state or regional level.
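The constants in this paragraph can be reproduced from the combined African American counts in Table 5 (1/250 PUS plus oversample, by CNTYWT). This sketch assumes the mean-weight target is taken over all African American households in the combined file; the variable names are illustrative.

```python
# African American households in the combined file, by CNTYWT (Table 5):
# 1/250 PUS households plus oversample households.
aa_counts = {1: 5137, 2: 3253 + 3870, 3: 775 + 1663}
aa_in_pus = 5137 + 3253 + 775  # African American households in the 1/250 PUS alone

# Relative weights: .004 divided by the overall sampling fraction (Table 6).
relative = {1: 1.0, 2: 0.004 / 0.009, 3: 0.004 / 0.014}

# Scale so that the mean weight over African American households is 1.0.
n_aa = sum(aa_counts.values())
x = n_aa / sum(aa_counts[c] * relative[c] for c in aa_counts)
aa_weights = {c: x * r for c, r in relative.items()}

# For an all-race sample, rescale so the weighted African American N equals
# the number observed in the 1/250 PUS (other households keep weight 1.0).
all_race_factor = aa_in_pus / n_aa
all_race_weights = {c: w * all_race_factor for c, w in aa_weights.items()}

print(f"x = {x:.3f}; weights = {[round(aa_weights[c], 4) for c in (1, 2, 3)]}")
print(f"all-race factor = {all_race_factor:.4f}")
```

This gives x of about 1.633, a CNTYWT 2 weight of about .726, a CNTYWT 3 weight of about .4666 (.466 in the text) and a rescaling factor of about .6236.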

Note also that these are samples of households and that a few whites are found in black-headed households. Because individuals take the weight of their household, some white individuals will have weights other than 1.0.
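Because weights attach to households, a person-level file inherits each person's household weight, which is determined by the race of the head rather than the person's own race. A minimal sketch with made-up records; the ids, field names and the .726 value for a black-headed household in a CNTYWT 2 county are illustrative assumptions.

```python
# Hypothetical household weights, keyed by household id. Household "h1" is
# black-headed in a CNTYWT 2 county; "h2" has a white head.
household_weights = {"h1": 0.726, "h2": 1.0}

# Hypothetical person records; "race" is the individual's own race.
persons = [
    {"id": "p1", "household": "h1", "race": "Black"},
    {"id": "p2", "household": "h1", "race": "White"},  # white member of a black-headed household
    {"id": "p3", "household": "h2", "race": "White"},
]


def person_weights(persons, household_weights):
    """Give every person the weight of his or her household, ignoring own race."""
    return {p["id"]: household_weights[p["household"]] for p in persons}


weights = person_weights(persons, household_weights)
```

Here the white individual `p2` receives .726 rather than 1.0, exactly the situation described above.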

Works Cited:

Michael A. Strong, et al., "Occupation, Industry and Class of Worker," User's Guide: Public Use Sample, 1910 United States Census of Population, Philadelphia: Population Studies Center, University of Pennsylvania, 1989.