Puerto Ricans in the United States (Volume II, Part I - Subject Reports)

Sample Design and Sampling Variability

Sample Design

For persons in housing units at the time of the 1960 Census, the sampling unit was the housing unit and all its occupants; for persons in group quarters, it was the person. On the first visit to an address, the enumerator assigned a sample key letter (A, B, C, or D) to each housing unit sequentially in the order in which he first visited the units, whether or not he completed an interview. Each enumerator was given a random key letter to start his assignment, and the order of canvassing was indicated in advance, although these instructions allowed some latitude in the order of visiting addresses. Each housing unit which was assigned the key letter "A" was designated as a sample unit and all persons enumerated in the unit were included in the sample. In every group quarters, the sample consisted of every fourth person in the order listed.
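The key-letter procedure described above can be sketched roughly as follows. This is only an illustration, not the Bureau's field procedure: the function names, the use of Python's `random` module for the starting letter, and the `start` offset for group quarters (the text does not say which person began the count of four) are all assumptions.

```python
import random

KEY_LETTERS = "ABCD"


def assign_key_letters(n_units, start_letter=None, rng=None):
    """Assign key letters A-D in rotation to units in order of first visit.

    start_letter stands in for the random letter given to the enumerator;
    if omitted, one is drawn at random (an assumption about the mechanism).
    """
    rng = rng or random.Random()
    if start_letter is None:
        start_letter = rng.choice(KEY_LETTERS)
    start = KEY_LETTERS.index(start_letter)
    return [KEY_LETTERS[(start + k) % 4] for k in range(n_units)]


def sample_units(letters):
    """Return the positions (0-based visit order) of units keyed 'A'."""
    return [i for i, letter in enumerate(letters) if letter == "A"]


def group_quarters_sample(persons, start=0):
    """Take every fourth listed person; the starting offset is hypothetical."""
    return persons[start::4]


letters = assign_key_letters(10, start_letter="C")
print(letters)                # letters in visit order
print(sample_units(letters))  # positions of the sample units
```

With ten units and a starting letter of C, only two units fall in the sample, which illustrates why the procedure does not guarantee an exact 25-percent sample in a small area.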

Although the sampling procedure did not automatically insure an exact 25-percent sample of persons or housing units in each locality, the sample design was unbiased if carried through according to instructions; and, generally, for large areas the deviation from 25 percent was found to be quite small. Biases may have arisen, however, when the enumerator failed to follow his listing and sampling instructions exactly.

The statistics based on the sample of the 1960 Census returns are estimates that have been developed through the use of a ratio estimation procedure. This procedure was carried out for each of 44 groups of persons in each of the smallest areas for which sample data are published.¹ (For a more complete discussion of the ratio estimation procedure, see 1960 Census of Population, Volume I, Characteristics of the Population, Part 1, United States Summary.)

These ratio estimates reduce the component of sampling error arising from the variation in the size of household and achieve some of the gains of stratification in the selection of the sample, with the strata being the groups for which separate ratio estimates are computed. The net effect is a reduction in the sampling error and bias of most statistics below what would be obtained by weighting the results of the 25-percent sample by a uniform factor of four. The reduction in sampling error is trivial for some items and substantial for others. A by-product of this estimation procedure, in general, is that estimates for this sample are consistent with the complete count with respect to the total population and for the subdivisions used as groups in the estimation procedure.

¹ Estimates of characteristics from the sample for a given area are produced using the formula:

$$x' = \sum_{i=1}^{44} \frac{x_i}{y_i}\, Y_i$$

where $x'$ is the estimate of the characteristic for the area obtained through the use of the ratio estimation procedure,
$x_i$ is the count of sample persons with the characteristic for the area in one ($i$) of the 44 groups,
$y_i$ is the count of all sample persons for the area in the same one of the 44 groups, and
$Y_i$ is the count of persons in the complete count for the area in the same one of the 44 groups.
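The definitions above imply an estimate formed by summing, over the groups, the sample count with the characteristic scaled by the ratio of the complete count to the sample count in that group. A minimal sketch under that reading follows. The function name and the input layout are hypothetical, and skipping a group with no sample persons is an assumption, since the handling of such groups is not described here.

```python
def ratio_estimate(groups):
    """Sum x_i * Y_i / y_i over groups given as (x_i, y_i, Y_i) tuples.

    x_i: sample persons with the characteristic in group i
    y_i: all sample persons in group i
    Y_i: complete-count persons in group i
    """
    total = 0.0
    for x_i, y_i, Y_i in groups:
        if y_i == 0:
            # Assumption: a group with no sample persons contributes nothing.
            continue
        total += x_i * Y_i / y_i
    return total


# Two made-up groups, for illustration only (not census data).
groups = [(10, 25, 100), (5, 20, 90)]
print(ratio_estimate(groups))            # ratio estimate
print(4 * sum(x for x, _, _ in groups))  # uniform weighting by four, for comparison
```

In this made-up case the ratio estimate is 62.5 while uniform weighting gives 60, because the second group is under-represented in the sample relative to its complete count.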

The figures from the 25-percent sample tabulations are subject to sampling variability, which can be estimated roughly from the standard errors shown in tables C and D. These tables² do not reflect the effect of response variance, processing variance, or bias arising in the collection, processing, and estimation steps. Estimates of the magnitude of some of these factors in the total error are being evaluated and will be published at a later date. The chances are about 2 out of 3 that the difference due to sampling variability between an estimate and the figure that would have been obtained from a complete count of the population is less than the standard error. The chances are about 19 out of 20 that the difference is less than twice the standard error and about 99 out of 100 that it is less than 2½ times the standard error. The amount by which the estimated standard error must be multiplied to obtain other odds deemed more appropriate can be found in most statistical textbooks.

Table C shows rough approximations to standard errors of estimated numbers up to 50,000. The relative sampling errors of larger estimated numbers are somewhat smaller than for 50,000. For estimated numbers above 50,000, however, the nonsampling errors, e.g., response errors and processing errors, may have an increasingly important effect on the total error. Table D shows rough standard errors of data in the form of percentages. Linear interpolation in tables C and D will provide approximate results that are satisfactory for most purposes.
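Linear interpolation between adjacent table rows might be carried out as in the sketch below. The function name is hypothetical, and the rows in the usage example are invented placeholders, not values from table C or table D; the actual table rows must be supplied by the reader.

```python
from bisect import bisect_right


def interpolate_standard_error(estimate, table):
    """Linearly interpolate a standard error from (size, standard_error) rows.

    table must be sorted by size in increasing order. Values outside the
    tabulated range raise ValueError rather than being extrapolated.
    """
    sizes = [size for size, _ in table]
    if estimate < sizes[0] or estimate > sizes[-1]:
        raise ValueError("estimate lies outside the tabulated range")
    hi = bisect_right(sizes, estimate)
    if hi == len(sizes):
        # The estimate equals the largest tabulated size.
        return table[-1][1]
    (x0, y0), (x1, y1) = table[hi - 1], table[hi]
    return y0 + (y1 - y0) * (estimate - x0) / (x1 - x0)


# Placeholder rows for illustration only; these are NOT the published values.
placeholder_rows = [(1000, 10.0), (2000, 20.0), (4000, 30.0)]
print(interpolate_standard_error(1500, placeholder_rows))
```

Refusing to extrapolate follows the text's caution that table C covers estimated numbers only up to 50,000.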

For a discussion of the sampling variability of medians and means and of the method for obtaining standard errors of differences between two estimates, see 1960 Census of Population, Volume I, Characteristics of the Population, Part 1, United States Summary.

Illustration: Table 5 shows that there are 14,281 males in the United States of Puerto Rican birth and parentage in the income class $5,000 to $5,999. Table C shows that the standard error for an estimate of 14,281 is about 263, which means that the chances are approximately 2 out of 3 that the results of a complete census would not differ by more than 263 from this estimated 14,281. It also follows that there is only about 1 chance in 100 that a complete-census result would differ by as much as 658, that is, by about 2½ times the number estimated from table C.
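The arithmetic of the illustration can be checked with a short sketch. The estimate of 14,281 and the standard error of 263 are taken from the text; the function name and the dictionary of multiples are hypothetical conveniences that restate the odds given earlier (one, two, and 2½ standard errors).

```python
# Multiples of the standard error for the odds quoted in the text.
ODDS_MULTIPLES = {"2 in 3": 1.0, "19 in 20": 2.0, "99 in 100": 2.5}


def sampling_ranges(estimate, standard_error):
    """Return (low, high) ranges around an estimate for each quoted odds level."""
    ranges = {}
    for odds, multiple in ODDS_MULTIPLES.items():
        half_width = multiple * standard_error
        ranges[odds] = (estimate - half_width, estimate + half_width)
    return ranges


for odds, (low, high) in sampling_ranges(14281, 263).items():
    print(f"{odds}: {low:,.1f} to {high:,.1f}")
```

The 99-in-100 half-width is 2.5 × 263 = 657.5, which the illustration rounds to 658.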

² These estimates of sampling variability are based on partial information on variances calculated from a sample of the 1960 Census results. Further estimates are being calculated and will be made available at a later date.

Table C. Rough Approximation to Standard Error of Estimated Number
(Range of 2 chances out of 3)