Census

A census is the procedure of systematically acquiring and recording information about the members of a given population. It is a regularly occurring and official count of a particular population.[1] The term is used mostly in connection with national population and housing censuses; other common censuses include agriculture, business, and traffic censuses. The United Nations defines the essential features of population and housing censuses as "individual enumeration, universality within a defined territory, simultaneity and defined periodicity", and recommends that population censuses be taken at least every 10 years. United Nations recommendations also cover census topics to be collected, official definitions, classifications and other useful information to coordinate international practice.[2][3]

The word is of Latin origin; during the Roman Republic, the census was a list that kept track of all adult males fit for military service. The modern census is essential to international comparisons of any kind of statistics, and censuses collect data on many attributes of a population, not just how many people there are, although population estimates remain an important function of a census.

A census can be contrasted with sampling, in which information is obtained only from a subset of a population; sampling is sometimes used to produce intercensal estimates. Modern census data are commonly used for research, business marketing, and planning, and as a baseline for sample surveys. Census counts are needed to make samples representative of a population by weighting them, as is common in opinion polling. Similarly, stratification requires knowledge of the relative sizes of different population strata, which can be derived from census enumerations. In some countries, census data are used to apportion electoral representation (sometimes controversially, as in Utah v. Evans).
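The weighting of a sample against census counts can be illustrated with a simple post-stratification. This is only a minimal sketch: the strata names, the census counts and the poll responses are invented for the example, and the function names are hypothetical rather than taken from any statistical office's software.

```python
from collections import Counter


def poststratification_weights(census_counts, responses):
    """Return a weight per stratum: census population / number of respondents."""
    sample_counts = Counter(stratum for stratum, _ in responses)
    weights = {}
    for stratum, population in census_counts.items():
        if sample_counts[stratum] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        weights[stratum] = population / sample_counts[stratum]
    return weights


def weighted_mean(responses, weights):
    """Mean of the response values, each weighted by its stratum weight."""
    weighted_total = sum(weights[stratum] * value for stratum, value in responses)
    total_weight = sum(weights[stratum] for stratum, _ in responses)
    return weighted_total / total_weight


# Hypothetical census counts and poll responses (1 = yes, 0 = no).
census_counts = {"under_40": 6000, "40_and_over": 4000}
responses = [
    ("under_40", 1), ("under_40", 0),
    ("40_and_over", 1), ("40_and_over", 1),
    ("40_and_over", 1), ("40_and_over", 0),
]

weights = poststratification_weights(census_counts, responses)
unweighted = sum(value for _, value in responses) / len(responses)
weighted = weighted_mean(responses, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this invented poll the younger stratum supplies a third of the respondents but 60% of the census population, so each of its respondents receives a larger weight and the weighted estimate moves away from the raw sample mean.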

Sampling

A census is often construed as the opposite of a sample, as its intent is to count everyone in a population rather than a fraction. However, population censuses rely on a sampling frame to count the population. This is the only way to be sure that everyone has been included; otherwise, those not responding would not be followed up and individuals could be missed. The fundamental premise of a census is that the population is not known and a new estimate is to be made by the analysis of primary data. The use of a sampling frame may therefore seem counterintuitive, as it suggests that the population size is already known. However, a census is also used to collect attribute data on the individuals in the nation. This use of a sampling frame marks the difference between the historical census, which was a house-to-house process or the product of an imperial decree, and the modern statistical project.

The sampling frame used by a census is almost always an address register. It is therefore not known in advance whether anyone is resident at an address or how many people there are in each household. Depending on the mode of enumeration, a form is sent to the householder, an enumerator calls, or administrative records for the dwelling are accessed. As a preliminary to the dispatch of forms, census workers check any address problems on the ground. While it may seem straightforward to use the postal service file for this purpose, this can be out of date, and some dwellings may contain a number of independent households. A particular problem is what the U.S. Census Bureau calls "group quarters", a category that includes student residences, religious orders, homes for the elderly, prisons, etc., as these are not easily enumerated by a single householder. In the U.S., they are often treated differently and visited by special teams of census workers to ensure they are classified appropriately.

Residence definitions

Individuals are normally counted within households, and information is typically collected about the household structure and the housing. For this reason, international documents refer to censuses of population and housing. Normally the census response is made by a household, indicating details of the individuals resident there. An important aspect of census enumeration is determining which individuals are to be counted and which are not. Broadly, three definitions can be used: de facto residence, de jure residence, and permanent residence. The choice matters for individuals who have multiple or temporary addresses. Every person should be identified uniquely as resident in one place, but where they happen to be on census day, their de facto residence, may not be the best place to count them. Where an individual uses services may be more useful, and this is at their usual, or de jure, residence. An individual may also be represented at a permanent address, perhaps a family home for students or long-term migrants. A precise definition of residence is needed to decide whether visitors to a country should be included in the population count. This is becoming more important as students travel abroad for education for periods of several years. Other groups that are difficult to enumerate are newborn babies, refugees, people away on holiday, people moving home around census day, and people without a fixed address. People who have second homes, because they work in another part of the country or keep a holiday cottage, are difficult to fix at a particular address, which sometimes causes double counting or houses being mistakenly identified as vacant. Another problem arises where people use different addresses at different times, for example students living at their place of education in term time but returning to a family home during vacations, or children whose parents have separated and who effectively have two family homes.

Census enumeration has always been based on finding people where they live, as there is no systematic alternative: any list that could be used to find people is derived from census activities in the first place. Recent UN guidelines provide recommendations on enumerating such complex households.[4]

Enumeration strategies

Historical censuses used crude enumeration assuming absolute accuracy. Modern approaches take into account the problems of overcount and undercount, and the coherence of census enumerations with other official sources of data.[5] This reflects a realist approach to measurement, acknowledging that under any definition of residence there is a true value of the population but this can never be measured with complete accuracy. An important aspect of the census process is to evaluate the quality of the data.[6]

Many countries use a post-enumeration survey to adjust the raw census counts.[7] This works in a similar manner to capture-recapture estimation for animal populations. In census circles this method is called dual system enumeration (DSE). A sample of households is visited by interviewers who record the details of the household as at census day. These data are then matched to census records, and the number of people missed can be estimated from the numbers counted in one source but missed in the other. In this way, counts can be adjusted for non-response that varies between demographic groups. An explanation using a fishing analogy can be found in "Trout, Catfish and Roach..."[8], which won an award from the Royal Statistical Society for excellence in official statistics in 2011.
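The capture-recapture logic behind DSE can be sketched with the basic Lincoln-Petersen estimator. This is a simplified illustration rather than the procedure of any particular statistical office: it assumes that the census and the survey are independent, that matching is error-free and that nobody is counted twice, and the figures and function name are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Lincoln-Petersen estimate of the true population of the surveyed area.

    census_count: people counted by the census in the surveyed area
    survey_count: people counted by the post-enumeration survey
    matched:      people found in both sources after record matching
    """
    if matched <= 0:
        raise ValueError("at least one matched record is required")
    if matched > min(census_count, survey_count):
        raise ValueError("matched records cannot exceed either source")
    return census_count * survey_count / matched


# Hypothetical figures for one surveyed area.
census_count, survey_count, matched = 900, 500, 450

estimate = dual_system_estimate(census_count, survey_count, matched)
census_coverage = matched / survey_count  # share of survey people the census found
print(f"estimated population: {estimate:.0f}, census coverage: {census_coverage:.0%}")
```

Here the census found 90% of the people recorded by the survey, so the census count of 900 is scaled up to an estimated 1,000. In practice the calculation would be carried out separately for each demographic group, since coverage differs between groups.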

Triple system enumeration has been proposed as an improvement, as it would allow evaluation of the statistical dependence of pairs of sources. However, as the matching process is the most difficult aspect of census estimation, this has never been implemented for a national enumeration. It would also be difficult to identify three sources that were sufficiently different to make the triple system effort worthwhile. The DSE approach has another weakness in that it assumes that no person is counted twice (overcount). Under a de facto residence definition this would not be a problem, but under a de jure definition individuals risk being recorded on more than one form, leading to double counting. A particular problem here is students, who often have both a term-time and a family address.

Several countries have used a system known as short form/long form.[9] This is a sampling strategy in which a randomly chosen proportion of people are sent a more detailed questionnaire (the long form), while everyone receives the short form questions. More data are thereby collected without imposing a burden on the whole population. This also reduces the burden on the statistical office. Indeed, in the UK all residents were required to fill in the whole form, but until 2001 only a 10% sample was coded and analysed in detail.[10] New technology means that all data are now scanned and processed. Recently there has been controversy in Canada about the cessation of the long form, with the head of Statistics Canada, Munir Sheikh, resigning.[11]

The use of alternative enumeration strategies is increasing,[12] but these are not as simple as many people assume. The Netherlands has been most advanced in adopting a census using administrative data. This allows a simulated census to be conducted by linking several different administrative databases at an agreed time. Data can be matched and an overall enumeration established, accounting for discrepancies between the sources. A validation survey is still conducted in a similar way to the post-enumeration survey employed in a traditional census. Other countries that have a population register use this as a basis for all the census statistics needed by users. This is most common among Nordic countries but requires a large number of different registers to be combined, including population, housing, employment and education. These registers are combined and brought up to the standard of a statistical register by comparing the data in different sources and ensuring the quality is sufficient for official statistics to be produced.[13]

A recent innovation is the French rolling census programme, in which different regions are enumerated each year so that the whole country is completely enumerated every 5 to 10 years.[citation needed] In Europe, in connection with the 2010 census round, a large number of countries adopted alternative census methodologies, often based on the combination of data from registers, surveys and other sources.[14]

Technology

Censuses have evolved in their use of technology, with the latest censuses, the 2010 round, using many new types of computing. In Brazil, handheld devices were used by enumerators to locate residences on the ground. In many countries, census returns could be made via the Internet as well as in paper form. DSE is facilitated by computer matching techniques that can be automated, such as propensity score matching. In the UK, all census forms are scanned and stored electronically before being destroyed, removing the need for physical archives. The record linkage needed for an administrative census would not be possible without large databases stored on computer systems.

New technology is not without problems in its introduction. The US census had intended to use handheld computers, but costs escalated and the plan was abandoned, with the contract being sold to Brazil. Online response is convenient, but one of the functions of a census is to make sure everyone is counted accurately. A system that allowed people to enter their address without verification would be open to abuse. Households therefore have to be verified on the ground, typically by an enumerator visit or post-out. Paper forms are still necessary for those without Internet access. It is also plausible that the hidden nature of an administrative census means that users are not engaged with the importance of contributing their data to official statistics.

Uses of census data

In the nineteenth century, the first censuses collected paper enumerations that had to be collated by hand, so the statistical uses were very basic. The government owned the data and was able to publish statistics itself on the state of the nation. The main uses were to measure changes in the population and to apportion representation. Population estimates could be compared to those of other countries.

By the beginning of the twentieth century, censuses were recording households and some indications of their employment. In some countries, census archives are released for public examination after many decades, allowing genealogists to track the ancestry of interested people. Archives provide a substantial historical record which may challenge established notions of tradition. It is also possible to understand the societal history through job titles and arrangements for the destitute and sick.

As governments assumed responsibility for schooling and welfare, large government departments made extensive use of census data. Actuarial estimates could be made to project populations and plan for provision in local government and regions. It was also possible for central government to allocate funding on the basis of census data. Even into the mid-twentieth century, census data were only directly accessible to large government departments. With the arrival of computers, however, tabulations could be used directly by university researchers, large businesses and local government offices. They could use the detail of the data to answer new questions and add to local and specialist knowledge.

Now, census data are published in a wide variety of formats to be accessible to business, all levels of governance, media, students and teachers, charities and researchers, and any citizen who is interested. Data can be represented visually or analysed in complex statistical models, to show the difference between certain areas, or to understand the association between different personal characteristics. Census data offer a unique insight into small areas and small demographic groups which sample data would be unable to capture with precision.

Privacy

Although the census provides a useful way of obtaining statistical information about a population, such information can sometimes lead to abuses, political or otherwise, made possible by the linking of individuals' identities to anonymous census data.[15] This consideration is particularly important when individuals' census responses are made available in microdata form, but even aggregate-level data can result in privacy breaches when dealing with small areas and/or rare subpopulations.

For instance, when reporting data from a large city, it might be appropriate to give the average income for black males aged between 50 and 60. However, doing this for a town that only has two black males in this age group would be a breach of privacy because either of those persons, knowing his own income and the reported average, could determine the other man's income.
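The arithmetic behind this breach is short: for a group of two with incomes a and b, the published average is (a + b) / 2, so b = 2 × average − a. The sketch below generalises this slightly; the income figures and the function name are invented for illustration.

```python
def others_total(reported_mean, group_size, own_value):
    """Total of everyone else's values, recovered from a published mean."""
    if group_size < 2:
        raise ValueError("group must contain at least two people")
    return reported_mean * group_size - own_value


# Hypothetical published average for a group of two, and one member's own income.
published_mean_income = 40_000
own_income = 30_000

other_income = others_total(published_mean_income, 2, own_income)
print(other_income)
```

With only two people in the group, the recovered total is exactly the other person's income (50,000 in this invented case). In a large group the same subtraction yields only the combined income of many people, which reveals far less about any one of them.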

Typically, census data are processed to obscure such individual information, a practice known as statistical disclosure control. Some agencies do this by intentionally introducing small statistical errors to prevent the identification of individuals in marginal populations;[16] others swap variables for similar respondents. Whatever measures have been taken to reduce the privacy risk in census data, new technology in the form of better electronic analysis of data poses increasing challenges to the protection of sensitive individual information.
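As a rough illustration of introducing small statistical errors, the sketch below shifts each cell of a table of counts by a small random amount. It is not the method of any particular agency: the bounded uniform shift, the table contents, the parameter names and the fixed seed are all assumptions made for the example.

```python
import random


def perturb_counts(table, max_shift=2, seed=None):
    """Return a copy of a table of counts with a small random shift per cell.

    Each count is moved by a whole number between -max_shift and +max_shift
    and is never allowed to fall below zero.
    """
    rng = random.Random(seed)
    noisy = {}
    for cell, count in table.items():
        shift = rng.randint(-max_shift, max_shift)
        noisy[cell] = max(0, count + shift)
    return noisy


# Hypothetical counts for one small area, keyed by (sex, age band).
table = {
    ("male", "50-59"): 2,
    ("female", "50-59"): 14,
    ("male", "60-69"): 9,
    ("female", "60-69"): 11,
}

noisy = perturb_counts(table, max_shift=2, seed=1)
for cell in table:
    print(cell, table[cell], "->", noisy[cell])
```

A reader of the perturbed table can no longer be sure that a published count of two really corresponds to two people, while totals over many cells remain close to the true values because the shifts are small.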

Another possibility is to present survey results by means of statistical models in the form of a multivariate distribution mixture.[17] The statistical information in the form of conditional distributions (histograms) can be derived interactively from the estimated mixture model without any further access to the original database. As the final product does not contain any protected microdata, the model based interactive software can be distributed without any confidentiality concerns.

Another method is simply to release no data at all, except very large-scale data directly to the central government. Differing release strategies among governments have led to an international project (IPUMS) to co-ordinate access to microdata and corresponding metadata. Such projects also promote the standardisation of metadata, through initiatives such as SDMX, so that the best use can be made of the minimal data available.

Historical examples

Egypt

Ancient Greece

There are several accounts of ancient Greek and Mesopotamian city states carrying out censuses.[18] The question of which was first is clouded by very different approaches, such as counting only men or counting a pile of rocks, but such censuses took place in 1600 BCE and earlier.[citation needed]

When the Romans took over Judea in 6 CE, the legate Publius Sulpicius Quirinius organised a census for tax purposes. The Gospel of Luke links the birth of Jesus to this event (Luke 2).

China

In 2 CE, during the Han Dynasty, China held a census still considered by scholars to be quite accurate.[19][20] The census is considered one of the world's earliest preserved censuses,[21] and found 57.67 million people registered in 12.36 million households.[22][23] The areas with the highest population densities were the Yellow River and Huai River valleys. Chengdu was the largest city in China, with a population of 282,147 people.[21] Another census dates to 144 CE, when 49.73 million people were recorded living in 9.94 million households.

India

The oldest recorded census in India is thought to have occurred around 300 BCE during the reign of the Emperor Chandragupta Maurya under the leadership of Kautilya or Chanakya.[24]

Rome

The word "census" originated in ancient Rome from the Latin word censere ("to estimate"). The census played a crucial role in the administration of the Roman Empire, as it was used to determine taxes. With few interruptions, it was usually carried out every five years.[25] It provided a register of citizens and their property from which their duties and privileges could be listed. It is said to have been instituted by the Roman king Servius Tullius in the 6th century BCE,[26] at which time the number of arms-bearing citizens was supposedly counted at around 80,000.[27]

Inca Empire

In the 15th century, the Inca Empire had a unique way to record census information. The Incas did not have a written language, but recorded census results and other information, both numeric and non-numeric, on quipus: strings of llama or alpaca hair, or cotton cords, with numeric and other values encoded by knots in a base-10 positional system.

Spanish Empire

On May 25, 1577, King Philip II of Spain ordered by royal cédula the preparation of a general description of Spain's holdings in the Indies. Instructions and a questionnaire, issued in 1577 by the Office of the Cronista Mayor, were distributed to local officials in the Viceroyalties of New Spain and Peru to direct the gathering of information. The questionnaire, composed of fifty items, was designed to elicit basic information about the nature of the land and the life of its peoples. The replies, known as "relaciones geográficas," were written between 1579 and 1585 and were returned to the Cronista Mayor in Spain by the Council of the Indies.

References

Bielenstein, Hans (1978). "Wang Mang, the restoration of the Han dynasty, and Later Han." In The Cambridge History of China, vol. 1, eds. Denis Twitchett and John K. Fairbank, pp. 223-90. Cambridge: Cambridge University Press.