I am specifically looking at data (1987, 1999, 2004, 2009) for India on total population, share of urban dwellers, and share of people working in the manufacturing sector.

However, I feel that inconsistencies are present, especially in the data on total population (e.g. 1987: 370,000; 1999: 50,000; 2004: 420,000; 2009: 320,000). Has anybody had the same concerns? And any idea how to fix this?

I just looked into the India data, and the population counts seem to be reasonably consistent with “official” counts. Note that in IPUMS Terra, the population-level data come directly from IPUMS International microdata. Additional notes about the India census data can be found here. If the issue seems to persist, feel free to send a more detailed description of your calculations to ipums@umn.edu.

After looking into this further, there is, unfortunately, no good fix for this issue. The root of the problem is that the India data in IPUMS International are not census data, but rather employment survey data. The sample sizes are very small (less than 1%), and the sample design did not consider administrative units smaller than states. The sample design was based on “state-regions,” a concept defined for the purposes of the survey. While the district in which the household is located is identified on each record, the sample is not representative at the district level. The small sample size and lack of representativeness produce the large year-to-year fluctuations you observed.
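To illustrate why this happens, here is a toy simulation in Python. It is only a sketch under made-up assumptions, not the actual survey design: every number (20 districts, 200 clusters per district, 2,000 people per cluster, 20 sampled clusters per round) and the function name simulate_estimates are hypothetical. Clusters are drawn from the state as a whole with no regard to district, which roughly mimics a design that is representative only above the district level.

```python
import random


def simulate_estimates(n_districts=20, clusters_per_district=200,
                       cluster_size=2000, n_sampled_clusters=20,
                       n_rounds=4, seed=0):
    """Toy model: estimate district populations from a state-level cluster sample.

    All parameters are hypothetical and chosen only for illustration.
    Returns (true_district_population, list of per-round district estimates).
    """
    rng = random.Random(seed)
    true_district_pop = clusters_per_district * cluster_size

    # One entry per cluster, labelled with the district it belongs to.
    all_clusters = [d for d in range(n_districts)
                    for _ in range(clusters_per_district)]

    # Each sampled cluster stands in for this many clusters in the state.
    weight = len(all_clusters) / n_sampled_clusters

    rounds = []
    for _ in range(n_rounds):
        # Draw clusters state-wide, ignoring district boundaries.
        sampled = rng.sample(all_clusters, n_sampled_clusters)
        # Weighted population estimate for each district.
        estimates = [sampled.count(d) * weight * cluster_size
                     for d in range(n_districts)]
        rounds.append(estimates)
    return true_district_pop, rounds


if __name__ == "__main__":
    true_pop, rounds = simulate_estimates()
    print("true population of every district:", true_pop)
    for i, est in enumerate(rounds, start=1):
        print(f"round {i}: district 0 estimate = {est[0]:,.0f}, "
              f"state total = {sum(est):,.0f}")
```

In this toy setup the weighted state total is the same in every round, while the estimate for any single district depends entirely on how many sampled clusters happen to fall inside it, so it can jump between rounds even though the true population never changes.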

“State-Region” is available as a Source Variable in IPUMS International. Unfortunately, maps of state-regions are not available.

Because the district-level tabulations are unreliable, we plan to remove them from IPUMS Terra.