March 7, 2011

The census has just been completed. We'll soon know how many billion of us there are. As in earlier censuses, we'll also know how many of us are male, how many are educated, how many own TV sets, and so on. Going by the numerous questions we were asked, perhaps we will know many more demographic details this time.
A school teacher, Sinha ji, came to our house in Delhi last week to do the final round of data collection. A colleague of his had already collected data a month earlier, but Sinha ji had a similar questionnaire in front of him. "It takes about 40 minutes to fill in the sheet. People do not give details in one go, or they hide facts," he told me. "They don't even open their doors to us. I must finish four more sheets before calling it a day." It was already 8 pm. Luckily for him, ours was an exceptionally small family, a mother and a child, and it took him just five minutes.
“What do you do when they don’t tell you the details?” I asked.
"We have to make an intelligent guess."
“But how can you guess, say, their income or their caste?”
"It is tricky. People do hide their caste. They do not even tell us whether they have a bicycle. I know everybody in this colony would have a bicycle."
"But why should I have a cycle when I have a car and my child is still too small to ride one?"
He didn't have an answer, but he gave the discussion an interesting twist. "I also have to cover 20 households in the adjacent jhuggi colony. I had to make four visits to get all the data on one hut."
“What would someone do if he had to collect data from a hundred jhuggis?”
"A colleague of mine landed in exactly this situation, but luck favoured him. Someone had tutored the jhuggi-wallahs on what replies to give to our questions."
“But those replies would be fake.”
“We are doing our duty, ma’am.” He left.
In statistics, they say, random errors tend to cancel each other out when the sample is large. In our census, the biggest data-collection exercise in the world, let's hope the macro picture is more or less correct despite people not opening their doors, not telling the truth and giving patently fake answers.
The bigger question is whether the data will be used to make sound policy decisions, or whether it will remain a mammoth exercise for the sake of statistics.