Saturday, July 28, 2007

Defense of Data Analysis

Michael pushed, Suresh and Piotr pulled, the algorithmii in the northeast and beyond pored over data, and someone asked: why slice, dice, aggregate and group by? Is it actionable?

John Graunt wrote "Foundations of Vital Statistics" in 1661/2 (his notation), thus founding Statistics. He describes his data (birth/death announcements in Parish Hall records, starting circa Plague time) and how they were collected (e.g., the ringing of the bell brings the data collector to the site so they can determine the cause of death; the results were published every Thursday for a fee). His first analyses are simple group-bys (cause of death and the number of fatalities due to each). He then proceeds to observations: some simple, some that force him to lament the lack of data and the curiosities of data collection, and some elliptical, hanging without much support. The paper is remarkably modern (modulo the quirks of the language from more than 300 years ago), and culminates in the soul-searching data analysts do even now: "It may now be asked, to what purpose tends all this laborious buzzling and groping", and after aggressive answers, concludes with:

That a clear knowledge of all these particulars, and many more, whereat I have shot but at rovers, is necessary in order to good, certain, and easie Government, and even to balance Parties, and factions both in Church and State. But whether the knowledge thereof be necessary to many, or fit for others, then the Sovereign, and his chief Ministers, I leave to consideration.

Did Michael, Suresh and Piotr know that their Python scripts (?) would have political, religious and other implications?