Big Data in Bellagio: who counts, what counts, and how do we count?

One of the early discussions emerging at our ‘Big Data for Social Change’ conference at the Rockefeller Center in Bellagio concerns how the act of capturing big data impinges on our understanding of it. Three strands in particular have been flagged. Firstly, who does the counting? As Marc Ventresca has shown, the shift from ecclesiastical to secular authority in the collection of data affected perceptions of society, for example shifting the focus from the collective to the individual. The national census is not an impassive, aloof process but rather a culturally and politically significant object, reflecting and reinforcing societal debate and conflict. This significance is reflected in the 1918 observation that “the science of statistics is the chief instrumentality through which the progress of civilization is now measured, and by which its development hereafter will be largely controlled”.

Given this historical context, it seems likely that big data will follow a similar pattern, with who collects the data and who asks the questions shaping our understanding of the analytics and answers. Big data is evidently a multi-stakeholder endeavour, but an uneven and arguably inequitable one: private sector companies capture and control huge swathes of information, much of it personal, whereas NGOs and academic institutions tend to lag behind in data analysis. Governments sit somewhere in the middle of this continuum: the NSA revelations showed that the largest governments are certainly not shying away from mass-scale data collection, but governments have surely yet to fully leverage the data collected for improved policy making. In short, then, the question of who captures big data will likely weigh heavily on our ongoing discussions.

Secondly, what counts is necessarily a consequence of the method of data collection. Examples abound around the world – from gender identity in India, to ethnic identity in Rwanda and Burundi, to the collection of biometric data on refugees – showing how the questions asked almost inevitably affect the results and subsequent understanding. Existing social distinctions and categories may be strengthened and minority identities or viewpoints suffocated through this process, particularly while the use of data is at a nascent stage.

Finally, and related to the two previous points, exactly how we count is a crucial factor. The three V’s of big data (volume, velocity and variety) are well known, but three more were suggested: viability, validity and verification. A distinction has also been made between data collection and data analysis, which becomes significant when different groups carry out each separately. Danger thus lies at the point of analysis, when inequities that were ‘baked in’ during data collection are unknown and unknowable to the researcher conducting the analysis.

Obviously our discussions are at an early stage, but I’ll be doing my best to share other questions and answers which emerge over the course of the conference, as we move to a richer understanding of the opportunities and risks of big data for societies around the world.