Big Data in Governance in India: Case Studies

This research seeks to understand the most effective way of researching big data in the Global South. Towards this goal, the research planned the development of a Global South Big Data Research Network to identify the potential opportunities and harms of big data in the Global South, along with possible policy solutions and interventions.

This work has been made possible by a grant from the John D. and Catherine T. MacArthur Foundation. The conclusions, opinions, or points of view expressed in the report are those of the authors and do not necessarily represent the views of the John D. and Catherine T. MacArthur Foundation.

Introduction

The research ran for 12 months and took the form of an exploratory study that sought to understand the potential opportunities and harms of big data, and to identify best practices and relevant policy recommendations. Each case study was chosen on the basis of how big data is used in that area and the opportunity it presents for policy recommendation and reform. Each case study seeks to answer a similar set of questions to allow for analysis across case studies.

What is Big Data?

Big data has been ascribed a number of definitions and characteristics, so any study of big data must begin by conceptualizing and defining what big data is. Over the past few years, the term has become a buzzword, used to refer to any of a number of characteristics of a dataset, ranging from size to rate of accumulation to the technology in use.[1]

Many commentators have critiqued the term "big data" as a misnomer, misleading in its emphasis on size. We surveyed various definitions and understandings of big data and document the significant ones below.

Computational Challenges

Datasets so large that they tax the capacities of main memory, local disk and remote disk have been seen as the problem that big data solves. While this understanding focusses on only one feature of big data, namely size, other characteristics that pose a computational challenge to existing technologies have also been examined. The US National Institute of Standards and Technology has defined big data as data which “exceed(s) the capacity or capability of current or conventional methods and systems.” [2]

These challenges are not merely a function of size. Thomas Davenport provides a cohesive definition of big data in this context. According to him, big data is “data that is too big to fit on a single server, too unstructured to fit into a row-and-column database, or too continuously flowing to fit into a static data warehouse.” [3]

Data Characteristics

The most popular definition of big data was put forth in a 2001 report by Meta Group (since acquired by Gartner), which describes it in terms of three Vs: volume,[4] velocity and variety. On this view, big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.[5]

Aside from volume, velocity and variety, other defining characteristics of big data articulated by different commentators are exhaustiveness,[6] granularity (fine-grained and uniquely indexical),[7] scalability,[8] veracity,[9] value[10] and variability.[11] It is highly unlikely that any dataset satisfies all of the above characteristics. It is therefore important to determine which combinations of these attributes lead us to classify something as big data.

Qualitative Attributes

Prof. Rob Kitchin has argued that big data is qualitatively different from traditional, small data. Small datasets have typically been collected through sampling techniques, have been limited in scope, temporality and size, and are “inflexible in their administration and generation.”[12]

In this respect, two qualitative attributes distinguish big data from traditional data. The first is the ability of big data technologies to accommodate unstructured and diverse datasets which hitherto were of no use to data processors. This allows the inclusion of many new forms of data from new, data-heavy sources such as social media and digital footprints. The second attribute is the relationality of big data.[13]

Relationality relies on the presence of common fields across datasets, which allow different databases to be conjoined. This attribute is usually a feature not of the size but of the complexity of data, enabling a high degree of permutations and interactions within and across datasets.
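Relationality can be illustrated with a minimal sketch of an inner join on a common field. The two datasets, the `resident_id` key and the `join_on` helper below are hypothetical, invented purely for illustration; they do not describe any real government database. The point is that neither dataset alone shows payments per district, while the conjoined records do.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
enrolment = [
    {"resident_id": "R1", "district": "District A"},
    {"resident_id": "R2", "district": "District B"},
    {"resident_id": "R3", "district": "District A"},
]
benefit_payments = [
    {"resident_id": "R1", "scheme": "pension", "amount": 1000},
    {"resident_id": "R1", "scheme": "ration", "amount": 250},
    {"resident_id": "R3", "scheme": "pension", "amount": 1000},
]


def join_on(left, right, key):
    """Inner join: pair each left record with every right record sharing `key`."""
    # Index the right-hand dataset by the common field (a simple hash join).
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        # Records with no match on the common field (here R2) are dropped.
        for match in index.get(row[key], []):
            # Merge both records; the shared key appears once in the result.
            joined.append({**row, **match})
    return joined


linked = join_on(enrolment, benefit_payments, "resident_id")

# An inference only possible after linking: total payments per district.
totals = defaultdict(int)
for record in linked:
    totals[record["district"]] += record["amount"]
print(dict(totals))  # {'District A': 2250}
```

At scale, such joins would normally run inside a database or distributed processing engine rather than in application code; the sketch only shows why a shared field is what makes cross-dataset inferences possible.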

Patterns and Inferences

Instead of focussing on the ontological attributes or computational challenges of big data, Kenneth Cukier and Viktor Mayer-Schönberger define big data in terms of what it can achieve.[14]

They define big data as the ability to harness information in novel ways to produce useful insights or goods and services of significant value. Building on this definition, Rohan Samarajiva has categorised big data into non-behavioral and behavioral big data, the latter yielding insights about human behavior.[15]

Samarajiva holds that transaction-generated data (commercial as well as non-commercial) in a networked infrastructure is what constitutes behavioral big data.

Scope of Research

The initial scope arrived at for this case study on the role of big data in governance in India focussed on the UID Project, the Digital India Programme and the Smart Cities Mission. Digital India is a programme launched by the Government of India to make Government services available to citizens electronically by improving online infrastructure, increasing Internet connectivity and digitally empowering the country in the field of technology.[16]

The Programme has nine components, two of which focus on e-governance schemes.

The views and opinions expressed on this page are those of their individual authors. Unless the opposite is explicitly stated, or unless the opposite may be reasonably inferred, CIS does not subscribe to these views and opinions, which belong to their individual authors. CIS does not accept any responsibility, legal or otherwise, for the views and opinions of these individual authors. For an official statement from CIS on a particular issue, please contact us directly.


