
Developing Big Data Analysis for Public Benefit

Big data is a term for “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” [1]

The world now has the capacity to use big data for a range of applications.

In the nonprofit sector, as well as within government, we are in the early stages of exploring the uses of big data for our work in communities. Both government and the nonprofit sector have data — lots of data — but we are just at the start of being able to harness this data and use it to improve our work. This is why the open data movement is so important.

How can we use government and sector data, along with other data, to do our work better? How can big data assist us to tackle the challenges we face — poverty, growing inequality, climate change, physical and mental health, unemployment and underemployment, homelessness, and food security?

In business, big data has been used with success in commerce, for example, to identify and match products with customers by aggregating information on consumers to better understand their purchasing patterns, lifestyles, and so on. Big data is also used in other areas such as investment and in curating music and films.

Big data has the potential to inform the nonprofit and government sectors. It is already being used in the health care system to inform best practices, funding decisions and treatment availability. However, the community-based nonprofit sector and much of government is lagging in its ability to open its data and to analyze that data for public benefit. We recognize that the nonprofit sector, government and the private sector need to partner to develop our collective capacity for big data use and analysis. We need to identify what data is useful for our work and explore together the possibilities of big data to inform, streamline and potentially transform how we do our work together.

While everyone agrees that ensuring the well-being of our citizens and building strong, resilient communities is the shared goal, the use of big data is not bias-free. It is not merely a question of the aggregation and open publication of data. The initial collection and analysis of this data requires decisions based on the curated knowledge of experts, community values and priorities. These factors are critical, as big data is only one piece of the puzzle. For a discussion of the challenges of using big data to address social problems, the big data for good discussion blog is a good place to start. [2]

Decisions regarding the use of data sets, the emphasis and value placed on different criteria, and the accuracy of the findings cannot simply be left to proprietary interests with private financial rather than public benefit objectives. It is essential, therefore, that public sector big data (sector and government data) not only be published in the public domain, but that the data analysis and methodology be open and include key stakeholders in its development.

The following principles can guide big data development for the public domain:

The default position on data in the nonprofit and government sectors should be open. Work needs to be undertaken to develop standards to make this data readable and accessible for use, and to assist nonprofit and government data holders in opening up their data.

Big data use and analysis for public good purposes needs to be publicly owned and open. A nonprofit consortium of government, nonprofit sector, funders, academics/universities, private sector and data managers should develop big data mechanisms for the public domain.

The intellectual property generated from big data in the public domain must be held for the public good. Intellectual property generated in the development of big data for public purposes and public benefit cannot be held in proprietary ownership.

Social innovation is complex, and big data will be only one of many important contributors to finding new ways forward. Values, priorities, and local and individual circumstances will continue to play a significant role.

Lynn Eakin

Lynn is ONN's Policy Advisor. Lynn has been providing consulting services to the nonprofit sector since 1989. Currently Lynn, one of the founding members of ONN, is involved in better positioning the sector to address the cross-cutting policy issues it faces. She continues to engage in sector research and is involved with ONN in identifying, developing and advocating for systemic reforms to improve the ability of the sector to undertake its important work. For more on Lynn’s background, see her website: www.lynneakin.com.

I have been part of conversations like this over the years. Data quality and openness are indeed important principles, and as a researcher I delight in thinking about what we could learn. But I also know that big data fails not simply on a technical level, with issues like data quality, but because it runs smack dab into other public and conflicting protections, such as privacy and population/group vulnerabilities. Legislation and issues of consent stand as barriers. Over-surveillance is also a risk. Our sector will need to be able to demonstrate to the public that the use of their data does no harm and is for a good greater than our own narrower interests. We need to be able to withstand any resulting critique.

I say all this as someone from an education background, where testing and evaluating (a.k.a. grading) are key system drivers, and are fairly critiqued for being that. I say this having heard a cabinet minister urge the community sector to behave more like the health sector, to build a strong evidence base, to describe exposures and dosages, as though that were the way we worked.

We must also be quite sure what it is we would be trying to measure, and to what end. Will some of us be weighing demographic populations against each other (too old, too far gone, etc.) as though it reflected solely on those populations? For an example of this, see how low-income schools are ranked and rated as though it were a level playing field. Or how about the time a donor wanted to know what age group should receive his donation, for the biggest impact, you understand. All others be damned.

Or, more properly, will we use big data analysis as a test of the system, its performance and its failings? Will we be able to do this without creating a further burden on those already at the edge? Will we have both the skill and the will to do these things?

If we can do these things, then we will be able to make a case for an investment in big data.

Lynn has started an important discussion concerning the use of Big Data for the public good. There are undoubtedly numerous current and potential benefits in using Big Data, ranging from tracking municipal water bills to plan services and promote conservation, to analyzing patterns of HIV infection in Africa in order to better plan medical interventions.

Diane has introduced a note of caution for various practical reasons, and I would agree with her observations. But we also need to consider the nature of Big Data more broadly.

My first issue concerns definition. Lynn has used the widely accepted one that identifies the nature of Big Data in its size, and in the complexity of analysis that is required. But in an age where our smartphones have more computing power than a supercomputer of the 1970s, does this definition tell us anything useful?

I find it more useful to think of Big Data as large data sets that are collected for one purpose, and then used (actually or potentially) for another. That is why one US General could say that ‘We use Big Data to kill people’, referring to the targeting of suspected insurgents with GIS data from their cell phones (eSet security newsletter, 2016). Another example: researchers from the US and China are studying data from transit smart cards used in Beijing to locate pickpockets and purse snatchers (The Economist, August 20 2016, ‘Cutpurse Capers’). Although the results of this project are ambiguous, the researchers are already planning to extend their studies to ‘asocial groups’ such as ‘alcoholics, drug-users, homeless people, and drug dealers’. We should not assume that the use of Big Data will be benign.

My second concern is with the difference between Big Data and Open Data. There is a natural public interest in the potential knowledge to be gained from access to currently restricted data sets, particularly the vast holdings of administrative data by provincial and federal governments. But the issues around this Open Data are distinct. Most government data is locked away, and will remain so, for various reasons including legacy computer systems and privacy legislation. Governments like to talk about Open Data and Open Government, but in Canada little is happening on this front. At the federal level, the main users of Big Data / Open Data are the police and security services (Redden, ‘Big Data as a System of Knowledge, Investigating Canadian Governance’, 2015).

From a public policy perspective, perhaps the most useful initiative for community organizations would be to ally with the research community and establish some formal liaison with governments (all levels) to discuss the potential of Open Data. What data sets are potentially available (given legal and technical restrictions), what could be learned, and what costs would be involved? Some answers to these questions would move us along the path towards public benefit from Big Data.

