Acknowledgements

The author would like to gratefully acknowledge advice from the UN ESCAP Environment and Development Policy Section, especially Hala Razian. This report is the result of collaboration between the UNESCAP Environment and Development Policy Section and the UN Global Pulse Lab Jakarta office. The author benefitted from inputs from the Pulse Lab Jakarta team and also thanks the UN Global Pulse Lab Jakarta office for providing space and resources during the desk research. The author is also grateful for inputs and comments on the draft report by Katinka Weinberger, Rusyan Jill Mamiit and Marko Javorsek. The findings, interpretations, and conclusions expressed in this report solely reflect the author's views.

Glossary

Algorithm: A formula or step-by-step procedure for solving a problem.
Anonymization: The process of removing specific identifiers (often personal information) from a dataset.
Big data: A term for a large data set.
Big data analytics: A type of quantitative research that examines large amounts of data to uncover hidden patterns, unknown correlations and other useful information.
Big data for development: A concept that refers to the identification of sources of Big Data relevant to policy and planning of development programs.
Citizen reporting or crowd-sourced data: Information actively produced or submitted by citizens through mobile phone-based surveys, hotlines, user-generated maps, etc.; while not passively produced, this is a key information source for verification and feedback.
Data exhaust: Passively collected transactional data from people's use of digital services such as mobile phones, purchases and web searches; these digital services create networked sensors of human behavior.
Data mining: The activity of going through big data sets to look for relevant or pertinent information.
Data philanthropy: A term that describes a new form of partnership in which private sector companies share data for public benefit.
Data cleaning/cleansing: The detection and removal, or correction, of inaccurate records in a dataset.
Data migration: The transition of data from one format or system to another.
Data science: The gleaning of knowledge from data as a discipline that includes elements of programming, mathematics, modeling, engineering and visualization.
Data silos: Fixed or isolated data repositories that do not interact dynamically with other systems.
Exabyte: A large unit of computer data storage, two to the sixtieth power bytes. The prefix exa means one billion billion, or one quintillion. In decimal terms, an exabyte is a billion gigabytes.

Geospatial analysis: A form of data visualization that overlays data on maps to facilitate better understanding of the data.
Real-time data: Data that covers, or is relevant to, a relatively short and recent period of time, such as the average price of a commodity over a few days rather than a few weeks, and that is made available within a timeframe that allows action to be taken that may affect the conditions reflected in the data.
Infographic: A graphic visual representation of information, data or knowledge intended to present information quickly and clearly.
Mashup: The use of data from more than one source to generate new insight.
Status quo: The existing state of affairs, particularly with regard to social or political issues.
Open data: Data that is free from copyright and can be shared in the public domain.
Open web data: Web content such as news media and social media interactions (e.g. blogs, Twitter), news articles, obituaries, e-commerce and job postings; a sensor of human intent, sentiments and perceptions.
Online information: Web content such as news media and social media interactions (e.g. blogs, Twitter), news articles, obituaries, e-commerce and job postings; this approach considers web usage and content as a sensor of human intent, sentiments, perceptions and wants.
Petabyte: A measure of memory or storage capacity equal to 2 to the 50th power bytes or, in decimal, approximately a thousand terabytes.
Predictive analytics/modeling: The analysis of contemporary and historic trends using data and modeling to predict future occurrences.
Physical sensors: Satellite or infrared imagery of changing landscapes, traffic patterns, light emissions, urban development and topographic changes, etc.; remote sensing of changes in human activity.
Quantitative data analysis: The use of complex mathematical or statistical modeling to explain, or predict, financial and business behavior.
Sentiment analysis (opinion mining): The use of text analysis and natural language processing to assess the attitudes of a speaker or author, or a group.

Structured data: Data arranged in an organized data model, like a spreadsheet or relational database.
Semantics: The study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for (their denotation).
Tweet: A post via the Twitter social networking site, restricted to a string of up to 140 characters.
Unstructured data: Data that cannot be stored in a relational database and can be more challenging to analyze, ranging from documents and tweets to photos and videos.

Executive Summary

This stocktaking report provides an overview of big data, its use in the policymaking context, and the stakeholders and their roles. It also suggests actionable steps as a discussion stimulus for the meeting "Big Data and the 2030 Agenda for Sustainable Development: Achieving the Development Goals in the Asia and the Pacific Region" in Bangkok in December.

Critical data for global, regional and national development policymaking are still lacking. Many governments still do not have access to adequate data on their entire populations. This is particularly true for the poorest and most marginalized, the very people that leaders will need to focus on if they are to achieve zero extreme poverty and zero emissions, and to leave no one behind in the next 15 years. This is true, too, for the international community, which will not be able to support the most vulnerable and marginalized people without an overhaul of the current ways of gathering data.

While most data is technically public, accessing it is not always easy, and mining it for relevant insights can require technical expertise and training that organizations and governments with limited resources can't always afford. Making good use of big data will require collaboration among various actors, including data scientists and practitioners, leveraging their strengths to understand the technical possibilities as well as the context within which insights can be practically implemented.

Recent discussion suggests moving away from seeing Big Data in isolation and focusing instead on the ecosystem of Big Data. According to this concept, Big Data is not just data, no matter how big or different it is considered to be; big data is first and foremost about the analytics, the tools and methods that are used to yield insights, the frameworks, standards and stakeholders involved, and then knowledge.
Effective application of Big Data would also require changes in the decision-making process, which customarily relies on traditional statistics. Given the high frequency of Big Data, a more responsive mechanism will need to be put in place that allows the government to process the information and act quickly in response.

However, this stocktake finds that big data is not (yet) playing a crucial role in policy making. If it is used at all, it is used at the agenda-setting stage and/or the evaluation stage of policy making. One reason might be that the ecosystem is not yet functioning and crucial elements, such as standards and frameworks, are still missing. National governments and other policy makers are just starting to engage systematically with big data for policy making.

The proposed steps are based on the recommendations of the UN Independent Advisory Group and are meant to help build and maintain the Big Data ecosystem for better development policy making:
a) Establish and manage a coordination mechanism with the key UN stakeholders and other international partners;
b) Develop a consensus on principles and standards among the UNESCAP member countries;
c) Kick off and institutionalize a Regional Multi-Stakeholder Mechanism to share innovations;
d) Mobilize regional resources for capacity development for the less advanced UNESCAP member countries;
e) Enhance in-house big data analytics capacity.

Depending on the discussions during the workshop and agreements between stakeholders, certain recommended actions could be prioritized and elaborated further.

1. Introduction

Big data applications may offer the ability to collect and analyze real-time information from across ESCAP's 62 member States for policies that relate to the 2030 Agenda's 17 goals and their 169 targets. The scope of this information is vast, and big data applications can facilitate policy making in the region that would otherwise require dedicated, intensive and continuous human and financial resources.

This stocktaking report, commissioned by ESCAP, provides an overview of big data, its use in the policy-making context, and the stakeholders and their roles in making the most of the opportunities that big data presents. For illustrative purposes, the report then presents a selection of best practices using big data in the policy-making process. Building on existing work in this field, it then offers some practical ideas on how to further progress the 2030 Agenda and the policy making around it using big data. The recommendations of this report shall also inform ESCAP's strategic planning for the development of targeted capacity-building program activities, and the Asia Pacific Sustainable Development Roadmap.

The discussion of big data is quite complex, ranging from practical or technical challenges to legal and regulatory limitations. Figure 1 illustrates the three dimensions of big data and policies. While this report touches on policy for data in the gaps and constraints section, its focus is mainly on the inner circle: data for policy. The case studies complement the centerpiece, evidence-informed policy making.

The purpose of this report is to support ESCAP's work of providing rigorous analysis and peer learning, and of translating these findings into policy dialogues and recommendations. It focuses on big data in the policy context and in the context of the 2030 Agenda.

Despite improvements, critical data for global, regional and national development policymaking are still lacking.
Large data gaps remain in several development areas. Poor data quality, lack of timely data and unavailability of disaggregated data on important dimensions are among the major challenges. As many as 350 million people worldwide are not covered by household surveys. There could be as many as a quarter more people living on less than $1.25 a day than current estimates suggest, because they have been missed by official surveys [1].

Figure 1: Big Data and Policy

As a result, many national and local governments continue to rely on outdated data or data of insufficient quality for planning and decision making. Good-quality, relevant, accessible and timely data enables governments to extend targeted services into communities and to implement policies more efficiently. Many governments still do not have access to adequate data on their entire populations. This is particularly true for the poorest and most marginalized, the very people that leaders will need to focus on if they are to leave no one behind in the next 15 years [2]. This is true, too, for the international community, which will not be able to support the most vulnerable and marginalized people without an overhaul of the current ways of gathering data.

Box 1: Real world policy constraints: the ODI survey
To confirm some of the anecdotal evidence about the lack of good data in developing country ministries, the Overseas Development Institute (ODI) interviewed a series of policy makers based in line ministries to understand how they viewed capacity constraints in their respective countries. Findings highlighted problems with the stability and continuity of data collection, particularly in countries in conflict, where data and institutional memory are often lost during war, which affects time series analysis. A further challenge was more political in nature: a limited understanding of how the public sector and civil servants can work with data, and of how data serves them, may cause resistance to using data effectively. Political issues are sometimes misconstrued by development actors as capacity issues [3].

Data are not just about measuring change; they also facilitate and catalyze that change. Of course, good-quality numbers will not change people's lives in themselves. But even the most willing governments cannot target the poorest systematically, lift them out of poverty and keep them out of it, or deliver services efficiently, if they do not know who those people are, where they live and what they need. Nor do they know where their resources will have the greatest impact.

Policy making takes place in an increasingly rich data environment, which poses both promises and challenges to policy makers. Data offers a chance for policy making and implementation to be more citizen focused, taking account of citizens' needs, preferences and actual experience of public services, as recorded on social media platforms. As citizens express policy opinions on social networking sites such as Twitter and Facebook, rate or rank services or agencies on government applications, or enter discussions on a range of social enterprise and NGO sites, they generate a whole range of data that government agencies might put to good use. Policy makers also have access to a huge range of data on citizens' actual behaviour, as recorded digitally whenever citizens interact with government administration or undertake some act of civic engagement, such as signing a petition. Data mined from social media or administrative operations in this way also provides a range of new data that can enable government agencies to monitor and improve their own performance, for example through usage logs of their own electronic presence or transactions recorded on internal information systems, which are increasingly interlinked.
Governments can use data from social media for self-improvement by understanding what people are saying about government, and which policies, services or providers are attracting negative opinions and complaints, enabling identification of a failing school, hospital or contractor, for example. They can solicit such data via their own sites, or those of social enterprises. And they can find out what people are concerned about or looking for from the Google Search API or Google Trends, which record the search patterns of a huge proportion of Internet users [4].

The recent report of the UN Secretary General's Independent Expert Advisory Group (IEAG) [5] defines the data revolution for sustainable development as the integration of data coming from new technologies with traditional data in order to produce relevant, high-quality information with more detail and at higher frequencies to foster and monitor sustainable development. This revolution also entails an increase in accessibility to data through much more openness and transparency, and ultimately more empowered people for better policies, better decisions and greater participation and accountability, leading to better outcomes for the people and the planet.

2. Big Data and the Data Revolution

Big Data is not a single 'thing'; it is a collection of data sources, technologies and methodologies that have emerged from, and to exploit, the exponential growth in data creation over the past decade [6]. Big data is a buzzword used to describe a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.

Data is a growing element of our lives. More and more data is being produced and is becoming known in the popular literature as big data; its usage is becoming more pervasive, and its potential for policy making and international development is just beginning to be explored [7].

From the 3 Vs to the 3 Cs of Big Data

Big data can be defined as large volumes of high-velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information. Big data can be characterized by the 3 Vs: the extreme volume of data, the wide variety of types of data and the velocity at which the data can be processed [8,9,10]. Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data, much of which cannot be integrated easily.

It is worth mentioning that, most recently, some data scientists and researchers have introduced a fourth characteristic: veracity, or data assurance. That is, the big data analytics and outcomes are error free and credible. However, veracity is still a goal and not (yet) a reality [8]. Annex 1 describes the most common types of big data.
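To make the storage units mentioned above concrete, the following minimal Python sketch compares the binary (power-of-two) sizes given in the glossary with their decimal counterparts. The function names and the choice of units are illustrative assumptions, not part of the original report.

```python
# Binary exponents follow the glossary (petabyte = 2^50 bytes, exabyte = 2^60 bytes).
# Decimal exponents follow the SI prefixes (giga = 10^9, ..., exa = 10^18).
BINARY_EXPONENT = {"gigabyte": 30, "terabyte": 40, "petabyte": 50, "exabyte": 60}
DECIMAL_EXPONENT = {"gigabyte": 9, "terabyte": 12, "petabyte": 15, "exabyte": 18}


def size_in_bytes(unit: str, binary: bool = True) -> int:
    """Return the size of one `unit` in bytes, in binary or decimal terms."""
    if binary:
        return 2 ** BINARY_EXPONENT[unit]
    return 10 ** DECIMAL_EXPONENT[unit]


def binary_to_decimal_ratio(unit: str) -> float:
    """How much larger the binary definition is than the decimal one."""
    return size_in_bytes(unit, binary=True) / size_in_bytes(unit, binary=False)


if __name__ == "__main__":
    for unit in BINARY_EXPONENT:
        print(f"{unit}: binary/decimal ratio = {binary_to_decimal_ratio(unit):.3f}")
```

In binary terms a petabyte is exactly 1,024 terabytes, which is why the glossary describes it as "approximately a thousand terabytes"; in decimal terms an exabyte is exactly a billion gigabytes.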

Figure 2: The 3 V's [11]

Data sets grow in size in part because they are increasingly being gathered by inexpensive and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, radio frequency identification (RFID) readers and wireless sensor networks [12], [13], [14]. The world's technological per capita capacity to store information has roughly doubled every 40 months since the 1980s [15]; as of 2012, 2.5 exabytes of data were created every day. As of 2014, every day 2.3 zettabytes of data were created by Super power high tech Corporation worldwide [16].

Letouzé, one of the Big Data for Development pioneers, has developed the 3 Cs of Big Data, which present another perspective. The 3 Cs stand for Big Data Crumbs, Big Data Capacities and Big Data Community. This view fundamentally frames Big Data as an ecosystem, in fact a complex system, rather than as data sources, sets or streams. It is both a reference to and a counterpoint to the 3 Vs of Big Data [17]. According to this concept, Big Data is not just data, no matter how big or different it is considered to be; this is why Big Data is framed as a field, an ecosystem. Gary King's Harvard presentation "Big Data is not about the data" also highlights that big data is first and foremost about the analytics, the tools and methods that are used to yield insights and turn the data into information and then, perhaps, knowledge [18].

The second C of Big Data, Capacities, is largely about exactly that: the tools and methods, the hardware and software requirements and developments, and the human skills. There is a need to both consider and develop capacities, without which crumbs are irrelevant. But it's not just about skills and chips; it's also about how the whole question is framed. This is of course related to the concept of data literacy and the need to become sophisticated users and commentators. The third C, Community, refers to the set of actors, both producers and users of these crumbs and capacities; it is the human element, and potentially the whole world.

Figure 3: The 3 C's Diagram [17]

The resulting concentric circles, with community as the largest set, form a complex ecosystem with feedback loops between them. For example, new tools and algorithms produce new kinds of data, which may in turn lead to the creation of new startups and capacity needs.

Letouzé and others [17,18] argue that the basic point is that Big Data is not big data, and that questions like "how can a national statistical office use Big Data?" don't mean much, or rather miss the point. The truly important question is why and how a National Statistical Office (NSO) should engage with Big Data as an ecosystem, partner with some of its actors, become one of its actors, and help shape the future of this ecosystem, including its ethical, legal, technical and political frameworks. This question can then be extended to the sustainable development actors interested in using big data and in becoming part of the Big Data ecosystem. This would also involve the role of development actors as facilitators, knowledge brokers and convening powers.

This report is structured in a similar way: from a narrow focus on Big Data to promoting the establishment of a systems approach to Big Data. The report thus focuses on (i) the actors and their roles in the ecosystem; (ii) the potential role Big Data can play in the policy cycle; and (iii) steps towards the ecosystem approach and UNESCAP's potential role.

Use of Big Data

The sheer volume of data generated, stored and mined for insights has become economically relevant to businesses, government and consumers. In the context of policy making, big data can be used to enhance awareness (e.g. capturing population sentiments), understanding (e.g. explaining changes in food prices), and/or forecasting (e.g. predicting human migration patterns).

In most countries, public sector bodies also gather enormous amounts of data, for example from censuses, tax returns and public health surveys. Much of this data is technically public, but accessing it is not always easy, and mining it for relevant insights can require technical expertise and training that organizations and governments with limited resources can't always afford. Making good use of big data will require collaboration among various actors, including data scientists and practitioners, leveraging their strengths to understand the technical possibilities as well as the context within which insights can be practically implemented.

Box 2: Twitter Example: Use of Mobile Technology for Perception Assessment
Since 2010, Indonesia has witnessed substantial increases in food prices: the price of rice increased 51% between December 2009 and February. With more than 20 million Twitter user accounts in Jakarta, a wealth of data is being produced daily. Pulse Lab Jakarta analyzed Twitter conversations discussing food price increases between March 2011 and April. Taxonomies, that is, groups of words and phrases with related meanings, were developed in the Bahasa Indonesia language to identify relevant content.
A classification algorithm was trained to categorize the extracted tweets as positive, negative, confused or neutral in order to analyze their sentiment. Using simple time-series analysis, the researchers quantified the correlation between the volume of food-related Twitter conversations and official food inflation statistics. A relationship was found between retrospective official food inflation statistics and the number of tweets speaking about food price increases. Moreover, upon analyzing fuel price tweets, it was found that perceptions of food and fuel prices were related. This example was created by Global Pulse to demonstrate to the Government of Indonesia the relevance of big data within the policy context [19].

The public sector cannot fully exploit Big Data without leadership from the private sector [20]. The conversation around Data Philanthropy, a term which describes a new form of partnership in which private sector companies share data for public benefit, has advanced since its emergence at the World Economic Forum in Davos. Discussions about the concept of Data Philanthropy, or private sector data sharing, have gained momentum and moved forward, reaching a broader audience. In an article about the issue, Fast Company's Co.Exist summarized: '(t)he next movement in charitable giving and corporate citizenship may be for corporations and governments to donate data, which could be used to help track diseases, avert economic crises, relieve traffic congestion, and aid development.'

The public sector isn't, however, the only one to gain from Data Philanthropy: companies donating data can benefit from it too, especially those interested in the sustainable economy. These companies could enhance their corporate social responsibility role, thus shaping their branding. Their role as stakeholders might also change, as they will be able to influence policies and public opinion more broadly than in matters related to their own business [21].

Big data is showing promise to improve, and perhaps substantively change, the public sector and the international development sector in novel ways. Of general interest is the fact that big data is often produced at a much more disaggregated level, e.g. at the level of the individual instead of the country. Whereas aggregated data glosses over the often wide-ranging disparities within a population, disaggregated data allows decision makers to consider more objectively those portions of the population that were previously neglected.

Big Data and Open Data

In the context of policy making, it is worth elaborating on the interface between big data and the new phenomenon of open data: they are closely related but are not the same. Open data brings a perspective that can make big data more useful, more democratic and less threatening. While big data is defined by size, open data is defined by its use.
But those judgments are subjective and dependent on technology: today's big data may not seem so big in a few years when data analysis and computing technology improve. All definitions of open data include two basic features: the data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open data should also be relatively easy to use, although there are gradations of "openness".

Figure 3: The Interface of Big Data and Open Data [22]

The diagram in Figure 3 maps the relationship between big data and open data, and how they relate to the broad concept of open government. There are a few important points to note:
a) Big data that is not open is not democratic: Section one of the diagram includes all kinds of big data that is kept from the public, like the data that large retailers hold on their customers, or national security data. This kind of big data gives an advantage to the people who control it.
b) Open data does not have to be big data to matter: Modest amounts of data, as shown in section four, can have a big impact when made public. Data from local governments, for example, can help citizens participate in local budgeting, choose healthcare, analyze the quality of local services, or build apps that help people navigate public transport.
c) Big, open data doesn't have to come from government: This is shown in section three. More and more scientists are sharing their research in a new, collaborative research model. Other researchers are using big data collected from social media, most of which is open to the public, to analyze public opinion and market trends.
But when governments turn big data into open data, it is especially powerful: Government agencies have the capacity and funds to gather very large amounts of data, and opening up those datasets can have major social and economic benefits.

Both big data and open data can transform business, government and society, and a combination of the two is especially potent. Big data gives unprecedented power to understand, analyze and ultimately change the world we live in. Open data ensures that this power will be shared, bearing huge potential to transform the way policies are made.
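Returning to the Pulse Lab Jakarta example in Box 2, the two analytical steps it describes, identifying relevant tweets with a taxonomy and correlating tweet volume with official food inflation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method actually used by Pulse Lab Jakarta: the taxonomy phrases, the function names and the (month, text) input format are all hypothetical, and a plain Pearson correlation stands in for the "simple time-series analysis" mentioned in the box.

```python
from math import sqrt

# Hypothetical taxonomy: a few Bahasa Indonesia phrases about rising food prices.
# The real taxonomies used in the study are not given in this report.
TAXONOMY = ("harga beras naik", "harga pangan naik", "sembako mahal")


def is_relevant(tweet_text: str) -> bool:
    """Return True if the tweet contains any phrase from the taxonomy."""
    text = tweet_text.lower()
    return any(phrase in text for phrase in TAXONOMY)


def monthly_counts(tweets):
    """Count relevant tweets per month; `tweets` is an iterable of (month, text) pairs."""
    counts = {}
    for month, text in tweets:
        if is_relevant(text):
            counts[month] = counts.get(month, 0) + 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

In use, the monthly counts would be aligned by month with the official food inflation series before calling `pearson`. A coefficient close to 1 would indicate that tweet volume and inflation move together, although correlation alone does not establish that one drives the other.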

3. The Big Data Ecosystem

Unlike in other areas, the stakeholders in the Big Data sphere are not yet well connected, and some processes need to be in place to bring them together. Making good use of big data will require collaboration among various actors, including data scientists and practitioners, leveraging their strengths to understand the technical possibilities as well as the context within which insights can be practically implemented [22]. Policy stakeholders act at the international, regional, national and local levels. Among government actors, no single type of responsible authority emerges as a clear leader in the implementation of innovative data-for-policy initiatives, with the clear implication that there are opportunities for many different stakeholders and actors.

Types of Stakeholders

The EU data for policy report [23] distinguishes between the following types of stakeholders: global and European policy makers; national policy makers; regional policy makers; statistical offices; science and R&D organisations; data brokers; private providers of data analytics and visualisation tools; civil society and the policy analysis/evaluation community. For the purpose of outlining the relevant stakeholders, this report adopts the EU stakeholder categories.

In the EU, for example, Big Data is encouraged as a means to promote jobs and economic growth, industrial leadership and an open society (open data). It is connected to the many societal challenges that the European Commission has defined, among which are health, demographic change and wellbeing; smart, green and integrated transport; and climate action, environment, resource efficiency and raw materials. However, no projects could be found in which the European Commission itself uses big data directly in its own policy cycle.

At the national policy-making level, big data is often used in the area of transport, where innovative sensor data provides relevant information.
Moreover, it is useful in detecting fraud, reducing crime and improving national security, both via defence and intelligence. National policy makers potentially possess a lot of data that could be used for informed policymaking using big data analyses. Opening up these data could be a first step (open data). Furthermore, the organisations of these policy makers have significant financial means to set up projects and improve big data for policy.

At the regional level, big data could address policy issues concerning traffic, road safety, critical infrastructure, waste management, safety and security, and public health. In contrast to national policymaking, data for regional policy focuses more on policy implementation than on agenda setting.

The statistical offices use big data to produce better official statistics for policy purposes. These may concern all sorts of policy areas. Societal challenges that could be addressed include, for example, energy efficiency, infrastructure, smart transport and demographic change. The most relevant resources that the statistical offices have are knowledge and skills related to statistically analysing large sets of data. They may also have the technological infrastructure needed to store and process big data, and they have financial means to acquire and analyse data for official policy. Still, they may have to expand the experience, IT knowledge and equipment needed for big data. Most pilots are performed in cooperation with external institutes. The main benefits of big data for the statistical offices are improving the accuracy, timeliness and relevance of their statistics and reducing costs. For example, using social media data and having access to data about online and offline retail revenues is less expensive than large-scale surveys (reusing and matching data versus collecting data).

The science community supports policy makers in all policy areas, at all governmental levels and in all steps of the policy cycle. Concerning science policy, the main policy questions have included how to promote an environment that protects intellectual property and supports the most effective organization of disciplines, teams and resources. Steering the large resources devoted to research into the most useful and beneficial channels can be of great benefit to society, and this is an area where the analysis is highly sophisticated and much data is available. The science community has knowledge and skills related to statistically analysing large sets of data and to using an evidence-based approach when researching data-driven approaches. Its members often also have the technological infrastructure needed to store and process big data, and they have financial means to conduct research. Moreover, they are able to connect multiple disciplines in their research (as the data centres demonstrate). Lastly, they possess or have access to a vast number of large data sets (e.g. climate data, civil engineering data, social and behavioural data) and can thus more easily connect different data sets.
Data brokers could provide their data for all kinds of societal challenges and/or policy areas. These are usually companies that collect information, including personal information about users, from a wide variety of sources for the purpose of reselling it. An example is healthcare, in which Google Flu Trends is active. Data brokers often do not analyse or actually use the data; they often only provide it to other actors. Their main resource is data sets on specific groups or on societies at large. Furthermore, they have knowledge of and skills in data collection and analysis, for which they have dedicated tools. As most of the data is commercially traded, they have the financial means and incentives to invest in improving data collection, storage and analysis.

Roles of Stakeholders in the Ecosystem

Table 1: Roles of Stakeholders in Data Ecosystem
Columns: Governments; Multi-National Organizations; Statistical Bodies; R&D Bodies; Civil Society; Private Providers
Data: x x x x x
Financial Resources: x x x x x
Standards and Regulatory Frameworks: x x
Skills and Knowledge; Brokering, Facilitation, Capacity Strengthening; IT Infrastructure: x (x) x x x x x x x x x

Governments should empower public institutions to respond to the data revolution, put in place regulatory frameworks that ensure robust data privacy and data protection, promote the release of data as open data by data producers, and strengthen capacity for continuous data innovation.

Multinational organizations, donors, governments and semi-public institutions should invest in data, providing resources to countries and regions where statistical and technical capacity is weak. They should develop infrastructures and implement standards to continuously improve and maintain data quality and usability, and keep data open and usable by all. They should also finance analytical research in forward-looking and experimental subjects.

International and regional organizations should work with other stakeholders to set and enforce common standards for data collection, production, anonymization, sharing and use, to ensure that new data flows are safely and ethically transformed into global public goods, and should maintain a system of quality control and audit for all systems and all data producers and users. They should also support countries in their capacity-building efforts.

Statistical systems should be empowered, resourced and independent, so that they can quickly adapt to the new world of data and collect, process, disseminate and use high-quality, open, disaggregated and geo-coded data, both quantitative and qualitative.

All public, private and civil society data producers should share data and the methods used to process them, according to globally, regionally or nationally brokered agreements and norms.
They should publish data, geospatial information and statistics in open formats and with open terms of use, following global common principles and technical standards, to maintain quality and openness and protect privacy.

Governments, civil society, academia and the philanthropic sector should work together to raise awareness of publicly available data and to strengthen the data and statistical literacy ("numeracy") of citizens, the media and other infomediaries, ensuring that all people have the capacity to contribute to and evaluate the quality of data, to use them for their own decisions, and to participate fully in initiatives that foster citizenship in the information age.

The private sector should report on its activities using common global standards for integrating data on its economic, environmental and human rights activities and impacts, building on and strengthening the collaboration already established among institutions that set standards for business reporting.

Civil society organizations and individuals should hold governments and companies accountable using evidence on the impact of their actions, provide feedback to data producers, develop data literacy, and help communities and individuals to generate and use data, to ensure accountability and make better decisions for themselves.

Academics and scientists should carry out analyses based on data from multiple sources, providing long-term perspectives, knowledge and data resources to guide sustainable development at global, regional, national and local scales. They should make demographic and scientific data as open as possible for public and private use in sustainable development; provide feedback and independent advice and expertise to support accountability and more effective decision making; and provide leadership in education, outreach and capacity-building efforts.
Therefore, the different stakeholders for big data, which include owners and users, should ideally merge into a global data system, or big data ecosystem, to support policy making. However, the challenge will be how to bring these different stakeholders and systems together to make the data revolution happen. These stakeholders operate within their own systems and procedures, so it is important that fora and platforms are established and managed effectively to make the big data system work.

Effective application of Big Data for Development would also require changes in the decision making process, which customarily relies on traditional statistics. Given the high frequency of Big Data, a more responsive mechanism will need to be put in place that allows the government to process the information and act quickly in response. Also, since Big Data is often unstructured and relatively imprecise (compared to official statistics), government officials have to learn how to interpret and use the information it provides effectively. This requires capacity building to turn decision makers into more sophisticated data users.

4. Big Data and Policy Making

Big data strategies for development can be important tools for formulating policies that also help to implement the SDGs successfully. However, many emerging economies and developing countries are still struggling with collecting and managing much smaller data sets and statistics. While a lot of smaller data exists [24], it is often patchy, poorly integrated and of low quality. Also, these statistics are often top-down and lack a feedback loop to communities. The big data discussion might overlook the fact that capacity constraints are a challenge that needs to be systematically addressed as part of that discussion.

4.1 Best Practices

The discussion of data-driven approaches to support policy making commonly distinguishes between two main types and uses of data. The first is the use of public data sets (administrative (open) data and statistics about populations, economic indicators, education etc.) that typically contain descriptive statistics and are now used on a larger scale, used more intensively, and linked. The second is data from social media, sensors and mobile phones, which are typically new sources for policy making. Best practices are still evolving where innovative approaches complement existing uses of data for policy. According to a study for the EU, the most common uses of big data in policy making include pilots where new sources of data are used for agenda setting and policy implementation; the use of open data for transparency, accountability and participation; and the use of administrative and statistical data for monitoring the outputs and impact of policies. Box 3 below gives an example of a state-of-the-art tool (APPA) that is revolutionizing elements of policy making.
Countries in the Asia and the Pacific region, including among others Singapore, Indonesia, the Republic of Korea and the Philippines, as well as the US and Japan, are already successfully innovating with and opening up data to solve complex policy problems, increase allocative efficiency and improve democratic processes [25].

Data analysis in the process component of the Policy Circle is more complex than in problem identification because policymakers weigh their decisions against a number of criteria. Data analysis expands from the technical aspects of an issue to the political costs and benefits of policy reform [3]. Policymakers tend to make their decisions based on a number of criteria, including: 1) the technical merits of the issue; 2) the potential effects of the policy on political relationships within the bureaucracy and between groups in government and their beneficiaries; 3) the potential impact of the policy change on the regime's stability and support; 4) the perceived severity of the problem and whether or not the government is in crisis; and 5) pressure, support or opposition from international aid agencies [26].

Rather, big data is an additional means that has huge potential to improve policies. Interestingly, an EU study [27] finds that big data is mostly used at the early stage of the policy cycle: in foresight, agenda setting, problem analysis, and the identification and design of policy options. According to the study, fewer than a third of initiatives focus on the middle stages of the policy cycle, namely policy implementation and interim evaluation. Also, this stocktake finds that big data is not (yet) playing a crucial role in policy making. If at all, it is used at the agenda setting stage and/or the evaluation stage of policy making. One of the reasons might be that the ecosystem is not yet functioning and crucial elements, such as standards and frameworks, are still missing. National governments and other policy makers are just starting to engage systematically with big data for policy making.

4.2 The Policy Cycle

There are opportunities for full-scale implementation of data-driven approaches across all stages of the policy cycle, including evaluation and impact assessment. The following section identifies some data-driven approaches in each step of the policy cycle.

Figure 5: Policy Making Process [28]

Policy Cycle Step 1 - Agenda Setting: The agenda setting stage is one of the major steps in the policy making cycle. Once a problem requiring a policy solution has been identified, the process of policy development includes how the problem is framed by various stakeholders (issue framing), which problems make it onto the policymaking agenda, and how the policy (or law) is formulated. Together, these steps determine whether a problem or policy proposal is acted on. Activities in policy development include advocacy and policy dialogue by stakeholders, and data analysis to support each step of the process.

Issue framing influences stakeholders' ability to get the issue onto the policymakers' agenda so that a problem is recognized and a policy response is debated. Issue framing often sets the terms for policy debate. Agenda setting refers to actually getting the problem onto the formal policy agenda of issues to be addressed by presidents, cabinet members, Parliament, Congress, or ministers of health, finance, education or other relevant ministries. Stakeholders outside of government can suggest issues to be addressed by policymakers, but government policymakers must become engaged in the process for a problem to be formally addressed through policy.

Government policymaking bodies can only do so much in their available time period, such as the calendar day, the term of office, or the legislative session. The items that make it onto the agenda pass through a competitive selection process, and not all problems will be addressed. Inevitably, some will be neglected, which means that some constituency will be denied. Among the potential agenda items are holdovers from the last time period or a re-examination of policies already implemented which may be failing [29]. At any given time, policymakers are paying serious attention to relatively few of all the possible issues or problems facing them as national or subnational policymakers.
In decentralized systems, issues are sometimes placed on the agenda of various levels of government simultaneously to coordinate policymaking. To keep the task manageable, it is key to begin with the questions that need to be answered in the policy making process, not with the data. Once the setting for the analysis is defined, the focus of the research can move to the behaviors of interest and the consequent data generation process. The key exemplary strategies described in the boxes can potentially move the policy arena forward in a productive way. They are by no means exhaustive. Also, literature is lacking on exactly how big data, as compared with traditional data, has influenced policy making.

Policy Cycle Step 2 - Policy Formulation: Policy formulation is the part of the process by which proposed actions are articulated, debated, and drafted into language for a law or policy. Written policies and laws go through many drafts before they are final. Wording that is not acceptable to the policymakers who are key to passing laws or policies is revised. Policy formulation includes setting the goals and outcomes of the policy or policies [30]. The goals and objectives may be general or narrow but should articulate the relevant activities and indicators by which they will be achieved and measured. The goals of a policy could include, for example, the creation of greater employment opportunities, improved health status, or increased access to reproductive health services. Policy outcomes could include, for example, ensuring access to ARV treatment for HIV in the workplace or access to emergency obstetric care for pregnant women. Goals and outcomes can be assessed through a number of lenses, including gender and equity considerations.

Activities Related to the Process: Advocacy, Policy Dialogue, and Data Analysis. While issue framing, agenda setting and policy formulation are stages that policies go through, each of these stages can include a number of activities, namely advocacy, policy dialogue, and analysis of evidence related to the problem and policy responses. The interpretation of this information will involve various policy stakeholders, including the legislature, CSOs and other relevant stakeholders. The executive will have to produce actionable insights, possibly with the objective of influencing the behaviors of interest. This also includes mapping the landscape: understanding the policy arena's issues and current challenges. Key players and stakeholders in the policy arena, and their relationships to each other, need to be identified and mobilized. Big data now allows multiple scenarios to be created to understand how the policy landscape may evolve. Also, community participation can be enhanced with mobile technology.

Policy Cycle Step 3 - Policy Adoption: The policy adoption process typically still applies the conventional methods of policy institutionalization: drafting laws and regulations. However, the dissemination of new policies can be faster and wider with the Internet, apps etc. The potential for compliance with and take-up of new policies can increase dramatically. Of course, all this information is useless unless it is used to generate insights that leaders can act on. Fortunately, advances in analysis and visualisation tools (interactive charts, infographics, deep zooming applications, etc.) mean it is now feasible to bring granular and up-to-date evidence to bear on leadership challenges. This applies across the board, from analysing and optimising the impact of policies through to gathering and acting on feedback from citizens on certain policies. In many instances, important sources of big data for learning live outside traditional organisational boundaries [31].

Policy Cycle Step 4 - Policy Implementation: Procedures, guidelines and resources need to be made available for policy implementation. SimGovernment (Box 3) is one of the few available examples where big data is used for policy implementation.

Box 3: SimGovernment
Like the popular computer game SimCity, APPA creates a SimGovernment for policy makers to build possible policies and then test the effects of those policies in a realistic environment. As the amount of data grows and the analytic techniques become more sophisticated, it is possible to measure the impact of policies on other issue landscapes. For example, policy makers could model how a new health policy will affect environmental and educational issues, along with health issues.
