Final Big Data strategy


Transcript of Final Big Data strategy

Big Data from a statistical view

Mission
By aligning ourselves and collaborating with the front runners in statistical modernization, we will ensure world-class best practice in our Big Data research, development and application for statistical process improvement.

Vision
To become one of the world leaders in the utilization of Big Data for statistical production.

Objectives
Enhance the reach and use of statistical information
Ensure that the privacy of individuals is protected
Maintain public trust in the integrity of the office
Continue to drive costs lower and be resource efficient through effective partnering with stakeholders

Big data
Volume (size)
Velocity (speed)
Structured data (numeric, in traditional databases)
Semi-structured data (may contain tags or other metadata to organise it)
Unstructured data (e.g. text documents, emails, videos)

The Global Partnership for Sustainable Development Data (GPSDD) could promote several initiatives:

World forums for sustainable development
User forums to ensure feedback loops
Partnerships & coordination for data sharing
SDGs analysis & visualization platforms

To support the data revolution for sustainable development, a proposal should be built on the following pillars:

Private sector participation (leverage resources and creativity of public sector)
Capacity development
Global data literacy (investment to increase global data literacy)

A comprehensive strategy towards a new Global Consensus on Data needs to be developed to enable cooperation, including:

Development & adoption of specific principles related to the data revolution
Accelerated development and adoption of legal, technical, geospatial & statistical standards

The establishment of a Network of data innovation networks is recommended, including:

Leverage emerging data sources for SDG monitoring (i.e. data from the private sector)
Develop systems for global data sharing (e.g. Roambi)
Fill research gaps (i.e. engage research centres, innovators and governments in the development of available data analytics tools)

The Revolution of Data

To assist in the exchange of information and promote and protect the rights of individuals by:
Promoting & adopting specific principles related to the data revolution
Accelerating the development and adoption of legal, technical, geospatial and statistical standards (transparency in the exchange of data and metadata)

Modernizing Initiatives

SWOT for Big Data in Stats SA

Reduce measurement errors
Reduce response burden for retailers
Reduce cost of data collection

The Big Data promise to the strategy...
Increase in information = more accurate statistical results
Requires investment

The African Charter
Promote & protect human and peoples' rights through equality, freedom, justice & dignity. These cannot be dissociated from economic & social rights.

Agenda 2063
Aspires to a prosperous African continent based on inclusive growth and sustainable development
Aims to build a strong, united continent that is an influential global player & partner
Promotes good governance & democracy

Why Big Data?
The MTSF...
Links to SDGs

Big Data promises to...
Provide data that is easier to access, more accurate and readily available
Provide data & technological infrastructure that requires investment but will be cheaper in the long run
Provide near-real-time data for more accurate and relevant statistics

Strengths
Institutional stability.
Authorised by law to collect data.
Products comply with high standards.
Being in control of large survey projects.
Trained staff members are already close to the "data science" skills.

Weaknesses
The culture may be less tuned to the Big Data era.
Long and slow programming and budget cycles.
No or little control over total budget to invest in Big Data.
Lacking methods for providing reliable official statistics based on Big Data sources.
New scalable infrastructure required.
Insufficient human capacity to work with Big Data as compared to the private sector.

Opportunities
OSC organisations are trusted third parties.
Information based on Big Data may be certified by OSC organisations.
Big Data attracts young minds.

Threats
OSC being out-competed by other big data actors.
OSC organisations may be perceived to lose relevance.
Continued investment in "Legacy Systems".

The statistical community faces great challenges: to achieve the 17 SDGs, statistics need to be transformed to meet data demands.

The current Stats SA Strategy aims to:
Expand its statistical information base
Develop new and innovative statistical products & services
Revolutionize data systems

Priority to expand, modernize and increase the affordability and accessibility of information.

There is a growing demand for statistical data at all geographic levels that is specific and evidence-based, to inform sustainable planning and development of SA.

The National Development Plan (NDP)
SA needs to sharpen its innovative edge and contribute to global scientific & technological advancements:
Need for investment in research & development and better use of existing resources
Enhance cooperation between public science & technology institutions & the private sector (to facilitate innovation)
All South Africans should be able to acquire & use information (transparency)
The information, communications & technology environment needs to be better structured to ensure SA does not fall victim to the "digital divide"

International examples, local applications
The High Level Group
Price Scanner Data
For Stats SA...

After the introduction of scanner data, response increased to 20%.
Mainly collected by using web questionnaires.
Questionnaire response was 13%.

Statistics South Africa registered a scanner data project in 2015.
Explorative project, with the aim of determining the suitability of using scanner data in South Africa.
Stats SA needs to be prepared for transmission errors that may occur from one database to another (could affect statistical analysis and results).

Satellite Imagery for Agricultural Statistics
Agriculture statistics in Australia
Satellite imagery & drones used for data collection (remote sensing).
Pursuing classification of satellite data at the crop type level.

...in South Africa
for Stats SA...

Detailed data on sales of consumer goods, obtained by scanning the bar codes for individual products at electronic points of sale in retail outlets.
Statistics Sweden started data collection in late 2011.
It was done in parallel with the traditional collection method (paper/web questionnaire).
An FTP account (site) was established as the data transmission channel.

Tourism & Migration Statistics
Mobile positioning data for travel, tourism and population statistics

In South Africa...
Simseek is an application that tracks any device that uses a SIM card.

How Stats SA can use this data

Statistics Estonia uses location-based services to:
Study human movements in, out of and around Estonia
Get real-time access to data in the field
Track inbound & overnight travels at a smaller scale
Estimate the population

Social Media
How social media can be used:
To keep track of current affairs through trend analysis.
Sentiments / people's behaviour (i.e. towards economic changes).

9 in 10 people access the internet every day (94% of the 16.9 million population).
8.8 million active social media users.
Twitter in the Netherlands is used by 3.5 million active users and 1.5 million daily users (2013).
In 2014 Statistics Netherlands started to investigate the potential of using Twitter messages for official statistics.
Dutch Twitter messages were studied from two perspectives: content and sentiment.
The sentiment in messages was found to be highly correlated with consumer confidence (sentiments regarding the economic situation).

UNITED NATIONS 2014
Paper written by Sangita Dubey & Pietro Gennari (2014): Now-casting Food Consumer Price Indexes With Big Data: Public-Private Complementarities

The Billion Prices Project
Started as an academic research project in 2006 by MIT (Massachusetts Institute of Technology).
To study inflation and pricing behaviour of online items globally.
Collects online prices by the use of web scraping.
Research findings were presented at the 2014 UN conference on Big Data for official statistics.

Retailers that trade online
There is potential use of the data; however, only a small population uses online trading.
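As an illustration of how scraped online prices could be turned into a price index, here is a minimal sketch. It assumes the prices for two periods are already available as dictionaries keyed by a product identifier. It uses the Jevons (unweighted geometric mean) formula, a common choice for elementary price indexes; it is not claimed to be the exact method of the Billion Prices Project. The function name, product names and prices are all hypothetical.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Unweighted geometric mean of price relatives, scaled to base = 100.

    Only items observed in both periods (matched items) are used.
    """
    matched = [item for item in base_prices if item in current_prices]
    if not matched:
        raise ValueError("no items are present in both periods")
    log_sum = sum(log(current_prices[i] / base_prices[i]) for i in matched)
    return 100.0 * exp(log_sum / len(matched))


# Hypothetical scraped prices (illustrative values only, not real data).
base = {"bread_700g": 10.00, "milk_1l": 12.00, "rice_2kg": 25.00}
current = {"bread_700g": 11.00, "milk_1l": 12.00, "rice_2kg": 27.50, "eggs_6": 18.00}

index = jevons_index(base, current)
print(round(index, 2))
```

Items seen in only one period (here `eggs_6`) are ignored, which is the simplest way to handle products entering or leaving an online store; a production index would need an explicit rule for them.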
The possibilities...
BPP Explained...
Online and official price indexes
The results...

The High Level Group oversees the development of frameworks and the sharing of information, tools & methods, and coordinates work relating to the use of big data for purposes of official statistics.
It was set up by the Bureau of the Conference of European Statisticians in 2010.

Scanner data for calculating Consumer Price Index

In Estonia, mobile positioning data has been used since 2006.
By tracking the location of mobile devices geographically.
By the use of built-in GPS and phone data (transactions, call & SMS activities, mobile antennas).

Research project, started in 2014
Monitors the movement of mobile phones.
Potential source of census data, migration patterns, tourism and travel statistics.
Census data
Travel and tourism
Migration data

Ongoing research, started in 2013
To explore the possibility of using satellite imagery data for agriculture and crop area statistics.

Social Media as a potential data source for official statistics

National Landcover dataset 2014.
72 classes (landcover & landuse).
Cultivation type analysis of agricultural fields and the area for each type.
SANSA SPOT Imagery.
Time series analysis (2 seasonal mosaics).
2011, 2014
Geography division, EA polygon boundaries (area size)
Landcover change detection.
Agricultural statistics to include small-scale & subsistence agriculture.
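The area-per-class and change-detection ideas above can be sketched on a tiny classified grid. This is an illustration under stated assumptions, not the Stats SA or SANSA workflow: the class labels, the two grids and the 30 m x 30 m pixel size are hypothetical, and a real land cover raster would be read with a geospatial library rather than typed in as lists.

```python
from collections import Counter

# Assumption: each pixel covers 30 m x 30 m = 900 square metres.
PIXEL_AREA_M2 = 900


def area_by_class(grid, pixel_area_m2=PIXEL_AREA_M2):
    """Return the total area in square metres for each class label in the grid."""
    counts = Counter(cell for row in grid for cell in row)
    return {label: n * pixel_area_m2 for label, n in counts.items()}


def changed_pixels(grid_a, grid_b):
    """Count pixels whose class label differs between two same-shaped grids."""
    return sum(
        a != b
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical classified grids for two years (illustrative only).
grid_2011 = [
    ["maize", "maize", "grass"],
    ["maize", "grass", "grass"],
]
grid_2014 = [
    ["maize", "maize", "maize"],
    ["maize", "grass", "urban"],
]

areas_2014 = area_by_class(grid_2014)
n_changed = changed_pixels(grid_2011, grid_2014)
print(areas_2014, n_changed)
```

Counting pixels per class and multiplying by the pixel area gives the area for each cultivation type; comparing the two grids cell by cell is the simplest form of change detection.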
Advanced methods, tools & infrastructure to represent, store, manipulate, integrate & analyse complex data
Specific actions:
Enhance data integration methods, tools & infrastructure
Modernise statistical practice for non-traditional data sources
Introduce new approaches to data modelling & analysis
Evaluate & deploy high performance computing platforms

A diverse pool of government, private & open data sources available for statistical purposes
Specific actions:
Facilitate the sharing of private data for public good
Trial the targeted use of external data provision services

Safe and appropriate public access to microdata sets & statistical solutions derived from an array of data sources
Specific action:
Develop microdata access solutions & lead national adoption of privacy-preserving data analytics

A skilled workforce able to interpret information needs & communicate the insights gathered from rich data
Specific action:
Build & share competency in data science

Strong multidisciplinary partnerships across government, industry, academia & the statistical community
Specific actions:
Support & leverage external system development initiatives
Establish a broad collaboration network to advance specific Big Data initiatives

2014 Initiatives
Transactional sources (from banks / telecommunication providers / retail outlets)
Sensor data sources.
Social network sources, image/video-based sources.

2015 Initiatives
Satellite images
Trade data
Social data from Twitter
Enterprise websites
Mobile data

Motive for exploring Scanner Data
Focus on low response burden
Effectiveness
Improving data quality
Rapidly changing prices and more complex pricing structures.
Consumer behaviour is changing
Increasing availability of electronic data

Scanner data has been introduced step-by-step in the Norwegian CPI.
Statistics Norway has a policy not to pay for any data used in official statistics, in accordance with the Statistics Act of 1989.
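To make the scanner data idea concrete, the sketch below aggregates point-of-sale records into a monthly unit value (turnover divided by quantity) per bar code, a usual first step before scanner data feeds a price index. It is an illustration only, not the Statistics Norway or Stats SA procedure; the record layout, bar code and figures are hypothetical.

```python
from collections import defaultdict


def unit_values(transactions):
    """Return {(barcode, month): turnover / quantity} from sales records.

    Each record is a tuple (barcode, month, units_sold, turnover).
    """
    turnover = defaultdict(float)
    quantity = defaultdict(int)
    for barcode, month, units_sold, value in transactions:
        turnover[(barcode, month)] += value
        quantity[(barcode, month)] += units_sold
    # Skip keys with zero quantity to avoid division by zero.
    return {key: turnover[key] / quantity[key] for key in turnover if quantity[key] > 0}


# Hypothetical scanner records (illustrative values only).
records = [
    ("6001", "2015-01", 10, 100.0),
    ("6001", "2015-01", 5, 55.0),
    ("6001", "2015-02", 20, 220.0),
]

uv = unit_values(records)
# Month-on-month price relative for the one product in the sample.
relative = uv[("6001", "2015-02")] / uv[("6001", "2015-01")]
print(uv, relative)
```

Because the unit value is derived from what was actually sold, it reflects discounts and changing consumer behaviour that a single shelf-price quote would miss.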