2 Advances in information technologies are transforming the fabric of our society, and data represent a transformative new currency for science, engineering, education and commerce.
Image Credit: CCC and SIGACT CATCS

3 Where do the data come from? Why do we have a national initiative?

4 The Big Data Landscape I: Big Science
Science gathers data at an ever-increasing rate across all scales and complexities of natural phenomena
Sloan Digital Sky Survey in 2000 collected more data in its first few weeks than had been amassed in the entire history of astronomy
Within a decade, over 140 terabytes of information collected
Large Hadron Collider generates scores of petabytes a year
The proposed Large Synoptic Survey Telescope (3.3 gigapixel digital camera) will generate 40 terabytes of data nightly
By 2015, the world will generate the equivalent of approximately 93 million Libraries of Congress

7 Communications Volume & Traffic Diversity
VoIP:
  663M registered Skype users in
  Represents 20% of long distance minutes world-wide.
  If Skype were a carrier, it would be the 3rd largest in the world (behind China Mobile and Vodafone).
  Largest provider of cross-border communication.
Video:
  Recent estimates as high as 60% of internet traffic is video and music sharing; 35 hours of new videos are uploaded every minute in 2011; 2 billion views per day.
Twitter:
  Currently 175 million registered users.
Broadband:
  20% of global internet users have residential broadband; 68% in US subscribe to broadband.
Mobile:
  5.3 billion mobile phone subscribers; 85% of new handsets will be able to access the mobile web; 1 in 5 has access to fast service, 3G or better; IM, MMS, SMS expected to exceed 10 trillion messages by 2013.

8 The Big Data Landscape IV: The Long Tail of Science
Hundreds of thousands of scientists and engineers work individually or in small, distributed, disconnected groups, all generating data that collectively represent an enormous, largely untapped scientific resource
  From running simulations, experiments, etc.
Making heterogeneous data across many areas of science more homogeneous could give way to breakthroughs across all areas of science and engineering
Estimated 40 exabytes of unique new information generated worldwide in 2010
However, only 5% of the information created is structured in a standard format of words or numbers; the rest is unstructured text, voice, images, etc.

10 Not Just Volumes of Data
The science of big data is not just about volumes and velocity of data, but also
Heterogeneity and diversity
Levels of granularity
Media formats
Scientific disciplines
Complexity
Uncertainty
Incompleteness
Representation types

11 Why is Big Data Important?
Critical to transforming how science is done and to accelerating the pace of discovery in almost every science and engineering discipline
Transformative implications for commerce and economy
Potential for addressing some of society's most pressing challenges
Image Credit: Chi Birmingham

13 The Age of Data: From Data to Knowledge to Action
Data-driven discovery is revolutionizing scientific exploration and engineering innovations
Automatic extraction of new knowledge about the physical, biological and cyber world continues to accelerate
Multi-cores, concurrent and parallel algorithms, virtualization and advanced server architectures will enable data mining and machine learning, and discovery and visualization of Big Data

15 From Data to Knowledge to Action
Researchers seek to fundamentally transform understanding of spinning giants
The task involves assembling data from more storm variables (such as updraft, downdraft and vorticity, or regions of spin) than what can be observed by tornado chasers on the ground or even actually produced in the atmosphere. For example, researchers need to better understand how changes in wind direction with height cause the updrafts in a storm to rotate, preceding the formation of a tornado. To solve the quandary, Amy McGovern, an associate professor in OU's School of Computer Science, and her team create tornado models with supercomputers that can process vast amounts of data. McGovern and her colleagues use the models to analyze how storm variables interact in order to distinguish tornadic from nontornadic storms.

16 Examples of Research Challenges
More data are being collected than we can store
  Analyze the data as it becomes available
  Decide what to archive and what to discard
Many data sets are too large to download
  Analyze the data wherever it resides
Many data sets are too poorly organized to be usable
  Better organize and retrieve data
Many data sets are heterogeneous in type, structure, semantics, organization, granularity, accessibility
  Integrate and customize access to federate data
Utility of data is limited by our ability to interpret and use it
  Extract and visualize actionable knowledge
  Evaluate results
Large and linked datasets may be exploited to identify individuals
  Design management and analysis with built-in privacy-preserving characteristics

17 A National Imperative
PCAST calls on the Federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing the increasing quantities of data. Furthermore, PCAST observed that the ability to gain new insights, moving from data to knowledge to action, has tremendous potential to transform all areas of national priority.
Source: PCAST (December 2010), Report to the President and Congress: Designing a Digital Future, a periodic congressionally-mandated review of the Federal Networking and Information Technology Research and Development (NITRD) Program.

22 Strategy to Address Big Data
Foundational research to develop new techniques and technologies to derive knowledge from data
New cyberinfrastructure to manage, curate, and serve data to research communities
Policy
New approaches for education and workforce development
New types of interdisciplinary collaborations, grand challenges, and competitions

29 Ideation Contest Launch
Opportunity to expand the innovation ecosystem
Joint among NASA, NSF and DOE Office of Science
A contest focused on "How to make heterogeneous data seem more homogeneous?"
5 judges
5 criteria
Launched on Challenge.gov and the TopCoder platform on Oct. 3 with a two-week window
topcoder com/coeci/nitrd/

30 Ongoing Big Data Programs at NSF
Dear Colleague Letters:
  Encourage CIF21 IGERTs to educate and support a new generation of researchers able to address fundamental Big Data challenges:
  Data-Intensive Education-Related Research Funding Opportunities, announcing an Ideas Lab, for which cross-disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments:
  Data Citation to the Geosciences Community, to encourage transparency and increased opportunities for the use and analysis of data sets:

31 EarthCube: GEO Science Infrastructure
EAGER awards announced as part of White House Big Data Launch
Integrates geosciences data and high-performance computing technologies in an open, adaptable and sustainable framework to enable transformative research and education in Earth System Science
Innovative Model: Community designed, community owned, community governed
Interdisciplinary research: Building and sustaining new communities
  Workshops to bring together (GEO, SBE, CISE) communities
  EAGER awards to seed new research

32 A Complex Policy Setting
Researchers want data.
Public policy requires access to data.
Public policy also requires protection of privacy and intellectual property and other sensitive information.
Much more to be done: Policy on data management and data access.

34 Opportunities for the Future
Our investments in research and education have already returned exceptional dividends to the Nation. Many of tomorrow's breakthroughs will occur as a result of new techniques and technologies for advancing Big Data science and engineering. In turn, Big Data scientific discovery and technological innovation are at the core of our response to national and societal challenges, from environment, energy, transportation, sustainability, and healthcare to cyber security and national defense.
