Memorandum: Unleashing the Power of Data to Serve the American People
To: The American People
From: Dr. DJ Patil, Deputy U.S. CTO for Data Policy and Chief Data Scientist
Date: February 20, 2015

Overview: What Is Data Science, and Why Does It Matter?

The data age has arrived. From crowd-sourced product reviews to real-time traffic alerts, “big data” has become a regular part of our daily lives. In 2013, researchers estimated that there were about 4 zettabytes of data worldwide: That’s approximately the total volume of information that would be created if every person in the United States took a digital photo every second of every day for over four months! The vast majority of existing data has been generated in the past few years, and today’s explosive pace of data growth is set to continue. In this setting, data science -- the ability to extract knowledge and insights from large and complex data sets -- is fundamentally important.
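The photo comparison above can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only. The population figure (about 316 million people in 2013) and the photo size (about 1 MB) are assumptions that do not come from this memo, and the result shifts noticeably if either one changes.

```python
# Back-of-the-envelope check of the "4 zettabytes" photo comparison.
# ASSUMPTIONS (not from the memo): ~316 million U.S. residents in 2013,
# ~1 MB per digital photo, and decimal units (1 ZB = 10**21 bytes).

ZETTABYTE = 10**21
SECONDS_PER_DAY = 24 * 60 * 60

total_bytes = 4 * ZETTABYTE      # worldwide data estimate cited for 2013
us_population = 316_000_000      # assumed population
photo_bytes = 1_000_000          # assumed size of one photo


def days_to_generate(total: int, population: int, bytes_per_photo: int) -> float:
    """Days needed if every person takes one photo per second, nonstop."""
    bytes_per_day = population * bytes_per_photo * SECONDS_PER_DAY
    return total / bytes_per_day


days = days_to_generate(total_bytes, us_population, photo_bytes)
months = days / 30  # rough month length
print(f"{days:.0f} days, or about {months:.1f} months")
```

Under these assumptions the answer lands between four and five months, which is consistent with "over four months." A larger assumed photo size would shorten the estimate proportionally.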

While there is a rich history of companies using data to their competitive advantage, the disproportionate beneficiaries of big data and data science have been Internet technologies like social media, search, and e-commerce. Yet transformative uses of data in other spheres are just around the corner. Precision medicine and other forms of smarter health care delivery, individualized education, and the “Internet of Things” (which refers to devices like cars or thermostats communicating with each other using embedded sensors linked through wired and wireless networks) are just a few of the ways in which innovative data science applications will transform our future.

The Obama administration has embraced the use of data to improve the operation of the U.S. government and the interactions that people have with it. On May 9, 2013, President Obama signed Executive Order 13642, which made open and machine-readable data the new default for government information. Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the government, helping make troves of valuable data -- data that taxpayers have already paid for -- easily accessible to anyone. In fact, I used data made available by the National Oceanic and Atmospheric Administration to improve numerical methods of weather forecasting as part of my doctoral work. So I know firsthand just how valuable this data can be -- it helped get me through school!

Given the substantial benefits that responsibly and creatively deployed data can provide to us and our nation, it is essential that we work together to push the frontiers of data science. With the importance this Administration has placed on data and the momentum that has been created, now is a unique time to establish a legacy of data supporting the public good. That is why, after a long time in the private sector, I am returning to the federal government as the Deputy Chief Technology Officer for Data Policy and Chief Data Scientist.

Organizations are increasingly realizing that in order to maximize their benefit from data, they require dedicated leadership with the relevant skills. Many corporations, local governments, federal agencies, and others have already created such a role, which is usually called the Chief Data Officer (CDO) or the Chief Data Scientist (CDS). The role of an organization’s CDO or CDS is to help their organization acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

The Role of the First-Ever U.S. Chief Data Scientist

Similarly, my role as the U.S. CDS will be to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.

So what specifically am I here to do? As I start, I plan to focus on the following activities:

Providing vision on how to deliver maximum social return on federal data.

Working with agencies to establish best practices for data management and ensure long-term sustainability of databases.

Recruiting and retaining the best minds in data science for public service to address these data science objectives and act as conduits among the government, academia, and industry.

As I work to fulfill these duties across the Administration, I’ll be focusing on several priority areas, including:

Precision medicine. Medical and genomic data provides an incredible opportunity to transition from a “one-size-fits-all” approach to health care towards a truly personalized system, one that takes into account individual differences in people’s genes, environments, and lifestyles in order to optimally prevent and treat disease. We will work through collaborative public and private efforts carried out under the President’s new Precision Medicine Initiative to catalyze a new era of responsible and secure data-based health care.

Usable data products. The President’s Executive Order 13642 on machine-readable data gives us a tremendous opportunity to productively connect unique data sets. The challenge is that open data is necessary, but not always sufficient, to create value and drive innovation. For example, the binary 0s and 1s that allow a computer to generate an MRI are of little use to a patient -- it is the computationally rendered MRI image that communicates the information locked inside that binary data. We will work to deliver not just raw data sets, but also value-added “data products” that integrate and usefully present information from multiple sources.

Responsible data science. We will work carefully and thoughtfully to ensure data science policy protects privacy and considers societal, ethical, and moral consequences.
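To make the MRI example under "Usable data products" concrete, here is a toy sketch of the gap between raw bytes and a rendered view of them. It is not how medical imaging software works. The function name `render_grayscale`, the character ramp, and the sample bytes are all hypothetical and chosen only for illustration.

```python
# Toy illustration: the same bytes shown raw versus rendered as a picture.
# ASSUMPTIONS: each byte is one grayscale pixel (0 = dark, 255 = bright),
# and the image width is known in advance. All names here are hypothetical.

SHADES = " .:-=+*#%@"  # darkest to brightest


def render_grayscale(raw: bytes, width: int) -> str:
    """Map each byte to a character and wrap the result into rows."""
    rows = []
    for start in range(0, len(raw), width):
        row = raw[start:start + width]
        # Scale 0..255 down to an index into SHADES (0..9).
        rows.append("".join(SHADES[b * len(SHADES) // 256] for b in row))
    return "\n".join(rows)


# A 4x4 "scan": a bright square in the middle of a dark background.
raw = bytes([
    0,   0,   0, 0,
    0, 255, 255, 0,
    0, 255, 255, 0,
    0,   0,   0, 0,
])

print(list(raw))                 # the raw numbers, hard to interpret
print(render_grayscale(raw, 4))  # the rendered view, readable at a glance
```

The rendered grid carries no more information than the byte list, yet it is the form a person can actually read, which is the sense in which a data product adds value beyond the raw data.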

Data will continue to transform the way we live and work. I am eager to get started as the first U.S. CDS, and I look forward to providing regular updates on our progress.