3
Dhanurjay Patil in the News
– First White House Data Chief Discusses His Top Priorities (March 17, 2015): DJ Patil talks about how to get more out of public and private information while protecting that data from abuse.
– On Demand Webinar Featuring New Federal Chief Data Scientist DJ Patil and Hilary Mason (February 24, 2015): The two are co-authors of Data Driven, a book you can download for free from O'Reilly.
– The President Speaks At Hadoop World: Introduces DJ Patil as Nation’s First Chief Data Scientist (February 21, 2015): For the first time in the history of Hadoop World, the President of the United States gave a keynote.
– DJ Patil Scores the Sexiest Job in D.C. (February 20, 2015): He co-wrote a paper that appeared in the October 2012 Harvard Business Review.
– Unleashing the Power of Data to Serve the American People (February 20, 2015): “to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.”
– The White House Names Dr. DJ Patil as the First U.S. Chief Data Scientist (February 18, 2015): Data science leadership on the Administration’s momentum on open data and data science.
– Data Driven: Creating a Data Culture (with Hilary Mason) (January 5, 2015): Succeeding with data requires real cultural change, and building a data culture is the key to success in the 21st century.
– Tim O’Reilly: The World’s 7 Most Powerful Data Scientists (November 2, 2011): DJ Patil and Jeffrey Hammerbacher are numbers 2 and 3.

4
U.S. Data Chief Aims to Empower Citizens with Information
– Previous experience: Salesforce, eBay, Skype, PayPal and LinkedIn. He has also done work for NOAA to improve weather forecasting, and with the Defense Department.
– His data science team at LinkedIn Corp., the first of its kind, built the “People You May Know” button that helped launch the social platform by nudging users to link to their professional contacts. (MEETUP)
– In Silicon Valley, Mr. Patil is known as an evangelist for all things data.
– Focus: the government’s role in making sure data isn’t misused and in helping citizens take advantage of massive federal databases.
– Mr. Patil said he was drawn to Washington by several White House initiatives. In the past two years, the administration established real-time dashboards that measure how multimillion-dollar information technology infrastructure projects are coming along; an online “blue button” that individuals can click to download their health records held in different silos across the government; and a “green button” that allows citizens to download statistics on their energy use.
– Federal data sets range from a national directory of farmers’ markets to a compendium of consumer complaints.
– However, it remains to be seen whether Mr. Patil will tackle the thorniest issues raised by big data. Privacy advocates have decried the Wild West atmosphere in which companies collect user data without permission, sell it to advertisers and others, and use behavioral science to discern intimate desires and habits.
– His goal is to design simple, powerful features for government data, like a People You May Know button, he said. After all, not everyone will go to data.gov to download a spreadsheet. “The best data products often don’t show you any data,” he said. “They facilitate an end goal. They help you reach something more efficient.”
– One of Mr. Patil’s first projects involves tracking the reasons people visit government websites and the steps they take to move through them. He aims to reduce those steps and anticipate people’s needs.
– Another is an initiative called “Precision Medicine,” announced in the president’s State of the Union address in January. Precision Medicine will involve about a million citizens who agree to participate in a longitudinal health study. Volunteers will wear digital health trackers and have their genes sequenced. Officials will use the resulting data to find patterns between lifestyle factors and genetic predispositions. Mr. Patil hopes the information will offer clues to who is likely to get sick or respond to certain treatments.

5
Back-of-the-Napkin Math, with DJ Patil: #PiDay Challenge
“Imagine you have a rope snug all the way around the equator. Now you need to add some rope so that the rope is 2 inches above the ground all the way around. How much rope do you need to add?”
The really cool thing about this problem is that it goes counter to our intuition. We think of the Earth’s equator as huge and therefore the amount of rope needed to be added must be large. But let’s do the math. The equator of the Earth is a circle and the circumference is 2 x pi x r (or 2πr). The radius with the height 2 inches off the ground would be r+2 inches, and the circumference would then be 2 x pi x (r+2) (or 2π(r+2)). Let’s subtract the original circumference from the new one to get how much rope we would need: 2 x pi x (r+2) - 2 x pi x r. Doing some quick math, you get 2 x pi x (r+2) - 2 x pi x r = 2 x pi x r + 2 x 2 x pi - 2 x pi x r = 4 x pi. There you have it. All you would have to add is 4 x pi inches, or approximately 12.57 inches, of rope! Crazy, but true. That’s why you need math!
https://www.whitehouse.gov/blog/2015/03/16/back-napkin-math-dj-patil-piday-challenge
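The rope result does not depend on the Earth's radius: the extra length is always 2π times the added height. The short Python sketch below is one way to check that numerically. The function names, the basketball radius, and the WGS 84 equatorial radius are assumptions added for illustration; the slide itself gives no radius.

```python
import math

# One inch is defined as exactly 0.0254 m.
INCHES_PER_METER = 1 / 0.0254
# WGS 84 equatorial radius in meters (an assumption; the slide gives no value).
EARTH_EQUATORIAL_RADIUS_M = 6_378_137.0


def extra_rope(height: float) -> float:
    """Closed form: extra circumference needed to lift the rope by `height`."""
    return 2 * math.pi * height


def extra_rope_direct(radius: float, height: float) -> float:
    """Same quantity computed the long way: new circumference minus old."""
    return 2 * math.pi * (radius + height) - 2 * math.pi * radius


if __name__ == "__main__":
    earth_radius_in = EARTH_EQUATORIAL_RADIUS_M * INCHES_PER_METER
    # The basketball radius is a rough illustrative figure, not a measured one.
    for label, radius in [("basketball, r = 4.7 in", 4.7), ("Earth", earth_radius_in)]:
        direct = extra_rope_direct(radius, 2.0)
        print(f"{label}: add about {direct:.2f} inches of rope")
    print(f"closed form 4*pi: about {extra_rope(2.0):.2f} inches")
```

Both radii give the same answer up to floating-point rounding, which is the counterintuitive point of the puzzle.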

6
First White House Data Chief Discusses His Top Priorities
At the top of my list right now is the Precision Medicine Initiative. Science has enabled us to unlock the human genome. Now we want to combine that with the power of data science, which uses new techniques like machine learning as well as the explosion of data now available about individual patients, whether through their phones or other sensors in their environment. The challenge is putting this together to come up with new ways to think about health care and medical treatments.
– Semantic Medline and Natural Medicine for Disease and Wellness Meetup
My second priority is opening up more data and making it available for people [both the government and general public] to build an ecosystem of research, mobile apps and visualizations on top of that information.
– Semantic Community and Federal Big Data Working Group Meetup
The third main priority is inserting more data capacity into agencies throughout the government. We’re seeing a rise of data scientists and chief data officers at the National Institutes of Health as well as within [the Department of] Health and Human Services. The Commerce Department announced its first chief data officer [Ian Kalin] last week. We have to decide how to use the best of what we see in data science and statistics groups throughout the government to develop new services.
– Federal Big Data Working Group Meetup and Eastern Foundry

7
On the Case at Mount Sinai, It’s Dr. Data 1
Jeffrey Hammerbacher is a number cruncher: a Harvard math major who went from a job as a Wall Street quant to a key role at Facebook to a founder of a successful data start-up. But five years ago, he was given a diagnosis of bipolar disorder, a crisis that fueled in him a fierce curiosity in medicine, about how the body and brain work and why they sometimes fail. The more he read and talked to experts, the more he became convinced that medicine needed people like him: skilled practitioners of data science who could guide scientific discovery and decision-making.

8
On the Case at Mount Sinai, It’s Dr. Data 2
Now Mr. Hammerbacher, 32, is on the faculty of the Icahn School of Medicine at Mount Sinai, despite the fact that he has no academic training in medicine or biology. He is there because the school has begun an ambitious, well-funded initiative to apply data science to medicine. His group’s objective is to alter how doctors treat patients someday. For example, Mount Sinai medical researchers have done promising work on personalized cancer treatments. It involves the genetic sequencing of a patient’s healthy cells and cancer tumor. Once the misbehaving gene cluster is identified and analyzed, it is targeted with tailored therapies, drugs or vaccines that stimulate the body’s defenses.

9
Data Science for Natural Medicines and Epigenetics

10
Natural Medicine for Disease and Wellness Meetup

11
The Birth of Demand-Driven Open Data
Previous experience:
– Built an online marketplace for medical services called Symbiosis Health. I made use of three datasets across different HHS organizations.
– But I did so with great difficulty. Each had deficiencies which I thought should be easy to fix. It might be providing more frequent refreshes, adding a field that enables joins to another dataset, providing a data dictionary or consolidating data sources. If only I could have told someone at HHS what we needed!
Project champions:
– Keith Tucker and Cynthia Colton, Enterprise Data Inventory (EDI) Leads in the Office of the Chief Information Officer (OCIO).
– Damon Davis, Health Data Initiative and HealthData.gov Lead.
What is it:
– A framework of tools and methods to provide a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need.
– Lean Startup approach to open data to minimize up-front development, acquiring customers before you build the product.
Bigger picture:
– HHS’s existing Health Data Initiative (HDI) and HealthData.gov
– DDOD to serve as the community section of HealthData.gov.
Get involved in two ways:
– Get the word out to your network about the opportunities provided by DDOD
– Add use cases to

12
Demand-Driven Open Data
DDOD is a mechanism to tell data owners what's most valuable to you:
– Demand-Driven Open Data (DDOD) is a framework of tools and methodologies to provide a systematic, ongoing and transparent mechanism for you to tell public data owners what's most valuable.
– All work is entered, prioritized, implemented, and validated in the form of "use cases". This approach allows all projects to have a known value even before work begins. It is the Lean Startup approach to open data initiatives.
Use Cases:
– Use cases are initially entered and discussed as GitHub issues (https://github.com/demand-driven-open-data/ddod-intake/issues) and linked to related wiki entries.
Specifications:
– Detailed specifications for each use case are described in the intake wiki (https://github.com/demand-driven-open-data/ddod-intake/wiki) and linked to related issue entries.
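Because use cases live as GitHub issues, they can in principle be listed programmatically. The sketch below is a minimal illustration using GitHub's public REST endpoint for repository issues and only the Python standard library. The helper names `issues_url` and `fetch_use_cases` are hypothetical, and the sketch assumes the `demand-driven-open-data/ddod-intake` repository named on the slide is still publicly reachable.

```python
import json
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def issues_url(owner: str, repo: str, state: str = "open", per_page: int = 30) -> str:
    """Build the REST URL that lists issues for a repository."""
    query = urllib.parse.urlencode({"state": state, "per_page": per_page})
    return f"{GITHUB_API}/repos/{owner}/{repo}/issues?{query}"


def fetch_use_cases(owner: str = "demand-driven-open-data", repo: str = "ddod-intake"):
    """Return (number, title) pairs for open issues, i.e. candidate use cases.

    Makes an unauthenticated network call, so it is subject to GitHub's
    rate limits and fails if the repository is unavailable.
    """
    request = urllib.request.Request(
        issues_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        issues = json.load(response)
    # The issues endpoint also returns pull requests; skip those entries.
    return [(item["number"], item["title"]) for item in issues if "pull_request" not in item]


if __name__ == "__main__":
    for number, title in fetch_use_cases():
        print(f"#{number}: {title}")
```

Only the first page of results is requested here; a fuller client would follow pagination and authenticate.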

13
Key Questions
Will Precision Medicine include Natural Medicine for Disease and Wellness?
– Precision Medicine: Medical and genomic data provides an incredible opportunity to transition from a “one-size-fits-all” approach to health care towards a truly personalized system, one that takes into account individual differences in people’s genes, environments, and lifestyles in order to optimally prevent and treat disease. We will work through collaborative public and private efforts carried out under the President’s new Precision Medicine Initiative to catalyze a new era of responsible and secure data-based health care.
How does Demand-Driven Open Data fit with DJ Patil’s Top Three Priorities?
– Usable Data Products: The President’s Executive Order on machine-readable data gives us a tremendous opportunity to productively connect unique data sets. The challenge is that open data is necessary, but not always sufficient, to create value and drive innovation. For example, the binary 0s and 1s that allow a computer to generate an MRI are of little use to a patient; it is the computationally rendered MRI image that communicates the information locked inside of that binary data. We will work to deliver not just raw datasets, but also value-added “data products” that integrate and usefully present information from multiple sources.
Who will do the Responsible Data Science?
– Responsible Data Science: We will work carefully and thoughtfully to ensure data science policy protects privacy and considers societal, ethical, and moral consequences. Data will continue to transform the way we live and work.

14
EPA Big Data Analytics: Turning Data Into Value
In support of Chief Data Scientist (CDS) DJ Patil, I am developing a Data Science for EPA Big Data Analytics data product and Meetup in cooperation with EPA, using EPA ecosystem data. The aim is to answer EPA's Ethan McMahon's excellent questions (see next slide) and also to address the broader matters of:
– Integration provides the right data to the right system or person in real time.
– Analytics lets users develop insights using vast amounts of data to understand the past and anticipate the future.
– Event processing combines the knowledge gained from analytics with real-time information to identify patterns of events and act to bring about the best outcomes.

15
EPA's Ethan McMahon's Excellent Questions
EPA is planning to stand up a big data analytics service within the agency. We’d appreciate ideas from the ESIP community in a few areas:
1. What problems have you tried to solve using data analytics and/or visualization?
2. Are there any strategies or best practices you used to manage data within or between enterprise data systems?
3. What techniques make sense for integrating large or varied data from multiple sources?
4. What technologies have you used and how did you select them?
5. Did you use any particular training resources for using big data analytics systems, and if so which ones?
6. What lessons would be helpful for us to learn as we set up this service?
We’re open to your ideas and we’re ready to share what we have learned. Please respond to me directly