Start small to succeed with big data

A new report on big data in government urges federal agencies to start small, but start now -- and calls for the creation of a Chief Data Officer position in each agency and also governmentwide.

The report, released Oct. 3 by the TechAmerica Foundation at a Capitol Hill briefing held with the Congressional High-Tech Caucus, attempts to define "big data" and its value, and offers 10 case studies to illustrate how big-data projects can serve critical government missions. Half of those examples showcase federal projects -- at NASA, the IRS, the National Archives and Records Administration, the National Oceanic and Atmospheric Administration, and the Centers for Medicare & Medicaid Services.

"We really hope [the report] creates a sense of urgency for government stakeholders to do something today," said EMC’s Bethann Pepoli in an interview prior to the launch event. "It helps government users understand what big data is, but more importantly, how to begin creating a strategy that isn’t based on every piece of data they have." Pepoli, the CTO of EMC State and Local Government Division, co-led the TechAmerica working group charged with defining big data and identifying clear use cases for the report.

Steve Lucas, executive vice president of SAP’s global Database & Technology division and a co-chair of TechAmerica's Federal Big Data Commission, agreed. Government has been at the forefront of creating and sharing big data, he said in an Oct. 2 interview to preview the report. "If you think about what we take for granted today -- population data, weather data ... we have the federal government to thank for it," he said. And now, he argued, with "all of these agencies sharing thousands and thousands of data sets," and the cost of storage and analysis plummeting, "you've got almost a perfect convergence" for really putting the data to use. "It is not a research experiment. This is something anyone can tackle today."

The report, titled "Demystifying Big Data: A Practical Guide to Transforming the Business of Government," urges agency leaders to identify two to four "key business or mission requirements that Big Data can address," and to craft focused projects to address those needs instead of attempting to implement a comprehensive big data strategy. Not only will this avoid doing big data for big data's sake, Pepoli said, but it reflects the reality that data quality is often the central challenge for any project: "There is no way you’re going to be able to tackle every data quality issue for every data source" in an agency, she said.

Agencies are also encouraged to take a full inventory of data assets, both within the agency and throughout the government. Lucas acknowledged that many agencies are still hesitant to rely on outside data sources, whether for budget and turf reasons or for fear that a project might be built around a data set that subsequently changes or disappears. "There's probably a little too much bias toward what you have in your pocket," he said. "But the reality is this: we know that any real breakthrough... is always some kind of a mashup."

"At a legislative level," Lucas said, "we need to make it easier for people to be able to rely on each other, intra-agency."

At the governmentwide level, the report also calls for the removal of procurement-related barriers, and urges the Office of Science and Technology Policy to "encourage further research into new techniques and tools, and explore the application of those tools to important problems across varied research domains."

Pepoli said that common toolkits and proven strategies are key to encouraging agencies to dive into big data projects. She pointed to the IRS case study: "The fraud use case applies to a number of agencies," she said. "The framework they’ve put in place to protect the identities and protect fraud…can certainly apply to any business problem."

And on the report's call for yet another C-level role at agencies, Lucas said that big data warrants a dedicated champion, and that too many CIOs and CTOs lack the time or focus.

"I think they do need it, I really do," he said. "The reality is, if you're a CIO and you're really delivering information to your business... then maybe you get a pass. But we've [too often] moved from a focus on the information to just the technology."

The report cites the FCC's chief data officer position as a model for other agencies, and also recommends "appointing a single official within the OMB to bring cohesive focus and discipline to leveraging the government's data assets."

Lucas stopped short of calling for these roles to be legislated, in the way that CIOs are mandated under the Clinger-Cohen Act, but said: "I am firmly convinced that at an agency level, there needs to be a chief data officer or [top] data scientist."

About the Author

Troy K. Schneider is editor-in-chief of FCW and GCN.

Prior to joining 1105 Media in 2012, Schneider was the New America Foundation’s Director of Media & Technology, and before that was Managing Director for Electronic Publishing at the Atlantic Media Company. The founding editor of NationalJournal.com, Schneider also helped launch the political site PoliticsNow.com in the mid-1990s, and worked on the earliest online efforts of the Los Angeles Times and Newsday. He began his career in print journalism, and has written for a wide range of publications, including The New York Times, WashingtonPost.com, Slate, Politico, National Journal, Governing, and many of the other titles listed above.

Schneider is a graduate of Indiana University, where his emphases were journalism, business and religious studies.