Open information may mean more errors get published

By Joseph Marks

May 23, 2011

Quickly making huge troves of raw government data freely available may mean accepting that the data are only as good as the people and processes behind them, a panel of government officials said Monday.

For example, in response to massive public demand for information during the 2010 BP oil spill, Energy Department and Environmental Protection Agency workers struggled to push out data on ocean samples within 24 hours of collecting them.

That not only put a massive strain on the agency, diverting money and resources from other priorities, but also, in some cases, led to different types of data being improperly mixed in a way that was more confusing than helpful, said Tim Crawford, a senior adviser at EPA.

The best the agency could do, Crawford said, was to label the data sets to make clear what had gone through a full agency review and what hadn't.

"Being able to classify that information as basically a beta version, saying, 'This is at your own risk; we don't stand behind the values you see here,' that's very important," Crawford said. "Then you go down the list and say, 'Well, it's been looked at a little bit; it's in the stream; it's certified at various levels.' "

EPA is working on a standard protocol for describing how thoroughly data have been reviewed before they're released, Crawford said.

Crawford spoke at a panel discussion titled "Ensuring Federal Data's Accuracy" at the Excellence in Government conference hosted by Government Executive Media Group.

Data from EPA and other agencies are released most often these days on Data.gov, an Obama administration open government initiative that marked its second anniversary Sunday.

The site now holds roughly 3,000 government-generated data sets, but has been criticized by open government groups, which say it's less a tool for government transparency than a data dump for information like "the population count of wild horses and burros."

In some instances, Data.gov has become a single repository for information that was already routinely released to the public, such as the FBI's Uniform Crime Reports, but had never before been pulled into a single location.

In those cases, the Data.gov platform, which includes comment and response sections, can be a check on data errors, allowing the FBI and other agencies to improve their data through crowdsourcing, according to Sanjeev Bhagowalia, a deputy associate administrator who works on Data.gov for the General Services Administration.

"You can now say, 'Hey, wait a second. You say in this particular place there are only five crimes, but I went to the police precinct and I was able to see 25 crimes,' " Bhagowalia said.

Bhagowalia said GSA is considering melding Data.gov with the often arduous Freedom of Information Act process so that noncontroversial requested information can simply be posted to the website.