The White House's opaque transparency

On Jan. 21, 2009, his first full day in the White House, President Barack Obama laid the groundwork for what would become his administration’s Open Government Directive, a concrete plan to guide federal departments and agencies in creating “an unprecedented level of transparency in government.”

Since its inception, this directive has set several goals for the U.S. government, challenging bureaucrats to change business as usual in Washington. A primary focus of the initiative is Data.gov, the federal government’s new online information hub, where all executive departments and agencies are required to post at least three “high-value” data sets to better inform the public about government activity.

But nearly one year after this initiative was set in motion, the data sets have yet to live up to their promise, open government advocates say. While thousands of data sets have been put online, many are either already public information or are posted as undigested, raw data, making them mostly unusable to journalists and the public. And even those with the technical expertise to decipher and analyze the raw data complain that not enough useful information has been posted on Data.gov to truly deliver the level of transparency the president sought to achieve.

Breaking new ground for government

In a memo sent from the White House on Jan. 21, Obama directed the Office of Management and Budget to coordinate with various departments in developing an Open Government Directive, which would guide government agencies in implementing the three core principles of open government: “transparency, public participation, and collaboration.”

In early December 2009, Peter Orszag, who was the office’s director at the time, released the 11-page document, ordering all federal departments and independent agencies to meet the president’s challenge for increased openness. Essential to the “transparency” component of the directive was a requirement that each executive department and agency develop a plan to inventory high-value information already available for download, improve public access to that information, and identify high-value data not yet available, establishing a timeline for its publication.

Orszag also provided departments with a precise definition as to what makes a data set “high value,” explaining in the directive that it is “information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation.”

With such a standard of quality and transparency placed on the data sets, many government watchdog groups and other principal users of the data have reacted with disappointment and frustration to the low quality of much of what federal departments and agencies have posted to Data.gov in the past year.

During a recent conference in Washington, D.C., Sunlight Foundation co-founder Ellen Miller highlighted several examples of what she and other open government advocates feel are low-quality data.

“It turns out that the government has some interesting ideas about what counts as ‘high value’ information. The Department of the Interior seems to feel that population counts of wild horses and burros are ‘high value,’ but records of safety violations like the ones that seem to have led to the Upper Big Branch Mine disaster are not,” she said.

The fact that the government has committed to putting the data online is a large first step toward greater transparency. But advocates argue that until all departments and agencies commit to providing higher-quality data on a continual basis and in a format usable by the public, the Open Government Directive will fail to live up to the president’s lofty promises.

Data redundancy

For reporters, who normally play an integral part in promoting government transparency on behalf of the public, the main problem with the data sets is that little information has been posted to Data.gov that has high journalistic value or that cannot already be obtained easily elsewhere.

One federal department that has particularly disappointed journalists and watchdog groups alike is the Department of Defense. The three data sets the Pentagon chose to put online as part of the directive’s mandate included Freedom of Information Act logs for the Office of the Secretary of Defense and Joint Staff, information regarding the Federal Voting Assistance Program and service member demographic data.

While the Pentagon explained that this information would greatly contribute to a more open dialogue with the public, the consensus from military reporters was that most of that information can already be easily retrieved and that much higher-value data exists that has not yet been provided.

For instance, the military demographic data is already readily available to the public and someone seeking such information would not even need to file a FOIA request to obtain it, according to Brendan McGarry of Military Times. He also said that although posting the FOIA logs online is a measure of progress, it is something that every department should already be doing.

“I think it’s a good step. Some data is better than no data. But the reality is there should be a whole lot more than there is, and there’s no reason in this day and age that they can’t do that,” he said.

Nancy Youssef, a military reporter for the McClatchy news service and head of the Pentagon Press Association, also believes higher-value data needs to be posted online and is troubled by the Pentagon’s lack of regard for reporters’ rights to information.

“Too often, things that should be public aren’t, and we treat information we get as some sort of privilege versus a right that we have . . . We’re entitled to it,” she said.

One journalist who has made use of information posted on Data.gov, Saqib Rahim of Energy & Environment Daily, said that while the data the Department of Energy has made available is useful for technical research, actually making use of it is difficult for reporters working on strict deadlines or in a 24-hour news cycle.

“I doubt most reporters would use this often,” he said by e-mail. “The databases are easy to use, but the content is dense and difficult to read. You need more time.”

Some assembly required

Another reason few journalists have made use of Data.gov is because the site was not designed to be used by them, said Sarah Cohen, a former investigative reporter at The Washington Post and current Knight Professor of Journalism at Duke University. Cohen said Data.gov was actually designed to benefit researchers and Web developers with the skills to analyze the data on behalf of the public.

“It’s been a really good thing for community people, for people who are seeking information, particularly Web developers. But for reporters, there’s virtually nothing on Data.gov or in most of the open government areas that wasn’t already available from the agencies either on request or on their website,” she said.

So even if much of the data posted online were not already available to reporters, some say the computing expertise required to make use of the data sets precludes any journalistic benefit they might offer.

“You have to be an expert at this point to use what they’ve released,” said Sean Moulton, director of federal information policy at OMB Watch. He explained that without proper technical skills, the data sets “are not very easy to work with.”

Moulton also explained that those who benefit most from the data sets are groups such as OMB Watch, which can manipulate the data sets in order to build websites and tools the public can use to understand the information.

Matt Waite, a data-focused journalist with the St. Petersburg Times and founder of PolitiFact, said that these specially created tools allow the public to directly locate information specific to their needs, a level of data delivery that traditional journalism, with its focus on the big picture, cannot provide. Waite has used government data in the past to build the website Edmoney.org, which tracks how government stimulus funding is being spent in public schools.

Using the data, “we could never hope to write a story that would tell people what they wanted to know about something very personal to them,” but “with a data-driven interactive [tool] like this, you can use the data to create individual pages for each thing that some person may be interested in,” such as information on a local school district, Waite said.

“It requires a different skill set than just being able to write a good story,” he said.

Thus, although the data sets may be difficult for the average person to use, they still have the potential to provide a tangible benefit to society.

‘Gaming the system’

If open government advocates are correct, the core problem with the high-value data sets lies not in how difficult they are to process, but in what advocates see as the generally low quality of the data and the lack of an across-the-board effort to make higher-quality data available to researchers and Web developers.

One explanation for the general low quality of much of the data is that the Open Government Directive was written too vaguely, allowing agencies to pass off mundane information as “high value,” according to John Wonderlich, policy director for the Sunlight Foundation.

“I think the initial difficulty with the directive is that it was so aspirational and some of the standards that it set were a bit vague, and I think that’s especially true of the standard for high-value data,” he said.

An example of the directive’s vague wording, Wonderlich said, is that one of the five criteria of the term “high-value” as it is defined in the document is data that “further the core mission of the agency.”

“To me, any data that an agency collects should be data that helps an agency pursue its mission. So it’s such a broad category as to permit an agency to put up anything and call it high-value data,” he said. “It’s difficult to set a minimum standard when the terms are so vague.”

Amy Bennett, program associate with OpenTheGovernment.org, has also run into a great deal of what she calls “gaming of the system” by federal departments and agencies. She gave the example of the Environmental Protection Agency, which posts data sets by individual state rather than nationwide, creating the appearance that the agency has thousands of individual data sets posted online.

“It looks like there are a ton of records, [but] the numbers are clearly not the entire story,” she said.

Another shortcoming in information quality is that the data sets sometimes do not come with the relevant metadata attached to the downloaded file, which is necessary to explain to programmers how exactly to interpret and analyze the data, according to Cohen.

“There’s no documentation about what it means, or where it came from, or who to call, and you don’t even know when it was last updated,” she said. “So, it’s a nice attempt, but the high-value data sets were frankly ridiculous in most agencies. They weren’t by any means the most requested information from those agencies. They had very little value. They were almost random.”

Wonderlich said the Sunlight Foundation has found problems with the data quality, as well. His group primarily works with data from USAspending.gov, another site affiliated with the Open Government Directive that focuses on government expenditures. While using the site, Wonderlich said he has often run into “data quality issues” and government requirements for “unnecessarily or unlawfully restrictive licenses” in order to access the information. When comparing the reported data on USAspending.gov to the actual figures, Wonderlich said he found “a stunningly high amount of inaccuracy and misreported data.”

Beyond the problems of “gaming the system” and serious gaps in information, advocates also question whether the information that has been posted to Data.gov is truly that which is most requested by the public.

“What I’m not seeing out of agencies in general is . . . an analysis of what kinds of information people ask for,” Cohen said. “They are still putting up what they want people to have in the form they want people to have it.”

McGarry said that it should not be up to officials to determine what data sets are “high-value,” but rather it should be left up to journalists and others who actually need to use the information.

“It seems like they’re playing a little game where the agency official is the one calling the shots on what’s newsworthy, or what’s in their judgment as a good data dump, whereas they should really have no bearing on what’s made available,” he said.

In order for federal departments and agencies to truly understand what constitutes a meaningful data set to the public, the government needs to do a better job tracking whether data sets are actually being used, Bennett said.

“It’s hard because the government doesn’t want to seem like it is controlling or tracking the use of all the data. But I think it is important for agencies to ask what kind of an impact what they’re putting out there is having in the real world,” she said.

Bennett said that, thus far, government attempts to track data usage have been based solely on visits to the site and not on the actual impact the data is having. “The initial chart they had up that [measured] how agencies were doing was pretty incomplete, because it was just numbers — how many data sets are posted [or] how many times they were downloaded,” Bennett said, adding that this kind of rough analysis does not indicate whether the data were reused or if they had any real impact on public research.

“They’re certainly trying, but a bit more guidance would help,” she said.

Agency difficulties in reporting the data

In response to the general public criticism of Data.gov, Steve Midgley, deputy director of the Office of Educational Technology at the Department of Education, said officials within the department are working to make more data available and to correct problems of information quality. However, he said, the high number of data sets and their distribution across numerous locations within such a large organization have slowed improvement efforts.

“We are working really actively right now to get as many [data sets] as possible out into the public space. One of the issues, of course, is there are a lot of data sets out there in every department, which members of the department know about but are not centrally organized and understood,” Midgley said. “Part of the Open Government Initiative is to get all of these decentralized data sets identified and understood in a central way so that we can programmatically release them based on priority and suitability.”

For the Department of Education, putting certain data sets online can be particularly difficult because of a requirement in the Privacy Act stating that “personally identifiable information” must be protected from public view, Midgley said.

“We have to go through a review process to take [the data] from the division inside the department that owns it, and then analyze it to ensure it doesn’t have [personally identifiable information] in it, or alter it in such a way as to eliminate the person[ally] identifiable [information],” he said.

Another time-consuming obstacle, Midgley explained, is the review process data sets must undergo to ensure they do not compromise national security. The Department of Education has sent several new data sets to Data.gov that need an operational review by the Office of Management and Budget and the National Security Council before they can be released, he said.

A variation in quality across departments

While data quality and reporting issues in general continue to frustrate open government advocates, several federal departments have performed significantly better than others in making quality data available and have even earned praise from the same groups that criticize the effort overall.

Although she generally had harsh criticism for the data sets, the Sunlight Foundation’s Ellen Miller pointed out that the quality of data varied between departments, highlighting ambitious efforts by NASA, the Department of Health and Human Services and the White House to make quality data available in a timely manner.

Cohen also said that despite the many flaws, the project has yielded some improvements in transparency. “There are a few data sets that are now being made public that weren’t before,” she said, adding that certain agencies, such as the Department of Labor, are making a much greater effort than others.

The Labor Department has made greater strides to provide information requested by the public, especially after drawing criticism from groups like the Sunlight Foundation regarding the department’s failure to provide data related to an explosion at Upper Big Branch Mine in West Virginia last April.

After the disaster at the mine, “We got a number of inquiries about the data — the enforcement records of specific mining companies — and as a result of being able to publish the entire enforcement data [online], we were able to direct those queries straight to the source,” said David Roberts, director of new media at the Department of Labor. Roberts also explained why the department has put some information on Data.gov that was already available to the public.

“A lot of the information was already available and already being provided on an ad hoc basis, but by making the data sets available, we’re doing two things: we’re increasing access to the availability of the data, we’re also decreasing the constraints on our resources internally,” he said.

The Department of Education also noted that it has made efforts to provide tools for the public to understand the raw data that has been posted to Data.gov and that it had been providing some analytic services for years before Obama took office.

“I wouldn’t say we’re perfect on [the data], particularly on it being difficult to figure out if you’re not already an expert, but I would say that we’re doing pretty good on the data sets that are out there. Our big concern is that we don’t have enough data sets out there yet and that’s what we’re working hard on right now,” Midgley said.

While many agencies are making a concerted effort to provide the public with useful information, the failure of many others to deliver on the president’s promises is dragging the initiative down overall, open government advocates have said.

This wide range in data quality across departments has made pinpointing general policy remedies to the problems of Data.gov a difficult task. However, advocates are optimistic that despite its many flaws, the Open Government Directive will continue to move toward greater transparency.

Future plans for Data.gov

While most open government advocates agree that Data.gov is still far from meeting the goals of the Open Government Directive, many believe that at least some of the executive departments and agencies have plans in place to improve the quality of data that is made available to the public.

Sean Moulton of OMB Watch said that despite its many flaws, Data.gov has the potential to be a good first step for many agencies, one that could truly change the way the government operates. Whereas people were once forced to use FOIA to get government information, “now the data is just out there,” Moulton said.

“So, it’s been a change in the process . . . It’s a different way for the government to operate, to actually push out the information and just leave it there. And anyone can get it.”

Moulton said that while some agencies have begun this type of “proactive disclosure,” he hopes to see more “top-level” information posted online in the same way.

While open government advocates and officials remain optimistic about the future of the directive, the march toward transparency is currently mired in a tug-of-war between Obama’s highly ambitious rhetoric and the usual resistance to change in Washington, Wonderlich said.

The Open Government Directive “is a clear commitment to creating a more open executive branch. But there are a lot of individual problems. The president’s promises as a candidate are running into bureaucratic problems and the fact that there are so many agencies,” he said. “Some of them are taking the directive as an excuse to run with it and some of them are dragging their feet.”