Data.gov Gets Updated: A Closer Look

Version 2.0 of the government's data portal relies heavily on open-source platforms, but finding usable data can still be like looking for buried treasure.



The White House's release this week of a newly designed version of its government data portal, Data.gov, was greeted with predictable fanfare and generally underwhelming reviews. The site's introduction highlights the administration's latest steps to make government data more readily available to the public, but it also marks another step forward for the government's use of open-source software.

U.S. Deputy CTO Nick Sinai and Senior Advisor Ryan Panchadsaram, in announcing the new design of Data.gov, were upfront in declaring, "Next.Data.gov is far from complete (think of it as a very early beta)."

On the surface, the new portal is in fact a fresh start aimed at making it easier for data enthusiasts and application developers to find, visualize and reuse government data. Under the hood, however, it also embraces the government's expanding reliance on open-source software.

The new Data.gov site, for instance, uses Apache Solr for search, CKAN (the Comprehensive Knowledge Archive Network) as its data management platform, and WordPress as its content management system. It even uses open-source fonts.
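Because CKAN exposes a standard JSON "action" API, developers can query a CKAN-backed catalog programmatically. The sketch below is illustrative only: it builds a `package_search` request URL (a standard CKAN action) and parses the shape of a typical response; the base URL and the sample response are assumptions, not output captured from Data.gov.

```python
# Minimal sketch of working with CKAN's action API, assuming a
# CKAN-backed catalog such as the one behind Data.gov.
import json
from urllib.parse import urlencode

def package_search_url(base_url, query, rows=10):
    """Build a URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"

def dataset_titles(response_text):
    """Pull dataset titles out of a package_search JSON response."""
    payload = json.loads(response_text)
    return [pkg["title"] for pkg in payload["result"]["results"]]

# Hypothetical usage -- the sample response is hand-written, not real data.
url = package_search_url("https://catalog.data.gov", "hospital charges")
sample = '{"success": true, "result": {"count": 1, "results": [{"title": "Inpatient Charge Data"}]}}'
print(url)
print(dataset_titles(sample))
```

Fetching the URL with any HTTP client returns JSON in the shape parsed above, which is what makes CKAN catalogs scriptable for the application developers the new site is courting.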

The preview of the new site is a clear response to the President's technology-driven management agenda announced earlier this month and a White House executive order issued in May for agencies to make their data more accessible, including being machine readable by default.

But in many respects, the new site is also likely to disappoint die-hard data users as being not much more than a shiny new showroom attached to the same old government data warehouse, a warehouse still in need of operating improvements and accessible data.

The new edition of Data.gov does offer some improvements over the original site, which began in 2009 as a basic clearinghouse for federal agency datasets, many of which were not designed for the general population. Although Data.gov has been routinely criticized for this lack of accessibility, it deserves credit for spawning more than a dozen special interest communities around health, education, energy, safety and other data. It has also led, thanks to the vision of federal CTO Todd Park, to a collection of cottage industries that are putting government data to work for private enterprise.

The new site design clearly reflects an injection of new thinking from various sources, among them the Office of Science and Technology Policy, the General Services Administration, and private-sector experts working through the Presidential Innovation Fellows program.

For instance, Next.Data.gov has fused the usual social media tools into the site to capture streams of comments that highlight how private enterprise and the public at large are taking advantage of government data. The design brings the public's ideas about government data front and center.

The result is slicker promotion of some of the government's best data harvesting successes, such as the release of a database showing the significant variations in what hospitals and healthcare providers across the nation charge for the 100 most common inpatient services and the 30 most common outpatient services.

The new site also uses D3.js, a Web-based JavaScript library that helps manipulate and visualize data in documents. This makes it possible to look beyond the card-catalogue view that Data.gov generally provided and actually view the data in Data.gov's repository dynamically.

The new version has also winnowed down Data.gov's offerings, pulling together what would seem to be the most usable subset of Data.gov's total inventory of datasets and application programming interfaces. The new site features 75,713 datasets and 100 APIs, compared with the 184,259 datasets and 295 government APIs previously listed on Data.gov.

But the real test of the new design is whether users can find and make ready use of the government's vast data resources. And the early results suggest a lot more needs to be done to reduce the number of steps it takes to find and actually extract what in many cases remains buried data treasure.


You said it. I have yet to find an organization less helpful or more cringe-worthy than .gov sites. I find them nearly impossible to navigate, woefully outdated and the number of broken links is astounding. Any upgrades are great steps in my book, but they have a long way to go. I'm encouraged that this issue is receiving any attention at all.

