Project

Background

The wealth of freely available, structured information on the Web is constantly growing.
This is especially true for public data from and about governments and administrations.
Data-providing projects, such as DBpedia and Freebase from the linked open data community,
as well as structured data from domain-specific sites, such as senate.gov, USASpending.gov,
or epp.eurostat.ec.europa.eu, make it possible to integrate data from multiple sources and thus
create new data sets with added value. The recent appointment of Tim Berners-Lee to lead a review
on how the UK government can open up access to official information reinforces this trend.
However, integrating such data sources is far from trivial: apart from the technical difficulties
of accessing the data, structural and semantic differences between the sources must be overcome.
In particular, the various data sets must be standardized, transformed to a common structure,
cleaned, and finally consolidated into a single, consistent, and complete data set.

The Project

GovWILD started as a joint project between Hasso Plattner Institute and IBM's Almaden Research Lab.
It integrates Open Government Data about politicians, parties, government agencies, funds, companies,
and industry leaders into a clean and consistent data set. Individual components extract data,
and industrial leaders into a clean and consistent data set. Individual components extract data,
scrub it, identify common entities across multiple sources, transform the data to a common structure, and
finally fuse conflicting values into a rich, value-added data set. We have already integrated data
from several EU and US sources. This interlinked data is visualized in a Web interface that citizens
can explore, and it is available for download and further analysis. It can be used to uncover hidden connections
between individuals in government and industry, to aggregate financial data, and to deep-dive into the
network of politics and industry.
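
The scrub, match, and fuse steps described above can be illustrated with a minimal Python sketch. This is not GovWILD's actual implementation: the record fields, the source names, and the functions scrub and fuse are all hypothetical, and matching on an exact normalized name is a deliberately simple stand-in for real duplicate detection, which would typically rely on similarity measures.

```python
from collections import defaultdict

# Hypothetical records from two made-up sources (not real GovWILD data).
records = [
    {"source": "source_a", "name": "  Jane Q. Doe ", "party": "Example Party", "state": ""},
    {"source": "source_b", "name": "jane q doe", "party": "", "state": "CA"},
    {"source": "source_b", "name": "John Roe", "party": "Other Party", "state": "NY"},
]

def scrub(record):
    """Trim whitespace, drop empty fields, and derive a normalized match key."""
    cleaned = {k: v.strip() for k, v in record.items() if v.strip()}
    # Assumption: lowercased name without periods is good enough as a key here.
    cleaned["key"] = " ".join(cleaned["name"].lower().replace(".", "").split())
    return cleaned

def fuse(scrubbed):
    """Group records that share a key and merge them into one entity each.

    Conflict resolution is an assumed toy rule: keep the longest value per field.
    """
    groups = defaultdict(list)
    for rec in scrubbed:
        groups[rec["key"]].append(rec)

    entities = []
    for group in groups.values():
        entity = {"sources": sorted({rec["source"] for rec in group})}
        for rec in group:
            for field, value in rec.items():
                if field in ("source", "key"):
                    continue
                if len(value) > len(entity.get(field, "")):
                    entity[field] = value
        entities.append(entity)
    return entities

entities = fuse([scrub(r) for r in records])
for entity in entities:
    print(entity)
```

In this sketch the two "Jane Q. Doe" records collapse into a single entity that carries the party from one source and the state from the other, while the fused record keeps track of which sources contributed to it.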