Linked Leaks: A Smart Dive into Analyzing the Panama Papers

What do David Cameron, Pedro Almodovar and Leo Messi have in common? No, the Argentinian footballer doesn’t star in the Spanish director’s latest movie. Neither does the UK prime minister. Those three people — alongside thousands of other rich and powerful celebrities, business executives and politicians — have been linked to companies in the Panama Papers leak in recent weeks.

‘The Biggest Leak in History’

When the news of 2.6TB of data on shell companies broke in early April, it immediately went viral and has been trending ever since. Revenue agencies and government officials around the world pledged to fight tax avoidance through tax havens which, though not illegal, are the secret coffers that the rich and powerful one-percenters have been using to reduce their tax rates.

A month later, on May 9, the International Consortium of Investigative Journalists (ICIJ), which broke the news, released a searchable database of more than 300,000 entities from the Panama Papers and Offshore Leaks investigations.

The names of David Cameron and Lionel Messi do not appear in the Panama Papers. In the wake of the leak, though, Cameron admitted that before becoming prime minister in 2010, he had owned shares in a tax-haven fund set up by his late father. Messi is believed to have avoided taxes via the company Mega Star Enterprises which he reportedly owns together with his father Jorge Horacio Messi. Almodovar said at the Cannes Film Festival that he was one of the least important names cited in the Panama Papers.

Panama Papers Dataset Enriched by Linked Data Portal

For two months now, journalists and the general public have been wondering who else is in the Panama Papers and which shareholders are connected with which corporations in which countries. Searching a database for one name or organization at a time, however, can be tedious and enormously time-consuming.

Using the ICIJ database content and other open data sources, we at Ontotext, a semantic technology developer, created Linked Leaks, a linked data knowledge graph of the Panama Papers. The project enriches the data with semantics, links the dataset to other Linked Open Datasets, and provides richer findings when searching through the Panama Papers.

The knowledge graph portal also encourages data analytics enthusiasts, journalists and developers to dive into the Panama Papers and dig for additional information. Linked Leaks supports various types of analytics queries that uncover relationships between companies, shareholders, countries and chains of control. The Linked Leaks demonstration service gives an all-new perspective on the Panama Papers, linking the leaked data to open-data information about countries and geographical regions.

Linked Leaks, which contains more than 22 million RDF statements, also serves as a kind of ‘Investigative Reporting Workbench’, letting users ask smart questions in SPARQL and showcasing the role of Linked Data in investigative data journalism. Analytics enthusiasts can also freely download the Linked Leaks data in RDF for on-premise analytics and for building applications on top of the data.

Putting the Panama Papers in Context

The Linked Leaks knowledge graph, published according to the Linked Open Data principles, already links the Panama Papers to information on countries and geographical regions from the DBpedia and GeoNames resources, and links to more datasets will be added.

These datasets support all sorts of discovery and analytics queries, for example: companies related to a given shareholder (person or organization), including control relationships; companies that control other companies in the same country through a company in an offshore zone; or the most popular offshore jurisdictions.
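To make the second kind of query more concrete, here is a minimal sketch in plain Python of the pattern "a company controls another company in the same country through a company in an offshore zone". It is only an illustration under stated assumptions: the company identifiers, country codes and the "OFF" offshore code are invented toy values, and the function name is hypothetical. It does not use the actual Linked Leaks schema or data.

```python
# Toy data only: company ids and country codes are invented placeholders.
COMPANY_COUNTRY = {"c1": "AA", "c2": "OFF", "c3": "AA", "c4": "BB", "c5": "OFF"}

# Each pair (a, b) means "company a controls company b".
CONTROLS = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c5", "c1")]

# Country codes treated as offshore zones in this toy example.
OFFSHORE = {"OFF"}


def control_via_offshore(controls, country, offshore):
    """Find chains top -> middle -> bottom where the middle company sits in
    an offshore zone and top and bottom are in the same country."""
    controlled_by = {}
    for controller, controlled in controls:
        controlled_by.setdefault(controller, []).append(controlled)

    chains = []
    for top, middles in controlled_by.items():
        for middle in middles:
            if country[middle] not in offshore:
                continue  # the intermediary must be offshore
            for bottom in controlled_by.get(middle, []):
                if bottom != top and country[bottom] == country[top]:
                    chains.append((top, middle, bottom))
    return chains


chains = control_via_offshore(CONTROLS, COMPANY_COUNTRY, OFFSHORE)
```

In a graph database the same two-hop pattern would be written as a SPARQL query over the control relationships; the nested loops here simply spell out what such a query has to match.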

‘The Game of Queries’ in Linked Leaks

By asking smart questions in SPARQL in Linked Leaks, everyone can get richer findings from their investigative search of the Panama Papers.

Take Q2, sample query #2, ‘Country pairs by ownership statistics’, which answers the question ‘Owners from which countries most often own entities in which other countries?’. The number-one pair is China-Hong Kong, followed by Hong Kong-British Virgin Islands and Taiwan-Samoa.
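A query of this kind boils down to grouping ownership links by (owner country, entity country) and sorting by count. The sketch below shows one way that could look. The SPARQL text uses a made-up ex: vocabulary (ex:owns, ex:country) purely for illustration and is not the actual Linked Leaks schema or the portal's Q2; the Python function performs the same aggregation over invented toy pairs.

```python
from collections import Counter

# Hypothetical vocabulary: the ex: prefix and property names are placeholders,
# not the real Linked Leaks schema.
COUNTRY_PAIRS_QUERY = """
PREFIX ex: <http://example.org/leaks#>
SELECT ?ownerCountry ?entityCountry (COUNT(*) AS ?links)
WHERE {
  ?owner  ex:owns    ?entity ;
          ex:country ?ownerCountry .
  ?entity ex:country ?entityCountry .
  FILTER (?ownerCountry != ?entityCountry)
}
GROUP BY ?ownerCountry ?entityCountry
ORDER BY DESC(?links)
"""

# Toy data: one (owner country, entity country) pair per ownership link.
OWNERSHIP_LINKS = [
    ("AA", "BB"), ("AA", "BB"), ("AA", "BB"),
    ("BB", "CC"), ("BB", "CC"),
    ("CC", "CC"),  # same-country link, excluded like the FILTER above
    ("DD", "AA"),
]


def rank_country_pairs(links):
    """Count cross-border ownership links per country pair, most frequent first."""
    counts = Counter((owner, entity) for owner, entity in links if owner != entity)
    return counts.most_common()


ranking = rank_country_pairs(OWNERSHIP_LINKS)
```

The first element of the ranking corresponds to the "number-one pair" in the results, here over placeholder country codes rather than real data.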

Q5, in turn, shows ‘Countries in Eastern Europe by number of owners’ and uses DBpedia and GeoNames resources. This query benefits from the sameAs mappings to DBpedia and GeoNames and from the basic information about countries from those resources, loaded in graph leaks. The query showed that Russia has more owners of offshore entities than all other countries in Eastern Europe combined.
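The role of the sameAs mapping can be sketched as a simple join: the leak data knows which country an owner is in, the external dataset knows which region a country belongs to, and the sameAs link connects the two. The following Python sketch illustrates the idea under loud assumptions: every identifier (the leaks: and ext: names, the region labels, the owners) is an invented placeholder, not real Linked Leaks, DBpedia or GeoNames data.

```python
from collections import Counter

# Toy data only: all identifiers below are invented placeholders.
# sameAs-style links from country nodes in the leak data to an external dataset.
SAME_AS = {
    "leaks:country/AA": "ext:AA",
    "leaks:country/BB": "ext:BB",
    "leaks:country/CC": "ext:CC",
}

# Region membership, known only to the external dataset.
EXTERNAL_REGION = {
    "ext:AA": "Region 1",
    "ext:BB": "Region 1",
    "ext:CC": "Region 2",
}

# Owners and their country node, known only to the leak data.
OWNER_COUNTRY = {
    "owner1": "leaks:country/AA",
    "owner2": "leaks:country/AA",
    "owner3": "leaks:country/BB",
    "owner4": "leaks:country/CC",
}


def owners_by_country_in_region(region):
    """Count owners per country, keeping only countries whose sameAs target
    is placed in the given region by the external dataset."""
    counts = Counter()
    for owner, country in OWNER_COUNTRY.items():
        external = SAME_AS.get(country)
        if external is not None and EXTERNAL_REGION.get(external) == region:
            counts[country] += 1
    return counts.most_common()


region_1 = owners_by_country_in_region("Region 1")
```

Without the sameAs links, the region facts in the external dataset could not be reached from the leak data at all, which is why the mapping matters for a question like this one.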

As you can see, many sorts of interlinked cross-queries can be asked in the Linked Leaks graph database. Ontotext is just starting to explore the possibilities and opportunities of asking smart questions about the Panama Papers and is working to further enrich Linked Leaks with new relations, additional mappings and new sample queries to fine-tune the interpretation and analysis of the raw data. We at Ontotext also plan to map this data to the Financial Industry Business Ontology (FIBO), so that one can query and analyze the data using its semantics.

Participating in the Relationship Discovery

We now challenge you to dive into the Panama Papers with Linked Leaks and explore the datasets with your own smart queries. Follow #LinkedLeaks on Twitter and post your #LinkedLeaks questions and queries!
