Full 2012 Danish company taxes

Two weeks ago I put out a preliminary release of the 2012 taxes. The full dataset with 245,836 companies is now available in this Google Fusion Table. I haven’t done any of the analysis I did last year. Other than a list of top payers, I’ll leave the rest up to you guys. One area of interest might be to compare the new data with the data from last year: What new companies have showed up that have lots of profit or pay lots of tax? What high-paying companies went away or stopped paying anything? Who’s taxes and profits changed the most? You can find last year’s data via the blog post on the 2011 data.

A better way to release tax data

Computerworld has a interesting article with details from the process of making the 2011 data available. The article is mostly about the fact that various Danish business lobbies managed to convince the tax authority to bundle fossil extraction taxes and taxes on profit into one number. This had the effect of cleverly obfuscating the fact that Maersk doesn’t seem to pay very much tax on profits earned in the parts of their business that aren’t about extracting oil.

More interesting for us, the article also mentions that the tax authority spent about kr. 1 mio to develop the site that makes the data available. That seems generous for a single web page that can look up values in a database, and does so slowly. But it’s probably par for the course in public procurement.

The frustrating part is that the data could be released in a much more simple and useful way: Simply dump it out in an Excel spreadsheet (and a .csv for people that don’t want to use Excel) and make it available for download. I have created these files from the data I’ve scraped, and they’re about 27MB (5MB compressed). It would be trivial and cheap to make the data set available in this format.

Right now, there’s probably a good number of programmers and journalists (me included) that have build scrapers and are hammering the tax website to get all the data. With downloadable spreadsheets, getting all the data would be fast and easy, and it would save a lot of wasted effort. I’m sure employees at the Tax Authority create spreadsheets like this all the time anyway, to do internal analysis and reporting.

Depending on whether data-changes (as seen with last years data) are a fluke or normal, it might be necessary to release new spreadsheets when the data is updated. This would be great too though, because it would make the tracking changes easier and more transparent.