The Danish business lobby complained loudly when the move to publish current tax records was announced. I agree that the release of this information by a center-left government is an example of political demagoguery and that’s yucky, but apart from that, I don’t think there are any good reasons why this information should not be public. It’s also worth noting that publicly listed companies are already required to publish financial statements and non-public ones are required to submit yearly financials to the government which then helpfully resells them to anyone interested.

It’s good that this information is now completely public: Limited liability companies and the privileges and protections offered by these are an awesome invention. In return for those privileges, it’s fair for society to demand information about how a company is being run to see how those privileges are being put to use.

The authorities announced their intention to publish tax records in the summer of 2012 and it has apparently taken them 6 months to build a very limited interface on top of their database. The interface lets you look up individual companies by id (“CVR nummer”) or name and inspect their records. You have to know the name or id of any company that you’re interested in because there’s no way to browse or explore the data. Answering a simple question such as “Which company paid the most taxes in 2011?” is impossible using the interface.

Having said that, I think it’s great whenever governments release data and I commend the Danish tax authorities for making this data available. And even with very limited interfaces like this, it’s generally possible to scrape all data and analyze it in greater detail and that is what I’ve done.

So what’s in there

The tax data-set contains information on 243,711 companies. Note that this data does not contain the names and ids of all companies operating in Denmark in 2011. Some types of corporations (I/S corporations and sole proprietorships for example) have their profits taxed as personal income for the individuals that own them. That means they won’t show up in the data.

UPDATE 12/30-12: Magnus Bjerg pointed out that some companies are duplicated in the data. This seems to be the case at least for all (roughly 48) companies that pay tariffs for extraction of oil and gas. Here are some examples: Shell 1 and Shell 2 and Maersk 1 and Maersk 2. The numbers for these companies look very similar but are not exactly the same. The duplicated companies with different identifiers are likely due to Skat messing up CVR ids and SE ids. Additional details on SE ids can be found here here. My guess is that Skat pulled standard taxes and fossil fuel taxes from two different registries and forgot to merge and check for duplicates.

Here are the Danish companies that reported the greatest profits in 2011. These companies also paid the most taxes:

Benford’s law states that numbers in many real-world sources of data are much more likely to start with the digit 1 (30% of numbers) than with the digit 9 (less than 5% of numbers). Here’s the frequency distribution of first-digits of the numbers for profits, losses and taxes as reported by Danish companies plotted against the frequencies predicted by Benford:

The digit distributions perfectly match those predicted by Benford’s law. That’s great news: If Danish companies were systematically doctoring their tax returns and coming up with fake profit numbers, then those numbers would likely be more uniformly distributed and wouldn’t match Benford’s predictions. This is because crooked accountants trying to come up with random-looking numbers will tend to choose numbers starting with digits like 9 too often and numbers starting with the digit 1 too rarely.

UPDATE 12/30-12: It’s important to stress that the fact that the tax numbers conform to Benfords law does not imply that companies always pay the taxes they are due. It does suggest, however, that Danish companies–as a rule–do not put made-up numbers on their tax returns.

Technical details

To scrape the tax website I found two ways to access tax information for a company:

Spoof the POST request generated by the UpdatePanel that gets updated when you hit the “søg” button

The former is the simplest approach, but the latter is preferable for a scraper because much less HTML is transferred from the server when updating the panel compared to requesting the page anew for each company.

To get details on a company, one has to know it’s identifier. Unfortunately there’s no authoritative list of CVR identifiers, although the government has promised to publish such a list in 2013. The contents of the entire Danish CVR register was leaked in 2011, so one could presumably harvest identifiers from that data. The most fool-proof method though, is to just brute-force through all possible identifiers. CVR identifiers consist of 7 digits with an 8th checksum-digit. The process of computing the checksum is documented publicly. Here’s my implementation of the checksum computation. Please let me know if you think it’s wrong:

My guess is that the lowest serial (without the checksum) is 1,000,000 because that’s the lowest serial that will yield an 8-digit identifier. The largest serial is likely 9,999,999. I could be wrong though, so if you have any insights please let me know. Roughly one in eleven serials are discarded because the checksum is 10, which is invalid. That leaves about 8 million identifiers to be tried. It’s wasteful to have to submit 8 million requests to get records for a couple of hundred thousand companies, but one can hope that 8 million requests will get the governments attention and that they’ll start publishing data more efficiently.

4 thoughts on “Tax records for Danish companies”

Very interesting article.

I don’t feel satisfied with using Benford’s law for claiming their legitimate. A comparison with figures from other Nordic countries, and the assumption they are not here for prestige is more relevant. In my humble opinion.