The problem is that some publishers feel that, by including snippets of text from their web-sites, content aggregators such as Google News take eyeballs away from them.

No matter that studies show the opposite — that including a site in such aggregation services actually increases its traffic. This is a matter for each publisher to decide for itself. If they don’t want their content to be aggregated, then they shouldn’t be forced to participate in aggregators.

So I have a proposal: let’s solve the problem with the scalpel of technology rather than the sledgehammer of legislation.

We should introduce a simple standard whereby any publisher that doesn’t want its content to be included places a simple text-file called “robots.txt” at the top level of its web-site. If that site doesn’t want to be included at all, it would simply place these lines into the file:

User-agent: *
Disallow: /

And a publisher that only wants to exclude certain areas of its web-site — the /news/ area, for example — could say “Disallow: /news/” instead, for finer-grained control.
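Putting those two pieces together, a publisher's robots.txt for that partial opt-out might look like this (the Googlebot-News agent name is real; treating it separately here is just one illustrative choice):

```
# Keep the news aggregator's crawler out of /news/ only
User-agent: Googlebot-News
Disallow: /news/

# All other crawlers may index everything
User-agent: *
Disallow:
```

An empty Disallow line means "nothing is disallowed", so ordinary search crawlers still see the whole site while the aggregator skips the news section.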

Then all we’d need to do is persuade aggregators to implement code that reads and respects this “robots.txt” file, and publishers that don’t want to be included would have a nice, simple way to opt out.
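On the aggregator's side, "reads and respects this file" is a few lines of code in most languages. Here's a minimal sketch in Python using the standard library's robots.txt parser — the bot name and URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A well-behaved aggregator checks robots.txt before harvesting a snippet.
# Here we parse the rules directly; in practice you'd fetch
# https://example.com/robots.txt with set_url() and read().
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /news/",
])

# The publisher has opted its /news/ area out, so skip those URLs...
print(rules.can_fetch("SnippetBot/1.0", "https://example.com/news/story1"))
# ...but anything else is fair game.
print(rules.can_fetch("SnippetBot/1.0", "https://example.com/sport/match2"))
```

The first check returns False and the second True; an aggregator just refuses to snippet any URL for which can_fetch() says no.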

18 responses to “The EU’s proposed “link tax”: a modest proposal”

I think the problem is that the specific politicians involved here either (A) have no idea at all how the digital economy works, or (B) have been bought by big publishers who are prepared to do a little harm to themselves in order to prevent upstart competitors from being seen at all.

Oh, nothing can be done technologically about bad-faith aggregators, of course. You could write a crawling robot that simply ignores robots.txt. But reputable organisations would no more do that than they would simply post unauthorised copies of all your site’s pages — something else that technology can’t prevent.

But reputable organisations would no more do that than they would simply post unauthorised copies of all your site’s pages — something else that technology can’t prevent

Right, but there are legal remedies to that (an action for copyright infringement and an injunction, most obviously).

So given there is no actual technological solution to aggregators, just as there is no technological solution to copying, isn’t it fair to suggest there should be legal remedies to aggregators just like there are legal remedies to copying?

I would imagine there is also a legal remedy to a badly-behaved aggregator who harvests snippets despite having been told by a robots.txt file not to do so.

I don’t think there is.

But if there is, is that not exactly equivalent to a Google tax, which you can levy by configuring your ‘robots.txt’ file to only allow access to those user-agents who have paid for your content, and then suing for injunctions against those who either ignore the file or who change their user-agent to get around it?
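The file the commenter describes would be a straightforward allowlist. A sketch, with a purely hypothetical "PaidAggregator" agent name standing in for a crawler that has licensed the content:

```
# Hypothetical paid-access policy: the licensed crawler may index,
# every other user-agent is shut out entirely.
User-agent: PaidAggregator
Disallow:

User-agent: *
Disallow: /
```

Robots.txt matching picks the most specific User-agent group, so the paying crawler gets the empty (permit-all) rule while everyone else falls through to the blanket Disallow.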

So how can you argue against a ‘Google tax’ when it seems you basically are explaining the method for implementing it?

@Mike, “I think the problem is that the specific politicians involved here either (A) have no idea at all how the digital economy works, or (B) have been bought”

I have no quibbles with that: it’s pretty much the definition of “politician”, and usually both of the above. But technology and politics really don’t mix: they still haven’t figured out how to tax it, it has eluded regulation so far, and it will most likely continue to do so regardless of the attempts of assorted miscreants to restrict it.

But I think there is a fundamental breakdown in the transfer of knowledge between politicians (pretty much all of whom have studied law or some vague, pointless, wanky humanities subject like “recent history” which weirdly allows them to run the economy) and technology types such as scientists and engineers. I’m not sure how it is in other locales, but hereabouts there does seem to be a really weird and unhealthy dichotomy and there always has been, though it is being thrown into rather sharp focus with special parts of the EU attempting to make creative legislation based on absolutely no understanding of recent technology nor any interest in obtaining it. Shout loud enough and somehow it’ll all magically work, or at least that seems to be the typical methodology. Even better if they involve the tabloids.

But I suspect that’s all a sideshow to distract popular opinion from the minor element of “they’ve been bought”, which is the larger part of the process, even though there’d be more money in actually understanding and embracing the technology. But much of that money will be in other people’s pockets and therefore irrelevant.

A lot of the EU’s actions are motivated by the goal of strengthening the single European market (“Europäischen Binnenmarkt stärken”). The European news publishers are part of the market they are trying to strengthen, and their lobby seems to have a lot of influence in Brussels. Google, on the other hand, is a US-based company which makes a lot of money. Thus, there is some incentive for the EU to try to keep some portion of that money within the European market…

However, I personally share the view that this “Google Tax” is very wrong. Moreover, the news publishers are usually not using robots.txt to block crawlers as you described; on the contrary, they are using schema.org or similar mechanisms to optimise their articles for search engines.

They are actively optimising their sites for web crawlers, then complaining when web crawlers crawl their sites

Well, if you want to sell a product, then it makes sense to optimise your delivery mechanism for that product.

The problem comes if people then use that optimised delivery method to take your product, but don’t pay you for it.

The problem is that some publishers feel that, by including snippets of text from their web-sites, content aggregators such as Google News take eyeballs away from them.

Are you sure that’s their problem? Are you sure it’s not that they think that Google News is using their content, but they are not being paid for it — that ‘eyeballs’ are irrelevant, and the argument is that Google is profiting because of their work (hard to deny, as without it Google News wouldn’t have any content) and therefore should be paying for it?

However, that seems not to be the goal of the publishers. They tried it in Germany:
* Step 1: Pass a law saying that newspaper publishers should receive money from search engines that list their news (2012)
* Step 2.a: Try to force Google to list the news, and therefore to pay -> They failed with that plan: http://www.bundeskartellamt.de/SharedDocs/Meldung/EN/Pressemitteilungen/2014/22_08_2014_VG_Media.html?nn=3591568
* Step 2.b: Grant Google a free license to list the news after all (because being listed is too important for the publishers)

Does this sound like a good plan to repeat across the European Union?

Right. Publishers’ attitude to aggregators (and let’s all remember it’s not just Google) seems to be this: “Thank you for the free advertising you give us; we want you to also give us free money. What? You won’t? Then stop indexing us!” …. six months later: “How dare you stop giving us free indexing?!”

As traditional print media gets into more and more trouble (not just because of falling subscription numbers but also because advertising is shifting to online media), publishers want to extract money from Google.

However, that’s not all there is to it: content aggregators give smaller newspapers an opportunity to be noticed, and that’s something the big publishers don’t like. Even if all the law accomplishes is to abolish the Google News service in Europe rather than produce a payout, it still helps the big publishers fend off competition from lesser-known newspapers.

That’s why something like robots.txt wouldn’t work – it would only help if publishers didn’t want their own content to be on Google; in reality they don’t want to see other publishers’ content on Google, and only a censorship law can help with that.