May it please the Italian court. Google does not ban newspapers that opt-out of Google News from being in Google at all, as a new complaint from Italian publishers alleges.

It’s not actually a court that’s going to ponder this allegation. It’s Italy’s AGCM, which appears to be akin to the US Federal Trade Commission, a government organization that evaluates antitrust behavior. The Italian Federation Of Newspaper and Periodical Publishers (FIEG) has asked the AGCM to investigate Google.

The agency has posted news (PDF in Italian, later here maybe in English) that it will examine whether Google is harming Italian papers by not being transparent enough about how news rankings work and by banning news sites from all of Google if they refuse to be in Google News.

The New York Times, Bloomberg (Google’s Italian offices were actually searched!) and the Dow Jones have stories (Dow Jones says the investigation will conclude by Oct. 2010). The New York Times piece is better on the specifics, and that’s where I’ll do my counterpoints to the charges.

Allegation: Google Doesn’t Tell Us How To Rank Better

From Carlo Malinconico, president of FIEG, in the New York Times article:

Because Google does not disclose the criteria for ranking news articles or search results, he said, newspapers are unable to hone their content to try to earn more revenue from online advertising. Ad revenue on the Web is directly proportional to the size of the audience, which is heavily influenced by search or Google News rankings.

Google actually does disclose a variety of ranking factors. It (along with Yahoo and Microsoft) does not provide the exact recipe for rankings, for the simple reason that if it did, havoc would likely follow. Many sites would reverse engineer the system and get whatever crud they wanted into the top results.

That also means that the Italian papers wouldn’t gain anything. Other people with greater search engine optimization (SEO) skills and efforts would easily outrank them. Ironically, by keeping its exact ranking system secret, Google protects quality publishers rather than harms them.

In addition, by running a separate news search area, Google actually helps the newspapers. The news area allows only selected news publishers to be included in a special section where people may be seeking news articles. In turn, those papers receive traffic that’s denied to many other web sites.

Many newspapers acknowledge that they get lots of traffic from Google News but still somehow think Google owes them money on top of the visitors they get for free. Ironically, that doesn’t seem to be the case with the Italian papers. They’re complaining (at least according to the New York Times summary) that without Google’s secret sauce to determine news rankings, they can’t get even more traffic than they already receive.

Finally, no newspaper editor of any quality would allow an external interest to walk into their newsroom and demand to know exactly how to guarantee a front page article about whatever they want. But that’s what the Italian papers seem to desire. Google has an editorial process for producing rankings, one that’s done using automation — but the papers seem to want to bypass those editorial decisions.

For more about how Google rewards newspapers, see these two key articles from me:

“Publishers provide much of the content on the Internet, but they get nearly nothing for it,” he said. “This is not fair, in our opinion. Our feeling is we lose more than we gain.”

If newspaper execs are scratching their heads about why more people involved with the internet aren’t stepping up to defend their viewpoints, attitudes like the one in the quote above are the reason.

I don’t have the stats — I’m not sure if the stats are out there — but I doubt news content is most of the web or even “much” of the content on the web.

Certainly I know from having covered the search space for so long — and having talked with thousands of people over the years at conferences — that news publishers are well in the minority of those publishing online.

Italian publishers contend that they are punished if they drop out of Google News, saying they are then automatically excluded from the search engine, too. Google denies this.

It’s statements like this that make you wish the Italian antitrust board would toss out the entire investigation just for sloppiness. It also makes you wonder if any of the execs from any of the newspapers that are part of FIEG even talk to the technical people within their news organizations. I guarantee you at least one of those Italian papers has an SEO on staff whose job is to generate traffic from Google. And those SEOs would be rolling their eyes at this statement.

By default, most sites in Google Web Search are NOT in Google News Search. Google has millions, perhaps billions, of sites that get included in Google Web Search. Over 25,000, Google says, are in Google News. By default, then, it is clearly the case that you can be in Google Web Search and not be in Google News.

Papers lucky enough to be included in Google News can opt out:

If you don’t want your site to be included in Google News, please let us know and we’ll remove it from our index.

Keep in mind that the removal process normally takes a few days and that your articles already included in Google News will expire after 30 days.

Note that as best I can tell, using that form doesn’t even require installing a special file on a web server. The publisher just says drop me, and they get dropped — only from Google News. The form doesn’t say “you’ll be dropped from all of Google.”

Don’t want to use the form? Then you can use a robots.txt file, as Google says:

Googlebot obeys robots.txt files and robots Meta tags for Google Search as well as for Google News.

Note how specific it is. The robots.txt file — a way to automatically opt some or all of your content out of being listed in Google — operates for Google Web Search and Google News independently. It’s like there are two light switches. You can turn off the Google News switch but leave the Google Web Search switch on.

[Postscript: Google points out to me that those using the robots.txt system of blocking will indeed be knocked out of Google Web Search. I should have caught this. In the past, Google had a separate crawler used for its news indexing service that was independent from its web search crawler. You could block one or both. The system has been unified around a single crawler without the separate “switches,” which is unfortunate, and it’s one reason why the news publishers are potentially confused. As explained, they can be totally out of Google News yet stay within Google Web Search if they contact Google directly. It’s only if they use the automated robots.txt option (or a similar meta robots option) that they’d get knocked out of both. It also means they can’t selectively hold some articles back, allowing them in Google News but not in Google Web Search. I suspect this might emerge as part of the issues when people read the complaint in more detail. Since the complaint is in Italian, which I don’t speak, I’m relying on the English-language reports.

Postscript 2: Google checked further and says there was never a way to automatically block just its news spider. I could have sworn there was. Hopefully, they’ll add that support in the future.]
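To make the single-crawler point concrete, here’s a rough sketch of what a publisher’s robots.txt block looks like from the crawler’s side. It uses Python’s standard urllib.robotparser module to read a made-up robots.txt file. The site name, article URL and “SomeOtherBot” are all hypothetical, and this only shows how a generic robots.txt parser reads the rules, not how Google’s own systems work internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site.
# It shuts out Googlebot entirely and leaves every other crawler alone.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL, for illustration only.
ARTICLE_URL = "https://news.example.com/2009/08/some-article.html"

googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", ARTICLE_URL)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL)

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The rule names one crawler, not one Google service. Since that single crawler feeds both Google Web Search and Google News, a block like this can’t tell the two apart, which is why staying in web search while leaving Google News means contacting Google directly.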

As a reminder, a group of Belgian papers back in 2006 sued Google to be excluded from all of its search properties (rather than just opting out using robots.txt). In the end, they won $77 million in damages (though I think this is still on appeal). But the papers also ultimately came crawling back to Google for inclusion.

The saga of newspapers versus Google is far from over. But in this round, I’d say the newspaper industry will have further shot itself in the foot.

As we explained to the FIEG when we met them earlier this year, Google News has over 25,000 sources from around the world. All of these news providers–like any website publisher–are in complete control when it comes to whether they want to be found on Google services. So if a news publisher doesn’t want to be found on Google.com, Google.it or any other reputable search engines, it can prevent indexation automatically via a universally accepted Internet standard called robots.txt. Publishers also have a range of other ways of controlling how their content appears (or doesn’t). One such option is for a publisher to continue to appear in Google web search, but not in Google News. In that case, all they need to do is contact us to be removed. In fact, we met with several Italian publishers and representatives of FIEG just this summer to explain these options.