Last week, a group of newspaper and magazine publishers signed a declaration stating that "Universal access to websites does not necessarily mean access at no cost," and that they "no longer wish to be forced to give away property without having granted permission."

We agree, and that's how things stand today. The truth is that news publishers, like all other content owners, are in complete control when it comes not only to what content they make available on the web, but also who can access it and at what price. This is the very backbone of the web -- there are many confidential company web sites, university databases, and private files of individuals that cannot be accessed through search engines. If they could, the web would be much less useful.

For more than a decade, search engines have routinely checked for permissions before fetching pages from a web site. Millions of webmasters around the world, including news publishers, use a technical standard known as the Robots Exclusion Protocol (REP) to tell search engines whether or not their sites, or even just a particular web page, can be crawled. Webmasters who do not wish their sites to be indexed can and do use the following two lines to deny permission:

User-agent: *
Disallow: /

If a webmaster wants to stop us from indexing a specific page, he or she can do so by adding '<meta name="googlebot" content="noindex">' to the page. In short, if you don't want to show up in Google search results, it doesn't require more than one or two lines of code. And REP isn't specific to Google; all major search engines honor its commands. We're continuing to talk with the news industry -- and other web publishers -- to develop even more granular ways for them to instruct us on how to use their content. For example, publishers whose material goes into a paid archive after a set period of time can add a simple unavailable_after specification on a page, telling search engines to remove that page from their indexes after a certain date.
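The permission check described above can be sketched with Python's standard library, which ships a parser for the Robots Exclusion Protocol. The user agent name and paths below are illustrative, not taken from any real site:

```python
# A minimal sketch of how a crawler consults robots.txt before
# fetching, using Python's standard-library REP parser.
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the article, denying all crawlers.
robots_txt = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# With "Disallow: /" every path is off-limits to every crawler.
print(parser.can_fetch("Googlebot", "/news/article.html"))  # False
print(parser.can_fetch("*", "/"))                           # False
```

A well-behaved crawler runs this check for every URL and simply skips any path the file disallows.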

Today, more than 25,000 news organizations across the globe make their content available in Google News and other web search engines. They do so because they want their work to be found and read -- Google delivers more than a billion consumer visits to newspaper web sites each month. These visits offer the publishers a business opportunity, the chance to hook a reader with compelling content, to make money with advertisements or to offer online subscriptions. If at any point a web publisher feels as though we're not delivering value to them and wants us to stop indexing their content, they're able to do so quickly and effectively.

Some proposals we've seen from news publishers are well-intentioned, but would fundamentally change -- for the worse -- the way the web works. Our guiding principle is that whatever technical standards we introduce must work for the whole web (big publishers and small), not just for one subset or field. There's a simple reason behind this. The Internet has opened up enormous possibilities for education, learning, and commerce, so it's important that search engines make it easy for those who want to share their content to do so -- while also providing robust controls for those who want to limit access.

I believe the problem is not Google, search engines, or crawlers; it's the content business model that has to be reviewed. Fragmented advertising alone does not pay for quality content (created by someone who makes a living doing so). Google will soon become the largest content seller once publishers start offering subscriptions or micropayments for content -- that's my end game.

A page blocked with robots.txt can still appear in the SERPs, but Google won't have crawled it. The snippet would use either link anchor text or the DMOZ title for the title, and the DMOZ description if available.

With noindex, Google will still crawl the page, and its links can confer PageRank to other pages. The page won't appear in the SERPs.

For noindex to work, Google has to have access to the page. So if you combine the robots.txt Disallow directive with meta noindex, Google will obey robots.txt and thus can't read the noindex. The page can still appear in the SERPs, with a title taken from anchor text or DMOZ.
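This interaction can be illustrated with a small sketch: a crawler that obeys robots.txt never downloads a disallowed page, so it never sees the noindex tag inside it. The function names and paths here are hypothetical, not Google's actual pipeline:

```python
# Sketch of the Disallow vs. noindex conflict: a blocked page's
# meta tags are invisible to a compliant crawler.
from urllib.robotparser import RobotFileParser

robots_txt = ["User-agent: *", "Disallow: /private/"]
page_html = '<meta name="robots" content="noindex">'

parser = RobotFileParser()
parser.parse(robots_txt)

def crawl(path):
    """Return the page body, or None when robots.txt forbids fetching."""
    if not parser.can_fetch("*", path):
        return None       # blocked: the noindex tag is never read
    return page_html      # allowed: the crawler can see the noindex

blocked = crawl("/private/page.html")   # None -> URL may still surface via anchor text
allowed = crawl("/public/page.html")    # body fetched -> noindex honored, page dropped
print(blocked is None, "noindex" in allowed)
```

In other words, only the fetched page can tell the crawler to drop it from the index; the blocked page stays an opaque URL that anchor text alone can put in the SERPs.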

Well, Google invades sites; many people do not know how to block indexing.

We have seen cases on the Internet of sites being indexed by Google without permission even before they were launched.

I believe that the way Google works is, in some respects, deeply flawed, including its exploitation of other people's content. Small businesses cannot grow when so much of their budget goes to advertising within the search engines (where their content is used so wrongly).

I cannot be unfair and call Google a bad company. Most of us believe the internet goes beyond online advertising, yet the business model Google uses is no different from old media's, and I believe it will become obsolete just as old media has.

Besides that, if you have good content you are competing with low-quality sites that are only there because they pay for links. Even with an entire verification process inside AdWords, quality is still very imperfect: for many searches there is little real content in the results, and you can see a clear difference between the quality of the service and the quality of what surfaces. Even a free listing could be a better way to find a company than whoever is paying the bills.

I've been in the newspaper business since 1989. I was on the internet before most newspapers were, before there was even a World Wide Web. I always thought that as soon as publishers started to realize that they were losing money because they had content available for free they would just pull it back.

Some have and some haven't. The NYT has gone back and forth. The Wall Street Journal has always had a pay wall. Neither are doing particularly well.

That's because it's not about the price charged for the content; it's about selling the audience.

Newspapers are in trouble because they forgot how to sell audiences to advertisers.

When I worked as a newspaper circulation executive my goal was to make enough revenue to cover variable costs of printing and distribution, in essence making it free to distribute. Advertising sales had to cover all of the overhead - salaries, benefits, building, maintenance, presses, trucks, news gathering, etc.

This is not unlike TV, radio or even the internet. Radio and TV always gave away content because once the studio was built, the contract with the talent was signed and the transmitter was in place, there was very little cost to deliver the content.

However that content was sponsored so all of those other fixed costs could be paid with enough left over for profit, pension plans and a decent Christmas party.

This is what newspapers need to relearn. They can block Google or any other search engine from indexing their content so they can charge for it, but that's just not going to generate enough money to run the rest of the operation. Newspapers need to engage audiences and sell those audiences to people that want to reach them. Until they start doing that again they will never be profitable. People are just not going to pay enough for content to make up for all the losses in advertising dollars newspapers have seen.

Hiding from Google isn't going to help. Finding new ways to delight and amaze audiences, and proving to advertisers that your audience is delighted, will.

Well, I'm in the news industry (and let me tell you, it's an old industry) and I also agree completely with the above.

That said, it's worth acknowledging that the newspaper/magazine publishers do have a valid argument: Google is more valuable because of their content, and maybe there ought to be a half-measure between "block all bots" and "search engines crawl everything for free."

Maybe Google thinks there's no profit for them between those two extremes. And they're probably right. But let's not pretend traditional publishers are the only ones making a choice here.

@Brendan: What should the news industry respond to that polite and *right* article? "Sorry, we didn't do our homework and didn't read the f....g manual?"

Two lines of code. As easy as that. Nothing more. End of discussion.

I think the news industry has to learn that they can earn money on the web too, but under new and different conditions than in the last decades. They have to change, and they have to have the willingness to change. Otherwise, their companies will die.

But what if you are a publisher of original news and feature stories and Google refuses to carry your content whilst carrying that of your competitor publications? They have an anti-European, pro-American bias. They should either publish all news stories in a subject area or none, and be impartial. Currently Google is myopic and very US-oriented. They should be barred from all European news media until they operate fairly.

I heartily concur with your article. If publishers don't wish their content to be online, there is a simple solution: take it off.

Underlying this, I wish Google would NOT serve links to material that is hidden behind toll access barriers, or at least make a browsing option that hides this from view.

When I use the internet I want FREE information. I know that there is more information out there on a topic published commercially, but if I wanted to buy that I would be looking in a library or bookshop.

It seems to me publishers want it both ways: they want Google to generate demand for their product by placing it in searches, then hide it behind toll access so they can charge for it. This isn't how the internet works: the internet is primarily about free access to information. If you don't want to play in this game, that's fine: please do go and consign yourself to irrelevance.

Totally agree. No publisher is forced to have a website, so get off the internet if you don't like it, but don't try to force others to follow your dead rules. The rules of the internet have been clear since Sir Tim Berners-Lee created the WWW a long time ago, and no businessman has the right to change them.

It's not about Google; the name is just used as a synonym for the internet by people who cannot adapt to new distribution channels.

They claim that their "quality journalism" has to be protected as their source of income. The internet, and especially the search engines, help us consumers find the source of most news: the big agencies like Reuters! Much of the "quality journalism" is copied word for word from the agencies' tickers. In the days of dead-tree publishing we rarely found out.

A lot of the rest of the "quality journalism" is copied from blogs and social websites, often without credit to the author, and with no payment thrown in for good measure.

I have to subscribe to the printed edition of the local paper just so I can subscribe to the online edition at extra cost; this doesn't make any economic sense! Especially when said paper has little more to offer than the agency news.

Oh, and they have a sports journalist who used one of my pictures without my permission!

I have a site that I did "protect" from indexing, from the very beginning, with a robots.txt file, i.e.:

User-agent: *
Disallow: /

This robots.txt file is still there. However, my site content is showing up on Google.

Whilst I agree that publishers should protect their content if they don't want Google to index it, it seems to me, at this point, that Google has ignored the robots.txt file. Can you help me figure out how THAT happened and what I can do to have it fixed?
