Speaking of Google questions, has anybody heard anything about the copyright issues facing Google's cache? I have only read snippets of the furor, but it seems there are some legal questions surrounding the cache at Google.

Deb, I split this topic off from the previous thread because it seems to deserve its own thread.

As for an answer, I haven't read all the speculation about it, but personally, it seems like a total non-issue to me. There's no copyright infringement at all. You don't want to be in the cache? Exclude the bot in robots.txt, and/or put a noarchive meta tag on your pages.
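"Exclude the bot" means a robots.txt rule that a compliant crawler reads before fetching anything. The sketch below uses Python's standard `urllib.robotparser` to show how a rule aimed at Googlebot is interpreted. The robots.txt contents, the `example.com` URLs and the `googlebot_allowed` helper are hypothetical, and this only illustrates the standard exclusion logic, not how Google itself processes the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot out of /private/ only.
# Use "Disallow: /" instead to exclude the bot from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text instead of fetching over HTTP
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(url, googlebot_allowed(ROBOTS_TXT, url))
```

Pages the bot never fetches can never end up in its cache, which is why exclusion is the bluntest of the opt-out mechanisms mentioned here.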

There's only copyright infringement if they won't remove it when you ask them to, imo.

All the arguments I have heard claiming that Google has been engaging in copyright infringement have been pretty poor ones, IMHO. You can easily "opt out" of Google's cache, and I think it is safe to assume that the majority of webmasters out there want and encourage Google to spider and cache their sites. Even if Googlebot malfunctioned and continued to cache a site that specifically instructed it not to, I still can't really see anyone suing Google over this and winning.

All search engines have a cache of one sort or another. An indexed page is a cached page; index and cache are synonymous in this context. Google is unusual among search engines in allowing public access to its cached pages. However, Google provides plenty of mechanisms for preventing the cache from being seen.

The Web is built on caches, and caches are inherent to the Web. Many caches aren't even labelled as caches (unlike Google's, which is clearly labelled) and display old content under the URL of a site that has newer content (unlike Google, which displays the content under a Google URL and makes plain that the page may have changed since). People need to be careful what they ask for: the Web would not work without caches.

Just to play devil's advocate here, I'm going to try to bring up some more recent arguments against a cache to show that there are some copyright issues that may be still murky.

Subscription content cached

It seems like a reasonable business model when a site makes information available for a limited period of time, and then moves it to an area of their site where people have to subscribe to see the material. But what if the articles still remain available in the Google cache? And people get to read them for free?

Unauthorized material on website and Google cache

Sometimes people publish things online that they shouldn't. Sometimes corrections need to be made to online newspapers, or sites release information that they shouldn't, such as an access code to software. Here's a copy of a Cease and Desist letter sent from Microsoft to Google:

A common counter-argument to the notion that Google's cache may violate copyright is that the search engine isn't competing with the sites cached, and it isn't reselling anything. It is just providing a service. But the reality is that Google does earn money when they provide services.

When Google gathers indexing information, it does so to enable people to find the original pages. While Google doesn't have express permission, an argument might be made that this use falls under the fair use exception to copyright. It is also possible for people to keep Google from putting pages in the cache, using this meta tag:

<META NAME="robots" CONTENT="noarchive">
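To make the mechanism concrete, here is a minimal sketch of how a crawler might detect that tag before deciding whether to offer a cached copy. It uses only Python's standard `html.parser`; the `RobotsMetaParser` class and `has_noarchive` function are hypothetical names for illustration, and nothing here describes how Googlebot actually parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # CONTENT holds a comma-separated list such as "noindex, noarchive".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noarchive(html: str) -> bool:
    """Return True if the page asks search engines not to provide an archived copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="noarchive"></head><body>Hi</body></html>'
    print(has_noarchive(page))
```

Note that noarchive is separate from noindex: a crawler honouring only noarchive would still index the page and simply withhold the cached copy.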

On Google's page about their indexing spider, they say the following about this meta tag:

Google maintains a cache of all the documents that we fetch, to permit our users to access the content that we indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.

From this quote, we can probably surmise that Google's reason for maintaining a cache that people can view is to allow their users to see sites that might be inaccessible at the moment or where the content has changed.

I'm not sure whether those reasons fit within the fair use doctrine; an argument could be made either way. Google isn't just performing an altruistic function of indexing the web. They earn money showing advertising while doing so. And the cache is a way of providing search even when the pages aren't available or have changed. It's good customer service.

If you use that noarchive meta tag, are the search results for your site influenced negatively? Might it be indexed less frequently? I would say, probably not. But we don't know this for certain. I've seen that issue raised, but it could be baseless. We just don't know.

Conclusion

A good deal of the sites on the web are copyrighted, and when Google indexes those sites, it does so without express permission. While a webmaster can take some affirmative action to prevent Google from indexing pages, Google is still taking without permission. It is possible that Google's taking and indexing of pages could be considered fair use of the copyrighted information on the web sites.

Google's display of other people's copyrighted pages on its server is a question that hasn't been litigated, and may or may not be protected under laws such as the Digital Millennium Copyright Act, or the doctrine of fair use.

It seems to me that if there is a problem here, it is one of those few and special problems that should NOT be fixed. I cannot imagine a "fix" which would not have dire consequences for everyone involved.

I mean really, what gives Google the right to display the cache? Regardless of whether or not people can use noarchive tags, what gives Google a legal right to take those files and display them upon their own server?

They can index sites just as easily without allowing people access to the cache files.

They don't have permission from the copyright holders.

Why is there a cached copy? How does Google benefit by having one? How does a copyright holder benefit? Remember, they don't need to display cached files to show search results.

As a copyright holder, I have a right to decide how other people use my copyrighted pages. It's my call as to whether or not someone else needs permission to use them. When someone reproduces the entire work for their own financial gain, the fair use doctrine may be of questionable significance. I'm not sure that the noarchive tag would work as a valid defense in court.

Of course, I'm not likely to be a plaintiff in a case against Google for copyright infringement on the basis of the cache. But, as we saw from the cease and desist letter above, a Microsoft might be willing to make such a legal challenge.

Recently I have noticed that the age of the cached copy may jump violently without apparently affecting the SERPs. I have a home page which changes frequently and usually shows a cached copy a few days old. Then, for a month, the cached copy will consistently be a month old. Now we are back to very recent cached copies. So I ask: is the cached copy the one used in determining the SERPs? If not, Google should be declaring that this is "a" cached copy and not "the" cached copy.

If this supposition is true, then the value of the cached copy drops considerably. If there is legal flak for Google (as there well may be), I would imagine they would drop it pretty quickly. It is probably a pain to deliver the cached copy and the person in the street may well hardly ever use it. It's only the SEO experts who may find it of possible use.

Regardless of the use of caching by search engines and crawlers of all types, it is the displaying of the cache on Google's site that is the issue here, I think.

Personally, I have no strong opinion on this either way, but the copyright issue seems to be used mainly as another stick to beat Google with. It is usually people with a strong anti-Google agenda who bring this up and really, I don't think they care any more about it than I do.

Human brain: keeps and displays local copies of pages
Computer printer: keeps and displays local copies of pages
Computer memory: keeps and displays local copies of pages
Browser caches: keep and display local copies of pages
Web caches: keep and display local copies of pages
Google: keeps and displays local copies of pages

If the Google cache complied with these, would there be any argument at all? Does it?

All search engines want the searcher to see what they saw. Freshness is a big reason why this does not occur. If Google reindexed more regularly - say within 2 hours - would there be a problem then? Would it be the same problem? What about 1 hour? 1 minute? Instantly? Is it the displaying of the cache that's the problem, or the freshness of the display?

Suppose we flip the whole thing on its head...suppose Google dropped from its search results any page which it suspected was no longer "fresh", i.e. that searchers would not see what they were looking for when they reached the page. This would seem to improve relevancy. Would Webmasters be happy then? What if they dropped the direct link to the page, and *only* allowed the cached page to be viewed, because that was the only page guaranteed to contain what the searcher was looking for (based on on-the-page criteria).

bragadocchio, excellent posts. Like you, I am ambivalent. I tend towards the "no permission required, permission may be revoked" camp, so I'll just post the counters to your counters, without necessarily agreeing with them 100%. [quote]Subscription content cached

It seems like a reasonable business model when a site makes information available for a limited period of time, and then moves it to an area of their site where people have to subscribe to see the material. But what if the articles still remain available in the Google cache? And people get to read them for free?[/quote]What if somebody printed it when it was freely available? What if AOL continued to make it available through their caching proxies? What Google is doing is not unique.[quote]Unauthorized material on website and Google cache

Sometimes people publish things online that they shouldn't. Sometimes corrections need to be made to online newspapers, or sites release information that they shouldn't such as an access code to software.[/quote]IMO, under section 512 of the DMCA (which the Google cache feature pre-dates), Google is not responsible for that. Nor would AOL be if their caching proxies showed the unauthorized material. Nor would normal users be if their browser caches contained the material.[quote]Fair Use, Permission and Fear of Going Cacheless

Even though I'm not sure that those reasons might fit into the fair use doctrine -- it's possible an argument could be made either way. Google isn't just performing an altruistic function of indexing the web. They earn money showing advertising while doing so. And, the cache is a way of providing search even when the pages aren't available or have changed. It's good customer service.[/quote]
Google does not show advertising on cached pages. If it did, you might have a very strong case there. As it is, I am often pestered by resellers for Ask Jeeves and Lycos, trying to sell me banner space on the framed page that they display a site within after someone clicks on their search result. The sales pitch is that the site the searcher is looking at is my competitor (despite the fact that the competitor may also be paying for the privilege). Framing, and especially selling advertising outside the frame: what do you make of that? Why isn't there an uproar about that, when there is about the cached page?

[quote]If you use that noarchive meta tag, are the search results for your site influenced negatively? Might it be indexed less frequently? I would say, probably not. But we don't know this for certain. I've seen that issue raised, but it could be baseless. We just don't know.[/quote]
If I were writing the search engine, I WOULD take notice of the noarchive tag as an indication of the confidence I could place in my on-the-page ranking criteria. The presence of noarchive would lower my confidence. I'm not saying Google does that, only that there could be good reason why it would.

[quote]A good deal of the sites on the web are copyrighted, and when Google indexes those sites, it does so without express permission. While a webmaster can take some affirmative action to prevent Google from indexing pages, Google is still taking without permission. It is possible that Google's taking and indexing of pages could be considered fair use of the copyrighted information on the web sites.[/quote]
IMO this is provided for by Section 512 of the DMCA.

[quote]Google's display of other people's copyrighted pages on its server is a question that hasn't been litigated, and may or may not be protected under laws such as the Digital Millennium Copyright Act, or the doctrine of fair use.[/quote]
IMO this is provided for by Section 512 of the DMCA. But NOT the part that provides for search engines (Section d); rather, the part that applies to System Caching (Section b).

[quote]I mean really, what gives Google the right to display the cache? Regardless of whether or not people can use noarchive tags, what gives Google a legal right to take those files and display them upon their own server?[/quote]
That which gives any cache provider the right to do the same. The Web is full of caches.

[quote]Why is there a cached copy?[/quote]
To allow searchers to see the content that caused that URL to be listed at that position in the search results.

[quote]How does Google benefit by having one?[/quote]
Google lists 10 things Google has found to be true. #1 is "Focus on the user and all else will follow". The Cached page feature is a service for searchers, so Google benefits. You can bet, given the amount of usability testing they do, that if searchers didn't find it useful it would have been dropped long ago.

[quote]As a copyright holder, I have a right to decide how other people use my copyrighted pages. It's my call as to whether or not someone else needs permission to use them. When someone reproduces the entire work for their own financial gain, the fair use doctrine may have questionable significance.[/quote]
The fact that there is no advertising on the cached page makes it tricky to argue that there is any direct financial gain to be had from showing it ... Google is simply providing a service to searchers.

All very good arguments on both sides. The interesting thing about Google is that they know how to set a precedent. By this, I mean things like when they started setting the 30-year cookie, or whatever it was. People couldn't believe it! What? A 30-year cookie?

Then again, why not? It was ASSUMED to be wrong only because nobody had ever thought to do it before, and therefore there was no law against it.

I think it's the same way with the cache. Not to mention the fact that we are an extremely litigious society to begin with!

Google does this stuff not because it CAN, but because nobody ever said they CAN'T. There's the difference.

How often do you replace your PC/browser/OS, or just wipe your cookies? No cookie of value really lasts more than about a year, IMO. 30 years is Google's joke. Would anybody complain if it expired after a month, but was refreshed every time you accessed Google rather than only when you changed your preferences?

I started with a response, but I'm not sure that I have the energy or time to flesh the counter arguments out.

I'll keep it very simple. And short.

There is a difference between using a cache as part of transporting information, and featuring it as part of your advertising sponsored search services for when an actual site is unavailable or changes.

There's a good argument to be made that using a cache in that manner isn't covered under the DMCA.

The other problem is that the DMCA's limitation of liability is for when someone infringes upon someone else's copyright, and the cache mirrors the infringing activity. It's not intended to be a shield in case a service provider is the one infringing.

Added: I don't know if this is the type of information or debate that you expected, Deborah.

[quote]There is a difference between using a cache as part of transporting information, and featuring it as part of your advertising sponsored search services for when an actual site is unavailable or changes.[/quote]

I'm not quite sure what you mean.

I see PPC ads on results pages that have the Cached links on them.

However, I see no advertising on the cached pages themselves. Just a header that clearly explains that it is a cached page, and provides a link to the current page. I suppose that might be classified as advertising by some, but IMO it isn't. IMO some kind of in situ explanation for the cache is essential.

In terms of "transporting information", this is how I see it:

1) the searcher is searching for information
2) the search engine saw some relevant information on a page, some time in the past
3) the search engine provides the page it saw, containing that information
4) the information is thus transported

If the cache was delivered unlabelled, or containing advertising, or if there was no means to prevent it, I would have an issue with it.

As it is, I see the cache as providing a service to searchers. The cache allows searchers to see the content that caused a particular URL to be listed at a particular position in the search results.