Google Library--Scanning Back the Index

At first, it sounded like a wonderful idea. Google was going to take
the libraries of five major universities and scan them, making them
fully indexable and searchable. The project would allow anyone to find a
book on the topic they were looking for, narrow down a quotation to the
exact title and author where it originated and, overall, advance
Google's goal of organizing the world's information. Works in the
public domain would be available in their entirety while copyrighted
works would show only a sentence or two of context as well as the title
page. Additionally, copyrighted works would have links to booksellers
where the title could be purchased. The idea seemed bulletproof and
students everywhere rejoiced.

Unfortunately, the publishers disagreed. Unlike Project Gutenberg,
which has over 16,000 public domain works available on the web,
Google's venture extends into copyrighted content. The publishers feel
threatened by it, claiming that the scanning is infringement. In response to the outcry,
Google put the scanning of the copyrighted materials on hold and offered
the publishers the chance to opt out of having their works scanned.
Google asserts that the scanning is fair use because it
will display only a snippet of the copyrighted material. Publishers,
including the more than 300 members of the Association of American
Publishers (AAP), disagree and are suing Google to prevent the search
company from continuing.

While publishers yell and scream about Google, the Open Content Alliance
has been garnering much better publicity. Initially launched by a
consortium of companies including Yahoo and recently joined by MSN, the
OCA is also scanning public domain and Creative Commons-licensed work,
but it handles copyrighted works differently. The OCA only
scans books that have been opted into the program, saving them from the
legal battles currently plaguing Google.

Though the publishers are outraged, it's really not that hard to
understand why a search engine would believe that it is entirely
acceptable to require the copyright holder to opt out. After all, they
do it every day. Google, Yahoo, MSN, and in fact all the search engines
hold the view that unless you tell them they can't have your copyrighted
work, they will happily index it and even cache it, making it available
whether you want them to or not. That's the entire purpose of the
robots.txt file, which itself is often ignored by search engine spiders.
This overall approach makes Google's attitude quite understandable.
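
For readers who haven't seen the opt-out mechanism up close, here is a minimal sketch of how a well-behaved spider might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user agent, the example.com URLs and the may_fetch helper are all made up for illustration and don't describe how any particular search engine actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one (made-up) spider out entirely,
# and keep every other spider out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "http://example.com/page.html"),
        ("OtherBot", "http://example.com/page.html"),
        ("OtherBot", "http://example.com/private/notes.html"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

Note that the default is permission: a spider that finds no rule against it goes ahead and indexes. The file only helps if the site owner writes it and the spider chooses to honor it, which is exactly the opt-out model at issue.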

As Danny Sullivan has pointed out before, the fact that these two
situations parallel each other is also what makes this case so important
to the web as a whole. Danny's case is simply this--if the library
digitization
project is copyright infringement, then every page ever indexed by
Google is also infringement and the Wayback Machine is most certainly
infringing. Some may claim, and indeed the AAP is doing so, that web
pages are different but that simply isn't true. Copyright laws apply
equally to web or book content. Google CEO Eric Schmidt equated the two
in a recent article in the Washington Post, reprinted on the
Google Blog:

"Even those critics who understand that copyright law is not absolute
argue that making a full copy of a given work, even just to index it,
can never constitute fair use. If this were so, you wouldn't be able to
record a TV show to watch it later or use a search engine that indexes
billions of Web pages."

Remember that the AAP is concerned not with the republishing of the
works on the web--indeed Google has made it very clear that they won't
be doing that--but with the scanning in the first place, the making of
the digital copy. That is equivalent, at the very least, to caching, and
could extend to search engine indexes as well.

So what does it all mean? At the moment, it's easy to opt out of being
indexed (though it's hard for an SEO to imagine why anyone would want
to). The trouble is that if Google's efforts are ruled infringing,
then we are looking at a whole new web. Online publishers will demand
the same rights as their traditional counterparts. It could result in
much smaller, less relevant indexes.

On the bright side, it will be much easier to be ranked number one in an index made up only of those who want to be in.
