Siva Vaidhyanathan says that #Amazonfail is about more than crowdsourcing and user tagging; it is about “metadata, cataloging, books, Web commerce, and justice.” A commenter quoted in the New York Times states that “We have to now keep a more diligent eye on Amazon and how they handle the world’s cultural heritage.”

Have we really placed Amazon (and similar companies) in charge of our cultural heritage? Perhaps not directly, but many people have high expectations for these companies’ ability to make information accessible, even if those expectations do not take into account most aspects of information literacy.

But libraries differ from these for-profit companies in how they organize information and why they exist. Most libraries are nonprofit, and their goal is to serve some type of public (what librarians call a patron group). Libraries are generally built on similar organizational systems, such as Library of Congress or Dewey classification, and they are intentionally duplicative in their collections. Not only do libraries often hold the same items, but through interlibrary loan they are tied together in a larger network. And unlike with Amazon and Google, even if a library’s online catalog were not working, a user could still use the organizational system to find useful information.

Another major difference is that libraries, and even Twitter, rely directly on people for the system to work, not on an algorithm, as Amazon and Google do. As we’ve seen with Googlebombing, and likely with #Amazonfail, it is possible for an algorithm to be fooled or to provide inaccurate information.

We rely on Google quite openly, even though the information is sometimes wrong. For example, as of this writing, the top result when googling “four stages of tornadoes” gives the blunt answer “u suck balls” from wiki.answers. This can’t possibly be anywhere close to the correct answer to this scientific question, but it is the one Google’s algorithm is choosing!

“The proposed settlement will make Google the only company in the world with a license to use orphaned works. No other company will be able to buy a similar license because, outside the context of the proposed class-action settlement in this case, there is no one from whom to buy such a license….The settling parties plot a cartel in orphaned works.

… Because exclusive rights in orphaned works do not serve the ultimate purpose of copyright, the public domain has a claim to free, fair use of orphaned works.

We have the right to intervene to present the public domain’s claim to free, fair use of orphaned works. None of the present parties will present our claim….”

Many oppose this bill, including Harvard University, which wrote in a letter opposing the legislation:

The NIH public access policy has meant that all Americans have access to the important biomedical research results that they have funded through NIH grants. Some 3,000 articles in the life sciences are added to this invaluable public resource each month because of the NIH policy, and one million visitors a month use the site to take advantage of these research papers. The policy respects copyright law and the valuable work of scholarly publishers.

[Instead of passing this bill], Congress should broaden the mandate to other agencies, by passing the Federal Research Public Access Act first introduced in 2006. Doing so would increase transparency of government and of the research that it funds, and provide the widest availability of research results to the citizens who funded it.

Google, Amazon, and the publishing industry provide highly valuable and useful tools and services, but we should not allow closed proprietary systems to determine how we address information that belongs entirely or in part to the public, such as the public domain, government publications, and publicly funded studies. And even when “public” information is not at issue, we need to become more wary of relying solely on these systems.

… no single digital archive or repository can ever be as secure and safe as multiple archives, libraries, and repositories. … The nature of digital information is that it can easily be corrupted, altered, lost, or destroyed. It can become unreadable or unusable without constant attention. Relying on any single entity is simply not as safe as relying on multiple organizations. … But this is about more than redundant copies. It is also about relying on different organizations because they have different funding sources, different constituencies, different technologies, and different collections. No single digital collection can ever be as safe as multiple, reliable digital collections.