1: The Text Deluge

According to one estimate, human beings created some 150 exabytes (billion gigabytes) of data in 2005 alone. This year, we will create approximately 1,200 exabytes. The Library of Congress recently announced its decision to archive Twitter, an archive that grows by some 50 million tweets per day. A search in Google Books for the phrase “slave trade” in July 2010, for example, returned the following: “About 1,600,000 results (0.21 seconds).” Scholars once accustomed to studying a handful of letters or a couple hundred diary entries now face massive amounts of data that cannot possibly be analyzed in traditional ways.

The trend toward an ever-increasing deluge of information raises the question posed by Gregory Crane in 2006: “What do you do with a million books?” “My answer to that question,” wrote Tanya Clement and others in a 2008 article, “is that whatever you do, you don’t read them, because you can’t.”

Luckily, scholars need not adhere to traditional methods. Increasingly, humanities scholars are adopting digital tools to analyze large quantities of data in new ways. New forms of analysis have emerged as computer processing has advanced, allowing greater maneuverability within large bodies of data: the processing power of a mainframe computer from a couple of decades ago now fits inside an iPhone. Beyond raw processing power, other advances have improved access to data and the speed and ease of transferring it. Text mining allows scholars to cope with this massive quantity of data, drawing out patterns that may not be visible to human readers.
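To make the idea of text mining concrete, here is a minimal sketch of one of its most basic operations: tallying word frequencies across a corpus. The tiny sample “corpus” and the stopword list are invented for illustration only; real projects work with millions of documents and far more careful tokenization.

```python
# A minimal sketch of a basic text-mining step: counting word
# frequencies across a small corpus. The sample texts and the
# stopword list below are illustrative assumptions, not real data.
import re
from collections import Counter

def word_frequencies(documents, stopwords=frozenset({"the", "of", "a", "in", "and", "to", "was"})):
    """Tokenize each document, drop common stopwords, and tally word counts."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())  # crude lowercase tokenizer
        counts.update(t for t in tokens if t not in stopwords)
    return counts

corpus = [
    "The slave trade was debated in Parliament.",
    "Abolition of the slave trade became a cause.",
]
print(word_frequencies(corpus).most_common(3))
```

Even this toy example hints at the pattern-finding a reader cannot do at scale: repeated across a million books, such counts reveal which terms dominate a period or a genre.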