Monday, July 26, 2004

What to do with a 40 Petabyte iPod

A Conversation with Brewster Kahle:

"Let's consider the question of how much information there is. If you break it down, it turns out to be not that big of a deal. The largest print library in the world, which is the Library of Congress, has about 28 million volumes. A book is about a megabyte. That's just the ASCII of a book, if you put it in Microsoft Word. So 28 million megabytes is 28 terabytes, which fits in a bookshelf and costs about $60,000 right now. Storing books in ASCII is no problem, and the scanned images are more but still affordable.

Scanning books costs between $5 and $20. That's the mechanical cost if you just wanted to scan a book and end up with the images of the pages at high enough resolution that you could print it on a high-end laser printer so it would be a good facsimile at 600 DPI, color—a nice-looking book. So books are doable, in terms of technology.

Now let's take music. It's been estimated that there are about 2 to 3 million albums. In terms of salable units—things that were sold as either 78s, LPs, or CDs—that's the universe of commercial music. If you do the math again, it's a few more of your bookshelves. So you're still not talking about anything daunting.

If you take movies and video, Rick Prelinger [founder of a film collection known as the Prelinger Archives] estimated that the total number of theatrical releases of movies was between 100,000 and 200,000. Again if you do the math, based on DVD quality, you come up with low numbers of petabytes [one petabyte is 1 million gigabytes]."

You'd still have enough storage space left over for your address book, email, and every second of your life in video.
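
As a rough check on the numbers in the quote, here is a minimal Python sketch of the arithmetic. The volume and film counts come from the interview and the 40 petabytes from the title; the 4.7 GB per film (one single-layer DVD) and the decimal units are assumptions, not figures from the interview, and all the names are just illustrative.

```python
# Decimal (SI) units, in bytes. Assumption: the quote uses decimal units.
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

# Figures taken from the quote.
LOC_VOLUMES = 28_000_000           # Library of Congress volumes
BYTES_PER_BOOK = 1 * MB            # "A book is about a megabyte"
MOVIES_LOW, MOVIES_HIGH = 100_000, 200_000  # theatrical releases

# Assumption (not from the quote): one film fits on a single-layer DVD.
BYTES_PER_MOVIE = 4_700_000_000    # 4.7 GB

# From the post title.
IPOD_BYTES = 40 * PB


def books_bytes() -> int:
    """Total bytes for every volume stored as plain text."""
    return LOC_VOLUMES * BYTES_PER_BOOK


def movies_bytes(count: int) -> int:
    """Total bytes for `count` films at the assumed DVD size."""
    return count * BYTES_PER_MOVIE


books = books_bytes()
movies_low = movies_bytes(MOVIES_LOW)
movies_high = movies_bytes(MOVIES_HIGH)
leftover = IPOD_BYTES - books - movies_high

print(f"books:    {books / TB:.0f} TB")
print(f"movies:   {movies_low / PB:.2f} to {movies_high / PB:.2f} PB")
print(f"leftover: {leftover / PB:.2f} PB of {IPOD_BYTES / PB:.0f} PB")
```

Under these assumptions the books come to 28 TB and the films to a bit under a petabyte at the high end, so roughly 39 of the 40 petabytes are still free. A larger per-film size (say a dual-layer DVD) would push the film total into the low petabytes without changing the conclusion.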

Also interesting were the comments about printing library books to give away rather than lending them:
"A 100-page black-and-white book with current toner and paper costs in the United States is $1, not figuring labor costs, rights costs, or depreciation of capital. That's an interesting number, because at a buck a book, it turns out that for a library, it could be less expensive to give books away than to loan them. In his book, Practical Digital Libraries, Michael Lesk reported that it cost Harvard incrementally $2 to loan a book out and bring it back and put it on the shelf. This is not figuring in the warehousing costs and all the building costs. This is just the incremental cost of loaning a book out."
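
The comparison in that quote can be sketched in a few lines of Python. The $1 per 100 pages and the $2 incremental loan cost come from the quote; the idea that printing cost scales linearly with page count is an assumption made here for illustration, and the function names are hypothetical.

```python
# Work in cents to keep the arithmetic exact.
PRINT_CENTS_PER_PAGE = 1   # from the quote: $1 for a 100-page book
LOAN_CENTS = 200           # from the quote: $2 incremental cost per loan


def print_cost_cents(pages: int) -> int:
    """Toner-and-paper cost, assuming cost is linear in page count."""
    return pages * PRINT_CENTS_PER_PAGE


def cheaper_to_give_away(pages: int) -> bool:
    """True if printing a copy costs strictly less than one loan."""
    return print_cost_cents(pages) < LOAN_CENTS


# Page count at which printing costs the same as one loan.
BREAK_EVEN_PAGES = LOAN_CENTS // PRINT_CENTS_PER_PAGE

for pages in (100, 200, 400):
    cost = print_cost_cents(pages) / 100
    print(f"{pages} pages: print ${cost:.2f}, "
          f"give away cheaper: {cheaper_to_give_away(pages)}")
```

With these numbers a 100-page book is cheaper to print than to lend once, and the two costs meet at 200 pages. As in the quote, labor, rights, and capital depreciation are left out on the printing side, and warehousing and building costs are left out on the lending side.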