Since then, I’ve become somewhat obsessed with references to the size of the Library’s collections, and with the use of “a Library of Congress” as a unit of measure. Just check Wikipedia’s list of unusual units of measurement.

Library of Congress Main Reading Room

The ur-number seems to come from a 1997 report by Michael Lesk titled “How Much Information Is There In the World?” In that report he proposes a calculation for the “size” of a digitized book, along with the guesstimate that the Library held 20 million books. To be fair, the report also makes guesstimates about the size of the collections of photographs, video, and audio, and arrives at a figure of 3 petabytes for the collections as a whole. For 1997, this was a very well-informed estimate.

But the numbers that caught the public’s imagination were the ones for books. And that 10 TB figure is everywhere.

So, how many Libraries of Congress does it take to…? Or how many Libraries of Congress can be contained in…?

“Every Six Hours, the NSA Gathers as Much Data as Is Stored in the Entire Library of Congress.” LINK

“Facebook’s photo collection has a staggering 140 billion photos, that’s over 10,000 times larger than the Library of Congress.” LINK

“The [Honeywell India Technology] centre stores some 32 terabytes (32,768 GB) of data. That’s five times more than the world’s largest library – the US Library of Congress.” LINK

“The fiber optic cable is capable of transmitting data at a maximum of 40 gigabits per second from deep-sea locations where gaps of instrument coverage currently exist. For comparison, the entire print collection of the Library of Congress could be transmitted over the link in just more than 30 minutes.” LINK

“There are 25 Petabytes (10^15) created every day and thrown into the internet. This is 70 times larger than the Library of Congress.” LINK

“…it is estimated that the entire collection of the Library of Congress including photos, sound recordings and movies might take 3,000 TB of storage. Assuming $100 each for 2 TB hard drives, the entire book collection of the Library of Congress could be stored on about $1500 worth of hard drives at today’s prices.” LINK

“The upper end of the reference configurations is 96 blades [servers] with 1,152 cores, 9.2 TB memory and 57.6 TB of disk storage, enough disk space to store the entire Library of Congress six times.” LINK

“He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress.” LINK

“The size of Facebook’s data retention database alone would be larger than all of the content that the Library of Congress has put online to date.” LINK

“… in a world where the entire Library of Congress will soon be accessible on a mobile device with search procedures that are vastly better than any card catalog, factual mastery will become less and less important. ” LINK
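Several of the quotes above can be checked with a little arithmetic. The sketch below is a rough check under stated assumptions, not a claim about what the original authors meant. It divides each quoted amount by the multiple claimed to get the size of “one Library of Congress” that the quote implies, and it computes how long the popular 10 TB figure would take over the 40 Gbit/s cable. Decimal units and an ideal, overhead-free link are assumptions, and the source labels in the dictionary are my own shorthand.

```python
# Checking the arithmetic behind a few of the quotes above.
# Assumptions (not from the quoted sources): decimal units
# (1 PB = 1,000 TB, 1 TB = 10**12 bytes), "about twice" read as exactly 2,
# and a link running at its full rated speed with no protocol overhead.

TB_BYTES = 10**12

# source label (my shorthand) -> (amount quoted in TB, "N times the LoC")
ratio_claims = {
    "Honeywell centre": (32, 5),
    "Daily internet data": (25_000, 70),
    "96-blade configuration": (57.6, 6),
    "Factual storage": (500, 2),
}


def implied_loc_tb(total_tb: float, multiple: float) -> float:
    """Size of one 'Library of Congress' implied by an 'N times' claim, in TB."""
    return total_tb / multiple


def transfer_minutes(size_tb: float, link_bits_per_second: float) -> float:
    """Ideal time in minutes to send size_tb over a link of the given speed."""
    return size_tb * TB_BYTES * 8 / link_bits_per_second / 60


for source, (total_tb, multiple) in ratio_claims.items():
    print(f"{source:>24}: one LoC = {implied_loc_tb(total_tb, multiple):6.1f} TB")

# The fiber optic quote: the popular 10 TB print figure over 40 Gbit/s.
print(f"10 TB at 40 Gbit/s: {transfer_minutes(10, 40 * 10**9):.1f} minutes")
```

Taken at face value, the quotes put one Library of Congress anywhere from about 6 TB to more than 350 TB. The cable claim of “just more than 30 minutes” is consistent with the 10 TB figure, which works out to about 33 minutes.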

I have more of these, but I am always looking to add to my growing collection. Please let me know about more by commenting!

I have had a number of people write to ask just how much digital data there really is at the Library of Congress. We don’t provide numbers since the figure changes every day. All I will say is that it is multiple petabytes across all the collections, servers, and tape libraries. That should start some conversations!

235 terabytes of data collected by the US Library of Congress by April 2011

15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress

p. 3

One exabyte of data is the equivalent of more than 4,000 times the information stored in the US Library of Congress. (Footnote 6: According to the Library of Congress Web site, the US Library of Congress had 235 terabytes of storage in April 2011.)
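The report’s ratio is easy to check against its own footnote. A minimal sketch, assuming decimal units:

```python
# Check the report's ratio against its own footnote.
# Assumption: decimal units, so 1 EB = 1,000,000 TB.
EXABYTE_TB = 1_000_000
LOC_APRIL_2011_TB = 235

ratio = EXABYTE_TB / LOC_APRIL_2011_TB
print(f"1 EB / 235 TB = about {ratio:,.0f}")
```

1,000,000 TB divided by 235 TB is about 4,255, so “more than 4,000 times” holds. Binary units (2^60 bytes against 235 × 2^40 bytes) give about 4,462, so the claim survives either convention.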

I meant to post that last week I sat across from Michael Lesk at a dinner event. I teased him that his report had a ruinous effect on my life, and he laughed. He never expected that figure to be the item from the report that took on a life of its own, long after it was correct or relevant.

I read an article saying new methods can store data in synthetic DNA, and that, scaled up, about 2.2 petabytes of data can be stored in a gram of DNA. So I got curious and calculated that for a person of my weight (200 lbs, about 90,718 grams), around 199,580 petabytes could be stored. Great, but it’s mostly just numbers to me. HOW MUCH DATA IS 199,580 petabytes?! Please, I’m so intrigued and yet so unable to give this a workable scale in my mind.
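One way to give the commenter’s figure a workable scale is to express it in the unit this post is about. The sketch below takes the 2.2 PB per gram from the comment as given and uses 1 lb = 453.59237 g with decimal units. It compares the result against three Library of Congress sizes that appear on this page; choosing those three is my own assumption.

```python
# Scale the commenter's DNA-storage figure into "Libraries of Congress".
# Assumptions: 2.2 PB per gram (as quoted in the comment),
# 1 lb = 453.59237 g, decimal units (1 PB = 1,000 TB), and comparison
# sizes taken from figures quoted elsewhere on this page.
GRAMS_PER_POUND = 453.59237
PB_PER_GRAM = 2.2

body_grams = 200 * GRAMS_PER_POUND
capacity_pb = body_grams * PB_PER_GRAM
capacity_tb = capacity_pb * 1000

loc_sizes_tb = {
    "10 TB (popular print figure)": 10,
    "235 TB (April 2011 figure)": 235,
    "6.5 PB (digital holdings cited in the comments)": 6_500,
}

print(f"{capacity_pb:,.0f} PB, or about {capacity_pb / 1000:,.1f} EB")
for label, size_tb in loc_sizes_tb.items():
    print(f"{label}: about {capacity_tb / size_tb:,.0f} Libraries of Congress")
```

That comes to roughly 200 exabytes: about 20 million of the popular 10 TB Libraries of Congress, or about 30,700 of the 6.5 PB figure given later in the comments.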

Arthur: That’s actually a much more difficult question than you think, because there are so many possible ways of calculating how many books have even been written. Written or published? In every language? And what constitutes a book? Is there a minimum length for what constitutes a book? Is a pamphlet a book? Is a serial a book? What about a government publication?

In short, there’s no easy way to come up with a definitive number of how many books have ever been written or published. And since more books are published internationally every minute, the number is constantly changing. Just like the size of the digital collections of the Library of Congress.

OK, yes, this is all very amusing. Let’s point and laugh at the poor, harassed writers and editors who continually “get it wrong.”

Or, the LoC could, you know, actually publish an accurate number? You could then point editors to that when you spot one of these hi-lar-i-ous errors and then they could update their stories. Would that be such a bad use of taxpayer dollars?

Anyway, in the absence of one, I’ll assume from Leslie’s nods and winks that it’s on the order of 10 PB.

There’s definitely no intent to harass tech writers. The issue is that the number changes every single day in an effort that includes multiple sites and dozens of digitization projects and acquisitions of born-digital content. And because older numbers continue to live on the web, it’s easy for someone to find those out-of-date numbers, or make estimates based on calculations from many years ago. When people ask our public affairs office, I can always provide a current answer.

I would say the number _today_ is 6.5 PB, not including the multiple archived copies on tape. We grow at a rate of more than 15 TB/day.
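To put that growth rate in perspective, here is a minimal projection. It assumes linear growth at exactly 15 TB/day from 6.5 PB, with decimal units. The actual rate is given only as “more than” 15 TB/day, so the amounts added below are lower bounds.

```python
# Linear projection of the stated growth rate.
# Assumptions: a 6.5 PB starting point, exactly 15 TB/day (the stated
# rate is "more than" that), decimal units, tape copies excluded.
START_TB = 6_500
DAILY_TB = 15


def holdings_after(days: int) -> int:
    """Projected holdings in TB after `days` days of linear growth."""
    return START_TB + DAILY_TB * days


print(f"Added per year: {DAILY_TB * 365:,} TB")
print(f"Days to add another 6.5 PB: {START_TB / DAILY_TB:.0f}")
print(f"Holdings after one year: {holdings_after(365) / 1000:.1f} PB")
```

At that rate the Library adds about 5.5 PB a year and another 6.5 PB in at most about 433 days. Every single day, it adds more than the popular 10 TB “Library of Congress.”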

“…it is estimated that the entire collection of the Library of Congress including photos, sound recordings and movies might take 3,000 TB of storage. Assuming $100 each for 2 TB hard drives, the entire book collection of the Library of Congress could be stored on about $1500 worth of hard drives at today’s prices.”

There’s no accounting for bad math. That would be $150,000 in hard drives at 2012 prices, without redundancy.
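The commenter’s correction is straightforward to reproduce. Here is a minimal sketch that assumes $100 per 2 TB drive, as in the quote, a single copy with no redundancy, and purchases of whole drives only:

```python
import math

# Hard-drive arithmetic from the quote.
# Assumptions: $100 per 2 TB drive (as quoted), a single copy with no
# redundancy, and only whole drives can be bought.
DRIVE_TB = 2
DRIVE_USD = 100


def drive_cost(collection_tb: float) -> int:
    """Dollar cost of enough whole drives to hold collection_tb once."""
    drives = math.ceil(collection_tb / DRIVE_TB)
    return drives * DRIVE_USD


print(drive_cost(3_000))  # full collection as quoted: 1,500 drives -> 150000
print(drive_cost(30))     # what $1,500 actually buys: 15 drives -> 1500
```

At that price, $1,500 buys 15 drives, or 30 TB. That is enough for the 10 TB print figure but only about 1% of the quoted 3,000 TB.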


Disclaimer

This blog does not represent official Library of Congress communications.
