It’s easy to feel both amazed and utterly overwhelmed by the amount of information that humans have created in the digital age. And now, researchers have calculated a number to go with those feelings. A big one.

As of 2007, humans had the capacity to store 295 exabytes. An exabyte is 10^18 bytes. If you think of the gigabytes (a billion bytes) in which your hard drive space might be measured, an exabyte is a billion of those gigabytes. Another size comparison: astronomers are, by necessity, designing new information-processing techniques to help them grapple with the coming age of “petascale” astronomy, because they’re starting to get more information than they can handle. “Exa” is the prefix after “peta”; it’s a thousand times more.

Or, simply, a stack of CDs storing 295 exabytes of information would reach beyond the moon.
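A quick back-of-envelope check of that comparison, assuming a standard 700 MB CD that is 1.2 mm thick (both figures are assumptions, not from the study):

```python
# Sanity-check the CD-stack comparison for 295 exabytes.
STORED_BYTES = 295e18        # 295 exabytes
CD_BYTES = 700e6             # 700 MB per disc (assumed)
CD_THICKNESS_M = 1.2e-3      # 1.2 mm per disc (assumed)
MOON_DISTANCE_M = 384_400e3  # average Earth-Moon distance

discs = STORED_BYTES / CD_BYTES
stack_m = discs * CD_THICKNESS_M
print(f"{discs:.2e} discs, stack {stack_m / 1e3:,.0f} km "
      f"({stack_m / MOON_DISTANCE_M:.1f}x the Earth-Moon distance)")
```

The stack works out to roughly half a million kilometers, comfortably past the moon.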

Scientists calculated the figure by estimating the amount of data held on 60 analogue and digital technologies during the period from 1986 to 2007. They considered everything from computer hard drives to obsolete floppy discs, and X-ray films to microchips on credit cards. The survey covers a period known as the “information revolution,” as human societies transition to a digital age. It shows that in 2000, 75% of stored information was in an analogue format such as video cassettes, but that by 2007, 94% of it was digital. [BBC News]

This isn’t just about tallying gargantuan numbers and cheeky comparisons to blow our minds, though. The study by Martin Hilbert that produced this huge estimate comes from a special issue of the journal Science dedicated to figuring out how to deal with the torrent of information that gets larger every day. As mentioned above, astronomers are quickly finding themselves awash in more data than they can process. (Keep your eye out for the upcoming April issue of DISCOVER, which explains bold new projects that will help to solve these problems.)

And stargazers are far from the only ones suffering a scientific case of TMI. In a separate study, Elizabeth Pennisi details the problem for biologists:

A single DNA sequencer can now generate in a day what it took 10 years to collect for the Human Genome Project. Computers are central to archiving and analyzing this information, but their processing power isn’t increasing fast enough, and their costs are decreasing too slowly, to keep up with the deluge.

New processing technology, as well as citizen-science projects like those of the Zooniverse (Galaxy Zoo and company), can help sort through the incoming data, but there’s also the problem of saving and reassessing what already exists.

It may seem odd that particle physicists would ever want to look back at decades-old experiments as they forge ahead with newer, bigger hardware. However, with updated theories and perspectives, physicists can extract new results from old data. Siegfried Bethke, the head of the Max Planck Institute for Physics in Munich, Germany, managed to publish over a dozen papers when he reexamined data from his days as a young physicist at DESY, a high-energy physics lab in Germany. [Ars Technica]

When can I get an exabyte hard drive for my computer? I am tired of running out of space.

Brian Too

Re: “Computers are central to archiving and analyzing this information, but their processing power isn’t increasing fast enough, and their costs are decreasing too slowly, to keep up with the deluge.”

This goes against everything that I know about Information Technology. And I’m in the field. If it’s true, then it is almost certainly a short-term data glut. The same thing happened when digital imaging became popular in the medical community: there was a period of several years when the torrent of data was overwhelming, but nowadays it is routinely handled and no big deal.

http://clubneko.net/ nick

Re: Brian, I don’t think you quite understand the scales involved here, because they are so far removed from the volumes of data that are routinely handled in industry.

295 million terabyte hard drives would be required to store all the information available to the entire world in the year 2007. Assuming that total only doubles every two years, humanity will have nearly 1.2 zettabytes by the end of 2011, or well over a billion terabyte hard drives. That’s a stack more than 18,000 miles high, assuming 1 inch per hard drive: 1.18 billion inches / 12 inches per foot / 5,280 feet per mile. A year after that, the stack of drives would be long enough to circle the Earth at the equator.

Torrent does not begin to describe the rate at which humanity is generating data. The biggest storage solution I could find from a few minutes of searching ( http://www.sgi.com/products/storage/raid/16000.html ) holds 1.2 petabytes in a single rack. You would need over 245,000 racks to store what humanity had accumulated by 2007 if you were using hard drives.
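For anyone who wants to rerun these numbers, here is a minimal sketch; the 1-inch drive thickness, the two-year doubling period, and the 1.2 PB rack are the comment’s assumptions, not independent measurements:

```python
# Sanity-check the drive-stack and rack arithmetic for 295 exabytes.
EB, TB, PB = 1e18, 1e12, 1e15

total_2007 = 295 * EB
drives_2007 = total_2007 / TB                # 1 TB drives for 2007's data
racks_2007 = total_2007 / (1.2 * PB)         # 1.2 PB racks

total_2011 = total_2007 * 2 ** ((2011 - 2007) / 2)  # doubling every 2 years
drives_2011 = total_2011 / TB
stack_miles = drives_2011 * 1 / 12 / 5280    # 1 inch per drive

print(f"2007: {drives_2007:,.0f} drives, or {racks_2007:,.0f} racks")
print(f"2011: {total_2011 / 1e21:.2f} zettabytes, "
      f"a {stack_miles:,.0f}-mile stack of drives")
```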

The LHC alone is expected to record about 15 million gigabytes (15 petabytes – http://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=111420 ) of data PER YEAR, and that is after its trigger systems throw away all but a tiny fraction of what the detectors actually see. Before that filtering, the ATLAS detector alone produces raw data fast enough to fill the best single hard drive you could buy many times over every second. And this says nothing of all the new space- and ground-based telescopes generating gigabytes per day, nor of the supercomputers running simulations that are also pouring out torrents of data.
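The NSF release linked above puts the LHC’s recorded output at roughly 15 petabytes per year. Here is what that works out to as a sustained rate; the 2 TB drive size is an assumption for scale:

```python
# Average out 15 PB/year of recorded LHC data as a sustained rate.
PB = 1e15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

rate = 15 * PB / SECONDS_PER_YEAR  # bytes per second, averaged
drive = 2e12                       # ~2 TB consumer drive (assumed)
print(f"{rate / 1e6:.0f} MB/s sustained; a 2 TB drive fills in "
      f"{drive / rate / 3600:.1f} hours")
```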

Yes, we will invent new storage solutions that make our current technology look like floppy disks, but our capacity for collecting and generating that data will be increasing just as fast if not faster.

J.L.Lee

MPAA and the religious censors will be reviewing the LHC data to search for improper interactions between sub-atomic particles, or possibly anything Sony thinks it owns! Anyway, how do you defrag 295 exabytes?

Fernando

The silly inevitable question, how much of those 295 exabytes is porn?

Russ

The real issue is discarding that which is obsolete or irrelevant. How much of that data is worth storing? And how would we go about sorting through all of that knowledge? We all know in our personal lives that finding what we know we have can be the hardest thing. We are at informational overload. Imagine what has been learned and lost in human civilization preceding us.

Chris

And I wonder how many of those exabytes are filled with porn?

Brian Too

@3. nick,

So then the sky is falling? I think not.

Over and over again we’ve heard about how the next information wave was going to overwhelm our systems. Well, sometimes it happened, briefly, but then the IT industry learned to cope, and clients had to adjust their expectations a bit.

Then it was video. Advanced inter-frame compression algorithms (mainly MPEG-2 and later -4) and improved mass storage systems helped there.

Next it was the mania to save “everything”, including the Internet. Improved mass storage systems stepped up to the plate, combined with powerful search mechanisms.

See a pattern developing? The LHC is churning out data to be sure, but they’ve developed a tiered filtering and extraction system. Clustered management systems are quite capable of dealing with the remainder.

You see the ultimate governor on the system is that if our computers are not capable, then the implementers stop trying. For a while. Then the technology catches up and the systems architects rub their hands together, thinking gleefully, “it might be possible if we approach it this way…”

Of course if you insist, then by all means sing out, “the sky is falling!”