Digital Data and DNA

One of the benefits of constantly proliferating information is that we’re getting better and better at storing lots of stuff in small spaces. I love the fact that when I travel, I can carry hundreds of books with me on my Kindle, and to those who say you can only read one book at a time, I respond that I like the choice of books always at hand, and the ability to keep key reference sources in my briefcase. Try lugging Webster’s 3rd New International Dictionary around with you and you’ll see why putting it on a Palm III was so delightful about a decade ago. There is, alas, no Kindle or Nook version.

Did I say information was proliferating? Dave Turek, a designer of supercomputers for IBM (world chess champion Deep Blue is among his creations) wrote last May that from the beginning of recorded time until 2003, humans had created five billion gigabytes of information (five exabytes). In 2011, that amount of information was being created every two days. Turek’s article says that by 2013, IBM expects that interval to shrink to every ten minutes, which calls for new computing designs that can handle data density of all but unfathomable proportions.
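To put Turek's figures in perspective, here is a back-of-the-envelope sketch of the implied data rates. It assumes decimal units (one exabyte is 10^18 bytes) and treats the five exabytes as being produced at a steady pace over each interval; the variable and function names are my own, chosen only for illustration.

```python
# Rough arithmetic on the figures quoted above; not an official IBM calculation.
EXABYTE = 10**18  # bytes, decimal convention (assumption)

volume_bytes = 5 * EXABYTE  # five billion gigabytes


def rate_bytes_per_second(total_bytes, interval_seconds):
    """Average creation rate if total_bytes appears evenly over the interval."""
    return total_bytes / interval_seconds


two_days = 2 * 24 * 60 * 60   # seconds in two days
ten_minutes = 10 * 60         # seconds in ten minutes

rate_2011 = rate_bytes_per_second(volume_bytes, two_days)
rate_2013 = rate_bytes_per_second(volume_bytes, ten_minutes)
speedup = two_days / ten_minutes  # how much faster the projected pace is

print(f"2011 pace: {rate_2011 / 1e12:.1f} terabytes per second")
print(f"2013 projection: {rate_2013 / 1e15:.2f} petabytes per second")
print(f"Speedup: {speedup:.0f}x")
```

Going from "every two days" to "every ten minutes" means the same five exabytes would arrive 288 times faster, which is the jump that new computing designs are being asked to absorb.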

A recent post on Smithsonian.com’s Innovations blog captures the essence of what’s happening:

But how is this possible? How did data become such digital kudzu? Put simply, every time your cell phone sends out its GPS location, every time you buy something online, every time you click the Like button on Facebook, you’re putting another digital message in a bottle. And now the oceans are pretty much covered with them.

And that’s only part of the story. Text messages, customer records, ATM transactions, security camera images…the list goes on and on. The buzzword to describe this is “Big Data,” though that hardly does justice to the scale of the monster we’ve created.

The article rightly notes that we haven’t begun to catch up with our ability to capture information, which is why, for example, so much fertile ground for exploration can be found inside the data sets from astronomical surveys and other projects that have been making observations faster than scientists can analyze them. Learning how to work our way through gigantic databases is the premise of Google’s BigQuery software, which is designed to comb terabytes of information in seconds. Even so, the challenge is immense. Consider that the algorithms used by the Kepler team, sharp as they are, have been usefully supplemented by human volunteers working with the Planet Hunters project, who sometimes see things that computers do not.

But as we work to draw value out of the data influx, we’re also finding ways to translate data into even denser media, a prerequisite for future deep space probes that will, we hope, be gathering information at faster clips than ever before. Consider work at the European Bioinformatics Institute in the UK, where researchers Nick Goldman and Ewan Birney have managed to code Shakespeare’s 154 sonnets into DNA, in which form a single sonnet weighs 0.3 millionths of a millionth of a gram (0.3 picograms). You can read about this in “Shakespeare and Martin Luther King demonstrate potential of DNA storage,” an article in The Guardian on their just-published paper in Nature.

Image: Coding The Bard into DNA makes for intriguing data storage prospects. This portrait, possibly by John Taylor, is one of the few images we have of the playwright (now on display at the National Portrait Gallery in London).

Goldman and Birney are talking about DNA as an alternative to spinning hard disks and newer methods of solid-state storage. Their work is given punch by the calculation that a gram of DNA could hold as much information as more than a million CDs. Here’s how The Guardian describes their method:

The scientists developed a code that used the four molecular letters or “bases” of genetic material – known as G, T, C and A – to store information.

Digital files store data as strings of 1s and 0s. The Cambridge team’s code turns every block of eight numbers in a digital code into five letters of DNA. For example, the eight digit binary code for the letter “T” becomes TAGAT. To store words, the scientists simply run the strands of five DNA letters together. So the first word in “Thou art more lovely and more temperate” from Shakespeare’s sonnet 18, becomes TAGATGTGTACAGACTACGC.
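For readers who want to see the idea of "eight binary digits in, five DNA letters out" in action, here is a minimal sketch. It is not the team's actual code: the Guardian excerpt does not give their lookup table, so this version makes two assumptions of its own. First, it writes each character's ASCII value as five base-3 digits (which only covers values below 243, enough for plain English text). Second, it picks each DNA letter from the three bases that differ from the previous letter, so no base ever repeats back to back. The function names and the starting base "A" are hypothetical, and the output will not match TAGAT.

```python
# Illustrative only: a toy text-to-DNA code, not the published scheme.
BASES = "ACGT"
TRITS_PER_CHAR = 5  # 3**5 = 243 possible values per block of five letters


def byte_to_trits(value):
    """Return value as five base-3 digits, most significant first."""
    if not 0 <= value < 3 ** TRITS_PER_CHAR:
        raise ValueError("value must be below 243 for a five-trit block")
    digits = []
    for _ in range(TRITS_PER_CHAR):
        digits.append(value % 3)
        value //= 3
    return digits[::-1]


def encode(text, start="A"):
    """Turn ASCII text into DNA letters, five per character."""
    prev = start
    letters = []
    for byte in text.encode("ascii"):
        for trit in byte_to_trits(byte):
            # Only the three bases that differ from the previous one are allowed.
            choices = [b for b in BASES if b != prev]
            prev = choices[trit]
            letters.append(prev)
    return "".join(letters)


def decode(dna, start="A"):
    """Invert encode(): read five letters at a time back into characters."""
    prev = start
    trits = []
    values = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
        if len(trits) == TRITS_PER_CHAR:
            value = 0
            for trit in trits:
                value = value * 3 + trit
            values.append(value)
            trits = []
    return bytes(values).decode("ascii")


word = "Thou"
strand = encode(word)
print(strand, len(strand))
print(decode(strand))
```

As in the Guardian's example, the four letters of "Thou" come out as a 20-letter strand, and decoding the strand recovers the original word. Avoiding repeated bases is a design choice made here for the sketch; whether and how the real code does this is not stated in the excerpt above.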

The converted sonnets, along with DNA codings of Martin Luther King’s ‘I Have a Dream’ speech and the famous double helix paper by Francis Crick and James Watson, were sent to Agilent, a US firm that makes physical strands of DNA for researchers. The test tube Goldman and Birney got back held just a speck of DNA, but running it through a gene sequencing machine, the researchers were able to read the files again. This parallels work by George Church (Harvard University), who last year preserved his own book Regenesis via DNA storage.

The differences between DNA and conventional storage are striking. From the paper in Nature (thanks to Eric Davis for passing along a copy):

The DNA-based storage medium has different properties from traditional tape- or disk-based storage. As DNA is the basis of life on Earth, methods for manipulating, storing and reading it will remain the subject of continual technological innovation. As with any storage system, a large-scale DNA archive would need stable DNA management and physical indexing of depositions. But whereas current digital schemes for archiving require active and continuing maintenance and regular transferring between storage media, the DNA-based storage medium requires no active maintenance other than a cold, dry and dark environment (such as the Global Crop Diversity Trust’s Svalbard Global Seed Vault, which has no permanent on-site staff) yet remains viable for thousands of years even by conservative estimates.

The paper goes on to describe DNA as ‘an excellent medium for the creation of copies of any archive for transportation, sharing or security.’ The problem today is the high cost of DNA production, but the trends are moving in the right direction. Couple this with DNA’s incredible storage possibilities — one of the Harvard researchers working with George Church estimates that the total of the world’s information could one day be stored in about four grams of the stuff — and you have a storage medium that could handle vast data-gathering projects like those that will spring from the next generation of telescope technology both here on Earth and aboard space platforms.

I am not a geneticist or biologist of any kind, so I can’t write a good review of the technology or the wisdom of such a storage method, other than to say that biological systems, even small dots of DNA, tend to break down over long periods of time.

I can understand the information-carrying capacity of DNA; living things require googols of information in order to operate their bodies and reproduce, so putting vast amounts of generic info into DNA does make sense.

I would suggest making a virtual model of a DNA molecule, storing it in a crystal and loading the info that way. It would last longer IMO.