24 January 2013

Storing Digital Information in DNA May Be Commercially Viable In The Near Future

Researchers at the European Molecular Biology Laboratory and the European Bioinformatics Institute (EMBL-EBI) have developed a process that would make storing digital information in DNA commercially viable.

Deoxyribonucleic acid (DNA) is the biological molecule that carries the genetic information used in the development and functioning of all known living organisms.

DNA stores information based on the arrangement of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The sequence of the bases determines how a particular cell or organism is maintained.

The way DNA stores information is similar to that of a computer. Instead of the 0s and 1s of computer bits, DNA uses the bases A, G, C, and T. The bases pair up (A with T, and C with G) to form base pairs, and each base is attached to a sugar and a phosphate molecule. Many base pairs joined along two sugar-phosphate backbones form the DNA double helix.

In 2012, Harvard researchers used the four DNA nucleobases as binary markers. They used A and C to represent the digit 0, and T and G to represent the digit 1. Whereas computer data would be written in 1s and 0s, like 00101110011100, DNA-encoded information would look like this: TGAACCTCAAGTAACCTT.
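The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the Harvard team's actual software: the function names are hypothetical, and the rule for choosing between the two candidate bases (pick the one that differs from the previous base) is an assumption made here, since the article does not say how that choice is made.

```python
# One bit per base: 0 is written as A or C, 1 is written as G or T.
ZERO_BASES = "AC"
ONE_BASES = "GT"

def bits_to_dna(bits: str) -> str:
    """Encode a string of '0'/'1' characters as DNA bases."""
    out = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        # Assumption: prefer the first candidate unless it would repeat the
        # previous base, in which case use the second candidate instead.
        if out and out[-1] == choices[0]:
            out.append(choices[1])
        else:
            out.append(choices[0])
    return "".join(out)

def dna_to_bits(dna: str) -> str:
    """Decode bases back to bits; either base of a pair gives the same bit."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)

bits = "00101110011100"
dna = bits_to_dna(bits)
assert dna_to_bits(dna) == bits  # the round trip recovers the original bits
print(dna)
```

Because two bases stand for each digit, many different DNA strings decode to the same bits. Under this mapping, the article's example TGAACCTCAAGTAACCTT decodes to 110000100011000011.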

Using this technique, they achieved a storage density equivalent to about 700 terabytes of data per gram of DNA. Researchers at EMBL-EBI have now developed their own process for encoding information into DNA, using next-generation DNA synthesis and sequencing technologies.

Storing information in DNA has many benefits. DNA can last 10,000 years or more (as evidenced by DNA recovered from prehistoric organisms). Because the information is stored throughout a volume (in a vial or beaker) rather than on a flat surface (as on a hard disk platter), it is more compact and economical to store. Unlike a hard drive, it needs no power source to retain its data. Reading the encoded information back is a matter of sequencing the DNA, a routine laboratory procedure.

All the world's digital information is estimated at around 3 zettabytes (3 billion terabytes). At the 700 terabytes per gram cited above, storing it would take roughly 4 million grams (about 4 tonnes) of DNA. As of 2012, no single storage system has the capacity to hold 1 zettabyte of information.
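The mass of DNA needed for a given amount of data follows directly from the assumed storage density, so it is worth working through. The sketch below is a back-of-the-envelope calculation only: it assumes the 700 terabytes per gram quoted above scales linearly, and it ignores any overhead for redundancy or indexing. The function name is hypothetical.

```python
TB_PER_ZB = 10**9  # 1 zettabyte = 10^21 bytes = 10^9 terabytes

def grams_needed(data_zb: float, density_tb_per_gram: float) -> float:
    """Mass of DNA (in grams) needed to hold data_zb zettabytes."""
    return data_zb * TB_PER_ZB / density_tb_per_gram

# Figures quoted in the article: 3 zettabytes of data, 700 TB per gram.
mass_g = grams_needed(3, 700)
print(f"{mass_g:,.0f} g (about {mass_g / 1e6:.1f} tonnes)")
```

With these inputs the result is roughly 4.3 million grams. A higher assumed density shrinks the figure in proportion.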

Storing Digital Data in DNA

Researchers at the EMBL-European Bioinformatics Institute (EMBL-EBI) have created a way to store data in the form of DNA – a material that lasts for tens of thousands of years. The new method, published today in the journal Nature, makes it possible to store at least 100 million hours of high-definition video in about a cup of DNA.

There is a lot of digital information in the world – about three zettabytes' worth (that's 3000 billion billion bytes) – and the constant influx of new digital content poses a real challenge for archivists. Hard disks are expensive and require a constant supply of electricity, while even the best 'no-power' archiving materials such as magnetic tape degrade within a decade. This is a growing problem in the life sciences, where massive volumes of data – including DNA sequences – make up the fabric of the scientific record.

"We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it," explains Nick Goldman of EMBL-EBI. "It's also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy."

Reading DNA is fairly straightforward, but writing it has until now been a major hurdle to making DNA storage a reality. There are two challenges. First, with current methods it is only possible to manufacture DNA in short strings. Second, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated. Nick Goldman and co-author Ewan Birney, Associate Director of EMBL-EBI, set out to create a code that overcomes both problems.
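One way a code could address both challenges is sketched below in Python. It is a simplified illustration, not the researchers' published encoder. With four bases and no repeats allowed, each position has three legal choices, so the sketch takes its input as base-3 digits; how file bytes become base-3 digits is left out. The sequence is then cut into short overlapping fragments, each tagged with an index and alternating in direction. All function names, the fragment length and the step size are assumptions chosen for illustration, and the index is kept as a plain number rather than encoded in DNA.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def trits_to_dna(trits, prev="A"):
    """Encode base-3 digits so that no base equals the one before it."""
    out = []
    for t in trits:
        options = [b for b in BASES if b != prev]  # the three legal next bases
        prev = options[t]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna, prev="A"):
    """Invert trits_to_dna."""
    trits = []
    for base in dna:
        options = [b for b in BASES if b != prev]
        trits.append(options.index(base))
        prev = base
    return trits

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def fragment(dna, length=8, step=2):
    """Cut dna into overlapping (index, piece) fragments.

    Odd-numbered fragments are reverse-complemented, so fragments run in
    both directions. Assumes (len(dna) - length) is a multiple of step;
    a real scheme would pad the sequence.
    """
    frags = []
    for i, start in enumerate(range(0, len(dna) - length + 1, step)):
        piece = dna[start:start + length]
        frags.append((i, reverse_complement(piece) if i % 2 else piece))
    return frags

def reassemble(frags, length=8, step=2):
    """Rebuild the sequence by majority vote at each position."""
    total = max(i for i, _ in frags) * step + length
    votes = [Counter() for _ in range(total)]
    for i, piece in frags:
        if i % 2:
            piece = reverse_complement(piece)
        for offset, base in enumerate(piece):
            votes[i * step + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)

trits = [0, 1, 2, 2, 1, 0, 0, 0, 2, 1, 1, 2, 0, 1, 2, 0]
dna = trits_to_dna(trits)
frags = fragment(dna)
assert dna_to_trits(reassemble(frags)) == trits
```

With the step set to a quarter of the fragment length, bases away from the ends of the sequence are covered by four fragments, so a single misread fragment is outvoted when the pieces are reassembled.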

"We knew we needed to make a code using only short strings of DNA, and to do it in such a way that creating a run of the same letter would be impossible. So we figured, let's break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn't allow repeats. That way, you would have to have the same error on four different fragments for it to fail – and that would be very rare," says Ewan Birney.

The new method requires synthesising DNA from the encoded information: enter Agilent Technologies, Inc, a California-based company that volunteered its services. Ewan Birney and Nick Goldman sent them encoded versions of: an .mp3 of Martin Luther King's speech, "I Have a Dream"; a .jpg photo of EMBL-EBI; a .pdf of Watson and Crick's seminal paper, "Molecular structure of nucleic acids"; a .txt file of all of Shakespeare's sonnets; and a file that describes the encoding.

"We downloaded the files from the Web and used them to synthesise hundreds of thousands of pieces of DNA – the result looks like a tiny piece of dust," explains Emily Leproust of Agilent. Agilent mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.

"We've created a code that's error tolerant using a molecular form we know will last in the right conditions for 10 000 years, or possibly longer," says Nick Goldman. "As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA."

Although there are many practical aspects to solve, the inherent density and longevity of DNA make it an attractive storage medium. The next step for the researchers is to perfect the coding scheme and explore practical aspects, paving the way for a commercially viable DNA storage model.