DNA-based Storage Unleashes Tremendous Data Density

In 2010, Google's Eric Schmidt claimed that humanity generates as much information every two days as it did from the dawn of civilization up to 2003. Researchers George Church, a professor of genetics at Harvard Medical School, and Ewan Birney, Associate Director of the European Bioinformatics Institute (EBI), have offered DNA-based storage as a way to cope with the staggering amounts of data generated globally every day.

Birney explained that the best thing about DNA as a storage mechanism is that it does not rely on electricity, and is especially dense and stable. The drawback is that it has to be stored cold, dry, and in the dark. A 2003 project, led by Pak Chung Wong from the Pacific Northwest National Laboratory, transferred encrypted text into DNA by converting each character into a base-4 sequence of numbers, each corresponding to one of the four DNA bases (Adenine, Cytosine, Thymine, and Guanine – also known by the abbreviations A, C, T, and G).
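To make the base-4 idea concrete, here is a minimal Python sketch of how a character can be turned into DNA letters and back. It is an illustration only, not the scheme the 2003 project actually used: the digit-to-base table (0=A, 1=C, 2=T, 3=G), the choice of four base-4 digits per 8-bit character, and the function names encode_text and decode_dna are all assumptions made for this example.

```python
# Hypothetical digit-to-base table (0=A, 1=C, 2=T, 3=G).
# The article does not give the mapping used by the 2003 project.
BASES = "ACTG"


def encode_text(text: str) -> str:
    """Encode ASCII text as DNA letters, four bases per character."""
    dna = []
    for byte in text.encode("ascii"):
        # Split the 8-bit value into four base-4 digits, most significant first.
        for shift in (6, 4, 2, 0):
            digit = (byte >> shift) & 0b11
            dna.append(BASES[digit])
    return "".join(dna)


def decode_dna(dna: str) -> str:
    """Reverse encode_text: read four bases at a time back into characters."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    values = []
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            # BASES.index raises ValueError on any letter outside A, C, T, G.
            value = value * 4 + BASES.index(base)
        values.append(value)
    return bytes(values).decode("ascii")


if __name__ == "__main__":
    sequence = encode_text("Hi")
    print(sequence)
    print(decode_dna(sequence))
```

Under these assumptions every character costs four bases, since one base carries two bits. Real encoding schemes typically add error checking and avoid sequences that are hard to synthesize or read, which this sketch does not attempt.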

Bacteria were considered an optimal host because they replicate quickly, generating multiple copies of the data in the process. Should a mutation occur within a single bacterium, the remaining bacteria will still retain the original information.

However, the fast replication rates of live DNA can compromise data over long periods of time. Inserted DNA could also interfere with the host bacteria's normal cellular processes and destabilize the bacterial genome. Scientists have proposed using "naked" DNA instead, since living cells need not be present for DNA to remain intact. Unlike DNA stored in bacteria, naked DNA doesn’t require genetic manipulation to insert it safely into a host.

Birney, along with his team, encoded computer files totaling 739 KB of unique data, including all 154 of Shakespeare’s sonnets, into naked DNA code, synthesized the DNA, sequenced it and reconstructed the files with over 99% accuracy. Birney has said that a single gram of DNA can store about one petabyte (10^15 bytes, or 1000 TB) of data.
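For a sense of scale, the quoted density can be turned into a quick back-of-the-envelope calculation. The Python sketch below assumes decimal units (1 KB = 1000 bytes) and takes the one-petabyte-per-gram figure at face value, ignoring any redundancy or indexing overhead a real system would need; the function name grams_needed is made up for this example.

```python
PETABYTE = 10**15            # bytes, as quoted in the article
TERABYTE = 10**12            # bytes, decimal units assumed
BYTES_PER_GRAM = PETABYTE    # Birney's estimate: about one petabyte per gram


def grams_needed(n_bytes: float) -> float:
    """Mass of DNA needed to hold n_bytes at the quoted density."""
    return n_bytes / BYTES_PER_GRAM


if __name__ == "__main__":
    # One petabyte expressed in terabytes.
    print(PETABYTE // TERABYTE)
    # Mass needed for the 739 KB of files in the experiment, in grams.
    print(grams_needed(739 * 1000))
```

At that density, the 739 KB of files would occupy roughly 0.74 billionths of a gram of DNA.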

Reading and writing DNA is presently costly, so it is not yet practical for mass storage. Still, for very long-term applications, such as nuclear site location data and other governmental, legal and scientific archives that must be kept for long periods but are infrequently accessed, the technology could be economical. Researchers have noted that current trends are reducing DNA synthesis costs, and DNA-based storage should be cost-effective for long-term archiving (~50 year periods) within a decade.