A Closer Look at DNA Data Storage

DNA data storage has been making headlines since a researcher at Harvard University managed to encode an entire book into synthetic DNA. Since then, researchers around the globe have set their sights on DNA data storage. The most recent breakthrough comes from a collaboration between the University of Washington and Microsoft, whose researchers figured out how to encode a record-breaking 200 megabytes onto strands of synthetic DNA.

But why are people so excited about advancing this technology, and how does it work? The first question has an easy answer: there is more digital data in the world now than ever before, and the amount of binary information worth storing grows exponentially every year. Because major corporations and national governments are struggling to find a cheap and space-efficient way to store all of their relevant data, there is plenty of funding available for projects that might resolve the current state of data storage chaos. DNA storage shows promise because it allows information to be stored with perfect recall and incredible density, and the data can be expected to remain intact for hundreds to thousands of years. For the time being the technology is still in its earliest and most experimental stages, but one day it could truly revolutionize the world of data storage.

The second question is a little harder to answer, since explaining how digital information can be encoded in DNA means delving into the realm of biotech. The most important thing to understand is that digital information exists as binary code, i.e. long sequences of 1s and 0s. Genetic information is encoded in DNA using four nucleobases: cytosine, guanine, adenine and thymine. These four bases form two complementary pairs: adenine pairs with thymine, and cytosine pairs with guanine. By assigning 0s and 1s to specific bases (for instance, letting each base stand for one bit, or for two bits at a time), binary code can be converted into a sequence of bases, and strands of synthetic DNA can then be constructed as physical representations of that translation.
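To make the translation concrete, here is a minimal Python sketch. It assumes a hypothetical two-bits-per-base mapping (00 → A, 01 → C, 10 → G, 11 → T) chosen purely for illustration; it is not the scheme used by the Harvard or University of Washington/Microsoft teams, and real encoders add constraints this sketch ignores. The complement function shows that the partner strand, built from the A-T and C-G pairings, carries the same information.

```python
# Hypothetical mapping for illustration only: each base stands for two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Complementary base pairing: adenine-thymine and cytosine-guanine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bytes_to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def encode(data: bytes) -> str:
    """Translate binary data into a string of bases, two bits per base."""
    bits = bytes_to_bits(data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: bases back to bits, then every 8 bits back to a byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def complement(strand: str) -> str:
    """Build the partner strand by swapping each base for the one it pairs with."""
    return "".join(PAIRS[base] for base in strand)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)              # CAGACGGC
    print(complement(strand))  # GTCTGCCG
    print(decode(strand))      # b'Hi'
```

Under this mapping every byte becomes exactly four bases, so decoding only has to reverse the lookup table and regroup the bits into bytes.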

Designing these sequences is no simple matter. DNA storage requires cutting-edge techniques for data compression and data integrity, because a sequence must be information-dense enough to realize DNA's potential, yet redundant enough to allow robust error-checking that improves the accuracy of the information retrieved down the line.
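The redundancy side of that trade-off can be illustrated with the simplest possible scheme: store several copies of each strand and, when reading them back, take a majority vote at every position. This is a hypothetical Python sketch with illustrative function names, not the error-correction method used in any of the projects mentioned here; practical systems rely on far more efficient error-correcting codes.

```python
from collections import Counter


def store_copies(strand: str, copies: int = 3) -> list[str]:
    """Hypothetical redundancy scheme: keep several identical copies of a strand."""
    return [strand] * copies


def majority_vote(reads: list[str]) -> str:
    """Rebuild a strand by taking the most common base at each position.

    With three copies, a single misread base at a given position is outvoted;
    if every copy disagrees at a position, the vote cannot recover it reliably.
    """
    if not reads or len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and all the same length")
    consensus = []
    for bases_at_position in zip(*reads):
        most_common_base, _count = Counter(bases_at_position).most_common(1)[0]
        consensus.append(most_common_base)
    return "".join(consensus)


if __name__ == "__main__":
    original = "CAGACGGC"
    copies = store_copies(original)
    # Simulate reading the copies back, with one misread base in two of them.
    reads = ["CAGTCGGC", "CAGACGGA", copies[2]]
    print(majority_vote(reads))  # CAGACGGC
```

The cost is plain to see: three copies triple the number of bases needed, which is exactly the tension between information density and redundancy that sequence designers have to balance.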

While the obstacles to developing this technology are substantial, the potential rewards are great. Chief among them is storage density: according to some estimates, DNA could hold the volume of data contained in about a hundred industrial data centers in a space roughly the size of a shoebox.

This is possible because DNA coding units are less than half a nanometer wide on each side, while the smallest transistors found in the most advanced computer storage drives are currently about 10 nanometers wide. That size difference alone is an enormous leap in storage density, but the game-changing factor exclusive to DNA storage is that DNA can be packed three-dimensionally. Transistors, by contrast, must be laid out on a single flat plane. They can be packed tightly together on sheets of silicon, and those sheets can be stacked, but the more transistors are working in a smaller space, the more likely overheating becomes.
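The figures above lend themselves to a rough back-of-the-envelope calculation. The Python sketch below is idealized: it treats each DNA unit and each transistor as a square or cube of the quoted width and ignores spacing, wiring and cooling. It shows how a 20-fold advantage in width compounds to 400-fold on a flat plane and 8,000-fold in a volume, if both could be packed the same way.

```python
DNA_UNIT_WIDTH_NM = 0.5     # "less than half a nanometer", taken as an upper bound
TRANSISTOR_WIDTH_NM = 10.0  # "about 10 nanometers"


def density_advantage(small_width: float, large_width: float, dimensions: int) -> float:
    """Idealized count of small units that fit in the space of one large unit."""
    return (large_width / small_width) ** dimensions


if __name__ == "__main__":
    for dims, label in [(1, "along a line"), (2, "on a flat plane"), (3, "in a volume")]:
        ratio = density_advantage(DNA_UNIT_WIDTH_NM, TRANSISTOR_WIDTH_NM, dims)
        print(f"{label}: {ratio:,.0f}x")
    # along a line: 20x
    # on a flat plane: 400x
    # in a volume: 8,000x
```

Because the DNA width is an upper bound, these ratios are, under this idealization, lower bounds; the point is that each extra dimension multiplies the advantage again.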

DNA, on the other hand, can roll up extremely tightly, folding onto itself in three dimensions. However tightly it is packed, it can still be unrolled fairly easily in order to be read.

That said, researchers on the cutting edge of this technology still have plenty of drawbacks to grapple with: DNA is difficult to read, which makes access speeds slower than those of storage technology currently on the market.