Writing the Book in DNA

Using next-generation sequencing technology and a novel strategy to encode 1,000 times the largest data size previously achieved in DNA, Harvard geneticist encodes his book in life's language

BOSTON — Although George Church’s next book doesn’t hit the shelves until Oct. 2, it has already passed an enviable benchmark: 70 billion copies — roughly triple the sum of the top 100 books of all time.

And they fit on your thumbnail.

There are currently 70 billion copies of the forthcoming book, Regenesis, all stored in the form of DNA.

That’s because Church, a founding core faculty member of the Wyss Institute for Biologically Inspired Engineering at Harvard University and the Robert Winthrop Professor of Genetics at Harvard Medical School, and his team encoded the book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, in DNA, which they then decoded and copied.

Biology’s databank, DNA has long tantalized researchers with its potential as a storage medium: fantastically dense, stable, energy-efficient and proven to work over a timespan of some 3.5 billion years. While not the first project to demonstrate the potential of DNA storage, Church’s team married next-generation sequencing technology with a novel strategy to encode 1,000 times the largest amount of data previously stored in DNA.

The team reports its results in the Aug. 17 issue of the journal Science.

The researchers used binary code to preserve the text, images and formatting of the book. While the scale is roughly what a 5 1/4-inch floppy disk once held, the density of the bits is nearly off the charts: 5.5 petabits, or 5.5 million gigabits, per cubic millimeter. “The information density and scale compare favorably with other experimental storage methods from biology and physics,” said Sriram Kosuri, a senior scientist at the Wyss Institute and senior author on the paper. The team also included Yuan Gao, a former Wyss postdoc who is now an associate professor of biomedical engineering at Johns Hopkins University.

And where some experimental media, like quantum holography, require incredibly cold temperatures and tremendous energy, DNA is stable at room temperature. “You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later,” Church said.

Reading and writing in DNA is slower than in other media, however, which makes it better suited for archival storage of massive amounts of data, rather than for quick retrieval or data processing. “Imagine that you had really cheap video recorders everywhere,” Church said. “Just paint walls with video recorders. And for the most part they just record and no one ever goes to them. But if something really good or really bad happens you want to go and scrape the wall and see what you got. So something that’s molecular is so much more energy efficient and compact that you can consider applications that were impossible before.”

About four grams of DNA theoretically could store the digital data humankind creates in one year.


Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. “We purposefully avoided living cells,” Church said. “In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”

In another departure, the team rejected so-called “shotgun sequencing,” which reassembles long DNA sequences by identifying overlaps in short strands. Instead, they took their cue from information technology and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including JPEG images and HTML formatting, the code for the book required 54,898 of these data blocks, each a unique DNA sequence. “We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone,” Kosuri said.
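To make the addressing idea concrete, here is a minimal Python sketch of how data might be cut into 96-bit blocks, each prefixed with a 19-bit address, and put back together from an unordered pool. The block and address sizes come from the figures above; everything else is an assumption made for illustration, including the one-bit-per-base mapping (0 to A, 1 to G), the zero padding of the last block, and the function names `encode` and `decode`. It is not the team's actual encoding scheme.

```python
import random

DATA_BITS = 96      # payload bits per block (figure from the article)
ADDRESS_BITS = 19   # address bits per block (figure from the article)

# ASSUMPTION: one bit per base. The article does not describe the real mapping.
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> list[str]:
    """Split data into addressed blocks and return one DNA string per block."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad the final block with zeros so every payload is exactly DATA_BITS long.
    bits += "0" * (-len(bits) % DATA_BITS)
    blocks = []
    for index in range(len(bits) // DATA_BITS):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("too much data for a 19-bit address space")
        address = f"{index:0{ADDRESS_BITS}b}"
        payload = bits[index * DATA_BITS:(index + 1) * DATA_BITS]
        blocks.append("".join(BIT_TO_BASE[b] for b in address + payload))
    return blocks


def decode(blocks: list[str], length: int) -> bytes:
    """Reassemble the original bytes from blocks supplied in any order."""
    payloads = {}
    for block in blocks:
        bits = "".join(BASE_TO_BIT[base] for base in block)
        # The leading address bits say where this payload belongs.
        payloads[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    ordered = "".join(payloads[i] for i in sorted(payloads))
    # Read back only the original number of bytes, dropping the padding.
    return bytes(int(ordered[i:i + 8], 2) for i in range(0, length * 8, 8))


message = b"Regenesis: zeroes and ones, not As through Zs alone"
blocks = encode(message)
random.shuffle(blocks)  # a pool of DNA strands has no inherent order
restored = decode(blocks, len(message))
```

Because every block carries its own address, the decoder never has to look for overlaps between strands; it simply sorts the payloads by address. The numbers also hang together: 54,898 blocks of 96 bits hold at most 5,270,208 bits, or roughly 659 kilobytes, and a 19-bit address can label up to 524,288 blocks, well above what the book needed.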

The team discussed including a DNA copy with each print edition of Regenesis. But in the book, Church and his co-author, the science writer Ed Regis, argue for careful supervision of synthetic biology and the policing of its products and tools. Practicing what they preach, the authors decided against a DNA insert — at least until there has been far more discussion of the safety, security and ethics of using DNA this way. “Maybe the next book,” Church said.

This work was supported by the U.S. Office of Naval Research (N000141010144), Agilent Technologies, and the Wyss Institute.

The Wyss Institute for Biologically Inspired Engineering at Harvard University uses Nature’s design principles to develop bioinspired materials and devices that will transform medicine and create a more sustainable world. Working as an alliance among Harvard’s Schools of Medicine, Engineering, and Arts & Sciences, and in partnership with Beth Israel Deaconess Medical Center, Boston Children’s Hospital, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Massachusetts General Hospital, the University of Massachusetts Medical School, Spaulding Rehabilitation Hospital, Tufts University, and Boston University, the Institute crosses disciplinary and institutional barriers to engage in high-risk research that leads to transformative technological breakthroughs. By emulating Nature’s principles for self-organizing and self-regulating, Wyss researchers are developing innovative new engineering solutions for healthcare, energy, architecture, robotics, and manufacturing. These technologies are translated into commercial products and therapies through collaborations with clinical investigators, corporate alliances, and new start-ups.