Future of Data: Encoded in DNA

By Robert Lee Hotz

Updated Aug. 16, 2012 8:33 p.m. ET

In the latest effort to contend with exploding quantities of digital data, researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text.

The experiment, reported Thursday in the journal Science, may point a way toward eventual data-storage devices with vastly more capacity for their size than today's computer chips and drives.

"A device the size of your thumb could store as much information as the whole Internet," said Harvard University molecular geneticist George Church, the project's senior researcher.

In their work, the group translated the English text of a forthcoming book on genomic engineering into actual DNA.

DNA contains genetic instructions written in a simple but powerful code made up of four chemicals called bases: adenine (A), guanine (G), cytosine (C) and thymine (T).

The Harvard researchers started with the digital version of the book, which is composed of the ones and zeros that computers read. Next, on paper, they translated each zero into either an A or a C, and each one into either a G or a T.
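Because each bit has two candidate bases, the encoder has a free choice at every position. The minimal Python sketch below assumes one simple rule: pick whichever candidate differs from the previous base, so the same letter never appears twice in a row (long single-letter runs are generally harder to synthesize and sequence accurately). That rule and the function names are assumptions for illustration, not necessarily the researchers' exact procedure.

```python
# Hypothetical one-bit-per-base codec: 0 -> A or C, 1 -> G or T.
# Assumption: choose whichever candidate differs from the previous base,
# so no letter is ever repeated back to back.

ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")


def bits_to_dna(bits: str) -> str:
    """Encode a string of '0'/'1' characters as DNA bases."""
    bases = []
    previous = ""
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError(f"not a bit: {bit!r}")
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        base = second if first == previous else first
        bases.append(base)
        previous = base
    return "".join(bases)


def dna_to_bits(dna: str) -> str:
    """Decode: A or C reads back as 0, G or T reads back as 1."""
    table = {"A": "0", "C": "0", "G": "1", "T": "1"}
    return "".join(table[base] for base in dna)


def text_to_bits(text: str) -> str:
    """Turn text into its UTF-8 bytes, written as 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    strand = bits_to_dna(text_to_bits("Hi"))
    print(strand)
    print(bits_to_text(dna_to_bits(strand)))
```

Decoding never needs to know which candidate was chosen, since both bases for a given bit map back to the same value; the freedom exists only to help the chemistry.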

Then, using now-standard laboratory techniques, they created short strands of actual DNA that held the coded sequence—almost 55,000 strands in all. Each strand contained a portion of the text and an address that indicated where it occurred in the flow of the book.
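The addressing idea can be shown on its own, separately from the base mapping. In this hypothetical Python sketch, a bit string is cut into fixed-size chunks, each tagged with its position, then shuffled (strands floating in a tube have no inherent order) and put back together by sorting on the address. The 19-bit address and 96-bit payload widths are illustrative assumptions, since the article does not describe the strand layout, and the function names are invented for the example.

```python
import random

# Illustrative widths: assumptions, not figures from the article.
ADDRESS_BITS = 19
PAYLOAD_BITS = 96


def split_into_blocks(bits: str) -> list[str]:
    """Cut a bit string into address-tagged blocks.

    Each block is ADDRESS_BITS of position followed by PAYLOAD_BITS of data;
    the final payload is zero-padded to full width.
    """
    blocks = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("too many blocks for the address width")
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        blocks.append(address + payload)
    return blocks


def reassemble(blocks: list[str], total_bits: int) -> str:
    """Order blocks by their address, join the payloads, drop the padding."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    payloads = "".join(block[ADDRESS_BITS:] for block in ordered)
    return payloads[:total_bits]


if __name__ == "__main__":
    message = "".join(random.choice("01") for _ in range(1000))
    blocks = split_into_blocks(message)
    random.shuffle(blocks)  # mimic an unordered pool of strands
    assert reassemble(blocks, len(message)) == message
    print(len(blocks), "blocks")
```

The sketch assumes the reader already knows the original length so the zero padding can be trimmed; a real system would need to record that somewhere as well.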

In that form—a viscous liquid or solid salt—a billion copies of the book could fit easily into a test tube and, under normal conditions, last for centuries, the researchers said.

Harvard biologist George Church, in his office Wednesday, recently encoded a book he wrote into the genetic molecules of DNA.
Kelvin Ma for the Wall Street Journal

The technique likely is a long way from being commercially viable. But it highlights the potential of DNA as a stable, long-term archive for ordinary information, such as photographs, books, financial records, medical files and videos, all of which today are stored as computer code.

"It shows that the vast increase in capacity to synthesize and sequence DNA can be applied to store significant amounts of data," said pioneering synthetic biologist Drew Endy at Stanford University, who wasn't involved in the project. "If you wanted to have your library encoded in DNA, you could probably do that now."

Molecular biologists have long known that DNA is a natural information-storage system inside every cell that encodes the recipe for individual heredity.

Dr. Church keeps a vial of DNA encoded with copies of his latest book.
Kelvin Ma for the Wall Street Journal

The exact order of the DNA bases—which for the average person is a sequence of about three billion—determines the meaning of the biological instructions stored in genes and chromosomes, just as letters of the alphabet make up words and sentences.

Some scientists have been experimenting with ways to use that code to store other kinds of information.

Research groups in the U.S., Europe and Canada devised ways to use DNA to encode trademarks and secret messages in cells. And when genomics pioneer Craig Venter and colleagues created the first synthetic cell in 2010, they wrote their names into its DNA code, the way an artist might sign a painting, along with three literary quotations and a website address.

Other researchers used DNA to encode poetry and popular music inside the living cells of bacteria.

In 2003, genetic engineers at the Pacific Northwest National Laboratory in Washington state created micro-organisms that carry the tune of Disney's "It's a Small World (After All)" in their DNA.

Unlike these earlier DNA storage experiments, the DNA book reported Thursday wasn't inserted into a living cell but kept in a laboratory container. If incorporated into a living cell, the stored DNA data might be changed or erased by the normal process of cell biology.

"The cell kicks out foreign DNA," said Harvard's Dr. Church. "In a tube, it is less subject to evolution."

The Harvard effort stands out for its large scale, the scientists said. All told, the book contains 53,426 words, 11 illustrations and a JavaScript computer program. The 5.27 megabits of data are more than 600 times larger than the largest data set previously encoded in DNA. That is roughly the storage capacity of a 3.5-inch floppy computer disk.
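A quick calculation puts these figures in more familiar units. The 5.27 megabits and the factor of 600 come from the article; the 96 data bits per strand is an assumption made here for illustration. With that assumption, the strand count lands close to the "almost 55,000 strands" reported above.

```python
import math

BOOK_BITS = 5_270_000         # from the article: 5.27 megabits
PAYLOAD_BITS_PER_STRAND = 96  # assumption, not stated in the article

book_bytes = BOOK_BITS / 8
strands = math.ceil(BOOK_BITS / PAYLOAD_BITS_PER_STRAND)
# "More than 600 times bigger" means the previous record was below this size.
previous_record_bits = BOOK_BITS / 600

print(f"{book_bytes / 1e3:.0f} kilobytes")
print(f"{strands} strands")
print(f"previous record under {previous_record_bits / 1e3:.1f} kilobits")
```

The book works out to about 659 kilobytes, and the earlier record to less than about 8.8 kilobits.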

"For some archival problems, this could be the wave of the future," said Dr. Church. The group has filed for a patent on the technique.

The method requires a series of advanced laboratory procedures, microarray chips and a high-speed gene-sequencing machine to assemble the strands in the proper order, correct any errors and then read the final text.
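When many copies of each strand are sequenced, one common way to correct errors is a per-position majority vote among the reads that share an address. The sketch below assumes reads have already been grouped by address and are all the same length (it handles substitutions but not inserted or dropped bases). The article does not say which correction method the team used, so treat this as an illustration of the general idea.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes equal-length reads")
    # zip(*reads) yields one tuple per position, holding that column's bases.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


if __name__ == "__main__":
    reads = [
        "ACGTAC",
        "ACGTAC",
        "ACTTAC",  # substitution error at position 2
        "ACGTAG",  # substitution error at position 5
    ]
    print(consensus(reads))  # prints ACGTAC
```

The more copies that are read, the less likely a random error at any one position is to outvote the correct base.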

The stored data "is sequential, like a magnetic tape, where you have to spool through stuff to get at the data," said bioengineer Sriram Kosuri at the Wyss Institute for Biologically Inspired Engineering at Harvard, who was the project's lead researcher.

It took several days to "write" the DNA form of the book, and even longer to read it back.

So far, the cost of synthesizing and sequencing data-rich DNA on the scale of the book remains too high to make the technique a practical commercial data-storage medium for the foreseeable future.

But Dr. Church is confident those costs will drop dramatically, and the speeds increase, as more advanced technology becomes available.

"The cost of both synthesis and sequencing [of DNA] are plummeting in an unprecedented way," Dr. Church said.

Already, the production costs of generating raw, unassembled DNA sequence data, such as might be used to archive data, have dropped from $10,000 per million base pairs of DNA in 2001 to about 10 cents per million base pairs in 2012, according to the National Human Genome Research Institute.
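The two cost figures imply a steep compound rate of decline. The short calculation below assumes a smooth exponential drop between the 2001 and 2012 endpoints, which is a simplification since the real cost curve was uneven.

```python
import math

COST_2001 = 10_000.0  # dollars per million base pairs (article)
COST_2012 = 0.10      # dollars per million base pairs (article)
YEARS = 2012 - 2001

fold_drop = COST_2001 / COST_2012
# Assuming a constant yearly rate, this is the factor costs fell by each year.
annual_factor = fold_drop ** (1 / YEARS)
# Months needed for the cost to halve at that constant rate.
halving_months = 12 * YEARS * math.log(2) / math.log(fold_drop)

print(f"{fold_drop:,.0f}-fold drop")
print(f"about {annual_factor:.2f}x cheaper each year")
print(f"cost halved roughly every {halving_months:.0f} months")
```

On those assumptions, costs fell about 100,000-fold, or nearly threefold a year, halving roughly every eight months.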

"This new work demonstrates that there is a whole new market for these technologies, to synthesize DNA for people who want to store information," said Dr. Endy.

The experiment also set a milestone of sorts in the techniques of book marketing—Dr. Church wrote the book that was turned into DNA. Called "Regenesis," it is scheduled for conventional publication in October.

Dr. Church said he first considered encoding the novel "Moby Dick," but then chose to use his own manuscript because its combination of words, pictures and JavaScript code would better showcase DNA's capacity to handle different kinds of information.

This copy is for your personal, non-commercial use only. Distribution and use of this material are governed by our Subscriber Agreement and by copyright law. For non-personal use or to order multiple copies, please contact Dow Jones Reprints at 1-800-843-0008 or visit www.djreprints.com.