DNA project interprets 'book of life'

Scientists are learning more about how DNA works in our bodies.

Story highlights

The ENCODE project finds more than 80% of genome has biochemical function

There are 4 million sites in the genome where events occur, representing "switches"

"This century, we are going to be working out how we make humans," a scientist says

Our genes play a major role in making us who we are, but a lot of information about their function has been mysterious.

That's why an international team of researchers set out to figure out what the working parts of the human genome are, and what they mean for the human body as we know it.

The project is called the Encyclopedia of DNA Elements (ENCODE). When the Human Genome Project sequenced the human genome in 2003, it established the order of the 3 billion letters in the genome, which can be thought of as "the book of life."

ENCODE is about interpreting that -- understanding which genetic variants affect biochemical function, and which ones are associated with disease.

"This is the science for this century," Ewan Birney of the European Bioinformatics Institute, the lead analysis coordinator for ENCODE, said at a news conference Wednesday. "This century, we are going to be working out how we make humans starting from this simple instruction manual."

ENCODE has begun to reveal the circuitry, networks and choreography within the human genome, said Dr. Eric Green, director of the National Human Genome Research Institute, at the briefing. More than 30 scientific papers from the project are being published in journals including Nature, Genome Research and Genome Biology.

Scientists have determined that there are 4 million sites in the genome where specific biochemical events occur, most of which have been discovered with ENCODE.

That means if you got your genome sequenced, there could be as many as 4 million differences between you and the person sitting next to you, Michael Snyder, a Stanford University professor who is the principal investigator for ENCODE, told CNN.

These sites are "switches," which determine which proteins and cells are going to be made, Richard Myers of HudsonAlpha Institute for Biotechnology, an ENCODE collaborator, said at the briefing.

Most genetic changes that are likely to cause disease, or are associated with disease, don't lie inside the genes themselves, but rather in these control elements, Snyder said.

Before, scientists believed that a lot of the human genome was junk -- that as little as 5% of it actually had a purpose in encoding proteins, Myers said.

So it was a surprise, according to ENCODE, to find a full 80% of the human genome sequence is associated with some biochemical function.

ENCODE allows researchers to identify which mutations are most likely to cause disease or affect drug responses. In a study on associations with diseases, scientists found links to heart disease and type II diabetes.

The regulatory switches can cause problems with genes that can lead to disease.

"Before, they were just mysterious events; now we can say, 'Well, they're associated with a particular problem,'" Snyder said.

Snyder, who has had his own genome sequenced, believes personal genome sequencing will become an important part of health care. In his own case, genome sequencing revealed Snyder's risk for type II diabetes before his body started showing symptoms. Since he knew about the risk ahead of time, he was able to start managing his glucose levels early.

With the new ENCODE information, Snyder could start seeing which variants he has that might affect gene function but don't lie inside the gene; they reside in the control switches.

"I think this will become standard of care for many people," he said of genomic sequencing.

Still, there are issues with how much information about a sequence should be revealed to a person -- after all, just because a person has a genetic link to a disease, it doesn't necessarily mean that he or she will develop that condition, and the information could cause undue concern.

Before ENCODE, scientists' knowledge was mostly about the exome, the protein coding sequences that account for about 1.5% of the genome, Snyder said. The remaining 98.5% was mysterious until the ENCODE project.

A genome sequence as part of a research project costs under $4,000, but the proper interpretation costs a lot more, Snyder said. Some companies such as 23andMe offer DNA testing at a relatively low cost, but only sequence part of the genome.

"In my mind, it's a no-brainer to get your genome sequenced if you have cancer," Snyder said.

The human genome has about 20,000 genes, and 1,000 of those genes are regulators -- the "government" of the genome, says Mark Gerstein of Yale University. Regulators bind to places in the genome, and then they turn genes near that binding site on and off.

Gerstein and colleagues looked at how these regulators are organized, and determined that their structure somewhat resembles a corporate hierarchy. When you consider how information flows, it appears there are executives, managers and foremen.

ENCODE received about $123 million over five years from the National Human Genome Research Institute, in addition to $40 million for the pilot project and $125 million for relevant technology development and research since 2003. And there's still much more work to do, scientists say.