Almost all of the human genome is noncoding, or "junk," DNA: sequence that usually isn't copied into RNA and translated into proteins.

So when copying DNA, how do cells tell the difference between actual genes and noncoding DNA?

Transcription begins at regions on the DNA molecule called promoters, sequences located at the beginning of genes that are to be copied. The enzyme that copies DNA into RNA, called RNA polymerase, latches on to the promoter and starts unzipping the DNA double helix, spooling out a chain of what will become messenger RNA – mRNA for short – that contains the information of the gene. [Editor's note: The original version of this paragraph used the term "replication" instead of "transcription." DNA replication is a different process. The Monitor regrets the error.]

But how does the RNA polymerase know which direction to go? Until now, scientists didn't know. But in research published in the current issue of the scientific journal Nature, MIT biologists say they have discovered the mechanism that points transcription in the right direction.

In eukaryotes, the organisms whose cells have a nucleus, the RNA polymerase continues unzipping the DNA until it reaches a stop signal, at which point copying ends and a chain of adenine bases, usually a couple hundred links long, is added to the pre-mRNA molecule. This "poly-A" tail protects the mRNA as it exits the nucleus and travels to the ribosome, where the molecule's information is translated into proteins.

By sequencing mRNA from mouse embryonic stem cells, the researchers found that the signal sequences for creating poly-A tails – a process known as polyadenylation – are also prevalent "upstream" from the promoter, in the direction leading away from the gene. An RNA polymerase that heads that way soon encounters these sequences, and its pre-mRNA gets chopped up. DNA sequences that code for genes, by contrast, have a low density of polyadenylation signal sequences.

The researchers also found that the polyadenylation signal sequences are more likely to be ignored when they appear within coding sequences, thanks to a tiny complex of RNA and protein called U1 snRNP. When U1 snRNP binds to the newly made RNA, polyadenylation is suppressed. The researchers discovered that binding sites for U1 snRNP are more prevalent in coding sequences than noncoding ones.
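For readers who want to see what "density" of a signal sequence means in practice, here is a minimal Python sketch that counts two short motifs in a stretch of DNA and reports how often each occurs per 1,000 bases. It is an illustration only, not the researchers' method. The motif `AATAAA` is the classic poly-A signal; using `GGTAAG` as a stand-in for a U1 snRNP binding site is an assumption made for this example, and the two toy sequences are invented, not real mouse DNA.

```python
# Illustrative motifs (assumptions for this sketch, not taken from the study).
PAS_MOTIF = "AATAAA"   # classic polyadenylation signal, written as DNA
U1_MOTIF = "GGTAAG"    # hypothetical stand-in for a U1 snRNP binding site


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, including overlapping ones."""
    seq = seq.upper()
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        # Advance by one base so overlapping matches are also counted.
        pos = seq.find(motif, pos + 1)
    return count


def density_per_kb(seq: str, motif: str) -> float:
    """Return motif occurrences per 1,000 bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return 1000.0 * count_motif(seq, motif) / len(seq)


def summarize(seq: str) -> dict:
    """Report both densities for one stretch of DNA."""
    return {
        "pas_per_kb": density_per_kb(seq, PAS_MOTIF),
        "u1_per_kb": density_per_kb(seq, U1_MOTIF),
    }


# Invented toy sequences, built by hand to mimic the reported pattern.
upstream_toy = "CCAATAAAGCTTAATAAACGGCAATAAATC"
gene_toy = "ATGGGTAAGCTGGGTAAGCCAATAAAGGTAAGTC"

if __name__ == "__main__":
    print("upstream:", summarize(upstream_toy))
    print("gene:    ", summarize(gene_toy))
```

Because the toy sequences were written to show the pattern, the output proves nothing on its own. Applied to real sequence on either side of a promoter, the same kind of tally is one simple way to compare how rich each side is in stop signals versus U1 binding sites.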

“Once you see some data like this, it raises many more questions to be investigated, which I’m hoping will lead us to deeper insights into how our cells carry out their normal functions and how they change in malignancy,” says Phillip Sharp, a professor at MIT's Koch Institute for Integrative Cancer Research and a co-author of the study, in a statement.