Introduction

The information content in human DNA is enormous, but we are just beginning
to understand how efficiently it is encoded. Scientists had originally
speculated that the human genome contained up to 100,000 genes. However, the
Human Genome Project showed that it contains only about one quarter that number;
fewer genes suffice partly because each gene can code for multiple transcripts.
Scientists also thought that only the protein-coding DNA, comprising less than 3%
of the genome, was useful, and that the rest was junk. However,
the last few decades of research have suggested that the vast majority (>80%) of
the genome, including much of the non-coding DNA, is functional. Much of the
non-coding DNA is involved in regulating transcription (the step in which DNA is
copied into mRNA, from which the protein is then translated). Scientists have now
also discovered that some protein-coding DNA not only codes for the protein
sequence, but simultaneously contains sequences that bind transcription factors
(proteins that regulate the transcription and expression of genes). These
dual-coding sequences have been termed "duons."

How the study was done

The scientists who authored the study used a naturally occurring enzyme
called DNase I, which digests DNA. The enzyme readily cuts exposed DNA, but it
cannot cut DNA that is covered by a bound protein. Since transcription factors
are proteins that bind DNA, any stretch of DNA that has a transcription factor
bound to it when it is isolated is protected from digestion by DNase I. The
scientists isolated DNA from 81 different cell types and sequenced the fragments
of DNA that were protected by bound transcription factors. They had to use
different cell types because each cell type expresses a different set of genes
and transcription factors, depending on its particular function. An example
dual-coding region is shown in the figure to the right, which shows the gene
CELSR2, found on chromosome 1. The gene consists of 34 exons (coding regions),
with the ninth exon containing a binding site for the transcription factor CTCF,
which is known to regulate the transcription of numerous genes. Interestingly,
this short transcription factor binding site within the exon contains two
arginine residues, which are encoded by two very different codons (AGG and CGC)
in order to match the sequence to which CTCF binds. Although most genes consist
of multiple exons, the vast majority of duon sequences occur in the first exon,
which is what would be expected if the sequences were involved in the regulation
of gene expression.
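The footprinting idea described above can be sketched in a few lines of Python. This is a toy illustration only, not the study's actual pipeline. It assumes we already have a hypothetical count of DNase I cuts at each base. It then flags any window where cutting drops well below the rate in the surrounding flanks, which is the signature of a protein sitting on the DNA. The window width, flank size, and 0.2 ratio threshold are arbitrary choices for the example. Real analyses work from many sequencing reads and use statistical models.

```python
from typing import List, Tuple


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_footprints(cuts: List[int], width: int = 6, flank: int = 5,
                    max_ratio: float = 0.2) -> List[Tuple[int, int]]:
    """Toy footprint caller (hypothetical parameters).

    `cuts[i]` is an assumed count of DNase I cleavage events at base i.
    A window of `width` bases is called a footprint when its mean cut rate
    is at most `max_ratio` times the mean rate of the `flank` bases on
    either side, i.e. the DNA there looks protected by a bound protein.
    """
    hits = []
    for start in range(flank, len(cuts) - width - flank + 1):
        center = cuts[start:start + width]
        left = cuts[start - flank:start]
        right = cuts[start + width:start + width + flank]
        center_rate = sum(center) / width
        flank_rate = (sum(left) + sum(right)) / (2 * flank)
        if flank_rate > 0 and center_rate <= max_ratio * flank_rate:
            hits.append((start, start + width))
    return merge_intervals(hits)


# Made-up cut counts: exposed DNA on both sides, a protected stretch in the middle.
cuts = [8, 9, 7, 10, 8, 9, 1, 0, 1, 0, 0, 1, 9, 8, 10, 7, 9, 8]
print(find_footprints(cuts))  # [(6, 12)]
```

With these made-up counts, the six nearly uncut bases at positions 6 to 11 are reported as a single footprint. In half-open coordinates that is (6, 12).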

Astounding levels of duons

The scientists had originally expected to find a few genes that
simultaneously coded for both protein and transcription factor binding.
However, they found that 14% of the coding sequence consisted of duons
(amounting to several million base pairs).
An astounding 86% of all genes contained at least one duon sequence.
Scientists already knew that non-coding sequences such as introns contained
transcription factor binding sites that regulate gene expression. However,
since exons are constrained by their need to code for specific
amino acids, it was never imagined that such regions of DNA could
simultaneously code for the binding of transcription factors as well. The finding
shows the amazing efficiency of DNA sequences in complex organisms. Although the
authors of the study recognized the obvious optimization of the code, they
attributed such optimization to natural selection rather than design:

"Our results indicate that simultaneous encoding of
amino acid and regulatory information within exons is a major functional
feature of complex genomes. The information architecture of the received
genetic code is optimized for superimposition of additional information (34,
35), and this intrinsic flexibility has been extensively exploited by
natural selection."

However, they failed to account for how natural selection could act on two
different functions at once in the same, overlapping stretch of DNA.
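The constraint discussed in this section is that an exon must still spell out specific amino acids. A small Python sketch can make that concrete. It relies on the redundancy of the genetic code. Arginine, for example, is encoded by six different codons in the standard code, including the AGG and CGC codons from the CELSR2 example. The sketch enumerates every DNA sequence that encodes a short peptide and keeps only those that also contain a given binding motif. The motifs used here are hypothetical stand-ins, not the actual CTCF recognition sequence. The codon table also covers arginine only.

```python
from itertools import product
from typing import Dict, Iterator, List

# Standard-genetic-code codons for arginine (R). Only this amino acid is
# included because the example peptides below are made of arginines.
CODONS: Dict[str, List[str]] = {
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def encodings(peptide: str) -> Iterator[str]:
    """Yield every DNA sequence that encodes `peptide` using CODONS."""
    for combo in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def motif_compatible(peptide: str, motif: str) -> List[str]:
    """Return the encodings of `peptide` that also contain `motif`."""
    return [seq for seq in encodings(peptide) if motif in seq]


total = sum(1 for _ in encodings("RR"))
print(total)                                # 36 ways to encode Arg-Arg
print(motif_compatible("RR", "AGGCGC"))     # ['AGGCGC'] (hypothetical motif)
print(len(motif_compatible("RR", "GGCG")))  # 8 (hypothetical motif)
```

Only one of the 36 possible encodings of an arginine pair contains the six-base motif AGGCGC. Eight contain the shorter motif GGCG. This shows how a binding site sharply narrows the codon choices available within an exon.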

Conclusion

Scientists have discovered that regulation of gene expression, originally
thought to occur only through non-coding DNA sequences, is also encoded within
the very DNA sequences that define protein composition. Transcription
factors, which bind to specific short sequences of DNA, regulate how genes
are expressed. The fact that these transcription factor binding sequences
overlap protein-coding sequences suggests that both were designed
together in order to optimize the efficiency of the DNA code. As we learn more
about DNA structure and function, it is apparent that the code was not
just cobbled together by the trial-and-error method of natural selection, but
that it was specifically designed to provide optimal efficiency and function.

Related Materials

Reasons To Believe's Fazale Rana has written The Cell's Design, a comprehensive
examination of the biochemistry
of the cell from a layman's perspective. Even so, the text does not gloss over
the significant details of how the cell works. As a scientist
myself, I see the design within the cell as much more beautiful than even the
most wonderful sunset. The cell's design certainly does reveal the artistry of
the Creator.

In The Edge of Evolution, Darwin's Black Box author Michael Behe takes on the
limits of evolution through an examination of specific genetic examples. Behe
finds that mutation and natural selection are capable of generating only trivial
examples of evolutionary change. Although he concludes that descent with
modification has occurred throughout biological history, he argues that the
molecular machines found throughout nature cannot be accounted for by natural
selection and mutation. Behe's book claims to develop a framework for testing
intelligent design by defining the principles by which Darwinian evolution can
be distinguished from design.