Language in junk DNA

How do we explain 97% of our DNA being 'junk' - meaningless advertisements in the broadcast program of our genetic code? Could this junk DNA have a role in the language buried inside DNA?

By Karl S. Kruszelnicki

You've probably heard of a molecule called DNA, otherwise known as "The Blueprint Of Life". Molecular biologists have been examining and mapping DNA for a few decades now. But as they've looked more closely, they've been getting increasingly bothered by one inconvenient little fact - 97% of the DNA is junk, with no known use or function! But an unusual collaboration between molecular biologists, cryptanalysts (people who break secret codes), linguists (people who study languages) and physicists has found strange hints of a hidden language in this so-called "junk DNA".

Only about 3% of the DNA actually codes for amino acids, which in turn make proteins, and eventually, little babies. The remaining 97% of the DNA is, according to conventional wisdom, not gems, but junk.

The molecular biologists call this junk DNA "introns". Introns are like enormous commercial breaks or advertisements that interrupt the real program - except in the DNA, they take up 97% of the broadcast time. Introns are so important that Richard Roberts and Phillip Sharp, who did much of the early work on introns back in 1977, won a Nobel Prize for it in 1993. But even today, we still don't know what introns are really for.

Simon Shepherd, who lectures in cryptography and computer security at the University of Bradford in the United Kingdom, took an approach based on his line of work. He looked on the junk DNA as just another secret code to be broken. He analysed it, and he now reckons that one probable function of introns is to act as some sort of error correction code - to fix up the occasional mistakes that happen as the DNA replicates itself. But even if he's right, introns could have lots of other uses.

The next big breakthrough came from a really unusual collaboration between medical doctors, physicists and linguists. They found even more evidence that there was a sort-of language buried in the introns.

According to the linguists, all human languages obey Zipf's Law. It's a really weird law, but it's not that hard to understand. Start off by getting a big fat book. Then, count the number of times each word appears in that book. You might find that the number one most popular word is "the" (which appears 2,000 times), followed by the second most popular word "a" (which appears 1,800 times), and so on. Right down at the bottom of the list, you have the least popular word, which might be "elephant", and which appears just once.

Set up two columns of numbers. One column is the order of popularity of the words, running from "1" for "the", and "2" for "a", right down to "1,000" for "elephant". The other column counts how many times each word appeared, starting off with 2,000 appearances of "the", then 1,800 appearances of "a", down to one appearance of "elephant".
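If you'd rather let a computer do the counting, the two columns can be built in a few lines of Python. This is only a rough sketch: the function name `rank_frequency` and the sample sentence are made up for illustration, and a "word" is assumed to be any run of letters or apostrophes.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, count) rows, most popular word first."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by count, highest first.
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Made-up sample text; a real test would use a big fat book.
sample = "the cat and the dog and the elephant"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count)
```

The first number on each printed line is the order of popularity, and the last is how many times that word appeared.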

If you then plot, on the right kind of graph paper (log-log paper, where both axes are logarithmic), the order of popularity of the words against the number of times each word appears, you get a straight line! Even more amazingly, this straight line appears for every human language - whether it's English or Egyptian, Eskimo or Chinese! Now the DNA is just one continuous ladder of squillions of rungs, and is not neatly broken up into individual words (like a book).
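The straight line can be checked without any graph paper at all. The sketch below assumes that the "right kind of graph paper" means taking the logarithm of both columns, and it fits an ordinary least-squares line through the points. The function name `loglog_slope` and the idealised counts are illustrative assumptions, not data from the study.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two counts, all greater than zero.
    """
    ordered = sorted(counts, reverse=True)           # rank 1 = most popular
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Idealised (made-up) counts: the word of rank r appears 2000 / r times.
ideal = [2000 / r for r in range(1, 1001)]
print(loglog_slope(ideal))  # very close to -1 for this idealised list
```

In the textbook form of Zipf's Law, the number of appearances is roughly proportional to one divided by the rank, which gives a slope near -1. Real books only follow the line approximately.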

So the scientists looked at a very long bit of DNA, and made artificial words by breaking up the DNA into "words" each 3 rungs long. And then they tried it again for "words" 4 rungs long, 5 rungs long, and so on up to 8 rungs long. They then analysed all these words, and to their surprise, they got the same sort of Zipf's Law straight-line graph for the human DNA (which is mostly introns) as they did for the human languages!
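Here is roughly what that chopping-up might look like in Python. The article doesn't say whether the scientists' "words" overlapped, so this sketch assumes back-to-back chunks by default, with an optional `step` for a sliding window. The function name `dna_word_counts` and the toy sequence are invented for illustration.

```python
from collections import Counter

def dna_word_counts(sequence, word_length, step=None):
    """Count artificial "words" of word_length rungs in a DNA string.

    Assumption: words sit back to back unless a smaller step is given
    (step=1 slides the window along one rung at a time).
    """
    if step is None:
        step = word_length
    seq = sequence.upper()
    words = (seq[i:i + word_length]
             for i in range(0, len(seq) - word_length + 1, step))
    return Counter(words)

# Made-up toy sequence; a real analysis needs a very long stretch of DNA.
toy = "ATGATGCCGATG"
for n in range(3, 9):  # word lengths 3 to 8, as in the text
    print(n, dna_word_counts(toy, n).most_common(3))
```

The counts for each word length can then be ranked and plotted in exactly the same way as the words in a book.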

There seems to be some sort of language buried in the so-called junk DNA! Certainly, the next few years will be a very good time to make a career change into the field of genetics.

So now, around the edge of the new millennium, we have a reasonable understanding of the 3% of the DNA that makes amino acids, proteins and babies. And the remaining 97% - well, we're pretty sure that there is some language buried there, even if we don't yet know what it says. It might say "It's all a joke", or it might say "Don't worry, be happy", or it might say "Have a nice day, lots of love, from your friendly local DNA".