Look who's talking

12 July 2003

Steve Nadis

Steve Nadis is a freelance writer based in Cambridge, Massachusetts

LAURANCE DOYLE didn't set out to enlist animals in the hunt for extraterrestrial intelligence, but that's how his research has turned out. Doyle works at the SETI Institute in Mountain View, California, where his main priority is the search for extrasolar planets. It's a task that has given him great expertise in detecting almost imperceptible signals. And when he was invited to apply this skill to analysing animal communications, it led to an unexpected outcome: Doyle has found a new way to spot ET's call. It does more than just spot the signal, though - it could also offer hints as to how intelligent the alien is. But the truly amazing thing about Doyle's strategy is that it can give us all this information without the need to understand anything that's being said.

For more than 40 years, SETI has focused on scanning the heavens for radio or laser signals emanating from a distant civilisation. Far less attention has been paid to how to make sense of an extraterrestrial signal if one is, miraculously, intercepted. Prospects for interpreting such a signal are bleak, considering that after decades of attempts at interspecies communications here on Earth, animal researchers have failed to translate a single non-human "language".

Now Doyle seems to have found a way forward. It all started in 1995, when he was helping Brenda McCowan and Sean Hanser, animal behaviour experts at the University of California, Davis, categorise the whistles of bottlenose dolphins at Marine World in Vallejo, California. In an effort to understand the dolphins, the team decided to employ methods more normally used in the telecoms industry. The methods are based on information theory, a mathematical way of analysing any sequence of symbols - a series of DNA bases, numbers, letters, phonemes, words or phrases, for instance - to see whether it contains information. According to the theory, the information content of a message is distinct from its meaning, relating instead to the number of 1's or 0's needed to encode it. The theory also offers a way to gauge the complexity of a given communication system: as complexity increases, more information can be conveyed. Doyle reasoned the same argument could be applied to dolphin whistles. And so he, McCowan and Hanser immersed themselves in the intricacies of information theory to find out what it could tell them about underwater sounds.

The first step in information theory is to confirm that a communication signal is carrying information and is not just random noise. Harvard University linguist George Zipf suggested a straightforward way to do this in the late 1940s. Aided by his graduate students, Zipf counted the number of times different letters appeared in representative English texts. He then ranked the letters from most to least common and plotted frequency of occurrence against rank on logarithmic axes. The resulting line had a gradient of -1 (see Graph). Chinese text also yielded a -1 slope, as did most written and spoken languages. This relationship, Zipf's law, held true, moreover, for words, letters, characters and phonemes - perceptually distinct units of speech.

For an entirely random string of letters that contained no encoded information, the slope would be flat, or zero, because every character occurs equally often. There would be no rhyme nor reason to it, no way of anticipating what will come next. The other extreme would be a vertical line on the left-hand axis representing, for example, speech with just one sound replayed continuously. In between are countless possible lines of negative slope, centred around -1, indicating that some elements of a signal are used more frequently than others. A -1 slope is a sign of optimised communications: it is more efficient because elements that occur frequently can be coded to be shorter than those that are used less often. In English, for example, common words like "a" and "the" take fewer letters and syllables than less familiar terms like "antidisestablishmentarianism".
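
For readers who want to try this, the slope of a Zipf plot can be estimated with a few lines of Python. What follows is only a minimal sketch, not the team's own procedure: the function name zipf_slope and the two toy sequences are made up for illustration, and the slope is taken from an ordinary least-squares fit of log frequency against log rank.

```python
import math
from collections import Counter

def zipf_slope(symbols):
    """Least-squares slope of log10(count) against log10(rank).

    Hypothetical helper for illustration: rank 1 is the most common symbol.
    """
    counts = sorted(Counter(symbols).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct symbols")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy data: every symbol equally common, so the line should be flat.
uniform = list("abcdefgh") * 50

# Toy data: counts proportional to 1/rank (840, 420, 280, ...),
# so the slope should come out very close to -1.
zipfian = [s for rank, s in enumerate("abcdefgh", start=1)
           for _ in range(840 // rank)]

print(zipf_slope(uniform))
print(zipf_slope(zipfian))
```

Real recordings are far messier than these toy sequences, and a fitted slope depends on how many distinct elements there are and how much data was collected, so a value near -1 is a hint worth following up rather than proof.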

So do dolphins also use optimised communications? One of the challenges facing Doyle and his colleagues was separating the dolphin whistles from background noise. This is fairly straightforward when the language in question has detailed spelling and grammar rules. In English, for example, "e" is the most commonly used letter, and "q" is almost always followed by "u". Similar rules dictate word usage and frequency. So searching for statistical correlations between words, letters and other symbols provides a way to work out what is being conveyed. But it's not so easy if you are dealing with animals or otherworldly creatures.

That's partly because you need to break the transmission down into units, working out where one discrete segment ends and the next begins. "You first try to identify natural breaks in the system itself," explains Hanser. "There are spaces between dolphin whistles where there is no sound. Sometimes we see the same whistle completely isolated, and other times it is part of a chain. That is as good a place as any to start breaking these things down."

But Zipf's law can also help. For example, if Morse code is analysed one dot or dash at a time, the line on the Zipf plot is almost flat. At first glance, this would not seem like a promising signal. But if the dots and dashes are taken two at a time, the slope steepens. When dots and dashes are examined in groups of four, roughly the size of letters, the slope approaches -1. "Then you know you've sampled correctly because a -1 slope cannot occur by chance," Doyle says. So if you want to work out the correct units, simply tinker with a Zipf plot, trying different combinations.
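
The regrouping step can be sketched in Python as follows. This is an illustration under a simplifying assumption: the stream is cut into consecutive, non-overlapping groups of a fixed size, whereas real Morse letters vary in length and are separated by pauses. The helper names chunk and rank_frequency and the sample signal are hypothetical.

```python
from collections import Counter

def chunk(stream, size):
    """Split a symbol stream into consecutive, non-overlapping groups.

    Any leftover symbols at the end that do not fill a group are dropped.
    """
    usable = len(stream) - len(stream) % size
    return [stream[i:i + size] for i in range(0, usable, size)]

def rank_frequency(stream, size):
    """Return (group, count) pairs, most common group first."""
    return Counter(chunk(stream, size)).most_common()

# Made-up dot/dash stream, examined at three different group sizes.
signal = ".-.-..--.-.-...-.-.-"
for size in (1, 2, 4):
    print(size, rank_frequency(signal, size))
```

Each list of counts is what would be plotted, rank against frequency, on a Zipf plot; trying several group sizes and comparing the resulting slopes is the "tinkering" described above.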

This is what Doyle, McCowan and Hanser did with their dolphin whistles. And, using Zipf's technique, they were able to produce the same -1 slope characteristic of human languages. Dolphins, it seems, have optimised their whistles for efficient communications. The same is not true of squirrel monkeys, however. When the trio evaluated adult squirrel monkey "chuck" calls - a timid bark - recorded by McCowan at the primate research centre at the University of California, Davis, they found a Zipf slope of -0.6. "You can combine the calls any way you want and you won't get a -1 slope," Doyle says. The team believes this suboptimal communication style reflects the animals' limited social behaviour.

But another aspect of information theory can take you further than just knowing a species has the potential for advanced communications. Without understanding what is being said, you can find out just how complex its communications are.

Claude Shannon, the father of information theory, pioneered the approach. Shannon employed the term "entropy" to size up the complexity of a communication system. It is not just a concept borrowed from thermodynamics: Shannon introduced a whole hierarchy of entropy levels (see Graph). Zero-order entropy measures the diversity of the communicative repertoire - how many different types of elements (letters, words, phonemes, whistles, barks and so forth) make up the signal. In written English, for example, everything can be represented by 27 characters - 26 letters and a space. The entropy value is the logarithm of the number of elements. The researchers used base 2 logarithms because the binary digit, or bit, is the most familiar unit of information (though any base will do), which makes the zero-order entropy of English 4.755 bits. Next in Shannon's scheme is first-order entropy, which measures - again, using logarithms - the frequency of occurrence of each element within the language.
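
The first two rungs of this ladder are easy to compute. The Python sketch below assumes the standard textbook definitions, with zero-order entropy as the base 2 logarithm of the repertoire size and first-order entropy as Shannon's formula applied to the observed frequencies; the function names are illustrative rather than taken from the researchers' work.

```python
import math
from collections import Counter

def zero_order_entropy(n_symbols):
    """Bits needed to pick one of n_symbols equally likely elements."""
    return math.log2(n_symbols)

def first_order_entropy(symbols):
    """Shannon entropy, in bits per element, of the observed frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# 26 letters plus a space, as in the article.
print(round(zero_order_entropy(27), 3))  # 4.755

# Two elements used equally often carry exactly one bit each.
print(first_order_entropy("aabb"))  # 1.0

# Unequal usage lowers the entropy below the zero-order ceiling.
print(first_order_entropy("aaab"))
```

First-order entropy can never exceed zero-order entropy for the same repertoire; the two are equal only when every element is used equally often.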

The higher entropy levels, second order and up, relate to the notion of "conditional probabilities": once you have seen a particular sequence of elements, what are your chances of predicting the next element in the series? If, for instance, you know the first and second words of a phrase, the third-order entropy tells you (in logarithmic form) the odds of guessing the third word correctly. Analyses of English and Russian suggest that these languages show evidence of 8th or 9th-order Shannon entropy, meaning that when presented with a string of eight words, you have some ability (slim but non-zero) to predict what the ninth word might be. After that, though, all bets are off. If you want to guess what the 10th word is, the previous nine are of no value.
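
One way to estimate these higher orders from a recorded sequence is sketched below in Python. It rests on an assumption about convention: "nth-order" is taken here to mean the uncertainty, in bits, about an element given the n - 1 elements before it. The researchers may define or estimate the quantity differently, and the function name and toy sequence are hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(seq, n):
    """Bits of uncertainty about an element given the n - 1 before it.

    Estimated from overlapping n-grams; with n = 1 this reduces to the
    ordinary first-order entropy.
    """
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    if not grams:
        raise ValueError("sequence is shorter than n")
    # Count how often each (n - 1)-element context starts an n-gram.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    total = sum(grams.values())
    return -sum((count / total) * math.log2(count / contexts[gram[:-1]])
                for gram, count in grams.items())

# Toy sequence: after one "a" the next element could be "a" or "b",
# but after any two elements the third is fully determined.
toy = "aabb" * 25
print(conditional_entropy(toy, 2))  # close to 1 bit
print(conditional_entropy(toy, 3))  # zero: no uncertainty left
```

A drop in this figure from one order to the next means the longer context helps prediction; when the figure stops falling, the extra context adds nothing. The number of possible n-grams grows rapidly with n, so estimates at higher orders need far more data to be trustworthy.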

Primitive communications, such as chemical signalling systems employed by cotton plants, don't go beyond first-order Shannon entropy. That means there is no discernible connection between signals - knowing one doesn't help you predict the next. Adult squirrel monkeys, on the other hand, show second or third-order Shannon entropy: their "language" has some predictability in its structure, but not much. So far, the research of McCowan, Hanser and Doyle indicates that dolphin whistles bear signs of 3rd or 4th-order Shannon entropy.

This kind of analysis can reveal differences in communication complexity between the species. Interpretation of such graphs is complicated, but it does show that human languages have broadly similar communication potential, and that other animals have different capabilities, the limits of which have not yet been fully explored.

Dolphins, for instance, may go well beyond the third-order Shannon entropy level documented so far. However, to see whether their communications show signs of sixth-order entropy, thousands of hours of recordings will be needed. Individual samples will need to consist of at least six consecutive whistles, Hanser explains. "It's hard to get samples that long because dolphins, like humans, go through prolonged periods of silence," he says.

Ultimately, the team plans to evaluate several other species - including humpback whales, crows, ravens and jays - to establish a hierarchy of communication complexity. "So far, we can't define intelligence per se, but information theory can give us a precise measure of communication complexity," Doyle says. And that's why it appears to hold so much potential for SETI.

While it seems unlikely that humans will find a superior intelligence on Earth, there may be one lurking elsewhere in the galaxy. If we spot an alien signal, says Doyle, "the information in there has to obey the rules of information theory. This would be the first line of attack in analysing the complexity of a signal." After that a Shannon entropy analysis can put a number on just how much "communication intelligence" it expresses. In other words, we'll ascertain the complexity of ET's language. "The hope is that complexity relates in some way to intelligence, or at least to social intelligence," he says.

According to Doyle, information theory even changes what we're looking for in the first place. One of the main criteria in SETI is to find a narrow-band signal of the order of 1 hertz (one cycle per second), he says, "because nature can't make a 1-hertz signal." Now we can go further, thanks to information theory, by recording the signals and doing a Zipf plot. "If you get a -1 slope, it's not a random signal," he says. "The Zipf plot is a way of extending SETI's reach so we're not just limited to looking for narrow-band signals." In Doyle's opinion, the first thing SETI researchers should do upon receiving a candidate signal is to make such a graph. If they get a -1 slope, they should then look for higher-order Shannon entropies.

Suppose the aliens showed 30th-order Shannon entropy compared to our measly nine? What would that mean? To our eyes, their communications would be unimaginably abstruse - containing layer upon layer of nested clauses, complicated tense changes ("we would have had have had been there, were it not for..."), quadruple negatives, and the like - sufficient to reduce even the most intrepid reader to tears. But what it really signifies, says Frank Drake, chairman of the SETI Institute, is that "we would know we're dealing with a very sophisticated creature".

Drake considers Doyle's pioneering work with animal communications a "big step forward" for SETI. At the first-ever SETI meeting in 1961, neuroscientist John Lilly proposed that studying dolphin languages could help with the recognition of an ET language. But that work never panned out. "People have been interested in linking SETI with animal communications for more than 40 years, and there has been one abortive attempt after another," Drake says. "It's all anecdotal stuff that has never been quantified."

But now animal research has given us the most unequivocal way of detecting intelligent life yet devised. "If we get a narrow-band signal, a -1 slope on the Zipf plot, and higher-order Shannon entropies, we've nailed it," Doyle says.

"The beauty of this approach," he adds, "is that it doesn't require any knowledge of their language or culture or intelligence." It does, however, require a readiness to be humbled. As well as showing what kind of communications complexity - and thus, perhaps, intelligence - ET enjoys, it would also reveal where our species stands on the evolutionary line of communication complexity. We should be prepared for the worst: if the aliens do turn out to be of 30th-order complexity, they may shun our primitive attempts at communication. They may even insist that we get off the line and put them straight through to the dolphins.