Human language can express limitless meanings from a finite set of words based on combinatorial rules (i.e., compositional syntax). Although animal vocalizations may be comprised of different basic elements (notes), it remains unknown whether compositional syntax has also evolved in animals. Here we report the first experimental evidence for compositional syntax in a wild animal species, the Japanese great tit (Parus minor). Tits have over ten different notes in their vocal repertoire and use them either solely or in combination with other notes. Experiments reveal that receivers extract different meanings from ‘ABC’ (scan for danger) and ‘D’ notes (approach the caller), and a compound meaning from ‘ABC–D’ combinations. However, receivers rarely scan and approach when note ordering is artificially reversed (‘D–ABC’). Thus, compositional syntax is not unique to human language but may have evolved independently in animals as one of the basic mechanisms of information transmission.

The article is open access, so you can read it yourself.

The key idea is that the sequence of notes ABC means "look out!" (they gloss it "scan for danger") while a sequence of D notes (e.g. DDDDDDDDD) means "come here" (they gloss it "approach the caller").

They show this in the now-standard way: ABC calls, played over a loudspeaker, generally produce a much larger number of observed "scans" in birds that hear them, compared to background noise or D calls. ABC-D sequences produce a somewhat smaller number of scans, but still more than D calls alone.

And listening tits approach the loudspeaker more often when a sequence of D notes is played, compared to background noise or an ABC sequence. The sequence ABC-D elicits a somewhat smaller percentage of approaches, but still more than ABC or background noise (BN).

So far, this is just, as we might say, one thing after another. In a combination of note sequences, each subsequence has (a weakened form of) the effect that it has in isolation.

For the authors of this paper, the key question is, does order matter? What's the effective meaning of ABC-D vs. D-ABC?

This is not logically equivalent to the question of whether these calls have a "syntax" — there might be other reasons why order matters. We might be studying bird-call pragmatics, or bird-call discourse analysis, rather than bird-call syntax. But it's certainly a relevant question.

And the result they report is that D-ABC is less effective than ABC-D at eliciting scans, and also less effective at eliciting approaches.

So this is suggestive, though it would be more persuasive if they also had an explanation for why ABC-D is less communicatively effective than either ABC or D alone. Whatever that explanation is (distraction from responding to one of the calls due to responding to the other?), it might also explain part of the reduced response to the reversed order.

However, their explanation in terms of the birds' order-expectation makes sense. They write that

Tits produce ‘chicka’ calls when approaching and mobbing predators, and these calls contain a number of unique call types composed of different note types, mainly A, B, C and D notes. A, B and C notes are typically produced in combination with other note types, resulting in AC, BC or ABC calls. In contrast, D notes are produced as a string of seven to ten notes (hereafter referred to as a D call) and are also used in non-predatory contexts, such as when a bird visits its nest alone and is recruiting its mate. In predatory contexts, D notes are often produced in combination with other note types and typically appear at the end of note strings, such as AC–D, BC–D or ABC–D calls. Thus, D notes are both produced alone and in combination with other notes, suggesting that they modify the meaning of ABC calls to elicit appropriate mobbing responses to different predator types.

In other words, these birds are used to hearing "AC-D, BC-D or ABC-D calls", and not to hearing D-whatever calls. This difference in order-expectation is analogous to human-language syntax or morphology, but it's also analogous to other behavioral-sequence regularities.

For the authors, the crucial point is that there's a behavioral-sequence regularity, combined with "meanings" for the sequence elements, combined with a difference in communicative effectiveness for normal vs. reversed orders.

If you've been paying attention to the graphs, you may have noticed another puzzle. What their Figure 4b shows as the approach percentage for ABC-D in experiment 2 is (closer to) the approach percentage that they reported for D alone in Figure 3b:

From measurements made on pixel positions in their Fig 3b, I estimate the approach percentage for D at 62%, and for ABC-D at 48%; similar measurements on figure 4b yield an approach percentage for ABC-D of 66%.
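For readers who want to repeat this kind of estimate, here is a minimal sketch of reading a bar height off a plot by linear interpolation between two axis reference lines. The function name `pixel_to_percent` and all pixel coordinates are hypothetical, chosen only for illustration; they are not the measurements behind the numbers quoted above.

```python
def pixel_to_percent(y_px, y_px_at_0, y_px_at_100):
    """Map a vertical pixel row to a value on a 0-100% axis.

    Assumes a linear axis. Image rows usually grow downward, so the
    0% line typically has a larger pixel row than the 100% line.
    """
    if y_px_at_0 == y_px_at_100:
        raise ValueError("axis reference rows must differ")
    return 100.0 * (y_px - y_px_at_0) / (y_px_at_100 - y_px_at_0)


# Hypothetical example: the 0% gridline sits at row 400, the 100%
# gridline at row 100, and the top of some bar at row 250.
example = pixel_to_percent(250, y_px_at_0=400, y_px_at_100=100)
print(example)
```

Estimates made this way are only as good as the figure's resolution and the care taken in locating the axis lines, which is one reason reported numbers are preferable.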

Thus the difference in approach percentage between ABC-D in experiment 2 (66%) and ABC-D in experiment 1 (48%) is almost as large as the difference between responses to ABC-D in experiment 1 (48%) and responses to D-ABC in experiment 2 (24%).
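The arithmetic behind that comparison can be laid out explicitly. The percentages below are the pixel-based estimates quoted in this post, not values reported by the authors; the dictionary layout and variable names are just one way to organize them.

```python
# Approach percentages as estimated in this post from the paper's figures.
approach_pct = {
    ("exp1", "D"): 62,
    ("exp1", "ABC-D"): 48,
    ("exp2", "ABC-D"): 66,
    ("exp2", "D-ABC"): 24,
}

# Same stimulus (ABC-D), different experiments.
between_experiments = approach_pct[("exp2", "ABC-D")] - approach_pct[("exp1", "ABC-D")]

# The comparison made in the text: ABC-D in experiment 1 vs. D-ABC in experiment 2.
cross_experiment_order = approach_pct[("exp1", "ABC-D")] - approach_pct[("exp2", "D-ABC")]

# Both orders within experiment 2.
within_exp2_order = approach_pct[("exp2", "ABC-D")] - approach_pct[("exp2", "D-ABC")]

print(between_experiments, cross_experiment_order, within_exp2_order)
```

On these estimates the between-experiment gap for the same stimulus is 18 points, against 24 points for the cross-experiment order comparison, while the order contrast inside experiment 2 is 42 points.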

I had to resort to this pixel-measurement because nowhere in the paper do the authors report the actual numbers. They give various measurements of statistical significance, but not even the mean values (of scan counts and approach percentages), much less a full listing of the underlying data. This omission is lamentable, in my opinion.

Another lamentable omission is the set of stimuli: good practice these days would be to make the acoustic stimuli (and the raw behavioral data) available as on-line supplementary material. This is relevant (for example) because there might be coarticulatory issues that produce differential artefacts in different artificial note sequences of the sort that they used. (Their stimuli, as far as I can tell from their description, were constructed by concatenating individual notes from different calls, with short silences in between. The notes used came from the calls of 17-21 different birds, but each stimulus was apparently constructed from notes drawn from the calls of a single bird (?).)

All in all, this is an interesting paper, but I feel that the editors of Nature Communications have seriously failed in their responsibility to require (or allow?) adequate documentation of the work.

[h/t Sybil Shaver]

Update — I've changed the title from "Birdsong syntax" to "Bird syntax", to remedy a possible misunderstanding.

As I understand things, researchers make a distinction between "songs", which are typically territorial or mating displays generally produced by males, and "calls", which are functional vocalizations produced by both sexes.

It's been clear for a long time that bird songs are often complex sequences of well-defined smaller units, often called "motifs" and "syllables", which occur in regular but not invariant patterns. There's an interesting literature on the nature of such patterns, and their relationship (or not) to the "grammars" of human languages.

But those motifs and syllables don't have any independent meaning, as far as anyone knows, and for that matter the songs as a whole don't mean anything besides "I'm a skillful member of my species", and "check me out" or "this is my territory".

What's special about this paper is that it discusses two different calls, with different functional meanings, which are often fluently combined in a fixed order; and the experiments show that the order matters, in that the behavioral response to re-ordered calls is much weaker than the response to calls in the natural order.

2 Comments

David Wheatcroft said,

I'm one of the co-authors of the paper. We're really happy about the interest in the study! I just wanted to clarify the issue of differential strength of responses to ABC-D in the two experiments. The first experiment (playing back ABC, D, and ABCD) was conducted during the breeding season, while the second (playing back ABCD and DABC) was conducted during the winter/non-breeding season. The response strengths are NOT comparable between experiments, only within experiments. (These details are provided in the Methods section of the paper, but the issue you rightly bring up was not explicitly explained).

What we have in the D-ABC sequence vs the ABC-D one could be interpreted as

(1) A phrase (in the human-language sense, ~ 'sentence') whose meaning potentially and maybe crucially depends on the order of its constituents (assumed to be the notes, but then there are pauses, too); in that case, how many 'parts' (a.k.a. 'words', 'morphemes') are there (to the birds; the researchers obviously assume two parts)?

(2) A 'melody' (or 'theme' or 'set', 'score') that cannot be meaningfully subdivided semantically, although its physical, phonetic segmentation is not doubted. In that case, could it be that ABC-D just happens to be a popular tune with known content while D-ABC is less so? Does the data prove or disprove that ordering is 'meaningful' under this assumption? We know for sure that at least some bird species do repeatedly produce melodies; surely jumbling the notes will have some effect. So here we jumbled the notes and registered that fewer fellow tits showed measurable reactions; can we rule out from that data that we just happened to play some misfit cacophony they didn't like or couldn't care less about?

(3) Two 'melodies' that happen to occur in close proximity. But how close is close? If there were some longer delay between the parts, how would that affect bird behavior? (What if the two calls didn't originate from the same bird, but two? Could the fellow tits distinguish them as humans recognize voices?) Crucially, can we deduce from this and/or other observations that what we did was more than just broadcast "Look out! (Belay that,) Come here!" in the first (ABC—D), but "Come here! (Belay that,) Look out!" in the second (D—ABC) case? Would we then still say that ordering is important? We might just as well say it's a result of the most recent call taking precedence over earlier signals.