The Language of the Future: Evolutionary Math Breaks the Code

Harvard mathematicians have found that words evolve at a rate closely tied to how often they are used. The research looked at the evolution of the English language over the past 1,200 years and found that it is the infrequently used words that tend to change.

Apparently, just as genes and organisms undergo natural selection, words are also subject to a similarly intense pressure to "regularize" as the language develops. The researchers quantified this trend and compared it with biological evolution.

"Mathematical analysis of this linguistic evolution reveals that irregular verb conjugations behave in an extremely regular way- one that can yield predictions and insights into the future stages of a verb's evolutionary trajectory," says Erez Lieberman, a specialist in evolutionary math at Harvard University. "We measured something no one really thought could be measured, and got a striking and beautiful result."

What they found is that the less often a word is said, the faster it will change over time, whereas more commonly uttered words are much more resistant to change. The researchers believe this is because often-used irregulars are easy to remember and get right, whereas seldom-used irregulars are more likely to be forgotten, so speakers mistakenly apply the ‘-ed’ rule to them instead. The most commonly used verb the researchers found to have regularized in this way is ‘to help’: its past tense was once ‘holp’ but is now ‘helped’.

"We're really on the front lines of developing the mathematical tools to study evolutionary dynamics," says Jean-Baptiste Michel, a graduate student in systems biology at Harvard Medical School. "Before, language was considered too messy and difficult a system for mathematical study, but now we're able to successfully quantify an aspect of how language changes and develops."

Lieberman, Michel, and colleagues built upon previous study of seven competing rules for verb conjugation in Old English, six of which have gradually faded from use over time. They found that the one surviving rule, which adds an "-ed" suffix to simple past and past participle forms, contributes to the evolutionary decay of irregular English verbs according to a specific mathematical function. Specifically, irregular verbs are regularized at a rate inversely proportional to the square root of their usage frequency. That means a verb used 100 times less frequently will evolve 10 times as fast.
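
The square-root relationship is easy to check with a few lines of arithmetic. The Python sketch below is illustrative only: the function name `relative_regularization_rate` and the idea of measuring one verb against a reference verb are assumptions made for this example, not the researchers' own code.

```python
import math


def relative_regularization_rate(frequency_ratio: float) -> float:
    """Return how many times faster (or slower) a verb regularizes than a
    reference verb, given how often it is used relative to that reference.

    Assumes only the relationship stated in the article: the rate is
    inversely proportional to the square root of usage frequency.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


# A verb used 100 times less often than the reference verb
# comes out at a multiplier of about 10.
print(relative_regularization_rate(1 / 100))

# A verb used 4 times less often regularizes about twice as fast.
print(relative_regularization_rate(1 / 4))
```

Because only the ratio of frequencies matters here, the sketch needs no absolute usage counts.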

To develop this formula, the researchers tracked the status of 177 irregular verbs in Old English through linguistic changes in Middle English and then modern English. Of these 177 verbs that were irregular 1,200 years ago, 145 stayed irregular in Middle English and just 98 remain irregular today, following the regularization over the centuries of such verbs as help, laugh, reach, walk, and work.

Lieberman and Michel's group computed the "half-lives" of the surviving irregular verbs to predict how long they will take to regularize. The most common ones, such as "be" and "think," have such long half-lives (38,800 years and 14,400 years, respectively) that they will effectively never become regular. Irregular verbs with lower frequencies of use -- such as "shrive" and "smite," with half-lives of 300 and 700 years, respectively -- are much more likely to succumb to regularization.
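
To get a feel for what those half-lives imply, the sketch below treats regularization as simple exponential decay, which is the usual meaning of "half-life". That model is an assumption made here for illustration and may not match the calculation in the paper; the function name `chance_still_irregular` is hypothetical, and only the four half-lives quoted above are used.

```python
def chance_still_irregular(half_life_years: float, elapsed_years: float) -> float:
    """Estimate the chance a verb is still irregular after `elapsed_years`.

    Assumption: plain exponential decay, so the chance halves with
    every half-life that passes.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (elapsed_years / half_life_years)


# Half-lives in years, as quoted in the article.
HALF_LIVES = {"be": 38_800, "think": 14_400, "smite": 700, "shrive": 300}

for verb, half_life in HALF_LIVES.items():
    chance = chance_still_irregular(half_life, elapsed_years=500)
    print(f"{verb}: {chance:.1%} chance of still being irregular in 500 years")
```

Under this assumption, "be" and "think" are almost certain to stay irregular over the next 500 years, while "shrive" is more likely than not to have regularized by then.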

Extant irregular verbs represent the vestiges of long-abandoned rules of conjugation; new verbs entering English, such as "google," are universally regular. Although fewer than 3 percent of modern English verbs are irregular, this number includes the 10 most common verbs: be, have, do, go, say, can, will, see, take, and get. Lieberman, Michel, and colleagues expect that some 15 of the 98 modern irregular verbs they studied will regularize in the next 500 years, but the top 10 probably never will.

The paper, published in Nature, offers a quantitative, astonishingly precise description of something linguists have long suspected: the most frequently used irregular verbs are repeated so often that they will likely never die.

"Irregular verbs are fossils that reveal how linguistic rules, and perhaps social rules, are born and die," Michel says.

"If you apply the right mathematical structure to your data, you find that the math also organizes your thinking about the entire process," says Lieberman, whose unorthodox projects as a graduate student have ranged from genomics to bioastronautics. "The data hasn't changed, but suddenly you're able to make powerful predictions about the future."

Lieberman, Michel, and their co-authors project that the next word to regularize will likely be "wed."

"Now may be your last chance to be a 'newly wed'," they quip in the Nature paper. "The married couples of the future can only hope for 'wedded' bliss."