Hugh Pickens writes "Christopher Shea writes in the WSJ that physicists studying Google's massive collection of scanned books claim to have identified universal laws governing the birth, life course and death of words, marking an advance in a new field dubbed "Culturomics": the application of data-crunching to subjects typically considered part of the humanities. Published in Science, their paper gives the best-yet estimate of the true number of words in English: a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000), with more than half of the language considered "dark matter" that has evaded standard dictionaries (PDF). The paper tracked word usage through time (each year, for instance, 1% of the world's English-speaking population switches from "sneaked" to "snuck") and found that English continues to grow at a rate of 8,500 new words a year. However, the growth rate is slowing, partly because the language is already so rich that the "marginal utility" of new words is declining. Another discovery is that the death rate for words is rising, largely as a matter of homogenization, as regional words disappear and spell-checking programs and vigilant copy editors choke off the chaotic variety of words much more quickly, in effect speeding up the natural selection of words. The authors also identified a universal "tipping point" in the life cycle of new words: roughly 30 to 50 years after their birth, words either enter the long-term lexicon or tumble off a cliff into disuse and go "23 skidoo" as children either accept or reject their parents' coinages."

The WSJ is flat-out wrong about where the paper was published. Science and Nature are the two highest-impact journals in the field today. This paper was published in Scientific Reports, an open access [wikipedia.org] spinoff of Nature. This is a relatively new journal, composed mostly of papers rejected by Nature, that touts "peer review by at least one member of the academic community" (peer review more typically involves 3-5 reviewers).

I also find it disturbing that none of the paper's authors are related to the field of linguistics or even the humanities (e.g. English), though they do cite a number of papers from those who are. I hope the "at least one member of the academic community" who reviewed the paper included a linguist and somebody versed in statistical models.

Hugh Pickens is also a bit off: "dark matter" (or even the word "dark" or the word "half") is not mentioned in the linked PDF, which appears to be either a draft or an earlier conference proceeding. Since Scientific Reports is open access, you can find the full paper for free at http://www.nature.com/srep/2012/120315/srep00313/full/srep00313.html [nature.com]