Peter Sokolowski at Merriam-Webster Dictionary talks about a billion “lookups” a year

Peter Sokolowski is editor at large at Merriam-Webster, where he works on the Word of the Day podcast, Ask the Editor videos, and short articles about word trends and etymologies (which he also presents on Twitter). In addition to attending professional and academic conferences to talk about dictionaries, he conducts workshops for teachers of English as a second language, serves as pronouncer for spelling bees around the world, and is a substitute jazz host for New England Public Radio. (We also hear he plays a mean jazz trumpet.)

CFS: Recently I was lucky enough to hear you speak about what lexicographers can deduce from the words people look up at Merriam-Webster.com. You gave examples of how a political or celebrity event can cause certain “lookups” to spike—such as the word emaciated when Michael Jackson died. Later I saw your tweet about the spiking of canonize and homily when Pope Francis visited the US. I believe you said that more than a billion words a year are looked up at the M-W Dictionary website and apps. What I’d like to know is, how do you keep an eye on a billion lookups? What kind of tools do you have and how do you use them?

PS: It is indeed a lot of data to take in all at once. Google Analytics churns all that data with each request, making it a slow way to get information, so we have developed some simpler engines that give us fast answers; one is the list of words being looked up at any given moment. It can be refreshed by the second but stores no archive.

Then we have several colored graphs that show lookups aggregated by the hour, day, week, and back several years to when we started keeping track.

Finally, there’s my favorite, the multiplier, which enables me to look at words that have seen increases of 200 percent, 300 percent, or any multiple of 100 percent in the previous twenty-four hours. Since a dictionary database represents a very long tail of information, a word that jumps to, say, the 5,000th position today from the 150,000th would represent a considerable relative spike, but one that I’d miss by only looking at the first few hundred words on the list. The multiplier measures speed rather than volume.
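As a rough illustration of the multiplier idea, here is a minimal sketch of how a relative-increase filter might work. It is not Merriam-Webster's actual tool. The function name `spiking_words`, its parameters, and all lookup counts are hypothetical, made up only to show how a low-volume word can outrank a high-volume one when ranked by percent increase.

```python
from typing import Dict, List, Tuple


def spiking_words(
    previous: Dict[str, int],
    current: Dict[str, int],
    min_increase_pct: float = 200.0,
    min_baseline: int = 1,
) -> List[Tuple[str, float]]:
    """Return (word, percent increase) pairs, largest increase first.

    `previous` and `current` map each word to its lookup count in two
    consecutive 24-hour windows (hypothetical input format).
    """
    spikes = []
    for word, now in current.items():
        # Floor the baseline so words with no earlier lookups do not
        # cause a division by zero (an assumption of this sketch).
        before = max(previous.get(word, 0), min_baseline)
        increase_pct = (now - before) / before * 100.0
        if increase_pct >= min_increase_pct:
            spikes.append((word, increase_pct))
    return sorted(spikes, key=lambda pair: pair[1], reverse=True)


# Invented counts, for illustration only.
previous_day = {"love": 9000, "integrity": 5000, "canonize": 40, "homily": 25}
current_day = {"love": 9500, "integrity": 5200, "canonize": 400, "homily": 100}

for word, pct in spiking_words(previous_day, current_day):
    print(f"{word}: +{pct:.0f}%")
```

In this made-up data, love and integrity have by far the most lookups, yet only canonize and homily pass the 200 percent threshold. That is the sense in which such a filter measures speed rather than volume.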

CFS: Fantastic—I’m picturing you in goggles in front of a giant screen of words. But seriously, what a privilege to have those tools at your command. Are you able to tell where lookups originate geographically? Can you tell whether lookups are coming from computers or smartphones? Do you make use of that information?

PS: We can tell the country of origin for lookups, but since the majority of our traffic is domestic, we feel that we’re reading American curiosity best and most accurately. We have occasionally seen isolated lookups coming from a foreign country when we can’t quite figure out why a given term is spiking. For example, the term physical education spiked when a change in school policy was made in the Philippines, which is a large market for us both digitally and in print. We found that an e-mail to parents included a link to our entry. A while back we looked at the traffic from American university domains, and it showed a big uptick from colleges and universities all across the country at the beginning of the school year in September. This is very encouraging; it tells us that academic traffic is a big part of the mix. We can also tell if the lookups are from a smartphone, which gives us some insight as to how people use the two platforms. Desktop lookups peak during business hours for serious words, and smartphones peak in the evening for less businesslike reasons: words like love and two-letter Scrabble words are looked up more often then.

The convenience of a smartphone is important. In fact, last year we saw lookups from small screens exceed the desktop site in traffic (and Google just reported the same pattern), so we expect to see even faster responses to news events through the dictionary data, since so many people now carry a dictionary on their phones all the time.

CFS: When you think about the ways you collect word-use data compared with a hundred or two hundred years ago when Noah Webster was collecting words, what comes to mind? Just today I heard your colleague Kory Stamper compare language to a river with contributing streams and currents, with people adding words/droplets to it all the time. Do you think technology makes the river run faster? Is language change faster? Are changes less permanent? Has the dictionary’s role changed over time?

PS: Communication is obviously faster these days. But just because we can measure better today doesn’t mean that there was less to measure, relatively speaking, in the past. With more people contributing to the stream, those droplets join an ever-bigger river. I try to guard against cultural myopia and the illusion of omniscience that comes with sitting all day in front of a computer. I’ve seen spikes for words that leave not a single trace on the Internet, later to learn that the word was used on a prime-time TV show watched by twenty million people. It wasn’t exactly a secret, but I couldn’t find it for the simple reason that the show’s script wasn’t indexed and optimized. Our lives aren’t indexed and optimized. The Internet doesn’t always have all the answers.

On the other hand, the biggest change for lexicographers is how fast and comprehensively research can be done today. Whether it’s a new archive of medieval texts, letters from Civil War soldiers, or Google News, we have tools to search with speed that would have been unimaginable even a generation ago. Our office is essentially a library, and we have shelves full of literary concordances. A concordance for Milton or Shakespeare was a lifetime’s work a century ago, and it usually gave the author the status of a major scholar. Today we can search those works in seconds. It seems to me that the increase in volume of text is more than matched by the speed of research. The key is to know where to look.

People had thought that the telegraph, the telephone, and radio would ruin language. Like today’s technologies, they’ve just added to the stream.

CFS: I’m getting the idea that your technology is a fun bonus, but that the real payoff is in the cultural and societal insights from seeing what words people are looking up most.

PS: Exactly. Knowing the most looked-up terms tells us what people really use the dictionary for. People want information about abstract words and ideas, words that function at a higher plane of language than concrete nouns. Neologisms get attention in the media, but most of us aren’t looking up novelties like twerk and LOL; we’re looking up words like integrity, pragmatic, and socialism. In a way, the data isn’t about numbers. It’s about what people want and expect from a dictionary. We’ve only just begun to respond to this feedback in our editing and product development.

A dictionary is a tool in a person’s intellectual toolbox. It’s a utility for grounded and objective information, most often used to answer an immediate question. I believe that the dictionary serves a contemplative purpose even when we aren’t writing or editing: people look up love around Valentine’s Day and surreal following national tragedies. Words bring a sharp focus to thought.

CFS: So from your seat at Merriam-Webster, does language use appear to be healthy or in decline?

PS: I’m not worried. The written word is more important than ever because of digital communication. We are all judged constantly and harshly by the way we write and speak; I’m reminded of the Match.com survey that puts “good grammar” as the second most important quality sought in potential dates, after “good teeth.” (Seth Meyers joked on Saturday Night Live that this is good news for “whomever has both.”) But for professional and academic reasons, not to mention the importance of American English as a lingua franca of Internet business, standard English has become its own reward. Standard English is not necessarily a superior form of the language, but it is a privileged form of the language.

At the same time, language declinism is a waste of time and intellectual energy. Everyone who makes a “kids today” comment about the state of language was once a kid. The only constant is change. Languages certainly do follow rules, but they don’t follow orders. As linguists and lexicographers, we notice changes and novelty—we don’t complain about them. The English language is a great marketplace that will accept some changes and reject others. Our job is to observe and report.

Obviously, copyeditors enforce standards. The straw-man oversimplification that supposedly divides “descriptivists” and “prescriptivists” misses a very important point. A dictionary records two kinds of facts: linguistic facts such as spelling, etymology, pronunciation, and meaning; and cultural facts, which we call usage. Usage is the manners of language. The usage paragraphs are where the descriptive and the prescriptive meet, and they are often the most interesting and useful part of a dictionary. Look at the usage note at irregardless, for example. It gives very clear advice. A good dictionary always gives good advice.

CFS: I’m familiar with this confusion over the role of the dictionary. I see it in the backlash online against news that a respected dictionary has “accepted” a nonstandard word or meaning. Readers think that “accepted” means “declared acceptable in formal usage” instead of “accepted the need to explain a nonstandard word or meaning.” Sometimes I think people forget that we count on dictionaries to tell us what words mean. All words.

OK, I just looked at irregardless at Merriam-Webster.com—it’s even playable in Scrabble! I’m not sure my grandma—who always won at Scrabble—would approve, but she would love knowing that I got to talk with someone from the Official Scrabble Players Dictionary. Last question: can you tell us something fun about Scrabble lookups?

PS: That’s easy! We noticed something remarkable when we saw a difference in words looked up from the mobile app compared to the website: late at night, lookups for qi and za spike, which means that in bars and in beds across the country, people are playing Scrabble. There’s also a big spike in lookups of these two-letter words during the afternoon on Thanksgiving and Christmas. With no competition from business or academic traffic, the Scrabble words soar to the top.

People often say to me that they wouldn’t want to play me in Scrabble, but honestly I don’t play. I’ve asked a bunch of other lexicographers, and we seem to agree: it’s either too much like math or too much like work.