Koo af, yinz: regional US slang thrives on Twitter

It's commonly accepted that widespread national television helped smooth over many local US accents and standardize "proper" English usage; will services like Twitter, where people use more colloquial language, have the same effect on regional slang? A new study from Carnegie Mellon University finds that, so far, regional variation is alive and well on Twitter. All yinz in Pittsburgh and all yous in New Jersey can still find plenty of support on the microblogging service.

Jacob Eisenstein, Brendan O'Connor, Noah Smith, and Eric Xing of CMU's Computer Science department used Twitter's official API to grab 15 percent of all tweets during one week in March 2010. They then filtered the dataset to include only geotagged tweets from accounts with fewer than a thousand followers, and they removed any tweet that contained a URL or came from outside the US. The goal: to eliminate bots, celebrities, and PR firms, leaving only real people with local networks of friends.
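For the curious, the filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual code: the `Tweet` record, its field names, and the rough bounding box used to approximate "inside the US" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions, not Twitter's API schema.
    text: str
    follower_count: int
    coordinates: Optional[Tuple[float, float]]  # (latitude, longitude), or None if not geotagged


# Assumption: a crude bounding box around the contiguous US.
# It also catches slivers of Canada and Mexico; a real study would need a better test.
MIN_LAT, MAX_LAT = 24.0, 50.0
MIN_LON, MAX_LON = -125.0, -66.0

MAX_FOLLOWERS = 1000  # "fewer than a thousand followers"


def has_url(text: str) -> bool:
    """Very simple URL check: looks only for an http(s) scheme."""
    return "http://" in text or "https://" in text


def in_us(coords: Tuple[float, float]) -> bool:
    """Return True if the point falls inside the rough bounding box."""
    lat, lon = coords
    return MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON


def keep(tweet: Tweet) -> bool:
    """Apply the four filters: geotagged, small following, no URL, inside the US."""
    if tweet.coordinates is None:
        return False
    if tweet.follower_count >= MAX_FOLLOWERS:
        return False
    if has_url(tweet.text):
        return False
    return in_us(tweet.coordinates)


sample = [
    Tweet("hella tired", 120, (37.77, -122.42)),               # San Francisco, kept
    Tweet("new post http://example.com", 120, (37.77, -122.42)),  # contains a URL
    Tweet("big announcement", 50000, (40.71, -74.00)),         # too many followers
    Tweet("koo", 80, None),                                    # not geotagged
    Tweet("knackered", 80, (51.51, -0.13)),                    # London, outside the box
]
kept = [t for t in sample if keep(t)]
```

Of the five sample tweets, only the first survives all four filters, which is the point: each rule strips out a different kind of account that isn't a regular person tweeting locally.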

Yinz, a contraction of the second-person plural "you ones," lives on Twitter.

The resulting geotagged dataset allowed them to map the tweets, sort them into broad categories, and then look for regional variants. They found plenty.

In northern California, something that's cool is "koo" in tweets, while in southern California, it's "coo." In many cities, something is "sumthin," but tweets in New York City favor "suttin." While many of us might complain in tweets of being "very" tired, people in northern California tend to be "hella" tired, New Yorkers "deadass" tired and Angelenos are simply tired "af."

The "af" is an acronym that, like many others on Twitter, stands for a vulgarity [Ed.: "as fuck"]. LOL is a commonly used acronym for "laughing out loud," but Twitterers in Washington, D.C., seem to have an affinity for the cruder LLS [Ed.: "laughing like shit"].

That's from the CMU press release, which naturally tries to spice things up. The research paper itself (PDF) is stuffed with equations, but it has its own surprises. For instance, when it comes to the topic of "daily life," New Yorkers used the word "cab" far more than anyone else. Those in LA favored "tacos." But those in the Lake Erie region had a special affinity for—I kid you not—"stink," "Chipotle," and "tipsy."

When it came to "emoticons," those from Boston favored "loveee" over San Francisco's "fckn" and LA's "th bomb." The research also showed Spanish terms like "pues," "nada," and "ese" in areas with large Hispanic populations.

The technique can be used to watch slang and regional variation evolve over time, though this particular methodology probably focuses most heavily on young smartphone users (the ones most likely to geotag their tweets). The authors also found that their models are good enough to reliably predict someone's geographical location just by examining the raw text of their tweets.

Regional variation may not be what it once was, but it's still alive in America. And that's koo af.