Soda vs. Pop: Twitter Is Pretty Good at Linguistics

Looking to settle the great soda versus pop debate, data scientist Edwin Chen went to Twitter for data points, and his results are largely consistent with actual linguistic data on the topic. Instead of asking residents of a region what word they use for the tasty bubbly drink, Chen searched tweets carrying Twitter's geo-tags. "To make this map, I sampled geo-tagged tweets containing the words 'soda', 'pop', or 'coke', performed some state-of-the-art NLP technology to ensure the tweets were soft drink related (e.g., the tweets had to contain 'drink soda' or 'drink a pop'), and tried to filter out coke tweets that were specifically about the Coke brand (e.g., Coke Zero)," he writes on his personal blog. Because the soda-pop divide is one of those distinctions that gets Americans rather heated, linguists have done their own surveying, producing a similar map in a 2003 study. Twitter might not give the most extensive data, but a comparison of the two maps shows that it leads Chen to conclusions similar to the linguists'.
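For readers curious what that kind of filtering might look like, here is a minimal Python sketch. It is not Chen's code: the regular expressions, the brand list, the function names, and the `(region, text)` tweet format are all assumptions made for illustration, based only on the examples he gives ("drink soda", "drink a pop", Coke Zero).

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: a form of "drink", an optional article, then one of the terms.
# Chen's post only gives "drink soda" and "drink a pop" as examples.
DRINK_PATTERN = re.compile(
    r"\bdrink(?:s|ing)?\s+(?:an?\s+|some\s+)?(soda|pop|coke)\b",
    re.IGNORECASE,
)

# Assumed brand list: the post only names Coke Zero; the rest are guesses.
BRAND_PATTERN = re.compile(
    r"\b(?:coke\s+zero|diet\s+coke|coca[\s-]?cola)\b",
    re.IGNORECASE,
)


def soft_drink_term(text):
    """Return 'soda', 'pop', or 'coke' if the tweet looks soft drink related, else None."""
    # Skip tweets that mention a specific Coke product anywhere in the text.
    if BRAND_PATTERN.search(text):
        return None
    match = DRINK_PATTERN.search(text)
    return match.group(1).lower() if match else None


def count_terms_by_region(tweets):
    """Tally terms per region from an iterable of (region, text) pairs."""
    counts = defaultdict(Counter)
    for region, text in tweets:
        term = soft_drink_term(text)
        if term is not None:
            counts[region][term] += 1
    return counts
```

Calling `count_terms_by_region` on a list of geo-tagged tweets gives a per-region tally that could then be colored on a map. A real pipeline would need a far more careful filter than two regular expressions.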

Below is Chen's map, which aggregates nearby tweets and leads him to three conclusions about the great soda versus pop debate:

The South is pretty Coke-heavy.

Soda belongs to the Northeast and far West.

Pop gets the Midwest, except for some interesting spots of blue around Wisconsin and the Illinois-Missouri border.

Though the map below, from University of Wisconsin researchers, uses the opposite coloring, it gives pretty much the same findings, with a few differences:

Soda is for the Northeast and West.

Coke is for the South.

Pop is for the Midwest and the Pacific Northwest, a part of the country for which Chen did not have much data.

This map shows some details that aren't on Chen's map (the Pacific Northwest is solid pop country, for one), but all in all the Twitter data provided a pretty accurate portrayal of how people talk. We don't need to tell the linguists this, however. Not only does Chen have a background in linguistics from MIT, but other language researchers have already started tapping into this huge resource, a trend linguist Ben Zimmer wrote about last fall in his New York Times column on Twitterology.