One of its distinctive characteristics is its often offensive content
January 10, 2018 10:09 AM

From the study [PDF]: Our study highlights that UD has a higher content heterogeneity than traditional dictionaries. Depending on the goal, this could mean that more effort is needed to filter and process the data (e.g., the removal of opinions) compared to when traditional dictionaries are used. However, UD is unique in capturing many infrequent, informal words and it could therefore complement the traditional dictionaries. Furthermore, while there is more offensive content in UD, highly offensive definitions do get ranked lower through the voting system. We also found that words with more definitions tended to be more familiar to crowdworkers, suggesting that UD content does reflect broader trends in language use to some extent.

Watson couldn't distinguish between polite language and profanity -- which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word "bullshit" in an answer to a researcher's query.

Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.

I love Urban Dictionary. But I'd naively assumed the data wouldn't be very useful because there are so many joke entries. Also a bunch of garbage drive-by definitions that aren't meaningful. I skimmed the paper and didn't see them address this question head on, but maybe I missed it. They do seem to be relying heavily on up and down votes as a signal for getting through the noise.

I interviewed the Urban Dictionary founder Aaron Peckham back around 2005, when he applied for a job at Google. He was very modest, seemed genuinely surprised I'd heard of the site and liked it. Also he seemed very smart to me. He took the job and I always kind of wondered what happened to him afterwards. Good things, apparently, and I think he still runs Urban Dictionary independently as a business. I suspect he's doing pretty well with that.
posted by Nelson at 11:00 AM on January 10 [5 favorites]

In tests it even used the word "bullshit" in an answer to a researcher's query.

This hardly seems like a disqualifying characteristic without more information. Calling out bullshit is a lot more useful than playing Jeopardy. To quote Captain Picard, "It came from us, from our mission records, personal logs, holodeck programs, our fantasies. Now, if our experiences with the [English language speakers with internet access] have been honorable, can't we trust that the sum of those experiences will be the same?" I say, let Filthy Watson live. He is but a reflection of us.

Mostly, I use the urban dictionary these days when figuring out when my friends are making a contemporary television reference. It's easy, because the entry is inevitably incoherent nonsense filled with specific character names that are incomprehensible to anyone who doesn't already get the joke. An algorithm that automates that decision tree and runs it on every phrase in a conversation could be quite useful.
posted by eotvos at 1:25 PM on January 10 [1 favorite]

I quit checking the Urban Dictionary because any entry that could possibly be written in a misogynist light is written at reddit/4chan levels. "Often offensive content" is either an academic understatement or the authors are using offensive to mean only vulgar language. Because the misogynist crap is absolutely not being voted down; it's being vigorously voted up.
posted by Karmakaze at 2:32 PM on January 10 [6 favorites]

The name “urban” strikes me as problematic, in that the word is often used as a euphemism for “Black”; this gives the name “Urban Dictionary” an aura of “here's what those black dudes mean by all that wacky jive talk, Mr. Whitebread Suburban Homeowner” or something similar.
posted by acb at 4:14 PM on January 10 [1 favorite]

"Often offensive content" is either an academic understatement or the authors are using offensive to mean only vulgar language.

I was really surprised that the authors didn’t mention the volume of misogynistic/racist/hate speech entries. They intentionally ignored it because it’s impossible to miss.
posted by not_the_water at 4:41 PM on January 10 [4 favorites]

The word with the next highest number of definitions on Urban Dictionary is love, with 1140. The other words in the top 10 by number of definitions are: god, urban dictionary, chode, Canada’s history, sex, school, cunt, and scene.
