A New Yahoo AI Can Detect Online Abuse, But Automation Isn't the Answer

Online harassment is a serious but tough problem to solve, and there is a real need for systems that can ease the burden on the moderators who have to comb through abuse reports. As someone who was once responsible for doing that for a local news station (a much smaller venue than, say, Twitter), I can sympathise with the toll that seeing all those racist, abusive messages takes on your psyche. It sucks.

A team at Yahoo recently developed an algorithm that, they claim, can automatically identify hateful speech. The tool uses deep learning to detect abusive keywords, punctuation patterns typically found in hateful comments, and syntactic cues, drawn from several thousand comments on Yahoo’s websites.
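To get a feel for what surface-level signals like these look like, here is a minimal sketch of the kind of features such a classifier might compute from a comment. The feature set, keyword list, and thresholds are purely illustrative, not Yahoo’s actual design.

```python
def extract_features(comment):
    """Compute a few illustrative surface features from a comment.

    Hypothetical example only: real systems use far richer features
    learned from large labelled datasets.
    """
    words = comment.split()
    return {
        "num_words": len(words),
        # Runs of exclamation marks are a crude punctuation signal.
        "num_exclaims": comment.count("!"),
        # Shouting in all caps is another weak but cheap cue.
        "caps_ratio": sum(c.isupper() for c in comment) / max(len(comment), 1),
        # Exact keyword hits against a (toy) abusive-word list.
        "has_keyword": any(
            w.lower().strip(".,!?") in {"idiot", "moron"} for w in words
        ),
    }

extract_features("You IDIOT!!!")
# {'num_words': 2, 'num_exclaims': 3, 'caps_ratio': 0.5, 'has_keyword': True}
```

Features like these would then be fed into a trained model rather than used as hard rules.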

But that was just the start of it. The researchers also used what is called “word embedding,” a technique common in natural language processing that maps words and phrases onto numerical vectors. According to the study released online, the vectors were able to predict the next word based on different contexts. So even words that weren’t explicitly abusive (not slurs or derogatory language, say) could still be flagged as such.
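The intuition can be sketched with toy vectors: a word that sits close to known abusive terms in embedding space gets flagged even though it appears on no slur list. The words, three-dimensional vectors, and threshold below are made up for illustration; real embeddings like word2vec use hundreds of dimensions learned from large corpora.

```python
import math

# Toy "embeddings" -- illustrative values, not from Yahoo's model.
EMBEDDINGS = {
    "idiot":  [0.90, 0.10, 0.00],
    "moron":  [0.85, 0.15, 0.05],
    "clown":  [0.70, 0.20, 0.10],  # not a slur, but often used abusively
    "kitten": [0.00, 0.90, 0.80],
}

KNOWN_ABUSIVE = ["idiot", "moron"]

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def looks_abusive(word, threshold=0.95):
    """Flag a word whose vector lies near any known abusive term."""
    vec = EMBEDDINGS.get(word)
    if vec is None:
        return False
    return any(cosine(vec, EMBEDDINGS[w]) >= threshold for w in KNOWN_ABUSIVE)

looks_abusive("clown")   # True -- close to "idiot"/"moron" in vector space
looks_abusive("kitten")  # False -- far from the abusive cluster
```

The point is that similarity in the learned space, not list membership, does the flagging.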

According to MIT Technology Review, the algorithm was able to identify abusive messages with around 90 per cent accuracy.

Additionally, according to Wired, the database will soon be released online through Yahoo Webscope, opening it up for use by other researchers.

This is an interesting step forward, but automation only goes so far. Most comment systems—including Disqus, which is probably the most widely used platform—give websites the ability to ban certain words. Comments that use those words don’t appear online; they’re immediately placed in a queue for a moderator to review. This isn’t very effective, since all you have to do to get past the filters is replace a couple of letters or distort a word.
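Here is a minimal sketch of why those filters are so easy to beat. The banned-word list and substitution table are illustrative, not any platform’s actual implementation.

```python
import re

BANNED = {"idiot", "moron"}  # toy banned-word list

# Common character substitutions used to slip past filters.
LEET_MAP = str.maketrans({"1": "i", "!": "i", "3": "e", "0": "o", "@": "a", "$": "s"})

def naive_filter(comment):
    """The simple approach: exact matches against a banned-word list."""
    words = re.findall(r"\w+", comment.lower())
    return any(w in BANNED for w in words)

def normalised_filter(comment):
    """Slightly smarter: undo common substitutions first. Still easy to beat."""
    cleaned = comment.lower().translate(LEET_MAP)
    words = re.findall(r"\w+", cleaned)
    return any(w in BANNED for w in words)

naive_filter("what an id1ot")           # False -- one swapped letter evades it
normalised_filter("what an id1ot")      # True  -- substitution undone
normalised_filter("what an i d i o t")  # False -- spacing still slips through
```

Each patch to the filter just invites the next distortion, which is exactly the arms race a learned model is supposed to sidestep.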

The study does take some of these issues into account. Spotting certain keywords is only part of the job. An algorithm can’t detect sarcasm, nor can it keep up with the internet’s constantly changing language and slang. More technically, a machine can struggle to detect hate spread across multiple sentences. The researchers wrote:

“In the sentence Chuck Hagel will shield Americans from the desert animals bickering. Let them kill each other, good riddance!, the second sentence which actually has the most hateful intensity (them kill each other) is dependent on the successful resolution of them to desert animals which itself requires world knowledge to resolve.”

Social networks have tools that allow users to report abuse or spam, but most of them are useless. Most people are surely aware that there are jerks on the internet (especially after the high-profile abuse of actor Leslie Jones, at the very least), but it’s hard to find evidence that the big-name social networks are being proactive or making sweeping changes. Using artificial intelligence to do this work would certainly be welcome for the moderators and managers who have to process abuse reports, but it’s only a small part of what can be done to curb abuse.

There still need to be policies in place that prevent certain acts and behaviours without crossing the line into limiting free speech. There need to be websites and networks that act on those policies and remain active in the fight. And that’s just the beginning. [MIT Technology Review, Wired]