Cache of banned keywords demonstrates leeway companies have in deciding what and how to censor

On China’s popular streaming video app YY, you can chat about the Dalai Lama or the party drug ecstasy, but if you want to talk about people from Henan province stealing manhole covers, you’ll have to switch to a different app.

A group of internet researchers based in North America spent more than a year tracking how some of China’s better-known social video apps censor their users. Their latest findings, released in a report this week after they combed through a huge trove of banned keywords, suggest China’s censorship regime is not the well-coordinated machine it’s often assumed to be.

Similar to Twitter’s Periscope or Facebook Live, China’s streaming video platforms combine live video with social networking. Many allow users to pay video hosts by giving them virtual gifts that can be redeemed for cash, fueling an industry of young, mostly female “stream queens” who make a living singing, talking and otherwise broadcasting themselves through extended live-action selfies.

To compile the report, researchers from the University of New Mexico and the University of Toronto’s Citizen Lab reverse engineered three of the most popular video streaming apps–YY Inc.’s YY, and two apps owned by Tian Ge Interactive Holdings Ltd., Sina Show and 9158.

All social media networks in China use lists of banned keywords to filter and remove potentially sensitive messages. Many of the most popular platforms filter messages as they pass through a central server, making it difficult to compile a full list of banned words. But YY, Sina Show and 9158 censor chats inside the app, according to the report, which gives researchers a much more comprehensive view of the censorship process.
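As a rough illustration of what in-app filtering means, the sketch below checks a chat message against a keyword list bundled with the client. It is a minimal sketch under stated assumptions, not the apps' actual code: the function name `is_blocked`, the short sample list and the plain substring match are all hypothetical.

```python
# Hypothetical sketch of client-side keyword filtering; not taken from any real app.

# Assumed sample list; the entries are keywords the article quotes from the YY list.
BANNED_KEYWORDS = ["派性斗争", "反思文革", "5月35日"]


def is_blocked(message: str, keywords=BANNED_KEYWORDS) -> bool:
    """Return True if the message contains any banned keyword as a substring."""
    return any(keyword in message for keyword in keywords)
```

Because a list like this ships inside the app rather than sitting on a remote server, anyone who reverse engineers the client can read it in full, which is the property the researchers relied on.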

They found that while all of the apps block some terms related to the banned spiritual group Falun Gong, top Communist Party leaders and pornography, there was otherwise very little overlap in the lists used by the three. That suggests each company has leeway in deciding what to block and how to block it.
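One way to put a number on "very little overlap" is to compare the lists pair by pair. The sketch below uses Jaccard similarity, the share of distinct keywords that two lists have in common. This measure is an assumption chosen for illustration, not necessarily the method the researchers used, and the function names are hypothetical.

```python
# Hypothetical sketch: measuring how much keyword lists overlap.
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lists: dict) -> dict:
    """Map each pair of app names to the Jaccard similarity of their lists."""
    return {
        (x, y): jaccard(set(lists[x]), set(lists[y]))
        for x, y in combinations(sorted(lists), 2)
    }
```

A score near 0 for a pair of apps would mean their lists share almost nothing, while a score near 1 would point to a common source.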

“Many people believe China censors the Internet in a uniform, monolithic manner. Our research shows that the social media ecosystem in China–though definitely restricted for users–is more decentralized, variable, and slightly chaotic,” Ron Deibert, director of the Citizen Lab, wrote in a blog post announcing the report.

Sina Show, for instance, had a greater interest in blocking discussion of illicit activities like gambling or drugs. YY had more keywords aimed at blocking discussion of political events like the 1989 Tiananmen Square protests and pro-democracy demonstrations in Hong Kong.

While discussing people from Henan stealing manhole covers was out [perhaps to avoid reinforcing stereotypes about Henanese as poor], references to the Dalai Lama, the exiled Tibetan leader excoriated by the Chinese government as a separatist, slipped by.

Several live streaming apps have come under fire from regulators for having featured scantily clad, even topless, female hosts and, in at least one instance, a couple having sex. [One of the banned keywords on YY is “forgot to turn the camera off,” a reference to the live sex incident.]

The idea that China’s government outsources the work of censorship by threatening to punish internet companies found hosting unwelcome content is hardly new. Social media sites and other content portals retain phalanxes of workers to police postings. But Wednesday’s report fleshes out a picture of a censorship system more decentralized than previously known.

The report also suggests that a popular theory, which holds that China’s censors specially target posts involving collective action while giving space to other forms of political speech, may not apply across the Chinese internet.

“Such theories present a centralized censorship program with clear intent that creates a uniform outcome across companies and platforms. Our research unearths a more complex reality,” the report says.

A selection of keywords banned on YY gives a sense of the censorship involved:

派性斗争 – “Factional struggle,” a reference to conflict between different cliques within the Communist Party.

反思文革 – “Reflect on the Cultural Revolution,” something scholars have been discouraged from doing publicly as China marks the 50th anniversary of the start of the decade-long upheaval.

团派干部 – “Communist Youth League cadres,” a reference to the power base of former President Hu Jintao.

5月35日 – “May 35th,” a reference to the June 4, 1989 Tiananmen Square crackdown.

Archive.org – a searchable library that saves snapshots of web pages over time.

１对１衤果辽 – “One-on-one naked chat” with the character for naked split in two in a [failed] effort to evade censors.

Luo×女 – “Naked woman” with naked written in Roman letters in another frustrated attempt to fool censors.

Interestingly, keyword lists for all three apps included references to other streaming apps, suggesting their censorship efforts were united in at least one purpose: preventing users from talking about competing services.

NOTE: An earlier version of this post referred in some areas to an older version of the researchers’ findings. It has been updated to reflect their latest results.