Hackers, bloggers and professors team up to tap into blocked microblog content

With over 500 million registered users and over 46 million daily active users, Sina Weibo is the largest and most influential social media platform in China. It has also become known as a breeding ground for discussions with a more liberal slant.

But what is not allowed to be discussed on Weibo perhaps says just as much as what can be. There are a number of projects that aim to uncover content blocked on Weibo. Most of the people behind such efforts are China watchers based overseas or foreigners living in China. While they may have different approaches and backgrounds, their efforts are successful in bringing this vanished content back to light.

One such project, Freeweibo.com, won the best innovation prize at the 2013 Bobs, or Best of the Blogs awards, in June. The Bobs, started by Deutsche Welle in 2004, are given out in 34 categories in 14 languages and aim to honor the open exchange of ideas and free expression.

Hu Yong, a professor at Peking University and a new media observer, served as a juror at the awards. He commented that Freeweibo preserves digital memories and makes disappeared content visible again, according to the official website of the Bobs.

"We ignore relevant laws, legislation and policy," the welcome message on the website reads, a response to the expression Weibo and Chinese search engines use to explain why searches for certain words come back empty.

The website, in both English and Chinese, displays posts that are blocked or deleted on Sina Weibo. When searching for keywords, Freeweibo breaks search results down to "blocked by Sina Weibo" and "official search results," which allows users to see which search results are missing from the official Weibo.
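
The split described above amounts to comparing everything that has been collected against what the official search still returns. A minimal sketch in Python, assuming each post carries an ID; the function name, field names and sample data are hypothetical and are not taken from Freeweibo itself:

```python
def split_results(collected, official_ids):
    """Split collected posts into those the official search returns and those it omits."""
    official = [p for p in collected if p["id"] in official_ids]
    blocked = [p for p in collected if p["id"] not in official_ids]
    return {"official search results": official, "blocked by Sina Weibo": blocked}


# Hypothetical sample data, for illustration only.
collected = [
    {"id": 1, "text": "post A"},
    {"id": 2, "text": "post B"},
    {"id": 3, "text": "post C"},
]
official_ids = {1, 3}  # IDs that the official search still shows
groups = split_results(collected, official_ids)
```

Here post 2 lands in the "blocked by Sina Weibo" group because it was collected but no longer appears in the official results.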

Freeweibo has around 10,000 unique visitors per day, with most coming from China, including Taiwan, based on the language setting, according to Percy Alpha, the pseudonym used by one of the founders.

A week after the website went live, it was blocked on the mainland. But the creators of the website have also been trying to provide mirror sites that are accessible without a VPN.

From the list of blocked keywords provided on the website, it is also clear when some words become sensitive and when such scrutiny is lifted.

For instance, the name of Bo Xilai, former Party chief of Chongqing who was recently prosecuted on corruption charges, was banned from searches until July 25, the day the news of his prosecution was announced.

Meet the founders

The same team also founded Greatfire.org back in 2011, a website that enables real-time testing of what is being blocked by the Great Firewall of China (GFW). URLs being tested are added by users or are imported from other similar projects. At the moment, the website monitors over 10,000 websites regularly to see if they are blocked and then analyzes precise methods of online monitoring such as connection resets, DNS poisonings and so on, explained Percy Alpha.
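
The kind of check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Greatfire's actual method: the function name, the labels and the idea of comparing a DNS answer against a set of trusted addresses are all hypothetical. The lookup and connection steps are passed in as functions so the logic can be tried without touching the network.

```python
import socket


def classify_probe(hostname, resolve, connect, trusted_ips):
    """Return a coarse label for how a site appears to be blocked, if at all.

    resolve(hostname) returns an IP string; connect(ip) returns None or raises.
    trusted_ips is a set of addresses obtained from a resolver assumed to be clean.
    """
    try:
        ip = resolve(hostname)
    except socket.gaierror:
        return "dns failure"
    if ip not in trusted_ips:
        # The local answer disagrees with the reference answers. This is only
        # a hint: content delivery networks also return varying addresses.
        return "possible dns poisoning"
    try:
        connect(ip)
    except ConnectionResetError:
        return "connection reset"
    except OSError:
        # Timeouts and other socket errors end up here.
        return "unreachable"
    return "reachable"


def tcp_connect(ip, port=80, timeout=5):
    """Open and immediately close a TCP connection to ip:port."""
    with socket.create_connection((ip, port), timeout=timeout):
        pass
```

For a live test, `socket.gethostbyname` could serve as `resolve` and `tcp_connect` as `connect`. A single probe proves little, which is presumably why such monitoring is repeated regularly.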

The website also provides an up-to-date database of URLs and keywords that are blocked.

Greatfire is also blocked in the mainland. Test data collected by the website showed a six-month gap between the blocking of its Chinese version and that of its English version.

The founders of the two websites have remained anonymous but one of the three is an American in China who goes by the alias Martin Johnson.

Percy Alpha would only say in an e-mail interview that he lived in China for a long time and is now based in the US. He said they are collaborating with other organizations and developers, though he wouldn't disclose the nature of the organizations they are working with or give further details about their collaboration.

According to their own introduction on Greatfire, they are self-financed but are exploring ways to "make the website a financially sustaining entity."

Percy Alpha said that what pushed him over the edge and made him start the project was the Google China dispute in 2010. Google refused to comply with China's regulations to filter search terms and later moved its Google China servers to Hong Kong.

Not long after that, searches for individual characters, mostly those contained in Chinese leaders' names, were also blocked even though the characters are frequently used in other phrases and expressions.

"Chinese people in general know very little about censorship," Percy Alpha told the Global Times. He said that when he talked to Chinese people about the Google withdrawal from the mainland and searches being blocked, he found that most didn't seem to care and repeated the official line that censorship is just and necessary.

China's regulation on Internet information lists nine types of banned content, most of which concern national security, state unity, rumors, pornography and violence. But in practice it isn't always clear where the line is, and in the event of a breaking incident, certain words or phrases that are otherwise normal might become sensitive for a period of time.

Data provided by Greatfire has been used by other researchers to get to grips with Internet restrictions. In May, for instance, two professors from Northwestern University in the US used its data to study how the GFW affects users' online behavior.

Percy Alpha says the team is also developing easy-to-use tools that let people access the Internet freely and make information available in China.

Zhang Zhi'an, an associate professor in new media at Sun Yat-sen University, said plenty of Chinese scholars also observe and study Weibo regulation. He acknowledged it might be easier for researchers overseas as they are not restricted by the GFW and take fewer risks when doing so.

"I don't know about their motives, but by presenting this blocked information, they allow more people to know about Internet regulation in China and provide data for other scholars who might be interested in studying China's Internet monitoring," he said.

Academic support

Their team isn't the first or the only one watching the censors and collecting data about blocked content. Many individual or academic efforts are also being made to take a closer look at how China's Internet and social media operate. Oftentimes, such projects inspire each other and even use each other's data.

For example, Freeweibo was inspired by and uses data from WeiboScope, a data collection and visualization system developed by the Journalism and Media Studies Center of the University of Hong Kong in 2011.

WeiboScope uses API tools provided by Weibo to retrieve posts from 350,000 users at set time intervals to show how posts are diffused and censored. People can also search for the most reposted microblogs with images within the past 24 hours or search for specific keywords in several languages. This allows people to get a real-time idea of trending topics on Weibo, without online monitoring.
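
Retrieving posts at set intervals makes it possible to notice when a post that was visible earlier is missing later. A rough sketch of that comparison, assuming each retrieval yields a timestamp and the set of post IDs visible at that moment; the function name and data layout are hypothetical and do not describe WeiboScope's own code:

```python
def find_vanished(snapshots):
    """Return {post_id: (last_seen, first_missing)} for posts that disappeared.

    snapshots is a time-ordered list of (timestamp, set_of_visible_post_ids).
    """
    last_seen = {}
    vanished = {}
    for timestamp, visible in snapshots:
        # Anything seen before but absent now is recorded as vanished.
        for post_id in last_seen:
            if post_id not in visible and post_id not in vanished:
                vanished[post_id] = (last_seen[post_id], timestamp)
        # Update sightings; a post that reappears is no longer counted.
        for post_id in visible:
            last_seen[post_id] = timestamp
            vanished.pop(post_id, None)
    return vanished


# Hypothetical snapshots taken 10 minutes apart.
snapshots = [
    (0, {"a", "b", "c"}),
    (10, {"a", "c"}),
    (20, {"a"}),
]
vanished = find_vanished(snapshots)
```

A missing post is not necessarily a censored one, since users also delete their own posts, so telling the two apart would need further information than this sketch uses.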

With this tool, researchers at the school are able to assess online monitoring on Weibo and the impact of policies such as the real-name registration policy enacted last year that requires microbloggers to register with their real identity.

The web page for WeiboScope is also not accessible in the mainland.

Another project centered in academia is China Digital Times, a bilingual news website that brings "uncensored news and online voices from China to the world." It is supported by the Counter-Power Lab at the School of Information, University of California, Berkeley. Both the Chinese and English websites are blocked.

In 2011, it started a research project that aims to construct a database of sensitive Weibo search keywords. It's an open source project to which Web users can contribute.

Xiao Qiang, the founder and editor-in-chief of China Digital Times, is an adjunct professor at UC Berkeley. He was a theoretical physicist by training and later became a human rights activist.

Other efforts

One of the few projects that remains accessible in the mainland is a Tumblr page called Blocked on Weibo, which documents words blocked on Sina Weibo and also offers contexts and explanations for the bans. The creator of the blog is Jason Q. Ng, a 2013 Google Policy Fellow at the University of Toronto's Citizen Lab.

Ng uses a different approach. He developed an automated process to check individual words to see whether they are blocked or not. In early 2012, he tested 700,000 Chinese Wikipedia titles. The script performed searches on Weibo for three months and recorded whether each term was censored. He collected over 150 terms and explained why they were sensitive in a book, also entitled Blocked on Weibo, which will be published next month.
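
An automated check of this kind could look something like the sketch below. It is not Ng's script: the function names are hypothetical, and the actual test of whether a search is censored is left as a function supplied by the caller. The second helper works out the pacing implied by the figures above, assuming three months is about 90 days and the script ran continuously.

```python
import time


def check_terms(terms, is_blocked, delay_seconds=0.0):
    """Call is_blocked(term) for each term and record the result."""
    results = {}
    for term in terms:
        results[term] = is_blocked(term)
        if delay_seconds:
            time.sleep(delay_seconds)  # pause between searches
    return results


def seconds_per_term(total_terms, days):
    """Average time available per term if the run is spread evenly."""
    return days * 24 * 60 * 60 / total_terms


# Hypothetical stand-in for a real search check.
sample = check_terms(["tea", "river"], lambda term: term == "river")
pace = seconds_per_term(700_000, 90)
```

Under those assumptions, 700,000 terms over 90 days works out to roughly one search every 11 seconds.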

Ng, a US citizen and a graduate student in East Asian Studies at the University of Pittsburgh, said he didn't have a background in computer science prior to this project.

He said he doesn't have an agenda with Blocked on Weibo, and that it's a "fun little challenge" for him as "coding is akin to solving a puzzle, solving little pieces at a time."

In his past career as a book editor, Ng worked on a book about China Central Television and developed an interest in how media works in China, he explained.

Ng wrote on his blog that he hopes his site "proves the resourcefulness and resiliency of Chinese netizens as well as the sense of responsibility that Chinese leaders (in the government and in private organizations) have for shepherding the country forward. You could even claim that the CCP [CPC] cares too much for its citizens."

Ng explained he meant no sarcasm by this. "Even though I don't agree with such a sentiment, I think it is part of an argument that needs to be legitimately considered in order for those outside China to begin understanding why such restrictions are in place in China," he said.