Beyond Google’s Reach: Tracking the Global Uprising in Real Time

On Oct. 15, groups of protesters affiliated with the Occupy Wall Street movement began filing into the branch offices of their banks to close their accounts. Later that day, videos began to show up online of those protesters being arrested. Irate branch managers had called the cops, claiming that these customers were being disruptive, so police began hauling the protesters away for booking.

The spectacle of citizens being arrested for attempting to close out their personal bank accounts made a splash in all of the usual corners of the internet. Except one: Google.

Like the larger Occupy Wall Street movement, which is often referenced online via the Twitter hashtag #OWS, the Oct. 15 protest was organized using #oct15. A search for #oct15 on the day of the protest yielded nothing but garbage results, and my searches as late as a day later yielded similar output. But despite allegations that Google — especially Google News, which still doesn’t have any worthwhile results for #oct15 — is censoring protest-related material, the more straightforward answer to the question of why the world’s largest search engine can’t produce useful results for current events in real time is that it’s simply not designed to.

As I found out on the day of Oct. 15, if you want quality information about events as they unfold in real time, then you can forget about the Google search box. Instead, you have to turn to alternative search engines, and specifically to Topsy, which had links to blog posts, videos, and pictures of the protest on the day of the protest, often mere minutes after the information was posted online. I’ve been a Topsy user for the past six months, and on Oct. 15, when Google searches were turning up garbage, I typed “#oct15” into the Topsy search box and was able to track events as they happened.

After that experience, I set out to learn why and how Topsy can do what Google cannot. What I found is that Topsy is a completely different kind of search engine than Google, and its architecture rests on a set of assumptions about authority, relationships, and the nature of the web that would have been inconceivable prior to the rise of Facebook, Twitter, and the rest of social media. Those assumptions, in combination with some smart technology, have produced a search engine that appears to be custom made for tracking the current wave of protest movements as they sweep across the globe.

Ranking people, not pages

“Twenty years ago, when the first web search engines started, what they did was basically database lookup,” says Rishab Aiyer Ghosh, as we settle into a conference room at Topsy’s headquarters in downtown San Francisco. “But once you had millions of documents, it was very difficult to make sense of the results and you needed ranking. That’s where Google’s key innovation of ranking things by the authority of the website came into being. They created graphs of websites and domains, and gave each domain authority based on how much it was cited by other websites, but that’s still the technology that all web searches use. That’s still one of the key relevance criteria, and that’s based on a structure of authority on the web that has changed.”

Ghosh, who co-founded Topsy after a successful career in open source (he’s an OSI board member, and he coined the term FLOSS in 2001), goes on to contrast Google’s page- and domain-based authority model with that of Topsy, which essentially ranks people, not pages.

“We don’t build a graph of websites; we build a graph of people. And we compute the authority of people, what we call ‘influence,’ based on the likelihood that a single post from a single individual is going to get attention,” Ghosh says. “It’s basically like PageRank for Twitter.”

Web pages, videos, photos, apps, and the other kinds of digital artifacts that Topsy indexes are then given a score that is based in part on how many people are talking about them at the moment, and in part on who is doing the talking. The result is a search engine that can immediately surface information about events either as they happen, or very shortly thereafter.
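To make that idea concrete, here is a minimal sketch of what such a score could look like. It is not Topsy's actual algorithm, which the company did not describe in detail. The `Mention` type, the `score_item` function, the one-hour half-life, and the default influence value are all assumptions invented for illustration. The sketch simply adds up the influence of everyone mentioning an item and discounts older mentions, so that "who is talking" and "how many are talking right now" both count.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One post that links to an item (hypothetical structure)."""
    author: str
    age_seconds: float  # how long ago the mention was posted


def score_item(mentions, influence, half_life_seconds=3600.0,
               default_influence=0.1):
    """Sum each author's influence, halving a mention's weight every half-life.

    `influence` maps author name -> influence value; authors missing from
    the map get `default_influence`. All parameters are illustrative.
    """
    total = 0.0
    for mention in mentions:
        weight = influence.get(mention.author, default_influence)
        decay = 0.5 ** (mention.age_seconds / half_life_seconds)
        total += weight * decay
    return total


# Made-up influence values and mentions, for illustration only.
influence = {"local_reporter": 0.9, "casual_user": 0.2}
fresh = [Mention("local_reporter", 60), Mention("casual_user", 120)]
stale = [Mention("local_reporter", 7200), Mention("casual_user", 7200)]

fresh_score = score_item(fresh, influence)
stale_score = score_item(stale, influence)
```

With these invented numbers, the freshly mentioned item outscores the one whose mentions are two hours old, even though the same two people posted both. A real system would also need some way of estimating influence in the first place, which this sketch takes as given.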

Topsy vs. Google in an earthquake bake-off

As if the universe were trying to prove a point about the merits of real-time search, while Ghosh and I were talking, a magnitude 4.2 earthquake rocked the Topsy headquarters, which is on the sixth floor of a building in downtown San Francisco. Our first reaction to the quake, after we took stock of the situation and realized that everything appeared fine, was to check Topsy for information on the quake that had just happened the moment before.

At the top of the Topsy results for “earthquake” were links and tweets about a quake that had happened in San Antonio, Texas earlier the same day. Beneath that was information about the Great California Shakeout, a statewide earthquake drill that had ironically taken place that very morning. And then, below the Shakeout news, there was a small trickle of incoming tweets about the quake that had just happened.

Ghosh told me that it takes tweets about 10 seconds to show up in Topsy (the site pulls all tweets from the full Twitter firehose). So we waited about 30 minutes and then went back to do a Topsy vs. Google earthquake bake-off, complete with screenshots. By that time, Topsy was full of news about the new quake.

Google… well, not so much.

Tracking the Arab Spring and #OWS

The same technology that can track the rush of tweets that now inevitably follows an earthquake can also track social movements as they develop. So when the Arab Spring broke out in late 2010, Ghosh immediately began using Topsy to track it. “Some friends of mine were in Tahrir Square,” he says, “so I started using Topsy to see what was going on.”

What Ghosh found was that he could follow the uprisings by looking at the volume of protest-related hashtags. As the Egypt protest began to gain traction, the volume of hashtags related to it suddenly began to surge and eclipse those related to the Tunisia protests, which had previously been dominant. Then Bahrain began to surge, and after that, Saudi Arabia.

Share of Voice Applied to Mideast Country Hashtags. Source: Topsy
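A "share of voice" measure of this kind can be sketched in a few lines. The version below is an assumption about how such a chart might be computed, not a description of Topsy's implementation: for each day, it divides every hashtag's count by the total across all tracked hashtags. The function names, the hashtags, and the counts are all made up for illustration.

```python
def share_of_voice(counts_by_day):
    """Convert raw per-day hashtag counts into fractions of each day's total."""
    shares = {}
    for day, counts in counts_by_day.items():
        total = sum(counts.values())
        shares[day] = {
            tag: (n / total if total else 0.0) for tag, n in counts.items()
        }
    return shares


def leader_by_day(shares):
    """Return the hashtag with the largest share on each non-empty day."""
    return {
        day: max(day_shares, key=day_shares.get)
        for day, day_shares in shares.items()
        if day_shares
    }


# Invented counts, for illustration only; not real Twitter data.
counts_by_day = {
    "day 1": {"#tunisia": 800, "#egypt": 200},
    "day 2": {"#tunisia": 500, "#egypt": 1500},
}

shares = share_of_voice(counts_by_day)
leaders = leader_by_day(shares)
```

Working in shares rather than raw counts is what lets one country's hashtag visibly "eclipse" another's: in the invented data above, the leader flips from one tag to the other between the two days, regardless of how much total traffic grew.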

As with the #oct15 example from Occupy Wall Street, many of the protest organizing hashtags for the Arab Spring were simple dates. Ghosh found that he could tell which protest dates were gaining traction and which were sputtering out by watching their hashtag volume on Twitter; the hashtag volume turned out to be a pretty good indicator of whether a protest was actually going to happen on a given date.

While hashtag activity may correlate with protest success, it’s important to note that Ghosh is skeptical of claims that the Arab Spring wouldn’t have happened if not for social media. People in Egypt were extremely frustrated, he speculates, and there wasn’t much they could do to express that frustration other than to take to the streets. So the protests probably would’ve happened anyway. However, he does acknowledge that social media played at least some role in solving the protesters’ information problem.

“[The protesters’] frustration is an information problem,” Ghosh explains. “You might be ready to go and protest your dictator, but you don’t want to be the only one doing it. You want to go out if there are a million people going out. But how do you know if there are a million people going out? With social media, if you feel like going out today, within a few hours you know if everyone is going out today or if they’re going out next week. So that’s the information piece: the knowledge that other people share your perspective and that they’re going to do this spreads faster with social media.”

Occupy Wall Street is a different story, though, and Ghosh sees it as possibly the first truly social media-driven mass protest movement. He argues that the level of frustration in America is lower than it was in Egypt, so #OWS needed the visibility that Facebook and Twitter provided — the reassurance that there would be many others out protesting on a certain date, and that you’re not wasting your time if you show up.

In the first graph below, you can see that at the beginning of the Wall Street protests, the movement was referred to variously as #OccupyWallStreet, occupywallstreet, and #OWS. After the first two terms peak in popularity, the conversation becomes more focused and the shorter #OWS hashtag takes over as the movement’s main label. You can also see a recent dip in #OWS hashtag popularity, which may indicate a loss of momentum on the movement’s part.

Volume Analysis Applied to OccupyWallStreet

The next graph below shows a sentiment analysis of #OWS and related hashtags. Most of the sentiment is neutral (in red), a very small amount is negative (in green), and a sizable and growing amount is positive (in blue). This actually represents a very positive state of public opinion for #OWS, because, as Ghosh explained later, most of the sentiment expressed on Twitter is negative.

Sentiment Analysis Applied to OccupyWallStreet

“We did a whole analysis of intense sentiment hashtags,” Ghosh says, “and it was interesting that the most intense negative sentiment hashtags were things about politics. And the most intense positive sentiment hashtags were, firstly, not that intense, and secondly pretty boring sounding: travel, photography.” This means that sentiment levels have to be read against that baseline. Because sentiment around political terms is overwhelmingly negative, the mostly neutral and increasingly positive sentiment around #OWS makes the movement look fairly well received.
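One simple way to read sentiment against a baseline is to compare a topic's net sentiment with the net sentiment typical of its category. The sketch below is only an illustration of that reasoning. Ghosh did not describe how Topsy normalizes its numbers, and the function names and all of the counts are invented.

```python
def net_sentiment(positive, negative, neutral):
    """Return (positive - negative) as a fraction of all classified posts."""
    total = positive + negative + neutral
    if total == 0:
        return 0.0
    return (positive - negative) / total


def relative_sentiment(topic_counts, baseline_counts):
    """How far a topic's net sentiment sits above or below its baseline."""
    return net_sentiment(**topic_counts) - net_sentiment(**baseline_counts)


# Invented counts, for illustration only; not real measurements.
topic = {"positive": 20, "negative": 5, "neutral": 75}
political_baseline = {"positive": 5, "negative": 30, "neutral": 65}

topic_net = net_sentiment(**topic)
baseline_net = net_sentiment(**political_baseline)
lift = relative_sentiment(topic, political_baseline)
```

In this made-up example the topic's net sentiment is only mildly positive on its own, but it sits well above a strongly negative political baseline, which is the sense in which a mostly neutral topic can still count as well received.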

A killer app for real-time, non-establishment news

What Ghosh and the team at Topsy have ultimately produced isn’t just a search engine, but a killer app for non-establishment, real-time, crowd-sourced media. As Ghosh points out, a Google image search for “Egypt” yields pictures of pyramids and camels, while the same Topsy image search turns up camera phone pictures from Tahrir Square. What’s more, these camera phone photos showed up at the top of Topsy’s results shortly after they were uploaded on the day of the protest.

“Previously, the only source of information was what professionals had edited together and published officially, and that’s kind of the Google model. That’s why when you search for any proper noun, the Google top results are going to be Wikipedia results,” says Ghosh.

Ghosh contrasts Google’s results, which always presume deliberate curation on the part of individual domain owners, with Topsy’s more chaotic, bottom-up results, the character of which changes drastically over time from non-curated to curated as stories move from niche to mainstream. In the beginning of the Egyptian protests, for instance, there were a handful of native Egyptian Twitter users that Topsy had identified as authorities on the movement, and the search engine’s results reflected that. But later, as mainstream news outlets increased their coverage of the events in Cairo, the Topsy results began to look more like Google’s results, with NPR topping the list of most credible sources for Egyptian protest news.

This dynamic — where results change over time from niche and free-form to mainstream and curated — suggests that on a long enough time horizon, Google’s and Topsy’s results will tend to converge on an established set of authoritative, curated pages. But in those early moments of a protest (or an earthquake), Topsy is pretty much the only place you can go to get ranked and sorted real-time information from the people on the ground.