How do you parse a tweet? Five years ago, that question would have been gibberish. Today, it's perfectly sensible, and it's at the front of Amit Singhal's mind. Singhal is leading Google's quest to incorporate new data into search results in real time by tracking and ranking updates to online content--particularly the thousands of messages that course through social networks every second.

Real-time search is a response to a fundamental shift in the way people use the Web. People used to visit a page, click a link, and visit another page. Now they spend a lot of time monitoring streams of data--tweets, status updates, headlines--from services like Facebook and Twitter, as well as from blogs and news outlets.

Ephemeral info-nuggets are the Web's new currency, and sifting through them for useful information is a challenge for search engines. Its most daunting aspect, according to Singhal, is not collecting the data. Facebook and Twitter are happy to sell access to their data feeds--or "fire hoses," as they call them--directly to search providers; the information pours straight into Google's computers.

What's really hard about real-time search is figuring out the meaning and value of those fleeting bits of information. The challenge goes beyond filtering out spam, though that's an important part of it. People who search real-time data want the same quality, authority, and relevance that they expect when they perform traditional Web searches. Nobody wants to drink straight from a fire hose.

Google dominates traditional search by meticulously tracking links to a page and other signals of its value as they accumulate over time. But for real-time search, this doesn't work. Social-networking messages can lose their value within minutes of being written. Google has to gauge their worth in seconds, or even microseconds.

Google is notoriously tight-lipped about its search algorithms, but Singhal explains a few of the variables the company uses to analyze what he calls "chatter." Some are straightforward. A Twitter user who attracts many followers, and whose tweets are often "retweeted" by other users, can generally be assumed to have more authority. Similarly, Facebook users gain authority as their friends multiply, particularly if those friends also have many friends.
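To make the follower and retweet signals concrete, here is a toy scoring function. It is only an illustrative sketch: Google has not disclosed its formula, and the function name, the log scaling, and the way the two signals are combined are all assumptions made for this example.

```python
import math


def authority_score(followers: int, tweets: int, retweets: int) -> float:
    """Hypothetical authority score, not Google's actual formula.

    Combines a log-scaled follower count with the average number of
    retweets each tweet receives.
    """
    if followers < 0 or tweets < 0 or retweets < 0:
        raise ValueError("counts must be non-negative")
    reach = math.log1p(followers)  # dampens the effect of very large audiences
    retweet_rate = retweets / tweets if tweets else 0.0
    return reach * (1.0 + retweet_rate)
```

Under these assumptions, two users with the same number of followers are separated by how often their tweets are passed along, and a user with no followers scores zero regardless of retweets.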

Other signals are more subtle. A sudden spike in the prevalence of a word in a message stream--earthquake, say--may indicate an important event. If a message on a commonly discussed topic includes unusual phrasing, that may signal new information or a fresh insight. Google, says Singhal, continuously scans for shifts in language and other deviations from predicted behavior.
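A sudden spike in a word's prevalence can be sketched as a comparison between the current window of messages and earlier windows. The version below is a minimal illustration, not a description of Google's system: the function name, the whitespace tokenizer, the standard-deviation threshold, and the minimum count are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def find_spikes(past_windows, current_window, threshold=3.0, min_count=5):
    """Return words that are unusually frequent in current_window.

    past_windows: list of earlier windows, each a list of message strings.
    current_window: list of message strings for the latest window.
    A word is flagged when its count exceeds its historical mean by more
    than `threshold` standard deviations (hypothetical rule).
    """
    if not past_windows:
        return []
    history = [
        Counter(word.lower() for msg in window for word in msg.split())
        for window in past_windows
    ]
    current = Counter(word.lower() for msg in current_window for word in msg.split())
    spikes = []
    for word, count in current.items():
        if count < min_count:
            continue  # ignore words too rare to be meaningful
        past_counts = [h[word] for h in history]  # missing words count as 0
        mu = mean(past_counts)
        sigma = pstdev(past_counts) or 1.0  # avoid dividing by zero
        if (count - mu) / sigma > threshold:
            spikes.append(word)
    return sorted(spikes)
```

With windows of ordinary chatter followed by a window full of messages mentioning "earthquake", the word would be flagged because it never appeared in the earlier windows.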

The company is also working to connect message content to the geolocation data that's transmitted by smart phones and other mobile computers, or broadcast through services like Foursquare. The location of someone sending a message can matter a great deal. If you know that a person tweeting about an earthquake is close to the epicenter, chances are those tweets will be more valuable than those of someone hundreds of miles away.
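One way to picture the location signal is to weight a message by how close its sender is to the event. The sketch below uses the standard haversine great-circle formula for distance. The exponential decay and the 50 km scale are assumptions chosen for illustration, not anything Google has described.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def proximity_weight(distance_km, scale_km=50.0):
    """Hypothetical weight: 1.0 at the epicenter, decaying with distance."""
    return math.exp(-distance_km / scale_km)
```

A sender at the epicenter gets a weight of 1.0, while one hundreds of kilometers away gets a weight close to zero.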

Singhal's view of real-time search is very much in line with Google's strategy: distilling from a welter of data the few pieces of content that are most relevant to an individual searcher at a particular point in time. Other search providers, including Google's archrival, Microsoft, are taking a more radical view.

Sean Suchter, who runs Microsoft's Search Technology Center in Mountain View, CA, doesn't like the term real-time search, which he considers too limiting. He thinks Microsoft's Bing search engine should not just filter data flowing from social networks but become an extension of them.

Ultimately, says Suchter, one-on-one conversations will take place within Bing, triggered by the keywords people enter. Real-time search, he predicts, will be so different from what came before that it will erase Google's long-standing advantages. "History doesn't matter here," he says. After a pause, he adds, "We're going to wipe the floor with them."

Amit Singhal has heard such threats before, and so far they haven't amounted to much. But even he admits that real-time search comes as close to marking "a radical break" in the history of search as anything he's seen. Keeping Google on top in the age of chatter may prove to be Singhal's toughest test.

Monday, January 11, 2010

Now social media has the equivalent of the Times Square "deficit clock."

Today the Web is bursting with social media content and a burgeoning supply of (and demand for) "real-time" information. This information is created as people open new Facebook and other social media accounts, churn out Tweets and other microblogs, post photos and videos, and tirelessly text one another. But getting a grip on exactly how much is happening--and what the primary sources are--is a slippery task, especially since web companies often jealously guard their metrics.

The new social media counter. Credit: Gary Hayes.

Now there's a social-media "clock" of sorts. It charts the second-by-second accumulation of social-media accounts, blogs, Tweets, photo uploads, status updates, and the like. Consider it the social-media equivalent of that national-deficit "clock" in Times Square.

The effort does require a reality check. It's not actually an accurate rendering of the real-time Web. Rather, it's a counter, created by an Australia-based virtual-world entrepreneur named Gary Hayes. Hayes set each rate of increase according to estimates culled from disparate sources such as analysts, company blogs, and news media accounts. Some of the estimates are several months old and may not be accurate or complete.
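A counter of this kind is simple extrapolation: each category has a fixed estimated rate, and the displayed number is that rate multiplied by the time elapsed. The sketch below shows the idea only. The function name is hypothetical, and the rates in the usage note are made-up placeholders, not Hayes' figures.

```python
def estimated_counts(rates_per_second, elapsed_seconds):
    """Extrapolate a running total for each category from a fixed rate.

    rates_per_second: mapping of category name to estimated events per second.
    elapsed_seconds: time since the counter started.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    return {name: rate * elapsed_seconds for name, rate in rates_per_second.items()}
```

For example, with placeholder rates of 2.0 tweets and 0.5 photo uploads per second, the counter would show 20 tweets and 5 photos after ten seconds. The numbers are only as good as the estimates behind them, which is the caveat raised above.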

But, while it may not provide any new primary information, or be accurate in all categories, Hayes' social-media clock is nevertheless an excellent visualization of where much of the Web's growth is coming from these days.

fabric | rblg
