What Is Real Time Search? Definitions & Players

There seems to be no end to companies saying they offer real time search these days. And no end to people quoting how Google itself says it wants to improve in the area. But what does real time search really mean? This article offers some definitions and focuses on players in the space.

Real Time By Any Other Name Would Smell As Tweet

For me, “real time search” means looking through material that literally is published in real time. In other words, material where there’s practically no delay between composition and publishing. You take a picture and seconds later, it’s posted to the world to see. You think of something, immediately tap it out on Twitter, and your tweet is shared almost as soon as you thought of it.

What’s NOT real time publishing? Blogging, for the most part. A post has to be written, which typically will be at least a few paragraphs long. It may involve some research, taking more time. It may involve a ton of research, taking even more time. The mere act of creating and publishing the post will likely take more than a minute, if not several.

Publishing in minutes isn’t the same as publishing in real time? Nope. Not when the time to publish a tweet is seconds. You see something, hear something, want to say something, feel an earthquake happening, and you bang it into a simple box and bam, you’ve microblogged.

Microblog, by the way, is my generic word of choice for the moment to represent what we do on Twitter when we post, or when we do a “status update” on Facebook, or a post on FriendFeed. It’s not a perfect word, but “status search” or “update search” sound stupid; “activity stream search” sounds like something out of Ghostbusters. Microblogging fits better than those; I’m open to alternatives.

What about news content? With few exceptions, news content isn’t instantly posted online, and there’s still that same “minutes” to publish aspect as with blogs.

How about the fact that Google can return “fresh” content sometimes within minutes of that material first being published? Yes, Google does this. But the material itself wasn’t published in real time, nor does it make Google into a real time search engine. That blog post or web page or news article took time to compose between the original thought and the actual publishing event. It didn’t go out in real time.

How about mining social media sites like Delicious or Digg? Some social sharing activity fits into the real time publishing model. But social bookmarking or social news sharing isn’t necessarily real time to me (How Search-Like Are Social Media Sites? provides a deeper look at different types of social media sites). That’s especially true when you consider that most of what’s shared through news sites like Digg wasn’t itself published in real time. When I see such services included in a “real time” search engine, it usually indicates to me that they’re trying to be more than a Twitter search service as best they can.

Twitter, of course, is the real time publishing leader — the leading microblogging service. So much material is published by so many people so quickly through Twitter that real time search is largely synonymous with searching tweets.

Yes, there are alternative real time publishing platforms. You can Twitpic photos within seconds. But a mass audience doesn’t really know about your picture until you tweet the Twitpic URL, I’d say. Sure you can share thoughts through Facebook status updates. But those updates aren’t readily available to the entire world.

In the real time publishing solar system, Twitter is the Sun around which everything else currently revolves. And that leads to the Twitter problem, for those trying to offer real time search.

The Twitter Firehose

People tweet a lot. So much so that there’s no way for even the champion of crawling content quickly, Google, to catch it all. To index every tweet – as it happens, and if Google could even find them all – would probably bring Twitter’s web site to a halt under the external load.

Instead, anyone who wants to fully index the Twittersphere in real time needs access to what’s known as the Twitter Firehose feed, effectively a direct blast of all the tweets as they happen, piped directly to a partner.

Twitter said last year that only four companies were getting the firehose data. One partner they purchased, Summize — which is now Twitter Search. FriendFeed was only getting it for a subset of Twitterers who also use FriendFeed. It’s unclear if Zappos really got the entire thing or still gets it. Twittervision might still get it, but I suspect they’re now using the Twitter API. It’s unclear who, if anyone, is still getting it.

The Twitter API allows partners to conduct searches at Twitter automatically, to bring back data to someone based on those they’re following or tap into Twitter data in other ways. However, the API limits how much data can be requested and does not give access to everything Twitter has stored.

The major search engines like Google, Bing and Yahoo want the firehose data. Twitter’s been talking to them, but no agreements have been reached. This seems to have less to do with any technical issues and far more to do with financial ones. If Twitter gives Google its firehose, it loses a unique feature, that of Twitter Search as the only service with the ability to search all tweets (when Twitter Search actually works right, more on that in a bit). Access to the Twitter firehose won’t come at a cheap price.

Searching Tweets Versus Tweeted Links

So one challenge for any real time search player is how to get Twitter’s data. Another is what type of search to offer using that data. Do you let people search through what’s being said — to find tweets — such as what’s going on in Iran? Or do you try to mine links that are being tweeted, such as to hot news articles about Iran that are being shared through microblogging?

These are two entirely different things yet “real time search” is applied to both of them, creating confusion.

To me, “real time search” should be reserved for searching what’s being tweeted, what people are talking about, what they’re microblogging. “Real Time Chat” or “Real Time Talk,” if you need a different name for it. Maybe “microblog search.” Just as news search covers what’s being published from news sources, real time search for me covers what’s being said in real time, not links that are being passed around.

As for the links, services that mine these can be useful. But link-based search doesn’t necessarily correspond to fresh content. For example, some microblogged links will be tops in some “real time” search engines based on tweets over a long period of time. Even when the links are for new material, being passed around in real time doesn’t equate to being real time content. “Hot Search” or “Popularity Search” might be better names for these services. But since the “real time” theme is hot right now, I don’t expect the names to clarify. Confusion will continue.

Major Microblogging Players = Major Real Time Search Players

With some definitions out of the way, it’s time to dive into services to use for real time search. By that, I mean the services that let you search through real time microblogged content. After that, I’ll talk about those that let you see what’s hot based on real time publishing.

No one has better access to what’s been tweeted than Twitter. And since Twitter is the king of real time publishers, Twitter Search is also the king of real time search. Visit the site, enter what you’re looking for, and you’ll see everything being published as it goes out (except from the relatively few who “protect” their updates and don’t release them to the public).

Unfortunately, Twitter Search has been plagued with problems recently. Trying to locate something you tweeted in the past? Use the advanced search page, and despite the many options offered, you might not find what you’re looking for. You’ve not lost your mind. Twitter Search has simply lost your tweet. I want to write a song about it. “To all the tweets I’ve lost before…”

The missing tweets are one reason why Twitter should offer all users the ability to export their tweets, in the way blog posts can be exported from one blogging provider to another. It reassures users that their microblogged content will remain accessible even if Twitter fails. And good news — after I tweeted wanting this yesterday, Twitter cofounder Ev Williams tweeted back that it’s in the works:

@dannysullivan I agree we should do this. It’s on the list — as is extending the search window.

Twitter Search also faces problems with spam. When the primary ranking mechanism in a search engine is publish time, it’s pretty easy to spam the results. These past articles from me explain this in more depth

What will come from Twitter Search? Will spam filtering improve? Will we see some type of authority metric mixed into ranking results? All we know is that things tend to move slowly with Twitter, and that so far, Twitter Search still seems to have the most complete access to Twitter’s data.
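To make the spam point concrete, here is a minimal sketch of the difference between ranking purely by publish time and mixing in some kind of authority score. Nothing here reflects how Twitter Search actually ranks anything: the Tweet fields, the authority score, the half-life and the blend weight are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    posted_at: float  # seconds on some clock; hypothetical field
    authority: float  # made-up 0.0-1.0 trust score for the author


def rank_by_time(tweets):
    """Newest first: whoever posted last wins, spammer or not."""
    return sorted(tweets, key=lambda t: t.posted_at, reverse=True)


def rank_blended(tweets, now, half_life=300.0, weight=0.7):
    """Mix a freshness score with the assumed authority score."""
    def score(t):
        age = max(now - t.posted_at, 0.0)
        # Freshness is 1.0 for a brand new tweet and halves every half_life seconds.
        freshness = 0.5 ** (age / half_life)
        return (1 - weight) * freshness + weight * t.authority
    return sorted(tweets, key=score, reverse=True)


tweets = [
    Tweet("Quake just hit downtown, shelves rattling", 1000.0, 0.9),
    Tweet("BUY FOLLOWERS CHEAP #quake", 1060.0, 0.0),
    Tweet("BUY FOLLOWERS CHEAP #quake", 1090.0, 0.0),
]

by_time = rank_by_time(tweets)
blended = rank_blended(tweets, now=1100.0)
```

With time-only ranking, the two spam posts sit on top simply because they were posted last. With the blended score, the genuine tweet comes first even though it is the oldest of the three. The hard part, which this sketch skips entirely, is coming up with an authority score that can be trusted.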

When Twitter goes down, where do you go to as an alternative for getting that important thought out of your head and into the world in real time? For me, it’s FriendFeed. There might be relatively few like me, but I still think FriendFeed will grow into an important microblogging service.

Originally, FriendFeed was more a place where you could pull together all your social activities into one single feed. You could (and still can) link your blog, your Flickr account, your YouTube account, submissions to Digg, things bookmarked on Delicious and more to your FriendFeed account. Do this, and you have a megafeed of everything you’ve done. That makes it easy for your friends (or others) to track what you do.

FriendFeed isn’t limited to flowing in stuff from elsewhere. Just like Twitter, you can post thoughts that anyone can see through the FriendFeed service. Just enter something into the posting box similar to how you’d tweet:

Do that, and what you’ve written goes out, Twitter-style. The arrows in the screenshot above point at the box and also how you can “CC” what you’ve posted to go out also to your Twitter account, if you’ve linked it to FriendFeed. What you tweet on Twitter can also flow into FriendFeed — which I highly recommend. More on this in a bit.

FriendFeed allows anyone to search against anything being recorded by the service. Just go to the home page, enter what you’re interested in, and you’ll see results come back:

Some of this will be “real time talk” or microblog posts — information that’s been published via Twitter or Facebook, as the screenshot shows. Some will be information that’s not real time talk, such as the first item listed, a news article that’s been shared via Google Reader. Occasionally, real time talk coming off FriendFeed updates will also appear.

The downside is that FriendFeed isn’t complete. It won’t have all that Twitter or Facebook has. It only has material from those who make use of the services and explicitly link them to FriendFeed. Another downside is that there appears to be no easy way to see only what’s being microblogged on FriendFeed.

Then I get back only material being microblogged on Twitter or Facebook, by those who are also FriendFeed users. But I can’t combine them, to see posts from both places at once. And the “service:friendfeed” command doesn’t bring back just FriendFeed microblogged posts. It also brings back items that are bookmarked and a few other things.

So if you’re after just real time talk, FriendFeed’s results are a bit polluted with other material. On the upside, it’s a wonderful backup to Twitter, for those who are using it to pull in their feeds. Consider this search on Twitter:

Kind of a bummer. Apparently, I’ve never said anything about “southwest” on Twitter. Except, I have:

As you can see, FriendFeed finds them, even when Twitter itself doesn’t.

Facebook is the closest challenger to Twitter in the real time publishing space, I’d say. The service has long had the ability for people to share with friends what they’re up to. Similar to Twitter, you’ve got a box that lets you enter anything:

These “status updates” can also be linked to photos, videos or links to material on the web, though they don’t have to be.

So why isn’t Facebook a leader in the real time search space, like Twitter? Because the service doesn’t offer real time search — at least not yet.

For one thing, you cannot go to Facebook as you can with Twitter or FriendFeed and search without being logged in. Then, even if you are logged in, most people do not have the ability to search against status updates. You can search for people, Facebook Pages, Facebook Groups, Facebook Applications — even the web — but not for what people are microblogging on Facebook.

Facebook is currently testing a new search service with a small group of people that changes this. For example, here’s a search for “4th of july” using the new service:

The arrows highlight that I’ve searched for status updates (microblog posts) done from everyone on Facebook. Not just people I’ve friended. Everyone.

Whoa! What about all that privacy stuff! Relax (a bit). In another recent change, you can now choose to share your status updates with everyone (the world), or just your friends, friends and their friends, or block them from particular people (through customize):

When someone searches “everyone,” they only see posts that have also been shared with everyone.
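In code terms, the rule amounts to a visibility filter applied before matching. This is only a sketch of the idea, not Facebook's implementation: the "audience" field and its values are hypothetical names chosen for the example.

```python
def search_everyone(updates, query):
    """Return matching updates, but only those shared with everyone."""
    q = query.lower()
    return [
        u for u in updates
        if u["audience"] == "everyone" and q in u["text"].lower()
    ]


# Made-up status updates with a hypothetical "audience" setting.
updates = [
    {"author": "ann", "audience": "everyone", "text": "Fireworks for the 4th of July!"},
    {"author": "bob", "audience": "friends", "text": "4th of July BBQ at my place"},
    {"author": "cat", "audience": "everyone", "text": "Stuck in traffic"},
]

visible = search_everyone(updates, "4th of july")
```

Here only ann's update comes back. Bob's post matches the words but was shared with friends only, so an "everyone" search never sees it.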

Still, there’s no denying that plenty of people are microblogging via Facebook. It’s clear the service will keep opening things up. If Facebook users themselves shift more toward sharing with the world, it’s a major microblogging source to be mined.

NOTE: SINCE THIS WAS WRITTEN, BOTH GOOGLE AND BING HAVE ROLLED OUT REAL TIME SEARCH SERVICES. SEE FURTHER BELOW.

Meta Real Time Search Engines & Third Party Twitter Search

For those unfamiliar, a “meta search engine” is a search engine that allows you to issue a single search and pull back information from more than one search service. Want results from Google, Yahoo and Bing all at once? A meta search engine like Dogpile does this for you.

With microblogging, the situation is somewhat similar. Twitter is both a microblogging service and a way to search against its own real time posts. Facebook is a microblogging service with its own developing search engine. But neither searches outside their own borders.

FriendFeed can pull in from both Twitter and Facebook, plus adds its own microblogging content, and so it would seem to be a meta search service for microblogging. However, FriendFeed doesn’t pull in all content. It only pulls in content being shared from these places by its own members.

True meta real time search would reach across borders, allowing you to see what people are microblogging on a variety of services. But we’re not quite there yet.
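As a rough sketch of what reaching across borders would involve: send one query to every service, then merge what comes back into a single newest-first stream. The services below are plain in-memory lists standing in for real search APIs, and every name and field is an assumption for illustration. None of these services is known to expose search this way.

```python
import heapq


def search_service(posts, query):
    """Stand-in for one service's search: newest-first matches for the query."""
    q = query.lower()
    hits = [p for p in posts if q in p["text"].lower()]
    return sorted(hits, key=lambda p: p["time"], reverse=True)


def meta_search(services, query):
    """Run one query against every service and merge the results newest first."""
    streams = []
    for name, posts in services.items():
        hits = search_service(posts, query)
        # Label each hit with the service it came from.
        streams.append([dict(h, source=name) for h in hits])
    # Each stream is already newest-first, so heapq.merge can interleave them.
    return list(heapq.merge(*streams, key=lambda p: p["time"], reverse=True))


# Made-up posts; "time" is just a number that grows as posts get newer.
services = {
    "twitter": [
        {"text": "Iran election protests growing", "time": 30},
        {"text": "lunch time", "time": 40},
    ],
    "facebook": [{"text": "Watching Iran coverage all night", "time": 35}],
    "friendfeed": [{"text": "New photos from Iran", "time": 20}],
}

results = meta_search(services, "iran")
sources = [r["source"] for r in results]
```

The merging is the easy part. Getting each service to hand over its full stream in the first place is the real obstacle.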

With the players I’ve looked at, no one seems to be tapping into the Facebook “everyone” microblog feed. No one seems to be tapping into FriendFeed’s stream, either. Twitter is present, often along with some minor microblogging services. For the most part, that makes the services below less meta real time search and more third party Twitter search services. They hold promise for the future, and they offer some interesting features even now.

Collecta lets you search against blog posts, articles, comments on blog posts, but those aren’t microblogging, to me. What is microblogging are the posts it will search against from Twitter, Jaiku and Identica, at the moment. Twitter’s the big fish in that pond. Collecta is one to watch especially if it begins to pull in FriendFeed and Facebook data.

The screenshot above shows how searches can be filtered to just microblogging (first arrow on the left) and then you can click on any listing to read in more depth (second arrow):

Flickr results can also be tracked, and that’s semi-microblogging. Some people do send their pictures straight to Flickr immediately after shooting them.

Scoopler is both a meta search site for microblogging as well as a service that highlights content being shared through microblogging. In terms of posts, it only carries microblogged text content from Twitter. Flickr photo search is included, but Twitpic is absent. Digg and Delicious are included, but I wouldn’t consider either of them to be sources of real time data.

Brand new, Status Search depends on logging into your Twitter and Facebook accounts (this is done in a way that Status Search itself doesn’t actually see your password, to my understanding). Once you’ve connected, you can then search updates from your friends and followers in both places at once. This doesn’t provide you an “everyone” view, but it’s also kind of neat to focus on what’s being said by those you know or are connected with.

The focus here is featuring latest posts from Twitter. Twazzup also pulls in related pictures from Twitpic, which I like, plus shows most popular links on a topic. You can also see top contributors (by default, those being most retweeted on a topic).

A pure meta microblog search out since January, Twingly Microblog Search pulls in posts from Twitter, Jaiku, Identica and Bleeper. Others are offered, though I couldn’t see how to enable these easily. Twitter is the big fish, of course.

Mining Links

The services below primarily focus on tapping into links being shared through microblogging services, especially in an attempt to show you what’s being shared in real time.

You can do a search of microblogging talk with CrowdEye, using the “next” button to see the latest information. It all comes from Twitter. The service seems primarily oriented toward helping you find the most popular articles that are being passed around on a particular topic through Twitter.

OneRiot: One of the veterans of the “real time search” space, formerly known as Me.dium, OneRiot doesn’t cover real time microblogging at all. Instead, it’s designed to show you the most popular shared links based on data it monitors from Twitter, Digg and other unnamed social sharing services (these are a secret, I’m told).

If you want to know what’s currently hot in terms of sharing, definitely a place to check out. If you’re trying to figure out who should get credit for first sharing an item, OneRiot tries to identify that, too.

Like OneRiot, Topsy is designed to help you find the most popular links that are being shared through microblogging. Currently, it depends entirely on Twitter data. One thing I particularly like about Topsy is how useful it seems for finding good content that’s not necessarily new.

For example, a search for url shorteners brings back solid articles that are several weeks old. While they’re not being passed around much at the current moment, the “All Time” view is helpful. You can also narrow to popularity on a month, week or day basis. “Top Authors” are also listed for a particular search topic (how and what exactly this means isn’t explained on the site. I may update later).
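For a sense of how this kind of link mining could work, here is a minimal sketch that counts how often each link shows up in a set of tweets, either over all time or within a recent window. It is not how Topsy or any other service actually does it. The tweet data, the URL pattern and the function name are all assumptions for illustration.

```python
import re
from collections import Counter

# Simple pattern for links in tweet text; real link extraction is messier.
URL_PATTERN = re.compile(r"https?://\S+")

DAY = 86400  # seconds in a day


def top_links(tweets, now, window=None, limit=3):
    """Count shared links, optionally only in tweets newer than `window` seconds."""
    counts = Counter()
    for posted_at, text in tweets:
        if window is not None and now - posted_at > window:
            continue  # too old for this view
        # Use a set so a link repeated within one tweet counts once.
        counts.update(set(URL_PATTERN.findall(text)))
    return counts.most_common(limit)


now = 100 * DAY
tweets = [
    (now - 30 * DAY, "Great roundup of url shorteners http://example.com/shorteners"),
    (now - 29 * DAY, "RT url shorteners compared http://example.com/shorteners"),
    (now - 28 * DAY, "bookmarking http://example.com/shorteners"),
    (now - 3600, "new today http://example.com/fresh"),
]

all_time = top_links(tweets, now)
past_day = top_links(tweets, now, window=DAY)
```

The all-time view puts the weeks-old article on top with three shares, while the past-day view shows only the link tweeted in the last hour. Same data, different window, very different idea of what's "hot."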

The site:twitter.com command lets you search just the Twitter site (mostly updates people are making) for the words you specify (google os, in this example). You can then use the new “Show Option” feature to select “Recent Results” and then “Sorted By Date” to get the freshest posts at the top.

Easy, huh? Well, it’s an alternative. Even sorted by freshest, the results will be old compared to Twitter search:

In the example above, the “freshest” post was 14 minutes old, while on Twitter Search, the freshest was 10 seconds old.
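If you want to build that kind of query yourself, a small sketch follows. It only assembles the search URL for a site: query, using Google's standard q parameter. The helper name is made up, and the date sorting described above is something you still pick in the results page, not something this sketch encodes.

```python
from urllib.parse import urlencode


def site_search_url(site, words, base="https://www.google.com/search"):
    """Build a search URL that restricts results to one site."""
    query = f"site:{site} {words}"
    # urlencode escapes the colon and turns spaces into plus signs.
    return base + "?" + urlencode({"q": query})


url = site_search_url("twitter.com", "google os")
```

The same helper works for any site you want to restrict to, by swapping the first argument.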

You can try similar commands with Yahoo and Bing, but I haven’t found them to work nearly as well. However, Bing did recently add results that appear when you search for some people by name followed by the word twitter, tweet or using their twitter name:

Bing Adds Twitter Smart Answers explains more about this. It’s not the best integration in the world, but it’s a start — and the first integration by any major search engine.

Better integration would be to show Twitter or other real time search results right within regular searches. You can get that, but you need Firefox and some add-ons:

An even further improvement would be a dedicated microblog search service from any of the majors. Some of that will depend on firehose agreements they strike with major providers such as Twitter and Facebook.

In the case of Google, we’ve also seen hints that a microblog service specifically to tap into Twitter is in the works. For more about this, see:

There are other places to do microblogging beyond Twitter, FriendFeed and Facebook. MySpace offers its “Status and Mood” box. LinkedIn has its “Network Update” box. Dedicated services like Plurk and Identi.ca are still out there. But these are the Plutos of the microblogging solar system, I’d say (in terms of social networking, MySpace and LinkedIn are, of course, major planets). Relatively few people seem to use these alternatives to microblog compared to the major players above. At MySpace and LinkedIn, there’s also no way that I can see to do a search restricted to microblogging.

As for third party microblog search engines, I know there are some I haven’t covered in the lists above. I looked at plenty, and the ones above stood out to me personally in some way. Got your own favorite? Operate a service not mentioned? Well, you can comment below. Here are some I also visited briefly:

almost.at: No searching. Instead, pick from predefined topics and see microblogged content, images and links scroll.

dailyRT: Interesting. Search to see most retweeted by keyword, for different time periods. You can also filter by those who have a certain number of followers, by set date and, if you log in, only from those you follow. Worth watching.

itpints: Pulls back microblogged content from Twitter (called lifestreaming on advanced search page) as well as from news, social bookmark, image and other sites. Would like it much better if all the content sources were fully itemized.

Monitter: Picks three trending topics, shows live tweets scrolling down the page. You can narrow to a particular place.

Twitority: Do a search, and it tries to rank tweets by the authority of the twitterer. What that authority is, how it’s calculated, isn’t explained. If it were, I’d like it more.

Twitmatic: Just want to see video that’s being shared? That’s what this service provides. With plenty of spam showing up when I looked at a search for “iran.”

Yauba: Worried about privacy? This pulls up pages within the site itself, supposedly protecting you from being tracked. Not sure how that goes if those pages still carry tracking codes (Twitter pages that loaded still had Google Analytics code that should register your visit, for example). The “Real Time” drop down option lets you search against Twitter and Identica.

Much of real time search remains a waiting game. We wait to see how Twitter will improve its own service. We wait to see how Facebook opens up status updates. We wait to see if FriendFeed will be able to pull in the full firehose of data from both places, which would strengthen it even more. We wait to see if any of the other services will get further data, or which one of them tapping into link data may emerge with the largest audience. Most of all, we wait on the major search engines, to see how they’ll integrate real time search — or not.

And if they don’t get there? Right now, I find real time search pretty compelling. It’s a new area which seems to have lots of prospects for advertising (when people check to see if Time Warner has gone down, Comcast having an ad to entice them away is pretty well targeted). It’s an area that can provide fast answers that news or blog search can’t match (along with the disadvantage that speed can also spread false rumors. Proposition 8 wasn’t overturned. Jeff Goldblum isn’t dead).

The link data and activity is also enticing. Google’s Marissa Mayer is out this week talking again in a Guardian article about the potential that can be mined from real time data:

We think the real-time search is incredibly important and the real-time data that’s coming online can be super-useful in terms of us finding out something like, you know, is this conference today any good? Is it warmer in San Francisco than it is in Silicon Valley? You can actually look at tweets and see those sorts of patterns, so there’s a lot of useful information about real time and your actions that we think ultimately will reinvent search.

Still, some of the hype needs to be dialed down. Go back to 2004, and there was plenty of talk that the major search engines were somehow missing out on a major opportunity by not offering blog search. Go back further, and when Google acquired Blogger, you had people speculating the purchase was necessary in order to better find information on the web.

Blogger didn’t help Google with blog search. It certainly didn’t help improve search overall. And while Google is the only major search engine today to offer blog search, no one suggests that this is why it has a lead over Yahoo and Bing. Google Blog Search isn’t even among Google’s top sites. Don’t get me wrong. I’m glad Google offers it and continues to improve it. But blog search didn’t turn into some secret weapon. Real time search probably won’t, either. But it does deserve attention, and it’s an exciting space to watch develop.