Google's new 'Caffeine' search completes roll-out

New indexing system for faster search of the real-time web

Google has completed the roll-out of Caffeine, its new search indexing infrastructure, which promises to deliver faster, fresher search results than ever before.

Caffeine covers more pages than Google's previous index and is designed to keep pace with the ever-growing real-time web.

According to Google, Caffeine's results are 50 percent fresher than those of its previous index, so users get faster access to more up-to-date content.

Completion

"Today, we're announcing the completion of a new web indexing system called Caffeine," reads a post on the Official Google Blog.

"Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered.

"Some background for those of you who don't build search engines for a living like us: when you search Google, you're not searching the live web.

"Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need."
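The "list in the back of a book" that Google describes is what engineers usually call an inverted index: a map from each word to the pages that contain it. The sketch below is a toy illustration under simple assumptions (lowercase whitespace tokenisation, a handful of in-memory pages); the names `build_index` and `search` are hypothetical, and this is not how Google's own index is implemented.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokeniser: lowercase the text and split on whitespace.
    return text.lower().split()


def build_index(pages):
    """Map each term to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the IDs of pages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


pages = {
    "a": "Google completes Caffeine roll-out",
    "b": "Caffeine indexes the real-time web",
    "c": "The web keeps growing",
}
index = build_index(pages)
print(sorted(search(index, "caffeine web")))  # ['b']
```

A query never scans the pages themselves, only the precomputed lists, which is why searching an index is fast and also why the index can lag behind the live web.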

Higher expectations for search

Google acknowledges that "people's expectations for search are higher than they used to be" and, as such, sees Caffeine as a major step forward in how search works.

The company explains the difference between Caffeine and its older search index as follows: "Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks.

"To refresh a layer of the old index, we would analyse the entire web, which meant there was a significant delay between when we found a page and made it available to you.

"With Caffeine, we analyse the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index.

"That means you can find fresher information than ever before—no matter when or where it was published."
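The difference between refreshing a whole layer at once and updating "on a continuous basis" can be illustrated with a toy index that accepts changes one page at a time. This is a minimal sketch assuming a single in-memory index; `IncrementalIndex`, `upsert` and `remove` are hypothetical names, and Caffeine's actual distributed design is not described in this level of detail by Google.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index updated page by page, rather than rebuilt
    from scratch in a periodic batch."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> IDs of pages containing it
        self.page_terms = {}              # page ID -> terms currently indexed

    def upsert(self, page_id, text):
        # Remove the page's stale postings, then index its new content
        # immediately, so the change is searchable right away.
        self.remove(page_id)
        terms = set(text.lower().split())
        self.page_terms[page_id] = terms
        for term in terms:
            self.postings[term].add(page_id)

    def remove(self, page_id):
        for term in self.page_terms.pop(page_id, set()):
            self.postings[term].discard(page_id)
            if not self.postings[term]:
                del self.postings[term]  # drop terms no page uses any more

    def lookup(self, term):
        return set(self.postings.get(term.lower(), set()))


idx = IncrementalIndex()
idx.upsert("news", "caffeine launches")
print(idx.lookup("caffeine"))  # {'news'}
idx.upsert("news", "caffeine completes roll-out")
print(idx.lookup("launches"))  # set()
```

Each update touches only the terms of the page that changed, so the cost of freshness scales with the amount of new content rather than with the size of the whole web.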

Miles of paper and iPods

Every second, Caffeine processes hundreds of thousands of pages in parallel or, as Google puts it: "If this were a pile of paper it would grow three miles taller every second."

The Google Blog notes that Caffeine takes up "nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day."

That is a startling amount of data which, again to put it in terms many people will understand, would "need 625,000 of the largest iPods to store" and, "if these were stacked end-to-end they would go for more than 40 miles."
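Google's comparison can be sanity-checked with simple arithmetic. The figures below are assumptions not stated in the article: that "the largest iPod" at the time was the 160 GB iPod classic, and that it measured roughly 103.5 mm long.

```python
# Assumptions (not from the article): the largest iPod at the time was the
# 160 GB iPod classic, about 103.5 mm long.
IPOD_CAPACITY_GB = 160
IPOD_LENGTH_MM = 103.5
MM_PER_MILE = 1_609_344  # 1 mile = 1,609.344 m exactly

ipods = 625_000
total_gb = ipods * IPOD_CAPACITY_GB
stack_miles = ipods * IPOD_LENGTH_MM / MM_PER_MILE

print(f"{total_gb:,} GB")          # 100,000,000 GB
print(f"{stack_miles:.1f} miles")  # 40.2 miles
```

Under those assumptions, both figures line up with the blog post: 100 million gigabytes of storage and a stack of a little over 40 miles.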

So it's goodbye to Google's old indexing system, with its layered approach, and hello to the real-time-web-friendly Caffeine, which Google has been testing since last August.