Saturday, June 10, 2006

Caching in Google search

I recently came across a scenario while searching the web that showed me the real use of the ‘Cached’ facility provided by Google. I found the information I was looking for in the search results and clicked the link. But the contents of the page had been moved, and the page now showed something other than what the Google results page had displayed. I then tried the ‘Cached’ link in Google and got the contents I was actually looking for.

In computing, ‘cache’ memory is volatile memory, like Random-Access Memory (RAM), and it stores the ‘frequently’ accessed contents. Since these contents are accessed frequently, the processor picks them up from the cache instead of from RAM, which increases the processing speed. What gets stored in the cache depends on principles such as spatial and temporal locality. The contents of the cache are also replaced over time as new data is brought in.
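To make the temporal-locality idea a little more concrete, here is a minimal sketch of a least-recently-used (LRU) cache in Python. It is only a software analogy: real processor caches are implemented in hardware and work on fixed-size lines, and the names used here (`LRUCache`, `backing_store`, `ram`) are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache; a software analogy, not real hardware."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity            # how many entries the cache can hold
        self.backing_store = backing_store  # stands in for the slower RAM
        self.lines = OrderedDict()          # address -> value, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Hit: serve from the cache and mark the entry as recently used.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from the slower store and keep a copy in the cache.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            # Cache is full: evict the least recently used entry.
            self.lines.popitem(last=False)
        return value


ram = {addr: addr * 10 for addr in range(8)}   # pretend main memory
cache = LRUCache(capacity=2, backing_store=ram)
for addr in [0, 1, 0, 0, 2, 1]:
    cache.read(addr)
print(cache.hits, cache.misses)  # prints "2 4"
```

The repeated reads of address 0 are served from the cache, while addresses that have not been touched recently are evicted to make room for new ones.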

In normal PCs the ‘cache size’ is on the order of kilobytes (KB) to a few megabytes (MB). With the kind of scale Google is operating at, I am just wondering what the size of its cache would be. I am sure it is really huge, and maintaining a cache that large must be a very difficult task. I wonder how Google does that.

In the meantime, can anyone help me find out the ‘cache’ size of Google?

3 comments:

While you are correct about the computational cache, on the WWW ‘cache’ has a similar but different meaning. Proxy servers are allowed to store the contents of a URL and, depending on the configured content-staleness parameters, they can return the content as if it had been received from the actual site.

Google's cache also works on the same principle: it uses the content-staleness rules meant for proxy servers.
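The staleness rule described in this comment can be sketched roughly as follows. This is an illustration under simplifying assumptions, not how any particular proxy or Google works: real proxies take freshness from HTTP headers such as `Cache-Control: max-age` and `Expires`, whereas this sketch uses one fixed maximum age for every URL. The names `ProxyCache`, `fake_origin` and `fake_now` are hypothetical.

```python
import time


class ProxyCache:
    """Toy caching proxy: serves a stored copy until it becomes stale."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self.fetch = fetch              # function that contacts the real site
        self.max_age = max_age_seconds  # assumed fixed freshness lifetime
        self.clock = clock              # injectable so the example is repeatable
        self.entries = {}               # url -> (time stored, content)

    def get(self, url):
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            # Still fresh: answer as if the content came from the actual site.
            return entry[1]
        # Missing or stale: go back to the origin and store the new copy.
        content = self.fetch(url)
        self.entries[url] = (now, content)
        return content


origin_calls = []

def fake_origin(url):
    # Stand-in for the real web site; each call returns a new "version".
    origin_calls.append(url)
    return f"content of {url} (version {len(origin_calls)})"

fake_now = [0.0]  # simulated clock, in seconds
proxy = ProxyCache(fake_origin, max_age_seconds=60, clock=lambda: fake_now[0])

first = proxy.get("http://example.com/")   # fetched from the origin
fake_now[0] = 30.0
second = proxy.get("http://example.com/")  # still fresh, served from the cache
fake_now[0] = 90.0
third = proxy.get("http://example.com/")   # stale, fetched again
```

Here the second request never reaches the origin, which is also why a cached copy can differ from what the live page currently shows.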

Pash, thanks for your comments. I was not aware of these proxy server principles, as I am more into the embedded systems and networking fields. But it's really interesting to know about the 100 TB of storage for the cache :)

I think this will be my first negative feedback. Your data is a little misleading. The idea of having a cache on Google is totally unrelated to the idea of a cache in a processor. The cache in a processor, as you described, is there to retrieve frequently used data faster and to avoid unnecessary off-chip memory accesses and disk delays. It always contains the latest data, which might not yet be on the disk.

But the cache in Google is used more as a backup in case the page has been removed, and it does not contain the latest data. As for the size of Google's cache, one need not bother, as it is not dynamic memory; it is all on disks.