Thursday, August 31, 2006

Google Personalized Search and Bigtable

Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.

This appears to confirm that Google Personalized Search works by building high-level profiles of user interests from their past behavior.
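To make the idea of building interest profiles from past behavior concrete, here is a minimal sketch in plain Python of a map step and a reduce step over a search log. It is only an illustration under stated assumptions: the `TOPIC_KEYWORDS` table, the function names, and the sample log are all hypothetical, and nothing here reflects Google's actual MapReduce or Bigtable code.

```python
from collections import defaultdict

# Hypothetical keyword-to-topic table; a real system would use a classifier.
TOPIC_KEYWORDS = {
    "sports": {"football", "nba", "tennis"},
    "computers": {"python", "linux", "laptop"},
}

def map_query(user_id, query):
    """Map step: emit ((user, topic), 1) for each topic the query touches."""
    words = set(query.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            yield (user_id, topic), 1

def reduce_counts(pairs):
    """Reduce step: sum counts per (user, topic), then normalize per user."""
    counts = defaultdict(lambda: defaultdict(int))
    for (user_id, topic), n in pairs:
        counts[user_id][topic] += n
    profiles = {}
    for user_id, topics in counts.items():
        total = sum(topics.values())
        profiles[user_id] = {t: c / total for t, c in topics.items()}
    return profiles

def build_profiles(log):
    """Run the map step over every log entry, then reduce into profiles."""
    pairs = [kv for user_id, query in log for kv in map_query(user_id, query)]
    return reduce_counts(pairs)

# Made-up search history: (user, query) pairs.
history = [
    ("u1", "linux laptop"),
    ("u1", "python tutorial"),
    ("u1", "nba scores"),
    ("u2", "tennis rackets"),
]
profiles = build_profiles(history)
print(profiles)
```

In this toy log, user `u1` ends up weighted two thirds toward computers and one third toward sports, which is the kind of high-level profile described above.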

I would guess it works by determining subject interests (e.g. sports, computers) and biasing all search results toward those categories. That would be similar to the old personalized search in Google Labs (which was based on Kaltix technology) where you had to explicitly specify that profile, but now the profile is generated implicitly using your search history.
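As a rough sketch of that guess, biasing results toward a profile could be as simple as boosting each result's base score by the user's interest in its category. Everything below is an assumption for illustration: the `rerank` function, the `weight` parameter, the scores, and the category labels are made up and are not taken from Google or Kaltix.

```python
def rerank(results, profile, weight=0.5):
    """Re-order results by base score boosted by the user's topic interest.

    results: list of (title, base_score, topic) tuples.
    profile: dict mapping topic -> interest in [0, 1] (hypothetical format).
    weight:  how strongly the profile biases the ranking (assumed knob).
    """
    def score(result):
        _title, base, topic = result
        return base * (1.0 + weight * profile.get(topic, 0.0))
    return sorted(results, key=score, reverse=True)

# Hypothetical results for the query "python".
results = [
    ("Python snake care", 1.0, "pets"),
    ("Python language tutorial", 0.9, "computers"),
]
profile = {"computers": 0.8, "sports": 0.2}

ranked = rerank(results, profile)
print([title for title, _, _ in ranked])
```

With an empty profile the snake page stays on top; with the computers-heavy profile the tutorial moves ahead of it. The same boost applies to every query the user issues, whatever they happen to be looking for at that moment.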

My concern with this approach is that it does not focus on what you are doing right now, what you are trying to find, your current mission. Instead, it is a coarse-grained bias of all results toward what you generally seem to enjoy.

This problem is worse if the profiles are not updated in real time. This tidbit from the Bigtable paper suggests that the profiles are generated in an offline build, meaning that the profiles probably cannot adapt immediately to changes in behavior.

4 comments:

Anonymous
said...

I wonder if this could potentially limit your access to information on the web. If I regularly search for keywords about one of my hobbies, high-end audio, I may consistently use a word like "tube" when looking for vintage vacuum tubes for an amp. But if I am travelling to London and search for London Tube (the transportation system), would I be more likely to see someone in London who supplies vacuum tubes at the top of the list? This is just a random example, but would a system like this bury certain information further down, making for a less democratic search tool?

Likewise, would it become as annoying as the Amazon homepage, where if I do a single search for a blender, or forget to remove a blender from my wish list, I am constantly staring at blenders?

Personalized search is not a new thing that Google came up with. They are definitely a smart bunch of people, but personalization of deductive searches was done before that. In fact, yours truly already has a patent, granted for his work at Novell almost seven years ago, on this type of searching algorithm: "Predicate indexing for locating objects in a distributed directory". My domain for the application of this patent was different: the directory objects in an X.500/LDAP directory. Eric was the CEO then.

To answer the other queries, this type of searching logic does not limit your ability to search for more relevant objects, since these types of cached subsets generally get updated with new content when the search crawlers bring in more up-to-date results. In short, if this is implemented properly, and I do not have the slightest doubt that the Google chaps have done a good and smart implementation, then you should get the most relevant results. I have also heard that elements of classification, like the ones that Dr. Barney Pell is working on with his new startup, will actually enrich these types of solutions by applying semantics to the search criteria.