Friday, May 06, 2011

The value of Google Maps directions logs

Ooo, this one is important. A clever and very fun paper, "Hyper-Local, Direction-Based Ranking of Places" (PDF), will be presented at VLDB 2011 later this year by a few Googlers.

The core idea is that, when people ask for directions from A to B, it shows that people are interested in B, especially if they happen to be at or near A.
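To make that concrete, here is a minimal sketch of the "directions as votes" idea. It is not the paper's actual method: the function names, the 5 km radius, and the plain unweighted count are all assumptions made for illustration. Each query from A to B counts as one vote for B, but only if A is close to where the current user is.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def rank_places(queries, user_location, radius_km=5.0):
    """Rank destinations by votes from direction queries that start nearby.

    queries: iterable of (origin_latlon, destination_name) pairs.
    radius_km is an assumed cutoff, not a value taken from the paper.
    """
    votes = Counter()
    for origin, destination in queries:
        # Only queries issued from near the user count as local votes.
        if haversine_km(origin, user_location) <= radius_km:
            votes[destination] += 1
    return votes.most_common()


# Hypothetical log: three queries start in Seattle, two in New York.
queries = [
    ((47.61, -122.33), "Cafe A"),
    ((47.62, -122.32), "Cafe A"),
    ((47.60, -122.34), "Cafe B"),
    ((40.71, -74.00), "Cafe B"),
    ((40.72, -74.01), "Cafe B"),
]
ranking = rank_places(queries, user_location=(47.61, -122.33))
print(ranking)
```

Overall, "Cafe B" has more votes, but for someone standing in Seattle only the nearby queries count, so "Cafe A" comes out ahead. That is the hyper-local part.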

Now, certain very large search engines have massive logs of people asking for directions from A to B: hundreds of millions of people and billions of A-to-B queries. And it appears this data may be as useful as, or more useful than, user reviews of businesses, and perhaps GPS trails as well, for local search ranking, recommending nearby places, and perhaps local and personalized deals and advertising.

From the paper:

A query that asks for directions from a location A to location B is taken to suggest that a user is interested in traveling to B and thus is a vote that location B is interesting. Such user-generated direction queries are particularly interesting because they are numerous and contain precise locations.

Direction queries [can] be exploited for ranking of places ... At least 20% of web queries have local intent ... [and mobile] may be twice as high.

[Our] study shows that driving direction logs can serve as a strong signal, on par with reviews, for place ranking ... These findings are important because driving direction logs are orders of magnitude more frequent than user reviews, which are expensive to obtain. Further, the logs provide near real-time evidence of changing sentiment ... and are available for broader types of locations.

What is really cool is that, not only is this data easier and cheaper to obtain than customer reviews, but also there is so much more of it that the ranking is more timely (if, for example, ownership changes or a place closes) and coverage much more complete.

I find it a little surprising that Google hasn't already been using this data heavily. In fact, the paper suggests that Google is only beginning to use it. At the end of the paper, the authors write that they hope to investigate what types of queries benefit the most from this data and then look at personalizing the ranking based on each person's specific search and location history.

4 comments:

You can also take that one step further -- not just what directions people are searching for but where they're physically going using all the geo-location data from Android users.

This is analogous to Google's search algorithms more generally. Instead of relying on humans explicitly rating sites, just look at what they link to. Instead of relying on humans rating businesses, just look at where they go.

One pretty interesting thing about this paper: the Googlers are arguing that the Google Maps directions log data might be even better than the GPS trail data. From the paper (in Section 6):

"GPS datasets typically contain many positions for a few users; in contrast, direction queries are derived from much larger user populations, with each user issuing a few queries."

I'd add that the GPS trail data might have a weaker signal, since the direction query states a pretty strong interest in getting to location B, whereas just being in a location doesn't make it clear that it is your endpoint or that you want to be there. Also, while GPS data could be huge and cover all users, there's a lot more concern about usage and privacy of that data than of the direction query log data (at least at the moment).

Interesting - thanks for flagging this up. I haven't read the original paper yet, but a couple of things struck me...

1) I wonder to what extent the utility of this data is dependent on a driving-centric culture. For example, from a UK/European perspective, if I was heading to a country pub, I'd probably seek directions based on the specific establishment. However, if I was aiming for a pub in a city I'd probably just be looking for driving directions to the right vicinity (or more likely somewhere to park near the pub), on the basis that I'd be unlikely to be able to drive right to it (unlike many places in the US, in my experience). In fact, I'd probably not be driving at all. I wonder how often people search for walking directions? It may just be me, but this strikes me as an important cultural difference with a big potential impact on the value of the approach.

2) I totally agree with your analysis that this is way cheaper than collecting user reviews, but I wonder about your claim that this approach could be more responsive to changes in ownership (and presumably therefore a rise/fall in standards). Presumably a rise in standards would only (assuming people don't choose places at random) be reflected in greater visits (and therefore higher rankings for the business, according to this approach) off the back of a positive review indicating the change. Conversely, a fall in standards may take some time to be reflected in falling visitor numbers, depending on typical visit frequency.

I think there's also a disconnect here due to the fact that people who already know a place will rarely need directions. How do the people who haven't been before decide to go at all, and therefore look for directions? Imagine a new place opens up, gets loads of hype but is actually terrible. Working out the decay function for recommendations in these sorts of scenarios strikes me as non-trivial if using this approach.
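For what it's worth, the simplest decay function one could try is an exponential half-life on each vote. This is only a sketch under assumptions of my own (the 30-day half-life, the day-number timestamps, and the function name are made up for illustration, and nothing here comes from the paper). It also does nothing to solve the hype problem itself; it only controls how fast old votes fade.

```python
from collections import defaultdict


def decayed_scores(queries, now_day, half_life_days=30.0):
    """Score destinations by direction-query votes with exponential time decay.

    queries: iterable of (day_issued, destination_name) pairs.
    half_life_days is an assumed tuning knob: a vote loses half its
    weight every half_life_days.
    """
    scores = defaultdict(float)
    for day, destination in queries:
        age = now_day - day
        scores[destination] += 0.5 ** (age / half_life_days)
    # Highest decayed score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical log: a hyped opening 90 days ago versus steady recent interest.
queries = [(0, "Hyped"), (0, "Hyped"), (0, "Hyped"), (90, "Steady"), (90, "Steady")]
scores = decayed_scores(queries, now_day=90)
print(scores)
```

With these numbers the three old "Hyped" votes are each worth 1/8 of a fresh vote, so "Steady" ranks first despite having fewer raw queries. Picking the half-life per category of place is exactly the non-trivial part.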

I guess they wrote their paper before the iPhone and Android location data report stuff came out =). They might stop doing it, but they've got a heck of a lot of data with that deployed.

One lesson I learned repeatedly in the LimeWire days was that anything automated adds up really quickly. You always move with your cell phone without doing anything, whereas searching is something you have to be even minimally motivated to do.