Monday, November 28, 2011

What mobile location data looks like to Google

A recent paper out of Google, "Extracting Patterns From Location History" (PDF), is interesting not only for confirming that Google is studying using location data from mobile devices for a variety of purposes, but also for the description of the data they can get.

From the paper:

Google Latitude periodically sends his location to a server which shares it with his registered friends.

A user's location history can be used to provide several useful services. We can cluster the points to determine where he frequents and how much time he spends at each place. We can determine the common routes the user drives on, for instance, his daily commute to work. This analysis can be used to provide useful services to the user. For instance, one can use real-time traffic services to alert the user when there is traffic on the route he is expected to take and suggest an alternate route.

Much previous work assumes clean location data sampled at very high frequency ... [such as] one GPS reading per second. This is impractical with today's mobile devices due to battery usage ... [Inferring] locations by listening to RF-emissions from known wi-fi access points ... requires less power than GPS ... Real-world data ... [also] often has missing and noisy data.

17% of our data points are from GPS and these have an accuracy in the 10 meter range. Points derived from wifi signatures have an accuracy in the 100 meter range and represent 57% of our data. The remaining 26% of our points are derived from cell tower triangulation and these have an accuracy in the 1000 meter range.

The paper goes on to describe how they clean the data and pin noisy location trails to roads. But the most interesting tidbit for me was how few of their data points come from GPS and how much they have to rely on less accurate cell tower and WiFi hotspot triangulation.

A lot of people have assumed mobile devices would provide nice trails of accurate and frequently sampled locations. But, if the Googlers' data is typical, it sounds like location data from mobile devices is going to be very noisy and very sparse for a long time.