Google Correlate: Your data, Google's computing power

Google Correlate is awesome. As I noted in Search Notes last week, Google Correlate is a new tool in Google Labs that lets you upload state- or time-based data to see what search trends most correlate with that information.

Correlation doesn’t necessarily imply causation, and as you use Google Correlate, you’ll find that the relationship (if any) between terms varies widely based on the topic, time, and space.

For instance, there’s a strong state-based correlation between searches for me and searches for Vulcan Capital. But the two searches have nothing to do with each other. As you see below, the correlation is that the two searches have similar state-based interest.

For both searches, the most volume is in Washington state (where we’re both located). And both show high activity in New York.

Maybe there’s something to that “working out helps you lose weight” idea I’ve heard people mention. Then again, another highly correlated term is [itunes movie rentals], so maybe I should try the “sitting on my couch, watching movies work out plan” just to explore all of my options.

To look at this data more seriously, we can see with search data alone that the wealthy seem to be healthier (at least based on obesity data) than the poor. In states with low obesity rates, searches are for optional material goods, such as Bose headphones, digital cameras, and red wine and for travel to places like Africa, Jordan, and China. In states with high obesity rates, searches are for jobs and free items.

With this hypothesis, we can look at other data (access to nutritious food, time and space to exercise, health education) to determine further links.

Time-based data

Time-based data works in a similar way. Google Correlate looks for matching patterns in trends over time. Again, that the trends are similar doesn’t mean they’re related. But this data can be an interesting starting point for additional investigation.

One of the economic indicators from the U.S. Census Bureau is housing inventory. I looked at the number of months’ supply of homes at the current sales rate between 2003 and today. I have no idea how to interpret data like this (the general idea is that you, as an expert in some field, would upload data that you understand). But my non-expert conclusion here is that as housing inventory increases (which implies no one’s buying), we are looking to spiff up our existing homes with cheap stuff, so we turn to Craigslist.

Of course, it could also be the case that the height of popularity of Craiglist just happened to coincide with the months when the most homes were on the market, and both are coincidentally declining at the same rate.

Search-based data

You can also simply enter a search term, and Google will analyze the state or time-based patterns of that term and chart other queries that most closely match those patterns. Google describes this as a kind of Google Trends in reverse.

Google Insights for Search already shows you state distribution and volume trends for terms, and Correlate takes this one step further by listing all of the other terms with a similar regional distribution or volume trend.

For instance, regional distribution for [vegan restaurants] searches is strongly correlated to the regional distribution for searches for [mac store locations].

What does the time-trend of search volume for [vegan restaurants] correlate with? Flights from LAX.

Time-based data related to a search term can be a fascinating look at how trends spark interest in particular topics. For instance, as the Atkins Diet lost popularity, so too did interest in the carbohydrate content of food.

Drawing-based data

Don’t have any interesting data to upload? Aren’t sure what topic you’re most interested in? Then just draw a graph!

Maybe you want to know what had no search volume at all in 2004, spiked in 2005, and then disappeared again. Easy. Just draw it on a graph.

Apparently the popular movies of the time were “Phantom of the Opera,” “Darkness,” and “Meet the Fockers.” And we all were worried about our Celebrex prescriptions.

(Note: the accuracy of this data likely is dependent on the quality of your drawing skills.)

OSCON Data 2011, being held July 25-27 in Portland, Ore., is a gathering for developers who are hands-on, doing the systems work and evolving architectures and tools to manage data. (This event is co-located with OSCON.)