If you take smoking rates by state and throw them into the Google Correlate hopper, you can see the 100 search terms most highly correlated with that reported smoking behavior: the terms most likely to be searched in high-smoking states and least likely to be searched in low-smoking states.

Is the result just a lot of noise? Maybe, but I don’t think so. Here are the smoking-related terms in the top 100:

camel no 9

cigarette coupon

cigarette coupons

marlboro coupons

my time to quit

safe cigarettes

stopping smoking

time to quit

fire safe cigarettes

ways to stop smoking

So that’s good for face validity — a list of random search terms isn’t likely to have all those smoking terms on it.

But after the smoking terms, the thing that jumps out is the health-related terms. We know from the Google flu tracker that people search for their symptoms. So these caught my eye.

Here is a screen shot of the first page of results:

I selected “stages of copd” as the term to map. The map on the left is the smoking rates; the one on the right is the relative frequency of searches for “stages of copd.” COPD is chronic obstructive pulmonary disease, a nasty disease whose most common cause is smoking.

Here is the complete list of health-related terms among the top-100 correlates with smoking rates:

Lymph node swelling, which is implicated in the jaw and neck searches, most often reflects infection, and smoking raises the risk of infection.

How strong are the connections? They’re not the strongest I’ve seen on Google Correlate. The “stages of copd” search is correlated with smoking rates at .77, where 1 would be a perfect positive relationship. It’s not uncommon to find correlations of .93 (which is the relationship between “quiche” and “volvo v70 xc”).
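One rough way to read those numbers, assuming a simple linear relationship between the two state-level series, is to square them: the result is the share of state-to-state variation in one series that lines up with the other. A minimal sketch of that arithmetic, using only the two correlations quoted above:

```python
# Correlations quoted in the text.
correlations = {
    "stages of copd vs. smoking rates": 0.77,
    "quiche vs. volvo v70 xc": 0.93,
}

# Squaring Pearson's r gives the proportion of shared variance (r squared).
shared = {label: r ** 2 for label, r in correlations.items()}

for label, r2 in shared.items():
    print(f"{label}: r^2 = {r2:.2f}")
```

So a correlation of .77 corresponds to a bit under 60 percent shared variance, against roughly 86 percent for .93.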

But considering the smoking rates come from a sample survey (the National Survey on Drug Use and Health) which includes random error, and states are somewhat arbitrary geographic units, that correlation seems pretty high to me. Here’s the scatterplot:

What is the correlation-versus-causation story here? I can’t say. But the simplest explanation is that these are the terms smokers (and maybe those who know or care for them) are most likely to Google relative to non-smokers. They are not the most common searches smokers do, of course, but the searches that differentiate them from non-smokers. The simplest explanation is the best place to start.

I like this list of conditions because in my experience smokers sometimes have the attitude of “you have to die of something.” But it’s not just the chance of dying that smoking increases — it’s a lot of possible forms of suffering along the way.