Posted by timothy on Thursday February 14, 2013 @11:13AM
from the peak-flu-is-when-supplies-decline dept.

ananyo writes "When influenza hit early and hard in the United States this year, it quietly claimed an unacknowledged victim: one of the cutting-edge techniques being used to monitor the outbreak. A comparison with traditional surveillance data showed that Google Flu Trends, which estimates prevalence from flu-related Internet searches, had drastically overestimated peak flu levels. The glitch is no more than a temporary setback for a promising strategy, experts say, and Google is sure to refine its algorithms. But with flu-tracking techniques based on mining of web data and on social media taking off, Nature looks at how these potentially cheaper, faster methods measure up against traditional epidemiological surveillance networks." Crowdsourcing is often useful, but it seems to have limits.

Modern epidemics and pandemics are almost ALWAYS overestimated by those predicting them. In part, this is because those predicting them often have a vested interest in making them sound scarier than they actually are. So you get a lot of this "The sky is falling! Weessa all gonna die! Give me more research money!" screaming from epidemiologists and those in related fields.

In part, this is because those predicting them often have a vested interest in making them sound scarier than they actually are.

Financial incentive? In science?

Well, yes. Scientists are people too, and they want the same thing most of us want: to put together enough of a money pile to leave the rat race and go do what we want for a change, without having to make it profitable and thus bend it to the lowest common denominator (LCD).

Michael Crichton's State of Fear reveals this tendency in our media and

People mass-failing the iterated researcher's dilemma (similar to an iterated prisoner's dilemma, but related to funding rather than sentencing) does not require a conspiracy. It just requires that enough people know nothing of game theory and have a poor grasp of cost/benefit analysis.
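The "iterated researcher's dilemma" above can be sketched as a standard iterated game. Here is a minimal toy model; the payoff numbers and strategy names are purely illustrative, not drawn from any actual funding data:

```python
# Toy model of the "iterated researcher's dilemma" described above.
# Payoffs are made up: hyping a result wins more funding this round,
# but if everyone hypes, funders grow cynical and everyone gets less.

def payoff(my_move, other_move):
    # (honest, honest): both do fine; (hype, honest): the hyper wins big;
    # (hype, hype): credibility erodes and both lose out.
    table = {
        ("honest", "honest"): 3,
        ("hype",   "honest"): 5,
        ("honest", "hype"):   0,
        ("hype",   "hype"):   1,
    }
    return table[(my_move, other_move)]

def always(move):
    # Strategy that ignores history and always plays the same move.
    return lambda history: move

def tit_for_tat(history):
    # Mirror the opponent's previous move; start out honest.
    return history[-1] if history else "honest"

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each player sees the other's history
        move_b = strategy_b(hist_a)
        score_a += payoff(move_a, move_b)
        score_b += payoff(move_b, move_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Mutual honesty beats mutual hype over repeated funding cycles,
# even though hyping dominates in any single round.
print(play(always("honest"), always("honest")))  # (300, 300)
print(play(always("hype"), always("hype")))      # (100, 100)
```

Which is the point of the comment: no conspiracy is needed, only enough individually rational players defecting in a game they don't realize is iterated.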

Dismissing observations of common human behavior as some sort of conspiracy is simply obstructive to any process of understanding.

Actually, that's kinda the goal. When it comes to the expenditure of time and money, if you don't come in with a Chicken Little, people are just going to ignore you. With the Chicken Little, you get people to fall in line and the effects of major epidemics or problems are mitigated.

Slashdot-friendly example: Today, people will say that the Y2K issue was completely blown out of proportion. Airplanes didn't fall out of the sky, bank accounts were still there on Jan 1, 2000, and everything was just fine. Of course, that ignores the teams of coders working in even-then-archaic languages to adapt old software to work beyond its expected lifespan. Who knows what Y2K would have been had we just done nothing, but we're all better off for the purse-string-holders getting concerned.

It's only a problem when it causes people to panic (like yelling "fire" in a crowded theater, then defending yourself with "Well, it got them to think about fire safety, didn't it?"). If it just causes Cleatus Dipshit to wash his hands more and cover his goddamn mouth when he sneezes, I'm okay with it. If it causes people to sell their houses and empty their bank accounts to buy underground bunkers and canned goods, then we have a problem.

Of course, there is also the issue of fraud when it comes to public grant money. I don't like the idea of scientists who knowingly exaggerate their findings taking grant money away from those who don't.

the teams of coders working in even-then-archaic coding languages to adapt old software to work beyond their expected lifespan...would know.

But where are their stories?

I'm asking out of curiosity - not necessarily because I'm sceptical. Wikipedia does have some stories of Y2K-bug-related issues (one even fatal, although I think more than just the Y2K bug failed there), but there doesn't seem to be a reference to people stating they

Is there anyone who was working in software in 1999 who WASN'T spending a lot of time considering Y2K issues? We had to upgrade most of the software stack on our servers at the time and put the approved two-digit rounding code into the UI date parsing. Not exactly heroic, but I'm not aware of a single piece of server software that required no modifications for Y2K. Everyone was involved in a thousand tiny ways.
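The "two-digit" fix being described is usually called date windowing: a pivot year decides which century a two-digit year belongs to. A minimal sketch, where the pivot value 70 is an illustrative convention rather than anything from the comment:

```python
PIVOT = 70  # two-digit years below this are assumed to mean 20xx

def window_year(yy):
    """Expand a two-digit year to four digits using a fixed pivot.

    With PIVOT = 70: 70..99 -> 1970..1999, and 00..69 -> 2000..2069.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 1900 + yy if yy >= PIVOT else 2000 + yy

print(window_year(99))  # 1999
print(window_year(13))  # 2013
```

Windowing is exactly the kind of unglamorous patch the comment describes: it doesn't fix two-digit storage, it just pushes the ambiguity out past the software's expected lifespan.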

My guess is that the reason there's not a lot of blogs and personal stories is that it was m

There aren't a lot of stories because it was all pretty boring stuff: a lot of setting the clock ahead, redoing the QC tests, punching out a few bugs that crop up, and testing again, just like any code. Where are the stories of coders getting TurboTax ready for next year's tax rules? It's just not that exciting, and most of it happened in industries that typically say nothing about their development efforts in the first place.

There were stories at the time of mid to upper management people being brought in as develop

Actually, I think it's the media that has a vested interest in hyping the story. They interview five people, and the one who says "WE'RE ALL GONNA DIE!" is the one that gets quoted.
They get paid for how many eyeballs see the page, not for how accurate their reporting is.

No. You only hear in the media about epidemic and pandemic estimates of the upper range. The prediction "we'll have 30,000 deaths in 2013 due to the normal flu" wouldn't make any headlines, because every year, about 30,000 die after getting sick with the flu. But most predictions of epidemics and pandemics are exactly like this -- it's just the expected behaviour.
There is a big difference between the average estimates coming from the scientists and the single highest estimates reported in the media. And of course, "everything is normal" is no news, so it doesn't get reported that often. Information content rises as probability falls, so reports about highly improbable events carry more information than reports about average events. Highly improbable events contradict our expectations, and thus it is important to report them. Normal events happen, but we were expecting them anyway, so there is no point in reporting them.
Your "ALWAYS" is probably more due to confirmation bias on your side than anything else.
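The information/probability point above is Shannon self-information, I(x) = -log2 p(x): the rarer the event, the more bits a report of it carries. A quick illustration (the probabilities are made-up stand-ins, not real flu statistics):

```python
import math

def self_information(p):
    """Shannon self-information in bits: rarer events carry more of it."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

# An ordinary flu season (expected, p ~ 0.9) is barely news;
# a genuine pandemic year (rare, p ~ 0.01) is worth many more bits.
print(round(self_information(0.9), 3))   # 0.152
print(round(self_information(0.01), 3))  # 6.644
```

Which formalizes the comment's point: "everything is normal" carries almost no information, so it is rational (if distorting) for headlines to select for the improbable.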

I was working at the NHS (National Health Service in the UK) a few years back when a 'flu pandemic' was being predicted, bird flu I think. Anyway, as a developer there I was pulled into a meeting to discuss plans to create some sort of emergency website with contact details if things went really bad.

Whilst we waited for various people to get on the phone etcetera the guy next to me, a very senior doctor in the service, started moaning about why he was there. To paraphrase and the figure I use is one I just

I think you need to RTFA, because this has nothing to do with what you're talking about (I'm not entirely sure what point you're making, because what you appear to be saying is complete nonsense, and I'd hold you to a higher standard than that).

Google has most likely fallen prey to "man flu" syndrome, where a sniffle and a headache are confused with actually having the flu, which can kill.

Computer modeling is a powerful technology that should not be underestimated.

However, it should also not be overestimated.

When the "real world" has millions of convergent factors responsible for an event, computer models can sometimes capture a few thousand. Based on those, a simulation is created that suggests a certain outcome. But it may be using less than 1% of the necessary data.

This is like making architectural models out of children's blocks and then being surprised when the building falls down after it is eventually built. There are issues of scale, in addition to data, that can reveal periodic or epicyclic patterns that cannot be captured by linear methods.

The professionals will provide the usual range of predictions, creating a more-or-less Gaussian distribution around the actual result, and then the media will self-select the ones on the highest part of the curve, because that's what keeps people watching the news.

In short, a system that learns from abnormal circumstances will no longer work as well under normal circumstances. This year's flu outbreak didn't follow previous models, so Google's application of those models was inaccurate... but we'll blame Google for it anyway, and cast shame upon them for being so terribly wrong.

Of course, the article is much better, delving into other systems that also predict and monitor flu outbreaks, and why they were or were not correct. TFA is really about the difference between traditional reporting sources (as from doctors' offices) and newer data-mining approaches (harvesting from searches and Twitter).

Is it an intrinsically flawed model? No.
Has it regularly been significantly better than other models? No.
Has it regularly been significantly worse than most other models? No.
Do experts actually expect it to be any better than it is? Not really.

As should be obvious from the "screw you, Slashdot" comment in my original post, I'm actually just ranting against Slashdot's non-existent editorial process. The second half of the article is focused on Twitter-scraping algorithms, but the summary makes no mention of

I've had two different colds in the last month... which is very, very odd. One of them was quite powerful. Many people would call it the flu, some out of ignorance and others to make their situation sound worse than it really is, for pity. Others will say that to convince their bosses that they aren't coming in to work.

I had the flu this season, I was laid out in bed for 4 days. Didn't eat anything, drank a little orange juice. Bundled up in a wool hat under a pile of blankets, drenched in cold sweat. I haven't been that sick since I was a kid.

"Well, there is the continued flu outbreak on the east coast, with the biggest concentration in Boston. There seems to be a ringworm outbreak in pets in the southwest, and our numbers show, and I caution you this is probably a 60% overestimate, the apparent nationwide removal of 3.8 million brains due to unspecified causes."

It's only a matter of time before a real flu epidemic rages through the world. The trick with flu is the balance between its transmissibility and its lethality. Flus that come along that are highly transmissible AND overly lethal will burn out: people die too fast to spread the disease. This is why there has been no worldwide outbreak of Ebola... it kills so fast, it can't spread. A mild flu (low lethality) can spread far and wide, because it doesn't kill the majority of its hosts, thus allowing them to pass the dis

Google announces they're tracking the flu (hey everyone, come see a map that will tell you how bad the flu is in your area!), Larry Page announces he's offering free flu shots [mercurynews.com] to all kids in the Bay Area, and Google announces it's launching a flu shot locator [latimes.com]. Of course searches for "flu" and "influenza" are going to increase. That will throw off the accuracy of your model. What they're really measuring is this: "people who are thinking about the flu and proactively reaching out to learn more."
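The confounding described here can be sketched as a toy estimator. Nothing below is Google's actual model; the linear fit and every number are made up to show the mechanism, in which an estimator trained when searches track illness gets fooled by a media-driven search spike:

```python
# Toy illustration of how a publicity-driven search spike biases a
# search-volume flu estimator. All numbers are illustrative.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Training weeks: search volume tracks the true illness rate closely.
search = [10, 20, 30, 40, 50]
illness = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = fit_line(search, illness)

# A news blitz and a flu-shot-locator launch double the searches
# without any change in actual illness.
spike_search, true_illness = 100, 5.0
estimate = a + b * spike_search
print(round(estimate, 2))  # 10.0 -- double the true rate of 5.0
```

The estimator is faithfully measuring what the comment says it measures: people thinking about the flu, not people who have it.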

You don't get free flu shots in the US? I'd be curious to see a cost/benefit analysis - but then I suppose when hospital rooms cost the patient money there's little motivation for the government to try to keep you out of them.

In my state, the death rate from influenza is about 1.3 per hundred thousand, which just happens to be the same as our homicide rate.

The thing I wonder about is if the CDC is accurately estimating the number of people who Google and decide, "yup, I've got the flu, I've got no money for a doctor's visit, no insurance, and certainly no money for anti-virals" and those cases never make it into any surveillance systems. Are they accounting for the real unemployment rate wh

Of course, the sensational news story of this past winter was the rampant outbreak of "flu," which has suddenly become one of the biggest health scares the world has ever seen.

Google needs a sensational-hyperbole filter on their Internet scrapes, something to blow past the rampant proliferation of "news" based not on fact or reality but reported only to drive web hits or broadcasts, which has become commonplace these days. Some reporter goes to the ER of a hospital, sees a room packed with sniffling, coughing

Personally I find the mania detector quite useful, and hope it can be used to expose other mass-illusions such as hybrid cars being positive for the environment and guns in school being a good thing. I know I won't need a flu shot, but I want to know how many crazy people I should prepare to disbelieve and avoid any given day.

Google Flu has never been used to officially declare a flu outbreak. It's a neat tool, and it has been successfully used in retrospective studies, but until it actually helps us prepare for a flu outbreak in ways above and beyond what traditional surveillance already does, it will continue to just be a neat tool and not a useful one. The same goes for the Twitter flu prediction models. These tools are cool, but unless people actually do things differently to prepare for an outbreak based on their predict