Posts categorized "Food and Drink"

Some behind-the-scenes comments on my recent article on New York's restaurant inspection grades; it appeared on FiveThirtyEight this Tuesday.

***

The Nature of Ratings

This article is about the rating of things. I devoted a considerable number of pages to this topic in Numbersense (link) - Chapter 1 is all about the US News ranking of schools. A few key points are:

All rating schemes are completely subjective.

There is no "correct" rating scheme, therefore no one can prove that their rating scheme is better than someone else's rating scheme.

A good rating scheme is one that has popular acceptance. If people don't trust a rating scheme, it won't be used. (This is a variant of George Box's quote: "all models are wrong but some are useful".)

Think of a rating scheme as a way to impose a structure on unwieldy data. It represents a point of view.

All rating schemes will be gamed to death, assuming the formulae are made public.

Based on that, you can expect that my goal in writing the 538 article is not to praise or damn the city's health rating scheme. My intention is to describe how the rating scheme works based on the outcomes. I want to give readers information to judge whether they like the rating scheme or not.

OCCAM Data

The restaurant grade dataset is an example of OCCAM data. It is Observational, it has no Controls, it has seemingly all the data (i.e. Complete), it will be Adapted for other uses and will be Merged with other data sets to generate "insights". In my article, I did not do A or M.

Hidden Biases in Observational Data

Each month (or week; I don't recall the exact cadence), the department posts a dataset on the Open Data website. Only one dataset is available at a time; the most recent copy replaces the previous one. The size of the dataset therefore grows over time.

Anyone who analyzes grade data up to the most recent few months is in for a nasty surprise. As the chart on the right shows, the proportion of grades that are not A, B or C (labeled O, shown in gray) spikes to roughly ten times its normal level in the last two months. This chart is from an August dataset, and it is not an anomaly; it is an accurate description of the ongoing reality.

If a restaurant is given a B or C on its initial inspection, it has the right to go through a reinspection and arbitration process. During this time, the restaurant is allowed to display a "Grade Pending" sign. It appears that it can take up to four months for most B- or C-graded restaurants to finish this process. Over this period, many of the pending grades will flip to an A, B or C. The chance that they end up with a B or C is much higher than for the average restaurant (i.e., one not known to have a pending grade).

Indeed, the proportion of As in the most recent two months is vastly biased upwards as a result of the lengthy reinspection process.

For this reason, I removed the last two months from my analysis.
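For readers who want to reproduce this kind of filter, here is a minimal pandas sketch (not my actual code); the file name and the GRADE and GRADE DATE column names are assumptions about how the Open Data file happens to be laid out.

```python
# A minimal sketch of dropping the most recent two months before
# analyzing grade proportions. Column names are assumptions.
import pandas as pd

df = pd.read_csv("inspections.csv", parse_dates=["GRADE DATE"])

cutoff = df["GRADE DATE"].max() - pd.DateOffset(months=2)
stable = df[df["GRADE DATE"] <= cutoff]   # grades old enough to have settled
recent = df[df["GRADE DATE"] > cutoff]    # still dominated by "Grade Pending"

# Compare the grade mix in the two windows; the recent window will look
# artificially A-heavy because pending B/C grades have not resolved yet.
print(stable["GRADE"].value_counts(normalize=True))
print(recent["GRADE"].value_counts(normalize=True))
```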

How might this bias affect your analysis?

If you drop all Pending grades from your analysis (while retaining the A, B, and C grades), you have created an artificial trend in the last two months.

If you keep the last available grade for each restaurant, you have not escaped the problem at all. In fact, you introduce yet another complication: B- and C-graded restaurants have older inspection dates than the A-graded restaurants. Meanwhile, those Pending grades are still dropped.

If you automatically port this data to a mapping tool, or something similar, you are displaying the biased data, and unknowing users are misled. In fact, the visualization can no longer be interpreted.

IMPORTANT NOTE: The data is NOT WRONG. Data cleaning/pre-processing does not just mean finding bad data. Much of what statisticians do when they explore data is to identify biases or other tricky features.

The Nature of Statistical Analysis

[Captain Hindsight here.] Of course, I didn't know or guess that the Grade Pending bias would be a problem. I did the first analysis of the data using a July dataset, and by the time I was drafting the article for FiveThirtyEight, it was already August, so I "refreshed" the analysis with the latest dataset. That's when I noticed discrepancies that led me to this discovery.

This is the norm in statistical analysis. Every time you sit down to write something up, you notice additional nuances or nits. Sometimes the problem is severe enough that you have to re-run everything. Other times, you decide to gloss over it and move on.

As others binge-watch Netflix shows, I binge-read Gelman posts, while riding a train with no wifi and a dying laptop battery. (This entry was written two weeks ago.)

Andrew Gelman is statistics’ most prolific blogger. Gelman-binging has become a necessity since I have not managed to keep up with his accelerated posting schedule. Earlier this year, he began publishing previews of future posts, one week in advance, and one month in advance.

Also, I have been stubbornly waiting for the developers of my former favorite RSS reader to work out an endless parade of the most elementary bugs, after they launched a new site in response to Google Reader shutting down. Not having settled on a new RSS tool has definitely shrunk the volume of my reading.

I only managed to go through about a week’s worth of posts because the recent pieces interest me a lot.

Gelman links to Lior Pachter's review of what he calls "quite possibly the worst paper I've read all year".

This bit deserves further mocking: when the researchers fail to achieve conventional 5% significance, they draw conclusions based on a "trend towards significance". This sleight of hand happens frequently in practice as well, where the phrase "directional result" is used.

When an observed effect, as in this case, is not statistically significant, the implication is that the signal is not large enough to distinguish from background noise. When the researcher then says “but I still see a signal”, said researcher is ignoring the uncertainty around the point estimate, pretending that the noise doesn’t exist. The researcher is in effect making a decision using the point estimate alone. Anyone who has taken Stats 101 should know not to rely on a point estimate while ignoring its uncertainty.

One great tenet of statistical thinking is the recognition that the observed data sample is merely one of many possible things that could have happened. The confidence interval is an attempt to capture the range of possibilities, and the much-maligned tests of significance represent an attempt to reduce such analysis to a single statistic. They achieve simplicity at the expense of nuance.
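To make the point concrete, here is a small simulated illustration (all numbers invented) of why a noisy point estimate should not be read as a "directional result":

```python
# Illustrative only: a tiny effect buried in noise. The numbers are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treated = rng.normal(0.3, 2.0, size=40)   # hypothetical small effect of 0.3
control = rng.normal(0.0, 2.0, size=40)

t, p = stats.ttest_ind(treated, control)
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 40 + control.var(ddof=1) / 40)
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"difference: {diff:.2f}, 95% CI: ({low:.2f}, {high:.2f}), p-value: {p:.2f}")
# If the interval straddles zero, calling the sign of `diff` a "directional
# result" is exactly the sleight of hand described above: it ignores the noise.
```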

This cannabis study is also a great example of what I’ve been calling “causation creep”. The authors are well aware that they have merely found an instance of correlation (arguably not even that, but let's grant it for the sake of argument), but when they start narrating their finding, they cannot help but use causal language.

The title of the paper is "Cannabis use is quantitatively associated with...", and yet the lead author told USA Today: "Just casual use appears to create changes in the brain in areas you don't want to change."

Causation creep is actually endemic in academic publishing of observational studies, and I don't want to single these authors out.

Gelman has been on this one for a while. The offending paper looked at the correlation between hurricane damage and the gender of the names we give these hurricanes. I didn’t find it worth my time to study this line of research, but I’m assuming the problem is considered interesting because the authors claim to have found a “natural experiment”, in that gender is effectively “randomly assigned” to hurricanes as they appear.

I have been quite irritated over the years by this type of research, encouraged by the fad of Freakonomics. Even if they did find a natural experiment, what is that experiment about? Instead of spending research hours on correlating damage with naming conventions, why not spend the precious time looking for real causes of hurricane damage? You know, like weather patterns, currents, physical phenomena, human-induced climate changes, human decisions to live in high-risk areas, etc.?

I should note that much of Steven Levitt’s original work that launched this field deals with real problems, like crime rates. It’s just that many of his followers have gone astray.

Matt Novak debunks an article in Vox which repeats the assertion by the tech industry that new technologies have been adopted much more quickly in recent years than in the past. Vox is not the only place where you see this assertion. We have all seen variations of the chart shown on the right.

Novak puts on a statistician's hat and asks how the data came about. This type of chart is particularly prone to errors since many different studies across different eras are needed.

What Novak found: the start date for older technologies (like TV and radio) was defined by their invention in the laboratory, while recent technologies (such as the Internet and mobile phones) were dated from their commercialization. Needless to say, adoption is expected to be slow when a technology is not yet available to consumers!

Needless to say, anyone who cites this chart or its conclusion from here on out should be publicly shamed.

Gelman nicely distills one of the central messages in my Numbersense book (Get it here). All data analyses require assumptions; assumptions are subjective; making assumptions is not a sin; clarifying one’s assumptions and vigorously testing them is what makes good analyses. Go read this post.

Gelman was surprised by a recent paper in which the researchers found that 42% of their sample purchased detergent on their most recent trip to the store. This reminds me of the section of Numbersense (Get it here) in which I described a study in which some marketing professors had mystery shoppers track people in a supermarket and within seconds of them placing groceries in their trolley, asked them how much the items cost. The error rate was quite shocking.

There is another big problem with this research design. People's memory of what they purchased depends on how long ago that "most recent" trip was. I also wonder how online purchasing affects this sort of study as I typically don't count going to a website as "a trip to the supermarket". It seems like some sort of prequalification is needed but prequalification always restricts the generalizability of any finding.

Andrew gently mocks both of these commonly used procedures. The discussion of outlier detection is buried in the comments section, so if you are interested, you should scroll below the fold. Gelman’s annoyance with outlier detection is semantic: but important semantics, which align with my own practice. Like Gelman, I don't consider just any extreme value to be an outlier.

Stepwise regression is a suboptimal procedure, and Gelman prefers modern techniques like the lasso. But lots of practitioners use stepwise because the procedure is “intuitive”, that is to say, one can explain it to a non-technical audience without eyes rolling. The discussion below the post is worth reading.
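For the curious, here is a minimal scikit-learn sketch (my own toy example, not anything from Gelman's post) of letting the lasso do the variable selection instead of a stepwise search:

```python
# Toy illustration of lasso-based variable selection on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 50 candidate predictors, only 5 of which actually matter.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"lasso kept {selected.size} of 50 predictors: {selected}")
# Unlike stepwise, the shrinkage penalty does the selection in one pass,
# with the penalty strength chosen by cross-validation.
```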

I will be speaking at the Agilone Data Driven Marketing Summit (link) in San Francisco on Thursday. I will be talking about hiring for numbersense. Drop by if you are in the area. Future events are listed in the right column of the blog.

***

I feel bad piling on the "good guys" in the sports doping spectacle but sometimes, you need someone to point you to the mirror.

Here are the breathtaking first sentences from an article in Canada's The Globe and Mail about the scarcity of positive doping results in Sochi 2014:

At the midpoint of the Sochi Games, not yet marred by a single case of doping, the IOC’s top medical official said its efforts to catch drug cheats were so successful they had scared them all away.

A week later, after the disclosure of a fifth doping case on the final day of the games, IOC president Thomas Bach cited the positive tests as the sign of success.

If you have been reading this blog, you already know the people in the anti-doping business set themselves a really low bar. The title of Chapter 4 of Numbers Rule Your World (link) contains the phrase "timid testers" for a reason.

The statement by the unnamed "top medical official" is the more shocking of the two. If there are no positive test results, and this is considered an accurate portrayal of the doping situation, then we must believe that there are no dopers. Apparently, this official believes that no tested athlete doped. Not a single one.

“The number of the cases for me is not really relevant,” Bach said. “What is important is that we see the system works.”

Now, it's Bach's turn to display his ignorance of the statistics of anti-doping. As I explained years ago in the book and also on this blog, the proportion of tests that come back positive is one of the most important numbers to look at when judging the success of an anti-doping program. So far, we know that six out of 2,630 athletes tested positive, meaning the rate of testing positive is 0.23%. (A rate of much less than 1 percent is the norm at all large international events.)

What does that mean? If one percent of athletes doped, then we should expect about 26 positives if the tests were 100% accurate. Since they only caught six, at least 20 of the 26 dopers passed the test. Yes, that means roughly three out of four dopers passed. (And that's assuming only one percent doping, and not allowing for the possibility of false positives.)
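The back-of-envelope arithmetic, spelled out:

```python
# Arithmetic from the paragraph above; the 1% doping rate is an assumption.
tested = 2630
positives = 6
print(positives / tested)                        # ~0.0023, i.e. 0.23% test positive

assumed_doping_rate = 0.01                       # assumption: 1% of athletes doped
expected_dopers = assumed_doping_rate * tested   # ~26
missed = expected_dopers - positives             # ~20
print(missed / expected_dopers)                  # ~0.77: roughly 3 in 4 dopers passed
```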

***

This leads me to the as-yet unrecognized scandal. Lance Armstrong, Ryan Braun, Mark McGwire, Alex Rodriguez, etc., etc. None of these confirmed dopers was caught by steroid tests. In fact, all of them boasted at one point or another that a long string of negative test findings proved they were innocent.

Rather than gloating about the "success" of anti-doping measures, they should try explaining how the most notorious dopers in sports were repeatedly given a clean bill of health.

I am a supporter of anti-doping. I just want some discussion of the false negative problem.

Here we go again. Another useless study published in a peer-reviewed journal (Mayo Clinic Proceedings) with a relatively high impact factor and promoted as "Breaking News from the Editor" to the press, which then attached sensational headlines and reported it as "science".

This is what caught my eye: "Drinking more than 28 cups of coffee a week may be harmful for people younger than 55, according to a study." I saw this on USA Today (link)... but many other outlets also carried the story, including NPR, CBS, AJC, Guardian, etc.

Just the headline should sound alarm bells. Why 28 cups a week? Is it ok to have 27.5 cups a week? Why 55 years old? On your 55th birthday, should you stop worrying about the number of cups?

Of course, even that speculation betrays a causation creep. Nothing in this study, nor any study of this type, can prove a causal link between the accused food and harm.

Now let's check out the summary by the authors:

Conclusion: In this large cohort, a positive association between coffee consumption and all-cause mortality was observed in men and in men and women younger than 55 years. On the basis of these findings, it seems appropriate to suggest that younger people avoid heavy coffee consumption (ie, averaging >4 cups per day). However, this finding should be assessed in future studies of other populations.

Note the construction: "a positive association between .. and .. was observed ... on the basis of these findings, it seems appropriate to suggest that". All those weasel words. What they really mean is: "we have data showing a positive association, and we assume that correlation = causation, therefore you should ..."

Do you think you'll learn anything in the journal paper about the biological or chemical mechanism by which coffee causes death? Take a guess.

***

USA Today commits the other sin of health reporting: failure to explain the level of harm, and the context for interpreting it. USA Today tells its readers:

Men younger than 55 who drank more than 28 cups of coffee a week (four cups a day) were 56% more likely to have died from any cause.

56% compared to what? Turns out it's compared to men younger than 55 who do not drink coffee at all. There is a wide gap between drinking over 28 cups a week and drinking zero.

What's also missing is the error bar. According to the paper, the 95% interval is 30% to 87%. Not kidding: that's roughly 25 to 30 percentage points in each direction.

Absent is the context for understanding what 56% means. How many additional deaths for every 10,000 heavy coffee drinkers? Amusingly, you can't figure this out even after reading the entire paper. The authors got away with presenting data in aggregate (33,900 males, 2,198 male deaths, etc.) without showing age-group breakouts. (Where were the editors??) Stymied, I glanced at their other result, the one for all men.

In men, those who drank more than 28 cups of coffee weekly had a 21% higher risk of dying compared with their non-coffee-consuming peers.

By the way, the error bar on this result is 4% to 40%. Now, I can't interpret this result either. The baseline death rate in the study was 6.48%. Nowhere in the paper does it break out the number of deaths by the level of coffee drinking. There is no way to know how many of those 2,198 male deaths were men who did not drink coffee at all.
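For illustration only, here is what that missing calculation would look like if we (incorrectly) used the overall 6.48% death rate as the baseline for non-coffee-drinkers; the paper does not supply the number that is actually needed.

```python
# Illustrative arithmetic only: the paper does not report the death rate
# among non-coffee-drinkers, so the overall rate is used as a stand-in
# purely to show what a "21% higher risk" would translate into.
baseline_rate = 0.0648          # overall male death rate reported in the paper
relative_increase = 0.21        # heavy drinkers vs. non-drinkers (all men)

heavy_rate = baseline_rate * (1 + relative_increase)
extra_per_10k = (heavy_rate - baseline_rate) * 10_000
print(f"~{extra_per_10k:.0f} extra deaths per 10,000 heavy drinkers (illustrative)")
```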

While the important data on the outcomes being analyzed are not published, the authors of the paper regale us with numbers such as the N=11 people who were excluded from the population because they had a history of stroke.

So I ask again: where were the editors? How did they miss this?

***

Readers should also check out Andrew Gelman's recent rant about the excessive reliance on statistical significance, and the neglect of data quality in journal editing (link). It's not that hard to draw up a list of data required for publication.

***

I may write another post about the other issues with the analysis. But here is the most important one in case I don't get to it. This is the chart that supports their finding about men under the age of 55 who drink more than 28 cups of coffee a week. They want us to look at the blue bar on the far right of the chart.

Last week, I had dinner with my friend Cesare, who is a medical researcher. That was the first time I heard of the marketing scam known as "2% milk" or "1% milk" or "'fat free' milk".

Sure, I know what 2% milk is. I've drunk it before. But I didn't really know what the 2% meant. You see, I was comparing it to whole milk, thinking it contained something like 50 times less fat. And that is wrong.

According to Google/USDA, whole milk has 3.25% fat, and so the 2% or 1% is not as big a difference as you might think.

The good doctor also advises that the 2% or 3.25% does not tell us how much fat we have ingested - it's just a proportion of some amount of milk, which contains other things besides fat. The proper measure is calories. How many calories do we consume when we drink one cup of milk?

According to Google/USDA, a cup of whole milk has 148 calories, and a cup of 2%? 124 calories. So the difference between whole and 2% is 24 calories. In other words, an accurate description of 2% milk is that it has about 16% fewer calories than whole milk. (It doesn't matter that not all of the calories come from fat.)
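The milk arithmetic, spelled out:

```python
# Calorie figures per cup from Google/USDA, as quoted above.
whole, reduced = 148, 124
saving = whole - reduced             # 24 calories per cup
print(saving, saving / whole)        # 24 calories, ~16% fewer than whole milk
```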

Since 24 calories is a really small number, don't feel guilty about drinking that whole milk! Especially if, like me, you only use a little milk in your coffee.

(Warning: The FDA probably does not approve of this message.)

***

Chapter 2 of Numbersense (link) digs into the "science" behind the obesity crisis. I look into the trendy assertion that obesity treatment is failing because we are not measuring obesity properly.

Over at the McGraw-Hill blog, I wrote about how to consume Big Data (link), which is the core theme of my new book. In that piece, I highlight two recent instances in which bloggers demonstrated numbersense in vetting other people's data analyses. (Since the McGraw-Hill link is not working as I'm writing this, I placed a copy of the post here in case you need it.)

Below is a detailed dissection of Zoë Harcombe's work.

***

Eating red meat makes us die sooner! Zoë Harcombe didn’t think so.

In March, 2013, nutritional epidemiologists from Harvard University circulated new research linking red meat consumption with increased risk of death. All major mass media outlets ran the story, with headlines such as “Risks: More Red Meat, More Mortality.” (link) This high-class treatment is typical, given Harvard’s brand, the reputation of the research team, and the pending publication in a peer-reviewed journal. Readers are told that the finding came from large studies with hundreds of thousands of subjects, and that the researchers “controlled for” other potential causes of death.

Zoë Harcombe, an author of books on obesity, was one of the readers who did not buy the story. She heard that noise in her head when she reviewed the Harvard study. In a blog post, titled “Red meat & Mortality & the Usual Bad Science,” (link) Harcombe outlined how she determined the research was junk science.

How did Harcombe do this?

Alarm bells rang in her head because she had seen similar studies in which researchers commit what I call “causation creep.” (link)

She then reviewed the two studies used by the Harvard researchers, looking especially for the precise definition of meat consumption, the key explanatory variable. She discovered that the data came from dietary questionnaires administered every four years (this meant subjects who didn’t answer this question would have been dropped from the analysis). All subjects were divided into five equal-sized groups (quintiles) based on the amount of red meat consumed. Surprisingly, “unprocessed red meat” included pork, hamburgers, beef wraps, lamb curry and so on. This part was mostly box-checking; it didn’t reveal anything too worrisome.

Harcombe suspected that the Harvard study does not prove causation, but she needed more than just a hunch. She found plenty of ammunition in Table 1 of the paper. There, she learned that the cohort of people who report eating more red meat also report higher levels of unhealthy behaviors, including more smoking, more drinking, and less exercise.

The researchers argue that their multivariate regression analysis “controlled for” these other known factors. But Harcombe understands that when effects are confounded, it is almost impossible to disentangle them. For instance, if you're comparing two school districts, one in a really rich neighborhood and the other in a poor neighborhood, then race and income will be confounded, and there is no way to know whether the difference in educational outcomes is due to income or to race.

Next, Harcombe looked for data to help her interpret the researchers’ central claim:

Unprocessed and processed red meat intakes were associated with an increased risk of total, CVD, and cancer mortality in men and women in the age-adjusted and fully adjusted models. When treating red meat intake as a continuous variable, the elevated risk of total mortality in the pooled analysis for a 1-serving-per-day increase was 12% for total red meat, 13% for unprocessed red meat, and 20% for processed red meat.

Her first inquiry was about the baseline mortality rate, which was 0.81%. Twenty percent of that is 0.16%, so roughly speaking, if you decide to take an extra serving of processed red meat every day, you face a less-than-2-in-1,000 chance of earlier death. (Whether the earlier death is due to the red meat or just to more food consumed each day is another instance of confounding.)
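Spelling out that back-of-envelope calculation:

```python
# Arithmetic from the paragraph above.
baseline_mortality = 0.0081      # 0.81% baseline death rate in the study
relative_increase = 0.20         # 20% higher risk per extra daily serving (processed)

absolute_increase = baseline_mortality * relative_increase
print(absolute_increase)         # ~0.0016, i.e. less than 2 in 1,000
```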

This also raises the issue of error bars. As Gary Taubes explained in his response to the red-meat study (link), serious epidemiologists only pay attention to effects of 300% or higher, acknowledging the limitations of the types of data being analyzed. A 12- or 20-percent effect does not inspire much confidence.

The researchers are overly confident in the statistical models used to analyze the data, Harcombe soon learned. She was able to find the raw data, allowing her to compare them with the statistically adjusted figures. Here is one of her calculations.

The five columns represent quintiles of red meat consumption from lowest (Q1) to highest (Q5). The last row (“Multivariate”) shows the adjusted death rates with Q1 set to 1.00. The row labelled “Death Rate (Z)” is a simple calculation performed by Harcombe, without adjustment. The key insight is that Harcombe’s line is U-shaped while the multivariate line is monotonically increasing.
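Since I cannot reproduce her table here, the sketch below uses invented numbers purely to show the kind of comparison she made: raw death rates by quintile versus the model-adjusted ratios.

```python
# Hypothetical numbers only (not Harcombe's actual data), to illustrate
# comparing raw quintile death rates against adjusted relative risks.
import numpy as np

deaths  = np.array([210, 180, 175, 190, 230])       # made-up deaths per quintile
persons = np.array([20000] * 5)                      # made-up quintile sizes

raw_rate = deaths / persons
raw_ratio = raw_rate / raw_rate[0]                   # unadjusted, Q1 set to 1.00
adjusted = np.array([1.00, 1.05, 1.10, 1.15, 1.20])  # hypothetical "Multivariate" row

print(raw_ratio)   # U-shaped in this illustration
print(adjusted)    # monotonically increasing: the gap is where the model takes over
```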

The purpose of this analysis is not to debunk the research. What Harcombe did here is to delineate where the data end, and where the model assumptions take over. One of the themes in Numbersense is that every analysis combines data with theory. Knowing which is which is half the battle.

At the end of Harcombe’s piece, she checked the incentives of the researchers.

Harcombe did really impressive work here, and her blog post is highly instructive on how to analyze a data analysis. Chapter 2 of Numbersense looks at the quality of data analyses of the obesity crisis.

When you hear about Big Data, you almost always hear about the supply side: Behold the data in un-pronounceable units of bytes! Admire the new science inspired by all the data! Missing from this narrative is the consumption side. A direct consequence of Big Data will be the explosion of data analyses: there will be more people producing more data analyses more quickly. This will be a world of confusing and contradictory findings.

In my new book, Numbersense, I argue that the ability to analyze and interpret these data analyses will give one a competitive edge in this world of Big Data.

Numbersense is the noise you hear in your head when you see bad data or bad analysis. After years of managing teams of data analysts, I’ve learned that what distinguishes the best from the merely good is not math degrees or computer skills; it is numbersense.

Numbersense is an intangible quality that you can’t teach in a classroom. The best way to pick it up is by learning from people who have it. For this blog post, I selected two great analyses of data analyses that have impressed me recently. These are highly instructive examples.

***

Eating red meat makes us die sooner! Zoë Harcombe didn’t think so.

In March, 2013, nutritional epidemiologists from Harvard University circulated new research linking red meat consumption with increased risk of death. All major mass media outlets ran the story, with headlines such as “Risks: More Red Meat, More Mortality.” (link) This high-class treatment is typical, given Harvard’s brand, the reputation of the research team, and the pending publication in a peer-reviewed journal. Readers are told that the finding came from large studies with hundreds of thousands of subjects, and that the researchers “controlled for” other potential causes of death.

Zoë Harcombe, an author of books on obesity, was one of the readers who did not buy the story. She heard that noise in her head when she reviewed the Harvard study. In a blog post, titled “Red meat & Mortality & the Usual Bad Science,” (link) Harcombe outlined how she determined the research was junk science.

She knows this type of research methodology rarely if ever delivers conclusive evidence of causation. Then, she found support from a data table included in the research paper. The table shows that the cohort of people who report eating more red meat also report higher levels of unhealthy behaviors, including more smoking, more drinking, and less exercise. Thus, the increased risk of death observed in the study could have been explained by factors other than red meat consumption.

For a full dissection of Harcombe’s amazing post, please click here. Chapter 2 of Numbersense looks at the quality of data analyses of the obesity crisis.

In February, 2013, Netflix, ever the media darling, premiered House of Cards, a re-make of the successful British television show and its second foray into producing original content for its tens of millions of subscribers. Netflix executives regaled the press with stories of how Big Data analysis took the risk out of their $100 million decision.

Andrew Leonard, the technology reporter for Salon.com, gobbled up the Netflix story, even interpreting it as a “symptom of a society-wide shift.” (link) Like other news analysts, Leonard was convinced by the “pure geek wizardry” used to analyze mountains of data collected from Netflix customers. The machine, we’re told, decided that David Fincher should be the director and Kevin Spacey, the star. From here, it is a short trip to the la-la land of viewers as puppets with machines as their overlord.

This analysis aroused the skeptic in Felix Salmon, the finance blogger for Reuters. In his blog post, “Why the Quants Won’t Take Over Hollywood,” (link) Salmon raised other factors that affect the box office, including billions spent on marketing and publicity, the quality of the writing, the sociopolitical climate, the complex relationship between originals and remakes, and the poor track record of predictive modeling in Hollywood. On this last point, Salmon exhibits a keen sense of the limitations of science, speaking of an “impossible-to-formulate cocktail of creativity, inspiration, teamwork, and luck.”

Chapters 4 and 5 of Numbersense explain how you should judge predictive models used by marketers.

***

When their respective blog posts surfaced, Harcombe and Salmon were lone voices carefully vetting claims based on other people’s data analyses. Their well-honed numbersense allowed them to stand firm in the face of mountains of data, worship of high science, formidable-sounding technical jargon, and academic reputations. The problems with the original research are far from obvious. The point is not to debunk these studies—no data analysis is ever infallible—but to figure out for yourself what is credible, and what is junk.

It strikes me that in medicine, we are stuck with simplistic models - models that use one variable only, and are linear in the response. In short, we are told X results in Y, and the more X, the more Y. Real life often does not cooperate, but many people in medical research hold on to their models for dear life.

Exhibit 1 is the disappearing of inconvenient data in order to reject the non-linear relationship between BMI and mortality, which I discussed here. There are many myths associated with the obesity epidemic. It was a surprise for me to learn how inefficient physical exercise is for weight loss.

Exhibit 2 is the case of vitamin overload, as described in this NYTimes column. Here are the juicy bits:

In December 1972, concerned that people were consuming larger and larger quantities of vitamins, the F.D.A. announced a plan to regulate vitamin supplements containing more than 150 percent of the recommended daily allowance. Vitamin makers would now have to prove that these “megavitamins” were safe before selling them. Not surprisingly, the vitamin industry saw this as a threat...

Industry executives recruited William Proxmire, a Democratic senator from Wisconsin, to introduce a bill preventing the F.D.A. from regulating megavitamins.

A little more than a month later, Mr. Proxmire’s bill passed by a vote of 81 to 10. In 1976, it became law. Decades later, Peter Barton Hutt, chief counsel to the F.D.A., wrote that “it was the most humiliating defeat” in the agency’s history.

The studies cited in the article show that too much of a good thing kills.

Exhibit 3 is the demonization of salt. It's completely ingrained in our brains that salt is bad for us. However, the evidence is decidedly mixed. The New York Times ran a summary recently (link). I remember reading, years ago, an article by the late David Freedman that came to the same conclusion. (D.A. Freedman and D.B. Petitti. “Salt and blood pressure: Conventional wisdom reconsidered.” 2001.)

A key sentence in the NYT article is this:

Until about 2006, almost all studies on salt and health outcomes relied on the well-known fact that blood pressure can drop slightly when people eat less salt. From that, and from other studies linking blood pressure to risks of heart attacks and strokes, researchers created models showing how many lives could be saved if people ate less salt.

This is also typical of medical studies: the assumption of transitivity between studies. Anyone trained in statistics would have left transitivity at the door. When each number is not precisely estimated but has an error bar around it, transitivity is not automatic. Notice that the data did not prove the health benefit of less salt; it's the models that assume such benefits.
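A toy simulation (all numbers invented) shows why chaining two imprecise estimates is not automatic: the uncertainty of the combined claim is wider, relative to its size, than either input.

```python
# Invented numbers only, to illustrate how error bars compound when two
# study results are chained together (salt -> blood pressure -> risk).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# "Study 1": effect of less salt on blood pressure, 2.0 units +/- 1.5 (sd)
salt_to_bp = rng.normal(2.0, 1.5, n)
# "Study 2": effect per blood-pressure unit on heart-attack risk, 0.5% +/- 0.4%
bp_to_risk = rng.normal(0.005, 0.004, n)

combined = salt_to_bp * bp_to_risk   # the modeled benefit of eating less salt
print(np.percentile(combined, [2.5, 50, 97.5]))
# The combined effect is far more uncertain relative to its size than either
# input, and its range can even include harm (negative values).
```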

In my new book, I have a chapter on interpreting the statistics of obesity. Andrew Sullivan (link) recently pointed to a Nature article discussing an aspect of the controversy around these numbers.

The bone of contention is the shape of the mortality curve. It has been thought that the curve is monotonic increasing, meaning that the higher your BMI, the higher the mortality rate. But survey data in the U.S. now show that the curve is probably U-shaped: mortality rates are high for both obese and thin people. Overweight (less than obese) people paradoxically seemed to live longer than those with "normal" weight. This last observation has driven some people nuts.

The article focused on two Harvard researchers who organized a conference specifically to attack a CDC paper demonstrating the U-shaped curve. This is the crux of their argument:

When the researchers excluded women who had ever smoked and those who died during the first four years of the study (reasoning that these women may have had disease-related weight loss), they found a direct linear relationship between BMI and death, with the lowest mortality at BMIs below 19.

Excluding portions of a sample from analysis is a dangerous game, and should be heavily discouraged. It's one thing to adjust the data; it's another thing to remove data completely. Notice that what was removed weren't outliers, that is, data that might be incorrect and so extreme as to dominate the outcomes. They removed data specifically to conform to their model of the world.

First, they removed smokers because "smokers tend to be leaner and die earlier than non-smokers". The implication is that smokers who die earlier sit on the thin side of the curve; removing them has the effect of straightening the curve.

The second cut is even more egregious. How can there be any justification for removing people who died during the first four years when the study's primary metric is death rate? The researchers claimed reverse causality, i.e., that pre-existing illness caused both the low weight and the early deaths.

The most important reason why you should never drop large chunks of data in a systematic way is that your conclusions are now limited to the group that hasn't been dropped. Since there are no smokers in your sample, you cannot make a statement that applies to the general population. And yet, these researchers seem to have done so.

***

Later on in the article, the journalist repeats the nonsense about how using BMI is a problem. I have previously written about this topic here.

On a related note, a visiting professor at NYU has been making the news, having made insulting comments about "fat PhD applicants". Somehow, the field of evolutionary psychology has attracted many crazies.

Imagine you are locked up in a hospital room with a bed and plenty of food. What do you think you'd be doing when you aren't sleeping?

***

Well, some researchers (link) discovered that you'd be eating the food. What a surprise.

The New York Times then saw fit to report this result as "Lost Sleep Can Lead to Weight Gain", which isn't as bad as the lead given on the front page of NYTimes.com: "To Put on Pounds, Just Sleep Less".

Of course, the researchers never proved that sleep deprivation is a cause of obesity. In the original press release, the lead scientist explicitly noted: "Just getting less sleep, by itself, is not going to lead to weight gain. But when people get insufficient sleep, it leads them to eat more than they actually need."

It's a scandal that the prestigious paper would write a headline that directly contradicts what the researcher said.

***

I also have a problem with the link between insufficient sleep and eating more. Here is their description of the experiment:

For the study, researchers monitored 16 young, lean, healthy adults who lived for about two weeks at the University of Colorado Hospital, which is equipped with a “sleep suite” for controlling sleep opportunities — by providing a quiet environment and by regulating when the lights are on and off — and a sealed room that allows researchers to measure how much energy participants are using based on the amount of oxygen they breathe in and the amount of carbon dioxide they breathe out.

All participants spent the first three days with the opportunity to sleep nine hours a night and eating meals that were controlled to give participants only the calories they needed to maintain their weight in order to establish baseline measurements. But after the first few days, the participants were split into two groups: one that spent five days with only five hours to sleep in and one that spent five days with nine hours of sleep opportunity. In both groups, participants were offered larger meals and had access to snack options throughout the day ranging from fruit and yogurt to ice cream and potato chips. After the five-day period, the groups switched.

The participants who were forced to sleep less spent more time eating.

***

The NYT column actually described another study which I think holds more promise. It tries to connect sleep deprivation with fat cell biology. The sample size is really tiny, and a lot more work is needed.