Posts categorized "Story time"

James Kwak (at Baseline Scenario) talks about the link between CEO pay and business performance here. He cites results from a paper by Cazier and McInnis showing that "people from successful companies don’t deserve the pay premium because the higher the premium they are able to command, the less well they are likely to do." So far so good.

Then James launches into the following:

This should not be too surprising. The more of a superstar someone is at Company A, the more likely the board of Company B is to overlook all the things that make her a bad fit for Company B—like not having experience in the industry, or with the new company’s customer base, or having led Company A through a different phase of its lifecycle than Company B, or not having the skills that Company B needs at that point in time, or any number of other things. The more reasons for concern that Board B overlooks, the more likely the new hire is to do badly. In the end, you get something vaguely like the Peter Principle: the more successful Company A is, the more market power its CEO has, and the more likely she is to be overpaid to be CEO of a company she is not qualified to lead.

This paragraph contains a dozen assertions that have no support from the Cazier-McInnis article. These may be interesting ideas, but they are pure speculation on James's part. The trouble with such ex-post explaining is the narrative fallacy that Nassim Taleb likes to talk about. Had Cazier and McInnis discovered instead that CEO pay is justified by future business performance, one could have come up with another dozen assertions to "explain" that correlation.

A reader of my blog, Joran E., pointed me to this great article (by Ross Tucker) that covers one of the newer anti-doping measures (the biological passport), which links to this recent NYT article on two Italian cyclists found guilty of doping. While researching this latest development, I came across the latest legal maneuvers in the case of Alberto Contador, the Spanish cyclist and multiple winner of the Tour de France who tested positive after last year's victory and subsequently blamed a contaminated steak (I mentioned his case here last year).

Anti-doping provides a perfect backdrop to revisit all five statistical concepts that form the spine of my book, Numbers Rule Your World.

***

The most potent forms of doping these days are human growth hormone (HGH), EPO, and similar compounds that occur naturally in the body, so labs must seek to separate dopers from people who have "natural highs". By contrast, for compounds that don't occur naturally, such as the clenbuterol that ensnared Contador, even minute amounts can be proof of wrongdoing.

In order to know what level of a compound is "unnatural", statisticians need to establish what is natural. This is the concept behind Chapter 1: we calculate the "average" (natural) value, but focus on examining variations around the average.

Recognizing that the natural value is not uniform across all people, statisticians determine different averages for different "types" of people; the simplest such subgroups would be male/female and age groups. The biological passport takes this idea to the extreme: each individual athlete is tracked over time to establish his or her own average. This puts into practice the concept behind Chapter 3, which is to avoid lumping together things that are different.
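To make these two ideas concrete, here is a minimal sketch in Python of the personal-baseline logic. The readings and the 3-standard-deviation cutoff are hypothetical, and the actual passport relies on more sophisticated adaptive models; the point is simply to estimate each athlete's own average and judge new readings by their variation around it.

```python
import statistics

def flag_reading(history, new_value, z_cutoff=3.0):
    """Flag a new biomarker reading that strays too far from
    the athlete's own historical baseline."""
    baseline = statistics.mean(history)
    sd = statistics.stdev(history)
    z = (new_value - baseline) / sd
    return abs(z) > z_cutoff, z

# Hypothetical hematocrit readings (%) for one athlete over a season
history = [43.1, 42.7, 44.0, 43.5, 42.9, 43.8]

flagged, z = flag_reading(history, 49.5)
print(f"z-score = {z:.1f}, flagged = {flagged}")  # z is about 11.9, flagged = True
```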

The cases of the two Italian cyclists are the first in which athletes have been punished based on evidence from the biological passport. Previously, the enforcers needed a failed drug test or a police bust to convict dopers.

Readers who liked the material in the Conclusion chapter (as related to Chapter 4) on the hematocrit test should definitely read Tucker's article for details on how the biological passport works.

***

The developments in the Contador case are very discouraging: the Spanish cycling federation showed an unwillingness to expose the biggest star of the sport, first by assessing a one-year ban (when the norm is two years), and most recently by overturning even that shortened punishment. USADA, the US anti-doping body, expressed its concern here, and the reversal of the ban is under appeal.

The Spanish authority accepted the Contador camp's explanation of unintentional consumption of tainted beef as the reason for testing positive. Statisticians who believe in the logic of hypothesis testing will find such a conclusion absurd.

Let's walk through how we apply the logic as described in Chapter 5 of Numbers Rule Your World to this situation. Assuming that Contador did not dope, what is the chance that minute amounts of clenbuterol would be found in his body? Unfortunately for Contador and other athletes failing this drug test, the chance is vanishingly small.

Like most accused dopers, his camp did not challenge the presence of clenbuterol; they merely offered an alternative theory for why it was there. A large number of coincidences had to occur for their theory to be believed:

1. Beef had to be taken from Spain into France to serve Contador, and only Contador (not any of his teammates).

2. A different source of beef must have been used on the other days of the Tour on which Contador ate beef (since he tested negative on most other days).

3. He was one unlucky fellow: anti-doping tests have high false negative rates in general, yet he managed to test positive on the one occasion he ate the contaminated beef.

4. He was also extremely unlucky: Europe banned the use of clenbuterol in raising cattle in the 1990s, so the beef he ate on that one occasion had to have come from an unscrupulous farmer violating the ban.

Statisticians would politely listen to all that, and declare "rare is impossible". It's much easier to believe that he was doping. (We would admit that there is a minuscule chance that this conclusion is incorrect -- the chance is precisely that of all those coincidences occurring.)
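Here is a back-of-envelope sketch of that parenthetical. The probability attached to each coincidence is invented purely for illustration, and independence between the events is assumed; the point is that multiplying even generous guesses drives the joint chance into "rare is impossible" territory.

```python
# Invented probabilities for each coincidence in the tainted-beef
# theory; independence between the events is assumed for simplicity.
coincidences = {
    "Spanish beef brought into France, served only to Contador": 0.05,
    "different beef source on his other (negative) test days": 0.10,
    "a positive test despite generally high false negative rates": 0.30,
    "beef from a farmer violating the 1990s EU clenbuterol ban": 0.01,
}

p_all = 1.0
for event, p in coincidences.items():
    p_all *= p

print(f"P(all coincidences occur) = {p_all:.1e}")  # 1.5e-05, about 1 in 67,000
```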

***

Why does the scientific process disintegrate into this sort of he-said-she-said argument?

The concept behind Chapter 2 proves useful here. The statistical model that links the biological passport and/or the drug test to doping is one based on correlation, not causation. The passport or drug test does not provide direct evidence of doping (unlike a police bust). But as I point out in the book, correlational evidence can be powerful, and has been profitably used in all kinds of decisions. Because clenbuterol is not produced naturally in the human body, this test result is very close to causal evidence; it's less secure for things like EPO and HGH.

It's just more complicated when causal evidence is unavailable, because people can then advance all sorts of hypotheses to explain the correlation. We then get story time, a phenomenon I frequently discuss on my blog. I'm happy to hear the stories, but one must seek evidence to support them.

In the Contador case, for example, I'd like to see evidence that the steak was eaten, receipts from the vendor who imported the beef, documentation of which farm raised the cow, inspection of the farm to confirm that it used clenbuterol, traceback of beef from that farm to find the presence of clenbuterol, etc. In none of the reports on this case have I seen any of this evidence, and more disturbingly, the supporters of Contador don't appear to be asking any such questions. (See, for instance, Christian Josi on Huffington Post.)

***

Why would statisticians accept the chance of falsely accusing a clean athlete, however small that chance is? This is because we know that there is no such thing as a perfect test. The only test that will never yield a false positive is the test that never issues any positive results!

We already accept this type of situation in the Western legal system. The criterion of "beyond reasonable doubt" in the courts does not guarantee zero wrongful convictions. In fact, thanks to the work of groups such as the Innocence Project, we know that some unfortunate people are wrongfully convicted, sometimes serving long sentences for grievous crimes they did not commit.

As explained in my book (which I won't repeat here), the real issue in anti-doping is not about false positives but about false negatives. I fear that the entire system is so lenient toward dopers that they would take the (small) risk of detection. I'll make a case for this in a future post.
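To see the false positive / false negative trade-off concretely, here is a minimal simulation with invented biomarker distributions for clean athletes and for dopers (the numbers are arbitrary). The only way to drive false positives to zero is to raise the cutoff until the test hardly ever issues a positive, which is precisely the leniency problem.

```python
import random

random.seed(1)

# Invented biomarker distributions: clean athletes center at 43,
# dopers at 50 (arbitrary units chosen purely for illustration).
clean = [random.gauss(43, 2) for _ in range(100_000)]
dopers = [random.gauss(50, 2) for _ in range(100_000)]

for cutoff in (46, 48, 50):
    false_pos = sum(x > cutoff for x in clean) / len(clean)
    false_neg = sum(x <= cutoff for x in dopers) / len(dopers)
    print(f"cutoff {cutoff}: false positive rate {false_pos:.1%}, "
          f"false negative rate {false_neg:.1%}")
```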

I have often grumbled about "story time!", the practice of spinning grand stories based on tiny morsels of data. It's not that I disapprove of story-telling per se -- it is that the story-teller has got to find evidence to support his/her stories.

A few days ago, I dissected the Trefis financial model used to support a $100 billion valuation for Facebook. A handful of aggressive assumptions must be believed to make it happen. So it is very pleasant to find in Business Insider some actual data to help us assess the credibility of some of these assumptions.

This chart gives the clickthrough rate (CTR) and cost per click (CPC) of Facebook ads.

For those not in this industry, CTR is the number of ads that get clicked on divided by the number of ads shown to Facebook users; and CPC is the average dollars paid by advertisers to Facebook for each click on their ads.

The chart does not provide per-advertiser data. Instead, advertisers are grouped by the industry they are in (health care, internet, etc.), and the aggregate results are shown.

The one thing that should jump out at us is the range of clickthrough rates: it's mostly in the range of 0.01 to 0.1. (The last one -- Tabloids and Blogs -- seems mislabeled, and the last two rows are sufficiently different from the rest that one would want to check the numbers again.)

Mind you, that is 0.01% to 0.1%. What does 0.01% mean? Yes, that's 100 clicks per 1 million ads shown to Facebook users. (My friend Augustine pointed out long ago Facebook's abysmal metrics relative to other advertising platforms. Look here for his perspective.)

***

Now, put yourself in the shoes of, say, Pfizer showing ads to Facebook users. Clicks don't equal revenues; only some proportion of clicks turn into sales. For illustration, say 5% of clicks lead to sales. To get 100 clicks, they have to show 1 million ads, and with 100 clicks, they get 5 sales. Those clicks cost them $130, according to the data in the chart. So unless each sale is worth more than $130/5 = $26, Pfizer loses money on the ads; at exactly $26 per sale, it just breaks even.
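Spelling out that arithmetic in a few lines of Python; the 5% conversion rate is the illustrative assumption from the paragraph above, and the $1.30 cost per click is implied by the $130 for 100 clicks read off the chart.

```python
impressions = 1_000_000
ctr = 0.0001        # 0.01% clickthrough rate
cpc = 1.30          # dollars per click, implied by the chart
conversion = 0.05   # assumed: 5% of clicks turn into sales

clicks = impressions * ctr      # 100 clicks
ad_cost = clicks * cpc          # $130 spent on ads
sales = clicks * conversion     # 5 sales
breakeven = ad_cost / sales     # $26 of value needed per sale
print(f"{clicks:.0f} clicks, ${ad_cost:.0f} spent, "
      f"{sales:.0f} sales, break-even at ${breakeven:.0f} per sale")
```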

What's on the advertiser's mind?

One strategy is to flood Facebook with ads. More ads mean more clicks, even if the clickthrough rate is tiny. Perhaps unexpectedly, this tactic works only to a limited extent: it bumps up against the law of diminishing returns. Double the number of ad impressions, and the clickthrough rate typically falls.

Another strategy is to use statistical models to selectively show ads only to Facebook users most likely to click on them. This raises the clickthrough rate. What it doesn't solve is the "quality" of Facebook users, or put differently, their tendency to pay attention to advertising.

***

Now, fancy yourself the person trusted to build these statistical models. You are trying to predict who's going to click on a given type of ad and who's not. What you have at your disposal is historical data on who was shown which ad, and whether they clicked or not. (You would typically want to grab any other data you can get your hands on, such as what the user has been doing on Facebook recently. You can get more creative, such as using what Facebook has been secretly doing, described here.)

Let's say you are given the data on 1 million ads that were displayed. According to the above, you will find 100 clicks in this data... and 999,900 non-clicks. Now, assume you discover some commonalities among the 100 users who clicked on the ad. Say, 50 out of the 100 signed on to Facebook after midnight and live on the East Coast. That's a very strong signal.

Now, how many users would you find among the non-clicks who signed on after midnight and live on the East Coast? Oops: many multiples of 50, all of whom did not click on the ad. This is all due to the tiny clickthrough rate.
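Here is the arithmetic behind that "oops". The assumption that 2% of non-clickers match the midnight/East Coast profile is invented; the base-rate problem it illustrates holds for any plausible figure.

```python
ads_shown = 1_000_000
clicks = 100
non_clicks = ads_shown - clicks            # 999,900

signal_clickers = 50                       # per the example above
match_rate = 0.02                          # invented: 2% of non-clickers match
signal_non_clickers = int(non_clicks * match_rate)   # 19,998

precision = signal_clickers / (signal_clickers + signal_non_clickers)
print(f"{signal_non_clickers:,} matching non-clickers vs. {signal_clickers} clickers")
print(f"precision of the 'strong' signal: {precision:.2%}")   # about 0.25%
```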

That, in brief, is the challenge of Web analytics. If this excites you, there are lots of opportunities out there.

Reader John M. sent in this link, so we make it three days in a row: another Times article which pretends to provide data insights but in fact just spins a story.

***

The first thing that bothered John was the title of the article. At first, I didn't know what he was talking about because the headline I saw was "Parenting by Gays More Common in the South, Census Shows". I was a little mystified. How can one object to a Census finding? Later, I discovered that the same article has an alternative title, as shown below:

Compare the header to the lede shown below the header. If Jacksonville were the South, and population size were hospitality, then well, the wine came from those grapes. Poring over the entire article, I could not find a single statistic from the Census that provides a direct measurement of "welcoming"-ness. It would be great to know the distribution of gay parents across the country, the proportion of gay households, survey results of gay people, and so on.

***

I'm not saying Ms. Tavernise did not have an engrossing human interest story on her hands. I just don't think she needs to drag Census data into the narrative. Doing so weakens the story because the data are not relevant.

One can summarize the whole story thus: Southern states (and using Florida as an example) have become more welcoming to gay families in recent years. Even some churches now welcome gay worshippers. These families still face barriers, such as economic hardship, lack of legal status, and awkward situations in which they lie about their sexual identities. Many of the gay parents have kids, often from their prior heterosexual marriages.

Whenever "data" appear, the narrative loses focus. For example:

Gay couples in Southern states like Arkansas, Louisiana, Mississippi and Texas are more likely to be raising children than their counterparts on the West Coast, in New York and in New England.

The pattern, identified by Mr. Gates, is also notable because the families in this region defy the stereotype of a mainstream gay America that is white, affluent, urban and living in the Northeast or on the West Coast.

I think mainstream gay America is also stereotypically single so saying gay couples don't fit the stereotype is not telling us much. Besides, maybe Arkansas, Louisiana, Mississippi and Texas are less white, less affluent and less urban than the Northeast or the West Coast, regardless of sexual orientation. None of these speak to Southern hospitality towards gays.

***

One particular statistic ought to be used more carefully:

About 32 percent of gay couples in Jacksonville are raising children, Mr. Gates said, citing the 2009 Census data, second only to San Antonio, where the rate is about 34 percent.

It's hard to know how high these numbers are without context. (For example, this report (PDF link) tells us that overall, 20 percent of gay couples in the U.S. have kids.) Does this show that gay couples in the South are more likely to raise kids? Or does this show that gay couples are less likely to be "out" in the South, and one of the main reasons for coming out is to get benefits for the kids?

The point is that a large proportion can be due to a large numerator or a small denominator.
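A quick illustration with made-up counts: hold the numerator fixed, and the percentage swings with the denominator, that is, with how many couples identify themselves to the Census as gay couples in the first place.

```python
# Invented counts: the same number of gay couples raising kids
# produces very different rates depending on how many couples
# are counted as gay couples at all.
couples_with_kids = 320

for couples_counted in (1_000, 2_000, 5_000):
    rate = couples_with_kids / couples_counted
    print(f"{couples_counted:,} couples counted -> {rate:.0%} raising kids")
# 1,000 -> 32%;  2,000 -> 16%;  5,000 -> 6%
```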

One of the themes that I cover in talks is the need for the analytics community to move beyond "insight" and shift attention to "impact". This requires a shift in mentality from "what I can compute" to "what I should compute".

LinkedIn processed its user profiles and found the top 10 words or phrases people use to describe their work experience. Nice insights. Not something you and I could guess at with any accuracy without the data processing capability. However....

What should they be analyzing? Not "overuse", which I don't care about; in fact, the label is itself an instance of "story time": "overused" is just "highest frequency" plus a negative value judgment, and it's not clear where that judgment comes from.

What should they be analyzing? What LinkedIn users care about, I suspect, is whether the use of particular words is correlated with a higher hit rate -- more people looking up your profile. If they can show us this correlation, then they have successfully moved from insight to impact. And they are tantalizingly close to this analysis: they already have the word frequency data, and judging from the statistics available on their website, they also track the frequency of profile views.

Reader John M. complains about "story time" in this New York Times article, which tells us about the trend among immigrants to no longer change their names after arrival. The subtext is a celebration of reduced discrimination and/or xenophobia against foreigners. (They did not say anything about Arizona.)

***

I agree. The article is prototypical story time: the promise of a data-driven analysis but nothing that approaches quantitative rigor ever materializes; instead, we get free-form speculation.

The only number offered in the entire article is that about 6 out of 500 name change applications in June 2010 in New York "appeared to be obviously intended to Anglicize or abbreviate the surnames that immigrants or their families arrived with from Latin America or Asia", according to an accounting by the Times.

Even without making a fuss about this number, we must ask: how can one number make a trend? What was the rate of name changes in the past, and what part did discrimination play in those changes?

***

What's wrong with no data? It becomes "he said, she said". For example, one of the examples given of someone who changed his name was "Tom Lee", aka Wong Ah Ling, who was "the unofficial mayor of Chinatown". After mentioning Lee, Anne Bancroft and Charles Steinway, the author wrote this without irony:

The rationale was straightforward: adopting names that sounded more American might help immigrants speed assimilation, avoid detection, deter discrimination or just be better for the businesses they hoped to start in their new homeland.

I don't know who Tom Lee was, but come on: if someone was so prominent in Chinatown, how did a name change alter his ethnic identity? How did "Tom Lee" sound less Chinese, more American? Okay, I know there is a white dude called Tommy Lee, but still, come on. (The author did manage to quote Prof. Nancy Foner later in the article, who essentially contradicted the earlier assertion.)

What's wrong with no data? You get platitudes like "most experts agree," "experts say", "they say," "sociologists say". And for some reason, these experts and sociologists don't use data either.

No sooner had I written about "story time" than the LA Times journalists on the education beat announced "Story time!"

An article published recently on using test scores to rate individual teachers has stirred the education community. It attracted Andrew Gelman's attention and there is a lively discussion on his blog, which is where I picked up the piece. (For discussion on the statistics, please go there and check out the comments.)

In reading such articles, we must look out for the moment(s) when the reporters announce story time. Much of the article is great propaganda for the statistics lobby, describing an attempt to use observational data to address a practical question, sort of a Freakonomics-style application.

We have no problems when they say things like: "There is a substantial gap at year's end between students whose teachers were in the top 10% in effectiveness and the bottom 10%. The fortunate students ranked 17 percentile points higher in English and 25 points higher in math."

Or this: "On average, Smith's students slide under his instruction, losing 14 percentile points in math during the school year relative to their peers districtwide, The Times found. Overall, he ranked among the least effective of the district's elementary school teachers."

Midway through the article (right before the section called "Study in contrasts"), we arrive at these two paragraphs (my italics):

On visits to the classrooms of more than 50 elementary school teachers in Los Angeles, Times reporters found that the most effective instructors differed widely in style and personality. Perhaps not surprisingly, they shared a tendency to be strict, maintain high standards and encourage critical thinking.

But the surest sign of a teacher's effectiveness was the engagement of his or her students — something that often was obvious from the expressions on their faces.

At the very moment they tell readers that engaging students makes teachers more effective, they announce "Story time!" With barely a fuss, they move from an evidence-based analysis of test scores to speculation about cause and effect. Their story is no more credible than anybody else's, unless they also provide data to support such a causal link. Visiting classrooms and making observations is no substitute for evidence of causation.

This type of reporting happens a lot. Just open any business section: the articles all start with some fact (oil prices went up, Google stock went down, etc.), and then it's open mike for story time. None of the subsequent stories is supported by any data; the opening fact creates the impression that the author is using data, but it has nothing to do with the hypotheses that follow. So be careful!

A new study released by the American Public Transportation Association (APTA) shows that people who live in communities with extensive public transportation networks exercise more, live longer, and are generally healthier than people in automobile-dependent communities. (Via Inhabitat blog)

So far so good. They found that the presence of public transport is correlated with public health.

***

Then the teacher declares story time, and then up is down, and left is right.

The headline writer tells us: REPORT SAYS PUBLIC TRANSPORTATION MAKES YOU SKINNY. The report said nothing of the sort.

The APTA president makes up this chain of cause and effect, with no supporting evidence:

Use of public transit simply means that you walk more which increases fitness levels and leads to healthier citizens. More importantly, increasing use of public transit may be the most effective traffic safety counter measure a community can employ.

Here are some other stories for your reading pleasure:

1. Public transport is typically better developed in cities. Younger people happen to prefer living in cities. Younger people exercise more and are typically healthier.

2. Cities that have extensive public transport typically also have an abundance of restaurants and food stores. Having lots of food around tempts people, and they tend to eat more than they need. They get fat and unhealthy.

3. Places with great public transport are mostly cities. Cities are crowded, polluted and have a dearth of outdoor space. City dwellers do not exercise as much, and they get unhealthy.

4. Places with great public transport are mostly cities. Cities are crowded, polluted and have a dearth of outdoor space. Because of scarcity, city dwellers value green space more, and therefore they tend to spend more time there and get healthier.

5. Large cities have plenty of distractions from healthy activities. Night clubs, movie theaters, Broadway shows, karaoke, shopping malls, etc. all take up valuable time that could otherwise be used for exercising. Therefore, people in cities are less healthy.

6. City dwellers are vain people who must look good. Looking good means being skinny, according to the fashion mavens. Thus, people exercise more and eat healthy.

***

Ok, you get the picture. There is no end to telling stories. The one thread linking all these stories is that there is not a shred of evidence to support any of the logic. The data cited merely establish the correlation between A and Z, and tell us nothing about A causing B causing C causing... Z.

Taleb calls this the "narrative fallacy". Our minds are very active and very successful in making up stories. Unfortunately, most of them are junk, and nothing more.