Posts categorized "Economy"

Harvard Business Review devotes a long article to customer data privacy in the May issue (link). The article raises important issues, such as the low degree of knowledge about what data are being collected and traded, the value people place on their data privacy, and so on. In a separate post, I will discuss why I don't think the recommendations issued by the authors will resolve the issues they raised. In this post, I focus my comments on an instance of "story time", some questions about the underlying survey, and thoughts about the endowment effect.

***

Much of the power of this article comes from its reliance on survey data. The main survey used here is one conducted in 2014 by frog, the "global product strategy and design agency" that employs the authors. They "surveyed 900 people in five countries -- the United States, the United Kingdom, Germany, China, and India -- whose demographic mix represented the general online population". (At other points in the article, the authors reference other surveys, but this is the only one explicitly described.)

Story time is the moment in a report on data analysis when the author deftly moves from reporting findings from the data to telling stories based on assumptions that do not come from the data. Some degree of story-telling is required in any data analysis, so readers must be alert to when "story time" begins. Conclusions based on data carry different weight from stories based on assumptions. In the HBR article, story time arrives just below the large graphic titled "Putting a Price on Data".

The graphic presented the authors' computation of how much people in the five nations value their privacy. They remarked that the valuations have very high variance. Then they said:

We don't believe this spectrum represents a "maturity model," in which attitudes in a country predictably shift in a given direction over time (say, from less privacy conscious to more). Rather, our findings reflect fundamental dissimilarities among cultures. The cultures of India and China, for example, are considered more hierarchical and collectivist, while Germany, the United States and the United Kingdom are more individualistic, which may account for their citizens' stronger feelings about personal information.

Their theory that there are cultural causes for differential valuation may or may not be right. The maturity model may or may not be right. Their survey data do not suggest that there is a cultural basis for the observed gap. This is classic "story time."

***

I wonder if the HBR editors reviewed the full survey results. As a statistician, I think the authors did not disclose enough detail about how their survey was conducted. There are lots of known unknowns: we don't know the margin of error of anything, we don't know the statistical significance of anything, we don't know whether the survey was conducted online, we don't know how most of the questions were phrased, and we don't know how respondents were selected.

What we do know about the survey raises questions. Nine hundred respondents spread out over five countries is a tiny poll. Gallup surveys 1,000 people in the U.S. alone. If the 900 were spread evenly across the five countries, their survey has fewer than 200 respondents per country. A rough calculation gives a margin of error of at least plus/minus 7 percent. If the sample is proportional to population size, then the margin of error for a smaller country like the U.K. will be even wider.
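For the curious, here is that rough calculation as a quick Python sketch. It assumes simple random sampling, a 95% confidence level, the worst-case proportion of 50 percent, and an even split of the 900 respondents across the five countries.

```python
# Rough margin of error for a poll of ~180 respondents per country, assuming
# simple random sampling and the worst case p = 0.5. The 180 figure is my
# assumption of 900 respondents split evenly across five countries.
import math

n = 900 // 5          # respondents per country if split evenly
p = 0.5               # worst-case proportion
z = 1.96              # 95% confidence

moe = z * math.sqrt(p * (1 - p) / n)
print(f"Margin of error: +/- {moe:.1%}")   # roughly +/- 7.3%
```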

The authors also claim that their sample is representative of the "demographic mix" of the "general online population." This is hard to believe since they have no one from South America, Africa, the Middle East, Australia, and so on.

The graphic referenced above, "Putting a Price on Data," supposedly gives a dollar amount for the value of different types of data. Here is the top of the chart to give you an idea.

The article said "To see how much consumers valued their data, we did conjoint analysis to determine what amount survey participants would be willing to pay to protect different types of information." Maybe my readers can help me understand how conjoint analysis was applied to this problem.

A typical use of conjoint analysis is pricing new products. The product is decomposed into attributes: the Apple Watch, for example, may be thought of as a bundle of fashion, thickness, accuracy of the reported time, and so on. Different watch prototypes are created by bundling different levels of those attributes. Then people are asked how much they are willing to pay for each prototype. The goal is to put a value on the composite product, not the individual attributes.
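For readers who have not seen conjoint analysis, here is a bare-bones sketch of a ratings-based version using the watch example. The attributes, prototypes, and dollar figures are entirely made up: a regression on the attribute bundles recovers implied "part-worths", which are then recombined to value whole prototypes.

```python
# A toy ratings-based conjoint: regress stated willingness-to-pay on
# dummy-coded attribute levels. All numbers are invented for illustration.
import numpy as np

# Each prototype is a bundle of attributes: [premium_design, thin_case, high_accuracy],
# coded 1 (has the attribute) or 0 (does not).
prototypes = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

# Hypothetical willingness-to-pay stated by respondents for each prototype ($).
wtp = np.array([420, 350, 180, 330, 240, 120])

# Fit an intercept plus one "part-worth" per attribute by least squares.
X = np.column_stack([np.ones(len(wtp)), prototypes])
coefs, *_ = np.linalg.lstsq(X, wtp, rcond=None)

print(f"baseline value: {coefs[0]:.0f}")
for name, pw in zip(["premium_design", "thin_case", "high_accuracy"], coefs[1:]):
    print(f"{name} part-worth: {pw:.0f}")
```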

***

Also interesting is the possibility of an "endowment effect" in the analysis of the value of privacy. We'd really need to know the exact questions that the survey respondents were asked to be sure. It appears that people were asked how much they would pay to protect their data, i.e. to acquire privacy: in that framing, you don't have privacy and you have to buy it. A different way of assessing the same issue is to ask how much money you would accept to sell your data; in that framing, you own your privacy to start with. The behavioral psychologist Daniel Kahneman and his associates pioneered research showing that the values obtained by those two methods are frequently far apart!

In a classic paper (1990), Kahneman et al. told one group of people that they had been given a mug and asked how much money they would accept in exchange for it (the median was about $7). Another group was asked how much they would be willing to pay to acquire the mug; the median was below $3.

Is this the reason why businesses keep telling the press we don't have privacy and we have to buy it? As opposed to we have privacy and we can sell it at the right price?

***

Despite my reservations, the HBR piece is well worth your time. It raises many issues about data collection that you should be paying attention to. Read the whole article here.

Only 6% of crashes in New Zealand involve foreign drivers, according to the latest figures provided by the Ministry of Transport.

But in some remote regions of the South Island particularly popular with tourists for their scenery... foreign drivers are involved in about a quarter of all crashes.

These sentences come from a CNN article about a vigilante movement in those regions popular with tourists. The vigilantes snatch car keys from tourists who annoy them by holding up traffic along the scenic routes.

My friend Tonny saw this article and thought about Numbers Rule Your World. I love to hear stories about how you're able to relate the stories in my book to other real-world situations.

***

The 6% aggregate number hides the effect of tourists on the accident rate, which differs depending on the region one is looking at. For this particular article, only the 25% number is relevant, and even that point is not clear-cut: I'd like to know whether the vigilante incidents occur exclusively in those "remote regions of the South Island", as implied.

The 25% figure does not address the more important question of whether locals or tourists are more likely to get into accidents in those regions. While the tourists accounted for one out of four accidents, they also comprise a large proportion of the traffic. If, say, 50% of the cars on those roads are driven by tourists, then they are disproportionately less likely to be involved in accidents. The base rate of tourist traffic is the missing data.
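A quick back-of-the-envelope calculation shows why the base rate matters. The 25 percent crash share comes from the article; the tourist traffic shares below are hypothetical.

```python
# How likely is a tourist-driven car to crash relative to a local-driven car?
# The answer depends entirely on the (unreported) share of traffic that is tourists.
crash_share_tourists = 0.25          # tourists' share of crashes (from the article)

for traffic_share_tourists in (0.10, 0.25, 0.50):   # hypothetical traffic shares
    # Relative risk, assuming similar miles driven per car.
    rr = (crash_share_tourists / traffic_share_tourists) / \
         ((1 - crash_share_tourists) / (1 - traffic_share_tourists))
    print(f"tourist traffic share {traffic_share_tourists:.0%}: relative risk {rr:.2f}")
# At a 10% traffic share, tourists crash at 3x the local rate; at a 50% share,
# they crash at only a third of the local rate.
```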

Further, these percentages should come with margins of error, especially since few accidents occur in those remote areas.

Chapter 1 of Numbers Rule Your World deals with the notion of the statistical average and Chapter 3 investigates when it is appropriate to aggregate data and when it's not. Learn more about the book here.

Chapter 1 of Numbersense (link) uses the example of U.S. News ranking of law schools to explore the national pastime of ranking almost anything. Since there is no objective standard for the "correct" ranking, it is pointless to complain about "arbitrary" weighting and so on. Every replacement has its own assumptions.

A more productive path forward is to understand how the composite ranking is created, and shine a light on the underlying assumptions.

***

The New York Times recently published an article entitled "What's the Matter with Eastern Kentucky?" (link). The problem with Eastern Kentucky, as the reporter saw it, is that those counties rank at the bottom of their list. Here is their ranking methodology:

The team at The Upshot, a Times news and data-analysis venture, compiled six basic metrics to give a picture of the quality and longevity of life in each county of the nation: educational attainment, household income, jobless rate, disability rate, life expectancy and obesity rate. Weighting each equally, six counties in eastern Kentucky’s coal country (Breathitt, Clay, Jackson, Lee, Leslie and Magoffin) rank among the bottom 10.

There is a companion blog at The Upshot, giving more context, and a county-level map of the ranking (link). Here are the relevant sentences.

The Upshot came to this conclusion by looking at six data points for each county in the United States: education (percentage of residents with at least a bachelor’s degree), median household income, unemployment rate, disability rate, life expectancy and obesity. We then averaged each county’s relative rank in these categories to create an overall ranking.

(We tried to include other factors, including income mobility and measures of environmental quality, but we were not able to find data sets covering all counties in the United States.)

We used disability — the percentage of the population collecting federal disability benefits but not also collecting Social Security retirement benefits — as a proxy for the number of working-age people who don’t have jobs but are not counted as unemployed.

How should we read this article?

***

What is this a ranking of? What is the research question? The answer is "how hard it is to live in specific counties". Right away, we know any answer is subjective, even if data is proffered.

Look out for the relative weights. The authors tell us the six metrics are equally weighted. "Equal weighting" sounds fair but frequently hides inequities. Are those six factors equally important? Are there strong correlations among some of them?

The blog post discloses that each of the six metrics is first converted to ranks before being averaged. This means we need to worry about how much each metric varies from county to county. Take the obesity rate, for example. Here is a map of obesity at the county level published by the CDC, based on a model estimate (link).

The people who made this map placed the counties into five groups; the middle groups are narrowly defined, for example, 29.2% to 30.8%. An analyst who converts the county-level obesity rates to ranks creates over 3,000 gradations of obesity rate. Said differently, the worst county is rated as over 3,000 times worse than the best county. Yet in the case of obesity, the medical community would consider most of these counties unhealthy.

This example shows how too much granularity can hurt you, a core insight of statistics that may seem counterintuitive.
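A small simulation makes the point concrete. The obesity rates below are simulated, not the CDC's actual figures, but the mechanism is the same: converting to ranks stretches a narrow band of rates across a huge number of rank positions.

```python
# Simulated county obesity rates, clustered around 30% (not the CDC's data).
import numpy as np

rng = np.random.default_rng(0)
rates = rng.normal(loc=0.30, scale=0.02, size=3000)

ranks = rates.argsort().argsort() + 1          # 1 = lowest rate, 3000 = highest

# Counties falling in the narrow middle band that the CDC map lumps together:
band = (rates >= 0.292) & (rates <= 0.308)
print(f"counties between 29.2% and 30.8%: {band.sum()}")
print(f"their ranks span {ranks[band].min()} to {ranks[band].max()}")
# A band of rates less than two percentage points wide is stretched across
# hundreds of rank positions, which then feed into the averaged ranking.
```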

***

Ultimately, it's for you to decide whether this ranking makes sense. I'm not here to dismiss it: as I said in Numbersense (link), you can replace this methodology with something else, but the new method will have its own assumptions.

I just did a guest lecture at a New School journalism class. While preparing for the class, I pulled the sad stock chart for GRPN (Groupon):

If you bought the hype in 2011, you'd have lost 70% of your investment ($25 to $7).

Given what we know today, it's hard for people to feel the hype that the media helped fuel in those days. As a reminder, here is the New York Times's David Pogue gushing about Groupon, just before its IPO: link. Pogue was one of many such commentators.

Around that time, I had this response to the Groupon boosters: there was a gaping hole in the win-win-win story from the start. Retailers give up sure profit for the chance that the coupon users are not deal-seekers and will come back for repeat business at a higher price.

This is related to my current concern about the so-called "gas price stimulus". The hit to the oil and gas sector is immediate and certain. The shift of spending to other sectors, and the associated "multiplier effects", depends on multiple uncertain events occurring in the future.

Gripped by an infectious incuriosity, the financial press ran with the story that falling gasoline prices (a 50% drop in six months) are "the best economic stimulus one can get". See former Deputy Treasury Secretary Roger Altman on CNBC, Business Insider's "cheap gas boost", the Wall Street Journal citing "low oil prices as an effective tax cut for consumers", the New York Times quoting a Citigroup analyst claiming a global stimulus of more than $1 trillion, and so on.

This is the kind of story that one should believe only if half asleep. Here are three reasons why this conjecture is likely to be wrong:

1. Forgetting the big picture

There was a McDonald's next to a Burger King in a small town. The Burger King went out of business. The McDonald's suddenly did twice its usual business. Surely, McDonald's was the winner here, but did the economy of the town expand? Unfortunately not. Consumers merely shifted their spending from Burger King to McDonald's.

Now, consider a household that spends $200 a month on gas before the oil price crash. Let's say the same amount of gas now costs $100. According to those rosy-cheeked economists and journalists, the household now has an extra $100 to spend on other things, and this "extra" spending stimulates the economy.

But the total amount of expenditure is still $200. The only thing that changes is the mix of spending. GDP is based on total spending, not the mix of spending. Some sectors of the economy will benefit, but at the expense of the oil and gas sector.

2. Imperfect substitution

Consider our household again. The total size of the economy remains the same only if the household spends every dollar of the $100 freed up at the pump. If the household saves even one of those dollars, the economy shrinks compared to before.
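A toy accounting of this household makes both points concrete; the savings rates below are assumptions for illustration.

```python
# The household in the text: $200/month on gas before the crash, $100/month after.
# The savings rate applied to the freed-up $100 is the assumption that matters.
gas_before, gas_after = 200, 100
freed_up = gas_before - gas_after              # $100 no longer spent on gas

for savings_rate in (0.0, 0.25, 0.50):         # hypothetical savings rates
    other_spending = freed_up * (1 - savings_rate)
    total_spending = gas_after + other_spending
    print(f"savings rate {savings_rate:.0%}: total spending ${total_spending:.0f} "
          f"(vs. ${gas_before} before)")
# Only at a 0% savings rate does total spending stay at $200; the mix shifts,
# but nothing new is added. Any saving shrinks total spending.
```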

3. Making bad assumptions about the future

It's unclear from any of those articles how the analysts came up with the size of this oil-drop stimulus. Every one of them must make a forecast about future oil prices. I bet many of them take the current price as the new normal, and use that price as the future price.

If I told you that you should not take an extreme value and treat it as the average, you'd scold me for stating the obvious.

As with most economic arguments, one could posit a much more complex chain of relationships to argue how a 50% drop in oil prices turns into trillions of economic stimulus. It is the business journalist's job to explain that complicated chain; the connection is clearly not as simple as reported. If one asserts a chain such as A up -> B down -> C down -> D up, and so on, each of those causal links should be supported with evidence.

***

The same type of fallacious thinking pervades the business sector. For example, we keep hearing about the growth in retail sales from mobile devices. We don't know whether consumers are simply shifting from the Web channel to the mobile channel, or how much of those mobile sales is incremental.

That is the question in my head when I read an article like USA Today's "Jobless Claims Fall, Suggests Strong Hiring" (link).

The headline makes the connection between newly-released jobless claims data and the conclusion of "strong hiring". But it turns out the new data is merely window-dressing, and the conclusion is based on longer-term trends.

Here is the new data, as reported by the USA Today reporter:

applications for unemployment benefits fell 4,000 last week to a seasonally adjusted 294,000.

The four-week average, a less volatile measure, slipped 250 to 290,500

Without even looking up the source, one should immediately see that a change of 250 in the four-week moving average is just statistical noise. The 4,000 change for the last week is also statistically insignificant because the weekly series is highly volatile. The proper conclusion from this release of data is that the employment situation was essentially unchanged from the week before.
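To put those two changes in perspective, here is a quick calculation; the claims figures come from the article, while the "routine swing" in the comment is my own rough assumption.

```python
# How large are the reported changes relative to the levels they move around?
weekly_level, weekly_change = 294_000, 4_000
avg_level, avg_change = 290_500, 250

print(f"weekly change: {weekly_change / weekly_level:.1%} of the level")      # ~1.4%
print(f"4-week average change: {avg_change / avg_level:.2%} of the level")    # ~0.09%

# If week-to-week swings of several thousand claims are routine (an assumption
# about the volatility of this series), a 4,000 move is well within the noise.
```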

Now, one could go backwards in time and make an argument for "stronger hiring". This is exactly what the journalist did, by citing "total job growth in 2014 at just shy of 3 million, the best performance since 1999" and "[The 4-week moving] average [of jobless claims] has plunged 16 percent in the past 12 months, as averages have stayed at historically low sub-300,000 levels since September".

Take a look at this chart of the 4-week average in the last five years. The trend has been the same for five years (just draw a straight line through the series) and there is nothing at the right tail of the time series to indicate that the latest data release changed anything:

I'm also amazed that at this point, a journalist can write an article about employment without once mentioning the workforce participation rate. (Anyone who is excluded from the work force is not eligible to be "unemployed". The workforce participation rate has gone down without recovering.)

Notice that this time series was essentially flat until the recession.

I have a whole chapter on employment statistics in Numbersense (link).

On my sister blog last week, I wrote about how to screw up a column chart. The chart designer apparently wanted to explore whether Rotten Tomatoes scores are correlated with box office success, and whether the running time of a movie is correlated with box office success. In both cases, the set of movies is a small one: those directed by Chris Nolan. Here is a better view of the data:

There were a few questions on Twitter, which I will address in this post. Someone complained about the horizontal axis, arguing that year data are continuous and the axis should not be discrete. That would be true if the data were truly continuous. But I interpret the year data here as ordinal: Chris Nolan makes a movie only once every one to three years, and his career is developing over this period, so I think of the horizontal axis as his first film, his second film, and so on.

Another Twitter user tried a scatter plot. It's very satisfying to see the following chart. It shows a strong positive correlation between the running time of a movie and the box office receipts.

In fact, a hockey-stick line fits the data even better. This implies a multiplicative relationship between running time and box office receipts. We can fit this by first taking the logarithm of box office receipts. The following chart shows how good this fit is:

If you run a regression, the R-squared is 90 percent, and the effect of running time is extremely significant (p < 0.0002). So we have proved that Chris Nolan should make longer movies because the longer the running time of his movies, the bigger the box office.
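For anyone who wants to reproduce the regression (the fit, not the silly conclusion), here is a minimal sketch of the log-linear model described above. It assumes you have the films in a CSV; the file and column names are made up.

```python
# Log-linear regression of box office on running time, one row per film.
import numpy as np
import pandas as pd
import statsmodels.api as sm

films = pd.read_csv("nolan_films.csv")         # hypothetical file: runtime_min, box_office_usd

X = sm.add_constant(films["runtime_min"])      # running time in minutes
y = np.log(films["box_office_usd"])            # log of box office receipts

fit = sm.OLS(y, X).fit()
print(fit.rsquared)                            # the ~90% R-squared quoted in the text
print(fit.pvalues["runtime_min"])              # the p-value on running time
```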

You might have noticed that both running time and box office numbers have gone up over time. (That is to say, both are correlated with time, and therefore with each other.) Do you think that is because moviegoers are motivated to see longer films, or because movies are just getting longer?

And consider these items:

1) Chris Nolan's career experienced a hockey-stick growth during this time.

2) Movies have become longer and longer in general.

3) Rotten Tomatoes also experienced hockey-stick growth in users during this period. In January 2000, the site had 250K visitors, at which point the founders said they "started in earnest as a company". (link) Today, according to Wikipedia, the site has almost 20 million monthly visitors. In other words, several of Chris Nolan's early movies came out before RT came into its own.

Nice article in the New York Times about the "overdiagnosis" problem in cancer screening. The particular case is thyroid cancer in South Korea.

There are a number of things about any form of screening tests that one should always bear in mind:

The death rate is measured as the number of deaths divided by the number of people diagnosed with the disease. The latter number increases as diagnostic techniques improve.

Better diagnostic techniques for cancers inevitably identify tiny tumors, almost all of which will never develop into harmful cancer during the patient's lifetime. Peggy Orenstein had a fabulous article a year ago about the same issue in breast cancer. In that case, those tiny tumors weren't even called cancer until the early-detection movement took hold.

Once a tumor is labeled cancerous, patients will opt to treat it. Because these tumors would never have killed the patients, they inflate the number of diagnosed cases without adding to the number of deaths. So, just by virtue of increased diagnoses, the measured death rate comes down.
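A toy example shows the arithmetic; all of the numbers below are made up.

```python
# How overdiagnosis lowers the measured "death rate" without saving anyone.
deaths = 100                         # deaths are unchanged by screening

diagnosed_before = 1_000             # cases found before the screening program
diagnosed_after = 5_000              # cases found after: mostly tiny, harmless tumors

print(f"death rate before screening: {deaths / diagnosed_before:.1%}")   # 10.0%
print(f"death rate after screening:  {deaths / diagnosed_after:.1%}")    # 2.0%
# Only the denominator grew; the apparent improvement is a statistical artifact.
```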

The immediate outcome of a nationwide screening program is a dramatic increase in diagnosed cases. The article is a little unclear about whether it is the number of deaths or the death rate that did not fall. It doesn't really matter; either way, we must conclude that the screening has failed to improve health.

One must not forget that screening tests, subsequent confirmatory tests, treatments, etc. all cost money, so there is a financial incentive to over-diagnose and over-treat.

In addition to the financial incentive, there is the issue that I raised in Chapter 4 of Numbers Rule Your World (link). A false negative is a very public error on the part of the medical establishment, while a false positive (followed by, say, removal of the thyroid) is an unobservable error. So there is a statistical incentive to over-diagnose and over-treat.

Andrew Gelman touches on one of my favorite topics: prediction accuracy, and experts who cling to their predictions. Here's Andrew at the Monkey Cage blog.

His starting point is a piece by sociologist Jay Livingston on how various well-known economists made vague predictions (e.g. "I see inflation around the corner") and kept clinging to them (eventually, there will be inflation).

Several theories are given to explain this behavior. One is the idea going back to Kuhn on how scientists stick to their beliefs in the face of negative evidence. Another is that the top economists have invested a lot in those now-questioned beliefs. A third idea is that when the facts and the theories collide, it is cognitively easier to manipulate the facts than to try to change one's theories.

Andrew speculates that "the cost of being wrong is less than the cost of admitting you were wrong." I think there is a certain truth to that. What if you are a famous economist whom people in the field look up to and frequently cite in their own arguments? If you know that your followers will repeat whatever you say, and will simply keep their mouths shut when you are jarringly wrong, then you are under no pressure to correct yourself.

Back to the question of predictive accuracy. I recently read a press release from Microsoft boasting that it had predicted the NO vote in the Scottish independence referendum. They said they had issued something like an 80 percent chance of NO a few days (or a week) before the vote. And since the result was NO, they were proven correct! What if I had made a 60-percent NO prediction? I probably would have declared victory too.

When we make a statement about predictive accuracy, there has to be a formula for determining what it means to be accurate. For one-time events like the NO vote, it is hard to come up with such a formula. But without one, you shouldn't believe anyone's claim of accuracy!
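For what it's worth, one standard formula for scoring probability forecasts is the Brier score; here is a minimal sketch. The 80 percent and 60 percent figures echo the example above, and the point stands: a single event cannot separate luck from skill.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical: both forecasters "called" the single NO vote (outcome = 1 for NO).
print(brier_score([0.80], [1]))   # an 80% NO forecast -> 0.04
print(brier_score([0.60], [1]))   # a 60% NO forecast  -> 0.16
# Over one event, either score is hard to interpret; you need many forecasts
# to tell whether the stated probabilities were well calibrated.
```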

Some behind-the-scenes comments on my recent article on New York's restaurant inspection grades; it appeared on FiveThirtyEight this Tuesday.

***

The Nature of Ratings

This article is about the ratings of things. I devoted a considerable number of pages to this topic in Numbersense (link); Chapter 1 is all about the U.S. News ranking of schools. A few key points:

All rating schemes are completely subjective.

There is no "correct" rating scheme, therefore no one can prove that their rating scheme is better than someone else's rating scheme.

A good rating scheme is one that has popular acceptance. If people don't trust a rating scheme, it won't be used. (This is a variant of George Box's quote: "all models are wrong, but some are useful".)

Think of a rating scheme as a way to impose a structure on unwieldy data. It represents a point of view.

All rating schemes will be gamed to death, assuming the formulae are made public.

Based on that, you can expect that my goal in writing the 538 article is not to praise or damn the city's health rating scheme. My intention is to describe how the rating scheme works based on the outcomes. I want to give readers information to judge whether they like the rating scheme or not.

OCCAM Data

The restaurant grade dataset is an example of OCCAM data. It is Observational, it has no Controls, it has seemingly all the data (i.e. Complete), it will be Adapted for other uses and will be Merged with other data sets to generate "insights". In my article, I did not do A or M.

Hidden Biases in Observational Data

Each month (or week; I need to check), the department puts up a dataset on the Open Data website. Only one dataset is available at a time: the most recent copy replaces the previous one. The size of the dataset therefore expands over time.

Anyone who analyzes grade data up to the most recent few months is in for a nasty surprise. As the chart on the right shows, the proportion of grades that are not A, B, or C (labeled O and shown in gray) spikes to about 10 times the normal level during the last two months. This chart is for an August dataset, but it is not an anomaly; it is an accurate description of the ongoing reality.

If a restaurant is given a B or C on its first inspection, it has the right to go through a reinspection and arbitration process. During this time, the restaurant is allowed to display a "Grade Pending" sign. It appears that it can take up to four months for most B- or C-graded restaurants to finish this process. Over this period, many of the pending grades flip to an A, B, or C. The chance that they flip to a B or C is much higher than for the average restaurant (about which we have no pending-grade information).

Indeed, the proportion of As in the most recent two months is vastly biased upwards as a result of the lengthy reinspection process.

For this reason, I removed the last two months from my analysis.
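For those replicating the analysis, here is a minimal sketch of that filter, assuming the grades sit in a pandas data frame; the file and column names are hypothetical.

```python
# Drop the most recent two months, where "Grade Pending" reinspections
# artificially inflate the share of A grades.
import pandas as pd

grades = pd.read_csv("restaurant_grades.csv", parse_dates=["grade_date"])  # hypothetical file

cutoff = grades["grade_date"].max() - pd.DateOffset(months=2)
stable = grades[grades["grade_date"] < cutoff]
```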

How might this bias affect your analysis?

If you drop all Pending grades from your analysis (while retaining the A, B, and C grades), you have created an artificial trend in the last two months.

If you keep the last available grade for each restaurant, you have not escaped the problem at all. In fact, you introduce yet another complication: B- and C-graded restaurants have older inspection dates than the A-graded restaurants. Meanwhile, those Pending grades are still dropped.

If you automatically port this data to a mapping tool or similar, you are displaying the biased data, and unknowing users are misled. In fact, the visualization can no longer be interpreted.

IMPORTANT NOTE: The data is NOT WRONG. Data cleaning/pre-processing does not just mean finding bad data. Much of what statisticians do when they explore the data is identifying biases and other tricky features.

The Nature of Statistical Analysis

[Captain Hindsight here.] Of course, I didn't know or guess that the Grade Pending bias would be a problem. I did the first analysis using a July dataset, and by the time I was drafting the article for FiveThirtyEight, it was already August, so I "refreshed" the analysis with the latest dataset. That's when I noticed discrepancies that led me to this issue.

This is the norm in statistical analysis. Every time you sit down to write something up, you notice additional nuances or nits. Sometimes the problem is severe enough that you have to re-run everything. Other times, you just decide to gloss over it and move on.