Parables vs. stories

Let me explain with a story. A few months ago I read the new book, Doing Data Science, by Rachel Schutt and Cathy O’Neil, and I came across the following motivation for comprehensive integration of data sources, a story that is reminiscent of the parables we sometimes see in business books:

By some estimates, one or two patients died per week in a certain smallish town because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic. In other words, if the records had been easier to match, they’d have been able to save more lives. On the other hand, if it had been easy to match records, other breaches of confidence might also have occurred. Of course it’s hard to know exactly how many lives are at stake, but it’s nontrivial.

The moral:

We can assume we think privacy is a generally good thing. . . . But privacy takes lives too, as we see from this story of emergency room deaths.

This particular story, though, set off my Spidey-sense. As I wrote at the time:

One or two patients per week? 75 people [in a year] is a lot! To calibrate, I’d like to get a denominator, the total number of deaths each year.

I’m not sure how large the “smallish town” is. Here’s Wikipedia: “A town is a human settlement larger than a village but smaller than a city. The size definition for what constitutes a ‘town’ varies considerably in different parts of the world. . . . In the United States of America, the term “town” refers to an area of population distinct from others in some meaningful dimension, typically population or type of government. . . . In some instances, the term “town” refers to a small incorporated municipality of less than 10,000 people, while in others a town can be significantly larger. Some states do not use the term ‘town’ at all, while in others the term has no official meaning and is used informally to refer to a populated place, of any size, whether incorporated or unincorporated. . . .” Wikipedia then goes state by state, for example, “In Alabama, the legal use of the terms ‘town’ and ‘city’ are based on population. A municipality with a population of 2,000 or more is a city, while less than 2,000 is a town.”

Just to go forward on this, I’ll assume the “smallish town” has 10,000 people. If approximately 1/70 of the population is dying every year, that’s 140 deaths a year. So that can’t be right—there’s no way that half the deaths in this town are caused by poor record-keeping in a hospital. If the town had 20,000 people (which would seem to be near the upper limit of the population of a town that one would call “smallish,” at least in the United States), then we’re talking 1/4 of the deaths, which still seems way too large a proportion. Even if it is a town with lots of old people, so that much more than 1/70 of the population is dropping off each year, the numbers just don’t seem to add up. . . .

Based on my calculations, I feel like there is something missing in the story that was told about the hospital records. I could be wrong, though. I might be missing something subtle or even something obvious. It’s hard for me to know, though, because the story is not sourced.
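
The back-of-envelope arithmetic above can be checked with a short sketch. It assumes, as the text does, roughly 75 attributed deaths a year ("one or two per week") and about 1/70 of the population dying each year; the function name and parameters are illustrative only.

```python
# Back-of-envelope check: what share of a town's annual deaths
# would "one or two per week" (about 75 a year) represent?

def share_of_deaths(population, attributed_per_year=75, annual_death_fraction=1 / 70):
    """Return (total deaths per year, fraction of them attributed to the cause)."""
    total_deaths = population * annual_death_fraction
    return total_deaths, attributed_per_year / total_deaths

# The two town sizes considered in the text.
for population in (10_000, 20_000):
    total, share = share_of_deaths(population)
    print(f"{population:>6} people: ~{total:.0f} deaths/year, share attributed = {share:.2f}")
```

Under these assumptions the share comes out at roughly one half for 10,000 people and roughly one quarter for 20,000, which matches the figures quoted above.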

OK, picky picky picky. So what’s my point?

My point is that, if we want to truly learn from a story, in the “God is in every leaf of every tree” sense, we can’t just relax and soak in the message, we need to push push push. I don’t know what’s going on here, whether the story is entirely made up or maybe the numbers got garbled and are off by one or two orders of magnitude or maybe two stories got mashed up and something was lost in translation, or maybe there’s some key aspect I’m not understanding. Rachel put me in touch with David Crawshaw, the source of this story, but he did not reply to my questions or maybe they did not reach him.

The point is, we’ve hit a brick wall. Without sourcing, without any way to get more information, we don’t have a story at all, we have a parable.

Here’s Webster:

par·a·ble noun \ˈpa-rə-bəl\
: a short story that teaches a moral or spiritual lesson; especially : one of the stories told by Jesus Christ and recorded in the Bible . . .

specifically : a usually short fictitious story that illustrates a moral attitude or a religious principle

OK, so there’s nothing religious in this parable (nothing about Bayes or Emacs or Linux, yuk yuk yuk) but, yes, it illustrates a moral attitude. The key is that the story is simple and has just one purpose. The purpose is the moral. It’s not really considered desirable for a parable to have depth. If deeper investigation into the parable were to change or complexify its message, that would defeat the purpose.

I’m a statistician, and I like stories more than parables. I like that when I look into a story or a statistical example carefully, I can keep learning. I like the fractality of stories, the way that the deeper we look, the more we can learn (and this is the subject of one of my papers with Basbøll).

In telling this particular story, I’m not trying to beat up on Rachel, or Cathy, or David Crawshaw. Indeed, for the purpose of writing a general-interest book, maybe a parable works better than an endlessly-complexifying story. I have no idea. But I think it’s been a good real-life story to illustrate the distinction between parables and real-life stories. Originally I was hoping to find out from Crawshaw what was up with this hospital, but since he has not (yet) responded (which is fair enough; it’s hardly his job to respond to a question from someone he’s never met, regarding a parable he told that appeared in somebody else’s book), that’s part of the story too. If I do hear from him, I’ll post an addendum here.

P.S. Nobody seems to have heard from David Crawshaw on this but Cathy just informed me that she’s inserting a footnote in the second printing, pointing readers to this post so they can make their own judgments.

25 Comments

But I think you ought to beat up on Rachel! I take a strict liability view of books: if you put a fact in there, either you check it well or, at the very least, when someone points out that it seems incongruous, it behooves you to dig deeper.

What are Rachel’s views on that factoid? Has she followed up with David Crawshaw? Has she emailed you / blogged about it? It seems just too easy and convenient to point to David Crawshaw as the source and wash her hands of it.

Yes, I contacted Rachel and she contacted Crawshaw, but I haven’t heard back from him. I don’t think Rachel can do much more than that, at least not until she produces a second edition of the book!

I wouldn’t beat up on Rachel and Cathy for not checking all the facts in their book. I mean, sure, it’s ultimately their responsibility, but we all make mistakes. A few years ago I edited a book that had many authors. I love the book, I really love it, and my co-editor Jeronimo Cortina and I went to a lot lot lot of effort to clean up all the chapters (as well as to write long sections of our own). Still, the parts written by others are full of factual claims that I didn’t check. That’s just the way it is.

I’m still curious about Crawshaw’s side of the story. My guess is that it’s an off-the-cuff second- or third-hand story that he told, and that the details got distorted in retelling. I do that sometimes when I lecture: I tell stories where I don’t know all the details.

Maybe the real problem was that Crawshaw’s authorship of that chapter was not stated clearly enough (or, at least, not clearly enough for me to have noticed when reviewing the book). As a result, Crawshaw didn’t take responsibility for the published story because it was under Rachel and Cathy’s name, and Rachel and Cathy didn’t take responsibility because it was Crawshaw’s story. So, rather than saying anyone in particular did something wrong, it’s more like the baton was dropped in a relay race.

I don’t know, but I’m guessing that what is really bothering them at this point are the aspects of the social interactions. I personally know and like both Rachel and Cathy, but I’m not the most tactful person (even when I try to be!), so even though I have tried to present this as neutrally as possible in my interactions with them, I haven’t heard back from them and now I’m worried that they’re angry with me.

One might think that I shouldn’t be sharing this in a blog comment (and I’d actually agree that I’m probably showing poor judgment by sharing this with everyone), but I’m doing so anyway because I think that my personal experience here is related to the larger point, which involves the social hurdles that impede open discussion.

Again, I don’t know where Crawshaw’s example comes from (nor have I ever met Crawshaw), so I wouldn’t say that his parable is “wrong” or “false”; I just don’t know enough to interpret it. And if my social awkwardness is making it harder for the underlying story to get out, that’s too bad.

On the other hand, had I not brought this up, very few people would’ve heard about this example anyway, and I doubt the background of the example ever would’ve come up either. In any case, I’m still curious.

Yes, but David Brooks (for example) has a chance every week to run a correction in his column. But it’s not so easy for book authors. Also, I’m still not sure that a correction is in order. I don’t know the full story here, maybe it’s correct and I’m just missing some key aspect.

Hmm…Can’t they just post errata on a webpage? Lots of books seem to have associated webpages. Or they could just blog about it. Or even post it here as a comment for starters. Or email you and I’m sure you’d be glad to post their reply.

I don’t buy your “it’s not so easy for book authors”. Where there’s a will…..Perhaps at heart is the desire & willingness to engage your critics.

Well, I suppose that if they did get the numbers just right, the reader would still be left with the message that there is a cost to privacy. However, by getting the numbers right, one can assess the magnitude of this cost. Nevertheless, the story seems to focus more on communicating the message that the cost to privacy actually exists, and not so much on particulars related to the weight of this cost.

Nicely put. And that unfortunately also summarizes a lot of social science publications and presentations. A conclusion or confirmation about how the world really works, or what people are really like, is discussed, but often very little weight is attached to the details of the data (other than the existence of asterisks in the tables). The easy ones to pick out are those where the reported effect size is too big to believe.

There’s a basic schema here: “Really? You think 75 deaths per year occur in a town because of privacy issues?” “Calm down there Andrew, at least we’re sure the number is more than zero.”

And we’re basically left with nothing more than our prior belief as to whether or not something is going on.

If I may add, the nature of a parable is that it will be misinterpreted, perhaps because the facts are so reduced they can be used as needed. And that also means they can be examined in great depth. Example: the good Samaritan. We can agree on the basics: a guy is lying in the street – the story says he was robbed but how could anyone know that? – 2 people see him but do nothing and the 3rd, a Samaritan, assists him. The meanings are simplified in Christian teaching. For example, if you search for “the good Samaritan”, you can find links on the first results page that talk about “shame” that would be suffered by the 1st passer-by, etc.

We tend to read the story as condemning the behavior of the 1st two but that’s not necessarily what it means. Jesus grew up in this society and we can only guess at its complexity but if the first people touched the man and he was dead that would have meant not just shame but ritual defilement of the entire community. That would invoke all sorts of elaborate purification rituals. And why is the 3rd guy a Samaritan? That’s a key point because Samaritans were not part of the community, though the reasons for this are highly distorted in most Christian sources. The essential point is that Samaritans believed – and still believe because there are some left – in a very fundamentalist vision in which they argue they are the truer Jews, the keepers of the original materials and worship from before the Babylonian exile. So this Samaritan helps out because he can touch a dead body … and so can most people because they’re not priests or Levites … and you are like the Samaritans and they are like you. The importance of him being a Samaritan is an echo of Ruth, who was a Moabite – the people described as enemies – but who demonstrated her worthiness and was accepted into the people.

So we often get some of the points right, like it’s better to help than walk away, but we add in layers of condemnation and judgement that don’t actually seem to have been intended. That is, no one did anything wrong. The story would be the same if the man lying on the road was a Samaritan helped out by a non-Levite Jew but then it wouldn’t be as inclusive about the nature of good people. I could go on but my main point is some story is distorted to tell another story, usually in this case one in which Judaism is rigid and Christianity is about love, etc. That this isn’t what the story says is beside the point.

I see something similar to the hospital story … or the idea that x number of people are homeless … or the clusters of leukemia people became obsessed with. I remember hearing homeless numbers 30 years ago and doing a quick mental calculation that if I allocated by state population N. Dakota would have 50k homeless. I couldn’t imagine where 50k homeless would go in N. Dakota given there are 3 cities with 50k and a total population of under 700k. With leukemia clusters, it didn’t take much to see people confused random with even spreading, something anyone who has ever looked up at the night sky could figure out.

We have a need to tell stories but we also have a need to fit those stories into other stories. It makes a better story if people die because of lack of coordination between an ER and some other facility – though why a mental health place is beyond me.

But to take this another step, when we distort – and we must distort – we tend to obscure the actuality. So we missed for countless years the idea that surgeons should wash hands before operating and then, even knowing that, we missed the obvious things like tracking how often central lines are cleaned and installed and how often hands are washed in general. Why? One reason is these important bits didn’t fit into the stories we were telling ourselves – even when in retrospect we wonder “why wasn’t it obvious that central lines get infected?” Whatever story we tell necessarily excludes parts that are true.

To be really “odd” about this, I relate this to tuning. Most people, since they don’t play an instrument, aren’t particularly aware of tuning at all. Many people don’t grasp that strings can be tuned exactly to some pitches but that then means they can’t play in certain keys. They aren’t aware that pianos aren’t tuned to what we think of as pitch – in general – but are, at least ideally, set so each key is the same distance away from really dead on. And so you have strings tuning to each other and some pianos feel better because they’re off or on in a way that fits your head. So if I like D major, then maybe I’m drawn to the piano that tells me a better D major story. Which means it leaves out some other stories because every tuning is a choice of a story to tell and every story is a way of presenting a point of view, not all.

A story or claim with incomplete info or questionable statistics is not thereby a parable. A trade-off between privacy of some sort and negative consequence of some other sort seems obvious enough, though frankly I don’t see how sharing records between an ER and a mental health clinic saves lives. So it’s not a very good illustration of the point.

What makes it parable-like to me is that it’s a narrative, presented as having actually occurred and with specific-looking facts (“some estimates . . . a certain smallish town”) but being unsourced and (upon reflection) improbable, that leads smoothly toward the counterintuitive lesson that the storyteller wanted to impart. Like you, I don’t see how sharing records between an ER and a mental health clinic saves lives (certainly not as many lives as claimed in the parable), but that’s the point. If you just listen to the parable without considering its plausibility, it leads straight to the point that the teller wants to convey. In contrast, an actual sourced story has all the prickliness of reality and typically does not send such a clear message. I can see why business writers like parables, but as a statistician and data-lover I prefer real, sourced stories.

The problem with the drug interaction theory is (based on the 1-2 preventable deaths per week statistic) the total US deaths attributable to adverse interaction with mental health drugs would have to be ~500,000 a year! That sounds bizarre.

Andrew, the thing that I find about this that is so amazing is that you are so shocked by these numbers. Let’s call it 75 people per year. Let’s say that this particular town was singled out because it had a relatively high number, and let’s say that it was near an important county mental health facility.

So we have a “smallish town” which I interpret as not as small as a small town but not as big as a “city”. Intuitively that means about 50,000 people to me because I live in SoCal and Pasadena is around 140k people and it’s a city but it could even be called a “town” in common usage. Arcadia is usually called a town and it’s 57k people.

So suppose you’ve got this small town, it has a hospital, it isn’t a fancy hospital. If you have a serious trauma case they would be taken to a larger city hospital.

Next, let’s dig into the death rates in the US population. We go to the CDC life tables:

Let’s pretend the whole population of people who die in this hospital consists of 30 year olds (elderly tend to die in hospice or elderly care facilities). Out of 100k people age 30 about 100 should die per year according to the 2009 life table. This means about 50 should die in our small town. However, many of them should be taken to more fancy trauma facilities… so perhaps only say 25 actually die in this local hospital.

Now, we have the information that there exists a mental illness care facility nearby. The mentally ill tend to have significant health problems. Their rate of death should be perhaps up to 10x the rate of the general population. So how many mentally ill patients are there? In this small town of 50,000 which has a mental illness care facility nearby, let’s say there are maybe as many as 5000 mental illness patients in this area. These patients would NOT be taken to other hospitals, and so would tend to die in the hospital.

Out of 5000 mental illness patients, at 10x the rate, we have 50 mental illness patients who die in this hospital, and they make up 66% of the people who die in the hospital. This is order-of-magnitude along the lines of “one or two per week”.

It’s not so out of whack to me. On the other hand, it’s not so out of whack when I take a moderately detailed look at the thing, so maybe god is in every leaf of every tree after all… ;-)
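
The estimate in the comment above can be reproduced with a minimal sketch. Every input is one of the commenter's own guesses (town of 50,000, 100 deaths per 100k at age 30, half of those deaths occurring locally, 5,000 patients at 10x the base rate); the function and parameter names are hypothetical, chosen for illustration. Note that the result counts all deaths among patients in the hospital, not deaths caused by missing records.

```python
# Reproduce the commenter's order-of-magnitude estimate.
# All default values are the guesses stated in the comment, not sourced data.

def estimate_hospital_deaths(
    town_population=50_000,
    base_death_rate=100 / 100_000,   # deaths per person per year at age 30
    share_dying_locally=0.5,         # the rest go to larger trauma facilities
    patients=5_000,                  # guessed number of mental health patients
    rate_multiplier=10,              # guessed excess death rate for patients
):
    """Return (general-population deaths, patient deaths) in the local hospital per year."""
    general = town_population * base_death_rate * share_dying_locally
    patient = patients * base_death_rate * rate_multiplier
    return general, patient

general, patient = estimate_hospital_deaths()
total = general + patient
print(f"general-population deaths in the local hospital: {general:.0f}")
print(f"patient deaths in the local hospital: {patient:.0f}")
print(f"patient share of hospital deaths: {patient / total:.2f}")
print(f"patient deaths per week: {patient / 52:.2f}")
```

With these inputs the sketch gives about 25 and 50 deaths a year, so patients account for about two thirds of the hospital's deaths, or a little under one a week. Changing any of the guessed inputs shifts the answer proportionally, which is why the result is only an order-of-magnitude figure.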

Still doesn’t show that this number died because of the lack of information flow between the ER and the mental health clinic. We still don’t have evidence that “if the records had been easier to match, they’d have been able to save more lives”.

I don’t know, I have a bit of indirect experience with seriously mentally ill people, so it doesn’t seem very unlikely to me that just about every mentally ill person who dies in a hospital might have been a preventable death due to information flow in some sense. Typical example:

Patient doesn’t get refill of meds, winds up psychotic, goes to hospital. Hospital has no idea what is wrong with patient, pumps them full of psych meds to keep them from freaking out. Patient appears to recover and because patient appears indigent is released. Patient still doesn’t have meds. Several days later patient jumps off balcony believing she can fly. Winds up in hospital again, during treatment has adverse reaction to meds and dies during surgery designed to treat cranial bleed…

I won’t get into details but I have people I know who work with psych patients, and I have family members or close friends who have close relationships with psych patients. That kind of thing sounds like a really typical day in the life of people who treat psych patients to me.

This kind of discussion is why I prefer grounded stories to floating parables. I want to know where is the town, what is the hospital, etc etc. I remain unconvinced that 10% of all deaths in a city of 50,000 (let alone a “smallish town”) could be “because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic.” But if the story were sourced, it’s something that we could discuss!

“Parable” is nicer than “statistical crime” as a way of describing this. A parable is a way of expressing an assertion which superficially resembles presentation of evidence in support of the assertion. Minimally, when using parables, one should be clear that one is doing so. But this may not be enough to stop people’s opinions being influenced by parables as if they were evidence.

Andrew, you are a trusting soul. Such a strange statistic usually indicates some peanut pushing. Eli can find a likely suspect. One David Crawshaw is a Google engineer who “builds infrastructure for better understanding search” and gave a talk at Columbia evidently hosted by Rachel Schutt. You may have seen it.

There is another who is an editor at the WSJ, but that appears less likely.

Given what he does, it makes perfect sense that he would look not too critically for applications of his tool