Surveys and Thought Experiments

I’m generally sceptical of the value of surveys, as currently conducted by practitioners of ‘experimental philosophy’, as a way of getting clear about what’s going on in philosophically interesting thought experiments. The most systematic reason for this scepticism comes from thinking about what exactly is going on in thought experiments.

Following Jonathan Ichikawa and Ben Jarvis, I think examples in philosophical works should be thought of as small fictions. In particular, they’re a type of genre fiction. The genre is the same one, broadly, that fables and parables fall into. (This way of thinking about thought experiments makes Aesop an important figure in the Western philosophical canon, which isn’t a bad result I think.) Like any kind of genre fiction, there are important interpretative constraints on these fictions. If you present the story in a different ‘mode’, it should be, and will be, interpreted differently.

The way in which fables/parables/thought experiments should be interpreted has some particularly quirky features.

At least some of the time, realism isn’t important. We don’t object to Aesop’s stories because they feature talking animals. We don’t object to parables if the story doesn’t really make sense from the perspective of non-central characters. (The striking effect we get when retelling a familiar biblical or mythical story from the perspective of a non-central character is something Nick Cave has made good use of over the years.) And we shouldn’t object to thought experiments because they require a curious series of coincidences. Indeed, the best fables/parables/thought experiments are often quite unrealistic because everything about the back story is so ‘neat’. They don’t have characters like Ulysses’ M’Intosh who simply don’t fit into the story, and people simply know, in a way that doesn’t leave the possibility of doubt even open, the things that are stipulated as true.

These stories are meant to have a point, or a moral. The intended point guides interpretation of the story. We’re meant to interpret the story in a way that makes it a fitting illustration of the moral. It would be wrong to interpret the story of the fox and the grapes as one in which the fox gets evidence that the grapes are sour and therefore leaves. And that would be wrong in part because the point is about our attitude towards what we cannot have.

The same thing is true of philosophical thought experiments, I think. It would be wrong to interpret the Gettier example as one in which the subject has independent evidence for the justified true belief that isn’t known, or in which they aren’t justified in inferring the target proposition because the evidence for it is from a source they have independent reason to doubt. It isn’t even really necessary to state this in the example, because once we know it’s an attempt to show that justified true belief without knowledge is possible, general principles of interpretation will fill in the details.

But that means we have to know what principle the example is meant to show. And that’s why I suspect there’s a deep problem here for experimentation on the examples. We suspect that telling people the point that an example is meant to show will seriously interfere with how they evaluate the example. (I assume this is why subjects in existing surveys are not normally told what hypothesis is being tested.) But not telling people the intended point of the example will interfere with how they interpret the example.

I suspect the best way out of this problem is to investigate people who do know the intended point of the example, and hence who know how it is meant to be interpreted. But I don’t think they are suitable subjects for a controlled experiment.

This point is related to the worry that it takes a bit of training to be able to distinguish between different hypotheses that the thought experiment might be intended to show. But I suspect it goes a little deeper. In principle we could explain the distinctions without telling people the intended outcome of the experiment, and hence ‘contaminating’ it. Not so if the intended outcome is an essential part of interpreting the story being told.

Having said all that, I want to strongly agree with something that Alan White said in the previous thread. Experimental work on subjects who aren’t familiar with the debate can tell us a lot about how people interpret these thought experiments. And that can be incredibly useful for communicating the results and arguments, either to colleagues in other disciplines, or to students. I know that’s not what experimental philosophers are aiming to show, but I think it’s a valuable side-effect of their work. Indeed, if I’m right about thought experiments being genre fictions, and the genre being one philosophical training makes you much more familiar with, we’re probably all making lots of mistakes about how people interpret our examples. Experimental work is an incredibly good way of correcting these misimpressions.

To be sure, I know that experimental philosophers are aiming much higher than merely clarifying philosophical examples. But I think those of us who are sceptical of some experimental work should not overlook the value it has that isn’t affected by extant criticism.

8 Replies to “Surveys and Thought Experiments”

I really like the idea of thought-experiment-as-genre; indeed, I like it so much I devoted a section to it in a paper of mine on the imagination, “Configuring the Cognitive Imagination” in New Waves in Aesthetics. I say this not as an excuse for a bit of self-promotion (well, not just for such an excuse, anyway!), but because I take a different line there than you do here about what the upshot of that idea is. In short, there’s no reason to respond to this situation with the kind of skeptical throwing up of the hands that you suggest here; rather, we should do with it exactly what good scientific practice would have us do: enumerate hypotheses and test them. There isn’t any interesting danger that the subjects will on the whole be interpreting the scenarios willy-nilly. Rather, one can consider specific ways in which it seems plausible that they might be interpreting them differently than one wants, and then one can do further studies with further manipulations, comprehension checks, etc. to evaluate those hypotheses. This is totally bog-standard psychological practice, but it’s not something philosophers are used to thinking about, so the difficulties can seem more insuperable in theory than they really are in practice, oddly enough.

I would note that for many of the extant x-phi studies, there’s really no particular reason to think that there’s much of a “genre mastery” effect in the results. It’s an important aspect of the cross-cultural work, for example, that the “Western” subjects overwhelmingly gave the canonical philosophical responses. If a segment of the subjects in those studies are interpreting the scenarios at odds with how philosophers standardly do, it seems due to a cultural difference in interpretations, not a difference in understanding what is to be done with the scenario.

Hi Brian, I am guessing many experimental philosophers would agree with your idea that there are problems with using thought experiments designed for philosophers in controlled experiments. As you point out, those thought experiments carry with them a lot of background assumptions that we should not expect ordinary subjects to make. I think your diagnosis of why this happens is interesting (and by the way can be tested). I just wanted to point out that a lot of experimental philosophers instead construct their own vignettes, which are especially designed for non-experts but are still closely connected to familiar thought experiments in philosophy. Schaffer and Knobe, for example, came up with their own vignette (closely related to familiar ones) to test whether raising the possibility of error affects knowledge ascription. It worked well for them precisely because they noted that in the philosopher’s thought experiment the possibility of error was not made salient enough for ordinary subjects reading it (it’s already salient for philosophers because, as you pointed out, they know the hypothesis). So here we have a case in which a familiar thought experiment was modified in specific ways and got them the expected result. And it seemed to work well even though subjects were not told the purpose of the experiment.

I would have thought another important role of thought experiments in philosophy is that of intuition pumps. They are devices that guide us toward the truth, especially modal truths. (Maybe you don’t think this role is so important because intuitions aren’t important?) The picture you present appears to be in tension with this supposed role.

Suppose when reading an article defending dualism, I encounter a thought experiment that is supposed to elicit the judgment that the brain is necessarily distinct from the mind. I have no antecedent theoretical commitments either way, but on a first read, I don’t come to have the intended judgment. Now, have I done something wrong? I certainly know that the thought experiment is intended to show that dualism is true. Am I supposed to modify my interpretation of the thought experiment so that I would come to the intended judgment? Even if I could do so, it is not clear what evidential value that judgment has; I might understand the author’s intention better, but I seem no closer to the truth.

In general, if a convention of this genre is to interpret the story so that you have the intuition that is intended, then thought experiments seem ill-suited as devices that guide us toward the truth. What evidential value should we assign to the judgments elicited?

Setting that theoretical issue aside, I have to agree with Jonathan that the picture you present raises no significant worry for experimental philosophy as it is practiced. Experimenters can design manipulations and controls to make sure that participants read the vignettes as intended. In fact, that is all the more reason for philosophers to play a role in designing experiments: their philosophical expertise can ensure that the manipulations and controls are appropriate. If the aim is to elicit folk judgments that pertain to philosophical theorizing, we had better not leave the task of experiment design to psychologists alone, many of whom are not as philosophically sophisticated as philosophers. (Though we should certainly borrow their expertise, through collaboration or otherwise, with other facets of experiment design and analysis.)

Interesting post. I like the idea. But it doesn’t seem quite right to say that, “not telling people the intended point of the example will interfere with how they interpret the example.” After all, lots of papers start with the example, and only tell us the point after (consider for example DeRose’s “Contextualism and Knowledge Attributions,” since it was relevant to the discussion this post spun out of). Presumably, we don’t generally have difficulty interpreting these thought experiments, even when the hypothesis isn’t familiar to us. Maybe you think the abstract is essential. Or maybe the best thing to attribute a purported difficulty of interpretation to isn’t fundamentally a lack of knowledge of the hypothesis, but rather a lack of experience with the genre. Philosophers, in virtue of being more familiar with the sorts of things philosophers argue for, may be better at ruling out irrelevant interpretations. This has a danger of just slipping into the expertise worry. And, of course, that philosophers are better at ruling out irrelevant interpretations, doesn’t mean others can’t do so (this suggests that the strategies Angel and Jonathan suggest may be fruitful). But I also worry that this way of explaining difficulties in interpretation may introduce a deep worry with thought-experiments. Specifically, it strains the analogy with most genres. It’s not as though an unfamiliarity with other works in the genre interferes deeply with interpreting fantasy, mystery, or sci-fi. (I haven’t read much sci-fi in the past, but I recently got on a Philip Dick kick w/o much interpretive difficulty.) But there is one genre where a lack of experience can pose a real interpretive difficulty: mythology. But here, it’s not familiarity with the genre that’s primarily relevant. Rather, familiarity with the specific culture is what’s important. 
But this suggests that it’s better to construe thought experiments not as examples of a genre, but as examples of a particular culture’s mythology. And that has obvious deep worries. (Obviously this, like the original post, doesn’t approach an argument for any position—and certainly not any position I’d adopt.)

I’d also just point out that, while experimental philosophy focused on philosophically interesting thought experiments is the predominant type, there are other experimental philosophical projects as well that don’t introduce worries about differences in interpretation of stories. Here are three examples:

In principle, lots of interesting philosophical hypotheses could be tested in ways that don’t require garnering people’s intuitive assessments of vignettes. (Lots of hypotheses in the philosophy of language come to mind, and linguists have also pursued experimental work on such topics.)

There’s also something to be said for surveying philosophers themselves, no? Perhaps this is especially so if your theory suggests results of an experimental manipulation or a between-groups difference that would not be antecedently obvious without the benefit of that theory.

Very interesting proposal. A call for clarification: Do you mean it to be merely descriptive? It seems that you do. That is, it seems you’re offering this characterization of thought experiments as a merely descriptive claim, such as: this is what philosophers have been doing (presumably in only recent history or so). If so, I’d worry about there being any unified conception that would cover even general trends. My sense is that philosophers have quite different views about the role of hypothetical cases in their arguments. Even if we can get a general trend for a certain literature (e.g. contextualism), this role might not carry over to others (e.g. moral responsibility). The former, for example, seems to have focused much more on linguistic practice while the latter seems a bit more like the story-heuristic kind of model.

You could be making a more normative claim, however, such as: this is how we should use thought experiments. And even then I’d worry about being a bit too restrictive. Perhaps the role should be more varied and tailored to the particular issue. For example, in the previous post about the NY Times article the issue was raised that maybe it does make more sense for epistemologists to be more concerned with the ordinary use of “knows” than those working in agency to be concerned with “free will,” “moral responsibility,” etc. If so, then the role of thought experiments might not be so uniform.

Furthermore, sometimes philosophy proceeds in a more destructive way. For example, influential philosopher McX might seem to make use of thought experiments in a way other than you describe (or prescribe). If her use is more amenable to testing via an x-phi survey methodology, then the use of that survey method might be useful to philosophy. We could at least potentially learn something about McX’s argument, e.g. whether it’s bunk.

So why not have a more pluralistic view about the use of thought experiments? If we should, then we probably can’t make many true sweeping claims about the lack of value of surveys, since they might be more useful for one project than for another.

I’m not sure just how much interpretive work is being called on in reading thought-experiment genre fiction. If I give some Gettier cases to men and women on the street, would it be sufficient for getting the right interpretation to put the Gettier cases inside another story? Say with something like, “One day, Sam and George were talking about knowledge. Sam said that knowledge was justified true belief. George disagreed and told the following story: …” Following that, I ask, “Did Smith know that the man who would be promoted had ten coins in his pocket?” (Assuming that the Gettier case is more or less the one that Gettier offered.)

If I were to give the same Gettier case with and without the intro script to two sets of otherwise similar untrained people, would you predict a difference in the answers? In what direction would you predict the difference to go, and why?
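
One way to make that prediction concrete: the with-intro versus without-intro comparison comes down to a difference between two proportions of “yes, Smith knew” answers. Below is a minimal sketch in Python of a two-sided two-proportion z-test. The function name and its arguments are illustrative assumptions; no such analysis is reported in this thread, and any counts passed in would have to come from an actual study.

```python
import math


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions.

    yes_a / n_a: knowledge ascriptions and group size in one condition
    yes_b / n_b: the same for the other condition
    Returns (difference in proportions, z statistic, two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one respondent")

    p_a = yes_a / n_a
    p_b = yes_b / n_b

    # Pooled proportion under the null hypothesis of no difference.
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in responses; the test is undefined")

    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value
```

The sign of the returned difference gives the direction asked about: a negative value means fewer knowledge ascriptions in the first group than in the second. The normal approximation is only reasonable for fairly large groups; with small samples an exact test would be the safer choice.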

There is something you are saying that I like: when we as trained philosophers read examples, we “simply know, in a way that doesn’t leave the possibility of doubt even open, the things that are stipulated as true.” I mean, we know the whole scenario is a fiction, and sometimes we get the crawling suspicion that it involves stipulations that will end up being inconsistent, but you are surely right that we work cooperatively with the author of the book or article to try to take all stipulations on board to the best of our abilities, as far as we can.

This way of reading may take effort, and casual readers may or may not feel inclined to expend it. As Jonathan Weinberg points out, it’s an empirical question just to what extent any particular subject or group of subjects has absorbed the stipulations of a case, and we can get somewhere by investigating the level of attention devoted to a story (looking at response times, motivation, eye-tracking, tests of recall of scenario content, etc.). I’d actually be inclined to extract the opposite conclusion from Jonathan’s on cross-cultural work, though — following the work of people like Hugo Mercier, my own sense of the literature there is that differing patterns of response in different cultures have generally been achieved by failing to motivate/engage the different cultural groups equally.

Mercier claims that across cultures, argumentative contexts optimize performance on syllogistic reasoning tasks. If that’s right, then to the extent that philosophical scenarios involve grasping logical relations it may help performance with them to be in an argumentative mindset, conscious that the author of the article is attempting to convince us one way or the other. Mercier thinks the argumentative mindset is one that produces the same sorts of effects on educated urban people and illiterate members of small-scale societies. (Wim de Neys has also done some interesting work against the idea of strong qualitative differences in human reasoning — focusing on individuals with high and low IQ.)

I like Jlive’s suggestion that people primed to think about an argument will perform differently — I would expect them to be more inclined to give the right answer to the case (that is, Gettier’s response — although the original Gettier cases are awkward, and there are later and better cases that produce stronger responses, like Lehrer’s Havit/Nogot).

Directing attention to the proposition at stake might also make them more attentive to the case. I’ve been doing comprehension screens on my Gettier cases and it’s pretty amazing how many people fail. Like, 10% don’t keep track of which guy actually had the car — or maybe they are just being totally rational and clicking randomly so they can finish the stupid survey and get out of there. But of course, as Jonathan points out, we have ways of telling who those people are.
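
A comprehension screen of this kind can be sketched as a simple filter applied before any ascription rates are computed. The record fields and function names below are hypothetical, chosen only for illustration; they are not the materials or the analysis from the study described above.

```python
from dataclasses import dataclass


@dataclass
class Response:
    participant_id: str
    comprehension_answer: str  # e.g. the answer to "who actually had the car?"
    ascribes_knowledge: bool   # True if the participant chose "knows"


def screen_by_comprehension(responses, correct_answer):
    """Split responses into those that pass and those that fail the check."""
    target = correct_answer.strip().lower()
    passed, failed = [], []
    for r in responses:
        # Compare case-insensitively and ignore stray whitespace.
        if r.comprehension_answer.strip().lower() == target:
            passed.append(r)
        else:
            failed.append(r)
    return passed, failed


def ascription_rate(responses):
    """Share of responses ascribing knowledge, or None for an empty list."""
    if not responses:
        return None
    return sum(r.ascribes_knowledge for r in responses) / len(responses)
```

Keeping the failed responses rather than discarding them makes it possible to report the failure rate and to check whether excluding them changes the result.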

Attention is a big issue: especially if one is running a complex Gettier or stakes case, if there are more than 6 things that the reader has to keep in mind simultaneously, watch out for performance failure. Or be at least aware that subjects have to be pretty alert and motivated to get the @#6&! story straight. I sympathize with them entirely — I’m an extremely anxious trained professional, as motivated as can be, but I feel more than a bit queasy when I have to work through a really complex scenario (Shope has some Gettier cases that make me want to lie down in a darkened room with a damp cloth over my face).

Finally, about genre, I have to tell you an anecdote. In an experiment I ran this spring, I had a bunch of filler questions which involved (what I thought were) scenarios depicting the attainment of knowledge by various means (to counterbalance the preponderance of non-knowledge deviant belief and JFB cases in the scenarios I was really interested in). For my fillers, as for my test cases, I had scenarios involving testimony, perception, etc., etc. and realized at the end that I should have some fillers involving knowledge via deduction. I wrote a scenario about a detective whose evidence “conclusively established” that the culprit had to be one of three people, and then the scenario indicated (with nice factive verbs) that he had succeeded in proving that two of these could not have committed the crime. The detective was then said to have deduced that the third party was the culprit. I expected to see knowledge ascription rates of 85-90% on this (about the ceiling). I got only 28% ascribing knowledge, the rest ascribed mere belief. What the hell?

I asked my 12-year-old son about this scenario, because it was the only one he answered “incorrectly” when he piloted the study for me. “Mom, you know in a story like that, when the author really makes it look like the one guy did it, it always has to end up being someone completely different… .”

When I told my collaborator in Psych about this later, he was pretty amused that I had ever expected my subjects to automatically engage in what they call “closed-world reasoning”, taking the stipulations as law… . (But he also said that these sorts of effects of genre are well-known hazards… .)