May 2018

I find it fascinating when an idea seems to be “in the air,” when I run into repeated instances of a concept for no obvious reason. Not too long ago, in a relatively short span of time, I encountered three cases that illustrate the importance of remembering that affirming the consequent is (still) a logical fallacy.

You can follow that link, of course, but the gist of it is that the following argument is not valid:

If P, then Q

Q

Therefore, P

In words: If P implies Q and you observe Q, you cannot conclude that P holds. Antecedents other than P may also imply the consequent Q.

Of course, P is a sufficient condition for Q, and Q is a necessary condition for P, so both of the following are valid:

modus ponens

If P, then Q

P

Therefore, Q

modus tollens

If P, then Q

not Q

Therefore, not P
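
If you would rather check these three forms than take them on faith, a brute-force truth table does the job. The following is a minimal Python sketch; the helper names (implies, is_valid) are illustrative choices, and "if P, then Q" is treated as the material conditional, which is an assumption of the sketch.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p, then q' is false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion):
    """A form is valid if no truth assignment makes all premises true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False  # found a counterexample
    return True


# Each premise and conclusion is a function of the truth values of P and Q.
modus_ponens = is_valid([implies, lambda p, q: p], lambda p, q: q)
modus_tollens = is_valid([implies, lambda p, q: not q], lambda p, q: not p)
affirming_the_consequent = is_valid([implies, lambda p, q: q], lambda p, q: p)

print(modus_ponens, modus_tollens, affirming_the_consequent)  # True True False
```

The counterexample that sinks affirming the consequent is the row where P is false and Q is true: both premises hold, but the conclusion does not.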

The Wikipedia article linked above gives some less abstract examples.

Okay, so one encounter I had with affirming the consequent was in a Twitter argument about p values, another was in a blog post about scientific realism, and the third was in one of the funnier academic papers I’ve ever read.

Argument about p values

In honor of Ronald Fisher’s birthday, Deborah Mayo tweeted “Ban bad science, not statistical significance tests. R.A. Fisher: 17 February 1890–29 July 1962”, along with an image containing this quote from Fisher: “No isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon.”

I responded “Fisher’s position seems to imply that stat[istical] sig[nificance] is irrelevant for any individual result. A p value can provide graded information about a discrepancy, but there’s no reason to categorize as stat[istically] sig[nificant] or not. A reader can decide if a p is small enough for his/her purposes, right?”

To which Mayo responded “To spoze it’s irrelevant is to say any necessary but insufficient piece of info is irrelevant. A genuine effect in sci[ence] differs from an indication worth following up. Evidence of an exper[imental] phen[omenon] for Fisher requires being able to trigger results that rarely fail to reach stat[istical] sig[nificance].”

You can follow the links to see a couple more responses, but this last one is my focus here.

I don’t remember seeing it stated so plainly before that a small p value is a necessary, but not sufficient, condition for there being a true effect. In case you need convincing, here are a couple of Shiny apps you can fiddle with to illustrate this fact: one, two.

This seems very obvious to me in retrospect, but I’m not sure I really appreciated the implications of this until recently. Specifically, the logic of modus tollens and modus ponens, and the illogic of affirming the consequent, tell us that even if a true effect (probabilistically) implies a small p value, we can’t conclude (only) from a small p value that a true effect exists. Small p values can occur for reasons other than the presence of a true effect (see, e.g., this paper for some discussion of this).
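
To make this concrete, here is a small simulation sketch in Python. Everything in it is an assumption chosen for illustration rather than anything taken from the tweets or the linked apps: a two-sided z test with known standard deviation, 50 observations per study, a true effect of 0.5 standard deviations, and a 10% base rate of true effects among tested hypotheses.

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p value for H0: mean = 0, assuming a known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def share_small_p(effect, n=50, n_studies=5000, alpha=0.05, seed=1):
    """Fraction of simulated studies with p < alpha when the true mean is `effect`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            hits += 1
    return hits / n_studies


small_p_given_null = share_small_p(effect=0.0)    # small p values with no true effect
small_p_given_effect = share_small_p(effect=0.5)  # small p values with a true effect

# Assumed base rate of true effects (hypothetical, for illustration only).
prior = 0.10
effect_given_small_p = (small_p_given_effect * prior) / (
    small_p_given_effect * prior + small_p_given_null * (1 - prior)
)

print(small_p_given_null, small_p_given_effect, effect_given_small_p)
```

Under these assumptions, roughly one in twenty null studies still produces a small p value, so the probability of a true effect given a small p value depends heavily on the assumed base rate. The same run also shows that the implication from true effect to small p value is only probabilistic, since some studies with a true effect miss the threshold.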

On the other hand, this suggests that we can conclude from the absence of a small p value (i.e., not Q) that there is no true effect (i.e., not P). This is just (probabilistic) modus tollens. This in turn suggests that null hypothesis significance testing (NHST) gets everything exactly backwards (and that maybe McShane and Gal (2015, 2017) should be praising their subjects’ grasp of logic rather than deriding their misunderstanding of statistical inference).

But Mayo’s comment about the “necessary but insufficient” relationship between small p values and true effects reminded me that it’s important to remember not to affirm the consequent. If NHST can justify rejecting the null (e.g., via severity; see section 1.3 of this pdf), it cannot do so because the presence of a true effect implies a small p value.

Scientific realism

In a very interesting blog post about scientific realism and statistical methods, Daniel Lakens writes:

Scientific realism is the idea that successful scientific theories that have made novel predictions give us a good reason to believe these theories make statements about the world that are at least partially true… [and that scientific realists] think that corroborating novel and risky predictions makes it reasonable to believe that a theory has some ‘truth-likeness’, or verisimilitude. The concept of verisimilitude is based on the intuition that a theory is closer to a true statement when the theory allows us to make more true predictions, and less false predictions.

Putting aside the argument that theories cannot be tested in isolation (i.e., the Duhem-Quine thesis), I think it is pretty reasonable, at least to a first approximation, to argue that a true theory will make accurate predictions about observable phenomena. But the history of science is full of examples of theories that (we have decided) are not true but that nonetheless made at least some accurate predictions.

This is a key component of Laudan’s pessimistic meta-induction, which Lakens summarizes as “If theories that were deemed successful in the past turn out to be false, then we can reasonably expect all our current successful theories to be false as well.”

Lakens doesn’t explain why he disagrees with Laudan here, and it seems to me that he (Lakens) doesn’t fully appreciate that inferring truth or truthlikeness from accurate predictions is an(other) instance of affirming the consequent. As long as false theories can, on occasion, make accurate predictions, accurate predictions cannot license inferences of verisimilitude.

(Since I’m already attacking scientific realism, I may as well point out that it’s never been clear to me how truthlikeness is acceptable to realists. If a theory is only partially true, then it’s also partially untrue. But then which aspects of a theory do we require to be truthlike? And which aspects can we accept as not truthlike? And, even if, per Lakens, “only realism can explain the success of science,” it’s not at all clear that realism can explain how or why the verisimilitude of theories can steadily increase, on the one hand, while theory change can consist of (partially) mutually incompatible theories, on the other. If a later theory comes to replace an earlier theory, and if the later theory rules out some properties or mechanisms of the earlier theory, in what way can the earlier theory be said to have been truthlike? I think these are substantial, perhaps insurmountable, difficulties for the scientific realist.)

Survey validation

Maul (2017; pdf) is one of the funniest academic papers I’ve read. Granted, this is a low bar to clear, and the paper has non-humorous implications, but it made me laugh.

In the first study, Maul adapted a survey designed to measure “growth mindset”, the belief that intelligence is malleable rather than fixed. The adaptation consisted of substituting the nonsense word gavagai for “intelligence” or “intelligence level” in each survey item. In the second study, the survey items were replaced in their entirety with lorem ipsum text (i.e., nonsense). In the third study, the items consisted of nothing at all – just an item number and a six-point Likert scale ranging from “strongly disagree” to “strongly agree” (this is the bit that made me laugh).

In each study, Maul calculated reliability statistics, did exploratory factor analysis (ostensibly to test the dimensionality of the constructs measured by the surveys), and calculated correlations between the survey items and (ostensibly) theoretically-related outcomes. That is, Maul carried out standard procedures for estimating survey reliability and validity.

Maul found acceptable reliability, acceptable single-factor solutions, and in the first two studies, statistically significant correlations with various measures of “Big Five” personality traits. He goes into some detail in discussing the implications of all this, and I think the paper is well worth reading.

But if you’ve made it this far, then you won’t be surprised to find out that my purpose in bringing it up here is to point out that it’s affirming the consequent to infer from acceptable reliability statistics, factor analysis solutions, and/or other-outcome correlations that a survey is valid.

If you have a valid survey, then you should get acceptable reliability and validity statistics. But Maul’s results illustrate very clearly that the converse is not true. A valid survey is not the only way to get such acceptable statistics.
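
For a rough sense of how this can happen, here is a Python sketch of one hypothetical mechanism. It is not Maul's analysis or data. It assumes each simulated respondent has a content-free response tendency (some people lean toward "agree", others toward "disagree") and answers eight blank items on a six-point scale accordingly. All parameter values are made up for illustration.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [statistics.variance([row[j] for row in rows]) for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def simulate_contentless_survey(n_respondents=500, n_items=8, seed=1):
    """Simulate answers driven only by a person-level response style, not item content."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        style = rng.gauss(0.0, 1.0)  # hypothetical tendency to agree or disagree
        row = []
        for _ in range(n_items):
            raw = 3.5 + style + rng.gauss(0.0, 1.0)  # style plus item-level noise
            row.append(min(6, max(1, round(raw))))   # clamp to the 1-6 scale
        rows.append(row)
    return rows


alpha = cronbach_alpha(simulate_contentless_survey())
print(round(alpha, 2))
```

Under these assumptions, alpha comes out well above the conventional 0.7 cutoff even though the items measure nothing, which is the point: acceptable statistics (Q) do not establish a valid survey (P).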

Conclusion

I teach undergraduate and graduate versions of a research methods course, and in an early lecture I emphasize that research is systematic. A key component of this systematicity is basic logic. It’s not always easy, but it is always important, to remember the lessons of basic logic. Key among these lessons is the fact that affirming the consequent is now, and will forever be, a fallacy.