What Philosophy of Science Can Say for Software Engineers

I spent part of my year in Cambridge reading the History and Philosophy of Science course. It has been a thrilling and enlightening course, and I cannot recommend it highly enough for anyone lucky enough to take the HPS strand at Cambridge. Of course, I was a bit of an odd one out, since the course is designed for Natural Science majors, and I am, of course, a Computer Scientist.

In the next two posts, I’d like to highlight some of the major themes of the Philosophy of Science course, and how they may be applicable to software engineers. (Notably not computer scientists: it seems likely that their philosophy is one rooted in the Philosophy of Maths.) Not all of the questions are relevant: an old tripos question asks “Is there a unified philosophy of science, or disparate philosophies of the sciences?”—I would likely answer “both.” But I think the existing corpus of knowledge can give some insights to some tenacious questions facing us: What constitutes the cause of a bug? How does a software engineer debug? How do we know if a particular measurement or assessment of software is reliable? What reason do we have for extending our realm of experience with a software to areas for which we have no experience? Can all explanations about the high-level behavior of code be reduced to the abstractions behind them? I should be careful not to overstate my case: undoubtedly some of you may think some of these questions are not interesting at all, and others may think the arguments I draw in not insightful at all. I humbly ask for your patience—I am, after all, being examined on this topic tomorrow.

Causation

What does it mean when we say an event causes another? This is one of those questions that seem so far removed from practicality to be another one of those useless philosophical exercises. But the answer is not so simple. The philosopher David Hume observes that when we speak of causation, there is some necessary connection between the cause and effect: the bug made the program crash. But can we ever observe this “necessary connection” directly? Hume argues no: we only ever see a succession of one event to another; unlike the programmer, we cannot inspect the source code of the universe and actually see “Ah yes, there’s the binding of that cause to that effect.”

One simple model of causation is the regularity theory, inspired by a comment Hume makes in the Enquiry: a cause is “an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second.” I observe that every event of “me pressing the button” is immediately followed by “the program crashing”, then I might reasonably infer that pressing the button is the cause of the crash. There is nothing unreasonable here, but now the philosopher sees the point of attack. There are many, many cases where such a simple regularity theory fails. Consider the following cases:

I press the button, but the program only crashes some of the time. Even if the bug is not 100% reproduceable, I might still reasonably say it causes the crash.

An alert dialog pops up, I press the button, and the program crashes. But it was not my pressing the button that caused the crash: rather, it’s more likely it was whatever caused the alert dialog to pop up. (You may have had an experience explaining this to a less computer-savvy family member.)

I have only pressed the button once, and that one time the program crashed. It is indeed the case that whenever I pushed the button, a crash came afterwards: but it’s possible for me to press the button now and no crash to occur.

Perhaps no reasonably practiced software engineer uses this model of causation. Here is a more plausible model of causation, the counterfactual model (proposed by David Lewis). Here we pose a hypothetical “if” question: if pushing the button causes a crash, we may equally say “if the button had not been pressed, then the crash would not have happened.” As an exercise, the reader should verify that the above cases are neatly resolved by this improved model of causality. Alas, the counterfactual model is not without its problems as well:

Suppose that our crashing program has two bugs (here we use “bug” in the sense of “source code defect”). Is it true that the first bug causes the crash? Well, if we removed that bug, the program would continue to crash. Thus, under the counterfactual theory of causation, the first bug doesn’t cause the crash. Neither does the second bug, for that matter. We have a case of causal overdetermination. (Lewis claims the true cause of the bug is the disjunction of the two bugs. Perhaps not too surprising for a computer scientist, but this sounds genuinely weird when applied to every-day life.)

Suppose that our crashing program has a bug. However, removing the first bug exposes a latent bug elsewhere, which also causes crashes. It’s false to say removing the first bug would cause the crashing to go away, so it does not cause the crash. This situation is called causal preemption. (Lewis’s situation here is to distinguish between causal dependence and causal chains.)

What a software engineer realizes when reading these philosophers is that the convoluted and strange examples of causation are in fact very similar to the knots of causality he is attached to on a day-to-day basis. The analysis here is not too complicated, but it sets the stage for theories of laws of nature, and also nicely introduces the kind of philosophical thinking that encourages consideration of edge-cases: a virtuous trait for software engineers!

Methodology and confirmation

One of the most famous debates in philosophy of science to spill over into popular discourse is the debate on scientific methodology—how scientists carry out their work and how theories are chosen. I find this debate has direct parallels into the art of debugging, one of the most notoriously difficult skills to teach fledgling programmers. Here we’ll treat two of the players: inductivism (or confirmation theory) and falsificationism (put forth by Karl Popper.)

Sherlock Holmes once said this about theories: “Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” He advocated an inductivist methodology, in which the observer dispassionately collects before attempting to extract some pattern of them—induction itself is generalization from a limited number of cases. Under this banner, one is simply not allowed to jump to conclusions while they are still collecting data. This seems like a plausible thing to ask of people, especially perhaps profilers who are collecting performance data. The slogan, as A.F. Chalmers puts it, is “Science is derived from facts.”

Unfortunately, it is well known among philosophers of science that pure inductivism is deeply problematic. These objects range from perhaps unresolvable foundational issues (Hume’s problem of induction) to extremely practical problems regarding what scientists actually practice. Here is a small sampling of the problems:

What are facts? On one level, facts are merely sense expressions, and it’s an unreasonable amount of skepticism to doubt those. But raw sense expressions are not accessible to most individuals: rather, they are combined with our current knowledge and disposition to form facts. An expert programmer will “see” something very different from an error message than a normal end-user. Fact-gathering is not egalitarian.

Facts can be fallible. Have you ever analyzed a situation, derived some facts from it, only to come back later and realize, wait, your initial assessment was wrong? The senses can lie, and even low-level interpretations can be mistaken. Inductivism doesn’t say how we should throw out suspicious facts.

Under what circumstances do we grant more weight to facts? The inductivist says that all facts are equal, but surely this is not true: we value more highly facts which resulted from public, active investigation, than we do facts that were picked up from a private, passive experience. Furthermore, an end-user may report a plethora of facts, all true, which an expert can instantly identify as useless.

And, for a pure bit of philosophy, the problem of induction says that we have no reason to believe induction is rational. How do we know induction works? We’ve used in the past successfully. But the act of generalizing this past success to the future is itself induction, and thus the justification is circular.

This is not to say that inductivism cannot be patched up to account for some of these criticisms. But certainly the simple picture is incomplete. (You may also accuse me of strawman beating. In an educational context, I don”t think there is anything wrong here, since the act of beating a strawman can also draw out weaknesses in more sophisticated positions—the strawman serves as an exemplar for certain types of arguments that may be employed.)

Karl Popper proposed falsificationism as a way to sidestep the issues plaguing induction. This method should be another one that any software engineer should be familiar with: given a theory, you then seek an observation or experiment that would falsify it. If it is falsified, it is abandoned, and you search for another theory. If it is not, you simply look for something else (Popper is careful to say that we cannot say that the theory was confirmed by this success).

Falsification improves over inductivism by embracing the theory-dependence of observation. Falsificationists don’t care where you get your theory from, as long as you then attempt to falsify it, and also accept the fact that there is no way to determine if a theory is actually true in light of evidence. This latter point is worth emphasizing: whereas induction attempts to make a non-deductive step from a few cases to a universal, falsification can make a deductive step from a negative case to a negative universal. To use a favorite example, it is logically true that if there is a white raven, then not all ravens are black. Furthermore, a theory is better if it is more falsifiable: it suggests a specific set of tests.

As might be expected, naive falsificationism has its problems too, some which are reminiscent of some problems earlier.

In light of a falsification, we can always modify our theory to account for this particular falsifying instance. This is the so-called ad hoc modification. “All ravens are black, except for this particular raven that I saw today.” Unfortunately, ad hoc modifications may be fair play: after all, there is no reason why software cannot be modified for a particular special case. Better crack open the source code.

Falsificationism suggests we should always throw out a theory once we have seen falsifying evidence. But as we saw for inductivism, evidence can be wrong. There are many historic cases where new theories were proposed, and it was found that they didn’t actually fit the evidence at hand (Copernicus’s heliocentric model of the universe was one—it did no better than the existing Ptolemaic model at calculating where the planets would be.) Should these new theories have been thrown out? Real scientists are tenacious; they cling to theories, and many times this tenacity is useful.

To turn this argument on its head, it is never the case that we can test a theory in isolation; rather, an experimental test covers both the theory and any number of auxiliary assumptions about the test setup. When a falsifying test is found, any one of the theory or auxiliary assumptions may be wrong—but we don’t know which! The Duhem-Quine thesis states that given any set of observations, we are always able to modify the auxiliary assumptions to make our theory fit (this thesis may or may not be true, but it is interesting to consider.)

All of these problems highlight how hard it is to come up with an accurate account of what is called the “scientific method.” Simple descriptions do not seem to be adequate: they sound intuitively appealing but have downsides. The practicing scientist is something of an opportunist: he does what works. So is the debugger.

Next time, I hope to talk about quantification, measurement and reduction.