When water freezes, it expands. Thus, my latest invention places a weight on a column of water, then freezes the water to lift the weight, doing work. By using a weight heavy enough that the work done is greater than the energy expenditure to freeze the water, I create a perpetual motion machine.

The Second Law of Thermodynamics tells you that this is impossible. But it tells you so in a non-constructive way. It doesn’t say which step is mistaken, what unmentioned physical mechanism is required for its satisfaction, or exactly how it works out mathematically. Even partial explanations—say, water under pressure takes more energy to freeze—aren’t completely satisfactory, leaving questions like: Why does it work out no matter how much weight I load it with? How would it be different for something like mercury that contracts when freezing? The Second Law only tells you it’s hopeless.a
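As a back-of-envelope check on that partial explanation, the Clausius–Clapeyron relation along the melting line, dT/dP = T·Δv/L, quantifies how much pressure shifts the freezing point. A minimal sketch (the numeric values are standard handbook figures of my own choosing, not from this post):

```python
# Clausius-Clapeyron along the water/ice melting line: dT/dP = T * dv / L.
# All numbers are standard handbook values (assumed, not from the post).
T = 273.15            # K, melting point at 1 atm
L = 334e3             # J/kg, latent heat of fusion of water
v_liquid = 1.000e-3   # m^3/kg, specific volume of liquid water
v_ice = 1.091e-3      # m^3/kg, specific volume of ice (ice is less dense)
dv = v_liquid - v_ice # negative for water, positive for mercury

dT_dP = T * dv / L        # K per Pa
print(dT_dP * 101325)     # about -0.0075 K per atm of added pressure
```

The sign flips for a substance like mercury that contracts on freezing, which is one way to start seeing why the two cases must behave differently.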

That’s fine when you’re deciding whether or not to invest in my startup, but if you’re trying to learn about the world, you should feel a persistent itch, a gap where an explanation should go. And it’s not the simple sensation of not knowing. You might describe it as tension, confusion, or even paradox.

There are (at least) two major reasons to pay attention to the use of paradox in teaching and learning. The first is that itch—it’s motivating. There’s not merely a blank space in your picture of the world—there’s an apparent contradiction. If you try to fill in the gap from different directions, you get different answers. One of my major qualitative takeaways from my old self-experiment on noticing confusion was that in cases like this I didn’t even need any kind of trigger-action plan to follow up on the noticing, since the need to resolve it was so powerful—and even having written that, I continue to underestimate the power of this effect.

So if you’re psychologically anything like me, for the sake of motivation alone, I suggest you look for ways to manufacture contradictions out of gaps in your knowledge.

The second thing to pay attention to is what happens when you resolve the paradox. On the most benign level, you get a fuller picture of why things are the way they are when you see that they couldn’t be some other way. In a sense, no explanation is complete without this. Going a bit deeper, doing this helps you develop a more integrated mental model—you don’t have recipes for particular calculations, but mechanisms that need to be considered wherever applicable. Explanations can’t just be one-off descriptions of unexplained phenomena, but instead are part of the machinery that keeps everything adding up to normality.

So when debugging, try to prove that your code works. If you don’t know the answer to a question, find a reasonable-seeming answer that evidently doesn’t work and ask why. If you don’t know what something is, ask how it’s different from something that seems like it should be similar. Imagine counterexamples and attempt proofs by contradiction. “Why do things this way?” can manufacture questions of the form “why not do things this other way?”—even if the other way is overly simple, or you don’t have a good explanation for doing it the other way either, replacing the former question with the latter can be very effective. “How does the argument for this position work?” gives you “why doesn’t your argument go through in this other domain?”
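One way to cash out "try to prove that your code works" is to write the proof obligations as executable assertions, so the attempted proof and the debugging happen in the same place. A toy sketch of my own (binary search with its loop invariant asserted), not an example from the post:

```python
def binary_search(xs, target):
    """Return an index of target in sorted xs, or -1 if absent."""
    lo, hi = 0, len(xs)  # invariant: target, if present, lies in xs[lo:hi]
    while lo < hi:
        # Stating the invariant as runnable asserts is the attempted "proof";
        # a step you can't justify is exactly where the bug lives.
        assert all(x != target for x in xs[:lo])
        assert all(x != target for x in xs[hi:])
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1     # everything at or below mid is too small
        elif xs[mid] > target:
            hi = mid         # everything at or above mid is too big
        else:
            return mid
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))   # 3
print(binary_search([1, 3, 5, 7, 9], 4))   # -1
```

The classic off-by-one variants (say, hi = mid - 1 under this invariant) trip the asserts immediately instead of silently missing elements.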

Be especially watchful for incomplete explanations—the ones that point to the right answer but don’t know their own limits. The sky is blue rather than red because of the wavelength dependence of Rayleigh scattering, but why is it blue rather than violet?

But be careful—the feeling of resolving paradoxes can be extremely satisfying, and the impression of depth of explanation that you get can be unduly persuasive. It’s important to avoid accidentally drilling “things that seem confusing are always nothing to worry about”, even if it’s by resolving the confusion instead of suppressing it. Instead, you want to frame this kind of pedagogy as “things that seem unsatisfactory are worth poking”—using points of confusion not to show the power of the right simple foundations to resolve them, but to expose new ideas and emphasize that they’re necessary to make things work out right.

A lot of E. T. Jaynes’s writing works by setting up veridical paradoxes and knocking them down. It has a powerful psychological effect on a certain mindsetb—as exposition it’s both engaging and persuasive. But it can easily leave one in ignorance of why the advocated approach might be unsatisfactory. You begin to feel like anything opaque or confusing has some natural explanation in terms of what you already know. You feel safe black-boxing any mystery as essentially already resolved, as it awaits only the application of the same foundations as everything else.

This resolving-paradoxes-as-rhetorical-style is especially explicit in Jaynes’s “Clearing Up Mysteries”, from a 1988 MAXENT workshop:

The recent emphasis on the data analysis aspect stems from the availability of computers and the failure of “orthodox” statistics to keep up with the needs of science. This created many opportunities for us, about which other speakers will have a great deal to say here. But while pursuing these important applications we should not lose sight of the original goal, which is in a sense even more fundamental to science. Therefore in this opening talk we want to point out a field ripe for exploration by giving three examples, from widely different areas, of how scientific mysteries are cleared up, and paradoxes become platitudes, when we adopt the Jeffreys viewpoint. Once the logic of it is seen, it becomes evident that there are many other mysteries, in all sciences, calling out for the same treatment.

In his first example, diffusion, he states a naive paradox: our particles have a symmetric velocity distribution, so how do we get nonzero flux anywhere? He then describes Einstein’s derivation of Fick’s diffusion law as unsatisfactory, and then gives a clear derivation using Bayes’ theorem.

I had to look up Einstein’s argumentc, and it’s true that it derives the diffusion equation in terms of velocity distributions and particle density over time without reference to any instantaneous flux, whereas Jaynes’s version is from the perspective of a single particle at a single instant in time (the rest coming in through the prior—a particle is more likely to have come from a denser region) and gives the flux among other results.

But the derivation/presentation I first learned in stat mechd is at least as clear about what I see as the main point—given a surface, more particles pass through from the denser side—without invoking Bayes. It’s also more immediately obvious to me how to adapt it to other problems, although that might just be because I’m not as used to the methods Jaynes likes.

Did Jaynes find that one unsatisfactory too? He doesn’t bring it up, maybe because he hadn’t seen it or because it made a weaker point than talking about how handicapped Einstein was by not knowing the true logic of science. I’d guess it was unsatisfactory to him, since from a certain perspective it still depends on “forward integration” that Jaynes prefers to think about as prior information, although it can be trivially rephrased.

But why should I care about deriving a flux from a probability distribution for a single particle rather than from the perspective of a single surface? That seems less important to me than the clarity of the moral lesson and the general adaptability. And the “paradox” is still resolved by the same moral lesson and by noting that while the statistical velocity distribution of all particles has zero mean, you can still have flux across a specified surface—the mean velocity only tells you about the motion of the particles’ center of mass, which indeed won’t move under free diffusion! It’s natural that taking some subset of the particles should give you a different distribution, and there are natural ways of thinking about it that aren’t in Jaynes’s terms—just think of the motion of rocks in Saturn’s rings.e
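That resolution is easy to check numerically. A minimal simulation of my own (not Jaynes's or Einstein's derivation): give every particle the same symmetric step distribution and watch the flux across one surface come entirely from the density gradient:

```python
import random

random.seed(0)

# 1D particles with a linear density gradient: more particles at smaller x.
particles = []
for x in range(-10, 10):
    particles += [x] * (1000 * (10 - x))

# Every particle draws from the SAME symmetric step distribution...
net_crossings = 0
for x in particles:
    x_new = x + random.choice([-1, 1])
    if x < 0 <= x_new:
        net_crossings += 1   # crossed the surface at x = 0, left to right
    elif x_new < 0 <= x:
        net_crossings -= 1   # crossed right to left

# ...yet the flux across x = 0 is nonzero, simply because more particles
# sit just left of the surface than just right of it.
print(net_crossings)   # positive: net flow from dense side to dilute side
```

Nothing here tracks any particle's provenance in Bayesian terms; conditioning on which side of the surface a particle starts from is what breaks the symmetry, whether you phrase that as a prior or as counting.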

Thus, a slightly more specific warning: a benevolent & complete Paradox Pedagogy needs to remember that the resolution to a paradox (e.g. how do we get net particle flux anywhere if the velocity distribution is symmetric) has to explain

how the paradox is actually resolvedf

what the naive reasoning was implicitly relying ong

why it’s not a paradox; i.e., why you shouldn’t have expected the naive reasoning to workh

ideally making it clear that the reason it’s not a paradox is somewhat independent of any particular resolution itself.

If you only give one or two of these, then you’re at greater risk of being unduly persuasive regarding the One-True-Way-ness of your methods. And it’s not just a matter of intellectual etiquette or scrupulous hedging. You really won’t have explained everything—there will be lingering confusion, noticed or otherwise.

How far could this be taken? Could one organize a physics course around rejecting naive arguments for false conclusions? Of course, a lot of classic paradoxes are uninteresting edge cases, but you can take care not to make it a game of “spot the hidden violated assumption that’s never violated in practice”.

How much thermodynamics can you get with “why doesn’t this perpetual motion device work?” What do we lose by presenting quantum mechanics by fiat? Do we beat into students an instinct for suppressing their confusion and accepting non-explanations? My experience is that typical undergrad courses tend to leave students confused about what exactly is special or surprising about quantum mechanics, at least until they start trying to show whether various experimental results have [semi]classical explanations.i

It certainly feels like a lot of my new insights come from asking why I can’t just do the stupid, obvious thing, or how two vaguely contradictory things can be compatible—in contrast to, say, thinking really hard about how to solve a given problem. But it’s not the answer to everything; plenty of approaches are better than just thinking real hard.

And you can get even trickier: you notice that pressure lowers the freezing point. So if you’re operating at the right point on the phase diagram, you can freeze the water by decreasing pressure above it, thus lifting your weight and doing work. Then you let it warm up and melt, then increase the pressure and cool it back down to get where you started in the liquid phase, giving you a different engine cycle where it’s arguably harder to show why it can never give perfect (or better) efficiency.

as it sounds like Eliezer Yudkowsky experienced and [I think unfortunately] tried to emulate

section 4 in his Brownian motion paper, which is roughly how I learned to think about transport in condensed matter, I think mostly because it gives a good entry point into the Sommerfeld approximation

which is basically the one on Wikipedia, and more recent than Einstein’s but still prior to 1988

From this perspective, Jaynes avoids making clear why you shouldn’t have expected the total velocity distribution alone to tell you about flux.

“you can incorporate information about where a particle came from to get a velocity distribution from the perspective of a single particle, which can be used to calculate the flux”

“forward integration”, at least to Jaynes

“the overall mean velocity tells us about the motion of the mean of the particle distribution, involving total flux across all surfaces, which is indeed zero in the absence of boundaries or the density going off to infinity, and is totally consistent with nonzero flux across any given surface; but for a given surface, nearby particles may be more likely to have come from one direction, so it doesn’t make sense to reason from just the overall distribution without incorporating that additional information from the particles of interest”

Jaynes himself spent a while attempting a semiclassical account of the Lamb shift, which despite some progress ended with losing his $100 bet on it.

William Thurston’s On Proof and Progress in Mathematics (1994) [17-page PDF] is good, unassuming philosophy of science. I think it’s pretty well-known, but given my goals for link posts I’m not worried about being redundant.

Some art doesn’t need to be experienced to be mostly apprehended.a A good enough description suffices. Maybe the thing follows convention too closely, or the author is in a rut. But less redundant art can still internally have a variety of flavors of compressibility:

Signposting: Rather than do the thing, loudly proclaim that you’re doing the thing. Candidates: open-world cRPGs, Inception, Guardians of the Galaxy, a lot of “experimental” things. “Pretension” may be a subset of this.

Inflation: Anything you can say in a sentence, say in a paragraph. Anything you can say in a paragraph, say in an essay/chapter/post. Anything you can say in a blog post, say in a book. Examples: Most popular nonfiction. A lot of TV also seems to do this, mostly out of lack of confidence that people will get it.

Commodification: Package your work such that it becomes fungible with similar work. Examples: [insert thing from genre you don’t like].

Gimmickry: Distinguish yourself by doing something whose main feature is that it distinguishes yourself. Examples: I can’t actually think of any that I agree with.

Seasoning: Use drop-in elements that have the same effect regardless of context. The context-insensitivity lets it feel manipulative even if it works. (The seasoning elements don’t need to be clichés, which I’d instead describe as conveying the same meaning regardless of context. Both do generally rely on external conventions.) Examples: some Pixar shorts (and movies), lots of movie and game soundtracks.

Whether or not these are “bad” depends on your goal. Done right, you can use these as tools to cheaply make things affect a broader audience more powerfully—it isn’t trickery, it’s effective artistry.

Moreover, the above terms are just ways of pointing to what didn’t work for you and the sense in which it failed, and pretty much can’t themselves constitute criticism. A gimmick is usually intended to have an effect, but for whatever reason it was outweighed (for you) by its own distinctiveness. By calling it a gimmick you haven’t actually explained how it failed to have that effect. By calling something seasoning you’re just asserting that it failed to connect with the work’s substance. Why is the thing “inflated” and not “taking advantage of the effects of repetition and examples” or “helpfully spelling out the entailments of each scene”? And so on.

It’s hard for someone to know what you mean (let alone be persuaded) if all you’re doing is gesturing towards possible critiques, unless the gesture happens to resonate with them. Substantiating the thing you’re pointing to takes more astute introspection and close reading than this. So the first thing is to recognize that there is a gap between the gesture and the substance, and the next is to recognize that this can be bridged. It’s not a matter of objective substance, but neither is it a matter of inscrutable taste and ineffable experience.

We often call this “bad” art. Still, sometimes it can be hard to tell the difference between a commodity and a classic. We’re only able to engage with some authors by reference alone because they were so successful.

In the above table (from Mary Hesse, Models and Analogies in Science), we notice that there are a lot of apparent correspondences between water waves, sound, and light. The “horizontal” notion of similarity lets us notice that sound echoes and light reflects, or that these things all have some sort of “intensity” and “flavor”.

But it’s important to introduce the “vertical” analogy—the items in each column are related by some causal, organizing principle, and there’s a correspondence between those principles. We expect all sorts of things to have similar traits entirely by accident. You won’t get very far filling in a table’s gaps by arguing “this is like that”—better to say “this model is like that model”. You’re taking advantage of a ready-made language (with entailed internal relationships) for a metaphoric redescription of a new subject. In this way analogies can be a useful guide to teaching and learning new models in new domains. (You’ll realize, for example, that “produced by moving flame” isn’t really the appropriate correspondence there, because the motion of the flame doesn’t have to do with color in the way that the motion of a gong has to do with pitch, and eventually you’ll learn something about the production of light.) But what’s this about the medium of light—“ether”?

Well, if we observe that the vertical analogy works for the first two models, and it works for the third up to that point, then light having a medium starts seeming plausible enough for us to start looking into it. And it additionally suggests to us how to go about looking. But while the vertical analogy gives us a stronger inductive inference than does the horizontal analogy alone, it’s still quite weak. Light, it turns out, doesn’t seem to propagate through an ether. (But even the “negative analogy”—the apparent hole in the vertical analogy where light’s medium would go—suggests that “why not?” is an interesting question.)

There are two parts to what I just said, so I’ll work them out a little further:

The use of analogy in science is partly pedagogical—it’s about explaining thingsa in terms of better-understood things, through their models’ shared structure, or their horizontal points of similarity or difference (positive and negative analogies). If the structure of the relation you’re trying to draw is somewhat confusing on its own terms, or not readily distinguished from a similar model, it could be easier to communicate with reference to another domain. It’s easier to understand what we could possibly mean by “light is a wave” if you already know about water waves. And if we’re being careful, we say “light is a wave like how water waves are waves,” not “light is like water”—we care about communicating the vertical relation, not arguing for horizontal similarity.

And the use of analogy is also about guiding discovery—“neutral” analogies becoming definitely positive or negative as they’re used to pinpoint places to investigate. It can be useful to make tentative inferences based on similarity of causal relations to those of a better-understood model, but these inferences really are provisional. You don’t know if light has a medium, but the analogy has worked so far, so you design experiments (guided by the analogy) that would detect such a medium. You get a handy working hypothesis and a set of questions to study, and not much else. Your analogy is usually not a very good piece of evidence about its subjects—not good enough to use for engineering—but often still good enough to help decide what’s worth investigating. (And, as often when people talk about history and philosophy of science, the big, obvious examples are recapitulated in the everyday work of science on much smaller scales. It’s not always about major physical models like electromagnetism and quantum mechanics, but rather implicit in the kind of reasoning that guides investigation from week to week.)

(Philosophers of science also [used to] like to argue about whether analogy is necessary for explanation and/or discovery—that’s the dialogue Hesse was participating in above. This is out of scope for us, unless I’m being sneaky.)

Can we take this understanding of analogy outside of science? When is it worthwhile to inject an argument by analogy into your internet discussions? And when is it worthwhile to dispute an analogy?

First, when you need better exposition. If you’re making an argument that’s hard to spell out in its own language, or easily confused for a more common argument you decidedly don’t want to make, then an analogy might be clarifying. (And usually more compelling to read, with that stimulating sensation of insight we get from novel connections, which should make all analogies suspect.) This is where it helps to say “this argument is like that argument” rather than “this is like that”. But be careful not to mistake this for substantiating your argument.

Second, to point to questions to investigate. If you’re not sure how an argument should come out, you can find other arguments in other domains that look like they flow in the same way. Then the points of analogy are good places to look for the evidence your argument hinges on. And disputing the analogy—saying a point of analogy is neutral or negative—is how your interlocutor points to where they think the contrary evidence lies.

Maybe this is a tedious distinction to keep making, but the usefulness of analogy is not, primarily, in making an inductive inference based on the fact that one model looks mostly like another, where the correctness of that inference depends on the success of the analogy between models. Usually, rather than argument by analogy, you want exposition or guidance by analogy.

Along these lines, it could be generally useful to distinguish between different levels of putting forward and substantiating a claim. You can talk about a position, an argument for that position, or evidence that the argument hinges on. In doing so you can be anywhere between just pointing to, or describing, or actually demonstrating the bit you’re talking about. If someone thinks you’re further down the list than you are, then you’re liable to get mired in a bad discussion. (Most debates don’t get past pointing to where evidence can be found, and most blogging (including this post) doesn’t get past pointing to positions or arguments either. Maybe that’s fine. Pointing is cheap, both to write and to read. Going deeper can be superfluous, if you’re pointing to the obvious. [And if you are really just pointing, please consider whether you need so many words.] Starting out by pointing could get you to the crucial evidence for resolving a disagreement faster. And so on.)

Keep an eye out for large, ordered collections of bite-sized chunks of similar (but not too similar) intellectual material. This is the kind of thing that I like to use for exercises, like a paper with many proposed engineering techniques, a site with many social science results, the many answers to Edge.org questions, or news and commentary pieces from a journal’s website. These can be used as a large yet sufficiently sophisticated supply of fresh examples to practice on, kind of like how I used a hymnal to get better at sight reading back when I had a tendency to memorize everything I played. In a similar sense in Thinking on the page, I mention film review collections and textbook physics problems in almost the same breath. Or, more generally, these can be useful not as literal exercises but as something where you might learn more from many small instances than from one large account; this is one reason I like the histories in Beyond Discovery. Notice such collections in the spirit of effectuation, not optimization: rather than work backward from the thing you want to practice, cultivate awareness of the affordances of material around you.

“Thinking on the page” is a handle that I’ve found useful in improving my writing and introspection more generally. When I write, for the most part, I’m trying to put something that I already feel is true into words. But when I think on the page, the words are getting ahead of my internal sense of what’s true. I’m writing something that just sounds good, or even just something that logically flows from what I’ve already written but has come untethered from my internal sense. It’s kind of a generalized verbal overshadowing.

I don’t think this is challenging only to people who think [of themselves as thinking] non-verbally, considering how much more universal are experiences like “this puts exactly what I believe into words better than I ever could” or even the satisfaction of finding a word on the tip of the tongue. Some people seem to be better than others not just at describing their internal sense of truth, but at tapping into it at all. But if you think only in internal monologue, you may have a very different perspective on “thinking on the page”—I’d be interested to hear about it.

At best, this is what happens in what Terry Tao calls the “rigorous” stage of mathematics education, writing, “The point of rigour is not to destroy all intuition; instead, it should be used to destroy bad intuition while clarifying and elevating good intuition.” At worst, it’s argument based on wordplay. Thinking on the page can be vital when you’re working beyond your intuition, but it’s equally vital to notice that you’re doing it. If what you’re writing doesn’t correspond to your internal sense of what’s true, is that because you’re using your words wrong, or because you need to use the page as prosthetic working memory to build a picture that can inform your internal sense?

The two places this becomes clearest for me are in academic writing and in art critique. Jargon has the effect of constantly pulling me back towards the page. If it doesn’t describe a native concept, I can either heroically translate my entire sense of things and arguments about them into jargon, or I can translate the bare raw materials and then manipulate them on the page—so much easier. As for art, the raw material of the work is already there in front of me—so tempting to take what’s easy to point to and sketch meaning from it, while ignoring my experience of the work, let alone what the raw material had to do with that experience.

A lack of examples often goes hand in hand with thinking on the page. Just look at that last paragraph: “translate”, “raw materials”, “manipulate”—what am I even talking about? An example of both the jargon and art failure modes might be my essay about Yu-Gi-Oh! Duel Monsters. My analysis isn’t entirely a joke, but it’s not a realistic reading in terms of the show’s experiential or rhetorical effect on the audience, intended or otherwise. The protagonist’s belief in the heart of the cards and his belief in his friends are genuinely thematically linked, but neither one is the kind of “shaping reality by creative utterance” that has anything to do with how the characters talk their way around the procedural weirdness of the in-show card game as game. But when I put all these things in the same language, I can draw those connections just fine. I’m playing a game with symbols like “creative utterance”.

How can one notice when this is happening? Some clues for me:

I feel like I’m moving from sentence to sentence rather than back and forth between thought and sentence

I feel something like “that’s not quite right”

I feel a persistent “tip of the tongue” sensation even after writing

I feel clever

I haven’t used an example in a while

I’m using jargon or metaphor heavily

What can one do after noticing?

Try to pull the words back into your internal picture, to check whether they fit or not—they might, and then you’ve learned something

Rewrite without self-editing until something feels right, with permission to use as many words or circumlocutions as it takes

Try to jar the wording you want mentally into place by trying more diverse inputs or contexts (look at distracting media, related essays, a thesaurus)

Ask “but is that true?”

Connect with specific examples

Focus on the nonverbal feeling you want to express; try to ask it questions

What’s a good way to practice?

Write reviews of art/media you encounter, then read other people’s reviews. Insofar as “not being led astray by thinking on the page” is more than the skill of writing-as-generically-putting-things-into-words, I think this is a good place to practice what’s particular to it. People seem to have a good enough sense of what they liked and why for good criticism to resonate, but often not enough to articulate that for themselves, at least without practice. So it can be good to pay attention to where the attempted articulations go wrong.

Write/read mathematical proofs or textbook physics problems, paying attention to how the steps correspond to your sense of why the thing is true (or using the steps to create a sense of why it’s true)

If it seems like the sort of thing that would do something for you, find or develop a meditation practice that involves working with “felt senses” (I don’t have a real recommendation here, but it’s the kind of thing Focusing aims for)

The goal isn’t to eliminate thinking on the page, but to be more deliberate about doing it or not. It can be useful, even if I haven’t said as much about that.

One thing I don’t recommend is using “you’re thinking on the page” as an argument against someone else. If you find yourself thinking that, it’s probably an opportunity to try harder to get in their head. Like most concepts for ways thinking can go wrong, this one is best wielded introspectively.

(Here’s a puzzle: if this is a first-level skill, can you go up another level? If I’m saying something like “felt senses/thoughts want to make words available”, then what things “want to make felt senses available”? Can you do anything with them?)

Since it’s been a while, I’ll reaffirm my pre-hiatus policy on linkposts:

I’d like to use your attention responsibly. To that end, I want to avoid spraying links to whatever’s recently hijacked my brain. When I share a link, I’ll do my best to make it a classic: a text I’ve had time to consider and appreciate, whose value has withstood the vagaries of the obsession du jour, my own forgetfulness and mental flux; something that changed my mind or (and) continues to inform my outlook; a joy to read and re-read. A piece of my personal Internet canon. Anyway, don’t get your hopes up.

My previous post, on fair tests with unfair outcomes, is a narrow piece in a broad dialogue about quantitative measures in prediction and decision-making. I pointed out one specific statistical fact that shows up in certain situations, and observed that it’s valuable to consider “statistical fairness” not just of a test but of the decisions and consequences that follow from it. Outcomes that are fair in the ways we care about don’t necessarily flow from tests that give unbiased predictions, especially when we care about error rates of decisions more than point estimates from predictions (i.e. more than what some people think of as the part of the whole business that needs to be fair or not).

One much more important piece of this dialogue is Tetlock’s Expert Political Judgment. I strongly recommend it, for its substantive contribution to what we can say on the subject, its practical grounding, and its thread of doing better through intellectual humility. It also serves as a great example of thorough and even-handed analysis, and the patient exhaustion of alternative explanations.

Another such piece is Laurence Tribe’s Trial By Mathematics. Tribe systematically considers mathematical methods in civil and criminal trials. His analysis applies much more broadly. Along the way he makes a number of subtle points about probability, Bayesianism, and utility, as well as about the tradeoffs between trial sensitivity and specificity. In his words:

A perhaps understandable pre-occupation with the novelties and factual nuances of the particular cases has marked the opinions in this field, to the virtual exclusion of any broader analysis of what mathematics can or cannot achieve at trial—and at what price. As the number and variety of cases continue to mount, the difficulty of dealing intelligently with them in the absence of any coherent theory is becoming increasingly apparent. Believing that a more general analysis than the cases provide is therefore called for, I begin by examining—and ultimately rejecting—the several arguments most commonly advanced against mathematical proof. I then undertake an assessment of what I regard as the real costs of such proof, and reach several tentative conclusions about the balance of costs and benefits.

He then stands back and asks about the broader effects of adopting a quantitative rule, not in terms of a utility calculation but as a matter of legitimacy and confidence in the justice system:

As much of the preceding analysis has indicated, rules of trial procedure in particular have importance largely as expressive entities and only in part as means of influencing independently significant conduct and outcomes. Some of those rules, to be sure, reflect only “an arid ritual of meaningless form,” but others express profoundly significant moral relationships and principles—principles too subtle to be translated into anything less complex than the intricate symbolism of the trial process. Far from being either barren or obsolete, much of what goes on in the trial of a lawsuit—particularly in a criminal case—is partly ceremonial or ritualistic in this deeply positive sense, and partly educational as well; procedure can serve a vital role as conventionalized communication among a trial’s participants, and as something like a reminder to the community of the principles it holds important. The presumption of innocence, the rights to counsel and confrontation, the privilege against self-incrimination, and a variety of other trial rights, matter not only as devices for achieving or avoiding certain kinds of trial outcomes, but also as affirmations of respect for the accused as a human being—affirmations that remind him and the public about the sort of society we want to become and, indeed, about the sort of society we are.

I’m always reluctant to excerpt things like this—particularly summaries of arguments like in the second quote—because it feels like I’m implying that the summary is the extent of what you’ll get out of it, and that it’s not worth reading in its entirety. Let there be no confusion: Trial By Mathematics belongs on any reading list about decision-making under uncertainty, and it’s only 65 pages.

[Summary: Say you use a fair test to predict a quality for which other non-tested factors matter, and then you make a decision based on this prediction. Then people who do worse on the test measure (but not necessarily the other factors) are subject to different error rates, even if you estimate their qualities just as well. If that’s already obvious, great; I additionally try to present the notion of fairness that lets one stop at “the test is fair; all is as it should be” as a somewhat arbitrary line to draw with respect to a broader class of notions of statistical fairness.]

[Status: I’m sure this is well known, but it doesn’t seem to have a popularized handle. I’d appreciate pointers to explanations by people who are less likely to make statistical or terminological errors. I sometimes worry I do too much background reading before thinking aloud about something, so I’m experimenting with switching it up. A quick search turns up a number of papers rediscovering something like this, like Fair prediction with disparate impact; see also this article about COMPAS and related paper about the cost of fairness.]

What’s a fair test?a Well, it probably shouldn’t become easier or harder based on who’s taking it. More strongly, we might say it should have the same test validity for the same interpretation of the same test outcome, regardless of the test-taker. For example, using different flavors of validity:

Construct validity: To what extent does the test measure what it claims to be measuring? “Construct unfairness” would occur when construct validity varies between test-takers. If you’re measuring “agility” by watching an animal climb a tree, that could be valid for cats, but less so (hence unfair) for cats and dogs together.b

Predictive validity: To what extent is the measure related to the prediction or decision criterion the test is to be used for? Imagine a test that measures what it claims to measure and isn’t biased against anyone, but isn’t predictive for some subset of the population. Filtering everyone through this test could be considered unfair. If we consider the test as administered and not just in the abstract, we also run into predictive unfairness due to differential selection bias for test-takers from different groups.

As an example of predictive unfairness, say I’m hiring college students for a programming internship, and I use in-major GPA for a cutoff.c I can say it has construct fairness if I don’t pretend it’s a measure of anything more than performance in their major.d But that number is much more predictive of job performance for Computer Science majors than for Physics majors.

This is “unfair” in a couple ways. Many CS students will be more prepared for the job by virtue of experience, but will be outranked by non-coder Physics students with high GPA. At the same time, the best Physics majors for the job can be very good at programming, but that mostly won’t show up in their physics GPA.

Can we make the use of a GPA cutoff “fair”? Well, say the correlation between physics GPA and coding experience is small but nonzero. We can raise the cutoff for non-CS majors until the expected job performance at the two cutoffs is the same. From the employer’s point of view, that’s the smart thing to do, assuming threshold GPA-based hiring.e Then we have a new “test” that “measures” [GPA + CS_major_bonus] that has better predictive fairness with respect to job performance.f We’re still doing poorly by the secretly-coding physicists, but it’s hard to see how one could scoop up more of them without hiring even more false positive physicists.g
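As a toy illustration of this adjustment, here is a Monte Carlo sketch; every number in it (the slopes, GPA distributions, noise, and starting cutoff) is invented for illustration, not taken from any real data. It raises the Physics cutoff until expected job performance at the margin matches the CS value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical model: performance tracks GPA strongly for CS majors,
# weakly for Physics majors. All parameters are illustrative.
gpa_cs = rng.normal(3.0, 0.5, n)
gpa_ph = rng.normal(3.0, 0.5, n)
perf_cs = 2.0 + 0.8 * (gpa_cs - 3.0) + rng.normal(0, 0.3, n)
perf_ph = 2.0 + 0.2 * (gpa_ph - 3.0) + rng.normal(0, 0.3, n)

cutoff_cs = 3.2
# Expected performance of a CS candidate right at the cutoff.
target = perf_cs[np.abs(gpa_cs - cutoff_cs) < 0.05].mean()

def perf_at(cutoff):
    """Mean performance of Physics candidates near a candidate cutoff."""
    near = np.abs(gpa_ph - cutoff) < 0.05
    return perf_ph[near].mean() if near.any() else float("-inf")

# Raise the Physics cutoff until performance at the margin matches --
# this is the "[GPA + CS_major_bonus]" test in effect.
cutoff_ph = next(c for c in np.arange(3.2, 4.8, 0.05) if perf_at(c) >= target)
```

With these made-up slopes the Physics cutoff lands well above the CS one: equalizing the margin in the linear model means roughly 3.0 + (0.8/0.2)·(3.2 − 3.0) = 3.8, and the simulation finds something close to that.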

Intuitively, “fairness” wants to minimize the probability that you will fail the test despite your actual performance—the thing the test wanted to predict—being good enough, or that I will succeed despite falling short of the target outcome, perhaps weighted by how far you surpass or I fall short of the requirements. In these terms, we also want to minimize the effects of chance by using all the best information available to us. Varying the cutoff by major seems to have done all that.

So is it a problem that the part of the test that’s directly under the students’ control—their GPA (for the sake of argument, their major is fixed)—is now easier or harder depending on who’s taking it? In this case it seems reasonable.

But there’s still at least one thing about fairness we didn’t capture: we may want the error probabilities not to depend on which groups we fall into. Our model of fairness doesn’t say anything about why we might or might not want that. Perhaps there’s still a way to do better in terms of equalizing the two kinds of error rates between the two populations. Hmm…
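One way to see the residual problem is a small simulation (a hypothetical sketch: the score distributions and threshold are made up). Even a perfectly calibrated score, applied with one threshold for everyone, produces different error rates for groups whose score distributions differ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def error_rates(scores, threshold=0.5):
    """FNR and FPR of a threshold rule on a calibrated score."""
    # Calibration: P(candidate is good | score) equals the score itself,
    # for every group -- the score is "fair" in the predictive sense.
    good = rng.random(scores.size) < scores
    accept = scores >= threshold
    fnr = (~accept & good).sum() / good.sum()     # good candidates rejected
    fpr = (accept & ~good).sum() / (~good).sum()  # weak candidates accepted
    return fnr, fpr

scores_a = rng.beta(4, 2, n)  # group A: scores skew high
scores_b = rng.beta(2, 4, n)  # group B: scores skew low

fnr_a, fpr_a = error_rates(scores_a)
fnr_b, fpr_b = error_rates(scores_b)
# Same calibrated score, same threshold, yet group B's good candidates
# are rejected far more often, and group A's weak candidates are
# accepted far more often.
```

This is the tension behind the COMPAS debate linked above: when base rates differ between groups, calibration and equal error rates are generally incompatible, so something has to give.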

What’s a test? For now, it’s a piece of imperfect information about some real quality that you want to use to make a decision or prediction. (back)

Dogs do have less of things like joint and spine flexibility, which may matter equally well for tree-climbing and general agility, but they also lack cats’ claws, which mostly help with the trees. (back)

That is, the average grade a student has received in classes in her major. But this is meant to be a generic example; these issues are not far-fetched and can show up in different and even qualitative forms in school, work, court, politics, daily life, charity evaluation… (back)

There’s some gerrymandering here: is it a fair construct for measuring academic performance, or an unfair construct for coding ability? (back)

More realistically, the employer cares about expected utility of letting the candidate past the GPA screen. If false negatives are really bad, and qualifications beyond the minimum don’t matter, then they’d equalize the false negative rate, requiring an even higher cutoff for Physics. If there’s a not-too-costly interview after the GPA screen, then physics majors may need a lower cutoff: the CS GPA tells you all you need to know but the Physics GPA tells you little. Improving your instrument is good, if you can afford it. (back)

Note that this isn’t a correction for CS majors having a higher mean performance, despite looking kind of like it—it’s just an ad-hoc adjustment for the fact that GPA is less correlated with coding skill for Physics majors, who could be just as good at programming, on average. (back)

Robin is very fond of powerful theories which invoke a very small number of basic elements and give those elements great force. He likes to focus on one very central mechanism in seeking an explanation or developing policy advice. Modern physics and Darwin hold too strong a sway in his underlying mental models. He is also very fond of hypotheses involving the idea of a great transformation sometime in the future, and these transformations are often driven by the mechanism he has in mind. I tend to see good social science explanations or proposals as intrinsically messy and complex and involving many different perspectives, not all of which can be reduced to a single common framework. I know that many of my claims sound vague to Robin’s logical atomism, but I believe that, given our current state of knowledge, Robin is seeking a false precision and he is sometimes missing out on an important multiplicity of perspectives. Many of his views should be more cautious.

We find ourselves managing complex networks of beliefs. Bryan’s picture seems to be of a long metal chain linked at only one end to a solid foundation; chains of reasoning mainly introduce errors, so we do best to find and hold close to our few most confident intuitions. My picture is more like Quine’s “fabric,” a large hammock made of string tied to hundreds of leaves of invisible trees; we can’t trust each leaf much, but even so we can stay aloft by connecting each piece of string to many others and continually checking for and repairing broken strings.