U-PHIL: Deconstructing Larry Wasserman

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and the Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins “Low Assumptions, High Dimensions” (2011), his contribution to the Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken: ‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… . We have to be vigilant. And we have to be more than vigilant. We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change.* Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii]

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction. (I will invite your “U-Phils” at the end.) I will allude to passages from my contribution to RMM (2011) http://www.rmm-journal.de/htdocs/st01.html (in red).

A. What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens.

Let us examine the urgency of reconciling the need to make methods assumption-free with the need to make them work in complex, high-dimensional settings. The problem of assumptions of course arises when they are made about unknowns, which can introduce threats of error and/or misuse of methods.

Wasserman: These days, statisticians often deal with complex, high dimensional datasets. Researchers in statistics and machine learning have responded by creating many new methods … . However, many of these new methods depend on strong assumptions. The challenge of bringing low assumption inference to high dimensional settings requires new ways to think about the foundations of statistics. (p. 201)

It is not clear if Wasserman thinks these new methods run into trouble as a result of unwarranted assumptions. This is a substantive issue about Wasserman’s applications that foundational discussions are unlikely to answer. Still, he sees the issue as one of foundations, so I shall take him at his word.

The last decade or more has also given rise to many new problem areas that call for novel methods (e.g., machine learning). Do they call for new foundations? Or, can existing foundations be relevant here too? (See Larry Wasserman’s contribution.) A lack of clarity on the foundations of existing methods tends to leave these new domains in foundational limbo. (Mayo 2011, 92)

I may seem to be at odds with Wasserman’s call to move on past frequentist-Bayesian debates:

Debates over the philosophical foundations of statistics have a long and fascinating history; the decline of a lively exchange between philosophers of science and statisticians is relatively recent. Is there something special about 2011 (and beyond) that calls for renewed engagement in these fields? I say yes. (Mayo, p. 80)

Perhaps this is Wasserman’s meaning: new types of problems and methods call for a more pragmatic perspective on learning from data. One cannot begin at the point at which different interpretations of probability (Bayesian or frequentist) enter; so frequentist-Bayesian debates are not as central to current practice.

I would never claim there is any obstacle to practice in not having a clear statistical philosophy. But that is different from maintaining that practice calls for recognition of underlying foundational issues while also denying that Bayesian-frequentist issues are especially important to them. The fact is, key underlying issues come to the surface and are illuminated within frequentist-Bayesian contrasts, as are issues surrounding objective/subjective, deduction/induction, and truth/idealizations, deliberately discussed on this blog. It may be insisted we are beyond them, but they invariably lurk in the background; they are the elephants in the room.

We deliberately used ‘statistical science’ in our forum title because it may be understood broadly to include the full gamut of statistical methods, from experimental design, generation, analysis, and modeling of data to using statistical inference to answer scientific questions. (Even more broadly, we might include a variety of formal but nonprobabilistic methods in computer science and engineering, as well as machine learning.) (Mayo, p. 85)

The recognition that “the model is always wrong”, in the sense of being an idealization, was clear to the founders of “classical” statistics* (see relevant remarks from Cox, Fisher, and Neyman elsewhere on this blog). Although this recognition discredits the idea that inference is all about assigning degrees of belief or confirmation to hypotheses and models, it supports the use of probability in standard error statistics—or so I argue. One can learn true things from idealized models.

Wasserman: A more extreme example of using weak assumptions is to abandon probability completely… . Why are scholars in foundations ignoring this? (pp. 203-4)

By and large, the idea that data were literally “generated from a distribution is usually a fiction” (p. 203) is also not news to error statisticians; in a sense, observations are always deterministic. Viewing the sample as if it were generated probabilistically may simply be a way to cope with incomplete information and with the erroneous inferences that can result. Probability is introduced as attached to methods (which, in this example, would be a type of prediction or classification tool).

The machine learners say that there is little need to understand what actually produced the numbers. Fine; then methods that enable increasingly successful error-rate reduction are apt. Under error statistics’ big umbrella, machine learning appears to fall under the subset of the philosophy of “inductive behavior,” the goals of which involve controlling/improving performance and setting bounds for error rates, trading off precision and accuracy where appropriate to the particular case. This is in contrast to the subset that is the main focus of my work: that which uses error rates to assess and control how severely claims have passed tests. The latter are contexts of scientific inference. In the prediction-classification example, however, the error-rate guarantees are just the ticket. (I would not rule out inferences about the case at hand.) Yet in the domains of both inductive behavior and scientific inference, the error statistician regards models as approximations and idealizations, or, as Neyman saw them, “picturesque” ways of talking about actual experiments.

Wasserman has proved many intriguing results about the problems of and prospects for low-assumption methods. Whether methods that invoke assumptions could do better, perhaps alongside these (checking or making allowances later), is not something on which I can speculate. As complex as the classification-prediction problems are, they enjoy an outcome that’s normally absent: we get to find out if we’ve been successful. Background knowledge enters in qualitative ways, not obviously as prior probability distributions in parameters.

C. Is It Bayesian?

Wasserman: In principle, low assumption Bayesian inference is possible. We simply put a prior π on the set of all distributions P. The rest follows from Bayes theorem. But this is clearly unsatisfactory. The resulting priors have no guarantees, except the solipsistic guarantee that the answer is consistent with the assumed prior. (p. 206) [iii]

One big reason some may turn aside from frequentist-Bayesian contrasts is that today even most Bayesians grant the importance of good performance characteristics (though their meaning may differ distinctly). The traditional idea that statistical learning is well captured by Bayes’s theorem is rarely upheld (we have seen exceptions, most recently Lindley, also Kadane) [iv].

Today’s debates clearly differ from the Bayesian-frequentist debates of old. In fact, some of those same discussants of statistical philosophy, who only a decade ago were arguing for the ‘irreconcilability’ of frequentist p-values and (Bayesian) measures of evidence, are now calling for ways to ‘unify’ or ‘reconcile’ frequentist and Bayesian accounts… .(Mayo p. 82)

In some cases the nonsubjective posteriors may have good error-statistical properties of the proper frequentist sort, at least in the asymptotic long run. But then another concern arises: If the default Bayesian has merely given us technical tricks to achieve frequentist goals, as some suspect, then why consider them Bayesian (Cox 2006)? Wasserman (2008, 464) puts it bluntly: If the Bayes’ estimator has good frequency-error probabilities, then we might as well use the frequentist method. If it has bad frequency behavior then we shouldn’t use it. (The situation is even more problematic for those of us who insist on a relevant severity warrant.) (Mayo, p. 90)

Wasserman: [In other cases] the answers are usually checked against held out data. This is quite sensible but then this is Bayesian in form not substance. (p. 206)

In this context, insofar as I understand it, the goal is to be able to assess how well the rule can predict “test sets” and to indicate an estimate of prediction error. The substance is of an error-statistical kind: through various strategies (e.g., cross-validation) we may learn approximately how well a predictive model will perform in cases other than those already used to fit the model. It connects with a general set of strategies for preventing too-easy fits and avoiding (pejorative) double-counting, “overfitting,” and nongeneralizable results.
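The cross-validation strategy gestured at here can be sketched in a few lines: each fold is held out in turn, the rule is fit on the rest, and errors are counted only on data the fitted rule never saw. (The toy “nearest class mean” rule and all function names below are my own illustrations, not anything from Wasserman’s paper.)

```python
import random

def k_fold_cv_error(xs, ys, fit_fn, predict_fn, k=5, seed=0):
    """Estimate out-of-sample error: each fold is held out once, the rule
    is fit on the remaining data, and errors are counted only on points
    the fitted rule never saw."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_fn([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(predict_fn(model, xs[i]) != ys[i] for i in fold)
    return errors / len(xs)

# Toy "nearest class mean" rule (purely illustrative):
def fit_means(xs, ys):
    return {c: sum(x for x, y in zip(xs, ys) if y == c) /
               sum(1 for y in ys if y == c) for c in set(ys)}

def predict_nearest(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

xs = [0.1, 0.3, 0.2, 0.15, 0.9, 1.1, 1.0, 0.95]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
err = k_fold_cv_error(xs, ys, fit_means, predict_nearest, k=4)
```

The point of the design is exactly the error-statistical one in the text: the estimate guards against too-easy fits because no data point is ever used both to fit the rule and to score it.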

Deconstructing Wasserman

So where does this leave us in deconstructing Wasserman’s call for new-fangled foundations?

Franken deconstructed: Let us imagine Franken as representing a frequentist error statistician[v]. He begins by noting that while Bayesians may detect a frequentist bias (in certain circles), he detects no such thing. Besides, such a quibble would be akin to worrying about Al-Qaeda using too much oil in their hummus!

Frequentists, he says, are at least trying to meet a fundamental scientific requirement for controlling error, and are open to any number of ways of accomplishing this. But Bayesians—at least dyed-in-the-wool (or staunch subjective or “philosophical”) Bayesians—have an agenda, Franken is saying, by analogy. They charge frequentists with legitimating a hodgepodge of “incoherent” and “inadmissible” methods; they say that frequentists care only for low error rates in the long run, have no way of incorporating background information, invariably misinterpret their own methods, and top it all off with a litany of howlers (that the Bayesian easily avoids). If the discourse on frequentist foundations seems biased, our frequentist Franken continues, it is only to correct the many blatant misinterpretations of its methods.

**********************************

Now Wasserman comes in and utters the scientific equivalent of “Let’s move on” (as with the Clinton scandal, which gave rise to MoveOn.org, i.e., “Get over it”). The Bayesian requirements and philosophy do not underwrite the substance of the most promising new complex methods. So if our focus is to justify, interpret, and extend these new contexts, we are allowed to leave the old (frequentist-Bayesian) scandals behind. But, as Wasserman seems further to imply, finding oneself in an essentially frequentist, error-statistical world is not enough either, especially when it comes to the kinds of complex classification and prediction problems of machine learning, data mining, and the like. At any rate, new foundational concerns must loom large….

**********************************

So let me inject myself into the interpretive mix I’ve created.

I concur with the deconstructed Franken and Wasserman. Taking seriously Wasserman’s intimation that there is not only a technical-statistical problem here (which only statisticians can solve) but also a foundational one, he seeks a ground for applications where probabilistic bounds, however crude, do not directly describe a data-generating mechanism but assess/reduce/balance procedural error rates.

The “long-run” relative frequencies have probabilistic implications for bounding the next test set. The old accusation that good error-statistical properties are irrelevant to the case at hand goes by the wayside. Anyone who takes a broad view of error-statistical methods would have no problem finding a home for the variety of methods of creative control and assessment of approximate sampling distributions and error rates. This falls more clearly under what may be called a “behavioristic” context than one of scientific inference (though the latter is not precluded). It would require breaking out of traditional notions of frequentist statistics and, in so doing, simultaneously scotch the oft-repeated howlers.[vi]

Ironically many seem prepared to allow that Bayesianism still gets it right for epistemology, even as statistical practice calls for methods more closely aligned with frequentist principles. What I would like the reader to consider is that what is right for epistemology is also what is right for statistical learning in practice. That is, statistical inference in practice deserves its own epistemology. (Mayo, p. 100)

Constructing such a framework would be one payoff of genuinely transcending the frequentist-Bayesian debates, rather than rendering them taboo, or closed.

*7/29 I modified this assertion, and will explicate the different senses in which Neyman and Pearson viewed the relationship between approximate models and correct/incorrect claims about the world later on.

[i] See an earlier post for the way we are using “deconstructing” here.

[ii] Says Franken: “And what shocked me most…was the silence from those conservatives who complain about the ugliness of political discourse in this country.” (19) Oh pleeeze (to use Franken’s expression).

[iii] For some examples of methods applicable to large numbers of variables in econometrics under the error-statistical umbrella, see the two contributions to the special topic by Aris Spanos and David Hendry. It would be interesting to hear of relationships.


21 thoughts on “U-PHIL: Deconstructing Larry Wasserman”

I thank Deborah Mayo for deconstructing me and Al Franken. (And for
the record, I couldn’t be further from Franken politically; I just
liked his joke.)

I have never been deconstructed before. I feel a bit like Humpty
Dumpty. Anyway, I think I agree with everything Deborah wrote. I’ll
just clarify two points.

First, my main point was just that the cutting edge of statistics
today is dealing with complex, high-dimensional data. My essay was an
invitation to Philosophers to turn their analytical skills towards the
problems that arise in these modern statistical problems.

Deborah wonders whether these are technical rather than foundational
issues. I don’t know. When physicists went from studying medium-sized,
slow-moving objects to studying the very small, the very fast,
and the very massive, they found a plethora of interesting questions,
both technical and foundational. Perhaps inference for
high-dimensional, complex data can also serve as a venue for both
technical and foundational questions.

Second, I downplayed the Bayes-Frequentist debate perhaps more than I
should have. Indeed, this debate still persists. But I also feel that
only a small subset of statisticians care about the debate (because
they do what they were taught to do, without questioning it), and
those that do care will never be swayed by debate. The way I see it
is that there are basically two goals:

Goal 1: Find ways to quantify your subjective degrees of belief.

Goal 2: Find procedures with good frequency properties.

If you think that Goal 1 is a good goal, you’ll be a Bayesian. If you
think that Goal 2 is a good goal, you’ll be a frequentist. The debate
is about which goal is a good goal. Once people decide which goal
they think is the “right” goal, it is rare that they will change
their minds. So if I downplayed the debate, it is probably because I
am pessimistic about there being any real, open-minded debate (at least
in statistics). But perhaps I am being too pessimistic.

Well, as I say, I agree with what Deborah wrote and I thank her for
the interesting deconstruction. Now I’ll try to put myself back
together.

Larry: Your comment is provocative in a constructive way, since your laying out of goals gets to the heart of things (requiring much more than this little note):
“Goal 1: Find ways to quantify your subjective degrees of belief.
Goal 2: Find procedures with good frequency properties.
If you think that Goal 1 is a good goal, you’ll be a Bayesian. If you
think that Goal 2 is a good goal, you’ll be a frequentist. …Once people decide which goal they think is the “right” goal, it is rare that they will change their minds.”

Whether or not people change, they ought to care whether these goals are:
(a) the only choices, (b) obtainable, and (c) if obtainable, desirable. I don’t think these are the only choices, and I would deny the first goal is obtainable in any way that would be desirable. But I only want to consider goal 2, because as a frequentist error statistician, you are saying that is my goal in life. It is not. Or rather, unless that goal is importantly qualified, it might at most be necessary; it is not sufficient. A crucial qualification for scientific inference, as I see it, is that the error-probabilistic assessments are relevant for indicating how well a given error (or erroneous interpretation of the data) has been probed (and perhaps ruled out) relative to the claim of interest, in the case at hand.
Using error-probabilistic considerations in this way is a day-to-day occurrence (e.g., learning about my weight right now, thanks to assessing what these scales are capable of in general).

If one claims the frequentist account is only interested in crass “behavioristic” low long-run error rates (as ill-formed as that goal can be), then we get stuck with howlers like the one in the link below.* Here, as in the Cox 1958 example, one reports that a given, highly unreliable measurement was actually reliable because, with some probability, a much better measuring tool might have been used, so on average it’s OK.

“how well a given error (or erroneous interpretation of the data) has been probed (and perhaps ruled out) relative to the claim of interest”

I thought that within a frequentist framework, there’s no measure of confidence in a claim of interest relative to other erroneous claims. In other words, as far as I can tell, there’s no interest within frequentist frameworks in deciding how certain one is of a claim.

Isn’t what you’re describing closer to the Bayesian notion of a “probability of a hypothesis”, in that there you can attempt to quantify the strength of a claim relative to erroneous interpretations?

rv: No, the quantities of reliability, precision, detectable discrepancy, well-testedness, degrees of inconsistency/consistency, corroboration, severity and the like are all quantitative assessments within frequentist error statistics. There are basically only two classes of such quantities, I’m just listing several possible terms.

In the spirit of Wasserman’s idea that there are other distinctions than the Bayes/Frequentist one, I would like to suggest another one.

Is the role of statistics to
a) discover the truth
b) predict the future and guide decision making

A lot of the Bayes/Frequentist arguments I see, particularly in philosophy of science, seem to assume (a). This is an interesting area, and it fits well within the historical view of “sexy” science where new models or laws were discovered, e.g. relativity, Brownian motion, etc.

As (originally) an engineer, I follow (b). I consider the debate between (a) and (b) to be a debate amongst reasonable people, but I think there is a strong argument that (b) is important in practice and that attempts to solve (a) may not result in solutions to (b).

Surely you accept there are statistical methods (Bayesian and otherwise) that address a) inference and b) prediction and c) decision-making? The “role of statistics” is not any single one of these, so it is pointless to debate which is correct – again, whether one is being Bayesian or not. The answer to which method you want depends on what you’re interested in knowing/doing in the analysis at hand.

David: One of the reasons that a)–finding out what is the case wrt some phenomenon of interest–has played such a large part in frequentist-Bayesian debates is that the former is regularly criticized as relevant only for “deciding how to behave” wrt a hypothesis or question, in such a way that one will not “behave incorrectly too often” in the long run. In fact, this is a metaphor introduced by Neyman for various reasons, one of which was to distinguish what N-P were doing from “inductive inference” understood as assigning a degree of probability, belief, or the like to statistical hypotheses. Instead of adjusting our beliefs, Neyman would say, we are adjusting our behavior wrt a phenomenon. Accordingly, “reject/do not reject” were identified with specific “actions” (be it to announce an effect, publish an article, or decide more information is needed, etc.).

I think that, ironically, people are so involved nowadays with pragmatic, “behavioristic” goals that there’s a tendency to forget that a major, major issue was and still is to show the relevance of frequentist methods for inference, evidence, and knowledge (i.e., for “epistemology”). For a very interesting recent example, consider Robert Kass’ claim that N-P’s “behavioral interpretation seem quaint” (p. 8, Statistical Science, Feb. 2011). His criticism is that they disregard appraising truth in favor of behavioral goals. I would deny this, but my point here is to remind people of one very big central criticism of frequentist methods for science.

David: Maybe, but in order for that servant to serve me properly, I require decisions that are based on reliable knowledge of whatever the decision depends on. It really irks me when I hear some texts say: but we are so much better because we go straight to decisions! But they do not say how they will first warrant the basis for any wise decision. See my conversation with D.R. Cox on this…. (I noticed, by the way, that even Lindley makes the distinction in the article of his posted recently, although his knowledge base is subjective.)

I am a little bit confused by your response… I take it your comments are connected with severity… I will read over it again when I get the chance…

In my mind the distinction breaks down like this….

(a) P(H|D), confidence intervals, p-values

(b) predictive distributions.

i.e. I think Neyman-Pearson is also (a) because you are referencing non-observables, accept/reject rules, etc.

p.s. you may be interested in this, if you are not aware of it already: http://videolectures.net/mlss09uk_jordan_bfway/
I disagree with a lot of it for focusing on (a) for a start.
(as I am sure you will – for other reasons … but it is well done)

David: There’s a lot that’s confused here. Among much else, I doubt predictive distributions are restricted to “observables”, or that they are regarded as decision-making. If you stick with recording observations, you’d never have any info of regularities at all. It’s only from regularities, however partial, that we can predict beyond observed data. Streams of unconnected sense data won’t do. Well, I can’t begin….

I don’t think I understand. Take precision, for example: I can quantify the precision of a binomial parameter estimate p, given data and assuming a binomial model. However, this doesn’t directly tell me whether the claim that the model is binomial is a correct or erroneous interpretation. Perhaps there’s something basic in your interpretation I don’t understand, so feel free to direct me to the relevant sources.
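The commenter’s distinction can be made concrete: an interval’s width quantifies precision conditional on the binomial model, while checking that model (e.g., whether the trials really were independent with constant p) is a separate task. Below is a minimal sketch using the normal approximation; the function name and the numbers are illustrative, not from the discussion above.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Approximate 95% interval for a binomial proportion (normal
    approximation). Its half-width quantifies precision *conditional on*
    the binomial model being apt; it says nothing about whether the
    trials were independent with constant p."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = wald_ci(40, 100)  # about (0.304, 0.496)
```

Assessing the model itself would call for a different error probe, such as a runs test on the sequence of outcomes, which is exactly the piecemeal division of labor the reply below describes.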

@David I understand what you’re getting at, and I’m sympathetic to the notion that not every study has to be a confirmatory study. However, even if one is primarily interested in prediction, the distinction between predictive correlation and “true” (i.e. causal) effects is still important, because causal relationships have robustness/stability properties that make them more generalizable than other correlational relationships. But this gets into issues and distinctions that don’t have much to do with the Bayes/frequentist debate.

rv: I was just giving different kinds of claims to which a frequentist error statistician might care to attach error probabilities (for appraising warrant). At times it is of interest to use error probabilities to ascertain how good a job a given method has done in distinguishing or avoiding “erroneous interpretations of this data x”, whether or not the concern is an underlying model assumption or statistical hypothesis. The reason I like the phrase is that it emphasizes the piecemeal aspect of error statistics: one need not distinguish an exhaustive set of hypotheses or all mistakes that could ever be of interest wrt a phenomenon. (One needn’t consider the so-called “catchall hypothesis.”) One asks if such and such is or is not an erroneous interpretation wrt this one aspect of the source. (If interested, I do have publication links on this page.)

Mayo: I doubt predictive distributions are restricted to “observables”

Yes, they are by definition.

Mayo:or that they are regarded as decision-making.

No, but it is a primitive for real decision making e.g. should I invest, should I build the machine etc…

I have doubts about the merits of decision theory applied to parameters, either Bayesian or frequentist. That is, I have doubts about questions such as: is this theory true, what is the best estimate, etc.

I am in favour of decision theory applied to observations. Questions of the form: what is the cheapest, fastest, best outcome in terms of saving lives, etc. Which must be Bayesian.

David. “This is not ‘sticking to recording observations'”
Right. It was you who had said reference to unobservables takes you to goal (a). One might speak of “observable” predictions, but the models, laws, rules on which they are based, even if probabilistic, involve some kind of generalization and/or claims about regularities.

