Monthly Archives: July 2012

I just noticed Andrew Gelman’s blog today. It’s too good to let pass without a quick comment. He asks:

What is a Bayesian?

Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, for example, in section 1 of this article).

I replied that I agree on Bayesian, I never liked the term and always wanted something better, but I couldn’t think of any convenient alternative. Also, I was finding that Bayesians (even the Bayesians I disagreed with) were reading my research articles, while non-Bayesians were simply ignoring them. So I thought it was best to identify with, and communicate with, those people who were willing to engage with me.

More formally, I’m happy defining “Bayesian” as “using inference from the posterior distribution, p(theta|y)”. This says nothing about where the probability distributions come from (thus, no requirement to be “subjective” or “objective”) and it says nothing about the models (thus, no requirement to use the discrete models that have been favored by the Bayesian model selection crew). Based on my minimal definition, I’m as Bayesian as anyone else.

He may be “as Bayesian as anyone else,” but does he really want to be as Bayesian as anyone? (A slight, deliberate equivocation.) As a good Popperian, I concur with Popper that names should not matter, but Gelman’s remarks suggest he should distinguish himself, at least philosophically.[i]

As in note [iv] of my Wasserman deconstruction: “Even where Bayesian methods are usefully applied, some say ‘most of the standard philosophy of Bayes is wrong’ (Gelman and Shalizi 2012, 2 n2)”.

… we see science—and applied statistics—as resolving anomalies via the creation of improved models which often include their predecessors as special cases. This view corresponds closely to the error-statistics idea of Mayo (1996). (Gelman 2011, 70)

If the foundations for these methods are error statistical, then shouldn’t that come out in the description (error-statistical Bayes?)? It seems sufficiently novel to warrant some greater gesture than ‘this too is Bayesian’.

Ironically, many seem prepared to allow that Bayesianism still gets it right for epistemology, even as statistical practice calls for methods more closely aligned with frequentist principles. What I would like the reader to consider is that what is right for epistemology is also what is right for statistical learning in practice. That is, statistical inference in practice deserves its own epistemology. (Mayo 2011, p. 100)

What do people think?

[i] To Gelman’s credit, he is one of the few contemporary statisticians to (openly) recognize the potential value of philosophy of statistics for statistical practice!

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins “Low Assumptions, High Dimensions” (2011), his contribution to the Rationality, Markets and Morals (RMM) Special Topic: Statistical Science and Philosophy of Science:

Wasserman: There is a joke about media bias from the comedian Al Franken: ‘To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?’

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken meant if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media. Conservatives, he says, decry what they claim is a liberal bias in mainstream media. Franken rejects their claim.

The mainstream media does not have a liberal bias. And for all their other biases . . . , the mainstream media . . . at least try to be fair. …There is, however, a right-wing media. . . . They are biased. And they have an agenda…The members of the right-wing media are not interested in conveying the truth… . They are an indispensable component of the right-wing machine that has taken over our country… . We have to be vigilant. And we have to be more than vigilant. We have to fight back… . Let’s call them what they are: liars. Lying, lying, liars. (Franken, pp. 3-4)

When I read this in 2004 (when Bush was in office), I couldn’t have agreed more. How things change*. Now, of course, any argument that swerves from the politically correct is by definition unsound, irrelevant, and/or biased. [ii]

But what does this have to do with Bayesian-frequentist foundations? What is Wasserman, deep down, really trying to tell us by way of this analogy (if only subliminally)? Such are my ponderings—and thus this deconstruction. (I will invite your “U-Phils” at the end.) I will allude to passages from my contribution to RMM (2011) http://www.rmm-journal.de/htdocs/st01.html (in red).

A. What Is the Foundational Issue?

Wasserman: To me, the most pressing foundational question is: how do we reconcile the two most powerful needs in modern statistics: the need to make methods assumption free and the need to make methods work in high dimensions… . The Bayes-Frequentist debate is not irrelevant but it is not as central as it once was. (p. 201)

One may wonder why he calls this a foundational issue, as opposed to, say, a technical one. I will assume he means what he says and attempt to extract his meaning by looking through a foundational lens.

Let us examine the urgency of reconciling the need to make methods assumption-free and that of making them work in complex high dimensions. The problem of assumptions of course arises when they are made about unknowns that can introduce threats of error and/or misuse of methods. …

Working on the last two chapters of my book on philosophy of statistical inference, I’m revisiting such topics as weak conditioning, Birnbaum, the likelihood principle, etc., and was reading from the Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (1985)[i]. In a paper I had not seen (or had forgotten), “The Frequentist Viewpoint and Conditioning,” Jim Berger writes that the quoting of a P-value “may be felt to be a frequentist procedure by some, since it involves an averaging over the sample space. The reporting of P-values can be given no long-run frequency interpretation [in any of the set-ups generally considered]. A P-value actually lies closer to conditional (Bayesian) measures than to frequentist measures.” (Berger 1985, 23). These views are echoed in Berger’s more recent “Could Fisher, Jeffreys and Neyman Have Agreed on Testing?” (2003). This is at odds with what Fisher, N-P, Cox, Lehmann, etc. have held, and if true, would also seem to entail that a severity assessment has no frequentist interpretation! The flaw lies in that all-too-common behavioristic, predesignated conception…

The first thing one wants to know about a search method is what it is searching for, what would count as getting it right. One might want to estimate a probability distribution, or get correct forecasts of some probabilistic function of the distribution (e.g., out-of-sample means), or a causal structure, or some probabilistic function of the distribution resulting from some class of interventions. Secondly, one wants to know about what decision theorists call a loss function, but less precisely, what is the comparative importance of various errors of measurement, or, in other terms, what makes some approximations better than others. Third, one wants a limiting consistency proof: sufficient conditions for the search to reach the goal in the large sample limit. There are various kinds of consistency (pointwise versus uniform, for example), and one wants to know which of those, if any, hold for a search method under what assumptions about the hypothesis space and the sampling distribution. Fourth, one wants to know as much as possible about the behavior of the search method on finite samples. In simple cases of statistical estimation there are analytic results; more often for search methods only simulation results are possible, but if so, one wants them to explore the bounds of failure, not just easy cases. And, of course, one wants a rationale for limiting the search space, as well as some sense of how wrong the search can be if those limits are violated in various ways.

There are other important economic features of search procedures. Probability distributions (or likelihood functions) can instantiate any number of constraints: vanishing partial correlations, for example, or inequalities of correlations. Suppose the hypothesis space delimits some big class of probability distributions. Suppose the search proceeds by testing constraints (the points that follow apply as well if the procedure computes posterior probabilities for particular hypotheses and applies a decision rule). There is a natural partial ordering of classes of constraints: B is weaker than A if and only if every distribution that satisfies class A satisfies class B. Other things equal, a weakest class might be preferred because it requires fewer tests. But more important is what the test of a constraint does in efficiently guiding the search. A test that eliminates a particular hypothesis is not much help. A test that eliminates a big class of hypotheses is a lot of help.
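The set-inclusion logic of this paragraph can be made concrete in a toy, finite setting. This is only an illustration under stated assumptions: the distribution labels `d1` to `d4`, the hypotheses `H1` to `H4`, and the constraints `A` and `B` are all hypothetical, and real constraints such as vanishing partial correlations range over continuous families of distributions rather than a handful of labels. Here a constraint is represented by the set of distributions that satisfy it, and a hypothesis by its set of distributions.

```python
# Toy model (hypothetical labels): a constraint is the set of distributions
# that satisfy it; a hypothesis is a set of distributions.

def weaker_than(sat_b, sat_a):
    """B is weaker than A iff every distribution satisfying A also satisfies B."""
    return sat_a <= sat_b


def eliminated_by_rejecting(sat_c, hypotheses):
    """Names of hypotheses eliminated if a test rejects constraint C.

    A hypothesis entails C when all of its distributions satisfy C, so
    rejecting C rules out every such hypothesis.
    """
    return {name for name, dists in hypotheses.items() if dists <= sat_c}


hypotheses = {
    "H1": {"d1"},
    "H2": {"d1", "d2"},
    "H3": {"d3"},
    "H4": {"d3", "d4"},
}
A = {"d1"}              # a strong constraint
B = {"d1", "d2", "d3"}  # a weaker one

print(weaker_than(B, A))                               # True
print(sorted(eliminated_by_rejecting(A, hypotheses)))  # ['H1']
print(sorted(eliminated_by_rejecting(B, hypotheses)))  # ['H1', 'H2', 'H3']
```

In this toy setting, rejecting a weaker constraint always eliminates at least as many hypotheses as rejecting a stronger one, since any hypothesis contained in the satisfiers of A is also contained in the satisfiers of B.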

Other factors: the power of the requisite tests; the numbers of tests (or posterior probability assessments) required; the computational requirements of individual tests (or posterior probability assessments). And so on. And, finally, search algorithms have varying degrees of generality. For example, there are general algorithms, such as the widely used PC search algorithm for graphical causal models, that are essentially search schema: stick in whatever decision procedure for conditional independence and PC becomes a search procedure using that conditional independence oracle. By contrast, some searches are so embedded in a particular hypothesis space that it is difficult to see the generality.
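To make the "search schema" idea concrete, here is a minimal sketch of only the adjacency (skeleton) phase of a PC-style search; the orientation phase is omitted. The function name `pc_skeleton`, the oracle interface `ci_oracle(x, y, cond)`, and the chain example are assumptions for illustration. In practice the oracle would be a statistical test of conditional independence; here it is a hypothetical lookup table encoding the independence facts of a chain X → Y → Z. Note that this phase searches "downward": it starts from a complete graph and eliminates edges.

```python
from itertools import combinations


def pc_skeleton(variables, ci_oracle):
    """Adjacency (skeleton) phase of a PC-style search with a pluggable CI oracle.

    ci_oracle(x, y, cond) should return True when x is judged independent
    of y given the set cond. Returns the surviving undirected edges and the
    separating set recorded for each removed edge.
    """
    # Start from the complete graph: every variable adjacent to every other.
    adj = {v: set(variables) - {v} for v in variables}
    sepsets = {}
    size = 0  # size of the conditioning sets examined on this pass
    while any(len(adj[x] - {y}) >= size for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                if y not in adj[x]:
                    continue
                # Condition only on current neighbours of x other than y.
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if ci_oracle(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepsets


# Hypothetical oracle for a chain X -> Y -> Z: the only independence is
# X independent of Z given {Y}.
CHAIN_FACTS = {(frozenset({"X", "Z"}), frozenset({"Y"}))}


def chain_oracle(a, b, cond):
    return (frozenset({a, b}), frozenset(cond)) in CHAIN_FACTS


edges, sepsets = pc_skeleton(["X", "Y", "Z"], chain_oracle)
print(sorted(tuple(sorted(e)) for e in edges))  # [('X', 'Y'), ('Y', 'Z')]
```

Swapping `chain_oracle` for any other decision procedure leaves `pc_skeleton` unchanged, which is the sense in which the search is a schema.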

I am sure I am not qualified to comment on the details of Hendry’s search procedure, and even if I were, for reasons of space his presentation is too compressed for that. Still, I can make some general remarks. I do not know from his essay the answers to many of the questions pertinent to evaluating a search procedure that I raised above. For example, his success criterion is “congruence” and I have no idea what that is. That is likely my fault, since I have read only one of his books, and that long ago.

David Hendry dismisses “priors,” meaning, I think, Bayesian methods, with an argument from language acquisition. Kids don’t need priors to learn a language. I am not sure of Hendry’s logic. Particular grammars within a parametric “universal grammar” could in principle be learned by a Bayesian procedure, although I have no reason to think they are. But one way or the other, that has no import for whether Bayesian procedures are the most advantageous for various search problems by any of the criteria I have noted above. Sometimes they may be, sometimes not; there is no uniform answer, in part because computational requirements vary. I could give examples, but space forbids.

Abstractly, one could think there are two possible ways of searching when the set of relationships to be uncovered may form a complex web: start by positing all possible relationships and eliminate from there, or start by positing no relationships and build up. Hendry dismisses the latter, with what generality I do not know. What I do know is that the relations between “bottom-up” and “top-down” or “forward” and “backward” search can be intricate, and in some cases one may need both for consistency. Sometimes either will do. Graphical models, for example, can be searched starting with the assumption that every variable influences every other and eliminating, or starting with the assumption that no variable influences any other and adding. There are pointwise consistent searches in both directions. The real difference is in complexity.

Professor Hendry* endorses a distinction between the “context of discovery” and the “context of evaluation,” which he attributes to Herschel and to Popper and could as well have attributed to Reichenbach and to most contemporary methodological commentators in the social sciences. The “context” distinction codes two theses.

1. “Discovery” is a mysterious psychological process of generating hypotheses; “evaluation” is about the less mysterious process of warranting them.

2. Of the three possible relations with data that could conceivably warrant a hypothesis—how it was generated, its explanatory connections with the data used to generate it, and its predictions—only the last counts.

Einstein maintained the first but not the second. Popper maintained the first, but held that nothing warrants a hypothesis. Hendry seems to maintain neither: he has a method for discovery in econometrics, a search procedure briefly summarized in the second part of his essay, which is not evaluated by forecasts. Methods may be esoteric but they are not mysterious. And yet Hendry endorses the distinction. Let’s consider it.

As a general principle rather than a series of anecdotes, the distinction between discovery and justification or evaluation has never been clear, and what has been said in favor of its implied theses has not made much sense, ever. Let’s start with the father of one of Hendry’s endorsers, William Herschel. William Herschel discovered Uranus, or something. Actually, the discovery of the planet Uranus was a collective effort that, subject to vicissitudes of error and individual opinion, followed a rational search strategy. On March 13, 1781, in the course of a sky survey for double stars, Herschel reports in his journal the observation of a “nebulous star or perhaps a comet.” The object came to his notice because of how it appeared through the telescope, perhaps the appearance of a disc. Herschel changed the magnification of his telescope, and finding that the brightness of the object changed more than the brightness of fixed stars, concluded he had seen a comet or “nebulous star.” Observations on later nights that it had moved eliminated the “nebulous star” alternative, and Herschel concluded that he had seen a comet. Why not a planet? Because lots of comets had been observed hitherto (Edmund Halley computed orbits for half a dozen, including his eponymous comet), but never a planet. A comet was much the more likely on frequency grounds. Further, Herschel had made a large error in his estimate of the distance of the body based on parallax values using his micrometer. A planet could not be so close.

This gets to a distinction I have tried to articulate, between explaining a known effect (like looking for a known object), and searching for an unknown effect (that may well not exist). In the latter, possible effects of “selection” or searching need to be taken account of. Of course, searching for the Higgs is akin to the latter, not the former, hence the joke in the recent New Yorker cartoon.

This is a follow-up on Vladimir Cherkassky’s comments on Deborah’s blog. First of all, let me thank Vladimir for taking the time to clarify his position. Still, there’s one issue on which we disagree and which, at the same time, I think needs clarification, so I decided to write this follow-up.[related posts 1]

The issue is about how central VC (Vapnik-Chervonenkis)-theory is to inductive inference.

I agree with Vladimir that VC-theory is one of the most important achievements in the field ever, and indeed, that it fundamentally changed our way of thinking about learning from data. Yet I also think that there are many problems of inductive inference to which it has no direct bearing. Some of these are concerned with hypothesis testing, but even when one is concerned with prediction accuracy (which Vladimir considers the basic goal) there are situations where I do not see how it plays a direct role. One of these is sequential prediction with log-loss or its generalization, Cover’s loss. This loss function plays a fundamental role in (1) language modeling, (2) on-line data compression, (3a) gambling and (3b) sequential investment on the stock market (here we need Cover’s loss). (A superquick intro to log-loss, as well as some references, is given below under [A]; see also my talk at the Ockham workshop, slides 16-26, about weather forecasting!)
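Log-loss scores a sequential forecaster by -log2 of the probability it assigned, in advance, to the outcome that actually occurred; summed over a sequence, this is essentially the code length in bits that a compressor driven by the forecaster would need, which is the link to on-line data compression. The sketch below assumes binary outcomes and, purely for illustration, Laplace’s rule of succession as the forecaster; it is not a predictor proposed in this post, and it does not cover Cover’s loss.

```python
import math


def laplace_predict(history):
    """Probability that the next binary outcome is 1 (Laplace's rule of succession)."""
    ones = sum(history)
    return (ones + 1) / (len(history) + 2)


def cumulative_log_loss(sequence, predict=laplace_predict):
    """Sum over t of -log2 P(x_t | x_1..x_{t-1}), in bits."""
    total = 0.0
    for t, x in enumerate(sequence):
        p_one = predict(sequence[:t])  # forecast made before seeing x_t
        p = p_one if x == 1 else 1.0 - p_one
        total += -math.log2(p)
    return total


# Forecasts 1/2, then 2/3, then 1/4 for the outcomes actually seen,
# so the total is -log2(1/12) = log2(12) bits.
print(f"{cumulative_log_loss([0, 0, 1]):.3f}")  # 3.585
```

Because the per-step losses add up to -log2 of the joint probability the forecaster assigns to the whole sequence, a good sequential predictor and a good compressor are the same object under this loss.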

In my July 8, 2012 post “Metablog: Up and Coming,” I wrote: “I will attempt a (daring) deconstruction of Professor Wasserman’s paper[i] and at that time will invite your ‘U-Phils’ for posting around a week after (<1000 words).” These could reflect on Wasserman’s paper and/or my deconstruction of it. See an earlier post for the way we are using “deconstructing” here. For some guides, see “so you want to do a philosophical analysis.”

So my Wasserman deconstruction notes have been sitting in the “draft” version of this blog for several days as we focused on other things. Here’s how it starts…

Deconstructing Larry Wasserman–it starts like this…

1. Al Franken’s Joke

The temptation is strong, but I shall refrain from using the whole post to deconstruct Al Franken’s 2003 quip about media bias (from Lies and Lying Liars Who Tell Them: A Fair and Balanced Look at the Right), with which Larry Wasserman begins his paper “Low Assumptions, High Dimensions” (2011):

To make the argument that the media has a left- or right-wing, or a liberal or a conservative bias, is like asking if the problem with Al-Qaeda is: do they use too much oil in their hummus?

According to Wasserman, “a similar comment could be applied to the usual debates in the foundations of statistical inference.”

Although it’s not altogether clear what Wasserman means by his analogy with comedian (now senator) Franken, it’s clear enough what Franken means if we follow up the quip with the next sentence in his text (which Wasserman omits): “The problem with al Qaeda is that they’re trying to kill us!” (p. 1). The rest of Franken’s opening chapter is not about al Qaeda but about bias in media.

But what does this have to do with the usual debates in the foundations of statistical inference? What is Wasserman, deep down, perhaps unconsciously, really, really, possibly implicitly, trying to tell us by way of this analogy? Such are the ponderings in my deconstruction of him…

Yet the footnote to my July 8 blog also said that my post assumed “I don’t chicken out.” So I will put it aside until I get a chorus of encouragement to post it…

Yesterday’s slight detour [i] presents an opportunity to (re)read Lindley’s “Philosophy of Statistics” (2000) (see also an earlier post). I recommend the full article and discussion. There is actually much here on which we agree.

The Philosophy of Statistics

Dennis V. Lindley

The Statistician (2000) 49:293-319

Summary. This paper puts forward an overall view of statistics. It is argued that statistics is the study of uncertainty. The many demonstrations that uncertainties can only combine according to the rules of the probability calculus are summarized. The conclusion is that statistical inference is firmly based on probability alone. Progress is therefore dependent on the construction of a probability model; methods for doing this are considered. It is argued that the probabilities are personal. The roles of likelihood and exchangeability are explained. Inference is only of value if it can be used, so the extension to decision analysis, incorporating utility, is related to risk and to the use of statistics in science and law. The paper has been written in the hope that it will be intelligible to all who are interested in statistics.

Around eight pages in we get another useful summary:

Let us summarize the position reached.

(a) Statistics is the study of uncertainty.

(b) Uncertainty should be measured by probability.

(c) Data uncertainty is so measured, conditional on the parameters.

(d) Parameter uncertainty is similarly measured by probability.

(e) Inference is performed within the probability calculus, mainly by equations (1) and (2) (301).

I suppose[ed] this was somewhat of a joke from the ISBA, prompted by Dennis Lindley, but as I [now] accord the actual extent of jokiness to be only ~10%, I’m sharing it on the blog [i]. Lindley (according to O’Hagan) wonders why scientists require so high a level of statistical significance before claiming to have evidence of a Higgs boson. It is asked: “Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”

Bad science? I’d really like to understand what these representatives from the ISBA would recommend, if there is even a shred of seriousness here (or is Lindley just peeved that significance levels are getting so much press in connection with so important a discovery in particle physics?)

Well, read the letter and see what you think.

On Jul 10, 2012, at 9:46 PM, ISBA Webmaster wrote:

Dear Bayesians,

A question from Dennis Lindley prompts me to consult this list in search of answers.

We’ve heard a lot about the Higgs boson. The news reports say that the LHC needed convincing evidence before they would announce that a particle had been found that looks like (in the sense of having some of the right characteristics of) the elusive Higgs boson. Specifically, the news referred to a confidence interval with 5-sigma limits.

Now this appears to correspond to a frequentist significance test with an extreme significance level. Five standard deviations, assuming normality, means a p-value of around 0.0000005. A number of questions spring to mind.

1. Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. Neither seems to be the case, so why 5-sigma?

2. Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis. Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is? …
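The arithmetic behind the letter’s “around 0.0000005” can be checked in a few lines, assuming a standard normal test statistic. The helper name `normal_tail_p` is illustrative, not from the letter. The letter’s figure is close to the two-sided tail area beyond five standard deviations; the 5-sigma discovery convention in particle physics is usually stated as the one-sided tail, about 2.9 × 10^-7.

```python
import math


def normal_tail_p(z, two_sided=False):
    """Standard normal tail probability beyond z standard deviations."""
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)) for a standard normal Z.
    one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


print(f"one-sided: {normal_tail_p(5):.2e}")                  # one-sided: 2.87e-07
print(f"two-sided: {normal_tail_p(5, two_sided=True):.2e}")  # two-sided: 5.73e-07
```

`math.erfc` is used rather than `1 - Phi(z)` because subtracting from 1 loses precision this far into the tail.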

A quick perusal of the “Manual” on Nathan Schachtman’s legal blog shows it to be chock full of revealing points of contemporary legal statistical philosophy. The following are some excerpts; read the full blog post here. I make two comments at the end.

In her introductory chapter, the late Professor Margaret A. Berger raises the question of the role statistical significance should play in evaluating a study’s support for causal conclusions:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value,[62] at least in proving general causation.[63]”

This seems rather backwards. Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology. And how can that tautology support the claim that inconclusive studies “therefore” have some probative value? This is a fairly obviously invalid argument, or perhaps a passage badly in need of an editor.

…………

Chapter on Statistics

The RMSE’s chapter on statistics is relatively free of value judgments about significance probability, and, therefore, a great improvement upon Berger’s introduction. The authors carefully describe significance probability and p-values, and explain:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 241 (3d ed. 2011). Although the chapter confuses and conflates Fisher’s interpretation of p-values with Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment is unfortunately fairly standard in introductory textbooks.

Kaye and Freedman, however, do offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome: …

Stephen Senn
Head of the Methodology and Statistics Group, Competence Center for Methodology and Statistics (CCMS), Luxembourg

An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence in evidence-based medicine? Philosophy of Science 2002; 69: S316-S330; see page S324)

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within. …
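The point that, under randomization, theory predicts how between-group variation relates to within-group variation is what licenses a randomization (permutation) test. The sketch below is a generic exact randomization test for a difference in means, not Senn’s own analysis; the outcome values and the choice of treated units are hypothetical toy numbers. It enumerates every assignment of the same number of units to treatment that the randomization could have produced, so the reference distribution comes from the design itself rather than from a model of the indefinitely many unmeasured covariates.

```python
from itertools import combinations
from statistics import mean


def randomization_test(outcomes, treated):
    """Exact one-sided randomization test for a difference in group means.

    Enumerates every way the same number of units could have been assigned
    to treatment and returns the observed difference together with the
    proportion of assignments giving a difference at least as large.
    """
    n, k = len(outcomes), len(treated)

    def diff(group):
        chosen = set(group)
        treat = [outcomes[i] for i in chosen]
        control = [outcomes[i] for i in range(n) if i not in chosen]
        return mean(treat) - mean(control)

    observed = diff(treated)
    diffs = [diff(g) for g in combinations(range(n), k)]
    # Small tolerance guards against floating-point ties with the observed value.
    extreme = sum(1 for d in diffs if d >= observed - 1e-12)
    return observed, extreme / len(diffs)


# Hypothetical toy data: units 3, 4, 5 were (by assumption) randomized to treatment.
obs, p = randomization_test([1, 2, 3, 4, 5, 6], (3, 4, 5))
print(f"{obs:.1f} {p:.2f}")  # 3.0 0.05
```

Of the 20 possible assignments of three units to treatment, only the observed one gives a difference as large as 3, hence p = 1/20.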

Dear Reader: Over the next week, in addition to a regularly scheduled post by Professor Stephen Senn, we will be taking up two papers[i] from the contributions to the special topic: “Statistical Science and Philosophy of Science: Where Do (Should) They Meet in 2011 and Beyond?” in Rationality, Markets and Morals: Studies at the Intersection of Philosophy and Economics.

I will attempt a (daring) deconstruction of Professor Wasserman’s paper[ii] and at that time will invite your “U-Phils” for posting around a week after (<1000 words). I will be posting comments by Clark Glymour on Sir David Hendry’s paper later in the week. So you may want to study those papers in advance.

I thank Dr. Vladimir Cherkassky for taking up my general invitation to comment. I don’t have much to add to my original post[i], except to make two corrections at the end of this post. I invite readers’ comments.

Vladimir Cherkassky

As I could not participate in the discussion session on Sunday, I would like to address several technical issues and points of disagreement that became evident during this workshop. All opinions are mine, and may not be representative of the “machine learning community.” Unfortunately, the machine learning community at large is not very much interested in the philosophical and methodological issues. This breeds a lot of fragmentation and confusion, as evidenced by the existence of several technical fields: machine learning, statistics, data mining, artificial neural networks, computational intelligence, etc.—all of which are mainly concerned with the same problem of estimating good predictive models from data.

Occam’s Razor (OR) is a general metaphor in the philosophy of science, and it has been discussed for ages. One of the main goals of this workshop was to understand the role of OR as a general inductive principle in the philosophy of science and, in particular, its importance in data-analytic knowledge discovery for statistics and machine learning.

Data-analytic modeling is concerned with estimating good predictive models from finite data samples. This is directly related to the philosophical problem of inductive inference. The problem of learning (generalization) from finite data was formally investigated in VC-theory about 40 years ago. This theory starts with a mathematical formulation of the problem of learning from finite samples, without making any assumptions about parametric distributions. This formalization is very general and relevant to many applications in machine learning, statistics, life sciences, etc. Further, this theory provides necessary and sufficient conditions for generalization. That is, a set of admissible models (hypotheses about the data) should be constrained, i.e., should have finite VC-dimension. Therefore, any inductive theory or algorithm designed to explain the data should satisfy VC-theoretical conditions. …
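Finite VC-dimension is defined through shattering: a class of models shatters a set of points if it can realize every possible labeling of them, and the VC-dimension is the size of the largest set it can shatter. The sketch below is a toy illustration, not an example from the post: it assumes the class of one-dimensional threshold classifiers h_t(x) = 1 if x ≥ t, which can shatter any single point but no pair of distinct points, so its VC-dimension is 1.

```python
def threshold_labelings(points):
    """All labelings of `points` achievable by classifiers h_t(x) = 1 if x >= t."""
    # Putting t at each data point, or above the largest one, realizes every
    # labeling that some threshold can produce on this finite set.
    candidates = sorted(points) + [max(points) + 1]
    return {tuple(int(x >= t) for x in points) for t in candidates}


def is_shattered(points):
    """True if threshold classifiers realize all 2**n labelings of the points."""
    return len(threshold_labelings(points)) == 2 ** len(points)


print(is_shattered([0.0]))       # True
print(is_shattered([0.0, 1.0]))  # False: the labeling (1, 0) is unreachable
```

The design choice of enumerating only finitely many thresholds works because any threshold between two consecutive points produces the same labeling as a threshold placed at the larger of them.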

I want to understand Sober’s position on falsification better. A pervasive idea to which many still subscribe, myself included, is that the heart of what makes inquiry scientific is the critical attitude: that if a claim or hypothesis or model fails to stand up to critical scrutiny it is rejected as false, and not propped up with various “face-saving” devices.

Now Sober writes: “I agree that we can get rid of models that deductively entail (perhaps with the help of auxiliary assumptions) observational outcomes that do not happen. But as soon as the relation is nondeductive, is there ‘falsification’?”

My answer is yes, else we could scarcely retain the critical attitude for any but the most trivial scientific claims. While at one time philosophers imagined that “observational reports” were given, and could therefore form the basis for a deductive falsification of scientific claims, certainly since Popper, Kuhn and the rest of the post-positivists, we recognize that observations are error prone, as are appeals to auxiliary hypotheses. Here is Popper: …

Here are a few comments on your recent blog about my ideas on parsimony. Thanks for inviting me to contribute!

You write that in model selection, “’parsimony fights likelihood,’ while, in adequate evolutionary theory, the two are thought to go hand in hand.” The second part of this statement isn’t correct. There are sufficient conditions (i.e., models of the evolutionary process) that entail that parsimony and maximum likelihood are ordinally equivalent, but there are cases in which they are not. Biologists often have data sets in which maximum parsimony and maximum likelihood disagree about which phylogenetic tree is best.

You also write that “error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model).” I think that the criticism of Bayesianism that focuses on the problem of assessing the likelihoods of “catch-all hypotheses” applies to this description of your error statistical philosophy. The General Theory of Relativity, for example, may tell us how probable a set of observations is, but its negation does not. I note that you have “usually within a model” in parentheses. In many such cases, two alternatives within a model will not be exhaustive even within the confines of a model and of course they won’t be exhaustive if we consider a wider domain.

Elliott Sober has been writing on simplicity for a long time, so it was good to hear his latest thinking. If I understood him, he continues to endorse a comparative likelihoodist account, but he allows that, in model selection, “parsimony fights likelihood,” while, in adequate evolutionary theory, the two are thought to go hand in hand. Where it seems needed, therefore, he accepts a kind of “pluralism”. His discussion of the rival models in evolutionary theory and how they may give rise to competing likelihoods (for “tree taxonomies”) bears examination in its own right, but being in no position to accomplish this, I shall limit my remarks to the applicability of Sober’s insights (as my notes reflect them) to the philosophy of statistics and statistical evidence.

1. Comparativism: We can agree that a hypothesis is not appraised in isolation, but to say that appraisal is “contrastive” or “comparativist” is ambiguous. Error statisticians view hypothesis testing as between exhaustive hypotheses H and not-H (usually within a model), but deny that the most that can be said is that one hypothesis or model is comparatively better than another, among a group of hypotheses that is to be delineated at the outset. There’s an important difference here. The best-tested of the lot need not be well-tested!

2. Falsification: Sober made a point of saying that his account does not falsify models or hypotheses. We are to start out with all the possible models to be considered (hopefully including one that is true or approximately true), akin to the “closed universe” of standard Bayesian accounts[i], but do we not get rid of any as falsified, given data? It seems not.

I see that Nathan Schachtman has had many interesting posts during the time I was away. His recent post endorses the idea of “a hierarchy of evidence,” but philosophers of “evidence-based” medicine generally question or oppose it, at least partly because of disagreement as to where to place RCTs in the hierarchy. What do people think?

Litigation arising from the FDA’s refusal to approve “health claims” for foods and dietary supplements is a fertile area for disputes over the interpretation of statistical evidence. A ‘‘health claim’’ is ‘‘any claim made on the label or in labeling of a food, including a dietary supplement, that expressly or by implication … characterizes the relationship of any substance to a disease or health-related condition.’’ 21 C.F.R. § 101.14(a)(1); see also 21 U.S.C. § 343(r)(1)(A)-(B).

If the FDA’s refusal to approve a health claim requires pre-specified criteria of evaluation, then we should be asking ourselves why the federal courts have failed to develop a set of criteria for evaluating health effects claims as part of their Rule 702 (“Daubert”) gatekeeping responsibilities. Why, close to 20 years after the Supreme Court decided Daubert, can lawyers make “health claims” without having to satisfy evidence-based criteria?


Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Deborah G. Mayo and Error Statistics Philosophy with appropriate and specific direction to the original content.