I'm not a zealot for either frequentist or Bayesian methods (it's nice to have both in the toolbox as far as I'm concerned). However, I find it hard to take seriously a pro-Bayesian article that is based on constructing scenarios in which confidence intervals fail. I could spend all day constructing seemingly reasonable but misleading priors.

I am also not particularly impressed by the construction of pathological cases. If we are going to criticize the CI approach, there is an easier and more important point to make: the CI only quantifies uncertainty due to sampling, and says nothing about measurement error, sampling bias, or modeling error. In many real-world scenarios (can I say most?), these other sources of error are orders of magnitude bigger than sampling error.

Confidence intervals provide a relatively easy way to quantify one source of error. But that's all. If you expect them to do more, you will be disappointed.

A fixed (proper) prior doesn't lead to formally incoherent inference, though. And FWIW, neither does using a whole class of priors, as in robust Bayes. The problem with prior specification is elicitation, not the technique per se.

As stated in the link: "We're both correct in the mathematical statements we're making, but we disagree about the appropriate way to quantify uncertainty."

My issue is that the article seems to present confidence intervals as impractical nonsense compared to credible intervals. They provide some constructed examples to make confidence intervals look silly (when no reasonable statistician, frequentist or not, would behave that way). Meanwhile, poor prior selection can give you crappy inference in the specific analysis at hand, regardless of whether it is formally coherent. The authors don't treat this as a problem (or at least don't discuss it), yet they hammer away at confidence intervals for this very reason. The article would be much stronger if they held the Bayesian intervals up to the same scrutiny and discussed ways to minimize those problems.

If your prior leads to "crappy inference," then either your prior is wrong (i.e., you didn't encode your prior knowledge correctly) or your intuition about what constitutes "crappy inference" is wrong. Prior elicitation is hard, but that objection is a non sequitur here. The real issue is that confidence intervals rely entirely on pre-experiment logic and by design don't tell you what you ought to believe or do once you've seen the data.

You say "no reasonable...frequentist" would behave this way and that the examples are contrived. This isn't true in general, particularly in clinical trials (sequential rejection schemes, in particular, constitute what you are claiming a smart frequentist would never do, yet they are used in practice all the time simply because they control the error rate correctly). I may write more on this later.

Not encoding your prior knowledge into a prior correctly is sort of my point. You could mess up the encoding, your prior knowledge could be incorrect, etc. Saying everything is fine when your prior is right is borderline meaningless. I could just as easily say that as long as my regression model is correct and everything is measured perfectly, it's all fine. That's not the real world, though, and neither is the world of perfect priors.

I'm not anti-Bayesian. Bayesian inference is really useful in some cases (where other approaches aren't). It isn't a silver bullet, though, and the linked article does not paint a fair picture of the relative strengths and weaknesses of the different approaches.

Saying everything is fine when your prior is right is borderline meaningless. I could just as easily say that as long as my regression model is correct and everything is measured perfectly, it's all fine. That's not the real world, though, and neither is the world of perfect priors.

Understanding how things work in the idealized case is the first step toward understanding what to do in complicated situations. You wouldn't balk at the assumption of a frictionless surface in a Physics 101 class just because one never arises in practice; you understand the principles that govern the frictionless surface first and then extend from there. And once you did move on to surfaces with friction, you wouldn't revert to principles that didn't even work for the frictionless case without a very good reason.

So we write down some axioms for logical inference, and Cox's theorem forces us to be Bayesian if we want any coherent way of ascribing plausibility to statements like "theta in A(X)". The problem in practice is that the prior (and the model as well, but for some reason this part gets ignored) is hard to write down. So we have two options: (1) move the goalposts (but at least acknowledge that this is what we are doing!), or (2) tackle the very hard problem of prior specification, or look for ways around it that allow us to at least approximate the correct Bayesian inference. Taking route (1) is fine and leads to very useful and fruitful results; it gives us things like guarantees about the scientific process. My problem is that many people using frequentist methods either don't know, or are in denial about, how far they have moved the goalposts.

I'm still finishing this article, but I have a few issues with the first example they've presented, regarding submarines. First, why would a reasonable statistician choose a 50% confidence procedure, especially with (hypothetical) lives on the line? You're essentially saying "I'm as likely right as wrong, but I'll use it anyway." Choosing a more reasonable value, say 95% confidence, you would see that the value of t* in this case is approximately 13, as opposed to 1. This greatly changes the decision that would be made at the end of the day.
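
For reference, here's a minimal check of those multipliers in Python (my own sketch, assuming scipy is available; with n = 2 observations the t statistic has 1 degree of freedom):

    # Critical values t* for a two-sided CI with df = 1 (n = 2 observations).
    from scipy.stats import t

    for conf in (0.50, 0.95):
        t_star = t.ppf(1 - (1 - conf) / 2, df=1)
        print(f"{conf:.0%} CI: t* = {t_star:.3f}")

    # 50% CI: t* = 1.000
    # 95% CI: t* = 12.706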

The second issue I'm having is that the data generating this example would not be Gaussian at all, so using the Student t distribution would be inappropriate. If the bubbles can indeed "form anywhere along the craft's length, independently, with equal probability," then we're dealing with a uniform distribution, and estimating the mean should be handled differently. Essentially, if you've seen two bubbles that are close together, say 1 meter apart on a 10-meter submarine, the interval between them would only cover the middle of the submarine about 10% of the time. As a statistician, I wouldn't be too excited to stake my credibility on those first two data points. I'm all for better education about confidence intervals, but I think this example is very misleading and misrepresents how a real trained statistician would handle this problem.
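
To make that concrete, here is a small Monte Carlo sketch (mine, not from the article) for a 10-meter hull: the interval between the two bubbles covers the midpoint 50% of the time overall, but far less often when we condition on the bubbles landing less than 1 m apart:

    # Coverage of [x_(1), x_(2)] for the midpoint of a Uniform(a, a+10) hull.
    import numpy as np

    rng = np.random.default_rng(0)
    a, length = 0.0, 10.0
    x = rng.uniform(a, a + length, size=(1_000_000, 2))
    lo, hi = x.min(axis=1), x.max(axis=1)
    covers = (lo < a + length / 2) & (a + length / 2 < hi)

    print(covers.mean())           # ~0.50 unconditionally
    close = (hi - lo) < 1.0        # bubbles less than 1 m apart
    print(covers[close].mean())    # ~0.05 given such a narrow spread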

I really take issue with their argument that "there is more than one type of CI in a given problem (Wald, Agresti-Coull, etc.)," as if there weren't an infinite number of priors, all resulting in different credible intervals.

It's quite seriously flawed. I can justify that 50% CI if the bubbles are generated independently, and I sure as hell don't need a distribution, let alone a normal or t, to do it.

It's obvious the person never took mathematical statistics, because this is a simple problem of estimating a+5 from a Uniform(a, a+10) distribution, for which the Bayesian and frequentist solutions shouldn't even differ that much (aside from small N).

Suppose you obtain x1 = 4 and x2 = 7. Then any a in the range [-3, 4] has the same likelihood, and there's no reason to choose one point in that range over another. How would you construct a point estimate and CI?
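
A quick sketch verifying the flatness (my own illustration; note that a flat prior on a would therefore make the posterior Uniform(-3, 4), so even a Bayesian gets no single "best" a here):

    # Likelihood of a for Uniform(a, a+10) data x = (4, 7): constant on
    # [-3, 4] and zero elsewhere.
    import numpy as np

    def likelihood(a, xs=(4.0, 7.0), width=10.0):
        xs = np.asarray(xs)
        inside = np.all((xs >= a) & (xs <= a + width))
        return (1.0 / width) ** len(xs) if inside else 0.0

    for a in (-4.0, -3.0, 0.0, 4.0, 5.0):
        print(a, likelihood(a))   # 0.01 inside [-3, 4], 0 outside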

An MLE isn't always guaranteed to be unique, though it usually is asymptotically. That said, inference really isn't made on point estimates, and the original discussion was about CIs, not point estimates.

So what would your simple frequentist solution be?

To address this question (since I never mentioned MLE):

I said I could get the 50% CI without distributions. If the observations are independent and bubbles form with equal probability on either side of the midpoint, then P(X_{(1)} < a+5 < X_{(2)}) = 1/2 trivially: each bubble falls below the midpoint with probability 1/2, so the two straddle it unless both land on the same side, which happens with probability 1/4 + 1/4 = 1/2.

I'm not saying this is a good interval. Obviously if X_{(1)} is close to a and X_{(2)} is close to a+10, it is ridiculous.

How would I get an arbitrary CI, then? (Don't assume I need maximum likelihood.) Let d = X_{(2)} - X_{(1)} and ah = (X_{(2)} + X_{(1)})/2. Then I use the interval ah +/- c * d, where c is chosen to get the desired coverage probability.

(I'm not going to do the work to compute that, but it's doable. It also depends on whether you want to do inference conditionally on d or not. In this case it makes sense to do that, but either way the interval can be chosen appropriately.)

Edit: and if you want a point estimator, ah is a completely valid point estimator.

I should also point out that my unconditional (on d) interval should probably be ah +/- c.
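
Since the poster says the calibration is doable but doesn't do it, here is a Monte Carlo sketch of both versions (mine, not theirs; both quantities are pivotal in a, so simulating with a = 0 suffices):

    # Calibrate c for ah +/- c*d and for the fixed-width variant ah +/- c,
    # in the Uniform(a, a+10) model with n = 2 observations.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, size=(1_000_000, 2))
    ah = x.mean(axis=1)                 # midrange, the point estimator above
    d = np.abs(x[:, 0] - x[:, 1])       # observed spread

    err = np.abs(ah - 5.0)              # distance from the true midpoint
    c_rel = np.quantile(err / d, 0.95)  # 95% interval: ah +/- c_rel * d
    c_abs = np.quantile(err, 0.95)      # 95% interval: ah +/- c_abs

    print(c_rel, (err < c_rel * d).mean())  # ~9.5 (one can show P(err/d > c) = 1/(1+2c))
    print(c_abs)                            # ~3.42, analytically (10 - sqrt(10))/2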

Wouldn't that method end up being the same as method A in the paper? In your notation, they take ah +/- abs(d)/2, which is claimed to have 50% coverage. (This claim isn't justified by the t distribution, though they note it would give the same result.)

This is the third or fourth paper to come through this site waging a Bayesian crusade against frequentist methods. (I'm glad others have commented on this today.) I'm thinking specifically of a paper that just wailed on p-value interpretation.

I'm fine with pointing out the problems with a theory, especially with how it interprets probability. But treating probability as belief has difficulties just as severe as treating it as frequency under repetition. I don't know why someone would be compelled to write a whole paper about how a method is useless because its premise requires axiomatic assumptions. With methods, there's never a free lunch.

We have a random variable X, whose distribution is parameterized by a. For a given a, we can ask what are the 'typical' values of X. We'll call this set of values T(a). Informally, we want to say "For this value of a, here is a set of typical values of X, and we can say that 95% of the time, a draw from X will be within this 'typical' set." Formally, we could define T(a) as the smallest set such that:

P(X in T(a)|a) = 0.95.

We can then define a confidence interval for an observed x simply as the set of parameter values for which x is in T(a).

Informally, once we observe x, we go through all the possible parameter values a and ask ourselves "Is this value x typical of what we would see if a were the true parameter value?"
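
As a toy illustration of that inversion (my own sketch, using a N(a, 1) model where T(a) is the central 95% region):

    # Invert the typical set: the CI for x is every a whose T(a) contains x.
    import numpy as np
    from scipy.stats import norm

    LEVEL = 0.95
    HALF = norm.ppf(0.5 + LEVEL / 2)    # 1.96; T(a) = [a - HALF, a + HALF]

    def confidence_set(x, grid):
        return [a for a in grid if a - HALF <= x <= a + HALF]

    cs = confidence_set(3.0, np.linspace(-10, 10, 4001))
    print(min(cs), max(cs))             # ~1.04 and ~4.96, i.e. x +/- 1.96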

One of the points made in the blog is that we should be careful about interpreting narrow confidence intervals. It is tempting to believe such intervals represent great precision. You don't need a contrived example to show this is dangerous.

If a particular value of x is atypical under almost all possible parameter values, then you will get a narrow (or possibly empty!) interval. Even for the parameter values within the interval, the value x may only barely be typical.

Therefore, if you see a narrow or empty interval, your first thought should be "I appear to have stumbled into the 5% of cases where the interval is useless" rather than "I'm going to continue to assume I'm still in the 95% of cases where the values observed are quite typical and the interval contains the true value."

Instinctively, I guess we feel that a narrow interval means all the "credibility" has been squashed together into a nice small interval. But this isn't a zero-sum game: if x is unlikely under one possible value of a, it may be unlikely under all the other values of a as well.
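
This is easy to check in the submarine setting (a quick sketch of my own, again using the interval between two bubbles on a 10 m hull): conditional coverage climbs with the realized width, so a narrow interval is exactly the bad case:

    # Conditional coverage of ah +/- d/2 by realized width, Uniform(0, 10).
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 10.0, size=(2_000_000, 2))
    ah, d = x.mean(axis=1), np.abs(x[:, 0] - x[:, 1])
    covers = np.abs(ah - 5.0) < d / 2

    for lo in range(0, 10, 2):
        sel = (d >= lo) & (d < lo + 2)
        print(f"width in [{lo}, {lo+2}): coverage {covers[sel].mean():.2f}")
    # rises from ~0.11 for the narrowest bin to 1.00 once d exceeds 5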

The statistician, on the authority of the many advocates of confidence intervals, convinces the rescuers to drop the line inside the 50% CI. The rescue line misses the hatch by 3.75 meters. By the time the rescue workers realize they have missed the hatch, there is no more time for another attempt. All crew on board the submersible are lost.

No matter how good or bad your procedure, you will be "right" some of the time and "wrong" some of the time. Whether you happened to be right or wrong in one specific instance is not a valid way of judging whether the procedure itself was sound. And it's horrible to push this idea on impressionable readers.

None of these deal with the biggest issue with any interval estimate: naive users treat it as having 100% coverage probability (instead of whatever the nominal coverage probability is), unless they are making the even bigger mistake of treating the sample as the population and confusing theta hat with theta.

I think the wording in the abstract is interesting: apparently the confidence coefficient "is thought to index the plausibility that the true parameter is in the interval." Am I right that "index" is being used to mean "has some unspecified relationship with," and "plausibility" to mean "something like a probability, but not a probability"?

I took some of your arguments and wrote a brief article on the point estimation problem of finding the hatch point: http://ge.tt/80TYRbs1/v/0

TL;DR: You could use the minimum or the maximum of the sample to get a much better point estimator, using the beta distribution. Using both is better, but I can't justify it (sum of beta distributions, anyone?).
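
For what it's worth, a quick sketch of that comparison (my own; for Uniform(a, a+10), X_(k) ~ a + 10*Beta(k, n+1-k), so the min and max give unbiased midpoint estimators after a known shift, and the midrange uses both):

    # MSE comparison for estimating the midpoint a + 5 with n = 2.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, size=(1_000_000, 2))   # a = 0, target is 5
    lo, hi = x.min(axis=1), x.max(axis=1)

    est_min = lo + 5.0 / 3.0      # E[X_(1)] = 10/3, shift up to the midpoint
    est_max = hi - 5.0 / 3.0      # E[X_(2)] = 20/3, shift down symmetrically
    midrange = (lo + hi) / 2.0    # combines both order statistics

    for name, est in [("min", est_min), ("max", est_max), ("midrange", midrange)]:
        print(name, np.mean((est - 5.0) ** 2))
    # midrange wins (~4.17 vs ~5.56), which is one justification for using both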