Confidence Intervals

Consider the following experiment: we draw 25 samples from a Normal distribution with \(\mu=1\) and \(\sigma^2=2\). As experimenters, let’s pretend we know the variance but have to estimate the mean. With \(\sigma\) known, a 95% confidence interval for \(\mu\) is \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\), which here is roughly \(\bar{x} \pm 0.554\).
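As a minimal sketch of this computation (not necessarily the code originally used for these notes), the interval can be computed with the Python standard library. The function name `z_confidence_interval` and the seed are illustrative assumptions; the only statistical assumption is the one stated above, that \(\sigma\) is known.

```python
import math
import random
from statistics import NormalDist, fmean


def z_confidence_interval(samples, sigma, level=0.95):
    """Known-variance (z) confidence interval for the mean of `samples`."""
    n = len(samples)
    xbar = fmean(samples)
    # Two-sided critical value, about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


rng = random.Random(0)  # arbitrary seed, only for reproducibility
mu, sigma = 1.0, math.sqrt(2.0)  # true mean and known standard deviation
samples = [rng.gauss(mu, sigma) for _ in range(25)]

lower, upper = z_confidence_interval(samples, sigma)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

Because \(\sigma\) is treated as known, the width of the interval is the same for every sample; only its center \(\bar{x}\) moves from one experiment to the next.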

Consider 200 realizations of a 95% confidence interval, each computed from a fresh sample of 25. If we pick a realization at random, there is a 95% chance that the interval contains the true \(\mu\); we can check this by computing many realizations and seeing what proportion of the intervals contain the true \(\mu\). But once we look at one specific interval, we can no longer compute this probability from repeated experiments, and all we can do is observe the (deterministic) result of whether or not that interval contains the true value.
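The repeated-experiment interpretation can be checked with a small simulation. This is a sketch under the same assumptions as the experiment above (25 normal samples, \(\mu=1\), known \(\sigma^2=2\)); the function name `coverage` and the seed are hypothetical, and the sketch reports a proportion rather than drawing a plot.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(n_intervals, n=25, mu=1.0, sigma=math.sqrt(2.0), level=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean `mu`."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_intervals):
        # One repetition of the experiment: draw n samples, build the interval.
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_intervals


print("coverage over 200 intervals:  ", coverage(200))
print("coverage over 20000 intervals:", coverage(20000))
```

With only 200 intervals the observed proportion will wobble around 0.95; with many more repetitions it should settle close to 0.95. Each individual interval, however, either contains \(\mu=1\) or does not.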

Why does our 95% confidence interval not imply that there is a 95% chance that this particular interval includes the true \(\mu\)? The simplest explanation, in my opinion, is that this question is irrelevant to the discussion of confidence intervals. All the math surrounding confidence intervals answers a different question: how do we construct an interval that includes the true \(\mu\) in 95% of repeated experiments? To answer the question of the probability that a given interval includes the true \(\mu\), we need to construct probability differently, so that it can be computed from only the single experiment we observed. This requires some (Prof. Storey would say subjective) presupposition of what the experiment looks like; this is called the prior in Bayesian statistics. The ability to compute the probability of the interval containing the true \(\mu\) is part of the intuitive appeal of Bayesian statistics.

p-values and Hypothesis Testing

First, recall that we know the large-sample distribution of the sample mean from the CLT: for large \(n\), approximately,

\[\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).\]

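We’ll stay with the setting of normal random variables with known variance. We can use the knowledge of this distribution to compute the p-value of the simple test \(H_0: \mu=1\) versus \(H_1: \mu \neq 1\). This is a two-sided test: under \(H_0\), the statistic \(z = (\bar{x} - 1)/(\sigma/\sqrt{n})\) is standard normal, and the p-value is \(2\,(1 - \Phi(|z|))\), where \(\Phi\) is the standard normal CDF.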
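A minimal sketch of this two-sided test, again using only the Python standard library. The function name `two_sided_z_pvalue` and the seed are illustrative assumptions, and the test assumes \(\sigma\) is known, as in the confidence interval example.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_z_pvalue(samples, mu0, sigma):
    """p-value for H0: mu = mu0 versus H1: mu != mu0 with known sigma."""
    n = len(samples)
    # Standardize the sample mean under the null hypothesis.
    z = (fmean(samples) - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least as extreme in either tail.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


rng = random.Random(0)  # arbitrary seed, only for reproducibility
sigma = math.sqrt(2.0)
samples = [rng.gauss(1.0, sigma) for _ in range(25)]  # data generated under H0

print("p-value:", two_sided_z_pvalue(samples, mu0=1.0, sigma=sigma))
```

The test and the confidence interval are two views of the same calculation: the p-value falls below 0.05 exactly when the 95% interval \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\) excludes the null value \(\mu=1\).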