I know how to calculate t-statistics and p-values for linear regression, but I'm trying to understand a step in the derivation. I understand where Student's t-distribution comes from, namely I can show that if $Z$ is a random variable drawn from a standard normal distribution, $Z \sim \mathcal{N}(0,1)$, and if $\chi$ is drawn from a $\chi^2_k$ distribution independently of $Z$, then the new random variable

$$ T = \frac{Z}{\sqrt{\frac{\chi}{k}}} $$

will be drawn from a t-distribution with $k$ degrees of freedom.
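This construction is easy to check by simulation. The following sketch (the sample size and degrees of freedom are arbitrary choices) draws independent $Z$ and $\chi^2_k$ variates, forms $T$, and compares its sample variance with the theoretical t-distribution variance $k/(k-2)$:

```python
import numpy as np

# Monte Carlo check (illustrative): build T = Z / sqrt(W/k) from an
# independent standard normal Z and a chi-square W with k degrees of
# freedom, then compare the sample variance of T with the theoretical
# variance k/(k-2) of a t-distribution with k degrees of freedom.
rng = np.random.default_rng(0)
k = 10                          # degrees of freedom (arbitrary choice)
n = 200_000                     # number of simulated draws

z = rng.standard_normal(n)      # Z ~ N(0, 1)
w = rng.chisquare(k, size=n)    # W ~ chi^2_k, independent of Z
t = z / np.sqrt(w / k)          # should follow t_k

print(t.mean())                 # close to 0
print(t.var(), k / (k - 2))     # both close to 1.25
```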

The part that confuses me is the application of this general formula to the estimates for the coefficients of a simple linear regression. If I parametrize the linear model as $y = \alpha + \beta x + \varepsilon$, with $\varepsilon$ a random variable with zero mean characterizing the errors, then the best estimate for $\beta$ is

$$ \hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, $$

with standard error

$$ SE(\hat{\beta}) = \frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}}. $$

Here $\sigma = \sqrt{\operatorname{Var}(\varepsilon)}$. The part I am confused about is why

$$ t \equiv \frac{\hat{\beta}}{SE(\hat{\beta})} $$

is taken to be drawn from a t-distribution, assuming the null hypothesis. If I could write $t$ in the form of the above variable $T$, cleanly identifying the $Z$ and the $\chi$ variables, then everything would be clear.
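For concreteness, here is how that t statistic is computed from data using the standard least-squares formulas (the dataset below is invented for illustration, and $\sigma$ is replaced by its usual estimate $\hat\sigma = \sqrt{RSS/(n-2)}$):

```python
import numpy as np

# Toy illustration (data invented here): compute beta-hat, its standard
# error, and the t statistic for the slope of a simple linear regression.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + rng.standard_normal(n)             # null hypothesis beta = 0 is true

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

resid = y - (alpha_hat + beta_hat * x)
rss = np.sum(resid ** 2)
sigma_hat = np.sqrt(rss / (n - 2))           # estimate of sigma
se_beta = sigma_hat / np.sqrt(sxx)

t_stat = beta_hat / se_beta                  # ~ t_{n-2} under the null
print(t_stat)
```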

Your initial statement, concluding with "will be drawn from a t-distribution" etc., is true if the numerator and denominator are independent, but not generally true if they're not.
– Michael Hardy, Jul 2 '16 at 17:51

Sure, sorry I should have specified that they are independent.
– Surgical Commander, Jul 2 '16 at 21:50

@SurgicalCommander You can edit your question to contain the correction.
– Glen_b♦, Jul 3 '16 at 1:35

Independence of these two things is seen by observing that the vector of residuals is independent of the vector of fitted values. To see that, find the covariance between the vector of residuals and the vector of fitted values, and recall that if two random vectors are jointly normally distributed then they are independent if they are uncorrelated.

(The whole story of why these things have the distributions asserted here would take somewhat longer.)
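The uncorrelatedness step above can be verified numerically: the residual vector is $(I-P)y$ and the fitted-value vector is $Py$, where $P$ is the hat (projection) matrix, so their cross-covariance is proportional to $(I-P)P$, which vanishes because $P$ is idempotent. A minimal sketch (the design matrix here is an arbitrary example):

```python
import numpy as np

# Sketch of the uncorrelatedness argument: residuals are (I - P) y and
# fitted values are P y, so Cov(residuals, fitted) is proportional to
# (I - P) P, which is the zero matrix because P is idempotent (P @ P = P).
rng = np.random.default_rng(2)
n = 8
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # example design matrix

P = X @ np.linalg.inv(X.T @ X) @ X.T      # hat (projection) matrix
cross_cov = (np.eye(n) - P) @ P           # proportional to Cov(residuals, fitted)

print(np.allclose(cross_cov, 0))          # True: uncorrelated, hence independent under normality
```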

I know a way to show you why you get a t-distribution for this statistic, but it's going to require some linear algebra.

You are working with the model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i,$$

and I will assume from now on that $\{\epsilon_1,\ldots,\epsilon_n\}$ are i.i.d. from the $N(0,\sigma^2)$ distribution.

Step 1 - distribution of $\hat\beta_1$:

You know that the least-squares estimate of $\beta_1$ can be written as:

$$\hat\beta_1 = \sum_{i=1}^n\frac{x_i - \bar x}{SXX}y_i, \qquad SXX = \sum_{i=1}^n (x_i - \bar x)^2,$$

and you can show from this equation that $\hat\beta_1 \sim N\left(\beta_1,\frac{\sigma^2}{SXX}\right).$
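Step 1 can be checked by simulation. The sketch below (parameter values are arbitrary choices) holds $x$ fixed, repeatedly redraws the errors, and compares the sample mean and variance of $\hat\beta_1$ with $\beta_1$ and $\sigma^2/SXX$:

```python
import numpy as np

# Monte Carlo check of Step 1 (illustrative values): with x fixed, draw
# y = b0 + b1*x + eps many times and verify that beta1-hat has mean b1
# and variance sigma^2 / SXX.
rng = np.random.default_rng(3)
n, reps = 30, 50_000
b0, b1, sigma = 1.0, 2.0, 0.5
x = np.linspace(0, 1, n)
sxx = np.sum((x - x.mean()) ** 2)

eps = sigma * rng.standard_normal((reps, n))
y = b0 + b1 * x + eps
# beta1-hat = sum_i (x_i - xbar) y_i / SXX, computed for every replicate at once
beta1_hat = y @ (x - x.mean()) / sxx

print(beta1_hat.mean(), b1)               # both close to 2.0
print(beta1_hat.var(), sigma**2 / sxx)    # close to each other
```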

Step 2 - distribution of $RSS$:

Now to continue my answer it is convenient to rewrite our model in matrix form:

$$y = X\beta + \epsilon,$$

and recall that the residual sum of squares can be written as:

$$RSS = y^T(I - P_X)y,$$
where $P_X = X(X^TX)^{-1}X^T$. Since $X$ has two columns, $P_X$ is a projection matrix of rank two, so $I-P_X$ is also a projection matrix, of rank $n-2$. Now we can see that $RSS \sim \sigma^2\chi^2_{n-2}$, because it is a quadratic form in independent normal variables with common variance on a projection matrix. (The noncentrality parameter is $0$ because $(I-P_X)X = 0$.)
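Step 2 is also easy to verify numerically: $RSS/\sigma^2$ should have mean $n-2$ and variance $2(n-2)$, matching a $\chi^2_{n-2}$ distribution. A sketch with arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo check of Step 2 (illustrative values): RSS / sigma^2 should
# follow a chi-square distribution with n - 2 degrees of freedom, so its
# mean should be n - 2 and its variance 2(n - 2).
rng = np.random.default_rng(4)
n, reps = 20, 50_000
sigma = 1.5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection onto the column space of X

y = X @ np.array([1.0, 0.3]) + sigma * rng.standard_normal((reps, n))
# y^T (I - P) y, computed for every replicate at once
rss = np.einsum('ri,ri->r', y @ (np.eye(n) - P), y)

scaled = rss / sigma**2
print(scaled.mean(), n - 2)               # both close to 18
print(scaled.var(), 2 * (n - 2))          # both close to 36
```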

Step 3 - Independence of $\hat\beta_1$ and $RSS$:

It remains to prove that $\hat\beta_1$ and $RSS$ are independent. Remember that: