Regression analysis is used for modeling the relationship between a single variable Y (the outcome, or dependent variable), measured on a continuous or near-continuous scale, and one or more predictors (independent or explanatory variables), X. If only one predictor is used, we have a simple regression model; if we have more than one predictor, we have a multiple regression model. Let’s download the data that come with the book. They are available in various formats, including R:
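A minimal sketch of this first step (the file name and the variable names, parity and wpc for the waiting period, are hypothetical here, and simulated data stand in for the book’s dataset):

```r
## Simulated stand-in for the book's data; with the real file one would
## instead do something like: load("dairy.rda")  (hypothetical file name)
set.seed(42)
dairy <- data.frame(parity = sample(1:7, 100, replace = TRUE))
dairy$wpc <- 60 + 3 * dairy$parity + rnorm(100, sd = 10)

## Simple regression: one predictor
mod <- lm(wpc ~ parity, data = dairy)
summary(mod)
```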

The output shows, in order: the model specified, the distribution of the residuals (which should be normal with mean 0, and so a median close to 0), the coefficients and their standard errors, the standard error of the residuals, the R-squared and adjusted R-squared, and finally an F-test of the null hypothesis that all coefficients except the intercept equal zero (i.e. does our model explain the variance better than a model that predicts the mean of the dependent variable for all data points?). But where’s the ANOVA table?
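The ANOVA table is not part of summary(); anova() prints it. A sketch on simulated data (variable names hypothetical):

```r
## Simulated stand-in for the book's data
set.seed(42)
dairy <- data.frame(parity = sample(1:7, 100, replace = TRUE))
dairy$wpc <- 60 + 3 * dairy$parity + rnorm(100, sd = 10)
mod <- lm(wpc ~ parity, data = dairy)

## ANOVA table: sums of squares, mean squares, F value and Pr(>F)
anova(mod)
```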

How would you calculate this F-test yourself? The F value is given by F = ((SST − SSE) / (n − p)) / (SSE / p), with SST the total sum of squares, SSE the sum of squares of the residuals, and n and p the degrees of freedom for the total and the residuals. Comparing this statistic to an F distribution with n − p and p degrees of freedom then gives us the p-value.
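Under those definitions the computation can be sketched as follows (simulated data, hypothetical variable names); the result should match the F value reported by summary():

```r
set.seed(42)
dairy <- data.frame(parity = sample(1:7, 100, replace = TRUE))
dairy$wpc <- 60 + 3 * dairy$parity + rnorm(100, sd = 10)
mod <- lm(wpc ~ parity, data = dairy)

sse <- sum(residuals(mod)^2)                  # residual sum of squares
sst <- sum((dairy$wpc - mean(dairy$wpc))^2)   # total sum of squares
n <- nrow(dairy) - 1       # total degrees of freedom
p <- df.residual(mod)      # residual degrees of freedom

Fval <- ((sst - sse) / (n - p)) / (sse / p)
pval <- pf(Fval, n - p, p, lower.tail = FALSE)
c(F = Fval, p = pval)

summary(mod)$fstatistic    # F value here should agree
```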

Next we could plot the observed values together with intervals both for the mean response (confidence bands) and for new observations (prediction bands). We first create a new data frame holding the values of parity for which we want predictions, then compute the prediction and confidence bands and plot them with ggplot2.
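One way to do this (a sketch on simulated data; parity and wpc are hypothetical variable names):

```r
library(ggplot2)

set.seed(42)
dairy <- data.frame(parity = sample(1:7, 100, replace = TRUE))
dairy$wpc <- 60 + 3 * dairy$parity + rnorm(100, sd = 10)
mod <- lm(wpc ~ parity, data = dairy)

## Values of parity for which we want predictions
new_data <- data.frame(parity = seq(1, 7, by = 0.1))
conf <- as.data.frame(predict(mod, new_data, interval = "confidence"))
pred <- as.data.frame(predict(mod, new_data, interval = "prediction"))
bands <- data.frame(new_data, fit = conf$fit,
                    conf_lwr = conf$lwr, conf_upr = conf$upr,
                    pred_lwr = pred$lwr, pred_upr = pred$upr)

ggplot(bands, aes(parity, fit)) +
  geom_point(data = dairy, aes(parity, wpc), alpha = 0.4) +
  geom_ribbon(aes(ymin = conf_lwr, ymax = conf_upr), alpha = 0.3) +
  geom_line(aes(y = pred_lwr), linetype = "dashed") +
  geom_line(aes(y = pred_upr), linetype = "dashed") +
  geom_line()
```

Note that the prediction bands (dashed) are wider than the confidence bands, since they account for the residual variation of individual observations, not just the uncertainty in the mean.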

Finally, let’s have a look at a multiple regression and see how we can assess the significance of predictor variables. We will look at the effect of milk production, parity, retained placenta and vaginal discharge on the waiting period of the cows.
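A sketch of such a model on simulated data (the variable names — milk for milk production, rp for retained placenta, vd for vaginal discharge, wpc for the waiting period — are hypothetical stand-ins for the book’s):

```r
set.seed(1)
n <- 200
dairy <- data.frame(
  milk = rnorm(n, mean = 30, sd = 5),        # milk production
  parity = sample(1:7, n, replace = TRUE),   # number of calvings
  rp = rbinom(n, 1, 0.2),                    # retained placenta (0/1)
  vd = rbinom(n, 1, 0.15)                    # vaginal discharge (0/1)
)
dairy$wpc <- 50 + 0.5 * dairy$milk + 2 * dairy$parity +
  8 * dairy$rp + 5 * dairy$vd + rnorm(n, sd = 10)

## Multiple regression, including a squared term for milk production
mmod <- lm(wpc ~ milk + I(milk^2) + parity + rp + vd, data = dairy)
summary(mmod)
```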

The F-test tells us whether any of the predictors is useful in predicting the length of the waiting period. The F-statistic is significant, so we can reject the null hypothesis that no predictor is useful. Had we failed to reject the null hypothesis, we would still want to consider non-linear transformations of the variables (already done here) or the presence of outliers masking the true effect; we might also simply not have enough data to demonstrate a real effect. Now that we have rejected this null hypothesis, it does not follow that all predictors are necessary to predict the response.
We can see from the summary table which predictors are significant. We can also test a single predictor with an F-test, by comparing the full model to the model without that predictor:
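A sketch on simulated data with hypothetical variable names, dropping vaginal discharge (vd) from the model:

```r
set.seed(1)
n <- 200
dairy <- data.frame(milk = rnorm(n, 30, 5), parity = sample(1:7, n, TRUE),
                    rp = rbinom(n, 1, 0.2), vd = rbinom(n, 1, 0.15))
dairy$wpc <- 50 + 0.5 * dairy$milk + 2 * dairy$parity +
  8 * dairy$rp + 5 * dairy$vd + rnorm(n, sd = 10)
mmod <- lm(wpc ~ milk + I(milk^2) + parity + rp + vd, data = dairy)

## Reduced model without vd, compared to the full model with an F-test
smod <- update(mmod, . ~ . - vd)
anova(smod, mmod)
```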

or with a t-test, as t = β̂ / se(β̂), with n − p degrees of freedom. Note that t² equals the F-statistic.
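We can check this relationship numerically (simulated data, hypothetical names): the squared t-statistic for vd from the summary table equals the F-statistic from the model comparison.

```r
set.seed(1)
n <- 200
dairy <- data.frame(milk = rnorm(n, 30, 5), parity = sample(1:7, n, TRUE),
                    rp = rbinom(n, 1, 0.2), vd = rbinom(n, 1, 0.15))
dairy$wpc <- 50 + 0.5 * dairy$milk + 2 * dairy$parity +
  8 * dairy$rp + 5 * dairy$vd + rnorm(n, sd = 10)
mmod <- lm(wpc ~ milk + I(milk^2) + parity + rp + vd, data = dairy)

smod <- update(mmod, . ~ . - vd)
Fval <- anova(smod, mmod)$F[2]                            # F from model comparison
tval <- summary(mmod)$coefficients["vd", "t value"]       # t from summary table
c(F = Fval, t_squared = tval^2)                           # the two agree
```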
The same approach can be used to test a group of predictors. For example, if we want to see whether we have to keep both milk production and its squared value in the model:
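A sketch, again on simulated data with hypothetical names: drop both milk terms and compare the two models with an F-test.

```r
set.seed(1)
n <- 200
dairy <- data.frame(milk = rnorm(n, 30, 5), parity = sample(1:7, n, TRUE),
                    rp = rbinom(n, 1, 0.2), vd = rbinom(n, 1, 0.15))
dairy$wpc <- 50 + 0.5 * dairy$milk + 2 * dairy$parity +
  8 * dairy$rp + 5 * dairy$vd + rnorm(n, sd = 10)
mmod <- lm(wpc ~ milk + I(milk^2) + parity + rp + vd, data = dairy)

## Drop both milk terms at once: F-test on 2 degrees of freedom
smod <- update(mmod, . ~ . - milk - I(milk^2))
anova(smod, mmod)
```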