Introduction

In previous tutorials, we examined the use of OLS to estimate model parameters. One of the assumptions of the OLS model is that the error terms follow the normal distribution. This tutorial is designed to test the validity of that assumption.

In this tutorial we will examine the residuals for normality using three visualizations:

A histogram of the residuals.

A P-P plot.

A Q-Q plot.

Estimate the model and store results

As with previous tutorials, we use the linear data generated from

$$ y_{i} = 1.3 + 5.7 x_{i} + \epsilon_{i} $$

where $ \epsilon_{i} $ is the random disturbance term.

This time we will store the results from the GAUSS function ols for use in testing normality. The ols function returns a total of 11 outputs, each of which we assign to a named variable. The output we need for the normality checks is the vector of residuals, which we store in resid.
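The estimation step can be sketched as follows. This is a minimal sketch, not the definitive setup: the seed, the sample size of 100, the standard normal draws for x and the disturbance term, and every variable name other than resid are assumptions made for illustration. Note that ols only computes the residuals when the global _olsres is set to 1.

```gauss
// Set the seed so the results can be replicated (arbitrary value)
rndseed 23423;

// Number of observations (assumed for this sketch)
num_obs = 100;

// Generate x ~ N(0,1) and a disturbance term ~ N(0,1) (both assumed)
x = rndn(num_obs, 1);
error_term = rndn(num_obs, 1);

// Generate y from the linear model
y = 1.3 + 5.7*x + error_term;

// Turn on computation of the residuals and Durbin-Watson statistic
_olsres = 1;

// Estimate the model and store all 11 outputs:
//   nam      variable names
//   m        moment matrix
//   b        coefficient estimates
//   stb      standardized coefficients
//   vc       variance-covariance matrix of the estimates
//   std_err  standard errors of the estimates
//   sig      standard deviation of the residuals
//   cx       correlation matrix of the variables
//   rsq      R-squared
//   resid    residuals
//   dbw      Durbin-Watson statistic
{ nam, m, b, stb, vc, std_err, sig, cx, rsq, resid, dbw } = ols("", y, x);
```

Passing an empty string as the first argument tells ols that the data is supplied as matrices rather than read from a dataset.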

Create a histogram plot of residuals

Our first diagnostic review of the residuals will be a histogram plot. Examining a histogram of the residuals allows us to check for visible evidence of deviation from the normal distribution. To generate the percentile histogram plot we use the plotHistP command. plotHistP takes the following inputs:

myPlot: Optional input, a plotControl structure.

x: Mx1 vector of data.

v: Nx1 vector, breakpoints to be used to compute frequencies, or scalar, the number of categories (bins).
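A call to plotHistP with these inputs might look like the sketch below. It assumes the residuals from the estimation step are in the workspace as resid; the title, font, and the choice of 20 bins are illustrative assumptions.

```gauss
// Declare a plotControl structure
struct plotControl myPlot;

// Fill 'myPlot' with the default histogram settings
myPlot = plotGetDefaults("hist");

// Add a title to the graph
plotSetTitle(&myPlot, "Residuals Histogram", "Arial", 16);

// Plot a percentage histogram of the residuals using 20 bins
plotHistP(myPlot, resid, 20);
```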

The resulting histogram should show the residuals clumping around zero in a bell-shaped curve, as expected for normally distributed errors.

Create a standardized normal probability plot (P-P)

As we wish to confirm the normality of the error terms, we would like to know how the distribution of the residuals compares to the normal distribution. A standardized normal probability (P-P) plot can be used to help determine how closely the residuals and the normal distribution agree. It is most sensitive to non-normality in the middle range of the data. The P-P plot charts the cumulative normal probability of each standardized, ordered residual against the empirical probability $ p_{i} = \frac{i}{N+1} $, where $i$ is the position of the data value in the ordered list and $N$ is the number of observations.

Constructing the P-P plot takes four steps: sort the residuals, calculate the cumulative normal probabilities of the standardized residuals, construct a vector of empirical probabilities, and plot the cumulative probabilities on the vertical axis against the empirical probabilities on the horizontal axis.

1. Sort the residuals

GAUSS includes a built-in sorting function, sortc. We will use this function to sort the residuals and store them in a new vector, resid_sorted. sortc takes two inputs:

x: Matrix or vector to sort.

c: Scalar, the column on which to sort.

//Sort the residuals
resid_sorted = sortc(resid, 1);

Note: When sortc is used to sort a matrix, it does not return a matrix in which all columns are in ascending order. Instead, it sorts a specified column in ascending order and rearranges the other columns so that the rows stay together.

2. Calculate the cumulative probability of the standardized residuals

We need to find the cumulative normal probability associated with each standardized residual using the cdfN function. However, we must first standardize the sorted residuals by subtracting their mean and dividing by their standard deviation, $ \frac{x-\hat{\mu}}{\hat{\sigma}} $. This is accomplished using GAUSS's data normalizing function, rescale. With this function, data can be quickly normalized using either a named, pre-built scaling method or a user-specified location and scaling factor.

Tip: When evaluating competing models, it is sometimes preferred to use the same location and scale parameters to scale the test or validation set as were used to scale the training set. For this reason, rescale will return the location and scale parameters when a named scaling method is used.
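The standardization and cumulative probability calculation can be sketched as below, assuming resid_sorted from step 1 is in the workspace. The "standardize" method name is the rescale option for subtracting the mean and dividing by the standard deviation; the output names resid_standard, loc, scale_factor, and p_resid are chosen here for illustration.

```gauss
// Standardize the sorted residuals; 'loc' and 'scale_factor' hold the
// location and scale parameters that rescale applied
{ resid_standard, loc, scale_factor } = rescale(resid_sorted, "standardize");

// Cumulative normal probability of each standardized residual
p_resid = cdfN(resid_standard);
```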

3. Construct a vector of empirical probabilities

We next find the empirical probabilities, $ p_{i} = \frac{i}{N+1} $, where $i$ is the position of the data value in the ordered list and $N$ is the number of observations.
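One way to build this vector, assuming resid_sorted is in the workspace, is sketched below. The name p_i_fitted is an assumption made for illustration.

```gauss
// Count the number of residuals
N = rows(resid_sorted);

// Construct the sequential vector of positions 1, 2, ..., N
i = seqa(1, 1, N);

// Divide each element of i by (N+1) to get the vector of p_i
p_i_fitted = i / (N + 1);
```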

Tip: We could have used a for loop to generate the empirical probabilities. However, it is usually more computationally efficient to use matrix operations in place of loops. For this reason we use the vector i rather than looping through single values of i.

4. Plot the cumulative probabilities on the vertical axis against the empirical probabilities

Our final step is to generate a scatter plot of the cumulative probabilities of the sorted, standardized residuals against the empirical probabilities. The plot shows a relatively straight line, again supporting the assumption of error term normality.
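A sketch of the plotting step follows. So that the block stands on its own, it recomputes both series from resid_sorted, which is assumed to be in the workspace. The title, fonts, axis labels, and variable names are illustrative assumptions.

```gauss
// Recompute both series from the sorted residuals
{ resid_standard, loc, scale_factor } = rescale(resid_sorted, "standardize");
p_resid = cdfN(resid_standard);
N = rows(resid_sorted);
p_i_fitted = seqa(1, 1, N) / (N + 1);

// Declare a plotControl structure and fill it with default scatter settings
struct plotControl myPlot;
myPlot = plotGetDefaults("scatter");

// Interpret the axis labels as LaTeX
plotSetTextInterpreter(&myPlot, "latex", "axes");

// Add a title and axis labels
plotSetTitle(&myPlot, "Standardized P-P Plot", "Arial", 18);
plotSetXLabel(&myPlot, "p_{i}", "Arial", 14);
plotSetYLabel(&myPlot, "F(z_{i})");

// Empirical probabilities on the horizontal axis,
// cumulative normal probabilities on the vertical axis
plotScatter(myPlot, p_i_fitted, p_resid);
```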

Note: In this graph we use plotSetTextInterpreter to tell GAUSS that we wish to use LaTeX syntax in the graph's labels.

Create a normal quantile-quantile (Q-Q) plot

A normal quantile-quantile plot charts the quantiles of an observed sample against the quantiles from a theoretical normal distribution. The more linear the plot, the more closely the sample distribution matches the normal distribution. To construct the normal Q-Q plot we follow three steps:

Arrange residuals in ascending order.

Find the Z-scores corresponding to N+1 quantiles of the normal distribution, where N is the number of residuals.

Plot the sorted residuals on the vertical axis and the corresponding z-score on the horizontal axis.

1. Arrange residuals in ascending order

Because we did this above we can use the already created variable, resid_sorted.

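2. Find the Z-scores corresponding to the n+1 quantiles of the normal distribution, where n is the number of residuals.

To find the Z-scores we will use the cdfNi function, the inverse of the cdfN function used above. We first need to find the probability levels that correspond to the n+1 quantiles. This can be done using the seqa function. To determine what size our quantiles should be, we divide 100% by n+1. Dividing by n+1 rather than n keeps every probability level strictly below 1, where the inverse normal is not finite, and matches the empirical probabilities $ \frac{i}{N+1} $ used for the P-P plot.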
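These calculations can be sketched as follows, assuming resid_sorted is in the workspace. The names quant_increment, probs, and z_values_QQ are assumptions made for illustration.

```gauss
// Number of residuals
n = rows(resid_sorted);

// Size of each quantile, in percent
quant_increment = 100 / (n + 1);

// Probability levels: quant_increment, 2*quant_increment, ..., n*quant_increment
probs = seqa(quant_increment, quant_increment, n);

// Convert the percentages to probabilities and find the matching Z-scores
z_values_QQ = cdfNi(probs / 100);
```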

3. Plot sorted residual vs. quantile Z-scores

The Q-Q plot charts the sorted residuals on the vertical axis against the quantile z-scores on the horizontal axis.
Where the P-P plot is good for observing deviations from the normal distribution in the middle, the Q-Q plot is better for observing deviations from the normal distribution in the tails.
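The Q-Q plot itself might be drawn as in the sketch below. It recomputes the quantile z-scores from resid_sorted, which is assumed to be in the workspace, so that it can be run on its own. The title, axis labels, and variable names are illustrative assumptions.

```gauss
// Z-scores for the probability levels 1/(n+1), 2/(n+1), ..., n/(n+1)
n = rows(resid_sorted);
z_values_QQ = cdfNi(seqa(1, 1, n) / (n + 1));

// Declare a plotControl structure and fill it with default scatter settings
struct plotControl myPlot;
myPlot = plotGetDefaults("scatter");

// Add a title and axis labels
plotSetTitle(&myPlot, "Normal Q-Q Plot", "Arial", 18);
plotSetXLabel(&myPlot, "Normal z-score");
plotSetYLabel(&myPlot, "Sorted residuals");

// Quantile z-scores on the horizontal axis, sorted residuals on the vertical axis
plotScatter(myPlot, z_values_QQ, resid_sorted);
```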
