Producing normal density plots with shading

When teaching statistics, it is often useful to produce a normal density plot with shading under the curve. For example, consider a one-sided hypothesis test. An alpha value of .05 corresponds to a Z-score cutoff of 1.645. This means that 95% of the area under a standard normal curve falls below a value of 1.645, and the remaining 5% falls above it. So, how might we demonstrate these concepts graphically in SAS?
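As a quick numeric check before turning to graphics, the cutoff and the two areas can be computed with the QUANTILE and CDF functions. This is only a sketch; the variable names are placeholders chosen for illustration.

```sas
/* Numeric check of the 1.645 cutoff; results are written to the SAS log */
data _null_;
    z_cutoff   = quantile('NORMAL', 0.95);  /* z-value with 95% of the area below it */
    area_below = cdf('NORMAL', 1.645);      /* area under the curve below 1.645 */
    area_above = 1 - area_below;            /* area under the curve above 1.645 */
    put z_cutoff= 8.4 area_below= 8.4 area_above= 8.4;
run;
```

The log should show a cutoff of roughly 1.645 and areas of roughly .95 and .05.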

Graphing a normal curve without any shading is straightforward. To begin with, we create a data set containing the values for the x-axis, the values of the standard normal pdf, and a final variable set to zero.
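A minimal sketch of this step might look like the following. The data set name NORMAL, the variable name DENSITY, the range of -4 to 4, and the step size are assumptions made for illustration; the NOBORDER option may not be available in older SAS releases.

```sas
/* Build the x-axis grid, the standard normal density, and a zero baseline */
data normal;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);  /* standard normal PDF evaluated at x */
        lower   = 0;                       /* constant used to draw the baseline */
        output;
    end;
run;

/* Plot the curve and the baseline with a simplified appearance */
proc sgplot data=normal noautolegend noborder;
    series x=x y=density / lineattrs=(color=black);
    series x=x y=lower   / lineattrs=(color=black);
    yaxis display=none;                    /* hide the y-axis */
run;
```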

The variable LOWER, which is set to 0, is included to show that the PDF values approach zero asymptotically for very high or very low values of X. This is not essential to the plot, but it adds a little extra clarity. I also changed a few other plot options (removing the legend, the border, and the y-axis, and specifying the colors for the lines) to simplify the appearance of the plot.

Adding shading to a normal PDF plot requires a few extra steps. SGPLOT does not allow us to directly specify a shape to be shaded, but it does allow for shading between two lines, or bands, using the BAND statement. For example, we can shade the entire area under a standard normal PDF by adding a band between 0 and the PDF values:
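One way this could be written is sketched below. The data set and variable names (NORMAL, DENSITY, LOWER) and the fill color are illustrative assumptions.

```sas
/* Grid of x values with the standard normal density and a zero baseline */
data normal;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);
        lower   = 0;
        output;
    end;
run;

/* The BAND statement fills the region between LOWER (0) and DENSITY */
proc sgplot data=normal noautolegend noborder;
    band   x=x lower=lower upper=density / fillattrs=(color=cxCCCCCC); /* light gray fill */
    series x=x y=density / lineattrs=(color=black);
    series x=x y=lower   / lineattrs=(color=black);
    yaxis display=none;
run;
```

The BAND statement is listed first so that the two lines are drawn on top of the fill.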

We are not usually interested in shading the entire area under the curve, however. Instead, we are more likely to want to shade an area that is below or above some cutoff. For example, I previously mentioned that 95% of the area under a standard normal curve falls below a value of 1.645. To demonstrate this, we can shade the area of a standard normal curve that falls below the cutoff of 1.645. Unfortunately, we cannot simply tell SGPLOT to only display the band below a particular value of X. Instead, for values of X that should not be shaded, we can set the height of the band to zero, producing a line rather than a band. To do this, we add a new variable, UPPER, to the data set. UPPER is equal to the standard normal PDF for values of X that we wish to be shaded, and zero otherwise. For example:
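The following sketch shows one way to build UPPER and use it as the top of the band. Apart from UPPER and LOWER, the names (NORMAL_BELOW, DENSITY) and the fill color are assumptions for illustration.

```sas
/* UPPER follows the density at or below the cutoff and is zero above it */
data normal_below;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);
        lower   = 0;
        if x <= 1.645 then upper = density;  /* shaded region */
        else upper = 0;                      /* band collapses to a line */
        output;
    end;
run;

/* The band now runs from LOWER to UPPER rather than to the full density */
proc sgplot data=normal_below noautolegend noborder;
    band   x=x lower=lower upper=upper / fillattrs=(color=cxCCCCCC);
    series x=x y=density / lineattrs=(color=black);
    series x=x y=lower   / lineattrs=(color=black);
    yaxis display=none;
run;
```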

To shade the 5% of the curve that falls above 1.645 instead, we only need to change the inequality used in the IF statement from <= to >. Re-running the same SGPLOT syntax that we used earlier produces the following plot:
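A sketch of the reversed condition is below. As before, the data set name NORMAL_ABOVE, the variable name DENSITY, and the fill color are assumptions for illustration.

```sas
/* UPPER follows the density only above the cutoff */
data normal_above;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);
        lower   = 0;
        if x > 1.645 then upper = density;   /* shade the upper tail */
        else upper = 0;
        output;
    end;
run;

proc sgplot data=normal_above noautolegend noborder;
    band   x=x lower=lower upper=upper / fillattrs=(color=cxCCCCCC);
    series x=x y=density / lineattrs=(color=black);
    series x=x y=lower   / lineattrs=(color=black);
    yaxis display=none;
run;
```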

We can adjust the syntax to create more complex figures as well. To make these graphs easier to create (and to keep the syntax concise), I created a macro called NORMALPDF. The macro generates the DATA step and SGPLOT syntax necessary for creating these graphs. For example, we might want to create a plot of a standard normal PDF with shading between -1.96 and 1.96. These cutoffs correspond to a two-sided Z-test with alpha=.05. As such, 95% of the distribution falls between -1.96 and 1.96. We might further want to add vertical reference lines with labels at the two cutoff values. The macro call would be as follows:

The macro parameters LOWER_CUTOFF1 and UPPER_CUTOFF1 reflect that we are shading an area that is truncated on both its lower and upper ends. This syntax produces the following plot:
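For readers who want to see the underlying idea without the macro, a hand-written version of a similar plot might look like the sketch below. This is not the code that NORMALPDF generates; the data set name, variable names, colors, and line pattern are assumptions for illustration.

```sas
/* Shade only the region between the two cutoffs */
data normal_between;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);
        lower   = 0;
        if -1.96 <= x <= 1.96 then upper = density;  /* inside both cutoffs */
        else upper = 0;
        output;
    end;
run;

proc sgplot data=normal_between noautolegend noborder;
    band    x=x lower=lower upper=upper / fillattrs=(color=cxCCCCCC);
    series  x=x y=density / lineattrs=(color=black);
    series  x=x y=lower   / lineattrs=(color=black);
    /* Labeled vertical reference lines at the two cutoffs */
    refline -1.96 1.96 / axis=x label=('-1.96' '1.96')
                         lineattrs=(color=black pattern=dash);
    yaxis display=none;
run;
```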

The previous plot showed the standard normal PDF with shading between -1.96 and 1.96. We might instead be interested in shading the tails of the distribution. That is, we might want to create a plot of a standard normal PDF with shading below -1.96 and above 1.96. The macro call would be as follows:

Notice that this macro call uses different parameters than the previous one. Rather than employing LOWER_CUTOFF1 and UPPER_CUTOFF1, I specified UPPER_CUTOFF1 and LOWER_CUTOFF2. This reflects that two separate bands of the distribution are to be shaded. The first band is shaded from -4 (the lowest value on the graph) to -1.96. The second band is shaded from 1.96 to 4 (the highest value on the graph). The following graph would be generated:
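The same two-tailed shading can be sketched by hand as well. The version below is an illustration rather than the macro's generated code: it uses a single UPPER variable that is nonzero in both tails, whereas the macro may build the two bands differently. The names and colors are assumptions.

```sas
/* Shade both tails: below -1.96 and above 1.96 */
data normal_tails;
    do x = -4 to 4 by 0.001;
        density = pdf('NORMAL', x, 0, 1);
        lower   = 0;
        if x <= -1.96 or x >= 1.96 then upper = density;  /* either tail */
        else upper = 0;                                   /* unshaded middle */
        output;
    end;
run;

proc sgplot data=normal_tails noautolegend noborder;
    band    x=x lower=lower upper=upper / fillattrs=(color=cxCCCCCC);
    series  x=x y=density / lineattrs=(color=black);
    series  x=x y=lower   / lineattrs=(color=black);
    refline -1.96 1.96 / axis=x label=('-1.96' '1.96')
                         lineattrs=(color=black pattern=dash);
    yaxis display=none;
run;
```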

The macro also allows the mean and standard deviation of the distribution, as well as the x-axis values, to be quickly updated. For example, the SAT was traditionally scored to have a mean of 500 and a standard deviation of 100. If a person were to score a 650 on the SAT, we might want to depict the proportion of test-takers falling below that score. To do this, we first change the mean and standard deviation of our distribution to be 500 and 100, respectively. We also need to ensure that the graph shows the appropriate range of values. So, we specify x-axis values ranging from 200 to 800, with a tick mark at every multiple of 100. Finally, we add a cutoff at 650 with a corresponding label. The syntax and resulting graph are below.
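A hand-written sketch of the SAT example, without the macro, might look like the following. The macro variable names (MEAN, SD, CUTOFF), the data set name SAT, and the colors are assumptions for illustration and are not the NORMALPDF parameters.

```sas
/* Distribution settings for the SAT example */
%let mean   = 500;
%let sd     = 100;
%let cutoff = 650;

/* Shade the region at or below the cutoff score */
data sat;
    do x = 200 to 800 by 0.001 * &sd;        /* step size scales with the SD */
        density = pdf('NORMAL', x, &mean, &sd);
        lower   = 0;
        if x <= &cutoff then upper = density;
        else upper = 0;
        output;
    end;
run;

proc sgplot data=sat noautolegend noborder;
    band    x=x lower=lower upper=upper / fillattrs=(color=cxCCCCCC);
    series  x=x y=density / lineattrs=(color=black);
    series  x=x y=lower   / lineattrs=(color=black);
    refline &cutoff / axis=x label=("&cutoff") lineattrs=(color=black pattern=dash);
    xaxis values=(200 to 800 by 100);        /* tick mark at every multiple of 100 */
    yaxis display=none;
run;

/* Proportion of test-takers below the cutoff, written to the log */
data _null_;
    prop_below = cdf('NORMAL', &cutoff, &mean, &sd);
    put prop_below= 8.4;
run;
```

A score of 650 is 1.5 standard deviations above the mean, so the shaded proportion is roughly 93%.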

5 Comments

Thanks, Jorge, for taking the time to comment! I hope the macro is helpful. I've met multiple professors who have written code for producing these or similar plots. Unfortunately, nobody has posted their code online. Hopefully this post will save a lot of people the time of writing their own macros.

Congratulations on your first blog post, Stephen. You can use this same trick for showing probabilities for nonnormal distributions and for applications such as computing the definite integral under a curve. I like to use the REFLINE statement to display the lower axis, rather than a second SERIES statement. Also, I'd recommend that you use (&std / 100) as the stepsize in the macro, which is more robust than the hardcoded value 0.001.

Thanks, Rick! You are absolutely correct that this trick is not limited to the normal distribution. Using the REFLINE statement for the lower axis also makes a lot of sense. When shading is included in the figure, however, the variable upper sometimes takes on the value of the density and sometimes takes on a value of zero. Using a second SERIES statement to create a line with a density value of zero ensures that the line at zero is consistent across all values of X. As for changing the stepsize in the macro, it is already set to .001 * &sd. Switching to &sd/100 would result in fewer steps (800 steps rather than 8000), but would not improve the robustness of the macro.