Implement the truncated normal distribution in SAS

This article describes how to implement the truncated normal distribution in SAS. Although the implementation in this article uses the SAS/IML language, you can also implement the ideas and formulas by using the DATA step and PROC FCMP. For reference, I recommend the Wikipedia article on the truncated normal distribution.

The truncated normal distribution contains two parts: a normal distribution N(μ, σ), and an interval of truncation [a, b]. Denote the four-parameter truncated normal distribution by TN(μ, σ, a, b).
There are four essential functions that you need when you are working with a statistical distribution. You need to know how to generate random values, how to compute the density (PDF), how to compute the cumulative distribution (CDF), and how to compute quantiles (inverse CDF).
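
For reference, the CDF and quantile function of TN(μ, σ, a, b) can be written in terms of the normal distribution. In this sketch, Φ denotes the CDF of N(μ, σ) and F denotes the CDF of the truncated distribution; both symbols are notation introduced here, with Φ(a) = 0 when a = –∞ and Φ(b) = 1 when b = ∞:

```latex
F(x) = \frac{\Phi(x) - \Phi(a)}{\Phi(b) - \Phi(a)}, \qquad a \le x \le b,
\\[6pt]
F^{-1}(p) = \Phi^{-1}\!\bigl( \Phi(a) + p\,[\Phi(b) - \Phi(a)] \bigr), \qquad 0 \le p \le 1 .
```

The quantile formula also gives a way to generate random values: if U is uniform on (0, 1), then F⁻¹(U) follows the truncated normal distribution.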

The example in this article specifies a finite interval, which results in a doubly truncated distribution. However, the functions are written to support one-sided truncation as well. To generate a distribution that is truncated on the right, specify a = .M so that the truncation interval is (–∞, b]. To generate a distribution that is truncated on the left, specify b = .P so that the truncation interval is [a, ∞). Here the SAS special missing values .M and .P stand for –∞ and +∞, respectively.
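
One common way to generate random values is the inverse-CDF method: draw a uniform variate between the normal CDF values at a and b, then apply the normal quantile function. The following SAS/IML code is a minimal sketch of that idea, not necessarily the code from the original post. The module name RandTN and the parameter values are hypothetical, and the sketch assumes that any missing value for a means –∞ and any missing value for b means +∞.

```sas
proc iml;
/* Sketch: draw N values from TN(mu, sigma, a, b) by inverting the CDF.
   RandTN is a hypothetical name. Assumption: a missing value for a
   (such as .M) means -infinity; a missing value for b (such as .P)
   means +infinity. */
start RandTN(N, mu, sigma, a, b);
   /* normal CDF at the endpoints; use 0 and 1 for infinite endpoints */
   if missing(a) then Fa = 0; else Fa = cdf("Normal", a, mu, sigma);
   if missing(b) then Fb = 1; else Fb = cdf("Normal", b, mu, sigma);
   u = j(N, 1);                     /* allocate an N x 1 vector       */
   call randgen(u, "Uniform");      /* u ~ U(0,1)                     */
   p = Fa + (Fb - Fa)*u;            /* p ~ U(Fa, Fb)                  */
   return( quantile("Normal", p, mu, sigma) );
finish;

call randseed(12345);
x = RandTN(1000, 0, 1, -1.5, 2);    /* doubly truncated on [-1.5, 2]  */
y = RandTN(1000, 0, 1, .M, 2);      /* right-truncated on (-inf, 2]   */
print (min(x))[label="min x"] (max(x))[label="max x"] (max(y))[label="max y"];
quit;
```

Every generated value falls inside the truncation interval by construction, so no draws are discarded. When the interval lies far in a tail of the normal distribution, the CDF values at the endpoints are nearly equal and this approach can lose numerical accuracy.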

A histogram of the random sample is shown at the top of this post. The histogram is overlaid with a curve that shows the PDF for the truncated normal distribution, which is computed in the next section.

The PDF for the truncated normal distribution

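Computing the density function for the truncated normal distribution is easy. You simply use the usual normal PDF on [a, b], but scale the density function so that it integrates to one. If φ and Φ denote the PDF and CDF of N(μ, σ), then the density of TN(μ, σ, a, b) is f(x) = φ(x) / (Φ(b) – Φ(a)) for a ≤ x ≤ b, and f(x) = 0 otherwise.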
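
The scaling can be sketched in SAS/IML as follows. This is an illustrative sketch under stated assumptions rather than the original code: the module name PDF_TN and the example parameters are hypothetical, and a missing value for a or b is assumed to mean an infinite endpoint.

```sas
proc iml;
/* Sketch: density of TN(mu, sigma, a, b) evaluated at each element of x.
   PDF_TN is a hypothetical name. Assumption: a missing value for a or b
   means that the corresponding endpoint is infinite. */
start PDF_TN(x, mu, sigma, a, b);
   if missing(a) then Fa = 0; else Fa = cdf("Normal", a, mu, sigma);
   if missing(b) then Fb = 1; else Fb = cdf("Normal", b, mu, sigma);
   /* normal density, rescaled so that it integrates to 1 on [a, b] */
   f = pdf("Normal", x, mu, sigma) / (Fb - Fa);
   /* the density is zero outside the truncation interval */
   if ^missing(a) then f = f # (x >= a);
   if ^missing(b) then f = f # (x <= b);
   return( f );
finish;

dx = 0.01;
x = do(-2, 3, dx);                  /* grid that covers [-1.5, 2]     */
f = PDF_TN(x, 0, 1, -1.5, 2);
area = sum(f) * dx;                 /* Riemann sum; should be near 1  */
print area;
quit;
```

The Riemann sum is only a rough check that the scaled density integrates to approximately one over the truncation interval.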

Conclusions

The PDF, CDF, QUANTILE, and RAND functions in SAS support distributions that arise often in practice. As shown in this article, you can use these basic distributions as building blocks to define new distributions, such as the truncated normal distribution or the folded normal distribution.

Truncated normal distributions arise in various regression models. For example, a regression model for the heights of soldiers must account for truncation because militaries often impose height limits on recruits. You can also use the truncated normal distribution to model student scores on standardized tests such as the SAT, for which 200 is the lowest possible score on each test.

About Author

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

12 Comments

The CDF idea is brilliant. What if I want to generate a multivariate normal distribution? For example,
proc iml;
a=randnormal(100,0,1); *Actually a standard normal distribution;
b=randnormal(30,a,I(100)); *Generate a multivariate normal dist from a;
quit;

For the dist b, I would like to truncate b within the interval [-3,3]. I can write a do loop and if then statement to achieve that goal. In CDF call routines, I don't know if there is a multivariate normal distribution.

Multivariate normal distributions have CDFs (I've written about the 2D bivariate normal), but you can't use them as you describe. There are many points that all have the same CDF value. In 2D the inverse image (quantile) of a probability value is a curve, in 3D it is a surface, etc.

You can use the acceptance-rejection technique. I don't know any other nifty tricks, but I hope you will research the problem and report back if you find something.

Rick or viewers, I am new to IML and have not been able to write code to implement a truncated multivariate normal distribution. I am trying to generate a sample of multivariate correlated data truncated below at 0 and above at 1. For instance, if I wanted to simulate a distribution of "probabilities of an event", then the resulting PDs would need to be truncated over the (0,1) interval. Can anyone provide any guidance? (By the way, thanks to my purchase of Rick's Simulating Data with SAS, I can produce univariate results that are truncated over the interval, but I cannot extend it to multivariate without some additional guidance.) Thank you!

I should also mention that I am trying to generate 10,000 random draws, so I generated 25,000 and then simply excluded the draws outside of my desired interval. I have 19 variables, so I created a 25,000 x 19 matrix with each column reflecting the marginal distribution for the corresponding variable.

I think I read somewhere (perhaps in Dr. Wicklin's text) that this approach could cause bias in the results.

So, I examined the marginal distributions for the individual columns in my output and I found that they do not appear to be normally distributed, nor do their means and standard deviations match what I previously prescribed. This doesn't seem totally surprising to me. Should a truncated multivariate normal distribution retain the marginal distributions' characteristics? Further, should it retain the prescribed correlation structure that was imposed prior to excluding the results that existed outside of the chosen interval? I expect not. So, let's assume I have 3 variables, each with a mean of 0.50 and a standard deviation of 0.75 and each with a pairwise correlation of 0.30. How do I simulate a truncated multivariate distribution with these characteristics so that the resulting random variates are limited to the interval (0,1) and the correlation structure is retained? It doesn't seem reasonable to me to think that the standard deviation could be retained with the interval restriction. Am I missing something? Thanks!

Discussions of this type are more easily conducted at the SAS Support Communities. To answer your questions: When you truncate data, the marginal distributions and the covariance structure will no longer match what you originally specified. As you say, the variances and covariances must change when you restrict the data to a smaller range.

As stated in the fourth paragraph, the simplest way is to sample from the unrestricted distribution and then reject variates that are outside the truncation interval. See my article on efficient acceptance-rejection algorithms.
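
The rejection approach for the multivariate case can be sketched in SAS/IML as follows. This is a minimal illustration, not a tuned implementation. It uses the three-variable example from the question (means 0.5, standard deviations 0.75, pairwise correlations 0.3, truncation to (0,1)); the batch size, the seed, and all variable names are assumptions made for the sketch.

```sas
proc iml;
call randseed(54321);
/* Parameters of the UNTRUNCATED distribution (from the question above) */
mu = {0.5 0.5 0.5};                       /* means                     */
sd = {0.75 0.75 0.75};                    /* standard deviations       */
R  = {1.0 0.3 0.3,
      0.3 1.0 0.3,
      0.3 0.3 1.0};                       /* correlation matrix        */
Sigma = diag(sd) * R * diag(sd);          /* covariance matrix         */
a = 0;  b = 1;                            /* truncation bounds         */

N = 10000;                                /* number of draws to keep   */
batch = 50000;                            /* assumed proposal batch    */
p = ncol(mu);
Y = j(N, p, .);                           /* storage for accepted rows */
count = 0;
do while (count < N);
   X = randnormal(batch, mu, Sigma);      /* unrestricted MVN draws    */
   /* accept a row only if ALL p coordinates are inside (a, b) */
   idx = loc( (X > a & X < b)[ ,+] = p );
   if ncol(idx) > 0 then do;
      k = min(ncol(idx), N - count);      /* do not overfill Y         */
      Y[(count+1):(count+k), ] = X[idx[1:k], ];
      count = count + k;
   end;
end;

/* moments of the truncated sample differ from mu, sd, and R */
sampleMean = mean(Y);
sampleCorr = corr(Y);
print sampleMean, sampleCorr;
quit;
```

The loop keeps drawing batches until N rows are accepted, which avoids guessing in advance how many proposals are needed. As noted above, the means, standard deviations, and correlations of the accepted rows will not match the values specified for the unrestricted distribution.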