Regression owes its name to the phenomenon known as regression toward the mean, which arises when a genetically determined characteristic, such as height, is correlated between parent and offspring. In the regression of offspring height on parent height, a tall parent's offspring also tend to be tall but, on average, less so than the parent; similarly, a short parent's offspring tend to be short but not as short as the parent. Assuming that the univariate distributions of the parent and offspring are the same, this implies that the expected regression line lies below the line $y = x$ for large values of the characteristic and above it for small values.

The two yellow lines correspond to $y = x$, or equivalently $\rho = 1$, and to $y = \mu$, or equivalently $\rho = 0$, where $\rho$ is the correlation and $\mu$ is the mean. When $0 < \rho < 1$, the thin purple line, the expected regression line, always lies between the two yellow lines, demonstrating regression toward the mean. The red ellipses show the ellipsoids of concentration corresponding to 95%, 50%, and 5% probability in the bivariate normal distribution. The blue points are data simulated from the regression, and the green line shows the fitted regression line. In some cases, especially when the sample size is small, regression toward the mean may not hold for the fitted regression line.

A 3D visualization is also provided. When the sample size $n = 0$, only the purple expected regression line $y = \mu + \rho (x - \mu)$, where $\mu = 70$, and the concentration ellipsoids for the bivariate distribution are shown, on a background generated by DensityPlot. See the Details section for further discussion.
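The position of the expected regression line between the two yellow lines can be checked with the standard conditional-mean formula for the bivariate normal distribution. The sketch below uses notation introduced here for illustration: $\rho$ is the correlation, $\mu_X, \mu_Y$ and $\sigma_X, \sigma_Y$ are the means and standard deviations of parent and offspring, and $\mu$ is their common mean under the equal-distribution assumption.

```latex
\begin{align*}
E[Y \mid X = x] &= \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) \\
                &= \mu + \rho\,(x - \mu)
                   \qquad (\mu_X = \mu_Y = \mu,\ \sigma_X = \sigma_Y), \\
\lvert E[Y \mid X = x] - \mu \rvert
                &= \lvert \rho \rvert \, \lvert x - \mu \rvert
                   \le \lvert x - \mu \rvert .
\end{align*}
```

So for a correlation strictly between 0 and 1, the expected offspring value lies strictly between the mean and the parent's value whenever the parent's value differs from the mean.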

Snapshots

Details

"Mere varieties from a common typical centre blend freely in the offspring, and the offspring of every race whose statistical characters are constant, necessarily tend, as I have often shown, to regress towards their common typical centre." [1, p. 211].

Let $(X, Y)$ be bivariate normal with means and standard deviations equal to 70 and 2, respectively, and correlation coefficient $\rho$. The thumbnail shows a scatter plot of a random sample of size $n$ from this distribution. The parameters 70 and 2 were chosen to represent a hypothetical male population with average height 70 inches and standard deviation 2 inches.
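As a rough illustration of this setup, the following Python sketch draws pairs from a bivariate normal distribution with mean 70 and standard deviation 2 in both coordinates, then fits an ordinary least-squares line. It is not the Demonstration's own code; the function names `simulate_heights` and `fit_line`, the seed, and the correlation value 0.5 are hypothetical choices made for this example.

```python
import math
import random

MU, SIGMA = 70.0, 2.0  # mean and standard deviation from the text (inches)


def simulate_heights(n, rho, seed=0):
    """Draw n (parent, offspring) pairs from a bivariate normal with equal
    means MU, equal standard deviations SIGMA, and correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = MU + SIGMA * z1
        # Mixing z1 and z2 this way gives corr(x, y) = rho and sd(y) = SIGMA.
        y = MU + SIGMA * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pairs.append((x, y))
    return pairs


def fit_line(pairs):
    """Ordinary least-squares fit of y on x; returns (intercept, slope)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    rho = 0.5  # hypothetical correlation chosen for illustration
    intercept, slope = fit_line(simulate_heights(5000, rho))
    print(f"fitted slope {slope:.3f} (expected {rho})")
    print(f"fitted intercept {intercept:.2f} (expected {MU * (1 - rho):.2f})")
```

With a large sample, the fitted slope should be close to the correlation and the fitted intercept close to 70 times one minus the correlation, which is the expected regression line written in slope-intercept form.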

Snapshot 1: with , the green line shows the theoretical regression line

Snapshot 2: the 3D visualization shows the Histogram3D plot of the data shown in the thumbnail; drag for a better view of the green and yellow lines

Snapshot 3: with $n = 0$, a 3D plot of the bivariate normal distribution is shown; drag to rotate the green and yellow lines into view

Snapshot 4: regression toward the mean may not hold for the fitted regression line when the sample size is small
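The small-sample behavior in Snapshot 4 can be explored with a short Monte Carlo sketch in Python. It estimates how often the fitted slope falls outside the interval from 0 to 1, that is, how often the fitted line is not between the two yellow lines. The function names, the trial count, and the correlation value 0.8 are assumptions made for this example, and standardized variables are used because shifting both coordinates by the same mean and scaling them by the same standard deviation leaves the slope unchanged.

```python
import math
import random


def fitted_slope(n, rho, rng):
    """Least-squares slope of y on x for n standardized bivariate normal pairs."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho**2) * z2)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def fraction_outside(n, rho, trials=2000, seed=0):
    """Share of simulated samples whose fitted slope falls outside [0, 1],
    i.e. whose fitted line is not between the two yellow lines."""
    rng = random.Random(seed)
    outside = sum(
        1 for _ in range(trials)
        if not 0.0 <= fitted_slope(n, rho, rng) <= 1.0
    )
    return outside / trials


if __name__ == "__main__":
    for n in (5, 20, 100):  # hypothetical sample sizes for comparison
        print(n, fraction_outside(n, rho=0.8))
```

The printed fraction should shrink as the sample size grows, since the fitted slope concentrates around the true correlation.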