Regression to the Mean

Have you ever wondered why very tall parents tend to have children who are shorter than they are, and vice versa? Galton observed this pattern and wondered why children seemed more 'mediocre' than their parents. He went on to name the phenomenon 'regression towards mediocrity' [1].

Now known as 'regression to the mean', this phenomenon is an important factor to account for in valid research. When a variable that involves some randomness deviates substantially from its norm, regression to the mean says that a subsequent measurement is likely to return a value closer to that norm.
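One way to make this statement precise is a worked formula under a simplifying assumption that is not part of the original discussion: suppose two measurements $X_1$ and $X_2$ of the same quantity are jointly normal with a common mean $\mu$, a common variance, and correlation $\rho$ with $0 \le \rho < 1$. Then the expected second measurement, given the first, is:

```latex
\mathbb{E}[X_2 \mid X_1 = x_1] = \mu + \rho\,(x_1 - \mu)
```

Because $\rho < 1$, the expected value of the second measurement lies between $\mu$ and $x_1$. The noisier the measurement (the smaller $\rho$), the further the follow-up is expected to move back toward $\mu$.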

Measurements 1 and 2 were taken from the same material and are represented by the red and blue lines, respectively.

This graph shows two measurements of G'' at different stresses for a heterogeneous sample containing both liquid and solid. For each measurement, a minute amount of sample is taken from a container and placed into the measurement device. The important point is that the massive fluctuations seen at low stresses in the first measurement are much smaller in the second measurement, which lies closer to the expected norm (represented by the gray dotted line).

Why does this happen?

Heterogeneous mixtures can make taking a representative sample for testing more difficult.

Significant deviations from the norm can have many causes, such as human error, but regression to the mean applies most directly to non-systematic, random errors. An example of such an error is taking a non-representative sample for testing (a sampling error). To measure a property of a material, we usually collect a smaller, representative sample of it to test. Random errors arise when taking such a sample, especially with heterogeneous mixtures. This is most likely what happened in the example above: the sample taken from the container for measurement 1 was probably non-representative, which contributed to the unexpected deviations from the norm.

Lastly, consider a normal distribution: outlying values do occur, but only on rare occasions. A subsequent measurement is more likely to fall near the mean simply because values near the mean are more probable. Such rare values are what p-hacking takes advantage of: researchers purposely manipulate their research until it produces a rare, significant result, which is often non-reproducible.
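The repeated-measurement argument can be checked with a small simulation. This is a minimal sketch under assumptions that are not from the original measurements: the material has one fixed true value, and each measurement adds independent, normally distributed sampling noise. The function names and the numbers (`true_value`, `noise_sd`, the cutoff) are hypothetical.

```python
import random
import statistics


def simulate_repeat_measurements(n_samples=10_000, true_value=100.0,
                                 noise_sd=15.0, seed=0):
    """Measure the same material twice, n_samples times.

    Assumption: each measurement is the true value plus independent
    Gaussian noise (a stand-in for random sampling error).
    """
    rng = random.Random(seed)
    first = [true_value + rng.gauss(0, noise_sd) for _ in range(n_samples)]
    second = [true_value + rng.gauss(0, noise_sd) for _ in range(n_samples)]
    return first, second


def mean_followup_of_extremes(first, second, cutoff):
    """Keep only pairs whose FIRST measurement was at or above the cutoff,
    then return (mean of those first values, mean of their follow-ups)."""
    pairs = [(a, b) for a, b in zip(first, second) if a >= cutoff]
    mean_first = statistics.mean(a for a, _ in pairs)
    mean_second = statistics.mean(b for _, b in pairs)
    return mean_first, mean_second


if __name__ == "__main__":
    first, second = simulate_repeat_measurements()
    m1, m2 = mean_followup_of_extremes(first, second, cutoff=130.0)
    print(f"extreme first measurements average: {m1:.1f}")
    print(f"their follow-up measurements average: {m2:.1f}")
```

Nothing about the material changes between the two measurements in this sketch. The follow-ups land near the true value only because the extreme first readings were selected for their noise.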

How it applies to research

Experimental research, in which tests are conducted on two comparable experimental groups, lets researchers determine causal relationships between variables far more reliably. If research were done on only one group of participants, regression to the mean would most likely produce results favorable to the researcher. Consider a clinical trial of a drug intended to cure a disease. The researcher gathers participants who are sick with the disease and has them take the drug. Regression to the mean predicts that the unhealthiest participants are likely to get better anyway, as their health moves closer to that of the 'average human': healthy. The researcher may then conclude that the drug caused the participants to get better, when this may simply be a case of regression to the mean. Testing two representative groups, where one is given the drug in question and the other either a placebo or a competitor's drug, guards against biased conclusions based on regression to the mean.
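To see why the second group matters, here is a rough simulation of such a trial. It is only a sketch under invented assumptions: health is a score with a stable personal baseline plus day-to-day noise, people are enrolled only if they look sick at screening, and the drug's true effect is set to zero by default. The function `run_trial` and all of its numbers are hypothetical.

```python
import random
import statistics


def run_trial(n_people=20_000, drug_effect=0.0, seed=1):
    """Return the mean change in health score (follow-up minus screening)
    for the drug group and for the placebo group."""
    rng = random.Random(seed)
    drug_changes, placebo_changes = [], []
    for _ in range(n_people):
        baseline = rng.gauss(70, 10)             # stable underlying health
        screening = baseline + rng.gauss(0, 10)  # noisy score on screening day
        if screening >= 50:
            continue                             # enrol only those who look sick
        gets_drug = rng.random() < 0.5           # random assignment to a group
        followup = baseline + rng.gauss(0, 10)
        if gets_drug:
            followup += drug_effect
            drug_changes.append(followup - screening)
        else:
            placebo_changes.append(followup - screening)
    return statistics.mean(drug_changes), statistics.mean(placebo_changes)


if __name__ == "__main__":
    drug, placebo = run_trial()
    print(f"mean improvement with drug:    {drug:.1f}")
    print(f"mean improvement with placebo: {placebo:.1f}")
```

With a drug that does nothing, both groups still improve by a similar amount, because everyone was selected on an unusually bad screening score. Looking at the drug group alone would credit the drug with that improvement. Only the difference between the two groups estimates its effect.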

As another example, research was done on whether praise or punishment was better for improving the quality of pilots' landings. When pilots made a landing that was better than average, they were praised. When they made a particularly bad landing, they were punished. Regression to the mean predicts that in either case the pilot is likely to return to a more 'average' landing quality on the next attempt. Because punished pilots, whose previous bad landing was out of the norm, improved on subsequent landings, it was incorrectly concluded that punishment is a more effective teaching tool than praise [2].

An example of pilots' landing quality scores plotted against landing number. As this image shows, regression to the mean played a big part in the research on the effectiveness of praise vs. punishment for pilots.
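The pilot example can be mimicked with a short simulation in which feedback has no effect at all. This is a sketch under made-up assumptions, not a model of the cited study: every pilot has the same fixed skill, each landing score is that skill plus independent random noise, and praise or punishment is triggered when a landing is more than one noise standard deviation above or below the skill level. The function name and thresholds are hypothetical.

```python
import random
import statistics


def landing_changes(n_pilots=2_000, n_landings=20, skill=50.0,
                    noise_sd=10.0, seed=2):
    """Return the mean change in score on the landing that follows praise
    and on the landing that follows punishment.

    Feedback is never used to alter the next score, so any apparent
    'effect' comes from regression to the mean alone.
    """
    rng = random.Random(seed)
    after_praise, after_punishment = [], []
    for _ in range(n_pilots):
        scores = [skill + rng.gauss(0, noise_sd) for _ in range(n_landings)]
        for current, following in zip(scores, scores[1:]):
            if current > skill + noise_sd:        # unusually good: praised
                after_praise.append(following - current)
            elif current < skill - noise_sd:      # unusually bad: punished
                after_punishment.append(following - current)
    return statistics.mean(after_praise), statistics.mean(after_punishment)


if __name__ == "__main__":
    praise, punishment = landing_changes()
    print(f"mean change after praise:     {praise:+.1f}")
    print(f"mean change after punishment: {punishment:+.1f}")
```

Scores drop after praise and rise after punishment even though the feedback is ignored by the simulation, which is the same pattern that led to the incorrect conclusion described above.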

Is regression to the mean applicable to everyday life?

Of course! Have you ever searched for the perfect webpage to get some work done (be it a good online grammar checker, plagiarism checker, useful SEO tool, etc.) and stumbled on a horribly made one, far below the norm? If you then pick another webpage at random, you're likely to click on a better one than before. This assumes that the indexing of webpages involves some randomness, which it does. It also assumes that the search engine isn't choosing to show you only horrible webpages.

Interested in stock prices and finance? Mean reversion is based on regression to the mean and states that sudden surges in stock prices will eventually die down and return to the long-term average. This rests on the assumption that the short-term price surge does not attract much media attention and isn't announced prominently.

Even these examples don't top the way this phenomenon applies most to everyday life. We've all had bad days, and sometimes we wonder whether one bad day leads to a succession of them. When you're having a particularly bad day, far out of the norm, rest assured that mathematically, thanks to regression to the mean, your next day is likely to be better! :)

Have a nice day!