Second New Feature: Enter your own mean and standard deviation for a skewed population distribution

Often, you know the mean and standard deviation of some skewed population distribution and want to demonstrate what happens when sampling from it. For instance, the College Scorecard publishes mean earnings 10 years after graduation for all colleges in the United States. For graduates from Texas A&M University in College Station, TX, the (population) mean earnings are $60,000 with a (population) standard deviation of $47,000. Because earnings cannot be negative, a standard deviation nearly as large as the mean indicates a right-skewed distribution. You can enter this population mean and standard deviation after checking the "Enter your own population mean and standard deviation" box under the Skewed population distribution option. I also moved the slider that controls the amount of skew all the way to the left to obtain a reasonable shape for the distribution of earnings for all Texas A&M graduates 10 years after graduation.

You can now ask students what would happen if we randomly sampled 50 graduates from Texas A&M 10 years after their graduation and computed their mean earnings. What range of mean earnings would we expect to see? How far away from the true population mean of $60,000 will the mean earnings fall? What about the shape of the distribution of mean earnings for samples of size 50? As you ask these questions, use the app to generate the answers:
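If you want to check the app's answers outside the app, the same questions can be explored with a short simulation. The sketch below is only an illustration: it assumes a gamma distribution as a stand-in for the skewed earnings population (the app's own skewed family may differ), matched to the stated mean of $60,000 and standard deviation of $47,000. The function name `simulate_sample_means` and the seed are made up for this example.

```python
import math
import random
import statistics

# Population parameters quoted in the text (Texas A&M earnings example).
POP_MEAN = 60_000.0
POP_SD = 47_000.0

# Assumption: a gamma distribution stands in for the skewed population.
# Matching mean = shape * scale and SD = sqrt(shape) * scale gives:
shape = (POP_MEAN / POP_SD) ** 2
scale = POP_SD ** 2 / POP_MEAN


def simulate_sample_means(n, reps, seed=1):
    """Return `reps` sample means, each computed from n random draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gammavariate(shape, scale) for _ in range(n))
        for _ in range(reps)
    ]


n = 50
means = simulate_sample_means(n, reps=10_000)
theoretical_se = POP_SD / math.sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):,.0f}")
print(f"SD of sample means:   {statistics.stdev(means):,.0f}")
print(f"sigma / sqrt(n):      {theoretical_se:,.0f}")

# Roughly the middle 95% of the simulated sample means.
ordered = sorted(means)
low, high = ordered[250], ordered[9749]
print(f"middle 95% of sample means: {low:,.0f} to {high:,.0f}")
```

The theoretical standard deviation of the sample mean is $47,000 / sqrt(50), or about $6,650, so most sample means should land within roughly $13,000 of the population mean of $60,000. The simulated values should come out close to that, whichever skewed shape is used, as long as it has the same mean and standard deviation.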

A second drop-down menu will appear, letting you select pre-loaded population distributions such as the life expectancy for (almost) all countries in the world, or the flight delay of all 29,544 flights leaving ATL airport in January 2017 (which has a very skewed distribution). The life expectancy is the default option, and its population distribution is shown in the app (see above), with a population mean of 71.7 years and a population standard deviation of 8.8 years.

You can now ask your students what would happen if we randomly sampled 5 countries (out of the 198 countries reporting life expectancy) and computed the average (or mean) life expectancy of these 5 countries. To illustrate this with the app, set the sample size to 5 (you can use the arrows in the box to go down to 5) and press "Draw Sample(s)" once. A histogram will appear that shows the life expectancies of those 5 randomly selected countries. The actual life expectancy values for the 5 selected countries are also shown below the Draw Sample(s) button. The sample mean and standard deviation for this sample of 5 are shown in the title of the histogram. For our example below, the sample mean was 67.3 and the sample standard deviation was 7.3.

You can use one of the options to the left to vary the bin size of the histogram, or to zoom into it, i.e., adjust the range of the x-axis. You should also ask your students if this distribution looks normal, and hopefully they will pick up on the slight left skew of this sampling distribution. (One option on the left allows you to overlay a normal distribution.)

You can now demonstrate what happens when the sample size gets larger. Let's increase it to 10. Just click the up arrow in the "Select sample size (n)" box once to get to a sample size of 10. The histogram of the sampling distribution will update with a fresh set of 10,000 samples of size 10:

You can mention that with a sample size of 10 the sample means tend to fall closer to the population mean, by pointing out the smaller range of the sampling distribution when n=10. This is nice to show in the app by toggling between a sample size of 5 and 10 with the up and down arrows in "Select sample size (n)". Finally, increase the sample size to, say, n=35 and point out that the sampling distribution now looks fairly normal:
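The narrowing that students see when toggling the sample size follows the standard formula for the standard deviation of the sample mean, sigma / sqrt(n). As a quick check, the sketch below evaluates it for the life expectancy population (standard deviation 8.8 years, from the text) at the three sample sizes used here. The helper name `standard_error` is made up for this illustration.

```python
import math

POP_SD = 8.8  # population SD of life expectancy (years), from the text


def standard_error(sd, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sd / math.sqrt(n)


for n in (5, 10, 35):
    print(f"n = {n:2d}: sigma / sqrt(n) = {standard_error(POP_SD, n):.2f} years")
# n =  5: sigma / sqrt(n) = 3.94 years
# n = 10: sigma / sqrt(n) = 2.78 years
# n = 35: sigma / sqrt(n) = 1.49 years
```

Doubling the sample size from 5 to 10 shrinks the spread by a factor of sqrt(2), not 2, which is a point worth raising with students when they compare the two histograms.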

This is a good example to show your students that in statistics, "large" (or infinity) sometimes means n=35, but sometimes n=3000. If you have an interesting population distribution that you would like to see implemented, please let me know!

Flight arrival delays

For the preloaded population distribution about the delay (in min.) of all flights arriving at Atlanta airport in January of 2017, a sample size of 35 will not result in a bell-shaped sampling distribution because the population distribution is so skewed. (Note that I needed to zoom into the x-axis for the sampling distribution to see it more clearly. This is not shown in the screenshot.)
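To see the same effect without the flight data, you can simulate sample means from any heavily right-skewed population and measure how skewed they remain. The sketch below is only an illustration: it uses a lognormal distribution as a hypothetical stand-in (it is not the Atlanta delay data, and real delays can be negative), and the parameter 1.5, the seed, and the function names are arbitrary choices for this example.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, mu=m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def sample_mean_skewness(n, reps=5_000, seed=1):
    """Skewness of `reps` simulated sample means for samples of size n."""
    rng = random.Random(seed)
    # Assumption: lognormal(0, 1.5) as a hypothetical, very skewed population.
    means = [
        statistics.fmean(rng.lognormvariate(0.0, 1.5) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)


for n in (5, 35, 200):
    print(f"n = {n:3d}: skewness of sample means = {sample_mean_skewness(n):.2f}")
```

A bell-shaped sampling distribution would have skewness near 0. For a population this skewed, the value at n=35 should still be clearly positive, which matches what the app shows for the flight delays.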

The sample mean of 67.3 is indicated with a blue triangle, and one sees that it falls a bit below the population mean of 71.7, indicated with an orange half circle. These two values are also shown in the bottom plot, which will keep track of all sample means generated. To see this in action, click on the Draw Sample(s) button a few more times and point out to your students how the sample mean varies around the population mean, for a nice visualization of sampling variability. After you get tired of clicking, opt to generate 10,000 samples (of size 5) at once, and you should get a nice visualization of the sampling distribution: