What is the standard error of the sample mean?

Hi! "Standard error" is an important idea. Each year I see understandable attempts to grasp it through words alone. Maybe that can be done, but I think it helps to see numbers first. So I'd like to illustrate standard error, with the goal of understanding Wikipedia's fine definition: the standard error is the standard deviation of the sampling distribution of a statistic. (Note that the SE is indeed a standard deviation; by the end of this post I hope it's clear why the terminology switches.)

Consider a single six-sided die, my favorite random variable. A single die is described exactly by a discrete uniform distribution: each outcome in {1, ..., 6} is equally likely. We can write Pr[X = 1] = 1/6; i.e., the probability of rolling a '1' is 1/6.

Below I pasted a snapshot of a simple Excel sheet I just prepared. It contains 1,000 rows, one for each "trial" or "simulation," where a trial is simply a roll of the one six-sided die. The second column contains the outcomes, produced by a random number generator: random integers in the set {1, 2, 3, 4, 5, 6}. So the second column just simulates rolling a single six-sided die 1,000 times. That is, the first roll is a 5, the second roll is a 2, the third roll is a 1 ... and the 1,000th roll is a 6. I "hid" most of the rows (Data > Group) so I can focus on the 10th, 100th, and 1,000th trials.

About the final three columns:

Sample average: the average of all of the rolls up to that point. The first average = 5.0 because only one five has been rolled. The second average = 3.5 = (5+2)/2. The third average = 2.667 = (5+2+1)/3. The fourth average = 3.5 = (5+2+1+6)/4. The fifth average is the average of a Yahtzee-like roll, as if we threw five dice simultaneously and got a 5, 2, 1, 6 and 2.
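For readers who prefer code to spreadsheets, the running sample average can be sketched in Python. The random seed here is arbitrary, so the exact outcomes will differ from the screenshot; only the mechanics are the point:

```python
import random

random.seed(42)  # arbitrary seed; the spreadsheet used its own random draws

# 1,000 simulated rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(1000)]

# Running (cumulative) sample average after each trial
running_avg = []
total = 0
for n, outcome in enumerate(rolls, start=1):
    total += outcome
    running_avg.append(total / n)

# The entries at indices 9, 99, 999 play the role of the spreadsheet's
# n = 10, 100, 1,000 rows
print(running_avg[9], running_avg[99], running_avg[999])
```

Each entry of `running_avg` is the sample mean of all rolls up to that trial, exactly like the spreadsheet column.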

Population standard deviation: look at the row which simulates rolling four dice (n = 4). The population variance is the average squared difference between each observation and the mean (which is 3.5). So the population variance = [(5-3.5)^2 + (2-3.5)^2 + (1-3.5)^2 + (6-3.5)^2]/4 = 2.062^2, and the standard deviation = 2.062; i.e., each value's difference from the average is squared, those squared differences are averaged (sum, then divide by 4), and the square root gives the standard deviation. This variance of 2.062^2 is, in fact, the variance (aka the second moment about the mean) of this set of four numbers: {5, 2, 1, 6}.

Sample standard deviation: but {5, 2, 1, 6} is not really the total population; it's just the sample we happened to get. If we rolled another four, we'd get a different outcome. Put another way, we can't necessarily infer the discrete uniform distribution just by seeing the four values {5, 2, 1, 6}. If we have in our possession Pr[X = x] = 1/6, then we know the population distribution; but if we merely have the set {5, 2, 1, 6}, then we just have a very small sample which could have been produced by any of several distributions (or none at all; maybe it is just empirical). Because this is a sample, we want the slightly larger SQRT([(5-3.5)^2 + (2-3.5)^2 + (1-3.5)^2 + (6-3.5)^2]/3) = 2.380. Which variance/standard deviation (divide by n? divide by n-1?) is "correct?" Neither is incorrect; they are both humble estimators, each with properties we may or may not desire! Both formulas can be used as estimators of the "unknowable true" population standard deviation. But if we want the unbiased estimator (of the variance, strictly speaking), we want the sample version which divides by (n-1) instead of (n); I remember that by thinking "it's just a sample, let's use the more conservative [larger] number."
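As a quick check on the arithmetic, here is a minimal Python sketch computing both the divide-by-n (population) and divide-by-(n-1) (sample) versions for the set {5, 2, 1, 6}:

```python
import math

sample = [5, 2, 1, 6]
n = len(sample)
mean = sum(sample) / n  # 3.5

sum_sq_dev = sum((x - mean) ** 2 for x in sample)  # 17.0

# Population standard deviation: divide the sum of squared deviations by n
pop_sd = math.sqrt(sum_sq_dev / n)        # ≈ 2.062

# Sample standard deviation: divide by (n - 1) instead
samp_sd = math.sqrt(sum_sq_dev / (n - 1))  # ≈ 2.380, slightly larger

print(round(pop_sd, 3), round(samp_sd, 3))
```

The sample version is always the larger of the two, which is the "more conservative" mnemonic in the text.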

So this "Monte Carlo Simulation" offers the following ideas:

We can compute exactly the standard deviation of the discrete uniform distribution, Pr[X = x] = 1/6. We only need the incredibly useful variance formula: variance(X) = E(X^2) - [E(X)]^2, so that the (ex ante) standard deviation of a six-sided die = SQRT[91/6 - 3.5^2] = 1.707825. But that is applying the "perfectly clean" formula with perfect knowledge of the distribution.
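This ex ante calculation can be carried out exactly, for example with Python's fractions module, which keeps the intermediate moments as exact ratios:

```python
import math
from fractions import Fraction

# Exact moments of a fair six-sided die, Pr[X = x] = 1/6 for x in {1, ..., 6}
faces = range(1, 7)
ex = sum(Fraction(x, 6) for x in faces)          # E[X]   = 7/2  = 3.5
ex2 = sum(Fraction(x * x, 6) for x in faces)     # E[X^2] = 91/6

var = ex2 - ex ** 2                              # 91/6 - 3.5^2 = 35/12
sd = math.sqrt(var)                              # ≈ 1.707825

print(var, round(sd, 6))
```

The variance comes out as the exact fraction 35/12, whose square root is the 1.707825 quoted above.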

The simulation produces samples of various sizes for us, none of which is a perfect realization or manifestation of the discrete uniform function; for example, the sample standard deviation of the set of ten dice is not 1.708 but rather 2.044.

Now a key idea that is difficult: both the sample average and the sample variance are statistics that are themselves random variables. See how, at n = 9, the sample average = 2.899; at n = 10, the sample average = 3.2; at n = 100, the sample average = 3.370. The sample average itself is moving around like a random variable. But at the same time, as n increases (e.g., n = 1,000), the sample average seems to converge, with less and less variability, toward 3.5 (and the sample standard deviation, likewise, seems to be "attracted" to 1.708).

In this way, the sample mean has its own dispersion as a random variable. The variance of the sample mean = variance/n, such that the standard deviation of the sample mean = standard deviation/SQRT(n). For example, at n = 100, the sample standard deviation = 1.773, and therefore the standard error of the sample mean = 1.773/SQRT(100) = 0.1773. At n = 1,000, the standard error = 1.734/SQRT(1000) = 0.054834.

The CLT tells us that if we roll 1,000 dice, the sample average will still bounce around, but only a little: since its standard deviation is only about 0.0548, the sample mean for such a sample falls within a narrow 3.5 +/- 0.054834 * deviate, where the deviate is a function of our confidence level. The CLT goes further, telling us we can assume (this part amazes me to this day!) that the sample mean is normally distributed! Somehow a single uniform die, which is not even close to normally distributed, produces a SAMPLE AVERAGE that tends to look normally distributed as the sample size increases.
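A small Monte Carlo sketch can make this concrete (the seed and the number of repeated samples are arbitrary choices here): draw many independent samples of 1,000 rolls, record each sample mean, and compare the dispersion of those means with the theoretical 1.7078/SQRT(1000) ≈ 0.054:

```python
import random
import statistics

random.seed(1)  # arbitrary, for reproducibility

# Draw 1,000 independent samples, each of n = 1,000 die rolls,
# and record each sample's mean
n, num_samples = 1000, 1000
means = [statistics.mean(random.randint(1, 6) for _ in range(n))
         for _ in range(num_samples)]

# The standard deviation of this pile of sample means is the standard error,
# and should come out close to 1.7078 / sqrt(1000) ≈ 0.054
se_observed = statistics.stdev(means)
print(round(se_observed, 4))
```

Plotting a histogram of `means` would also show the bell shape the CLT promises, even though each individual roll is uniform, not normal.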

To recap, under this particular sample, if we roll 1,000 dice, the sample standard deviation is 1.734, which is a measure of the dispersion of the set of 1,000 observed values. The standard error (of the sample mean) = 1.734/SQRT(1,000) = 0.054834, which is the dispersion of the sample mean itself. The standard error is a standard deviation, but the relevant distribution is not the observed set of outcomes; it is the distribution associated with a statistic that summarizes the sample. The standard deviation is more directly observable than the standard error: we have the pile of die outcomes right in front of us (a 5, then a 2, then a 1, then a 6), but we don't have a pile of sample averages. We have one sample average; the others exist only in our imagination, or in repeated samples. The standard error is the standard deviation of the sampling distribution of a statistic.

Finally, we are referring here only to the standard error of the sample mean. But the sample variance has a standard error too (i.e., the sample variance is also a random variable); so does a VaR quantile. This matters because, in general, we are only ever dealing with sample statistics, and we are interested in their variability. We can measure any statistic and ask: what is the standard error of this statistic that I just measured (average return, sample variance, sample VaR, VaR backtest), so that we can evaluate the significance of the statistic? If we measure, for example, an alpha of +40 basis points per month, the standard error is the dispersion of that statistic, and the smaller it is, the more we can "trust" the statistic. Or, if we backtest a 99% VaR and observe that the VaR was exceeded 2% instead of 1% of the time, the standard error of the VaR gives us a way to decide whether the VaR is really broken (or maybe it's just random sampling variation).
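To make the backtest idea concrete, here is a hedged sketch. The 250-day window and the 5 observed exceptions are invented numbers for illustration; the point is that, if the 99% VaR is correct, the exception count is Binomial(250, 0.01), so we can ask how surprising the observed count is:

```python
from math import comb

# Hypothetical backtest: a 99% daily VaR over 250 days expects
# 1% * 250 = 2.5 exceptions; suppose we observed 5 (2%).
n, p, observed = 250, 0.01, 5

def binom_pmf(k, n, p):
    """Probability of exactly k exceptions under a correct VaR model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One-sided p-value: probability of 5 or more exceptions by chance alone
p_value = sum(binom_pmf(k, n, p) for k in range(observed, n + 1))
print(round(p_value, 4))  # roughly 0.11: suggestive, but not decisive evidence
```

With a p-value of roughly 11%, twice the expected exception rate over one year is not, by itself, strong enough evidence to reject the VaR model; that is the sampling-variation caveat in the text.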

Hi David, thanks for the clarification. I think I have a better grasp of it now. The gist of it, to me, seems to be that the SE would be the deviation of the different samples. So if you took trials 1-4 as a sample, 5-8 as a sample, 9-12 as a sample, etc., and took the means (or variances/deviations) of these samples, the deviation between these sample means would be the SE. If you then increased the size of the samples (say from 4 to 5), you would be increasing the denominator and therefore decreasing the standard error, which would make the estimates more accurate as the sample size is larger.

I'm not sure I'm following the example of backtesting the VaR and exceeding 2% instead of 1% of the time, but I think I will get a better idea of that as I see more examples. As long as what I said above makes sense to you, then I think I have an understanding of standard error now.

Hi Matt, Yes, I think that is a great way to look at it! In your experiment, you are fixing the sample size at n = 4, then collecting "mini samples," each with its own mean (or whatever statistic), and the standard error is the dispersion (deviation) of that mean-as-a-statistic-which-describes-the-sample; we can refer to the set of means = {average of #1-#4, average of #5-#8, average of #9-#12, ...}. Thanks!

Your stuff is awesome, it really helped me out.
I am an MBA student working on my thesis. My topic is "Value at risk in the Karachi Stock Exchange by the technique of historical simulation."
The problem is what the hypothesis regarding this topic should be. I am very upset, please help me out.
I also need relevant published papers on the same topic, on an urgent basis.

Hi Junaid, thanks for the kind words. I can't do better than you with respect to searching for papers, sorry: I tend to use Google like most folks. The thesis that jumps out at me is a backtest of HS against actual exchange data; i.e., on each historical date (e.g., T - 250 days), the HS VaR could be computed, then that HS VaR could be tested for its efficacy over the subsequent period (e.g., a 99% HS VaR would be expected to be exceeded on 1% of days; how did it actually perform, with statistical significance?) ... so, just a backtest to figure out, basically, "If HS VaR had been employed, would it have worked?" Just a thought, good luck to you!

Sir, I am working on a comparative analysis between historical simulation (HS) and the variance-covariance method (VCV).
There are 3 portfolios containing 5 equities each, and I assumed that I have invested my capital equally across
these portfolios. I applied both methods, HS and VCV, to each portfolio.
Now, I want to analyze which method helps select the optimum portfolio; I mean, which one is better.
I know it is determined by backtesting, but I don't know how to do backtesting.
Sir, in this situation I really need your help. Please guide me on how to perform backtesting according to
the criteria that I have mentioned above, or is there any other easy way to analyze other than backtesting?

Hi Junaid, Thank you for trusting me with advice, but alas, I am currently producing videos and I never have free time in the weeks before the FRM exam (e.g., I have several customer inquiries outstanding in addition to the content obligations). Plus, it is non-trivial to summarize backtest mechanics. Conceptually, as Jorion illustrates in Chapter 6 of his book (Value at Risk, 3rd edition), you go to a historical date (e.g., Jan 1, 2010) and compute the VaR measure on that historical date (e.g., on Jan 1, 2010, the 99% 10-day HS VaR would have been $1.0 million, or $X). Then you observe the number of "exceptions/exceedances"; e.g., during 2010, the daily loss exceeded the VaR on 5 days, or y days. So the VaR expected 1% * 250 = 2.5 exceptions but 5.0 were realized. As exceed/not-exceed is a Bernoulli trial, the decision to accept/reject can be framed as a binomial decision. If you get a chance to look at Ch. 6 of Jorion, it may help; for deeper treatment, I always like Carol Alexander. Good luck! Thanks,

1) Create a model with whatever rules for prediction.
2) Pretend you are at time t - tau.
3) Forecast t - tau + 1 using data through time t - tau (do this for whatever the relevant horizon is).
4) Run this in a loop up until today.
5) Take the realized historical data and compare it with your forecasted data.
Very easy ... use something like Matlab or R to do the forecasting/looping/backtesting quickly.
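Those steps might look like the following Python sketch. The returns series, the 250-day window, and the naive historical-simulation forecast are all invented for illustration; it is a toy loop, not a production backtest:

```python
import random

random.seed(7)  # arbitrary; the fake return series is for illustration only

# Step 1: the "model" is naive historical simulation: the 99% VaR forecast
# is roughly the 1st-percentile loss of the trailing 250 returns.
returns = [random.gauss(0, 0.01) for _ in range(1000)]  # fake daily returns
window = 250

exceptions = 0
tested_days = 0

# Steps 2-4: walk forward through time, forecasting each day from prior data
for t in range(window, len(returns)):
    history = sorted(returns[t - window:t])
    var_99 = -history[int(0.01 * window)]  # ~1st percentile loss of the window
    # Step 5: compare the realized loss with the forecast
    if -returns[t] > var_99:
        exceptions += 1
    tested_days += 1

# A correct 99% VaR would expect about 1% of tested days to be exceptions
print(exceptions, tested_days, f"expected ~= {0.01 * tested_days:.1f}")
```

The exception count can then be fed into a binomial accept/reject test, as described in the discussion of Jorion's Chapter 6 above.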