On Mar 23, 9:16 pm, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> I'm trying to demonstrate numerically (rather than algebraically) that
> the expectation of the sample variance is the population variance, but
> it's not working for me.
>
> Some quick(?) background... please correct me if I'm wrong about anything.
>
> The variance of a population is:
>
> σ^2 = 1/n * Σ(x-µ)^2 over all x in the population
>
> where ^2 means superscript 2 (i.e. squared). In case you can't read the
> symbols, here it is again in ASCII-only text:
>
> theta^2 = 1/n * SUM( (x-mu)^2 )
>
> If you don't have the entire population as your data, you can estimate
> the population variance by calculating a sample variance:
>
> s'^2 = 1/n * Σ(x-µ)^2 over all x in the sample
>
> where s' is being used instead of s subscript n.
>
> This is unbiased, provided you know the population mean mu µ. Normally
> you don't though, and you're reduced to estimating it from your sample:
>
> s'^2 = 1/n * Σ(x-m)^2
>
> where m is being used as the symbol for sample mean x bar = Σx/n
>
> Unfortunately this sample variance is biased, so the "unbiased sample
> variance" is used instead:
>
> s^2 = 1/(n-1) * Σ(x-m)^2
>
> What makes this unbiased is that the expected value of the sample
> variances equals the true population variance. E.g. see
>
> http://en.wikipedia.org/wiki/Bessel's_correction
>
> The algebra convinces me -- I'm sure it's correct. But I'd like an easy
> example I can show people, but it's not working for me!
>
> Let's start with a population of: [1, 2, 3, 4]. The true mean is 2.5 and
> the true (population) variance is 1.25.
>
> All possible samples for each sample size > 1, and their exact sample
> variances, are:
>
> n = 2
> 1,2 : 1/2
> 1,3 : 2
> 1,4 : 9/2
> 2,3 : 1/2
> 2,4 : 2
> 3,4 : 1/2
> Expectation for n=2: 5/3
>
> n=3
> 1,2,3 : 1
> 1,3,4 : 7/3
> 2,3,4 : 1
> Expectation for n=3: 13/9
>
> n=4
> 1,2,3,4 : 5/3
> Expectation for n=4: 5/3
>
> As you can see, none of the expectations for a particular sample size are
> equal to the population variance. If I instead add up all ten possible
> sample variances, and divide by ten, I get 1.6 which is still not equal
> to 1.25.
>
> What am I misunderstanding?
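
For anyone who wants to check the arithmetic in the quoted tables, here is a minimal sketch that recomputes them exactly. It assumes, as the tables do, that a "sample" is a subset of the population drawn without replacement; the function names sample_variance and expected_sample_variance are only illustrative and do not come from the original post.

```python
from fractions import Fraction
from itertools import combinations


def sample_variance(xs):
    """Unbiased sample variance s^2 = 1/(n-1) * SUM((x - m)^2), as an exact Fraction."""
    n = len(xs)
    m = Fraction(sum(xs), n)  # sample mean
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def expected_sample_variance(population, n):
    """Average s^2 over every size-n subset of the population (no replacement)."""
    variances = [sample_variance(s) for s in combinations(population, n)]
    return sum(variances) / len(variances)


population = [1, 2, 3, 4]
for n in range(2, len(population) + 1):
    print("n = %d" % n)
    for s in combinations(population, n):
        print(s, ":", sample_variance(s))
    print("Expectation for n=%d:" % n, expected_sample_variance(population, n))
```

Note that the enumeration for n=3 includes the sample 1,2,4 (variance 7/3), which the quoted list leaves out. With it included, the expectation for n=3 comes out as 5/3, the same value as for n=2 and n=4.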

The formulae are correct only for a population with a Gaussian distribution. The distribution of your test population [1, 2, 3, 4] is not Gaussian, and its difference from normality is enough to give those differences in the sample variances.