Question about variance, population and sample

In the October edition of the magazine Active Trader, a reader writing in to Chat Room ("Deviating from deviation?") asks why, in last month's explanation of variance, the example did not divide by two:

{(8-9)^2 + (9-9)^2 + (10-9)^2}/3 = 0.667.
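For concreteness, here is a small sketch of the magazine's calculation: the deviations of {8, 9, 10} from their mean of 9, squared and summed, then divided either by n (the magazine's way) or by n-1 (the reader's expectation).

```python
# The magazine's example: squared deviations of {8, 9, 10} from their mean.
data = [8, 9, 10]
mean = sum(data) / len(data)                 # 9.0
ss = sum((x - mean) ** 2 for x in data)      # 1 + 0 + 1 = 2.0

pop_var = ss / len(data)                     # divide by n = 3
samp_var = ss / (len(data) - 1)              # divide by n - 1 = 2
print(round(pop_var, 3), samp_var)           # 0.667 1.0
```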

The explanation given is nothing more than "That's how it is done," and, adding "We're not math majors," it completely ignores the difference between the sample deviation and the population deviation. (There is no explanation of where the above example comes from; it is probably just an equation invented by the writers.)

Elementary statistics books do a very poor job of explaining WHY that difference occurs, saying things such as "It eliminates bias," or even "It makes the theory work out better, and isn't worth going into."

Does anyone have a good explanation of why there is that distinction, and, assuming it is a sample deviation, why it is better to divide by 2 than by 3?

In the estimators section of your statistics text, you should find, either as a problem or an example, a simple calculation showing that the "divide by n" estimator of the population variance is biased, while the "divide by n-1" estimator is unbiased. Are you looking for an intuitive answer?

Well, I have made up several examples with dice, but when the number of trials is small, say three throws of a die, the computed variance fluctuates greatly.

This is my example: the population is the six faces of a die, the mean of a throw is 3.5, and the variance is 35/12 ≈ 2.92. Now suppose we throw three times and get a perfectly reasonable outcome: 2, 3, 4. The sample mean is 3; dividing by 2, the variance is 1, whereas dividing by 3 it would have been 2/3. In neither case are we near 2.92. Thanks, bob

Run this experiment a million times, and look at the average value of the variance that you compute.

"It may seem surprising that the expected value of the sample variance is slightly less than the population variance. The reason is that the sum of the squared deviations of a set of observations from their mean is always less than the sum of the squared deviations from the population mean."

On pages 129-130 of Principles of Statistics, this problem is gone into, though a few additional details are presented here. The author writes:

[tex]S^2 = \sum(x_i-X)^2=\sum(x_i-\mu)^2-N(X-\mu)^2[/tex]
In the above, S^2 is defined by the first equality, N is the number of samples, \mu is the population mean, X is the sample mean, and each x_i is a variable that takes on the various sample values.

Now the point is to find the expectation, E. For the first term we have:
[tex]E\left(\sum(x_i-\mu)^2\right) = N\sigma^2,[/tex] where \sigma is the population standard deviation.

For the second term, since E(X)=\mu, we have [tex]N\,E(X-\mu)^2=N\left(E(X^2)-(E(X))^2\right)=N\,V(X).[/tex]
The latter factor after N is the variance of the sample mean: [tex]V(X)=\frac{V(NX)}{N^2}=\frac{\sum V(x_i)}{N^2}=\frac{N\sigma^2}{N^2}=\frac{\sigma^2}{N}.[/tex]

Thus returning to the original equation we have:
[tex]E(S^2)=N\sigma^2-N\cdot\frac{\sigma^2}{N}=N\sigma^2-\sigma^2=(N-1)\sigma^2.[/tex]

The author adds: "Because of this fact S^2 is often divided by N-1 instead of N in order to obtain an unbiased estimate..."
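For the dice example earlier in the thread, the author's result E(S^2) = (N-1)\sigma^2 can even be checked exactly, with no simulation: just enumerate all 6^3 = 216 equally likely outcomes of three throws. A sketch using exact rational arithmetic (here \sigma^2 = 35/12 for a single throw of a fair die):

```python
# Exact verification of E(S^2) = (N-1) * sigma^2 for N = 3 die throws,
# by enumerating all 216 equally likely outcomes with exact fractions.
from fractions import Fraction
from itertools import product

N = 3
sigma2 = Fraction(35, 12)  # population variance of one fair-die throw

total = Fraction(0)
for rolls in product(range(1, 7), repeat=N):
    m = Fraction(sum(rolls), N)                  # sample mean
    total += sum((x - m) ** 2 for x in rolls)    # S^2 for this outcome

E_S2 = total / 6**N
print(E_S2 == (N - 1) * sigma2)   # True: E(S^2) = (N-1) sigma^2 exactly
print(E_S2 / (N - 1) == sigma2)   # True: dividing by N-1 is unbiased
```

So while any one sample (such as 2, 3, 4) can miss badly, the average of S^2/(N-1) over all possible samples equals 35/12 exactly, which is precisely what "unbiased" means.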