"Iíve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to teel if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not."

"Iíve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to teel if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not."

Halp!

If the mean (say mu) of the total population is known (say mu=mu_0), and the "counts" in the total population are normally distributed (though the word "counts" actually suggests count data, i.e. non-negative integers, rather than continuous normal data), then a one-sample t-test could be used to test whether the sample is from a population with mu=mu_0.
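Since only the count, mean and SD of the subset are to hand, the one-sample t statistic can be worked out straight from those summaries. A minimal R sketch, assuming the subset can be treated as a random sample; the helper name `t_from_summary` and all the numbers are made up for illustration:

```r
# One-sample t-test from summary statistics only (no raw data needed).
# t_from_summary is a hypothetical helper, not part of base R.
t_from_summary <- function(n, xbar, s, mu_0) {
  se     <- s / sqrt(n)                       # standard error of the subset mean
  t_stat <- (xbar - mu_0) / se                # distance from mu_0 in SE units
  p_val  <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  list(t = t_stat, df = n - 1, p = p_val)
}

# Made-up numbers: subset of 40 with mean 12.3 and SD 4.1; known mean 10.9.
res <- t_from_summary(n = 40, xbar = 12.3, s = 4.1, mu_0 = 10.9)
print(res)
```

With real numbers, only the four arguments in the last call need changing.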

Alternatively, take bootstrap samples from the sample and see how far out mu_0 is in the bootstrap distribution of the mean.
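A rough R sketch of the bootstrap idea. Note that it needs the raw subset values, not just the summaries; here `x` is simulated as a stand-in and `mu_0` is a made-up number:

```r
set.seed(1)                              # for reproducibility
x    <- rnorm(50, mean = 12, sd = 4)     # made-up stand-in for the raw subset values
mu_0 <- 10.9                             # made-up known population mean

# Resample the subset with replacement and recompute the mean each time.
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

ci      <- quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval
outside <- mu_0 < ci[1] || mu_0 > ci[2]           # TRUE if mu_0 falls outside it
print(ci)
print(outside)
```

The percentile interval is only one way of judging "how far out" mu_0 is; it was chosen here because it is the simplest to read.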

--------------After much reflection I finally realized that the best way to describe the cause of the universe is: the great I AM.

Another thought: count data are often Poisson distributed and not suitable for t-tests / ANOVA. However, a simple transformation should do the trick. IIRC, a square root transformation is often best for count data.
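To see why the square root is the usual suggestion: for Poisson data the variance grows with the mean, but after taking square roots it sits near 1/4 whatever the mean is. A small simulation sketch in R (the lambda values are arbitrary choices for illustration):

```r
set.seed(42)
lambdas <- c(10, 40, 160)   # arbitrary Poisson means

# Variance of the raw counts: roughly equal to lambda.
raw_var  <- sapply(lambdas, function(l) var(rpois(1e4, l)))
# Variance after a square root transformation: roughly 0.25 throughout.
sqrt_var <- sapply(lambdas, function(l) var(sqrt(rpois(1e4, l))))

print(round(rbind(lambda = lambdas, raw = raw_var, sqrt = sqrt_var), 3))
```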

The problem is a bit weird. Suppose it turns out that a t-test says it's very unlikely that the sample was taken from a population with mean mu=mu_0, i.e. has a very small p-value, even though we know for sure that the sample was taken from a population with mean mu_0. Then what? The only sensible conclusion then seems to be that the sampling procedure was "non-random" in some sense. Does that make sense, Oh Bob?

Another thought: count data are often Poisson distributed and not suitable for t-tests / ANOVA. However, a simple transformation should do the trick. IIRC, a square root transformation is often best for count data.

Or a log-transformation. Or run a glm with a log-link and family=poisson option [R code: glm(counts~1,family=poisson,data=teh.sample)] and test whether the intercept is significantly different from what's expected. The intercept is on the log scale, so it should be compared with log(mu_0): (intercept - log(mu_0))/se(intercept) ~ N(0,1), approximately, under the null.
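A sketch of the GLM route in R, assuming the raw values are counts and are actually to hand. The data frame `teh.sample` is simulated here and `mu_0` is a made-up number. Because of the log link, the intercept estimates log(mean), so it is compared with log(mu_0):

```r
set.seed(7)
teh.sample <- data.frame(counts = rpois(60, lambda = 12))  # simulated stand-in data
mu_0       <- 10.9                                         # made-up known mean

# Intercept-only Poisson GLM with the default log link.
fit  <- glm(counts ~ 1, family = poisson, data = teh.sample)
coef_tab <- coef(summary(fit))
est  <- coef_tab["(Intercept)", "Estimate"]     # estimate of log(mean)
se   <- coef_tab["(Intercept)", "Std. Error"]

z <- (est - log(mu_0)) / se      # Wald statistic, approximately N(0,1) under the null
p <- 2 * pnorm(-abs(z))          # two-sided p-value
print(c(z = z, p = p))
```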

The problem is a bit weird. Suppose it turns out that a t-test says it's very unlikely that the sample was taken from a population with mean mu=mu_0, i.e. has a very small p-value, even though we know for sure that the sample was taken from a population with mean mu_0. Then what? The only sensible conclusion then seems to be that the sampling procedure was "non-random" in some sense. Does that make sense, Oh Bob?

Yes, that makes sense.

Assuming there's a decent amount of data, I wouldn't worry too much about the distribution: if you've got the means and standard errors, you'll be fine. If I had the original data, I would have used a GLM. But if the original data were to hand, we wouldn't have this problem!

I'll let Erasmus explain the sins of log-transformation, and how it relates to cricket.

--------------It is fun to dip into the various threads to watch cluelessness at work in the hands of the confident exponent. - Soapy Sam (so say we all)

"Iíve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to teel if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not."

Halp!

If the mean (say mu) of the total population is known (say mu=mu_0), and the "counts" in the total population are normally distributed (the word "counts" actually suggests that the data are count data. i.e. non-negative integers, rather than continuous normal data), then a one-sample t-test could be used to test whether the sample is from a population with mu=mu_0.

Alternatively, take bootstrap samples from the sample and see how far out †mu_0 is in the bootstrap distribution of the mean.

Just to clarify above - counts are the population sizes. Oh pivot tables, you harsh mistress!

"Iíve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to teel if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not."

Halp!

If the mean (say mu) of the total population is known (say mu=mu_0), and the "counts" in the total population are normally distributed (the word "counts" actually suggests that the data are count data. i.e. non-negative integers, rather than continuous normal data), then a one-sample t-test could be used to test whether the sample is from a population with mu=mu_0.

Alternatively, take bootstrap samples from the sample and see how far out †mu_0 is in the bootstrap distribution of the mean.

Just to clarift above - counts are the population sizes. Oh pivot tables, you harsh mistress!

I am still not clear on what the real question is. For example, if the question is "Have I a large enough sample to accurately estimate the mean and standard deviation of the population?" you might split the data set, calculate t and F for each half, and compare them to the total sample's parameters. If the results indicated that the sub-samples were basically the same, then problem solved.
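One possible reading of the split-half check, sketched in R. It assumes the raw data are available; `x` is simulated here, and comparing the two halves directly with `t.test` and `var.test` is my guess at what "calculate t and F" would look like in practice:

```r
set.seed(3)
x <- rnorm(200, mean = 50, sd = 10)      # made-up stand-in for the full data set

# Randomly split the data into two halves.
idx <- sample(seq_along(x), length(x) %/% 2)
a   <- x[idx]
b   <- x[-idx]

t_res <- t.test(a, b)     # Welch two-sample t-test on the two means
f_res <- var.test(a, b)   # F test on the ratio of the two variances
print(c(t_p = t_res$p.value, f_p = f_res$p.value))
```

Large p-values on both tests would be in line with the halves being "basically the same", though that is not proof that the sample is big enough.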

ETA: There is a statistic to test if one has sampled the SD of a population that is an application of Chebyshev's Theorem, but I am having a caffeine deficit disorder.

Edited by Dr.GH on Mar. 11 2011,10:13

--------------"Science is the horse that pulls the cart of philosophy."

--------------"But it's disturbing to think someone actually thinks creationism -- having put it's hand on the hot stove every day for the last 400 years -- will get a different result tomorrow." -- midwifetoad