Using simulations to evaluate tests

Since the performance of a test depends on the sampling distribution of the test statistic, we can evaluate tests via simulation the same way we’ve been simulating sampling distributions.

To do so, we make some assumptions about the population, simulate the sampling distribution of the test statistic, and then estimate the test’s Type I error rate or power.

To illustrate the technique we’ll explore the power and Type I error rate of the Z-test when the population is Poisson.

Z-test Review

Let’s start with a single sample to review the steps in a Z-test. For now, let’s assume the population is Poisson\((\lambda = 2)\), that we take a sample of size n = 10, and that we will conduct a level \(\alpha = 0.05\) test.
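The steps for one sample can be sketched as follows. This is a minimal sketch under stated assumptions: the names `n`, `lambda`, `mu_0`, `sigma` and `x`, the seed, and the choice of null value \(\mu_0 = 2\) are all illustrative, while `z` and `alpha` follow the names used in the comparison further on. Your own sample, and so your own value of `z`, will differ if you use another seed.

```r
# Assumed setup: a known-variance Z-test on a single Poisson sample
set.seed(1)             # arbitrary seed, only for reproducibility

n      <- 10            # sample size
lambda <- 2             # true Poisson mean
mu_0   <- 2             # hypothesized mean (assumption: equal to lambda)
sigma  <- sqrt(lambda)  # a Poisson population has variance lambda
alpha  <- 0.05          # significance level

x <- rpois(n, lambda = lambda)             # one sample from the population
z <- (mean(x) - mu_0) / (sigma / sqrt(n))  # Z-statistic
z
```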

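Our rejection region depends on the direction of our alternative. Let’s assume we are interested in the two-sided alternative, in which case we reject if \(|Z(\mu_0)| > z_{1-\alpha/2}\), where \(Z(\mu_0) = \frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}}\) and \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard Normal. We do the comparison: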

# Reject the null?
abs(z) > qnorm(1 - alpha/2)

## [1] FALSE

Here TRUE would indicate that we should reject the null, and FALSE that we fail to reject it. In this case we fail to reject the null.

To make things easy in the following sections, let’s wrap the Z-statistic calculation into a function:
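One way such a function might look is sketched here. The name `z_stat` and its arguments `x`, `mu_0` and `sigma` are hypothetical, and the population standard deviation is assumed known, as the Z-test requires.

```r
# Hypothetical helper: Z-statistic for testing H0: mu = mu_0 with known sigma
z_stat <- function(x, mu_0, sigma) {
  n <- length(x)
  (mean(x) - mu_0) / (sigma / sqrt(n))
}

# Usage on a made-up sample (illustrative values only).
# The sample mean is exactly 2, so the statistic is 0.
z_stat(c(1, 3, 2, 0, 4, 2, 1, 3, 2, 2), mu_0 = 2, sigma = sqrt(2))
```

Taking the sample as the first argument makes the function easy to apply to many simulated samples in turn.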

You might want to take the time now to review how to find a p-value and a 95% confidence interval for this particular sample.

Evaluating the test

OK, that was for one sample. To evaluate the performance of the test, we need to repeat that for many samples and then look at how often we reject the null hypothesis.

Let’s start by evaluating the Type I error rate. Remember that the Type I error rate is the probability of rejecting the null when the null is true, so we need to set the null value equal to the true population mean, \(\mu_0 = \lambda\).
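The simulation could be set up along these lines. This is a sketch under assumptions, not a definitive implementation: `replicate()` is one of several ways to repeat the sampling, the helper `z_stat`, the name `n_sim` and the seed are illustrative, and the helper is redefined so the block runs on its own.

```r
set.seed(2)             # arbitrary seed, only for reproducibility

n      <- 10            # sample size
lambda <- 2             # true Poisson mean
mu_0   <- lambda        # null value equals the truth, so the null is true
sigma  <- sqrt(lambda)  # known population standard deviation
alpha  <- 0.05          # significance level
n_sim  <- 2000          # number of simulated samples

# Hypothetical helper: Z-statistic with known sigma
z_stat <- function(x, mu_0, sigma) {
  (mean(x) - mu_0) / (sigma / sqrt(length(x)))
}

# Draw n_sim samples of size n and compute the Z-statistic for each one
sample_z <- replicate(
  n_sim,
  z_stat(rpois(n, lambda = lambda), mu_0 = mu_0, sigma = sigma)
)
length(sample_z)
```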

sample_z now contains 2000 simulated test statistics, which we can take a look at:

ggplot() +
  geom_histogram(aes(x = sample_z))

Or we can directly estimate the rejection rate by finding the proportion that fell in the rejection region:

mean(abs(sample_z) > qnorm(1 - alpha/2))

## [1] 0.059

We estimate that \(P_{H_0}(\text{Reject Null}) \approx 0.06\).

What should this number be close to?

If we were interested instead in power, we need to pick a specific value for the true population mean that falls in our alternative hypothesis region. Let’s say \(\mu_A = 3\). We need to regenerate our samples with this true mean.

We estimate that the power for the alternative \(\mu = \mu_A = 3\) is 0.59.
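A sketch of the power simulation follows. It assumes the null value stays at \(\mu_0 = 2\) and that the test keeps using \(\sigma = \sqrt{2}\), the standard deviation under the null; using the standard deviation of the alternative population instead would give a different estimate. The names `mu_A`, `sample_z_A` and `n_sim`, and the seed, are illustrative, and the estimate will vary a little from run to run.

```r
set.seed(3)          # arbitrary seed, only for reproducibility

n     <- 10          # sample size
mu_0  <- 2           # null value being tested
mu_A  <- 3           # true mean, chosen from the alternative region
sigma <- sqrt(2)     # assumption: sigma held at its value under the null
alpha <- 0.05        # significance level
n_sim <- 2000        # number of simulated samples

# Samples now come from Poisson(mu_A), but we still test H0: mu = mu_0
sample_z_A <- replicate(n_sim, {
  x <- rpois(n, lambda = mu_A)
  (mean(x) - mu_0) / (sigma / sqrt(n))
})

# Estimated power: proportion of samples that land in the rejection region
mean(abs(sample_z_A) > qnorm(1 - alpha / 2))
```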

Would you expect the power to be higher or lower for the alternative \(\mu = \mu_A = 5\)? Why? Simulate it and see.

Would you expect the power to be higher or lower for a larger sample size, n = 100? Why? Simulate it and see.

Would you expect the Type I error rate to be closer to or further from \(\alpha = 0.05\) if n = 100? Why?

In HW#2 you have to calculate the power of the Z-test for the null hypothesis \(H_0: \mu = 2.6\) in a situation where the population mean is really \(\mu = \mu_A = 2.7\), the population variance is \(\sigma^2 = 1.96\), and n = 25. You could verify that your calculations are in the right ballpark by assuming a Normal population and simulating to evaluate the power. Try it.