TYPE I AND TYPE II ERRORS

The question of how to determine the critical region ideally should depend on the cost of making a wrong decision. In this regard it is useful to define the following two types of error.

DEFINITION 9.2.1 A Type I error is the error of rejecting $H_0$ when it is true. A Type II error is the error of accepting $H_0$ when it is false (that is, when $H_1$ is true).

FIGURE 9.1 Relationship between $\alpha$ and $\beta$

The probabilities of the two types of error are crucial in the choice of a critical region. We denote the probability of Type I error by $\alpha$ and that of Type II error by $\beta$. Therefore we can write mathematically

(9.2.1) $\alpha = P(X \in R \mid H_0)$ and

(9.2.2) $\beta = P(X \in \bar{R} \mid H_1)$.

The probability of Type I error is also called the size of a test.
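To make (9.2.1) and (9.2.2) concrete, the following Python sketch evaluates them for critical regions of the form $R = \{x \mid x > c\}$. The two normal densities ($N(0, 1)$ under $H_0$ and $N(2, 1)$ under $H_1$) and the function names are assumptions chosen only for illustration; they do not come from the text.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def error_probabilities(c, mean0=0.0, mean1=2.0):
    """Return (alpha, beta) for the critical region R = {x : x > c}.

    Assumed setting (hypothetical): X ~ N(mean0, 1) under H0
    and X ~ N(mean1, 1) under H1.
    """
    alpha = 1.0 - normal_cdf(c, mean0)  # P(X in R | H0)
    beta = normal_cdf(c, mean1)         # P(X not in R | H1)
    return alpha, beta


for c in (0.0, 0.5, 1.0, 1.5, 2.0):
    alpha, beta = error_probabilities(c)
    print(f"c = {c:.1f}  alpha = {alpha:.4f}  beta = {beta:.4f}")
```

Raising the cutoff $c$ shrinks the critical region, so in this assumed setting $\alpha$ falls while $\beta$ rises.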

Sometimes it is useful to consider a test which chooses between two critical regions, say, $R_1$ and $R_2$, with probabilities $\delta$ and $1 - \delta$ respectively, where $\delta$ is chosen a priori. Such a test can be performed if a researcher has a coin whose probability of a head is $\delta$, and she decides in advance that she will choose $R_1$ if a toss of the coin yields a head and $R_2$ otherwise. Such a test is called a randomized test. If the probabilities of the two types of error for $R_1$ and $R_2$ are $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$, respectively, the probabilities of the two types of error for the randomized test, denoted as $(\alpha, \beta)$, are given by

(9.2.3) $\alpha = \delta\alpha_1 + (1 - \delta)\alpha_2$ and $\beta = \delta\beta_1 + (1 - \delta)\beta_2$.

We call the values of $(\alpha, \beta)$ the characteristics of the test.
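The characteristics of a randomized test are a weighted average of the characteristics of its two component tests, as in (9.2.3). A minimal Python sketch follows; the function name and the numerical characteristics are hypothetical and serve only to show the arithmetic.

```python
from fractions import Fraction


def randomized_characteristics(delta, test1, test2):
    """Return (alpha, beta) of the test that uses R1 with probability delta
    and R2 with probability 1 - delta, following (9.2.3)."""
    alpha1, beta1 = test1
    alpha2, beta2 = test2
    alpha = delta * alpha1 + (1 - delta) * alpha2
    beta = delta * beta1 + (1 - delta) * beta2
    return alpha, beta


# Hypothetical characteristics of two tests, chosen only for illustration.
test1 = (Fraction(1, 10), Fraction(3, 5))
test2 = (Fraction(3, 10), Fraction(1, 5))

mixed = randomized_characteristics(Fraction(1, 2), test1, test2)
print(mixed)
```

On the $\alpha$, $\beta$ plane the result always lies on the line segment joining the characteristics of the two component tests.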

We want to use a test for which both $\alpha$ and $\beta$ are as small as possible. Making $\alpha$ small tends to make $\beta$ large and vice versa, however, as illustrated in Figure 9.1. In the figure the densities of $X$ under the null and the alternative hypotheses are $f(x \mid H_0)$ and $f(x \mid H_1)$, respectively. If we consider only the critical regions of the form $R = \{x \mid x > c\}$, $\alpha$ and $\beta$ are represented by the areas of the shaded regions. An optimal test, therefore, should ideally be devised by considering the relative costs of the two types of error. For example, if Type I error is much more costly than Type II error, we should devise a test so as to make $\alpha$ small even though it would imply a large value for $\beta$. Even if we do not know the relative costs of the two types of error, this much is certain: given two tests with the same value of $\alpha$, we should choose the one with the smaller value of $\beta$. Thus we define

DEFINITION 9.2.2 Let $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ be the characteristics of two tests. The first test is better (or more powerful) than the second test if $\alpha_1 \le \alpha_2$ and $\beta_1 \le \beta_2$, with a strict inequality holding for at least one of the two.

If we cannot determine that one test is better than another by Definition 9.2.2, we must consider the relative costs of the two types of errors. Classical statisticians usually fail to do this, because a consideration of the costs tends to bring in a subjective element. In Section 9.3 we shall show how the Bayesian statistician determines the best test by explicit consideration of the costs, or the so-called loss function. Definition 9.2.2 is useful to the extent that we can eliminate from consideration any test which is "worse" than another test. The remaining tests that we need to consider are termed admissible tests.

DEFINITION 9.2.3 A test is called inadmissible if there exists another test which is better in the sense of Definition 9.2.2. Otherwise it is called admissible.
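The comparison in Definition 9.2.2 can be written as a short predicate. The Python sketch below is only an illustration: the function names and the three sample characteristics are hypothetical, and the filter checks dominance within a finite list only, whereas admissibility requires comparison with every possible test, randomized tests included.

```python
from fractions import Fraction as F


def is_better(first, second):
    """True if `first` is better than `second` in the sense of Definition 9.2.2."""
    alpha1, beta1 = first
    alpha2, beta2 = second
    no_worse = alpha1 <= alpha2 and beta1 <= beta2
    strictly_better_somewhere = alpha1 < alpha2 or beta1 < beta2
    return no_worse and strictly_better_somewhere


def undominated(tests):
    """Keep the tests that no other test in the same list is better than."""
    return [t for t in tests if not any(is_better(other, t) for other in tests)]


# Hypothetical characteristics (alpha, beta), chosen only for illustration.
tests = [(F(1, 4), F(1, 2)), (F(1, 4), F(3, 4)), (F(1, 2), F(1, 4))]
print(undominated(tests))
```

The second test in the list is discarded because the first has the same $\alpha$ and a smaller $\beta$; the first and third cannot be ranked without knowing the relative costs of the two errors.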

The following examples will illustrate the relationship between $\alpha$ and $\beta$ as well as the notion of admissible tests.

EXAMPLE 9.2.1 Let $X$ be distributed as $B(2, p)$, and suppose we are to test $H_0$: $p = 1/2$ against $H_1$: $p = 3/4$ on the basis of one observation on $X$. Construct all possible nonrandomized tests for this problem and calculate the values of $\alpha$ and $\beta$ for each test.

Table 9.1 describes the characteristics of all the nonrandomized tests. Figure 9.2 plots the characteristics of the eight tests on the $\alpha$, $\beta$ plane. Any point on the line segments connecting (1)-(4)-(7)-(8) except the end points themselves represents the characteristics of an admissible randomized test. It is clear that the set of tests whose characteristics lie on the line segments constitutes the set of all the admissible tests. Tests (2), (3), and (5) are all dominated by (4) in the sense of Definition 9.2.2. Although test (6) is not dominated by any other nonrandomized test, it is inadmissible because it is dominated by some randomized tests based on (4) and (7). For example, the randomized test that chooses the critical regions of tests (4) and (7) with the equal probability of $1/2$ has the characteristics $\alpha = 1/2$ and $\beta = 1/4$ and therefore dominates (6). Such a randomized test can be performed by choosing $H_0$ if $X = 0$, choosing $H_1$ if $X = 2$, and, if $X = 1$, flipping a coin and choosing $H_0$ if it is a head and $H_1$ otherwise.

TABLE 9.1 Two types of errors in a binomial example

| Test | $R$ | $\bar{R}$ | $\alpha = P(R \mid H_0)$ | $\beta = P(\bar{R} \mid H_1)$ |
|------|-----|-----------|--------------------------|-------------------------------|
| (1) | $\emptyset$ | 0, 1, 2 | 0 | 1 |
| (2) | 0 | 1, 2 | 1/4 | 15/16 |
| (3) | 1 | 0, 2 | 1/2 | 5/8 |
| (4) | 2 | 0, 1 | 1/4 | 7/16 |
| (5) | 0, 1 | 2 | 3/4 | 9/16 |
| (6) | 0, 2 | 1 | 1/2 | 3/8 |
| (7) | 1, 2 | 0 | 3/4 | 1/16 |
| (8) | 0, 1, 2 | $\emptyset$ | 1 | 0 |

FIGURE 9.2 Two types of errors in a binomial example
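The characteristics in Example 9.2.1 can be recomputed by enumerating every subset of $\{0, 1, 2\}$ as a critical region. The Python sketch below does this with exact fractions, taking $p = 1/2$ under $H_0$ and $p = 3/4$ under $H_1$; the helper names are assumptions made for illustration, and the enumeration order happens to reproduce the test numbering (1) through (8).

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binomial_pmf(n, p):
    """Probabilities P(X = k), k = 0..n, for X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


pmf_h0 = binomial_pmf(2, Fraction(1, 2))  # distribution of X under H0
pmf_h1 = binomial_pmf(2, Fraction(3, 4))  # distribution of X under H1
outcomes = (0, 1, 2)

# All eight critical regions, ordered by size and then lexicographically.
regions = [set(r) for size in range(4) for r in combinations(outcomes, size)]


def characteristics(region):
    """Return (alpha, beta) for a nonrandomized test with critical region `region`."""
    alpha = sum(pmf_h0[x] for x in region)                       # reject H0 when true
    beta = sum(pmf_h1[x] for x in outcomes if x not in region)   # accept H0 when false
    return alpha, beta


table = {number: characteristics(r) for number, r in enumerate(regions, start=1)}
for number, (alpha, beta) in table.items():
    print(f"({number}) R = {sorted(regions[number - 1])}  alpha = {alpha}  beta = {beta}")

# Randomized test choosing the regions of (4) and (7) with probability 1/2 each.
half = Fraction(1, 2)
mix = tuple(half * a + half * b for a, b in zip(table[4], table[7]))
print("mix of (4) and (7):", mix, " test (6):", table[6])
```

The mixture has the same $\alpha$ as test (6) and a smaller $\beta$, which is the sense in which (6) is dominated.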

In Definition 9.2.2 we defined the more powerful of two tests. When we consider a specific problem such as Example 9.2.1 where all the possible tests are enumerated, it is natural to talk about the most powerful test. In the two definitions that follow, the reader should carefully distinguish two terms, size and level. In stating these definitions we identify a test with a critical region, but the definitions apply to a randomized test as well.

DEFINITION 9.2.4 $R$ is the most powerful test of size $\alpha$ if $\alpha(R) = \alpha$ and for any test $R_1$ of size $\alpha$, $\beta(R) \le \beta(R_1)$. (It may not be unique.)

DEFINITION 9.2.5 $R$ is the most powerful test of level $\alpha$ if $\alpha(R) \le \alpha$ and for any test $R_1$ of level $\alpha$ (that is, such that $\alpha(R_1) \le \alpha$), $\beta(R) \le \beta(R_1)$.

We shall illustrate the two terms using Example 9.2.1. We can state:

The most powerful test of size $1/4$ is (4).

The most powerful nonrandomized test of level $3/8$ is (4).

The most powerful randomized test of size $3/8$ is $\tfrac{3}{4} \cdot (4) + \tfrac{1}{4} \cdot (7)$.

Note that if we are allowed randomization, we do not need to use the word level.

EXAMPLE 9.2.2 Let X have the density

(9.2.4) $$f(x) = \begin{cases} 1 - \theta + x & \text{for } \theta - 1 \le x < \theta, \\ 1 + \theta - x & \text{for } \theta \le x \le \theta + 1, \\ 0 & \text{otherwise.} \end{cases}$$

We are to test $H_0$: $\theta = 0$ against $H_1$: $\theta = 1$ on the basis of a single observation on $X$. Represent graphically the characteristics of all the admissible tests.

FIGURE 9.3 Densities under two hypotheses

The densities of $X$ under the two hypotheses, denoted by $f_0(x)$ and $f_1(x)$, are graphed in Figure 9.3. Intuitively it is obvious that the critical region of an admissible nonrandomized test is a half-line of the form $[t, \infty)$ where $0 \le t \le 1$. In Figure 9.3, $\alpha$ is represented by the area of the lightly shaded triangle and $\beta$ by the area of the darker triangle. Therefore, algebraically,
(9.2.5) $\alpha = \tfrac{1}{2}(1 - t)^2$ and $\beta = \tfrac{1}{2}t^2$, $0 \le t \le 1$.

Eliminating $t$ from (9.2.5) yields

(9.2.6) $\beta = \tfrac{1}{2}\left(1 - \sqrt{2\alpha}\right)^2$, $0 \le \alpha \le \tfrac{1}{2}$.

Equation (9.2.6) is graphed in Figure 9.4. Every point on the curve represents the characteristics of an admissible nonrandomized test. Because of the convexity of the curve, no randomized test can be admissible in this situation.
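The curve of admissible characteristics in Example 9.2.2 can be checked numerically. The Python sketch below integrates the triangular density (9.2.4) for critical regions $[t, \infty)$ and compares the result with the closed forms $\alpha = \tfrac{1}{2}(1 - t)^2$ and $\beta = \tfrac{1}{2}t^2$, which follow from the areas of the two triangles. The midpoint-rule integrator and all function names are assumptions made for illustration.

```python
import math


def density(x, theta):
    """Triangular density (9.2.4), centered at theta."""
    if theta - 1 <= x < theta:
        return 1 - theta + x
    if theta <= x <= theta + 1:
        return 1 + theta - x
    return 0.0


def integrate(f, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def characteristics(t):
    """Return (alpha, beta) for the critical region [t, infinity), 0 <= t <= 1."""
    alpha = integrate(lambda x: density(x, 0.0), t, 1.0)  # P(X >= t | theta = 0)
    beta = integrate(lambda x: density(x, 1.0), 0.0, t)   # P(X < t | theta = 1)
    return alpha, beta


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    alpha, beta = characteristics(t)
    curve = 0.5 * (1 - math.sqrt(2 * alpha)) ** 2  # beta as a function of alpha
    print(f"t = {t:.2f}  alpha = {alpha:.5f}  beta = {beta:.5f}  curve = {curve:.5f}")

# Convexity check: mixing the tests with t = 0.25 and t = 0.75 equally gives a
# point whose beta lies above the curve evaluated at the same alpha.
a1, b1 = characteristics(0.25)
a2, b2 = characteristics(0.75)
mix_alpha, mix_beta = 0.5 * (a1 + a2), 0.5 * (b1 + b2)
curve_at_mix = 0.5 * (1 - math.sqrt(2 * mix_alpha)) ** 2
print(mix_beta > curve_at_mix)
```

The final comparison illustrates why randomization does not help here: the mixture lies above the curve, so some nonrandomized test with the same $\alpha$ has a smaller $\beta$.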

A more general result concerning the set of admissible characteristics is given in the following theorem, which we state without proof.

THEOREM 9.2.1 The set of admissible characteristics plotted on the $\alpha$, $\beta$ plane is a continuous, monotonically decreasing, convex function which starts at a point within $[0, 1]$ on the $\beta$ axis and ends at a point within $[0, 1]$ on the $\alpha$ axis.