Welch's t-test

In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's t-test[1] that is more reliable when the two samples have unequal variances and possibly unequal sample sizes.[2][3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test[2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test", or "unequal variances t-test" for brevity.[3]


Student's t-test assumes that the two populations have normal distributions with equal variances. Welch's t-test is designed for unequal variances, but the assumption of normality is maintained.[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.

Here $\nu_1 = N_1 - 1$ is the degrees of freedom associated with the first variance estimate, and $\nu_2 = N_2 - 1$ is the degrees of freedom associated with the second variance estimate.
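For reference, the quantities $\nu_1$ and $\nu_2$ enter the test through the Welch–Satterthwaite approximation. The defining equations are not reproduced in this text, so the block below is the standard textbook form, written with the symbols used in the example table ($\overline{X}_i$ for the sample means, $s_i^2$ for the sample variances, $N_i$ for the sample sizes):

```latex
t = \frac{\overline{X}_1 - \overline{X}_2}
         {\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}},
\qquad
\nu \approx
  \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^{2}}
       {\dfrac{s_1^4}{N_1^2\,\nu_1} + \dfrac{s_2^4}{N_2^2\,\nu_2}},
\qquad
\nu_i = N_i - 1 .
```

The resulting $\nu$ is generally not an integer, which is why the Welch columns of the example table report fractional degrees of freedom.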

The statistic approximately follows the t-distribution, because its denominator is approximated by a chi-square distribution. The approximation is better when both $N_1$ and $N_2$ are larger than 5.[4][5]
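As a minimal sketch of how the statistic, its approximate degrees of freedom and a two-tailed p-value could be computed from summary statistics, the following uses only the Python standard library. The function names are illustrative, not from any library, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by a library routine. The inputs are the rounded summary statistics of Example 2 in the table, so the output agrees with the tabulated values only approximately.

```python
import math


def welch_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (t, nu) for Welch's test from summary statistics."""
    q1, q2 = var1 / n1, var2 / n2  # squared standard error of each mean
    t = (mean1 - mean2) / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation, with nu_i = N_i - 1
    nu = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, nu


def t_pdf(x, nu):
    """Density of Student's t-distribution; nu may be non-integer."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def two_tailed_p(t, nu, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_mass = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central_mass


# Rounded summary statistics of Example 2 (means, variances, sample sizes).
t, nu = welch_from_summary(20.6, 9.0, 10, 22.1, 0.9, 20)
p = two_tailed_p(t, nu)
print(f"t = {t:.2f}, nu = {nu:.1f}, p = {p:.3f}")
```

In practice a statistics library would normally be used for the p-value; the hand-written integration is here only to keep the sketch self-contained.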

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, the power of Welch's t-test comes close to that of Student's t-test, even when the population variances are equal and sample sizes are balanced.[2] Welch's t-test can be generalized to more than two samples,[6] and this generalization is more robust than one-way analysis of variance (ANOVA).

It is not recommended to pre-test for equal variances and then choose between Student's t-test and Welch's t-test.[7] Rather, Welch's t-test can be applied directly, without any substantial disadvantage relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions when sample sizes are large.[8] Reliability decreases for skewed distributions and smaller samples, where one could possibly perform Welch's t-test.[9]

Reference p-values were obtained by simulating the distributions of the t statistics for the null hypothesis of equal population means ($\mu_1 - \mu_2 = 0$). Results are summarised in the table below, with two-tailed p-values:

| Example | A1: $N_1$ | A1: $\overline{X}_1$ | A1: $s_1^2$ | A2: $N_2$ | A2: $\overline{X}_2$ | A2: $s_2^2$ | Student: $t$ | Student: $\nu$ | Student: $P$ | Student: $P_{\mathrm{sim}}$ | Welch: $t$ | Welch: $\nu$ | Welch: $P$ | Welch: $P_{\mathrm{sim}}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 15 | 20.8 | 7.9 | 15 | 23.0 | 3.8 | −2.46 | 28 | 0.021 | 0.021 | −2.46 | 25.0 | 0.021 | 0.017 |
| 2 | 10 | 20.6 | 9.0 | 20 | 22.1 | 0.9 | −2.10 | 28 | 0.045 | 0.150 | −1.57 | 9.9 | 0.149 | 0.144 |
| 3 | 10 | 19.4 | 1.4 | 20 | 21.6 | 17.1 | −1.64 | 28 | 0.110 | 0.036 | −2.22 | 24.5 | 0.036 | 0.042 |

Welch's t-test and Student's t-test gave the same t statistic and p-value, to the reported precision, when the two samples had equal sizes and similar variances (Example 1). Note, however, that even when data are sampled from populations with identical variances, the sample variances will differ, as will the results of the two t-tests. With actual data, the two tests will therefore almost always give somewhat different results.
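The agreement in Example 1 can be checked from the summary statistics alone: when $N_1 = N_2$, the pooled standard error used by Student's t-test equals the unpooled one used by Welch's t-test, so the t statistics coincide and only the degrees of freedom differ. The following sketch illustrates this with the rounded table values. The function names are illustrative, and because the inputs are rounded, the computed t values need not match the tabulated ones exactly.

```python
import math


def student_t(mean1, var1, n1, mean2, var2, n2):
    """Student's t with a pooled variance estimate; df = N1 + N2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t with an unpooled standard error and approximate df."""
    q1, q2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(q1 + q2)
    nu = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, nu


# (mean1, var1, n1, mean2, var2, n2), rounded values from the table.
examples = {
    1: (20.8, 7.9, 15, 23.0, 3.8, 15),
    2: (20.6, 9.0, 10, 22.1, 0.9, 20),
    3: (19.4, 1.4, 10, 21.6, 17.1, 20),
}

results = {k: (student_t(*v), welch_t(*v)) for k, v in examples.items()}
for k, ((ts, dfs), (tw, nuw)) in results.items():
    print(f"Example {k}: Student t = {ts:.2f} (df = {dfs}), "
          f"Welch t = {tw:.2f} (df = {nuw:.1f})")
```

With unequal sample sizes (Examples 2 and 3), the pooled estimate weights the larger sample's variance more heavily, so the two t statistics move apart.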

For unequal variances, Student's t-test gave a p-value well below the simulated value when the smaller sample had the larger variance (Example 2: 0.045 versus 0.150) and well above it when the larger sample had the larger variance (Example 3: 0.110 versus 0.036). Welch's t-test gave p-values close to the simulated p-values in both cases (0.149 versus 0.144, and 0.036 versus 0.042).
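A simulated reference p-value of the kind reported in the table could be obtained along the following lines. This is only a sketch under stated assumptions: the population variances used for the original simulations are not given in this text, so the values below (9.0 and 0.9, borrowed from the sample variances of Example 2) are placeholders. The function names, number of replicates and seed are likewise arbitrary choices.

```python
import math
import random
import statistics


def welch_t(sample1, sample2):
    """Welch's t statistic for two samples (unpooled standard error)."""
    n1, n2 = len(sample1), len(sample2)
    se = math.sqrt(statistics.variance(sample1) / n1
                   + statistics.variance(sample2) / n2)
    return (statistics.fmean(sample1) - statistics.fmean(sample2)) / se


def simulated_p_value(t_obs, n1, n2, sd1, sd2, n_sim=5000, seed=0):
    """Two-tailed Monte Carlo p-value under H0: mu1 - mu2 = 0."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sim):
        # Both samples share the same mean (0), so the null hypothesis holds.
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(welch_t(a, b)) >= abs(t_obs):
            extreme += 1
    return extreme / n_sim


# Assumed population variances (placeholders, see the note above).
p_sim = simulated_p_value(t_obs=-1.57, n1=10, n2=20,
                          sd1=math.sqrt(9.0), sd2=math.sqrt(0.9))
print(f"simulated two-tailed p = {p_sim:.3f}")
```

The estimate carries Monte Carlo error that shrinks as `n_sim` grows, so repeated runs with different seeds give slightly different values.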