Testing for Equality of Two Proportions

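Suppose we have two populations and we want to test whether the proportions of subjects that have a certain characteristic of interest (e.g., a fixed gender) are equal in the two populations. To make this inference we obtain two samples $\{X_1, X_2, \dots, X_n\}$ and $\{Y_1, Y_2, \dots, Y_m\}$, where each $X_i$ and $Y_i$ represents whether the $i$th observation in the sample had the characteristic of interest. That is,

$$X_i = \begin{cases} 1, & \text{if the } i\text{th observation in the first sample has the characteristic of interest} \\ 0, & \text{otherwise} \end{cases}$$

and

$$Y_i = \begin{cases} 1, & \text{if the } i\text{th observation in the second sample has the characteristic of interest} \\ 0, & \text{otherwise.} \end{cases}$$

The raw sample proportions of observations having the characteristic of interest are $\hat{p}_x = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\hat{p}_y = \frac{1}{m}\sum_{i=1}^{m} Y_i$.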

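By the independence of the samples, the standard error of the difference of the two proportion estimates is:

$$SE(\hat{p}_x - \hat{p}_y) = \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n} + \frac{\hat{p}_y(1-\hat{p}_y)}{m}},$$

where $\hat{p}_x$ and $\hat{p}_y$ are the two sample proportions and $n$ and $m$ are the corresponding sample sizes.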
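As a minimal sketch, the standard error of the difference can be computed as the square root of the sum of the two estimated binomial variances. The function name `se_diff_proportions` and its argument names are illustrative assumptions, not part of the original text.

```python
import math

def se_diff_proportions(x_successes, n, y_successes, m):
    """Unpooled standard error of (p_hat_x - p_hat_y) for two independent samples.

    x_successes, y_successes: counts of observations with the characteristic.
    n, m: the two sample sizes (hypothetical argument names).
    """
    p_x = x_successes / n  # raw sample proportion in the first sample
    p_y = y_successes / m  # raw sample proportion in the second sample
    # Independence of the samples lets the two variances add.
    return math.sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / m)
```

For two samples of size 100 with 50 successes each, this gives sqrt(0.0025 + 0.0025), which is about 0.0707.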

Hypothesis Testing for the Difference of Two Proportions

Null Hypothesis: $H_0: p_x = p_y$, where $p_x$ and $p_y$ are the population proportions of interest.

Alternative Research Hypotheses:

One sided (uni-directional): $H_1: p_x > p_y$, or $H_1: p_x < p_y$

Double sided: $H_1: p_x \neq p_y$

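Test Statistics: $Z_0 = \frac{\hat{p}_x - \hat{p}_y - 0}{SE(\hat{p}_x - \hat{p}_y)} \sim N(0,1)$ (approximately, for large samples, under the null hypothesis), where $\hat{p}_x$ and $\hat{p}_y$ are the sample proportions and $SE(\hat{p}_x - \hat{p}_y)$ is the standard error of the difference of the two proportion estimates.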
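The following is a rough sketch of how the null hypothesis, the three alternatives, and a z-type test statistic fit together. It assumes the unpooled standard error and a large-sample normal approximation; many texts instead use a pooled standard error under the null hypothesis, which would give slightly different values. The names `two_proportion_z_test`, `normal_upper_tail`, and the `alternative` labels are hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def two_proportion_z_test(x_successes, n, y_successes, m, alternative="two-sided"):
    """Return (z, p_value) for H0: p_x = p_y using the unpooled standard error."""
    p_x = x_successes / n
    p_y = y_successes / m
    se = math.sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / m)
    z = (p_x - p_y) / se
    if alternative == "greater":      # H1: p_x > p_y
        p_value = normal_upper_tail(z)
    elif alternative == "less":       # H1: p_x < p_y
        p_value = normal_upper_tail(-z)
    elif alternative == "two-sided":  # H1: p_x != p_y
        p_value = 2.0 * normal_upper_tail(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value
```

A call such as `two_proportion_z_test(60, 100, 45, 100, alternative="greater")` would test whether the first population proportion exceeds the second.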

Genders of Siblings Example

Is the gender of a second child influenced by the gender of the first child in families with more than one child? The research hypothesis needs to be formulated before collecting, inspecting, or interpreting the data that will be used to address it. Here the research hypothesis is that mothers whose first child is a girl are more likely to have a girl as a second child than mothers whose first child is a boy. Data: 20 years of birth records from one hospital in Auckland, New Zealand.

                      Second Child
First Child     Male      Female    Total
Male            3,202     2,776     5,978
Female          2,620     2,792     5,412
Total           5,822     5,568     11,390

Let $p_1$ be the true proportion of girls as second children among mothers whose first child is a girl, and $p_2$ the true proportion of girls as second children among mothers whose first child is a boy. The parameter of interest is $p_1 - p_2$.

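Hypotheses: $H_0: p_1 = p_2$ vs. $H_1: p_1 > p_2$. From the table, $\hat{p}_1 = 2{,}792/5{,}412 \approx 0.516$ and $\hat{p}_2 = 2{,}776/5{,}978 \approx 0.464$. With the unpooled standard error $SE \approx 0.00937$, the test statistic is $z_0 = (\hat{p}_1 - \hat{p}_2)/SE \approx 0.05152/0.00937 \approx 5.50$, and the one-sided p-value is about $1.9 \times 10^{-8}$. This small p-value provides extremely strong evidence to reject the null hypothesis that the proportion of mothers who had a girl as a second child is the same whether their first child was a boy or a girl. Hence there is strong statistical evidence that the genders of siblings are not independent.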
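The test for the siblings data can be checked with a few lines of Python. This is a sketch that assumes the unpooled standard error and a one-sided alternative (p1 > p2); the variable names are illustrative, and the counts are taken from the table of first-child and second-child genders.

```python
import math

# Counts from the birth-records table
girl_then_girl, girl_first_total = 2792, 5412  # first child female
boy_then_girl, boy_first_total = 2776, 5978    # first child male

p1 = girl_then_girl / girl_first_total  # estimated P(second is girl | first is girl)
p2 = boy_then_girl / boy_first_total    # estimated P(second is girl | first is boy)

# Unpooled standard error of p1 - p2 (assumption of this sketch)
se = math.sqrt(p1 * (1 - p1) / girl_first_total + p2 * (1 - p2) / boy_first_total)
z = (p1 - p2) / se

# One-sided p-value P(Z > z) under the standard normal approximation
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, z = {z:.2f}, p-value = {p_value:.1e}")
# prints: p1 = 0.5159, p2 = 0.4644, z = 5.50, p-value = 1.9e-08
```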

Practical significance: The practical significance of the effect (of the gender of the first child on the gender of the second child, in this case) can only be assessed using confidence intervals. A 95% confidence interval for $p_1 - p_2$ is [0.033, 0.070], computed by $\hat{p}_1 - \hat{p}_2 \pm 1.96 \times SE(\hat{p}_1 - \hat{p}_2)$. Clearly, this is a practically negligible effect and no reasonable person would make important prospective family decisions based on the gender of their (first) child.
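The interval above can be reproduced with a short sketch, assuming a normal-approximation (Wald-type) interval with the unpooled standard error and the critical value 1.96. The function name `diff_proportion_ci` and its parameters are hypothetical.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Normal-approximation confidence interval for p1 - p2 (unpooled SE)."""
    p1 = x1 / n1
    p2 = x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Girls as second child: 2,792 of 5,412 (first child female)
# versus 2,776 of 5,978 (first child male)
low, high = diff_proportion_ci(2792, 5412, 2776, 5978)
print(f"[{low:.3f}, {high:.3f}]")  # prints: [0.033, 0.070]
```

The whole interval lies between roughly 3 and 7 percentage points, which is the basis for judging the effect small in practical terms.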