Effect Size Mechanics.

2 COHEN’S D (HEDGES’ G)
- Cohen was one of the pioneers in advocating effect size over statistical significance
- He defined d for the one-sample case as the standardized distance between the sample mean and the null-hypothesized mean: d = (M − μ) / s

3 COHEN’S D
- Now compare to the one-sample t-statistic: t = (M − μ) / (s / √n)
- So t = d·√n
- This shows how the test statistic (and its observed p-value) is in part determined by the effect size, but is confounded with sample size
- This means small effects may be statistically significant given large enough samples, as in many studies (esp. in the social sciences)
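As an illustrative sketch (Python; the function name is my own), the relation t = d·√n means the same effect size yields very different t-statistics as n grows:

```python
import math

def one_sample_t_from_d(d, n):
    """One-sample t implied by effect size d and sample size n: t = d * sqrt(n)."""
    return d * math.sqrt(n)

# The same small effect (d = 0.2) crosses the usual significance
# threshold once the sample is large enough:
for n in (10, 100, 1000):
    print(n, round(one_sample_t_from_d(0.2, n), 2))
```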

4 COHEN’S D – DIFFERENCES BETWEEN MEANS
- The standard effect size measure for the independent-samples t test: d = (M1 − M2) / sp
- Cohen initially suggested either sample standard deviation could be used, since the two should be equal under our assumptions (homogeneity of variance)
- In practice, however, researchers use the pooled variance: sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
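A minimal sketch of the pooled-variance version of d (Python; the function name is my own):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for independent samples, standardized by the pooled SD."""
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2)

# A 5-point mean difference with a common SD of 10 gives d = 0.5
print(round(cohens_d(105, 10, 30, 100, 10, 30), 2))
```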

5 COHEN’S D – DIFFERENCES BETWEEN MEANS
- Relationship to t: d = t·√(1/n1 + 1/n2)
- Relationship to rpb: rpb = d / √(d² + 1/(pq))
- p and q are the proportions of the total sample that each group makes up
- With equal groups, p = .5 and q = .5, so the denominator becomes √(d² + 4), as you will see in some texts
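These two conversions can be sketched directly (Python; function names are my own):

```python
import math

def d_from_t(t, n1, n2):
    """Recover d from the independent-samples t: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def r_pb_from_d(d, n1, n2):
    """Point-biserial r from d: r = d / sqrt(d^2 + 1/(p*q))."""
    n = n1 + n2
    p, q = n1 / n, n2 / n  # with equal groups p = q = .5, so 1/(p*q) = 4
    return d / math.sqrt(d**2 + 1 / (p * q))
```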

6 GLASS’S Δ
- For studies with control groups, we’ll use the control group’s standard deviation in our formula: Δ = (M_treatment − M_control) / s_control
- This does not assume equal variances
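A two-line sketch of Glass’s Δ (function name my own), which standardizes by the control SD alone:

```python
def glass_delta(m_treatment, m_control, s_control):
    """Glass's delta: mean difference standardized by the control-group SD only."""
    return (m_treatment - m_control) / s_control

# Unlike the pooled d, the result is unaffected by the treatment group's SD
print(glass_delta(110, 100, 10))  # 1.0
```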

9 EXAMPLE
- Average number of times graduate psych students curse in the presence of others out of total frustration over the course of a day
- Currently taking a statistics course vs. not
- Data:
- Find the pooled variance and sd
- Equal groups, so just average the two variances, such that sp² = 6.25
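Since the slide’s data table did not survive extraction, the variances below are hypothetical, chosen only to reproduce the stated sp² = 6.25; with equal group sizes the pooled variance is just the average of the two sample variances:

```python
import math

var_stats, var_no_stats = 5.5, 7.0    # hypothetical sample variances
sp2 = (var_stats + var_no_stats) / 2  # equal n: pooled variance = simple average
sp = math.sqrt(sp2)
print(sp2, sp)  # 6.25 2.5
```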

11 DEPENDENT SAMPLES
- The standard deviation of the difference scores, unlike the previous solution, takes into account the correlated nature of the data: s²_D = s1² + s2² − 2·cov(X1, X2)
- Problems remain, however:
- A standardized mean change in the metric of the difference scores can be much different from one in the metric of the original scores
- The variability of the difference scores might be markedly different from that of the original units
- Interpretation may not be straightforward

12 DEPENDENT SAMPLES
- Another option is to use a standardizer in the metric of the original scores, which is directly comparable with a standardized mean difference from an independent-samples design
- In pre-post situations, where one would not expect homogeneity of variance, treat the pretest scores as you would the control group for Glass’s Δ
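The two standardizers can be contrasted in a short sketch (Python; the helper name is my own). Computing the variance from the difference scores themselves builds in Var1 + Var2 − 2·cov automatically:

```python
import math

def dependent_effect_sizes(diffs, sd_original):
    """Return (standardized mean change, original-metric d) for paired data.

    diffs: list of difference scores (e.g., post - pre)
    sd_original: an SD in the metric of the original scores
                 (e.g., the pretest SD, as with Glass's delta)
    """
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((x - mean_diff) ** 2 for x in diffs) / (n - 1)
    d_change = mean_diff / math.sqrt(var_diff)  # metric of the difference scores
    d_original = mean_diff / sd_original        # metric of the original scores
    return d_change, d_original
```

With strongly correlated pre-post scores, the difference-score SD shrinks, so d_change can be far larger than d_original for the same mean change.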

16 DEPENDENT SAMPLES – WHICH TO USE?
- Base the choice on substantive theoretical interest
- If the emphasis is really on change, i.e. the design is intrinsically repeated measures, one might choose the standardized mean change
- In other situations we might retain the standardizer in the original metric, so that the d has the same meaning as elsewhere

17 CASE-LEVEL EFFECT SIZES
Cohen’s (1988) measures of distribution overlap:
- U1: proportion of nonoverlap; if there is no overlap U1 = 1, and if the distributions overlap completely U1 = 0
- U2: proportion of scores in the lower group exceeded by the same proportion of scores in the upper group; if the means are the same U2 = .5, and if all of group 2 exceeds group 1 then U2 = 1.0
- U3: proportion of scores in the lower group exceeded by the typical score in the upper group; same range as U2
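Under normal distributions with equal variances, all three U measures follow from d alone (a sketch using only the standard normal CDF; function names are my own):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cohen_overlap(d):
    """Cohen's U1, U2, U3 for a nonnegative d, assuming normality."""
    u3 = phi(d)             # lower-group proportion below the upper group's mean
    u2 = phi(d / 2)         # equal-proportions criterion
    u1 = (2 * u2 - 1) / u2  # proportion of nonoverlap
    return u1, u2, u3
```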

18 OTHER CASE-LEVEL EFFECT SIZES
- Tail ratios (Feingold, 1995): the relative proportion of scores from two different groups that fall in the upper extreme (i.e., either the left or right tail) of the combined frequency distribution
- “Extreme” is usually defined relatively, in terms of the number of standard deviations away from the grand mean
- A tail ratio > 1.0 indicates that one group has relatively more extreme scores
- Here, tail ratio = p2/p1
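A sketch of the tail ratio for two normal groups with a common SD (function names and the right-tail cutoff convention are my assumptions):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def tail_ratio(m2, m1, sd, cutoff):
    """p2/p1: relative proportions of two normal groups beyond a right-tail cutoff."""
    p1 = 1 - phi((cutoff - m1) / sd)
    p2 = 1 - phi((cutoff - m2) / sd)
    return p2 / p1

# Identical groups give a ratio of 1; shifting one group upward inflates it
print(round(tail_ratio(0, 0, 1, 2), 2))
```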

19 OTHER CASE-LEVEL EFFECT SIZES
- The common language effect size (McGraw & Wong, 1992) is the predicted probability that a random score from the upper group exceeds a random score from the lower group
- Standardize the mean difference by the SD of the difference of two random scores, √(s1² + s2²), and find the normal-curve area below that value
- Range: .5 – 1.0
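A sketch of the common language effect size under normality (function name my own):

```python
import math

def common_language_es(m_upper, s_upper, m_lower, s_lower):
    """P(random upper score > random lower score), assuming normal groups."""
    # The difference of two independent normals has variance s_upper^2 + s_lower^2
    z = (m_upper - m_lower) / math.sqrt(s_upper**2 + s_lower**2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal-curve area below z
```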

20 ETA-SQUARED
- A measure of the degree to which variability among observations can be attributed to conditions: η² = SS_between / SS_total
- Example: η² = .50 means 50% of the variability seen in the scores is due to the independent variable
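A sketch computing η² directly from raw scores per condition (helper name my own):

```python
def eta_squared(groups):
    """eta^2 = SS_between / SS_total for a one-way design (list of score lists)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total
```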

22 OMEGA-SQUARED
- Another effect size measure that is less biased than eta-squared and interpreted in the same way
- Think adjusted R²: ω² = [SS_between − (k − 1)·MS_within] / (SS_total + MS_within), where k is the number of groups
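A companion sketch for ω² from the same one-way sums of squares (helper name my own):

```python
def omega_squared(groups):
    """omega^2 = (SS_between - (k-1)*MS_within) / (SS_total + MS_within)."""
    k = len(groups)
    scores = [x for g in groups for x in g]
    n = len(scores)
    grand_mean = sum(scores) / n
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ms_within = (ss_total - ss_between) / (n - k)
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
```

On the same data, ω² comes out smaller than η², reflecting its correction for the positive bias of the sample ratio.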

23 CONFIDENCE INTERVALS FOR EFFECT SIZE
- Effect size statistics such as Hedges’ g and η² have complex distributions
- Traditional methods of interval estimation rely on approximate standard errors and assume large sample sizes
- The general form for a d interval is the same general form for intervals we’ve seen: d ± z·SE(d)
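A sketch of the traditional large-sample interval for d, using one common approximate standard error (the SE formula here is an assumption of the sketch; exact intervals instead invert the noncentral t distribution):

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for d: d +/- z * SE(d), large-sample normal approximation.

    SE(d) = sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```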