Re: st: Testing for significant differences between groups after running a random-effects regression

Date: Tue, 9 Oct 2012 14:25:02 -0500

Michael Housman asked:
"Was wondering if anyone could tell me how to test for significant
differences between groups after running a random-effects regression?"
This can be tricky in such a set-up. What you are doing is fitting
quadratic slopes that can vary across different groups. Technically,
it means that you are estimating two shape parameters for each
category. The question now is what "significant differences
between groups" means. I can think of two different things here: (a)
that the _shapes_ of the curves differ across groups, and (b) that the
expected means are significantly different across groups. (a) is
basically what the significance test from the model is referring to.
If we consider the following example:
*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
*---------------------------------------------------------
we can fit quadratic slopes for wage as a function of tenure, by
race/ethnicity (I did not do any data checking here but I assume that
you checked your data and found that quadratic slopes give a
reasonable representation of your data?). Testing the shape parameters
for the two race groups against each other yields:
*---------------------------------------------------------
test (_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure]) = _b[ln_wage:tenure]
test (_b[ln_wage:c.tenure#c.tenure]+_b[ln_wage:2.race#c.tenure#c.tenure]) = ///
_b[ln_wage:c.tenure#c.tenure]
*---------------------------------------------------------
which is equivalent to the test from the model output itself.
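With three or more groups the same logic extends to joint tests. The
following is only a sketch under an assumption that departs from the
example above: all three race categories of the example data are kept
instead of dropping race==3. The first -test- asks whether any of the
shape parameters differ from those of the base group; the second
compares groups 2 and 3 directly:
*---------------------------------------------------------
* Sketch only: all three race groups kept (an assumption, not the model above)
use http://www.stata-press.com/data/r11/nlswork.dta, clear
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
* Joint test: do the shape parameters differ for any group vs. the base?
test (_b[ln_wage:2.race#c.tenure] = 0) ///
(_b[ln_wage:3.race#c.tenure] = 0) ///
(_b[ln_wage:2.race#c.tenure#c.tenure] = 0) ///
(_b[ln_wage:3.race#c.tenure#c.tenure] = 0)
* Contrast of group 2 against group 3 (neither is the base category)
test (_b[ln_wage:2.race#c.tenure] = _b[ln_wage:3.race#c.tenure]) ///
(_b[ln_wage:2.race#c.tenure#c.tenure] = ///
_b[ln_wage:3.race#c.tenure#c.tenure])
*---------------------------------------------------------
For a single-equation model such as the -xtreg- fit in the original
question, the first joint test could presumably be written more
compactly as -testparm i.hire_score_order#c.day_of_service
i.hire_score_order#c.day_of_service#c.day_of_service-, although that
has not been run against those data.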
With more than two groups, other contrasts can be obtained by
either recoding the group variable or using -test-. However, unless
one has a very specific hypothesis in mind about differences between
groups with regard to the actual _shape_ of the fitted curves, this
test is of limited usefulness. Rather, what we are usually interested
in are differences in expected means across time. If we had just
fitted linear slopes (or just one quadratic slope, i.e. without
interaction), we could simply look at the differences of the
intercepts of the slopes which would be the same across the entire
range of time. But given that we fitted quadratic slopes for each
group, there is no single difference in expected means because the
difference varies over the range of time, i.e. the difference in
expected means is a different one at each point in time. What one
could do now is test at selected points in time. For
example, if we wanted to check whether Whites are earning
significantly more than Blacks at 10 years of job tenure, we could
type:
*---------------------------------------------------------
lincom (_b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
_b[ln_wage:c.tenure#c.tenure]*100) - ///
(_b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
(_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
(_b[ln_wage:c.tenure#c.tenure] + ///
_b[ln_wage:2.race#c.tenure#c.tenure])*100)
*---------------------------------------------------------
That is, we predict the expected mean for Whites:
*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
_b[ln_wage:c.tenure#c.tenure]*100
*---------------------------------------------------------
and for Blacks
*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
(_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
(_b[ln_wage:c.tenure#c.tenure] + ///
_b[ln_wage:2.race#c.tenure#c.tenure])*100
*---------------------------------------------------------
and then calculate the difference to check whether it is zero. We
could have obtained the same result more easily (with the sign
reversed, because -margins- reports the Black-minus-White contrast)
by typing:
*---------------------------------------------------------
margins, dydx(race) at(tenure=(10))
*---------------------------------------------------------
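Because every term that does not involve race cancels out of the
difference, the contrast at a given tenure depends only on the three
race coefficients. The sketch below, which refits the two-group model
so that it runs on its own, computes the Black-minus-White difference
at 10 years directly and then lets -margins- evaluate it over a grid
of tenure values. The grid 0(5)20 is an arbitrary choice for
illustration, and -marginsplot- assumes Stata 12 or later:
*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
* Black minus White at 10 years: only the race terms remain
lincom _b[ln_wage:2.race] + _b[ln_wage:2.race#c.tenure]*10 + ///
_b[ln_wage:2.race#c.tenure#c.tenure]*100
* The same contrast at several tenure values (grid chosen for illustration)
margins, dydx(race) at(tenure=(0(5)20))
* Plot the contrast with its confidence interval against tenure
marginsplot, yline(0)
*---------------------------------------------------------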
However, this approach is limited to a handful of differences at some
discrete points in time. What I would rather do is simply plot the
fitted curves and put confidence bands around them, so that everybody
can quickly see the differences across the entire time range:
*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
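* Linear prediction from the fixed portion of the model, and its standard error
predict h_wage
predict se_wage, stdp
* Approximate 95% confidence band (plus/minus 2 standard errors)
gen cilo=h_wage-2*se_wage
gen cihi=h_wage+2*se_wage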
twoway rarea cilo cihi tenure if race==1, sort color(gs8) fint(50) ///
|| line h_wage tenure if race==1, sort lcolor(green) ///
|| rarea cilo cihi tenure if race==2, sort color(gs8) fint(50) ///
|| line h_wage tenure if race==2, sort lcolor(red) ///
xlabel(0(5)25) ///
xtitle("Job tenure (years)") ///
ytitle("log(wage)") ///
ylabel(0(1)3, angle(0)) ///
legend(order(2 "White" 4 "Black") rows(1))
*---------------------------------------------------------
Joerg
P.S. I used -xtmixed- here because it can easily be extended to
letting the time effect vary across subjects. In the model above, and
in your -xtreg- model, these effects are constrained to be the same
across clusters, and it often makes sense to relax this assumption.
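As a sketch of that extension (the unstructured covariance, the
likelihood-ratio comparison, and the stored-estimate names ri and rs
are assumptions made for illustration, not part of the model above),
a subject-specific linear tenure effect could be added like this:
*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
* Random intercept only (the model used above)
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
estimates store ri
* Add a subject-specific linear tenure slope, correlated with the intercept
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode: tenure, ///
mle covariance(unstructured)
estimates store rs
* Likelihood-ratio test of the added variance components; conservative,
* because the null hypothesis lies on the boundary of the parameter space
lrtest rs ri
*---------------------------------------------------------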
On Tue, Oct 9, 2012 at 10:57 AM, Michael Housman
<mhousman@evolvondemand.com> wrote:
> Hi folks,
>
> Was wondering if anyone could tell me how to test for significant differences between groups after running a random-effects regression?
>
> By way of background, I have data in which each observation represents an employee-date and the dependent variable is a performance metric (e.g., average handle time, customer satisfaction, etc.) for call center agents. In essence, I'm trying to model performance and plot the learning curve as a function of "day_of_service" for four different groups of employees.
>
> I've generated a variable called "hire_score_order" that's numbered 1 to 4, representing the four different groups that I want to represent. I've interacted that term twice with day_of_service so I can visually represent the first- and second-order effects. I've copied below my "xtreg" command and the resulting output for a sample metric.
>
> What I want to do is run xtreg post-estimation to test the hypothesis that group 1's learning curve is significantly different than group 2's, group 2's vs. group 3's, etc. Any suggestions?
>
> Thanks in advance!
>
> Best,
> Mike
>
>
>
> xtreg aht c.day_of_service##c.day_of_service##i.hire_score_order, re
>
> Random-effects GLS regression                   Number of obs      =    242792
> Group variable: emp_id                          Number of groups   =      1984
>
> R-sq:  within  = 0.0049                         Obs per group: min =         1
>        between = 0.1248                                        avg =     122.4
>        overall = 0.0622                                        max =       500
>
>                                                 Wald chi2(38)      =   1544.57
> corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
>
> --------------------------------------------------------------------------------------------------------------------
>                                                aht |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> ---------------------------------------------------+----------------------------------------------------------------
>                                     day_of_service |  -.0035472   .0302398    -0.12   0.907    -.0628162    .0557218
>                                                    |
>                  c.day_of_service#c.day_of_service |  -9.38e-07   4.63e-06    -0.20   0.839      -.00001    8.13e-06
>                                                    |
>                                   hire_score_order |
>                                                  2 |   168.1932   48.20808     3.49   0.000     73.70711    262.6793
>                                                  3 |   20.51885   68.23659     0.30   0.764    -113.2224    154.2601
>                                                  4 |   156.1946   109.0574     1.43   0.152    -57.55392    369.9431
>                                                    |
>                  hire_score_order#c.day_of_service |
>                                                  2 |  -2.088015   .5027992    -4.15   0.000    -3.073483   -1.102546
>                                                  3 |  -1.117207   .4928079    -2.27   0.023    -2.083092   -.1513208
>                                                  4 |  -2.408916   1.294864    -1.86   0.063    -4.946802    .1289699
>                                                    |
> hire_score_order#c.day_of_service#c.day_of_service |
>                                                  2 |   .0023866   .0016018     1.49   0.136    -.0007529    .0055262
>                                                  3 |   .0014925   .0014822     1.01   0.314    -.0014126    .0043976
>                                                  4 |   .0040321   .0037677     1.07   0.285    -.0033524    .0114167
>                                                    |
>                                              _cons |   246.4581   81.31057     3.03   0.002     87.09236    405.8239
> ---------------------------------------------------+----------------------------------------------------------------
>                                            sigma_u |  521.47501
>                                            sigma_e |  930.20434
>                                                rho |  .23912442   (fraction of variance due to u_i)
> --------------------------------------------------------------------------------------------------------------------
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/