Category: PROC GLIMMIX

The following is an example of PROC GLIMMIX syntax. I ran it on a fake dataset, so the results are also fake. The outcome is an interval variable, and the model is a linear model (not a non-linear model such as logistic regression). The RANDOM statement is what makes this model "multilevel." Level-1 units are students and level-2 units are schools (school IDs are in nces_school_name). The intercept is estimated as the grand average of the school-specific intercepts.
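A minimal sketch of what that syntax might look like. The outcome name y and covariate names x1-x3 are hypothetical (only nces_school_name is named in these notes); abc1 is the dataset name used later in these notes:

```sas
proc glimmix data=abc1;
   class nces_school_name;                        /* school ID = level-2 unit */
   model y = x1 x2 x3 / solution ddfm=kr;         /* interval outcome, linear model */
   random intercept / subject=nces_school_name;   /* school-specific random intercepts */
run;
```

The SOLUTION option requests the fixed-effects coefficient table, and DDFM=KR requests the Kenward-Roger degrees-of-freedom adjustment discussed later in these notes.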

This model is a multilevel model and differs from an OLS model because it estimates the intercept as a random effect: school-specific intercepts are derived as random effects, and their grand average is reported as the intercept.

The main result table you need to look at is the following. You can interpret it just as you would interpret an OLS regression result.

Solutions for Fixed Effects

Effect       Estimate    Standard Error    DF      t Value    Pr > |t|
Intercept    0.5062      0.01600           1040    31.64      <.0001
x1           0.005776    0.01717           3483    0.34       0.7365
x2           0.000465    0.01684           3486    0.03       0.9780
x3           -0.01117    0.01708           3487    -0.65      0.5133

Another table you want to look at is this one. Residual (0.54) is the level-1 variance (student-level variance). Intercept/nces_school_name (0.65) is the level-2 variance (school-level variance). I made up these numbers.

Covariance Parameter Estimates

Cov Parm     Subject             Estimate    Standard Error
Intercept    nces_school_name    0.65        0.0181
Residual                         0.54        0.2002

I will focus on the first coefficient table when discussing results; however, I would also report the following information derived from the second table:

ICC (intraclass correlation):

level-2 variance / (level-1 variance + level-2 variance)

That is:

0.65 / (0.65 + 0.54) = 0.5462

This shows the degree to which outcome variance lies between groups (schools, clusters) as opposed to within groups (among individual students).

I would also report "variance explained."

To compute it, I need to run an additional model: the analysis-of-variance (unconditional) model, where I have no covariates in the model. Variance explained at each level is then (unconditional variance − conditional variance) / unconditional variance:
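The unconditional (ANOVA) model is the same syntax with the covariates dropped; a sketch using the same hypothetical names as the earlier example:

```sas
proc glimmix data=abc1;
   class nces_school_name;
   model y = / solution;                          /* no covariates */
   random intercept / subject=nces_school_name;
run;
```

With the fake numbers here (if 0.83 and 0.92 in the final table below are the unconditional level-1 and level-2 variances), level-1 variance explained would be (0.83 − 0.54) / 0.83 ≈ 0.35 and level-2 variance explained would be (0.92 − 0.65) / 0.92 ≈ 0.3.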

I would make the final table look like this. I didn't round the numbers, but you should.

                              Estimate    Standard error    p-value    Statistical test
Intercept                     0.5062      0.016             <.0001     ***
x1                            0.00578     0.01717           0.7365
x2                            0.00047     0.01684           0.978
x3                            -0.0112     0.01708           0.5133
Level-1 variance              0.83
Level-2 variance              0.92
ICC                           0.55
Level-1 variance explained    0.35
Level-2 variance explained    0.3

Notes: *** if p < 0.001, ** if p < 0.01, * if p < 0.05.

How to interpret coefficients in the table

In the table above, x1's coefficient is 0.00578. This means that a one-unit increase in x1 is associated with an increase of 0.00578 in Y. The p-value associated with this is 0.74, so the coefficient is not statistically significant at alpha = 0.05.

What does a one-unit increase in x1 mean? If X is height in meters (e.g., 1.5 meters, 1.7 meters), then 1 meter is one unit. If X is a binary variable (0 or 1), then a one-unit increase means going from 0 to 1.

For my work, I almost always have a variable called TREATMENT, which is 1 if subjects received the treatment/intervention and 0 if they did not. The coefficient for this variable is called the "program impact effect." If, for example, the program impact effect is 0.25, I report that, and I also mention that other covariates are in the model and that the program impact effect is adjusted for these factors. An estimated program effect of 0.25 means that the difference between the two groups (treatment vs. control) is 0.25 in the outcome.

I also want to provide a standardized version of the program effect. I would run the same statistical model with the z-score version of the outcome variable. To do this, I usually use proc standard:

data abc2; set abc1;
   Z_Y = Y;   /* copy of the outcome, to be standardized */
run;

proc standard data=abc2 out=abc3 mean=0 std=1;
   var Z_Y;
run;

Another approach would be to code this by hand in a data step. If the mean of Y is -0.42 and the SD is 0.5:
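With those values, the hand-coded version is a one-line data step (the dataset names follow the PROC STANDARD example above):

```sas
data abc3; set abc2;
   Z_Y = (Y - (-0.42)) / 0.5;   /* z-score: (Y - mean) / SD */
run;
```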

When I have multiple subgroups represented as a series of dummy variables (e.g., race groups, grade levels, etc.), I want to know whether the dummy variables as a system contribute to the model with statistical significance. This may be called a joint test because I want to know whether, for example, the race groups together (not separately) make a difference to the model.

The easiest way to do this is to treat those variables as classification variables. You will get a joint statistical test in one of the result tables.

proc glimmix ..;

class race grade_level;

....

run;

In my applications I almost always use numeric versions of variables, i.e., dummy variables coded 0 or 1. I like this approach because I can run PROC MEANS on them to create a descriptive statistics table.

The question is how I get joint statistical tests when all of my predictors are numerically coded and thus I can't rely on the class statement (shown above in the syntax example).

The parameter estimate table will show a coefficient for each of the numeric variables; however, I wouldn't know whether race groups as a group matter to the model, or grade levels as a system matter to the model. For example, even when the coefficient for being black is statistically significant, that only tells us how black students differ from white students (the reference group in this example). We don't know whether race as a group matters and whether the race groups jointly make a statistically significant contribution to the model.

<Again this can be done easily by using class variables instead (as shown earlier); however, I like using numeric variables in my models.>
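One way to get a joint test while keeping numeric dummies is the CONTRAST statement, which can test several coefficients at once. A sketch with hypothetical dummies race_black and race_hispanic (white as the reference group):

```sas
proc glimmix data=abc1;
   class nces_school_name;
   model y = race_black race_hispanic x1 / solution ddfm=kr;
   random intercept / subject=nces_school_name;
   /* two rows separated by a comma = a 2-df joint test
      that both race coefficients are zero */
   contrast 'race groups jointly' race_black 1, race_hispanic 1;
run;
```

The resulting Contrasts table reports a single F test for the race dummies as a system, analogous to the Type III test you would get with a CLASS variable.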

The goal of this exercise is to find out what happens if I intentionally use a level-2 variable at level 1 in HLM. I found that the coefficients and standard errors remained about the same. The only parameter that differed was the degrees of freedom, which was consistent with my expectation.

Using my old NELS dataset, I ran two different HLM models using Bryk and Raudenbush’s software (See model 1 and model 2 equations in the table below).

The outcome variable is the achievement composite (POSTTEST), students are level 1 and schools are level 2. When expressed as mixed models, the two models are identical, which is why I expected most parameters to come out the same.

POSTTEST_ij = γ_00 + γ_10*URBAN_ij + u_0j + r_ij

The first model (MODEL 1; see below) included URBAN (whether students are in an urban school) as a level-1 predictor. Of course this is a wrong specification because URBAN is a school characteristic. In the second model (MODEL 2), I used it at the expected level, which is level 2 (the school level).

These models look different, but AGAIN when expressed as mixed models, they are identical. As the third model (MODEL 3), I replicated the same HLM model using SAS PROC GLIMMIX. SAS requires that the equation be expressed as a mixed model.
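MODEL 3 in GLIMMIX might be sketched like this (the school ID variable name school_id is hypothetical; POSTTEST and URBAN are from the NELS description above):

```sas
proc glimmix data=nels;
   class school_id;
   model posttest = urban / solution ddfm=kr;   /* URBAN enters as a fixed effect */
   random intercept / subject=school_id;
run;
```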

Results showed that the coefficients and standard errors are more or less the same across the three models. The only thing that differed was the degrees of freedom.

Conclusion: As long as variables enter the model as fixed effects, as done here, there is nothing magical about the HLM model. The HLM software and SAS PROC GLIMMIX (with the option ddfm=kr) adjust the degrees-of-freedom values, accounting for the fact that URBAN is a school-level variable and thus should not be awarded a value that is too large. Notice that under the correct specification (MODEL 2 and MODEL 3), the degrees of freedom for URBAN are close to the number of schools, not to the number of students.

I encountered a situation where PROC GLIMMIX could not produce results while PROC MIXED could. In this Word document, I show how I edited the PROC GLIMMIX syntax into PROC MIXED syntax. Please ignore my macro %w.

The weight and the use of categorical variables were the cause of the problem; I found that without the weight, the model produces results.

SAS support staff spent time figuring out what the issue was. I sent them a dataset (though I replaced the variables with randomly generated values) and the exact syntax. They determined the following (thank you so much for going the extra mile to figure this out!):

"the computation of MIVQUE fails because the default singular value (1e-12) leads to the detection of singularity while sweeping of the mixed model equation" -- May 12, 2016.

They proposed that the singular value be specified as a very small value to let the computation go through:
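The exact value they proposed is not recorded here. I believe the suggestion corresponds to the SINGULAR= option on the MODEL statement of PROC MIXED; the value below is my own placeholder, not the one SAS support specified:

```sas
proc mixed data=abc1;
   class nces_school_name;
   weight w;
   model y = x1 x2 / solution ddfm=kr singular=1e-20;   /* 1e-20 is a placeholder value */
   random intercept / subject=nces_school_name;
run;
```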

HLM (multilevel models) and econometric analyses (e.g., time series analysis, ARIMA, etc.) are usually treated as different approaches, though the goal of both is to deal with data dependency; they can be implemented in the same model via SAS PROC GLIMMIX. However, I believe doing this is computationally demanding and such models may not converge.

Imagine I have an HLM model where level-1 units are students and level-2 units are schools. I can enter teacher-level variables, but people who are used to the HLM software by SSI will wonder how this is possible without making the model a 3-level model (level 1 students, level 2 teachers, level 3 schools). This is because in the old HLM tradition (people who learned HLM with the HLM software in the 1990s), equations are written and processed separately by level, and data must be prepared by level (a student data file and a school data file).

If there are student-level and school-level files, some HLM users may wonder how teacher-level variables can enter the model (and in which file they should be prepared: the student-level file or the school-level file?).

People who use SAS or other software programs (maybe R?) will not wonder this because they think in terms of one whole equation and say, "of course we can enter variables of any levels."

The software programs they use adjust the degrees of freedom to take into consideration that teacher-level variables have far fewer distinct values than the outcome variable.

The K-R (Kenward-Roger) option in PROC GLIMMIX adjusts the degrees of freedom to account for the fact that group-level variables have far fewer possible values than outcome variables.

The K-R degrees-of-freedom option seems most appropriate for multilevel modeling applied in educational evaluation studies (where students are typically nested within schools). This option kicks in only when at least one random coefficient is estimated (e.g., intercepts as random effects).

After playing with the data in a sort of informal simulation (I will do a better, well-documented simulation in the future), I learned the following:

For student-level (level-1) variables, the degrees of freedom under the KR option are close to the number of students (minus a small number of cases, maybe close to the number of predictors). DF is larger when the variable contains more variance. For binary variables, DF is largest when the proportion is .50 (DF gets smaller as the proportion gets close to 0 or 1, where variance shrinks).

For school-level (level-2) variables, DF is close to the number of schools minus a small number, maybe close to the number of predictors. This may also be adjusted by the variance of the variable.

I created a fake teacher-level predictor by creating two possible numeric values per school. The DF for this variable was close to the number of teachers in the data (two per school) minus a small number. I think this is also adjusted by the variance of the variable (which I didn't test exactly).