I am trying to calculate power for a multilevel study-latent class growth. I have the example from the addendum to the userís manual (example is for a two group, three level model). I also have Muthén and Muthén (2002) describing how to use a monte carlo study to decide power.

A few questions in trying to integrate the two pieces of information.

Is the first step to calculate ICCs and design effect to see if I even need to do multilevel modeling?

Iím following the example from the product support area. Iíve removed the input for the second group. For the monte carlo study, do I simply run a multilevel model and get the starting values from the output to get starting values for the within and between parts of the model?

In the multilevel model, I want to examine the effect of a between variable on slope. Do I only specify that regression in the between part of the model, only in the within part or in both places?

We are presently planning a family study where we would like to use multi-level SEM to examine potential influences on the psychosocial adjustment of children (level 1) nested within family systems (level 2). In the current or upcoming versions of Mplus, is there a way to estimate nonrecursive models? We have not seen this issue mentioned in any of the references we have collected on multi-level SEM. Thank you in advance for whatever help you can provide.

We are planning a family study where we would like to use multilevel SEM to analyze data collected from children nested within families. I have been reading what has been written about power analysis and sample size estimation for multilevel models. In those papers, there appears to be concern about power and the number of clusters. I see little comment about power and the number of observations within each cluster other than a note that more variability in observations per cluster can have a negative effect on power. In this data set, we expect to have more that 400 families with either 1 or 2 children per cluster so a low number of clusters is not a concern. Most (70%), however, will only have 1 child per cluster. Should we be concerned about the number of clusters with only 1 observation? How might this affect power? Thank you for your reply about nonrecursive models.

Having clusters with only 1 child is not a concern. They do not contribute information to the estimation of parameters related to within-cluster (child) variation, but do contribute to the estimation of cluster-related (family) parameters. So the question is for which type of parameter you want high power. If you have a small number of families with more than 1 child and want high power for within-family parameters, there might be a problem. As you say, you are likely to have good power for family-level parameters. You can study these power matters using the Mplus Monte Carlo facility if you have a notion of the approximate model and parameter values.

I am trying to run a power analysis for a longitudinal RCT where data are expected to fit a latent difference score SEM (McArdle and colleagues invention- an upcoming SEM pub cited below). This has forced me to ask some hard questions about the structural relations between variables at the within and between levels using the Mplus 2-level structure. Can you share some insight?

1) I'm finding it very hard to decide whether or not the SEM model auto-regressive and coupling effects (for bivariate growth) need to be constrained to be equal at both levels in the same way that Muthen (1997) suggests constraining the slope and indicator effects to be the same in clustered LGM. How does one go about deciding such things even for more typical multilevel SEM situations? In other words, when should loadings and regression effects be constrained across levels, and when should they be allowed to vary? Is this purely an empirical issue, or should theory drive model specification.

2) In Muthen & Muthen (2002), without 3rd level (or Mplus' 2nd level) clustering, the authors show how to run a monte carlo power analysis for an LGM with a binary covariate.

2a) Slope effect size was defined as the difference in the average group slope estimates divided by the variance of the latent slope term. The difference in group slopes is directly related to the regression of the slope term on the "binary" covariate, which if scored 0/1, has a mean of .5 and a variance of .25. Using these estimates, regression effects of .2 and .1 were chosen to represent medium and small effect sizes. These effects appear in the appendix Mplus code, but the covariate is scaled with a variance of 1 and a mean of 0 and not .25 and .5 for data generation. 2ai). Is the monte carlo SEM used to generate the outcome data modeling the covariate as dichotomous 0/1 (or perhaps now, -1/1, so that mean=0 and var=1), or continuous- only later to be transformed into the desired dichotomy? 2aii). Regardless, does this variance rescaling not change the effect size?

2b) When trying to plug in reasonable estimates of slope and intercept residual variability, does it make sense to use the Muthen & Muthen (2002) estimates at the within level, and use a ratio of the between to within variation equal to the expected ICC? For example, I expect an ICC of around .01 (at least for time 1 measurements), so does it sound reasonable to set the between slope residual variance equal to .01*V(WS)- i.e., .01 times the estimated within slope residual variance (which = .09 in Muthen & Muthen, 2002).

Oh, sorry, one more question related to my first. In older applications of these multilevel models (examples on the Mplus homepage for clustered LGM), residual variation for the observed outcomes is modeled at both the within and between levels; yet, in newer applications, this variation is only modeled in the within level (examples included in newest version of Mplus manual). How does one make these decisions?

1. Growth models imply a mean structure for the variable that is repeatedly measured over time. If you look at the 3-level version of the growth model in standard multilevel books, you see that it has time scores that multiply both the means and residuals of the growth factors. The latter part ends up on within and the former on between, but the two parts are using the same time scores; hence the across-level equality in Mplus. Other models such as McArdle's have to be studied from this point of view to see if the same model-implied equality across levels holds. I have not yet read the article.

2a The x is first generated as N(0,1) and then dichotomized. It is the dichotomized version that is used in the model to generate the data.

2b.

Yes this is reasonable.

Follow-up question:

The between-level residual variances are often close to zero. In conventional 3-level growth modeling such as HLM, they are fixed at zero. Mplus allows them to be estimated and sometimes that is important. However, with categorical and other outcomes it is computationally time-saving to restrict them to zero.

Thank you. Those are excellent explanations. Two quick follow-ups: I cannot seem to get a dichotomous variable to run at the between level in my SEM model. I keep getting the message:

Only x-variables may have cutpoints in this type of analysis. Cutpoints for variable W were specified.

Yet, when I run the LGM example 11.6 from the manual, I do not get this message. Are cutpoint variables at the between level only allowed for LGMs?

Also, if you create a dichotomous variable in the monte carlo routine, is the default scoring 0/1? Does rescaling the variable to a mean of 0 and variance of 1 in the model syntax make the dichotomous variable -1/1?

You do not create binary dependent variables using the CUTPOINT option. This is for independent variables only. The GENERATE option is used to generate binary dependent variables. See the user's guide for further information and examples. If you have further problems, send your input and license number to support@statmodel.com.

I'm sorry I should have been more clear, I was creating a binary independent variable and have now discovered it works fine as long as the rescaling and mean change sytnax (e.g., [w@0] and w@1) is left out of the model. Now that it's working, I have another dilemma. I'm interested in determining power for a clinical trial where block-randomization is used to ensure equal number of treatment and control clinics (only 10 clinics enrolled). Unfortunately, with the syntax we've discussed thus far, my treament variable is not always evenly distributed across clusters. Is there a way to avoid this in the monte carlo routine?

I'm back to a similar problem described above (May 20). I'm trying to fit the multiple-group approach but having trouble with the correct sytnax for this type of monte carlo. I keep getting the following error message when using the Nobservations, Ngroups, and CSIZES syntax below.

For multiple group analysis, you need to specify the number of unique cluster sizes for each group and the sizes for each group. This is done using the | symbol. See Chapter 18 where the NCSIZES and CSIZES options are discussed for an example of how to use these options for multiple group analysis.

I'm interested in doing a power analysis for a RCT comparing group tx vs. individual tx (block randomization). Would you have a good quick reference that could help in setting up this power analysis? I have read the info on the website for how to set this up using the Monte Carlo simmulation method, but wondered if there was an applied example that I could also refer to. I am also concerned with looking at mediators and moderators in this RCT and trying to determine power for these analyses as well. Any suggestsions would be greatly appreciated.

Thank you for the reference, it is very helpful. Should there be any concern with calculating power for a design where individuals nested within groups for the group tx arm of the study but not for the individual arm of the study.

UG Ex11.4 shows how to do MC studies in a 2-level setting. You can build on that example. If one of the groups does not have clustering in this way, you could probably use "external" MC analyzing data generated for the tx arm (2-level) separately from the other arm (single-level) - and then send the combined data to MC.

I am attempting to estimate power for moderational analysis in a multilevel model. Would it make sense to generate data based on main effects, then use those generated data to compute the interaction for each of the generated data sets, and then see how often the interaction turned out signifcant?

Bengt, I have followed your advice from Nov13-14 and run two sepearte MC studies to look at power within each arm of the study (first grp then individual). I'm now trying to build upon UG ex11.4 as you suggested with the group and individual data. I'm interested in (1) the power to detect difference in outcome btwn tx (grp vs ind), and (2) whether a mediator (social support) is stronger in grp vs tx. I'm not sure exactly how to set this up in part because I'm not sure what parameter gives me the test for (1). Do I set it up as ex11. but with:

NCSIZES = 2; CSIZES = 10 (8) 1 (80);

and then interpret the effects of w in the model or do I also create a categorical X variable for tx condition?

I think I could figure out (2), if I was sure about (1), but any suggestions would be helpful.

I see now my typo and appologize. I can run everything as you suggested, with individual and group data sets generated using SAVE/REPSAVE comands and then input into an "external MC." Do you have any thoughts on number (2) the example of moderated mediation I mentioned on the 6th?

I think your question of the 6th was how to assess if a mediator is stronger in one group than the other. You can do this via Model constraint using the New option to create a new parameter corresponding to the difference between the 2 mediator estimates. You will then get a monte carlo summary of this new parameter to see how often equality of the mediator estimates is rejected (see last output column).

Examples 9.3, 9.4, and 9.5 are two-level path analyses. They have Monte Carlo counterparts. These come with the Mplus CD and are also available on the website. This type of Monte Carlo analysis has been available for several years.

hello again Prof Muthen, I add model constraint to a 2 level path analysis MC model for mediation computation, I compute the estimates from Kennys data in step 1, then in step 2 for MC estimate, when I add same constraint to both model population and model, Mplus say, 'A parameter label has been redeclared in MODEL CONSTRAINT.' if I remove constraint command from Model population or from model,say, leave only one constraint command in model or model population, then, mplus work and the result is same whereever I put model constraint in model or model population. my question is: in MC analysis, 'model constraint' command is only for Model not for model population, right? thanks.

I am doing a Multilevel SEM Monte Carlo study to test the accuracy of estimation with small group sample sizes. I have a 2-level factor model with 4 indicators (measured at the individual level). Both at the within and between levels, two independent variables have an effect on the latent factor.

Up till now, the indicators (at the individual level) were assumed to follow normal distributions. However, I want to introduce nonnormality into the data generation model by manipulating kurtosis and skewness of the indicators.

In the papers on this site if found that you suggest using a mixture of subpopulations for which the latent factor follows a normal distribution with different parameters. Am I getting this more or less right?

If so, there are a couple of things I do not see clearly yet... 1. Can this mixture approach also be applied in the case of multilevel sem? 2. How do you choose the proportions and means/variances of the subpopulations to assure that certain skewness and kurtosis values are obtained? 3. Is it possible for indicators that load on the same latent variable to have divergent skewness and curtosis values?

1. Yes. You might want to start with the Monte Carlo counterpart of Example 10.2. You can generate the data as two classes and analyze it as one class. 2. Unfortunately, this is trial and error. 3. Yes.

Hopefully a final question wrt a Monte Carlo study for a two-level SEM model with small group sample sizes.

I want to report population within and between covariance matrices. Elsewhere on this discussion board, I found that for one-level Monte Carlo studies, population covariance matrices can be obtained by running a separate analysis with all parameters constrained to the population values, and the identity matrix as input file ('type is covariance').

However, to me it seems that this 'trick' cannot be used in the case of two-level models, because these models do not allow a covariance matrix as input (is this right?).

Instead, I thought of using one of the datasets that were generated during the monte carlo as input file (and of course also constraining all parameters to the population values). Is this a correct way to obtain population covariance matrices?

On page 604 of Muthen and Muthen (2002) you state that the R^2 value for the intercept growth factor is 0.20. Could you clarify how you arrived at this value. I've read through it and seem to be missing something. Thank you!

Thank you Linda. Could you also tell me how you arrived at an effect size of 0.63 for a regression slope coefficient of 0.2? My calculations are not no coming out. Also, why are you interested in both an effect and R^2 for the intercept?

We look at R-square for both the intercept and slope growth factors. We look at effect size for the regression coefficient in the regression of the slope growth factor on the covariate because that is the parameter for which power will be assessed. Effect size is used to gauge the size of the effect. The effect size is the difference in the slope means for the two values of the covariate divided by the standard deviation of the slope growth factor:

This is not a multilevel question but it seemed to fit with Thomas Schmitte's post so I'm placing it here.

I would like to run a monte carlo LGM for power estimation with missing data similar to examples 3 and 5 in Muthen and Muthen (2002). However, the estimates for missing data within these examples (12% at T1, 18% at T2, 27% at T3, and 50% at T4) is too high for my purposes. Here is the input for missingness from those examples:

I assume that the -2, - 1.5, etc. are threshold values for degree of missingness. How can I vary these? Our pilot work suggests retention and data completion rates of .80-.90 across 5 time waves. I don't want to run these assuming no missing data either.

says that the intercept is -1 in the logistic regression for the probability of missing at y3. If we have no covariates in MODEL MISSING, or if we consider the covariate(s) at the value zero, this -1 intercept value translates to 0.27 of having missing y3. A lower logit intercept (say -2) gives a lower probability. Use the formula

L = log(p/(1-p))

to go from the probability p to the logit L. So if you want p=.10 you get L=-2.2.

Now, if you want x to influence missingness, which you show, then the missingness percentage goes up for increasing slope on x. You have to experiment to get the values you want. The easiest approach is trial and error using one replication and a large sample size (say 10,000). You generate one rep and save the data, then run the data as real data and request "patterns" in the Output command to get the information about the missingness percentages.

I have a question regarding a mulilevel monte carlo simulation for comparing different computation approaches to multilevel data sets.

I have simulated data for a multilevel model with one level 1 variable (X), a level two variable (Z), and a cross level interaction (XZ) affecting the outcome (Y).

Using the external monte carlo analysis tool I want to compare a multilevel model to an aggregated regression approach.

While it is no problem to aggregate the outcome (Y) and the level 1 predictor (X) using the CLUSTER_MEAN option, I can not create an interaction term (X*Z) thereafter, due to the restriction that a variable created using the CLUSTER_MEAN function cannot be used in subsequent DEFINE statements in the same analysis.

Is there any possibility to create an interaction term (X*Z) for an aggregate regression analysis to compare it to the original cross-level interaction?

Dear Linda and Bengt, I am trying to assess robustness of estimators against ignorance of clustering for multilevel data(i.e., when researchers use non-multilevel CFA for clustered data). For that purposes I have simulated datasets with different levels of ICCs (I simulated 100 datasets for each of 11 levels of between-level variability) Beside that, I am trying to show that there is no parameter bias (regardless of ICC level) when clustered data are correctly analyzed using multilevel CFA. In the case when clustered data are correctly analyzed using hierarchical approach, everything works as expected except for the data that were simulated to have ICCs close to zero. In that case I observe unexpectedly large variability within each model parameter and deflated chi-square. I use WLSMV estimator. I hypothesize that this might be consequence of how MPlus handles negative estimates of ICC which may occur when the between-level variability is very small. I cannot find anywhere what MPlus does in this case? Or do you have any other explanation?

When ICC is close to 0 that means you are dealing with nearly non-existent between level random effects and singular between level matrix. In such situations in practical setting you would want to abandon these random effects from the model. Look in tech 9 for error messages and warning in the simulation. Possible attempts to improve the estimation would include increasing the number of integration points decreasing the convergence criteria, redefining the model as a more restricted model with fewer random effects.

In principle negative ICC estimates can not occur with the WLSMV estimator since it uses the EM-algorithm, but ICC near zero means random effect variances near zero which can lead to estimation difficulties or non-convergence.

I am trying a MonteCarlo simulation with an intervention that exists only at the between level. I want the intervention to be a binary variable, so I want to use CUTPOINTS to create that. Mplus keeps telling me that CUTPOINTS does not work for dependent variables, but in fact my intervention is an independent variable. Why doesn't this work?

I am trying to perform a power analysis for a simple linear regression model (1 dichotomous / dummy predictor) where I have clustered data, but the clustering is not taken into account in the model (analytic model has been mis-specified).

My objective is to see how much power I am losing due to clustering in the data (i.e., data are not IID).

I am using Mplus 7.0.

Have I set up my models in the non-IID case correctly? I ask because preliminary results indicate that I'm getting more power using clustered, non-IID data, than IID data.

The "increased" power you are seeing is a consequence of the standard errors being underestimated when you do not account for clustering. This is not truly increased power, you are using inappropriate/standard errors for you tests

The power estimate is not valid because you have not given any coverage values in the MODEL command. Coverage values are taken from the MODEL command. Population parameter values for data generation are taken from the MODEL POPULATION command. Your MODEL command should be:

Hi Linda - How can I create an ordinal IV (representing assessment times) in a Monte Carlo analysis? I'm trying to set up a program to estimate power for a multilevel approach to longitudinal analysis, and I can't figure out how to create an ordinal predictor representing time. Thanks! -bac

but since time is an x-variable, I get the warning: X-variables should not be specified as categorical in the GENERATE option. The CUTPOINTS option should be used for x-variables instead. The following variable was specified as categorical in the GENERATE option: T

But CUTPOINTS can only be used for binary predictors. I need to generate (say) 5 records per case to represent 5 occasions 1-5 (or preferably, 0-4), to allow predicting y ON t. Thanks! -bac

Is it the case that for a Monte Carlo ML-SEM with a dummy-code at level 2 to reflect treatment status, that if the residual variances for the indicators are fixed at 1 then the factor variances at between and within levels become the proportion of variance at that specific level (e.g., fb2*.70, fw2*.30 with y11-y32*1 would indicate 70% of the total variance is due to between level) or is it the between+within+residual?

I'm interested in estimating sample size need to achieve power in a cross-classified or three-level model with categorical outcomes in Mplus 7.2. Reviewing Mplus's manual and statmodel, it seems like this models would need to be estimated using Bayesian estimation. Is it possible to estimate sample size in this type of model via Mplus's Monte Carlo methods?

Hello. I am trying to run a monte carlo power analyses for a two level path model. I can get the model to run as a regular analysis but I cannot get the model to run as a monte carlo analyses. I have trying fixed the variances to zero but I keep getting the following error:

*** FATAL ERROR THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE.

Could you provide any syntax for a two level path analysis for monte carlo? I have not been able to find one.