Studies based on small samples are very common in the social and behavioral sciences. One seldom knows the power of those findings. Would it be possible to incorporate Monte Carlo procedures in a study with a small sample? Has anyone done so in "content" studies rather than methodological studies? Can you give me some guidelines, especially on using Mplus?

I think it is a very good idea to include a small Monte Carlo study in "content" studies as well. This would give great insight into the quality of the results. Often, parameter estimates are good even with small samples, but the quality of the standard errors, parameter coverage, the power to detect effects, and the quality of overall tests of fit may be in question. When the parameter estimates are likely to be dependable, they can be taken as rough population values for a Monte Carlo simulation. The population mean vector and covariance matrix can be computed for any model by fixing each parameter at its population value and requesting RESIDUAL (see the estimated mean vector and covariance matrix). I have not, however, seen Monte Carlo approaches taken in content studies, although it is possible that this idea has been used. In my 1997 Psych Methods article with Curran, we did something akin to this to study power in our real-data application (see also the MacCallum article referred to in that article). In that article, power was computed both via Monte Carlo simulation and using population values (Satorra-Saris approach).

Mplus allows Monte Carlo simulations in an automated fashion (data are generated, analyzed, and result summaries presented by Mplus) for several analysis types. See Chapter 29 of the User's Guide. Exceptions include twolevel and mixture analysis and for such cases, Monte Carlo simulated data can be generated outside Mplus as my research group often does. It would be good if articles including Monte Carlo were published to show the usefulness of the approach.
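
As a rough illustration of what such a simulation reports, the sketch below repeatedly generates small samples from a known population, refits the model, and tallies how often the effect is significant (power) and how often the 95% interval contains the true value (coverage). It is plain Python rather than Mplus, it uses a single regression slope as a stand-in for a fuller model, and the sample size, slope, and residual variance are assumed values chosen only for illustration.

```python
import math
import random


def simulate_power(n, slope, resid_var, reps, seed=12345):
    """Estimate power and 95% coverage for a regression slope by simulation."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value (normal approximation)
    significant = 0
    covered = 0
    for _ in range(reps):
        # Generate one data set from the assumed population model.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * xi + rng.gauss(0.0, math.sqrt(resid_var)) for xi in x]
        # Ordinary least squares estimate of the slope and its standard error.
        mx = sum(x) / n
        my = sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx
        a = my - b * mx
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n - 2) / sxx)
        # Power: proportion of replications with a significant slope.
        if abs(b / se) > z_crit:
            significant += 1
        # Coverage: proportion of intervals that contain the true slope.
        if b - z_crit * se <= slope <= b + z_crit * se:
            covered += 1
    return significant / reps, covered / reps


# Assumed illustration values: n = 100, true slope 0.2, unit-variance outcome.
power, coverage = simulate_power(n=100, slope=0.2, resid_var=0.96, reps=500)
print(f"power = {power:.3f}, coverage = {coverage:.3f}")
```

In a content study the population values would come from the estimates obtained in the real data, and the sample size would be the one actually at hand.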

You must add a line to the data for the means of the dependent variables. This was left out in the first printing of the User's Guide. See the description of the FILE statement for Monte Carlo in Chapter 12.

I use Mplus to do a Monte Carlo simulation study. I want to generate data for 1000 replications, but the SAVE command only allows me to save the first one. Please tell me how to save the data for the remaining 999 replications.

I have previously conducted a 1-factor model with about 20 dichotomous indicators. I am finding that the item difficulty values (threshold/loading) are spotty in the lower portion of the factor score continuum. In order to create viable factor scores outside the context of Mplus, I have composited the items into 4 continuous-level indicators by averaging over a set of dichotomous items with widely varying item difficulty levels. This way, I can use the factor score coefficient matrix to estimate the factor scores in a straightforward way without iterative procedures. This model fit the data very well, but the constructed factor scores using the factor score coefficient matrix are very susceptible to the underidentification in the lower end of the factor - the distribution is very skewed. I plan to collect data on the original indicators as well as a number of new dichotomous indicators I hope will adequately measure the same factor and fill in the needed item difficulty levels. I am thinking I need to construct the "testlets" as before and set the measurement model parameters of these testlets to equal the results I obtained in my current dataset, then estimate the new indicators' measurement parameters freely. This is a very long-winded way of saying that I am trying the Monte Carlo feature of Mplus for the first time to try to determine the sample size needed to obtain stable and unbiased measurement parameter estimates for new dichotomous items with a variety of factor loading and threshold values, holding the rest of the factor model to the original obtained values from the continuous indicator model. How do I generate a set of data of this sort, where population parameters drive the generation of continuous and dichotomous data? Must I try to construct a reasonable variance/covariance matrix (out of thin air?) of the 4 continuous and 4 hypothetical dichotomous indicators in order to generate the data, or can I do something akin to the approach in the Monte Carlo examples in the most recent addendum? My concern is that the mixture modeling approach used in the addendum example will not allow me to use the CUTPOINT and CATEGORICAL options I need. As always, I am very grateful for your help and amazed with the attention you give to us Mplus groupies.

Yes, to do Monte Carlo with categorical outcomes and continuous latent variables, the current Mplus requires you to go through the older Monte Carlo track (not the mixture track) and therefore construct a population covariance matrix from which the data are drawn. But it need not be constructed out of thin air. The covariance matrix is for the y* variables, the continuous outcomes before categorization. The covariance matrix elements are obtained from the parameter values you hypothesize for the loadings, the factor (co-)variances, and the residual variances. You can get this matrix in a run where you pretend you have continuous outcomes, inputting, say, an identity covariance matrix as your "sample matrix", and fixing all parameters at the values you desire. The RESIDUAL output then gives you the estimated covariance matrix.

Note that this older Monte Carlo track does not give you the same output as the mixture Monte Carlo track. You don't get power information. But you do get chi-square information.

Am I correct in thinking that Mplus Monte Carlo can be useful in assessing whether a particular pattern of partial measurement equivalence permits reasonable estimation of parameters for a two-group MACS analysis? That is, if I find a particular pattern of partial measurement equivalence, I could make that pattern the population values in Monte Carlo. Then I could run Mplus specifying a model that has that partial invariance pattern across the groups. Then I could see in the output the coverage and the reasonableness of the estimated standard errors. This seems to me at first look to be useful, but maybe it's not really informative? Perhaps it will always look good no matter how weak the partial invariance?

I think you are right in expecting the Monte Carlo results to come out looking good even with only partial invariance. This is probably true for factor means for instance - because you have good information on the means even with only partial invariance. The deterioration of the statistical qualities happens much later - as more and more items are non-invariant - than the deterioration of the plausibility that you measure the same construct. The only parameters vulnerable to non-invariance are those that are not invariant since they are only estimated from one group. But even here, a large enough sample will give good results.

Thanks for your very helpful response. I would like to follow up with an example of my situation. I am working with, say, accommodated math test scores where the accommodation is reading of questions due to low English reading skills. The content specialists/cognitive psychologists endorse that the items, even though read, still engage math ability. My preliminary results indicate that well over half of the items are invariant with standard administration math items in a two-group run. Given this and the endorsement of content specialists, would it be reasonable to say that there is a strong argument that the same construct is being measured? If that is the case, would the Monte Carlo analysis then give me support that the statistical qualities are there to estimate the non-invariant parameters for the accommodated students?

To complicate it further, to be true to the real world scenario, I should probably run the standard administration students as a single group and determine the estimated item parameters (as test scoring would really be done). Then in the two group run wire those in as fixed values for the nonaccommodated folks and then determine which items are noninvariant with those values in the accommodated group. Is that correct? Thanks so much. I am hopeful this type of analysis will be useful in the study of these important testing issues.

Yes to your question in the first paragraph. Regarding the second paragraph, you certainly want to do a separate analysis of each group. But to test the (non-) invariance I think a better way is to analyze the two groups jointly, either with accommodation status as a covariate or in a 2-group run - you can then study/test (non-) invariance of each item or sets of items.

I want to use Mplus to do a simulation study. The population model is a CFA model that includes 9 binary indicators and three continuous latent variables. I saw similar Mplus code by Linda and Bengt in the paper named "sample size and power". I tried to modify that code to match my research situation, but it didn't work after I added the commands related to categorical variables (CUTPOINTS and CATEGORICAL). I am wondering whether Mplus can generate the binary data directly for simulation purposes, or whether I should generate the binary data using other software first. Thanks in advance,

Yes, Mplus can generate such data but not using the approach that was given in the paper. If you are generating such data to get information on power for categorical outcomes, note that the power information is not printed. The current version of Mplus has two approaches to Monte Carlo simulation. Version 3 will have only one and you will be able to easily do what you want. See pages 141-142 for a brief description of the current Monte Carlo facilities in Mplus. See Example 29.1A for an example of how to generate categorical outcomes in the current version of Mplus.

Thank you very much for your quick response. I saw Example 29.1A, but I still have some questions about that example. In my study, I suppose that each of the three latent variables is measured by three binary indicators, and that the residuals of the indicators are correlated in some way. So it is easy for me to propose the population parameters in this situation, but hard to construct a correlation matrix among the indicators. Example 29.1A requires a correlation matrix among the observed variables. I am wondering if there is a way to derive the correlation matrix from the population parameters. Thanks in advance again,

If you don't want to compute the population covariance/corr matrix elements by the usual expectation rules, you can use Mplus to generate the population covariance matrix and then simply get the correlation matrix in the usual way by dividing by the standard deviations. To get the population covariance matrix, do an ML run assuming continuous outcomes, where the input is a covariance matrix - for simplicity the identity matrix, given in lower-triangular form:

1
0 1
0 0 1
etc

In this run you fix all the parameters at the values you decide. So there is no free parameter. Ask for RESIDUAL - this will give you the "estimated" covariance matrix which is the population cov matrix.
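
For readers who would rather see the "usual expectation rules" spelled out, here is a minimal sketch in Python (not Mplus) of the model-implied covariance matrix, Sigma = Lambda Psi Lambda' + Theta, followed by the division by standard deviations that turns it into a correlation matrix. The two-factor, four-indicator layout and every parameter value are assumptions made up for illustration; Theta is allowed off-diagonal entries so that correlated residuals can be represented.

```python
import math


def implied_covariance(loadings, factor_cov, resid_cov):
    """Model-implied covariance: Lambda * Psi * Lambda' + Theta."""
    p = len(loadings)    # number of indicators
    m = len(factor_cov)  # number of factors
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            total = 0.0
            for k in range(m):
                for l in range(m):
                    total += loadings[i][k] * factor_cov[k][l] * loadings[j][l]
            sigma[i][j] = total + resid_cov[i][j]
    return sigma


def to_correlation(sigma):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(sigma[i][i]) for i in range(len(sigma))]
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(len(sigma))]
            for i in range(len(sigma))]


# Hypothetical population values: two factors with two indicators each.
loadings = [[0.7, 0.0], [0.7, 0.0], [0.0, 0.7], [0.0, 0.7]]
factor_cov = [[1.0, 0.5], [0.5, 1.0]]
resid_cov = [
    [0.51, 0.10, 0.00, 0.00],  # residuals of indicators 1 and 2 are correlated
    [0.10, 0.51, 0.00, 0.00],
    [0.00, 0.00, 0.51, 0.00],
    [0.00, 0.00, 0.00, 0.51],
]

sigma = implied_covariance(loadings, factor_cov, resid_cov)
corr = to_correlation(sigma)
for row in corr:
    print(" ".join(f"{value:6.3f}" for value in row))
```

The fixed-parameter Mplus run with RESIDUAL described above should give the same matrix, and it is a useful cross-check on a hand calculation of this kind.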

I re-checked my Mplus code and data file many times, and I didn't find any difference between my typed code and data file and those listed in the Mplus manual. Would you please tell me if there is a bug in the Mplus software? How should I fix it? Thank you very much.

I used Mplus to simulate CFA models with categorical observed variables. In the Mplus code I fixed all parameters at the values used to generate the covariance matrix. The estimated parameters were very close to the population values. The only thing that confuses me is that the average standard errors and the 95% coverage values are all zero. I am wondering if this is acceptable, and what kind of model fit indices could be used to assess the effect of sample size in this situation?

As you know, I used Mplus to simulate CFA models with categorical observed variables. It seems that Mplus doesn't provide model-fit indices for this type of simulation, such as chi-square, CFI, NFI and RMSEA. Would you please tell me how to get these indices? Thank you very much.

Per bmuthen's message of 2/9/2000 3:34 pm, "twolevel and mixture analysis . . . in such cases, monte carlo simulated data can be generated outside mplus" Can you suggest a reference that would indicate how to get started on this? I want to generate multilevel data using parameters of an actual multi-level data set that I have.

I am using Mplus v3 and would like to use the Monte Carlo command to generate a data file containing categorical variables with 5 categories. I am using the program MCEX5.2.INP as a starting point. Following the example on p 477 I change "generate = u1-u6(1);" to "generate = u1-u6(4);" which produces data with values of 0 or 4. What am I missing? Sincerely, John

You are on the right track, but you have probably not given population threshold values in Model Population (Model montecarlo). You need to give values for each of the 4 thresholds: u$1, u$2, u$3, u$4. The example you mention draws on the default of a zero threshold and there is only one in that example so it works without mentioning them.
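
One way to pick the four threshold values is to work backwards from the category proportions you want. The sketch below is Python, not Mplus, and rests on two assumptions that should be checked against your own setup: that the underlying u* variable is standard normal (probit metric, unit variance), and that the target proportions shown are purely hypothetical. Under a logit link or a non-unit u* variance the numbers would need rescaling.

```python
from statistics import NormalDist


def thresholds_from_proportions(proportions):
    """Thresholds on a standard normal u* that reproduce the category proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("category proportions must sum to 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    cumulative = 0.0
    thresholds = []
    # A variable with C categories has C - 1 thresholds.
    for p in proportions[:-1]:
        cumulative += p
        thresholds.append(std_normal.inv_cdf(cumulative))
    return thresholds


# Hypothetical target proportions for a 5-category item.
taus = thresholds_from_proportions([0.10, 0.20, 0.40, 0.20, 0.10])
for index, tau in enumerate(taus, start=1):
    print(f"u${index} = {tau:.3f}")
```

The four values printed would then be supplied as the population values for u$1 through u$4 of each item.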

Hello, NEW QUESTION: How can I implement the solution described above in the syntax provided below? Sincerely, John

I would like to carry out a simulation study using MPLUS. My data are generated with R. The problem is that I would like to save the output (estimates, fit indices etc.) for each data set and, as far as I understand, this cannot be done in MPLUS using the MONTECARLO command (as it gives only averaged values, not separate ones for each data set). Is my interpretation correct? Is there a way to do it anyhow?

There are 4 columns in the chi-square output (see also the Mplus User's Guide discussing these 4 columns). The second column is the observed proportions column, which for WLSMV is based on the p values for the replications (proportion of p values above a certain value is recorded). The 3rd column is the expected percentile, which for WLSMV is based on the across-replication average percentile. Hope that helps.

Hi, I have N=124 with three different treatment groups and am trying to estimate power for a path model (x=treatment group status, y1, y2, and y3). Since there are not many studies out there on the subject we're trying to study, we don't really know about the effect sizes. Is it okay to assume moderate effect sizes for the paths (.2) and do the MC runs like below? I'm especially not sure about the error variances for the variables. Thank you very much!

Regarding the residual variances, it looks like you are getting an R-square greater than 50%, which may be high depending on the application area. If so, you might want to reduce the residual variances.
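
To see how the residual variance drives R-square, a small Python sketch may help. It assumes a single equation with mutually uncorrelated predictors, and the slope of .2 and the variances used are illustrative guesses, not values taken from the input under discussion.

```python
def r_square(slopes, predictor_vars, resid_var):
    """R-square for one outcome regressed on mutually uncorrelated predictors."""
    explained = sum(b * b * v for b, v in zip(slopes, predictor_vars))
    return explained / (explained + resid_var)


def resid_var_for_r_square(slopes, predictor_vars, target):
    """Residual variance that yields the target R-square for the given slopes."""
    explained = sum(b * b * v for b, v in zip(slopes, predictor_vars))
    return explained * (1.0 - target) / target


# Hypothetical case: one predictor with unit variance and a slope of 0.2.
small_resid = r_square([0.2], [1.0], resid_var=0.03)
unit_total = r_square([0.2], [1.0], resid_var=0.96)
needed = resid_var_for_r_square([0.2], [1.0], target=0.04)
print(f"R-square with residual variance 0.03: {small_resid:.3f}")
print(f"R-square with residual variance 0.96: {unit_total:.3f}")
print(f"Residual variance for R-square 0.04:  {needed:.3f}")
```

With a slope of .2 and a unit-variance predictor, a small residual variance inflates R-square well past 50%, whereas choosing the residual variance so that the outcome has unit variance keeps R-square at the squared slope.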

I'm trying to generate a sample of Likert data with five points. These data will then be used as input into a Markov model. The output is below. What does the error mean? How do I get the data? TITLE: MONTECARLO DATA GENERATION FOR 5 POINT LIKERT SCALE FOR A CATEGORICAL LATENT VARIABLE

*** WARNING in Model command
All variables are uncorrelated with all other variables in the model. Check that this is what is intended.
*** ERROR in Model Population command
No MODEL statements for MODEL POPULATION. True values must be specified.

That is not a full Monte Carlo input. Most of the examples in the user's guide come with a Monte Carlo example also. Find the example in the user's guide closest to the model you want to estimate and use the Monte Carlo counterpart input as a start. Also, see Chapter 11 and the MONTECARLO command in the user's guide.

The examples in the users guide all specify the Analysis: and the Model: as well as the Model Population: commands.

If I just want to create the data so that I can analyze it in a number of different ways in Mplus, can I just stop after the Model Population is specified.

For example, if I find a published paper that specifies a specific LV SEM model and I want to create a dataset based on the published parameters so that I can look at different ways of specifying the model, can I stop at Model Population, then use the saved data set as I would normally with any dataset?

Also, if I want to create clustered data does specifying the MODEL: differently than is specified in MODEL POPULATION command when using TYPE = TWOLEVEL, change the data that is generated?

Example 11.6 shows how to save data for a subsequent external Monte Carlo. You don't need the MODEL command if you are only saving the data. You do need the ANALYSIS command. Nothing in the MODEL command affects data generation.

I am planning to build a Monte Carlo program to examine the power associated with testing direct and indirect effects in a structural equation model containing both continuous and ordered categorical indicators as well as an interaction between two latent factors. The structure of the model is quite similar to that depicted in Example 5.13 of the user's guide, except that indicators y10-y12 would be binary rather than continuous.

I have at my disposal scale alphas from prior literature that I can use as reliability inputs to the Monte Carlo program for indicators y1-y9. I also have one-way frequency tables available for the binary indicators y10-y12 (I'm guessing I'd need their bivariate/crosstabular information to be able to fully specify the Monte Carlo model, however).

To provide a concrete example, if I knew that the previously published alpha value of indicator y1 was .70, I'd set the factor variance to 1.00, the F1-y1 loading to sqrt(.70) = .49 and the residual variance of y1 to .30. Is this a correct understanding of your recommended procedure for continuous indicators?

I have two other questions. My first question is whether it is OK for me to use this same method to set the factor loading and residual variance values for my continuous indicators in my Monte Carlo program given that some of the other indicators will be categorical?

My second question is even more basic, but pragmatic. What is the syntax I would need to change in the Monte Carlo version of example 5.13 to alter indicators y10-y12 from continuous to binary or continuous to ordered categorical with three or more levels? Perhaps you have another user's guide example or Web note example you'd recommend that I look at to locate the relevant syntax?

Regarding your concrete example, the variance of the indicator is 0.49+0.30=0.79, or about 0.8, so the reliability is about 0.5/0.8=0.63, right?

Regarding using this formula for categorical outcomes, that is probably less well motivated. You would have had to obtain your reliability by such a factor model. On the other hand, working off reliability as estimated by alpha, is rather approximate as it is (see the lit on alpha in an SEM framework), so maybe this is ok as a rough approximation.

Regarding the syntax, have a look at the Monte Carlo version of User's Guide example 5.2, which is on the Mplus CD.

Your comment on the example showed me that my calculation was wrong: I'd written that sqrt(.7) = .49. Actually, the square of .7 is .49. The square root is instead ~= .837, so .837*.837 + .30 ~= 1.00, which is what I'd intended. The loading would therefore be set to .837 with the residual variance equal to .30 to yield an approximate unit variance of the continuous indicator. I hope I got it correct this time.
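
The corrected arithmetic can be checked with a few lines of Python. This is only a sketch of the rule discussed above, assuming a single-factor indicator with factor variance 1 and a target total variance of 1; the function names are made up for illustration.

```python
import math


def loading_and_residual(reliability, factor_var=1.0, total_var=1.0):
    """Loading and residual variance that give the stated reliability."""
    loading = math.sqrt(reliability * total_var / factor_var)
    resid_var = (1.0 - reliability) * total_var
    return loading, resid_var


def implied_reliability(loading, resid_var, factor_var=1.0):
    """Share of the indicator variance that is explained by the factor."""
    explained = loading ** 2 * factor_var
    return explained / (explained + resid_var)


loading, resid_var = loading_and_residual(0.70)
total = loading ** 2 + resid_var
print(f"loading = {loading:.3f}, residual variance = {resid_var:.2f}")
print(f"total variance = {total:.2f}")
print(f"reliability = {implied_reliability(loading, resid_var):.2f}")
```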

Thanks also for pointing me to example 5.2 and for your comments regarding the usefulness (or lack thereof) in using alpha for continuous and categorical indicators for Monte Carlo simulations, especially w/respect to the categorical indicators. I've read the Raykov and Hancock articles on reliability estimation within the SEM framework vs. alpha. As well, your comments in the Mplus Discussion forum to a previous question of mine regarding computing optimal reliability for categorical y variables vs. underlying latent y* variables have been helpful as well.

The purpose of this particular simulation is to estimate the minimum detectable effect size for structural direct and indirect effects given a specific, known N (567). The investigator is writing a grant proposal to analyze secondary data, so the N of the parent data set is known. As well, she knows the previously published alphas for the continuous scale scores that will appear in her model.

Unfortunately, she does not have access to the data itself, so we must make educated guesses regarding correlations among the categorical indicators in the model. In your work, when you contemplate establishing values for categorical indicators, what criteria (aside from substantive area knowledge) do you use to set the values of categorical indicators' factor loadings and residual variances? Are there typical ranges you select for factor loadings and residual variances? Do those criteria shift depending on whether you're performing simulations with WLSMV vs. ML estimators?

With categorical indicators and working in the probit metric of WLSMV, I find that a binary item with relatively high reliability has around lambda=0.7 when the factor variance = 1. That's then 50% reliability in the "u* metric" (underlying continuous response variable). With a single binary item, I don't think one should expect higher reliability than that. I just looked at the classic LSAT6 and 7 results (Bock's classic example) and a more common loading there is around 0.4 - the highest was 0.7. In logit metric you multiply the loadings by about 1.8.

I am attempting to do a MonteCarlo simulation by first generating the continuous data model, and then generating the corresponding categorical data model (in order to have categorical y values as well as the underlying y* values).

In trying to generate both data sets, I am running separate Monte Carlo programs, but using the same seed and basically the same model, except that residual variances are specified in the continuous model but not in the categorical model. These residual variances equal 1-(loading)^2, so that the loadings are standardized.

So, I expected that the item response data generated in the continuous case would simply be categorized using the defined thresholds in my categorical model... but that does not appear to be the case.

Is there some way to do this? (other than categorizing the continuous data in some other software?) Is rounding error in my standardized loadings/error variance causing the differences?
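
If the two runs cannot be made to line up, one fallback is to generate only the continuous y* data and cut it at the thresholds afterwards. The Python sketch below shows that categorization step under the assumption that a response falls in category k when it exceeds exactly k of the ordered thresholds; the threshold and y* values are hypothetical.

```python
from bisect import bisect_left


def categorize(y_star, thresholds):
    """Return the category index: the number of thresholds below y_star."""
    # thresholds must be sorted in increasing order
    return bisect_left(thresholds, y_star)


# Hypothetical thresholds for a 4-category item and a few y* values.
thresholds = [-1.0, 0.0, 1.0]
y_star_values = [-1.5, -0.2, 0.3, 2.0]
categories = [categorize(value, thresholds) for value in y_star_values]
print(categories)
```

Applied to every y* column of a saved data set, this yields categorical responses that correspond exactly to the stored continuous values.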

Hello, in an older post (from 2003) Linda wrote -"See pages 141-142 for a brief description of the current Monte Carlo facilities in Mplus. See Example 29.1A for an example of how to generate categorical outcomes in the current version of Mplus"- How can I get example 29.1A. Have not found it on the website/cd etc. Thank you very much, Stephan

This is a reference to an example in an old version of the user's guide. The closest thing we now have to that example is the Monte Carlo counterpart of Example 5.2. You can find this on the Mplus CD or the website.

Hi, I'm running MPlus v. 3.11 and I am running a Monte Carlo simulation study for the purposes of power analysis. I am generating 10,000 datasets for an SEM model that includes 2 latent variable interaction terms. Mplus has been running for over a day and my task manager says that it is still actually running and using 50% of the CPU. Is this really possible? Should I let it run or restart the program? I recognize the interaction terms and the 10,000 datasets is a lot for the program to run, but when I ran a similar analysis in the past (without the interaction terms), it never took this long. Thanks for your help.

Adding latent variable interactions requires numerical integration so this could definitely make the estimation more complex. You are also using an old version of Mplus. You would need to send your input and license number to support@statmodel.com but I doubt that your upgrade and support contract is current if you are using Version 3.11.

I am running an *external* Monte Carlo (MC) analysis (data sets were generated by an external program). I use Mplus to analyze the data sets and I would like to save the analysis results for each dataset in separate files. Can this be done?

I am aware of the 'results' option in the 'montecarlo' command, but I do not think this option should be used in an *external* MC analysis.

In an older post (from 2005), in reply to a similar question, Linda stated that it can be done, referring to the User's guide. I have not found a suitable example there, unfortunately. Could you perhaps shed some more light on this issue?

When I run this in Mplus I get an error: THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE. However, when I add the two lines: categorical = v1-v5; ANALYSIS: ESTIMATOR = MLR; , it does run. When I subsequently analyze the saved generated dataset in Mplus with a linear factor analysis, treating the variables as continuous, I do not get an error. How can this be?

However, this inp file does not include an INDIRECT command. I assumed that I could estimate the mediation using:

MODEL: y ON x; u ON y x;

Model INDIRECT: u IND y x;

But I get an error indicating that MODEL INDIRECT is not available with ALGORITHM=INTEGRATION.

If I eliminate the ANALYSIS command entirely the program runs. AND I see that the ESTIMATOR = WLSMV.

If I then use the Monte Carlo counterpart without an ANALYSIS command but including MODEL INDIRECT, I encounter a fatal error that the population covariance matrix is not positive definite. My assumption is that I have not modeled the indirect effect in MODEL POPULATION and/or that I am making an error by excluding the ANALYSIS command.

QUESTIONS 1) In the modified 3.17 in which I have eliminated the ANALYSIS command, I wonder if the WLSMV estimates are appropriate or if I should model the data otherwise.

2) How should I specify the POPULATION parameters in the MODEL POPULATION section to reflect the indirect effect? (I am hoping that this will eliminate the NPD error.)

It looks like you have missing data on the mediator y. If that is not the case, things are more straightforward, but let's discuss as if you have missing on y.

In this case I think the ML estimator is better than WLSMV because ML can do MAR. With ML you then need montecarlo integration and with montecarlo integration you don't get model indirect results. You can, however, always create your own indirect effects as a*b using Model Constraint and defining a "New" parameter ind, where

ind = a*b;

where a and b are parameter labels in the Model paragraph. Here, a is u on y and b is y on x.
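
The product itself is simple to reproduce by hand. As a hedged illustration in Python, the sketch below computes ind = a*b together with a first-order delta-method (Sobel-type) standard error; that formula is a standard approximation brought in here for illustration and is not claimed to be the exact computation Mplus performs. All numeric inputs are hypothetical.

```python
import math


def indirect_effect(a, se_a, b, se_b, cov_ab=0.0):
    """Product-of-coefficients indirect effect with a delta-method SE."""
    ind = a * b
    # First-order delta method: Var(a*b) ~ b^2 Var(a) + a^2 Var(b) + 2ab Cov(a,b)
    variance = b * b * se_a ** 2 + a * a * se_b ** 2 + 2.0 * a * b * cov_ab
    se = math.sqrt(variance)
    return ind, se, ind / se


# Hypothetical estimates: a is the slope of u on y, b is the slope of y on x.
ind, se, z = indirect_effect(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"ind = {ind:.3f}, SE = {se:.4f}, z = {z:.2f}")
```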

Regarding (2) - which you won't encounter by my approach - this happens with the WLSMV estimator when you don't give a residual variance in the population statement. See Monte Carlo input for such modeling in examples that mirror those of the UG examples (either on your Mplus CD, or on our web site).

I am using the Monte Carlo function to simulate CFA models with categorical variables. It seems that the output does not give results for CFI, which is my statistic of interest. In response to a similar question much earlier, you had said that the CFI stats in Monte Carlo would be available from Mplus version 3. I am using Mplus 4... is there a specific command I need to insert to call forth results on CFI? Thanks very much in advance, Yew Kwan

For a simulation study I generated replications externally and am fitting various models to this data in Mplus using "type is montecarlo" in the data command. Some of my models do not converge for all replications. However from my saved results I cannot find out which ones did converge. I know the results option in the Montecarlo command does this, but can I use it if I am not generating the data in Mplus. Thank you in advance Ben Spycher

If you ask for TECH9 in the OUTPUT command, you will see which replications had problems. With external Monte Carlo, there is no way to tie a particular data set to a replication number as in internal Monte Carlo.

I am trying to run a Monte Carlo simulation to test the power I have to test a mediational model in my known sample size. It would be very helpful to be able to report the power of the indirect or total effects. I tried adding MODEL INDIRECT to the input and got the error saying that is not available. Does anyone know if you can use model indirect in a monte carlo simulation? I see that there is no MC example for the indirect example in chapter 3 of the UG. Is there an example somewhere else for the syntax?

For the corresponding Montecarlo simulation study for example 6.4, there is the following statement in the input file. Can you detail how you get the scale factors based on the output of example 6.4? Thanks!

{u11@1 u12*.913 u13*.745 u14*.598}; ! this sets the scale factors at the inverted SDs for the u* variables, so that the estimates are in the metric of the Delta parameterization

This is the follow-up for the montecarlo simulation. For the Mplus example 6.4, the ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES shows that the variance for u12 is 0.022, u13 is 0.037 and u14 is 0.024. Take u12 as an example, the scale for u12 should be 1/sqrt(0.022)=6.74. Why is it 0.913 as specified in the example 6.4 montecarlo counterpart?
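
The comment in the input file describes the scale factors as the inverted standard deviations of the u* variables, which is a different quantity from the sampling variance of a parameter estimate. A short Python sketch of that relationship, using only the scale factor values quoted from the input file, is given here; how the u* variances themselves are obtained from the growth model parameters is not shown.

```python
import math


def scale_factor(u_star_var):
    """Scale factor = 1 / SD of the underlying u* variable."""
    return 1.0 / math.sqrt(u_star_var)


def u_star_variance(scale):
    """u* variance implied by a given scale factor."""
    return 1.0 / scale ** 2


# Scale factors quoted from the Monte Carlo input for u12, u13 and u14.
implied = {s: u_star_variance(s) for s in (0.913, 0.745, 0.598)}
for scale, variance in implied.items():
    print(f"scale {scale:.3f} -> u* variance {variance:.3f}")

# Applying the same formula to 0.022, the variance of a parameter estimate,
# gives about 6.74, which is why that calculation does not reproduce 0.913.
print(f"{scale_factor(0.022):.2f}")
```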

Thank you very much for the detailed instruction. Now I am clear about how the scale factors were calculated. However, when I ran ex6.4 using the dataset generated from the corresponding Monte Carlo simulation, the scale factors shown in the output file were not the same as those specified in the Monte Carlo simulation, {u11@1 u12*.913 u13*.745 u14*.598}. Instead, they are {u11@1 u12*1.060 u13*1.012 u14*0.772}. Why do they differ?

Scales
U11 1.000 0.000 999.000 99.000
U12 1.060 0.149 7.133 0.000
U13 1.012 0.192 5.259 0.000
U14 0.772 0.153 5.029 0.000

Hello. I am trying to run a Monte Carlo simulation to estimate power for a path model that contains 5 continuous variables and 1 binary (categorical) variable. I used the code: CUTPOINTS = y3(0); to indicate that y3 is the binary variable. The % Sig Coeff, or estimates of power, are much lower for y3, the binary variable, than for the continuous variables. Have I used the correct code to indicate that y3 is a binary variable?

I would like to do a power analysis for a MIMIC model with a latent variable outcome. The items for the latent variable are binary, so I guess the data need to be generated accordingly. Example 12.7 looks perfect for me, but it is for continuous measures of the items. I need syntax for an ordinal CFA analysis. Could you give me some suggestions to get started, please?

Each example comes with a Monte Carlo counterpart where the data for the example are generated. Look at Chapter 5 and find the closest thing to what you want and start from there. The Monte Carlo counterpart for Example 5.2 is mcex5.2.inp.

Thank you very much for your guide. I have had a look at the relevant examples.

My model is simple. It has got six binary indicators forming a continuous factor. The factor is predicted by a continuous predictor. It is a 2 group analysis, and following the example in Mplus, I use delta parameterization.

I would like to use unstandardised empirical values for the predictor, but standardised empirical values for the factor loadings, thresholds, and residual variances. I think this makes it easier for me to vary the effect size (the regression path coefficient from the predictor) in the simulation study.

I have not figured out how to calculate the residual variances of the items, as I suspect it is not the same as with continuous items. In the latter case, to get the residual variance of an item, I just need to subtract the square of the standardised factor loading from one, and similarly for the factor residual variance and path coefficient. But I don't think it is done the same way for binary items.

It would be great if you could give some advice on how to set item-related parameter values for the standardised factor solution.
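One way to think about this, offered as a sketch rather than as the official recipe: under the Delta parameterization the variance of the latent response variable u* equals 1 divided by the squared scale factor, so with a scale factor of 1 the implied residual variance is one minus the squared standardised loading, just as in the continuous case but on the u* metric. The code below assumes a centred predictor (so that u* has mean zero), a total factor variance of 1, and a target proportion of ones for each item. All function names are hypothetical.

```python
from statistics import NormalDist

def factor_residual_variance(beta, var_x, total_factor_var=1.0):
    """Residual factor variance that keeps the total factor variance fixed."""
    resid = total_factor_var - beta ** 2 * var_x
    if resid <= 0:
        raise ValueError("beta and var_x imply more than the total factor variance")
    return resid

def item_residual_variance(loading, total_factor_var=1.0, scale_factor=1.0):
    """Implied residual variance of u*: Var(u*) minus the factor part."""
    resid = 1.0 / scale_factor ** 2 - loading ** 2 * total_factor_var
    if resid <= 0:
        raise ValueError("loading is too large for this scale factor")
    return resid

def threshold_for_proportion(p_one, scale_factor=1.0):
    """Threshold tau such that P(u* > tau) = p_one when u* has mean zero."""
    return NormalDist(0.0, 1.0 / scale_factor).inv_cdf(1.0 - p_one)

# Example: standardised loading 0.8, path coefficient 0.5, Var(x) = 1.
print(round(item_residual_variance(0.8), 2))      # 0.36
print(round(factor_residual_variance(0.5, 1.0), 2))  # 0.75
print(round(threshold_for_proportion(0.2), 3))    # 0.842
```

Varying beta then changes only the factor residual variance, which keeps the item-level values fixed across effect-size conditions.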

I would like to run an external simulation study of a LCA with training data where I save out both the results AND the asymptotic covariance matrix (tech3) from each replication for subsequent analysis.

Is it possible to do this using external montecarlo options, or do I need to do this using a batch file of some sort?

TECH3 cannot be saved with the MONTECARLO command or external Monte Carlo. If you wanted to do this, you would need to run each generated data set separately and save TECH3. On the website, see Using Mplus via R. This may help you.
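Running each generated data set separately can be scripted. The sketch below only writes one input file per replication from a user-supplied template; it does not call Mplus. The placeholder tokens (`__DATAFILE__`, `__RESULTSFILE__`, `__TECH3FILE__`), the file-name patterns, and the function names are all hypothetical, and the template is deliberately incomplete: the VARIABLE, ANALYSIS, and MODEL commands for the actual analysis still have to be filled in. Plain string replacement is used instead of Python format strings because Mplus syntax itself uses curly braces and dollar signs.

```python
from pathlib import Path

# Incomplete, illustrative template; the tokens are hypothetical placeholders.
TEMPLATE = """DATA: FILE = __DATAFILE__;
! VARIABLE, ANALYSIS and MODEL commands for the analysis go here
SAVEDATA: RESULTS = __RESULTSFILE__;
          TECH3 = __TECH3FILE__;
"""

def render_input(template, rep):
    """Fill in the file names for one replication number."""
    return (template
            .replace("__DATAFILE__", f"rep{rep}.dat")
            .replace("__RESULTSFILE__", f"results_rep{rep}.dat")
            .replace("__TECH3FILE__", f"tech3_rep{rep}.dat"))

def write_inputs(template, n_reps, out_dir):
    """Write rep1.inp ... repN.inp into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for rep in range(1, n_reps + 1):
        path = out / f"rep{rep}.inp"
        path.write_text(render_input(template, rep))
        paths.append(path)
    return paths

print(render_input(TEMPLATE, 3))
```

Each resulting .inp file can then be run in batch, and the saved results and TECH3 files collected afterwards.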

You cannot generate data using the TRAINING option but would have to do it as illustrated in Example 7.24. Example 7.23 is exactly the same but it uses the TRAINING option.

I noticed that among the Chapter 7 examples accompanying the guide (http://www.statmodel.com/ugexcerpts.shtml), only the ex7.23.dat data file is missing. I wanted to see how the training variables were specified. Is there another place where I can get it? Thanks, Emil

I need help with my syntax. I always get an error message indicating that the population matrix is not positive definite. I am using Monte Carlo analysis to study the effects of sample size and misspecification on my model. The indicators are categorical; the response scale of the questionnaire has 3 points. Here is the syntax that I used: Montecarlo: Names are cm1-ps6; categorical are all; Nobservations = 100; NREPS = 10000; SEED = 0; Generate = cm1-cm6(2) gm1-gm6(2) fm1-fm6(2) cg1-cg6(2) ps1-ps6(2);

As per my previous e-mails, I am working on a manifest path model with categorical, non-normal data. I estimated the model using WLSMV, and now I would like to assess the power of the tests of my individual effects.

1. Is it possible to use the Monte Carlo Method you described in a paper from 2002 ("How to use a Monte Carlo method to decide on sample size and determine power")?

2. As I have to provide population parameters and it is an a posteriori analysis, can I use the effect parameters from the model estimation?

3. Is it necessary also to assess test power on the model level? And what would be the strategy?

Many thanks in advance & many thanks for all the helpful answers to my previous postings.

When assessing the power of a path model through the Monte Carlo approach, is it sufficient to provide the population parameters for the regression coefficients and the explicitly defined covariances between dependent variables?

Or do I also have to provide the population parameters for means / thresholds / intercepts as well?

1. The population covariance matrix. 2. It is not in the output. It is the numbers you give in MODEL POPULATION. 3. Any parameter not given a population parameter value is given the value zero. You have probably not given all variances population parameter values.
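To see why an omitted variance leads to the "not positive definite" message, consider a minimal sketch for a single regression y = b*x + e. This is an illustration under stated assumptions, not how Mplus performs the check internally: the function names are hypothetical, and the defaults of zero mimic the rule that a parameter without a population value is set to zero. Positive definiteness is tested with a plain Cholesky factorization.

```python
import math

def implied_covariance(b, var_x=0.0, resid_var_y=0.0):
    """Model-implied covariance matrix of (x, y) for y = b*x + e."""
    return [[var_x, b * var_x],
            [b * var_x, b * b * var_x + resid_var_y]]

def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for PD matrices."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False  # zero or negative pivot: not PD
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

# All variances given: fine.
print(is_positive_definite(implied_covariance(0.5, var_x=1.0, resid_var_y=0.75)))  # True
# Residual variance of y left at its default of zero: singular.
print(is_positive_definite(implied_covariance(0.5, var_x=1.0)))  # False
# Variance of x left at zero: x has no variance at all.
print(is_positive_definite(implied_covariance(0.5, resid_var_y=0.75)))  # False
```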

1. Is it possible to use the Monte Carlo functionality in Mplus to get a distribution of eigenvalues (like those output when one runs TYPE = EFA)? This would be nice, as it would provide a way (within Mplus) to code up a parallel analysis test of the number of factors and to take advantage of the fact that Mplus uses polychoric correlations.

2. Is it possible to incorporate complex sampling weights, strata, and clusters into the data generation process within the Monte Carlo functionality? We could generate the data externally, but it would be nice if it could be done internally.

But with this code the variables are generated and analyzed as continuous, and the saved data set shows them as continuous. Only when I add the CATEGORICAL = V1-V6; command are the indicators generated as categorical, but they are also analyzed as continuous.

How can I generate the data as categorical and analyze them as continuous?

The GENERATE option controls the generation of the data. The CATEGORICAL option controls the analysis. I'm not sure why you think this is not happening. In your input, you are generating categorical. You have no CATEGORICAL option so the variables will be analyzed as continuous.

Hi I am trying to run a Monte Carlo simulation of multigroup cfa to test for measurement invariance. My indicators are all categorical, and I use WLSMV as the estimator. If I understand correctly, difftest has to be saved for the chi-square test. Would it be possible to run such a Monte Carlo study in Mplus without using any other software? Suppose I've generated data externally. Thanks very much

You can first save the Monte Carlo data sets. Then you would need to run each data set separately using DIFFTEST first in the SAVEDATA command and then in the ANALYSIS command. See if the Monte Carlo Utility under How To on the website can help.

Dear Linda, I have finished simulating data from a multilevel CFA model (30 items, categorical data, 5 correlated factors). I am now writing a paper, and one of the reviewers asked me to provide the between and within population covariance matrices that were used to generate the simulated data sets. Is there any way to get the population correlation/covariance matrices for the within and between levels in Mplus? I cannot see them in the output...

We don't use the between and within population covariance matrices to generate the data. We use population parameter values. I can't think of a way to get these matrices other than to create them using the population parameter values. This seems unnecessary because you know the population parameter values, which is more detailed information than is given in the covariance matrices.

I want to simulate a CFA with categorical indicators and have 2 questions. First, how do I control what the thresholds are? I'd like half the items with low thresholds and the other half with high thresholds.

We are trying to generate data using Monte Carlo that specifies a negative slope over time, but never generates outcomes that go below zero. Currently, we are running into a lot of negative numbers in the outputs, which lacks practical application for what we are wanting to test. Any suggestions?

I have a question regarding the threshold concept in Mplus. I want to generate data with 5 categories, so I need 4 thresholds. I want the resulting data set to resemble a normal distribution as closely as possible (skewness and kurtosis ~ 0). My problem is that I have to choose the thresholds on my own, so I can't be sure whether the arbitrarily chosen thresholds could be improved. Is it somehow possible to let Mplus choose the thresholds on its own? Or is there another way to automatically generate "normally distributed" categorical data?
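One way to evaluate candidate thresholds outside Mplus is to compute the category probabilities they imply and then the skewness and excess kurtosis of the resulting 5-category variable. The sketch below assumes the latent response variable is standard normal and that the categories are scored 0 to 4; the function names and the example thresholds are hypothetical. Thresholds that are symmetric around zero give zero skewness by construction, so in practice only the spacing needs to be tuned for kurtosis.

```python
from statistics import NormalDist

def category_probabilities(thresholds):
    """Probabilities of the ordered categories cut from a N(0, 1) variable."""
    cdf = NormalDist().cdf
    cuts = [0.0] + [cdf(t) for t in sorted(thresholds)] + [1.0]
    return [upper - lower for lower, upper in zip(cuts, cuts[1:])]

def skewness_and_excess_kurtosis(probs):
    """Population skewness and excess kurtosis for scores 0, 1, 2, ..."""
    mean = sum(p * x for x, p in enumerate(probs))
    var = sum(p * (x - mean) ** 2 for x, p in enumerate(probs))
    m3 = sum(p * (x - mean) ** 3 for x, p in enumerate(probs))
    m4 = sum(p * (x - mean) ** 4 for x, p in enumerate(probs))
    return m3 / var ** 1.5, m4 / var ** 2 - 3.0

# Compare a few symmetric, equally spaced threshold sets.
for step in (0.6, 0.8, 1.0):
    thresholds = [-1.5 * step, -0.5 * step, 0.5 * step, 1.5 * step]
    probs = category_probabilities(thresholds)
    skew, kurt = skewness_and_excess_kurtosis(probs)
    print(thresholds, [round(p, 3) for p in probs], round(skew, 3), round(kurt, 3))
```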

Hi Dr Muthen, I am attempting to run a Monte Carlo simulation of a path model (with continuous variables). My understanding is that I should specify population values for the path coefficients, variances, and residual variances. I notice you use the following population values in your 2002 paper: 0.8 for factor loadings, 0.36 for residual variances, and 0.25 for factor correlations. How do I determine the best population values in my case? Many thanks

Hi Dr Muthens, I ran into this error while running a Monte Carlo path model: *** FATAL ERROR THE POPULATION COVARIANCE MATRIX THAT YOU GAVE AS INPUT IS NOT POSITIVE DEFINITE AS IT SHOULD BE. Based on the comments above, is this because I have not provided population values for all parameters in my model? I have tried including all parameters (path coefficients, variances, covariances, residual variances, means) but I am still having the same problem. Do I need to include intercepts as well (and if so, how)? Is this the correct way of specifying a value for a mean: [var1*.8]?

Hi, I can't seem to find a description of the data format produced by the "SAVEDATA: ESTIMATES ARE" option. I found Example 12.7, with its ex12.7estimates.dat file, and looked at it, but I cannot tell which value is which. I also found the excellent http://www.ats.ucla.edu/stat/mplus/seminars/introMplus_part2/saving.htm , but it didn't clarify the estimates format itself. The Step 1 output, ex12.7step1.html, which generated these estimates, only says: Save file \ex12.7estimates.dat \ Save format Free. Thanks! Emil