I am wondering about the barriers to creating an RMSEA index that would be appropriate for models involving categorical variables. Mplus doesn't give an RMSEA index in such cases. But in talking to quite a few people, no one has been able to describe why such an index would be inappropriate for such models. Can anyone cite any papers that describe such barriers, or alternatively, any papers that define the RMSEA index in categorical models? Can anyone describe briefly what the barrier is?

It seems possible to use RMSEA for categorical outcomes, and Version 2.0 of Mplus will include this. The drawback is that little is known about how to use RMSEA for categorical outcomes in practice. This is also true for RMSEA for continuous outcomes that are non-normal. In both cases, RMSEA can be built on a robust chi-square, either mean-adjusted or mean- and variance-adjusted. The 1999 AERA paper by Nevitt and Hancock studied the continuous non-normal case for the mean-adjusted chi-square. The problem is that our limited simulation studies suggest that in these cases the RMSEA seems to be influenced not only by model misspecification but also by the degree of non-normality in the variables: the more skewed the variables, the lower the RMSEA. This means that the normal-variable standard of good models having RMSEA values less than 0.05 cannot be relied on in all situations, and the fit evaluation becomes dependent on the data distribution. The same holds for ADF and its WLS counterpart for categorical outcomes. But perhaps such RMSEA values have some practical utility nevertheless; simulation studies are needed. Other fit indices for categorical outcomes will also be put into the next Mplus version.

This is a question I would also be interested in learning an answer to. For the Mplus folks I would also be keen to know when version 2 might be 'on the streets'.

I pass on that in Bollen's (1989) book 'Structural Equations with Latent Variables', on page 436, he reports that Muthen and Kaplan (1985) found that ML and GLS chi-square tests were quite robust except when the categorical variables had large skewness or kurtosis (outside the range -1 to +1). So I wonder whether, if the data are not badly non-normal, getting fit statistics (for categorical models) by running Mplus as if the variables were continuous might not be too much of a sin, although no doubt the purists (and the more knowledgeable) among us might disagree with this.

A robust RMSEA for categorical outcomes can be calculated using results from Mplus.

RMSEA = sqrt((2 * Fmin / t)- (1/n))*sqrt(g)

where

Fmin = the last value from the function column of TECH5
n = sample size (sum of all groups if multiple group)
t = trace of the product of the u and gamma matrices. See Satorra (1992) for a definition.
g = number of groups

For computational purposes, you can calculate RMSEA as follows:

RMSEA = sqrt((chi-square/(n*d)) - (1/n))*sqrt(g)

where d is degrees of freedom, n is total sample size, chi-square is chi-square, and g is the number of groups. Chi-square from WLSM or WLSMV can be used. The RMSEA will be the same whichever one is used.
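As a minimal sketch of the two expressions above (not Mplus code), the following Python functions compute RMSEA from either the chi-square or from Fmin and the trace t. The function names are hypothetical, and truncating a negative value under the square root at zero is an assumption based on common practice, not something stated in the post. The input values are invented for illustration.

```python
import math


def rmsea_from_chisq(chi_square, df, n, groups=1):
    """RMSEA = sqrt(chi-square/(n*d) - 1/n) * sqrt(g).

    Assumption: a negative value under the root is truncated at zero.
    """
    inner = chi_square / (n * df) - 1.0 / n
    return math.sqrt(max(inner, 0.0)) * math.sqrt(groups)


def rmsea_from_fmin(fmin, trace, n, groups=1):
    """RMSEA = sqrt(2*Fmin/t - 1/n) * sqrt(g), with the same truncation."""
    inner = 2.0 * fmin / trace - 1.0 / n
    return math.sqrt(max(inner, 0.0)) * math.sqrt(groups)


# Hypothetical single-group values, for illustration only.
print(round(rmsea_from_chisq(150.0, 50, 500), 4))
print(round(rmsea_from_fmin(0.3, 100.0, 500), 4))
```

The two functions give the same value whenever chi-square = 2*n*Fmin*d/t, which is simply what equating the two formulas implies; the example values were chosen to satisfy that relation.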

The ratio Chi-squared/df has been suggested, for example by Marsh and Hocevar (Psych. Bull., 1985), as an index of fit, with values close to 2 more desirable. I think interpreting RMSEA in terms of Chi-squared/df has an intuitive appeal. Write the RMSEA expression as:

RMSEA = sqrt((Chi-sq/df - 1)/n).

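Then RMSEA is the square root of the excess Chi-sq./E(Chi-sq.) - 1 averaged over the n observations, since E(Chi-sq.) = df for a correctly specified model, with RMSEA = 0 when Chi-sq. = E(Chi-sq.).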

With regard to WLSMV and WLSM, I am getting WLSMV Chi-sq./df = WLSM Chi-sq./df for several of the models that I have used. I cannot see why this should hold in general.

Thanks for the great message board. In follow-up to your Nov 8 post regarding a robust RMSEA for categorical outcomes, is it appropriate to use results of model fitting reported in TECH5 to compute a goodness of fit index

GFI = 1-{[(s-sigma)'Winv(s-sigma)]/[s'Winv s]}
GFI = 1-(Fmin/Finit)

where Fmin is the minimum of the fitting function and Finit is the initial value of the fitting function?

GFI for categorical outcomes can be computed from information in TECH5. Fmin is the minimum value of the fitting function which is the last value in the first column of TECH5. Finit is the minimum fitting function value for a model with all parameters fixed to zero. This is obtained in TECH5 by estimating a model for the same set of variables with no statements in the MODEL command.
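A minimal sketch of that calculation, assuming the two fitting function values have already been read off the TECH5 output by hand. The function name and the example numbers are hypothetical.

```python
def gfi(f_min, f_init):
    """GFI = 1 - Fmin/Finit.

    f_min:  minimum fitting function value for the model being tested.
    f_init: minimum fitting function value for the model with all
            parameters fixed to zero (no statements in the MODEL command).
    """
    if f_init <= 0:
        raise ValueError("f_init must be positive")
    return 1.0 - f_min / f_init


# Hypothetical values, for illustration only.
print(round(gfi(0.2, 2.0), 3))
```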

There's an article in the December 1998 issue of Psychological Methods by Hu and Bentler which reviews fit methods for continuous outcomes. GFI does not get a very good review. I don't know of any review of GFI for categorical outcomes. I'm not sure if its behavior has been studied in this situation. Perhaps someone else has some information on this topic.

Along the same lines as the discussion regarding RMSEA, can one calculate a CFI? I assume I would have to run a separate independent model to get the denominator, or does MPLUS calculate that somewhere?

TLI and CFI can be computed for models with categorical outcomes. Both of these measures require information from a baseline model in addition to the model being tested. Typical baseline models have zero covariances. The baseline model for categorical outcomes has all parameters fixed to zero except the thresholds. This is obtained by TYPE=MEANSTRUCTURE in the ANALYSIS command and no statements in the MODEL command.

TLI and CFI can be computed as follows where

chib = chi-square for the baseline model
dfb = degrees of freedom for the baseline model
chit = chi-square for the model being tested
dft = degrees of freedom for the model being tested
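The formulas themselves are not preserved in this post. As a sketch only, the following Python functions use the standard textbook definitions of TLI and CFI in the chib/dfb/chit/dft notation; that these match the expressions originally posted is an assumption, and the function names are hypothetical.

```python
def tli(chib, dfb, chit, dft):
    """Standard TLI: (chib/dfb - chit/dft) / (chib/dfb - 1)."""
    ratio_b = chib / dfb
    ratio_t = chit / dft
    return (ratio_b - ratio_t) / (ratio_b - 1.0)


def cfi(chib, dfb, chit, dft):
    """Standard CFI: 1 - max(chit - dft, 0) / max(chit - dft, chib - dfb, 0)."""
    numerator = max(chit - dft, 0.0)
    denominator = max(chit - dft, chib - dfb, 0.0)
    if denominator == 0.0:
        return 1.0  # neither model shows any excess chi-square over its df
    return 1.0 - numerator / denominator


# Values reported later in this thread: chi-square 142.188 with 84 df for
# the tested model, 776.778 with 73 df for the null model.
print(round(tli(776.778, 73, 142.188, 84), 3))
print(round(cfi(776.778, 73, 142.188, 84), 3))
```

With those values the TLI comes out at about 0.928, which agrees with the NNFI quoted in that later post and lends some support to the assumed TLI definition.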

OK, we're trying to calculate RMSEA, TLI and CFI for a model with 9 first-order factors and 3 second-order factors. All 30 indicators are categorical and we are not estimating a mean structure. We're not really sure that our "baseline model" is specified correctly.

We're a little surprised by such a large value of TLI given that RMSEA>.1 and CFI<.9 and we calculated similar values for another dataset ( RMSEA=.1019, TLI=.9739, CFI=.9344 ), although in this case TLI and CFI agree somewhat.

QUESTIONS:

1) Given that we used TYPE=GENERAL in the "real model", is the above "baseline model" correct?

2) Is there a generally acceptable reference we can use when we publish RMSEA, TLI and CFI values calculated by this method?

================================================ Brent Hutto Early Alliance Project Statistician Department of Psychology University of South Carolina (803) 777-5452 or Hutto@SC.edu

It looks like your baseline model is correct and that your calculations are also correct. The warning message that you get is telling you that in the bivariate frequency table for variables 29 and 30, there is a low cell frequency. But if the model estimation terminates normally, this is nothing to worry about.

I don't see your results as that contradictory for either of your examples. Both RMSEA and CFI agree that the models fit poorly. The cutoff for CFI and TLI recommended by Hu and Bentler (Psych. Methods 1998 Volume 3) for continuous outcomes is greater than .95. Your TLI's are .96 and .97 which suggests that the cutoff for TLI should perhaps be higher for categorical outcomes.

As far as we know, there have been no published studies of the behavior of RMSEA, CFI, or TLI for categorical outcomes. Our very limited studies of RMSEA found that it does not work as well for categorical outcomes as for continuous. So we have no references to suggest. These are studies that need to be done.

The discrepancy between the CFI and TLI may be due to the type of Chi-sq. statistic used and its degrees of freedom. In my limited experience with these indices I have found that the ratio WLSMV/DF is about the same as WLSM/DF. When computing indices that are functions of the ratio Chi-sq./DF (e.g., RMSEA and TLI), WLSMV and WLSM produce about the same results. This may not be the case for other types of indices such as CFI, which depends on the difference Chi-sq.-DF. For numerous models that I have run, WLSM has produced remarkably close values for CFI and TLI. I have also noted that DF for WLSM is always the same as DF for MLE.

I'm working on a project that involves estimating multiple CFAs using symptom (dichotomous)data. After I establish the appropriate model for individual groups, I hope to test for invariance across groups. I had intended to rely heavily on nested chi sq. tests to establish the superiority of different models both within (boys 1 vs. 2 factor) and across (boys vs. girls) groups. However, it appears that this strategy won't work if I use the WLSM or WLSMV estimators. Is there some other method to test the validity of competing model structures when using these estimators?

Incidentally, when I try to use the WLS estimator, Mplus gives an error implying that the weight matrix is not positive definite. Given that models using WLSM and WLSMV converged, I was surprised at this finding. The symptoms are very skewed.

Finally, I should mention that these data come from an epidemiological study so I'm using a "weight" variable.

We recommend WLS for nested model testing. Unless the sample size is very large, very skewed items can make the weight matrix of WLS not invertible. The weight matrix is not inverted in WLSM or WLSMV. This is why you get convergence with these estimators. You can try to delete very skewed items to make the weight matrix invertible. I don't know of a good alternative for assessing nested models at the present time.

I have now dipped more than a toe or two into the waters provided by Mplus, specifically, path analysis with endogenous categorical/dichotomous variables. In fact, my current application is very much like Example 15.1A (p.131) of the manual. I just have many more variables exogenous to the two endogenous ones.

I now have several major areas of questions:

(1) How do I calculate a case-by-case probability of the ultimate outcome, based on output of Mplus? FYI, to date I have concentrated on "sampstat" and "residual" output, and have not delved into any TECH output.

(2) What is the correlation matrix used by Mplus for path analysis with one (or more) endogenous dichotomous variables? How is the correlation matrix calculated from free data (all categorical, dichotomous variables)? Pending an answer to those questions, what about using correlation matrices developed specifically for binary data? I am thinking here of the half dozen variations offered by SYSTAT 9 for Windows.

(3) Has anyone examined the similarities or differences between the approach to path analysis with endogenous categorical (binary) variables implemented in Mplus 1.0x and that described by Bollen et al., a 2-stage probit regression (Demography 32(1): 111-131, Feb. 1995), entitled "Binary Outcomes and Endogenous Explanatory Variables...."?

I would very much appreciate guidance, assistance, or references in my mission to get these questions answered soon.

For questions 1 and 2 there are several Muthen-authored articles that are useful, for example Muthen, Kao, Burstein (1991) in Journal of Educational Measurement. See list of references on Mplus Discussion. Another useful article is the one by Xie. With path analysis there are usually x variables in the model, that is variables that have no model structure (covariates). In such cases, Mplus has the advantage that it uses sample statistics that are regression-based, not correlations. Regarding question 3, I have not studied that Bollen article - anybody else?

I am running a path analysis type of model similar to the one on p. 131, except that: 1) there is a single binary observed outcome, and 2) the predictors are all latent factors (some with continuous indicators, others with categorical indicators). I have tried several WLS(M)(V) estimators. Convergence is slow but it works. However, I do not know what kind of analysis is actually being conducted. I assume that Mplus runs a probit. Or am I wrong? Could you provide an explanation and a reference? Also, why doesn't Mplus allow the logistic option with latent variables? I tried the above analysis with "logistic" but it seems only to work with observed variables. Thanks for your response.

Logistic regression is available for only univariate observed outcomes with observed independent variables. The logistic model does not easily generalize to the multivariate latent variable model framework.

The analysis you are running with latent independent variables and a categorical outcome uses probit regression. This is described in Appendices 1 and 2 of the Mplus User's Guide. The following references describe the estimation further.

In scanning the posts above, I noticed that Linda presented a formula for RMSEA as follows:

RMSEA = sqrt((chi-square/(n*d)) - (1/n))*sqrt(g)

This formula differs from that provided by Rigdon (1996) in the journal SEM, 3(4), p369-379. The difference is that Linda is using n (total sample size) where Rigdon uses n-1 (he also doesn't include info about multiple groups - g). In my application the difference b/w formulae is trivial. Clearly the larger the sample size the less impact this has. However I was wondering whether the difference was related to the specific use of WLSM, WLSMV?

I do not get the following outputs when ALL the indicators are categorical:
* Chi-Sq. and iteration results for the NULL model defined with no statements in the model command.
* Derivatives with respect to Theta for a structural variable model.
There is no problem when some indicators are continuous.

If you are using Version 1.04, you should get chi-square when there is no model statement. I suspect that you are using an earlier version. To get derivatives with respect to theta add a covariance of zero to the model command, for example,

I wish Mplus would have an option of outputting sample correlations and model correlations to an external file so that it would be easy to compute RMSR and some other indexes. Mplus' printed output breaks correlation matrices into columns of 5 variables. It is possible to use it for computing RMSR by first manually editing the output (deleting unnecessary stuff) and then exporting the values to Excel. However, to compute Rao's distance (for a direct comparison of competing models) I need a traditional square correlation matrix. I haven't figured out an easy way of transforming output file into a square correlation matrix, and doing it manually is very tedious because I have a lot of data sets and large number of items. Writing a SAS program for doing this task is possible, but promises to be quite time consuming. Does anyone have any ideas?
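One possible approach, sketched in Python rather than SAS, is to paste the printed matrix into a text file and reassemble the 5-column blocks programmatically. The sketch below assumes a layout in which each block starts with a line of column names, followed optionally by a line of underscores, followed by rows of the form "NAME value value ..." giving the lower triangle. That layout, the function names, and the toy input are assumptions and would need to be checked against the actual output.

```python
def is_number(token):
    """True if the token parses as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_blocked_lower_triangle(text):
    """Rebuild a square symmetric matrix from column-blocked printed output.

    Returns (names, matrix), where matrix is a list of rows ordered as names.
    """
    columns = []   # column names of the block currently being read
    values = {}    # (row name, column name) -> value, stored symmetrically
    names = []     # variable names in order of first appearance as a row
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or all(set(tok) <= {"_"} for tok in tokens):
            continue  # blank line or underline row
        if not any(is_number(tok) for tok in tokens):
            columns = tokens  # a line of names only starts a new block
            continue
        row = tokens[0]
        if row not in names:
            names.append(row)
        for col, tok in zip(columns, tokens[1:]):
            values[(row, col)] = float(tok)
            values[(col, row)] = float(tok)
    return names, [[values[(r, c)] for c in names] for r in names]


# Toy input in the assumed layout, with blocks two columns wide.
sample = """
        A       B
        ____    ____
 A    1.000
 B    0.500   1.000
 C    0.300   0.200
        C
        ____
 C    1.000
"""
names, matrix = parse_blocked_lower_triangle(sample)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting list of rows can then be written out as a square matrix for use in other software. Variable names that themselves look like numbers would confuse this parser.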

I am examining nested models with categorical outcomes using WLS across 19 groups and a large total sample size of about 14,000. What is the recommended approach to testing nested models under these conditions?

You would do chi-square difference testing in the regular way using WLS. You would look at the difference between chi-square for two nested models and the difference in the degrees of freedom for the two models. A chi-square table will tell you whether the difference is significant for the number of degrees of freedom. Note that difference testing is not appropriate for WLSM and WLSMV.
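The table lookup described above can be sketched in Python using only the standard library. The function names and the chi-square values in the example are hypothetical; the tail probability uses the closed-form expressions that hold for integer degrees of freedom. As the reply notes, this applies to WLS chi-squares, not to WLSM or WLSMV.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (k + 0.5)
    return total


def chi_square_difference(chi_restricted, df_restricted, chi_free, df_free):
    """Difference test for two nested models: returns (diff, df_diff, p)."""
    diff = chi_restricted - chi_free
    df_diff = df_restricted - df_free
    if df_diff <= 0:
        raise ValueError("the restricted model must have more df")
    return diff, df_diff, chi2_sf(diff, df_diff)


# Hypothetical WLS results for a restricted and a less restricted model.
diff, df_diff, p = chi_square_difference(120.0, 52, 110.0, 50)
print(diff, df_diff, round(p, 4))
```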

I performed CFA with dichotomous items. Using WLSMV, the one-factor model had chi-square=142.188, df=84. To compute NNFI and CFI, I also ran the null model and got chi-square=776.778, df=73. What I could not understand is why the null model has a smaller df value. In conventional SEM with continuous variables, the null model has a larger df value. Can you please explain this? My computed RMSEA=.06 and NNFI=.928. Can I conclude that my model fit is good?

Above you suggest that weighted WLSMV and WLSM models may converge where a WLS model won't because the weight matrix is not inverted under these procedures. What do you recommend if a weighted WLSMV model doesn't converge whereas the unweighted WLS, WLSM, and WLSMV models do (Mplus returns an error message stating the model may not be identified)? Would extreme skew of the weights generate this type of error?

AIC is based on the maximum loglikelihood value. The WLSMV estimator does not maximize the loglikelihood so this value is not available for WLSMV. If an AIC could be computed, non-nested models can be compared using it.

You can look at a variety of fit measures like SRMR, CFI, TLI, RMSEA, the new WRMR that will be in Version 2 etc. You can't say that one is statistically better than the other. It would be a qualitative comparison. You can also look at the p-values from the chi-squares.

A question came up on SEMNET why Mplus has not included the test of underlying normality for categorical outcomes. Background reading for such testing includes my chapter Muthen (1993) Goodness of fit with categorical and other nonnormal variables (pp. 205-234). In Bollen & Long (Eds.) Testing Structural Equation Models. Newbury Park: Sage. This chapter points out that these tests can be useful, but also that they may be overly sensitive and frequently lead to rejection for reasons that may not be important enough to warrant abandoning the use of polychorics/polyserials. It suggests that often only one or two cells cause the rejection e.g. due to irrelevant causes such as response style (here a polychoric may actually serve to smooth the bivariate distribution and in some sense give a better correlation than the usual Pearson product moment). Also, the approach used in Mplus makes it possible to relax the assumption of underlying normality in the frequent situation where there are covariates in the model, instead using a regression-based approach that only assumes conditional normality. Here, the normality pertains only to the residuals in the regression of the categorical outcomes on the covariates and a bivariate normality test would be irrelevant.

We have attempted to fit one- and two-factor models to a set of 11 dichotomous variables, using Confirmatory Factor Analysis with tetrachoric correlations in Mplus v2.01.

Curiously, the chi-square tests of model fit for both the one and two-factor models have 27 degrees of freedom. Surely the two-factor model should have one less degree of freedom, due to the correlation between factors?

If you are using the WLSMV estimator, which is the default estimator for categorical outcomes, the degrees of freedom are not calculated in the regular way but according to formula 110 which can be found on page 358 of the Mplus User's Guide. I think this is probably what is happening.

Using Mplus 2, I am testing a model with five continuous latent variables and one binary observed endogenous variable. My sample size is 314. I have three questions:

1. Is the sample size large enough to accurately estimate parameters using the WLSM method? I recall reading on SEMNET that WLSM could work with small samples, but my small sample size makes me a little nervous. Do you see this as a limitation?

2. Is there any way to compute indirect effects (correlations) given that one of my endogenous variables is observed and binary? I understand that probit regressions are used, and I am not sure how that would affect the indirect effects. Actually, I do not see where Mplus provides indirect effects at all. Could you point me in the right direction?

3. Is there a way to obtain the correction factor needed to do a chi-square difference test with WLSM in Mplus?

Models with 12 observed variables were studied for WLSM and WLSMV. Sample sizes as low as 150 looked OK. So I think this should be OK, although sample size considerations must take into account the number of free parameters and the distribution of the variables, among other things. We recommend WLSMV for categorical outcomes.

Indirect effects are not automatically computed in Mplus. You would have to do them outside of the program. You would multiply the regression coefficients together. It doesn't matter if they are probit or regular or combinations. You can see how to compute the standard errors in Bollen's book.
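A minimal sketch of doing this outside the program, for a single two-step path. The standard error shown is the first-order delta-method approximation often attributed to Sobel, which treats the two coefficient estimates as uncorrelated; whether that is the exact treatment the reply has in mind from Bollen's book is an assumption. The function names and the coefficient values are hypothetical.

```python
import math


def indirect_effect(a, b):
    """Indirect effect of a two-step path: product of the two coefficients."""
    return a * b


def first_order_se(a, se_a, b, se_b):
    """First-order delta-method SE of a*b, assuming uncorrelated estimates."""
    return math.sqrt(b * b * se_a * se_a + a * a * se_b * se_b)


# Hypothetical coefficients: a is the path into the mediator, b the
# (possibly probit) path from the mediator to the outcome.
a, se_a = 0.5, 0.1
b, se_b = 0.4, 0.2
effect = indirect_effect(a, b)
se = first_order_se(a, se_a, b, se_b)
print(round(effect, 3), round(se, 4), round(effect / se, 3))
```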

There is no scaling correction factor for WLSM or WLSMV. This is under development.

I have 19 observed variables and between 139 and 145 df for the models I am testing. Is there a specific paper I can cite regarding how well different types of models behave with different sample sizes under WLSMV? Also, is there a paper that describes the benefit of using WLSMV versus WLSM with a categorical outcome? I am looking for the proper source to cite. Thank you.

Linda Muthen wrote in a previous post: Models with 12 observed variables were studied for WLSM and WLSMV. Sample sizes as low as 150 looked OK. So I think this should be OK, although sample size considerations must take into account the number of free parameters and the distribution of the variables, among other things. We recommend WLSMV for categorical outcomes.

I am doing a multigroup analysis using categorical, observed, dependent variables. I would like to test configural invariance of the model and I am having trouble figuring out how to specify certain parameters.

Here is what I would like to do:

A) Fix the first item's factor loading to 1 to set the scale of the factors in each group, with freely estimated factor loadings not constrained to be equal across groups ***I am pretty sure I have done this correctly***

B) Set the mean of each factor ***not sure how to do this; I know that for continuous variables I would fix the first item's intercept to 0 but this doesn't seem to work with categorical variables***

C) Relax equality constraints for intercepts (thresholds) ***not sure how to do this or if this would be the default once the mean for each factor is set***

D) Allow variances, covariances and means to be freely estimated and heterogeneous across groups ***I believe this is the default in Mplus***

E) Allow covariances between like items' uniquenesses to be estimated across groups ***Not sure how to do this***

I would be happy to send you a copy of the mplus command file that I have put together if that would help with these questions.

I am trying to interpret a WRMR value of 1.24 in a structural model that contains a dichotomous endogenous variable (I used the WLSMV method). I know from Appendix 5 of the manual that WRMR values of .90 or lower are considered indicators of good fit. What is the range of WRMR? How far from "good fit" is 1.24?

WRMR is a descriptive fit index and its statistical distribution is not yet known. This is much like the situation for CFI/TLI. In this sense, we don't know how far a TLI of 0.60, say, is from 1.0. Nor do we know how far 1.24 is from 1. WRMR ranges from 0 to infinity. In more recent work, we conclude that perhaps 1.0 is a better cut off than 0.9. More studies would be needed to decide "how bad" higher ranges such as 1.0-1.3 are in practice.

I have attempted to fit one- and four-factor models to a set of 21 dichotomous variables, using Confirmatory Factor Analysis with tetrachoric correlations in Mplus v2.01. The output mentioned that I cannot use a chi-square difference to compare the fit of these two models. I am just wondering if there is any alternative approach to compare the fit of two models? Thanks a lot.

It doesn't sound like these models are nested. Therefore, chi-square difference testing would not be appropriate. For nested models, we recommend using WLS for chi-square difference testing and then using WLSMV for the final model.

I have a question on the scale freeness and scale invariance of ULS. I ran several item factor analyses with NOHARM and Mplus using ULS on the basis of binary data. I had 124 items and 407 subjects. I used NOHARM with raw product moments as input data. The Mplus analyses were based on tetrachoric correlations.

When comparing the results, slight differences occurred in the standardized loadings for less complex models. The largest discrepancies resulted for a nested factor model (e.g. each item got a loading on one general factor and on one group factor). The ULS discrepancy functions were completely different in all model runs. Are these signs of a lack of scale freeness and invariance? Your comments are highly appreciated. Thanks, Martin.

The Mplus ULS estimator applied to tetrachoric correlations is a scale free approach. I'm not familiar with the NOHARM approach, but it may consider a different model than the tetrachoric model, which could account for the discrepancy function differences.

Thank you for the great software and this useful discussion list. I have a case of the "missing SRMR". I have found with categorical indicators that when doing multi-group modelling, doing a CFA with a covariate (e.g. a MIMIC model), or outputting factor scores, the SRMR no longer appears in the output. Is there a reason for this? How can I get it back (short of manually outputting the observed and fitted correlation matrices)?

The mystery of the missing SRMR is solved. We do not compute SRMR for categorical outcomes when there is one or more covariates in the model. This is because the sample statistics are not correlations in this situation. They are probit thresholds, regression coefficients, and residual correlations. WRMR looks at the difference between these sample statistics and their model estimated counterparts.

I have a question about WLSMV df. I have noted from posts here and from my own models that the df for the baseline model can be *smaller* than that of the fitted model, and I recognise that the methods of computing df involve a complex matrix formula. Is it possible to give a "layperson" explanation for how this can be, as it seems unintuitive (at least to me) to estimate more parameters yet also have greater df?

Also, there have been references above in this message board to the paper Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika.

The WLSMV df is "estimated" to get as close of an approximation to a chi-square distribution as possible and is therefore both model and data dependent. Hence it doesn't have a substantively interpretable meaning as in regular chi-square testing. The p value is what can be interpreted here.

The Muthen et al paper is not published yet and I have not yet taken the time to do the final revision for putting it in press - too many other important papers to write. But, I am happy to send the paper in its current version.

I am estimating a model with 5 observed variables and 4 latent variables (with 2-3 indicators per latent variable). One of my observed variables is dichotomous, so I am using WLSMV estimation. Sample size is 204. I am using Mplus version 2.

I have estimated a measurement model, where my 9 variables are correlated, and a structural model, which has 22 more constraints than the measurement model. Fit indices are:

I understand that the chi-square is mean and variance adjusted and that degrees of freedom are not computed the traditional way. My question is about CFI, TLI, and RMSEA. I am not used to having the structural model (with more constraints) fit better than the measurement model (with fewer constraints). I am guessing that this is an anomaly from the method of computing chi-square and degrees of freedom. Is this correct?

No, this is probably not the case. It is more likely if you look at TECH1 for each run that there is a default that is changing the structural model in a way that you are not expecting. If you can't figure this out, please send me the two outputs including TECH1.

I have a few questions about nested growth curve modeling with binary outcomes. My data are highly skewed: across the six outcomes, the number of 0s ranges from 323 to 355 and the number of 1s from 28 to 60. I ran two nested models with two parameterizations (i.e., 1) fixing the mean intercept at zero and holding the thresholds equal across time and 2) fixing the first threshold at zero and freeing the intercept factor). I would like to know if my code is correct for each parameterization method.

I am particularly interested in the mean and significance of the intercept (i.e., if individuals vary significantly on 'dicomp' at time 1). So, I would like to use the second method (i.e., not set the intercept to zero). However, with this method the SEs cannot be calculated and the output highlights a problem with parameter 7, the slope. Can you provide me with any suggestions?

Finally, could you please relay the RMSEA, TLI, and CFI values that would indicate a 'good or reasonable fit' for categorical outcomes? Has any research been conducted recently on this topic?

To do the chi-square difference test, is it better to use WLSM with the adjusted formula on the statmodel homepage, or WLSMV with the DIFFTEST option? In other words, which of the two estimators is more appropriate for calculating the chi-square difference test?

As a novice with Mplus, I have a few questions concerning EFA and CFA with dichotomous observed variables.

First, in EFA, I wonder why sometimes I don't get the RMSEA for, say, the 3-factor solution. What I mean is this: I'm using WLSMV, and I get the RMSEA for the 1-factor solution (RMSEA = 0.02, for example) and also for the 2-factor solution (RMSEA = 0.005, for example) but not for the 3-factor solution. Does this mean that the RMSEA is so close to 0 for the 3-factor solution that it is not in the output?

Second, I wonder if it is possible to get eigenvalues in a CFA with dichotomous data ? (with WLSMV)

Third, I would like to know if it is possible to test different models, nested or not (with WLSMV), by a chi-square model fit test or by using AIC, BIC, ... I noticed that I can get AIC and BIC with continuous observed variables but not with dichotomous observed variables... Maybe you can please tell me if there is an option I can add so that I can get AIC and BIC with dichotomous data (and with WLSMV). Or maybe you can please tell me the formula to calculate it?

I don't know exactly why this is so but I believe that there has been a change related to when RMSEA is zero. If you are not using the most recent version of Mplus, please download it. If you are, please send your output and data to support@statmodel.com.

You cannot get eigenvalues through CFA but you can get them through EFA. They would be the same because they are based on the data, not on the estimated model.

You can use the DIFFTEST option to do chi-square difference tests for WLSMV. AIC and BIC are not available because they are based on maximum likelihood estimation.

Thanks for your reply! Just to add a few more questions: you said that "AIC and BIC are not available because they are based on maximum likelihood estimation"

But how come that I can get AIC and BIC even when using WLSMV with continuous data (but not with binary data)?

About the DIFFTEST option to do chi-square difference tests for WLSMV, we can only use it to compare nested models, right? Can you explain to me what exactly "nested models" are? I want to compare models with 2 factors and models with 3 factors (with the same binary observed variables). But if these aren't nested models, what can I do to know which model (the 2-factor solution or the 3-factor solution) is the best one? (That's the reason why I want to use AIC and BIC.)

Would you suggest that I use maximum likelihood estimation to get the AIC and the BIC, even if I have binary data?

WLSMV is not allowed when outcome variables are continuous. I think if you look at your output, the estimator will be ML and there will be a warning telling you that WLSMV is not allowed.

Chi-square difference testing is appropriate only for nested models. Nested models are generally models that are special cases of the same set of observed variables. The 2 and 3 factor solutions would be nested but the difference test may not be valid because some parameters are on the border.

I am having a problem regarding fit indices with my mediational model. The final outcome variable is an observed ordinal variable with 4 levels. There are 2 mediators and 3 independent variables all measured as continuous latent variables. The model includes one direct path between the final outcome and 1 of the exogenous variables.

In Mplus Version 3 using WLSMV estimation:

CFI = 0.071 TLI = 0.912 RMSEA = 0.099

The problem is not due to the latent factors because the fit of the 5 factors using WLSMV estimation was not nearly as bad (CFI = 0.873; TLI = 0.861; RMSEA = 0.063).

The inconsistent fit of the CFI in the mediational model appears to stem from the estimation of the baseline/independence model. When comparing my model to the baseline model, the chi-square statistic goes down and the degrees of freedom get very small (see below). I know that WLSMV estimation computes degrees of freedom differently than ML, but I don't understand why the chi-square is decreasing when no covariation is assumed among the latent and final outcome variables.

When I used MLM estimation and assumed the final outcome variable was continuous, the fit was much better (CFI = 0.879; TLI = 0.869; RMSEA = 0.052) and the chi-square statistic for the baseline model increased as expected.

Dr. Muthen, I am running path models to test a theory in Mplus 3.01. I have continuous exogenous variables, and continuous and binary endogenous variables, predicting a single binary outcome. Can I get the total effect for each predictor on the outcome? I understand that I cannot simply add the indirect and direct paths with binary outcomes.

When testing nested models using the DIFFTEST, is each model run separately and then the output from step 2 generates a chi-square value and df based on WLSMV for each model? Is it then appropriate to perform a chi square difference test using these values?

For WLSMV, the chi-square difference test is computed using the derivatives from the H0 and H1 models. You cannot obtain a proper chi-square difference test using the chi-square statistics from the H0 and H1 models.

First, you should update to version 3.12 - you received a message about this if you are a licensed user and sent in your registration card.

The message can be obtained for several reasons as is listed in the version 3.2 change of your warning message: negative factor variance, factor correlates 1 with other factors, factor is involved in linear dependencies with other factors, etc.

I have two questions about how the df are calculated in factor analysis models with categorical outcomes. I have three categorical (binary) variables and I have fitted a factor analysis model with some specific constraints on the factor loadings and the residual variances. Because of the constraints on the residual variances I use the THETA parameterization. I have two datasets that are almost the same: one dataset has 6 possible response patterns (3 binary items give 8 possible patterns, but the two patterns "0 0 0" and "1 1 1" are excluded) and the other dataset also includes the "0 0 0" and "1 1 1" patterns (so all 8 possible patterns are in the data). I fitted exactly the same model to both datasets, using the WLSMV estimator. My questions are: (1) How are the df of the fitted model calculated? I couldn't find the formula in the Mplus manual. Is it "(total number of threshold parameters + total number of tetrachoric correlations) minus (total number of parameters in the model)", or something else? (2) The df for the baseline model differed by 1 between the two datasets, even though the same model is fitted. Why would this happen? I thought the df calculation should not be data-dependent. Did I miss something here?

Please let me know if I should send the input and output files to you.

1. The formula for the degrees of freedom for WLSMV can be found in the technical appendices on the Mplus website. I think the formula is 110 but I don't have this available right now. You can obtain degrees of freedom of the type you describe using WLS or WLSM.

2. The degrees of freedom for baseline models on two datasets can differ. Sample size is also involved in the degrees of freedom for WLSMV.
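The count described in the question (thresholds plus correlations minus free parameters) can be sketched in a few lines of Python. This is only an illustration of that bookkeeping for WLS or WLSM as mentioned in point 1; it does not reproduce the WLSMV degrees of freedom, which follow the formula in the technical appendices. The function name and the parameter count in the example are hypothetical.

```python
def wls_degrees_of_freedom(categories_per_item, n_free_parameters):
    """Sample statistics (thresholds + correlations) minus free parameters.

    categories_per_item: number of response categories of each ordinal item.
    n_free_parameters: free parameters in the fitted model (assumed known).
    """
    n_items = len(categories_per_item)
    # An item with c categories has c - 1 thresholds.
    n_thresholds = sum(c - 1 for c in categories_per_item)
    # One tetrachoric/polychoric correlation per pair of items.
    n_correlations = n_items * (n_items - 1) // 2
    return n_thresholds + n_correlations - n_free_parameters


# Three binary items give 3 thresholds + 3 correlations = 6 statistics.
# With a hypothetical 5 free parameters, 1 degree of freedom remains.
print(wls_degrees_of_freedom([2, 2, 2], 5))
```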

I am estimating very similar models on the same sample, but getting very different model fit and modification indices. Specifically, I am estimating 2 models on the same sample (N=1,602)...

Model 1: X1,X2-->M1-->Y1 Model 2: X1,X2-->M1-->Y2

Where M1 is continuous, Y1 is binary (any visit vs. no visit), and Y2 is continuous (# visits; logged+1). X1 and X2 covary. I adjusted for clustering in both models (type = complex), and both have missing data on X1 and M1 variables (not on Y's).

Model 1 is estimated using WLSMV. Model fit was poor (CFI=.53; RMSEA=.07; WRMR=1.99). Modification indices suggested adding "X2 ON X1" to the model. I reestimated and model fit was acceptable.

Model 2 (estimated with MLR), immediately had acceptable model fit, and did not suggest the modifications in Model 1. Given that X1-->M are the same for both equations, I cannot understand why I'm getting such divergent model fit indices and suggested modifications in that portion of the model.

Any thoughts/suggestions? I am presenting both models in the same paper, so large differences in paths that are consistent across models are sure to raise flags.

I am confused by your Mplus input compared to your introductory description. In the latter you say "Modification indices suggested adding "X2 ON X1"", but x2 and x1 are factor indicators in the Mplus input. In terms of the Mplus input for the categorical run, which parameter did the MIs suggest including? Generally speaking, I wouldn't be surprised if a model with a binary outcome y1 fits differently than the same model with a different, continuous outcome. Also, how come f2 and f3 are not regressed on x9-x30?

Where Y1 is any doctor visit versus not (0,1); Y2 is number of doctor visits (log(#visits+1)); F1-F3 are latent variables with multiple indicators; and X9-X30 are observed variables.

Model 2 fit the data well.

Model 1 did not, and suggested adding "F2 ON F3."

For context, my primary hypothesis is that F1 mediates the effects of F2,F3,X9-X30 on Y1,Y2. I will be submitting to health services research journal, where it is common to present utilization of health service results first as logit model for any vs. none, and then as OLS for total # visits...only the dependent variable changes in these models.

I would like to treat F2,F3,X9-X30 as exogenous and assume they are correlated as in multiple regression.

Returning to my problem...given that Y1 and Y2 are similar, and the rest of the model is identical, a red flag was raised when results indicated the "final model" would need to differ considerably to reach acceptable model fit.

Normally I would attribute the difference to the difference in outcome variables, but I also reran Model 1 treating Y1 as continuous, and the model fit similarly to Model 2 -- good fit with no major modifications suggested. From what I can see, it appears the difference is in the estimator (WLSMV versus MLR).

I would have thought that, if the covariances between F2,F3,X9-X30 were freed (as I would like them to be), then there would be very few degrees of freedom remaining for model fit and modification indices -- as it approaches a typical multiple regression specification. Whereas the MLR output lists covariance estimates between F2/F3 and X9-X30, the WLSMV output only lists a covariance estimate between F2 and F3.

Do I need to include WITH statements in WLSMV if I want to make the same "ceteris paribus" statement for my primary hypothesis in models 1 and 2?

If I am specifying this correctly, and I wish to present both in a paper, which model is more conservative?

I see. WLSMV does not automatically correlate x's (observed covariates) and exogenous factors (see bottom of this message), whereas this is done when all dependent variables are continuous. I would suggest making the 2 types of models (for y1 and y2) compatible by having in both

f2 f3 on x9-x30;

This then - for both types of models - makes f2 and f3 related to the x's and also lets f2 and f3 have a residual covariance. Then the results should be more compatible. Regressing the f2 and f3 factors on the x's is probably a realistic representation given that the x's may well be antecedent to the factors.

The reason why WLSMV does not correlate x's and exogenous factors is that the x's then become part of the model instead of being conditioned on, and this then forces model fitting via latent correlations instead of via the regression-based approach favored in Muthen (1984) - see Mplus references.

I have run a simple path model that exclusively involves observed independent and observed dependent variables. My independent variables are a combination of binary and continuous variables. My 3 dependent variables are all binary. Mplus provided fit indices for this model. However, I'm not clear as to whether these fit indices are appropriate to report. In other words, how are these fit indices computed in simple path analysis with no latent variables?

I'm interested in computing the population Gamma Index. But I wonder if it is possible to compute the population noncentrality index (NCI = (chi-square - df)/(N - 1)) for the WLSMV chi-square estimate. Does it make sense in the case of WLSMV?
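For reference, the arithmetic in this question can be sketched as follows. The noncentrality estimate uses the formula quoted in the post; the Gamma index is written in the form commonly attributed to Steiger, p / (p + 2 * F0), where p is the number of observed variables. Truncating negative values at zero is an assumption made here, and whether these quantities are meaningful for a WLSMV chi-square is exactly the open question, so the input values are purely hypothetical.

```python
def noncentrality_index(chi_square, df, n):
    """(chi-square - df) / (N - 1), truncated at zero (assumption)."""
    return max(chi_square - df, 0.0) / (n - 1)


def gamma_hat(chi_square, df, n, n_observed_variables):
    """Gamma index as p / (p + 2 * F0), with F0 from noncentrality_index."""
    f0 = noncentrality_index(chi_square, df, n)
    return n_observed_variables / (n_observed_variables + 2.0 * f0)


# Hypothetical values: chi-square 150 on 100 df, N = 501, 10 observed variables.
print(noncentrality_index(150.0, 100, 501))
print(gamma_hat(150.0, 100, 501, 10))
```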

Prof Muthen, Thank you for your reply to my posting on Fri. I did indeed leave out arrows between observed predictor variables within my model containing only observed variables. To clarify, are you saying that the fit indices (i.e., chi-square, CFI, RMSEA) are measuring to what degree I have accurately accounted for the relationships among all of my observed variables within the model, even among my observed predictor variables? Thank you!

Re: May 31 - 10:46. Typically, there would be no left-out arrows among the predictor variables (called x's in Mplus jargon). As in regression analysis, the x's are freely correlated and not part of the model. Typically, a model says something about the relationships between the x's and the y's, and among the y's. So, in typical cases, you don't even mention the correlations among the x's. But if you specify that the x's are not correlated, they are brought into the model and the correlations among them contribute to the fit assessment.

Prof Muthen, Thank you again for your helpful response to my posting for May 31 at 10:46am. In the model that I am running, I have multiple observed mediators. In this case, I assume that the mediators are "y" variables according to Mplus, given that they are regressed upon x variables within the path model structure. In path analyses such as this one, where multiple observed mediators exist, do the relationships among the observed mediators factor into the fit indices? I see in the output that Mplus estimates the relationship among the terminal y variables (i.e., the y variables that are the ultimate outcome in the mediation pathways), as indicated by "WITH" statements in the model output. Does Mplus generate similar estimates among mediating variables? I do not see that this estimated in my current model output. Thank you!

If you do not see parameter estimates in the Results section of the output, then those parameters are not being estimated. If you want them to be estimated, you will need to explicitly add them to the MODEL command to override the Mplus default of having them fixed to zero.

Regarding my previous postings that referenced my path model that exclusively contains observed variables and multiple potential mediators - Forgive my ignorance as a novice Mplus user, what syntax would allow me to specify the pathways among my mediating variables? Would I use regression equation syntax (e.g., MediatorB ON MediatorC), or is there a better method? Ideally, I would prefer to account for the correlations among the mediators without specifying a hypothesized causal direction. Thank you!

I am a first-time visitor to Mplus, and have an urgent question that needs an answer right away. How can I use your service to calculate a chi-square difference test for my a priori model and a rival model to see which one has a better fit? Can you provide me with a step-by-step guide?

I have a question about SEM with categorical outcome variables. I thank you in advance for your assistance.

Measurement Latent Outcome Y1: The single latent factor Y (Alcohol Abuse Disorder age 35) is being created from 19 ordered categorical variables (0,1,2,3) for 8008 weighted cases, no missing data. These observed variables are predominantly 0s. Error: I would like to estimate the uncorrelated error variance of each of the observed variables, allowing for measurement error. In a separate step I would also like to set the error variance to 20% Independent Variable: is a single observed variable V308, heavy drinking age 18.

Structure Latent X as measured by a single continuous indicator V308 predicts the outcome Y. X is also non-normal with a predominance at the lower end of a 1-6 scale.

Initially I ran just the measurement part of this model (step A. in the MODEL below) and the rmsea was .048 and TLI .989.

When I run the full model below I get the following message: THE MODEL ESTIMATION TERMINATED NORMALLY THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 21.

I do get model estimates STD and STDYX, but I do not get model fit statistics.

Also, in tech5 I get the following message 19 times. I know we have bivariate zero cells, but am not sure if this is the source of the problem. ZERO CELL PROBLEM: IR, J & K = 1 2 1

My questions are: 1. Have I misspecified the intended model? 2. Why do I not get any fit statistics? Is this a problem due to lack of variance? 3. How do I use the information in the tech outputs to determine where problems lie. The variable it identifies as a problem is not very different from many of the other variables in the model. 4. In a separate step I would also like to set the error variance of the observed variables to 20%. How do I do this?

I completed several CFAs of a measurement model with five factors and 40 ordinal observed variables (six points) using both MPlus 4.0 and Lisrel 8.72. To account for the ordinal nature of the variables, I based the analyses on polychoric correlations in Lisrel (using the asymptotic covariance matrix) and I specified these variables as categorical in MPlus. The thresholds that I obtained were the same for both software packages. However, I found that the fit indices were not the same. In particular, the RMSEA was consistently higher when estimating the model in MPlus regardless of which estimator was used (WLSM or WLSMV), and the CFI was consistently lower when using the WLSMV estimator in MPlus. Here is an example:

I understand that the WLSMV estimator results in a different chi-square with estimated degrees of freedom. But can anyone explain why the other fit indices, in particular the RMSEA, are not consistently the same? I am unclear about how to interpret the RMSEA because the value obtained from Lisrel seems to be quite supportive of global fit whereas the value obtained from MPlus would be indicative of poor fit. Thank you.

What is your sample size? Are you using LISREL's diagonally weighted least-squares estimator? I don't know how LISREL computes RMSEA, but the CFI difference of 0.96 versus 0.72 would seem to be straightforward to try to understand given that the formulas are transparent in this case. I would start with comparing the baseline test of a diagonal correlation matrix that CFI uses.

Thanks Bengt. The sample size for the above is 1444 and I was using the DWLS estimator in Lisrel. My understanding is that this should be the same as the WLSM estimator in Mplus. Is that correct? I also tried different sample sizes and different CFA models, but the results consistently show a larger RMSEA and a smaller CFI when using Mplus in comparison to Lisrel. So the different global fit values remain puzzling to me. Also, although I checked that the observed correlations and thresholds are the same (and, obviously, the model specification is the same), the estimated correlations and residual correlations are slightly different when comparing both programs. Unfortunately it is beyond my mathematical capability to figure out why this might be. Should I be expecting the same results from both software programs, or is there a fundamental difference between the two with respect to CFA models? Thanks so much.

I think LISREL's DWLS is different from Mplus' WLSM and WLSMV because of differences in how the weight matrix is estimated - although asymptotically they are most likely the same. Because of this, the parameter estimates will be slightly different - as well as chi-square and the fit indices derived from chi-square. But it seems to me that a quantity such as CFI should be rather similar. If LISREL reports the fit of the "baseline" model (uncorrelated variables), you can compare that to Mplus and see if that's where the difference begins. Which approach is best seems like a question that calls for a simulation study. The Hu simulation that we have posted on the website shows that CFI works well in the WLSMV context.

Yes, the chi-square for the independence model is indeed substantially different. For your information, the following values pertain to the same model as the one I reported yesterday:

Lisrel 8.7 (DWLS estimator): baseline chi-sq (df): 514052.97 (780)
MPlus 4.0 (WLSM estimator): baseline chi-sq (df): 960726.69 (780)
MPlus 4.0 (WLSMV estimator): baseline chi-sq (df): 60353.343 (49)

I am surprised by the magnitude of the difference, but this does seem to explain why the global fit indices are different. I agree that a simulation study would probably be very informative. I am still puzzled about why the CFI based on the WLSM in Mplus (CFI = 0.94, which is fairly close to the one obtained when using Lisrel) is so different from the CFI based on the WLSMV estimator (CFI = 0.72).
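To see why the baseline chi-square matters so much here, it may help to write out the usual CFI formula, which compares the excess of chi-square over df for the fitted model with the same excess for the baseline model. The sketch below is in Python and uses entirely hypothetical numbers, not the values from this thread; it only shows that the same model misfit yields a lower CFI when the baseline chi-square is smaller.

```python
def cfi(model_chi2, model_df, baseline_chi2, baseline_df):
    """Comparative fit index from model and baseline chi-square values."""
    d_model = max(model_chi2 - model_df, 0.0)
    # The denominator is never allowed to fall below the model's own excess.
    d_baseline = max(baseline_chi2 - baseline_df, d_model)
    if d_baseline == 0.0:
        return 1.0
    return 1.0 - d_model / d_baseline


# Hypothetical model: chi-square 500 on 100 df, against two hypothetical baselines.
print(round(cfi(500.0, 100, 5000.0, 120), 3))  # large baseline chi-square
print(round(cfi(500.0, 100, 2000.0, 120), 3))  # smaller baseline chi-square
```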

I am wondering about the estimation in categorical models. I have estimated the same model in Mplus (in Mplus you would call it a factor model with ordered categorical indicators, all loadings fixed to 1) as well as in SAS using PROC NLMIXED and in a specialised software for Rasch models (Winmira). PROC NLMIXED and Winmira results are only trivially different which could be expected considering that they apply different estimation techniques (marginal ML vs. conditional ML). The log-likelihood that Mplus reports is, however, non-trivially smaller than that for the other two programs. Could that have anything to do with the message: "** Of the 15625 cells in the latent class indicator table, 36 were deleted in the calculation of chi-square due to extreme values." Are "extreme value" cells already excluded when maximizing the likelihood function or only afterwards?

After digging through the technical appendices I found out that Mplus uses the proportional odds model for regressing ordered categorical variables on the latent variables, whereas the models I estimated with PROC NLMIXED use an adjacent category logit link - both result in the same number of parameters, but I have not yet figured out if this could possibly influence model fit...

Hello Bengt and Linda, I have the following problem: I am testing a model with a binary dependent variable (WLSMV estimation) and a few latent factors as independent variables, with a sample size of 400. I am getting a very nice RMSEA of 0.043 and WRMR of .984, chi-square = 222.001 with df = 127. On the other hand my CFI equals 0.781 and TLI equals 0.838. Would you think that this model has an acceptable fit, even though the CFI is low?

I could not find any references except Yu's dissertation about the good/bad cutoff points for fit indices.... I thought maybe some new suggestions had come out for models with BINARY dependent variables.
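As a side check, the RMSEA reported in this post can be recovered from the chi-square, df and sample size given above with the usual single-group formula. The Python sketch below assumes the divisor N rather than N - 1 (programs differ on this); with these values both give 0.043 after rounding.

```python
import math


def rmsea(chi_square, df, n):
    """Single-group RMSEA: sqrt(max(chi-square - df, 0) / (df * N))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


# Values from the post: chi-square = 222.001, df = 127, N = 400.
print(round(rmsea(222.001, 127, 400), 3))  # 0.043
```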

I imagine that you have a p-value greater than .05 for the chi-square given the RMSEA. I suspect you have low correlations among your observed dependent variables making it hard to reject the H0 model. I would not conclude that this model fits the data.

I have been examining models with all dichotomous variables and using WLSMV to provide model fit indices. I was curious about the chi-square values that are given by using MLR. When I have run the same models, I see that MLR provides chi-square values. However, since WLSMV is default, I have trusted that this is the most appropriate estimator.

In short, is it appropriate to use the chi-square values from models using MLR both for model fit and comparisons?

Hello Linda, Thank you very much for the response. About the low correlations: This is a longitudinal model with 3 time points, with the binary dependent variable at the last time point. So in that case the correlations are not very high between time2 and time3. Is there any way I could improve the model with the low correlations?

I have tried running the same model, that has all 8 measured variables as dichotomous, using WLS, WLSMV, and MLR. WLS gives the "expected" number of degrees of freedom (28). WLSMV gives the appropriate number of degrees of freedom based on the robust correction (24). However, I cannot figure out how the df works for MLR. I think that the computation is based on total possible cells in the distribution (i.e. 2^8 = 256). However, there are 7 parameters being estimated in the model and the model reports 248 df. Are the total df computed as [(2^8)-1] = 255 in this situation?
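The arithmetic proposed in this question can be checked directly. The sketch below, in Python, counts the cells of the full response-pattern table, subtracts one because the cell probabilities sum to one, and subtracts the free parameters. It only reproduces the count suggested in the post; it is not a statement of how Mplus defines the df for every estimator.

```python
def frequency_table_df(categories_per_item, n_free_parameters):
    """df for a test against the unrestricted multinomial frequency table."""
    n_cells = 1
    for c in categories_per_item:
        n_cells *= c  # each item multiplies the number of response patterns
    return (n_cells - 1) - n_free_parameters


# Eight binary items and 7 free parameters, as described in the post.
print(frequency_table_df([2] * 8, 7))  # 248
```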

The degrees of freedom for WLSMV are not computed in the regular way and should not be used in the regular way. The degrees of freedom for MLR should be the same as for WLS. It may be that some defaults are different and you are getting different parameters. If this does not help you, please send the input, data, and output for both WLS and MLR to support@statmodel.com along with your license number.

Two questions please! (1)How do I use BIC and AIC to judge the goodness of fit in an (one parameter or two parameter logistic) IRT model? (2) Pearson Chi-square and Likelihood Ratio Chi-square, when do they agree/not agree and why?. Is there a note or a reference which may help to understand these relations in a straightforward way?. Many thanks for any suggestions
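For question (1), the information criteria themselves are simple functions of the loglikelihood, the number of free parameters and the sample size, and lower values are preferred when comparing models fitted to the same data. A minimal Python sketch with hypothetical inputs follows; the sample-size adjusted BIC is written with the commonly cited (n + 2) / 24 adjustment, which is stated here as an assumption about the definition in use.

```python
import math


def information_criteria(loglik, n_params, n):
    """AIC, BIC and sample-size adjusted BIC from a maximized loglikelihood."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n),
        # Adjusted BIC replaces n by (n + 2) / 24 (assumed definition).
        "SABIC": deviance + n_params * math.log((n + 2) / 24.0),
    }


# Hypothetical 1PL and 2PL fits to the same 500 respondents.
one_pl = information_criteria(loglik=-3000.0, n_params=11, n=500)
two_pl = information_criteria(loglik=-2990.0, n_params=20, n=500)
print(one_pl["BIC"] < two_pl["BIC"])  # True means BIC prefers the 1PL here
```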

Hello, I am computing an LCA and use TECH11 (like K. Nylund in the web seminar) to get the threshold values and their related probability scales. Unfortunately TECH11 does not display any probability scales. Any suggestions? Thanks for your help, Stephan

Hi Linda, no, I guess it's not the p-values. After computing an LCA I get threshold values and related estimates, S.E., etc.

Here K. Nylund says that it is easier to interpret the estimates by looking at the probabilities, and switches to the next table. Hence users don't need to compute anything to interpret the estimates. Her output (TECH11) shows that people with a 1 on variable a have a z% probability of being in class x, and so on. Regards, Stephan

I think Karen must have meant that it is easier to look at the output probability values rather than the logit values. Mplus gives both logits and probabilities for some models, for example, models without covariates. See the output from Example 7.3 where you will find the results in both logit and probability scale. This has nothing to do with TECH11.

Would you recommend running separate models to free up a parameter so that I generate fit indices, and thus report them in the manuscript or is it ok to not report fit indices given my interest in both direct and indirect effects of anxiety on the DV and thus report the findings as a path analysis or regression analysis using Mplus etc.?

I am doing path analysis with both continuous and categorical dependent variables using the MLR estimator. All variables in the model are observed.

Following previous communication with MPlus support, my understanding is that loglikelihood can be used to assess model fit in this situation, as other fit indices are not available. The following values are provided in my output:

Max LL for the unrestricted (H1) model : -3553.456

LL Ho model: -3475.079 Ho scaling correction factor for MLR: 0.972

Questions:

1) Is the unrestricted model a fully saturated model, ie. no degrees of freedom. If not, please explain the unrestricted/H1 model and how it is estimated.

2) Is it appropriate to assess model fit of the Ho model (my analysis model), by comparing it to the H1 model ?

So, in general, values of CFI and/or TLI below 0.95, suggest that the model does not have good fit?

In my specific case, the RMSEA is less than 0.06 and WRMR is less than 0.9, but the TLI and CFI are around 0.90. I am not sure how much weight to put on each to determine whether or not the model has a good fit.

My understanding was that if the sample is large, it is easy to find significant difference between the estimated and the "perfect" model. I am working with two samples, one is 1800 cases, and the other is about 6000 cases.

It is true that chi-square can be sensitive to sample size but that does not render it useless. You can consider doing a sensitivity analysis where you free parameters until you obtain a good fit. Then compare the results to your original model. If the results from the original model are different in the less-constrained model, this would indicate the chi-square was correct in saying the model fit poorly. If the parameters stay approximately the same, it would point to chi-square being sensitive.

I have a new dataset with ordinal items, although it's an 11 point scale. MPlus limits categorical items to 10 pts. In this situation, would you recommend treating the data as continuous? Or collapsing two points (ie 1 and 2)to make it categorical? And does MPlus have a facility for doing that?

Also, in a single factor CFA with continuous indicators: is it possible to correlate errors of the dependent indicators?

----------------------- Question 1. ----------------------- Back in 2003, you commented in a response to a user’s question regarding WRMR as follows:

"There have been few studies of the behavior of fit statistics for categorical outcomes. In your case, you have a combination of one categorical and several continuous outcomes. I know of no studies of the behavior of fit statistics in this situation. The following dissertation studied fit statistics for categorical outcomes. It can be downloaded from the homepage of our website.

You may have to do a simulation study to see which fit statistic behaves best for your situation."

I wonder if you have seen any studies published since 2003 that address this issue. I myself have a similar situation: three mediating dichotomous variables, six continuous outcome variables, and three independent variables, with the following fit indices:

CFI 0.95 TLI 0.94 RMSEA 0.12 WRMR 1.16

CFI and TLI look good. But RMSEA and WRMR look problematic. Among those four, which index should I trust?

----------------------- Question 2. ----------------------- I ran a similar model using Amos with the model that I described above. I got the following fit indices and did not understand why RMSEA was so different from the one that I got using Mplus.

CFI 0.92 TLI 0.84 RMSEA 0.75 (WRMR not calculated by Amos)

One thing that I should note is that I was originally using Amos but switched to Mplus because Amos did not sufficiently address dichotomous mediating variables. So I acknowledge that the model that I ran using Amos was not exactly the same as the one that I ran using Mplus. Yet, the coefficients are pretty similar to each other. Then I don't understand why RMSEA was so much worse in Mplus than in Amos, despite the similarity of the models. I would appreciate your help with these two regards. Thank you very much in advance.

None of the fit statistics from Mplus that you show meets the guidelines recommended in the Yu dissertation. Only CFI meets the Bentler cutoff. And you don't show chi-square which is probably the most studied fit statistic. I would not be satisfied that this model fits the data.

When you treat the variables as all continuous in Amos, you do not use the same estimator or methodology. You would expect some similarity in results but not agreement. This methodology also points to model misfit.

I am running a model with a binary outcome and have calculated a series of chi-square difference tests using DIFFTEST for WLSMV. The added paths are correlations between an exogenous and an endogenous variable. You have responded to previous inquiries that the CFI and RMSEA should be examined using standard cutoffs. Although the chi-square difference tests indicate improvement in fit, the CFI and RMSEA become worse (CFI drops from .81 to .44). I have examined the same exact model with similar continuous outcomes and the addition of these correlations has improved all fit statistics. Are the CFI and RMSEA valid for interpreting improvement in fit with binary outcomes? Are there any other peculiarities of modeling with binary outcomes that I may have overlooked?

Hello. I am running a model with a dichotomous outcome variable. I have estimates of coefficients. I would like to find the predicted probablity of the outcome for each subject. Is there an Mplus command to do this? Thanks!

Is there some reason why the typical SEM fit indices cannot be calculated in categorical models using ML estimation and the logit link? The fit indices are available using WLSMV / probit.

I can imagine that one of the following is true:

1) The fit can be calculated, but it hasn't yet been implemented in MPlus. It should be possible to calculate these indices using MPlus output.

2) Fundamentally, these indices have not been defined or do not make sense.

Can you clear this up? I'd rather use ML / logit in my case because of the ease of calculating predicted probabilities. Also I am calculating indirect effects and their SEs using an equation given by Winship and Mare (1983) using the model constraint feature, but to do this using the probit link would require MPlus to have an inverse probit function. It is quite easy to do using a logit link due to the closed-form nature of the inverse logit transformation.
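The two transformations mentioned here can be sketched outside Mplus in a few lines of Python. The logit version has the closed form the post refers to; the probit version uses the standard normal CDF, written with the error function. The convention that the probability of a 1 depends on the linear predictor minus a threshold is an assumption of this sketch, and the input values are hypothetical.

```python
import math


def logit_probability(threshold, linear_predictor):
    """P(y = 1) under a logit link: inverse logit of (predictor - threshold)."""
    return 1.0 / (1.0 + math.exp(threshold - linear_predictor))


def probit_probability(threshold, linear_predictor):
    """P(y = 1) under a probit link: standard normal CDF of (predictor - threshold)."""
    z = linear_predictor - threshold
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical threshold and linear predictor.
print(logit_probability(0.5, 1.2))
print(probit_probability(0.5, 1.2))
```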

It is a matter of what the unrestricted model (H1) is that we test our model (H0) against. With ML for categorical outcomes, H1 is the unrestricted multinomial model for the observed data frequency table. Mplus gives that, although this is not useful for large tables.

Weighted least squares fits models to the correlations for the y* variables, the underlying normal latent response variables. This gives a test of H0 against an H1 which is the unrestricted correlation model for the y* variables.

I'm testing a logistic regression model in Mplus and want to calculate the chi-square of my model by using the difference in loglikelihoods between a model with 3 predictors and the empty model, but what would the syntax for this empty or unconditional model look like?

I was wondering if, in CFA using WLSMV (delta) in multiple groups, the chi-square is the sum of the chi-squares of the same model run separately in each group? I ran the following model separately in two groups (n1 = 448; n2 = 224):

It fitted the data well in both groups (just for interest here: Chi-square(g1) = 14.972, df = 11, p = 0.1838; Chi-square(g2) = 14.211, df = 9, p = 0.1150). Now, when I run the same model on both groups simultaneously, freeing the factor loadings in both groups, I obtain:
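Once both loglikelihoods are available, the test statistic itself is just twice their difference, with df equal to the number of predictors added. A small Python sketch with hypothetical loglikelihood values follows; 7.815 is the .05 critical value of chi-square with 3 df. This assumes plain ML; with a robust estimator such as MLR a scaling correction would also be needed.

```python
def likelihood_ratio_chi_square(loglik_empty, loglik_full):
    """2 * (LL of the larger model - LL of the empty model)."""
    return 2.0 * (loglik_full - loglik_empty)


CRITICAL_VALUE_DF3_ALPHA05 = 7.815  # chi-square, 3 df, alpha = .05

# Hypothetical loglikelihoods for the empty model and the 3-predictor model.
statistic = likelihood_ratio_chi_square(loglik_empty=-350.0, loglik_full=-340.0)
print(statistic, statistic > CRITICAL_VALUE_DF3_ALPHA05)
```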

Chi-Square Test of Model Fit

Value = 40.329 df = 23 P-Value = 0.0141

Chi-Square Contributions From Each Group

G1 = 16.454 G2 = 23.874

My questions are: 1- why are the group contributions different from the chi-squares computed separately? 2- how can the model fit both groups separately but not simultaneously? I hope I'm not missing something obvious...

I wish to compare a number of nested models. The model has one predictor variable, two mediators, and a single outcome variable. The predictor and outcome are continuous variables. There are two mediational pathways, one path through a continuous mediator, and one path through a dichotomous variable. The nested models involve a multiple group analysis by which different pathways are allowed to vary by gender.

(PART 2) I was running this analysis with only the bootstrap option in the ANALYSIS command, but it will only give me RMSEA and WRMR as model fit indices. I have included type=general, estimator=ML, parameterization=theta in the hope that this would provide me with AIC, BIC, and others, and most importantly, deviance and chi-square. However, when I run the model, it says:

*** WARNING in ANALYSIS command
PARAMETERIZATION=THETA is not allowed for TYPE=MIXTURE or ALGORITHM=INTEGRATION. Setting is ignored.

*** ERROR in ANALYSIS command
ALGORITHM = INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE = MIXTURE.

Your help would be GREATLY appreciated! I have spent the last hours pulling my hair out over this one!

When comparing nested models using WLSMV, Mplus gives an output with the estimated delta chi-square and the delta df. Can these delta chi-squares and delta df be seen as parallel to the deltas when comparing nested models using ML? If so, what is the interpretation of 2(delta)df-(delta)chi-square (i.e. Akaike information criterion)?

Hi, I'm scaling my items in an IRT framework, using the MLR estimator in Mplus. For a Rasch model, I fix all loadings to one; for a 2PL model, I estimate them freely. My aim is to check whether a Rasch or a 2PL model fits my data.

I have some questions concerning the output of both calculations:

Mplus gives BIC and sample-size adjusted BIC. We have data of 1187 subjects - which one to trust more?

Embretson and Reise describe a test comparing the loglikelihoods of both models with a chi-square difference test: take -2 times the loglikelihood of both models and compare their difference to a critical chi-square value. What I've done is use the loglikelihood I get in Mplus. However, you do point out that for estimators like WLSMV and MLR (the latter is the one I use) one should use the DIFFTEST option instead of doing the chi-square difference test by hand, because these estimators calculate the chi-square value and degrees of freedom differently. Does this apply to my case, too, where I use the loglikelihood for both models?

Yes, thank you Linda. Can the delta chi-square and delta df be interpreted in such a way that one can compute delta AIC and select between the nested models based on the delta AIC (prioritizing parsimony)?

How to do difference testing depends on the estimator. With MLR you need to use the scaling correction factor given in the output following the directions on the website. WLSMV does not get a loglikelihood. Chi-square difference testing can be done using the DIFFTEST option.

Thanks for your answer; I've done the calculations using the scaling correction factor. Just one more question concerning difference testing with MLR: what is the correct number of degrees of freedom for this test? To give an example, I have a test with 17 items. They are scaled in a 1PL vs. a 2PL model. Is it the difference in the number of free parameters I have in the Mplus output, which is 34 - 18 = 16? Or is it the actual difference in free parameters (34 - 17 = 17)?

The correct number of degrees of freedom is the difference between the number of free parameters in the two models. The number of free parameters is given with the fit statistics at the beginning of the results section.
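The MLR difference test discussed in these posts can be sketched as follows. This follows the commonly documented scaled loglikelihood difference formula, cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd; the function name and all input numbers below are hypothetical, and only the parameter counts (18 and 34) echo the example in the question.

```python
def scaled_loglik_difference(ll0, c0, p0, ll1, c1, p1):
    """Scaled loglikelihood difference test for MLR (a sketch).

    ll0, c0, p0: loglikelihood, scaling correction factor and number of
                 free parameters of the nested (more restricted) model.
    ll1, c1, p1: the same quantities for the less restricted model.
    Returns (TRd, df, cd).
    """
    if p1 <= p0:
        raise ValueError("the less restricted model must have more free parameters")
    # Difference-test scaling correction.
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    # Scaled chi-square difference statistic.
    trd = -2.0 * (ll0 - ll1) / cd
    # Degrees of freedom: difference in number of free parameters.
    df = p1 - p0
    return trd, df, cd


if __name__ == "__main__":
    # Hypothetical values: 1PL with 18 free parameters vs. 2PL with 34.
    trd, df, cd = scaled_loglik_difference(-9500.0, 1.10, 18, -9450.0, 1.20, 34)
    print(f"cd = {cd:.4f}, TRd = {trd:.3f}, df = {df}")
```

TRd is then referred to a chi-square distribution with df degrees of freedom. In practice cd can occasionally come out negative, in which case this simple formula should not be used as is.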

Dear Mplus experts, I am trying to compare models using the WLSMV estimator and DIFFTEST. To be precise, I am trying to compare the following two models: 1) the partial mediation model, in which the continuous mediator and the dichotomous dependent variable share most antecedents and the mediator also affects the dependent variable, and 2) a “no mediation” model, in which the continuous mediator and the dichotomous dependent variable have the same antecedents and the mediator does NOT affect the dependent variable (hence no mediation).

I would say that model 2 is nested within model 1, but the Mplus output states that this is not the case. Please see the syntax below. Could you help me out on this? The reviewers really want us to test a “no mediation model” as well and compare it with other models.

Hello, I am using the WLSMV estimator to estimate and compare CFA-type models with ordered polytomous indicators. I have a model in which all the parameters are freely estimated, and I am comparing it to a nested, more restricted model in which the parameters are fixed to values obtained from a (separate) calibration sample. When I do the DIFFTEST (I am sure I am setting it up correctly) it is significant. The chi-square "test of model fit" value for the more restricted model is lower than the chi-square value for the freely estimated model, which leads me to believe it fits better. I have read on this board and elsewhere, however, that the chi-square value for the WLSMV estimator is wrong and doesn't mean much (it's the p-value that means something). Am I interpreting correctly that the more restricted model fits significantly better than the freed model, based on the chi-square values I am seeing? Again, the DIFFTEST is significant (p<.0001), but I don't know which model fits better because I am not sure about the meaning of the chi-square for each model under WLSMV.

I'm sorry, your response above is very clear with regard to how to interpret the DIFFTEST. But just for maximum clarification-- Which chi-sq values are meaningless? The DIFFTEST chi-sq value, the individual chi-sq values or (it seems like) all of them? Does this mean the chi-sq should never be reported (whether or not one is doing difference testing) when using WLSMV and categorical indicators? Thanks.

Very last question: Does sample size affect the sensitivity of the DIFFTEST? It would seem not to, because the df are based on the number of freed and fixed parameters and not on the sample size. But I am trying to be clear. Ultimately, what I am seeing is that the DIFFTEST for WLSMV seems way too sensitive.

Prior to Version 6, I would report only the p-value. With the new adjustment in Version 6, the chi-square values given and degrees of freedom agree with the p-value given in a chi-square table and can be reported. The difference in the chi-square values should not be used for difference testing. The DIFFTEST option should be used in this case.

Dear Mplus team, I have version 5 at the office and version 6 on my home computer. I have been running some very simple models (regressing one outcome on one predictor at a time), but comparing the effect of different distributional assumptions on the fit and coefficients. I am getting vastly different results for ML and MLR estimation going from v5 to v6. I see that there was a change in the way the LL is calculated between the two versions, but it seemed to be specific to models with covariates, of which these have none. Should I be concerned about this change? Thank you in advance for your help.

I am conducting an LCA using complex survey data. The indicators are 6 different diagnoses, and I've included in the analysis only those individuals who meet criteria for at least one of the 6 diagnoses (but many have multiple, hence the impetus to identify classes). I am obtaining some strange results in terms of conflicting information criteria. Whereas the Lo-Mendell-Rubin adjusted statistic is not significant at the 2-class solution, it becomes significant thereafter until the 8-class solution. Most other criteria also reach a low at the 7-class solution. Any ideas why the Lo-Mendell-Rubin test initially prefers the 1-class solution?

There is no theory for turning a weighted least squares chi-square into AIC.

I would use the same estimator with all models. I would recommend MLR if you are going to use maximum likelihood. Note that categorical data methodology handles floor and ceiling effects. These are not a problem for categorical outcomes.

Hello I am currently evaluating a CFA measurement model and subsequent structural regression model for an array of latent continuous, observed continuous/ordinal, and observed dichotomous variables and have been running into some unexpected model fit issues. The regression model is structurally just-identified (i.e., same number of paths being estimated as the measurement model), thus leading me to expect that the model chi-squared and other goodness of fit statistics should be identical with that of the measurement model. This is not, however, the case. Using WLSMV estimation, the model d.f., chi-square value, and goodness of fit statistics slightly differ when going from a measurement to structural model. Using WLSM estimation, model d.f. remains the same, but the chi-square value and goodness of fit statistics again differ when moving from the measurement to structural framework. What could cause these discrepancies? Is it something to do with how the chi-square value is estimated using WLSMV/WLSM? If I remove the observed dichotomous variables from the analysis and use maximum likelihood estimation, model chi-square and all goodness of fit statistics are identical in both measurement and structural models, as expected. Thanks, Anthony

With maximum likelihood and categorical factor indicators, means, variances, and covariances are not sufficient statistics for model estimation so chi-square and related fit statistics are not available. In this case, nested models can be tested using loglikelihood difference testing where -2 times the loglikelihood difference is distributed as chi-square.
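As a rough illustration of the loglikelihood difference test described above, here is a minimal Python sketch using only the standard library. The function names and the input numbers are hypothetical. It covers the plain ML case; with MLR the statistic additionally needs the scaling correction factor.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2. Accurate enough for
    illustration; very small p-values lose precision.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(10000):
        ap += 1.0
        term *= z / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    p_lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)


def loglik_difference_test(ll_h0, p_h0, ll_h1, p_h1):
    """-2 times the loglikelihood difference, its df, and its p-value."""
    stat = -2.0 * (ll_h0 - ll_h1)
    df = p_h1 - p_h0
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical nested models differing by one free parameter.
    stat, df, p = loglik_difference_test(-5210.4, 40, -5207.1, 41)
    print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```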

The chi-squares referred to in the message are chi-square tests of the observed versus the estimated frequency tables for the categorical indicators. These do not work well with more than 8 indicators.

I suggest using WLSMV if you want chi-square and related fit statistics.

Thank you for your answer. I want to use the model to further compare two nested models, one without and one with an interaction between latent variables (XWITH); therefore I need to use ML to compare loglikelihoods. Q1: Can I report chi-square and related fit statistics based on WLSMV and the loglikelihood based on ML? Q2: If not, how do I examine model fit using ML in my case?

1. You should not report fit statistics from two different estimators. You should report the fit statistics from the estimator whose results are being reported.

2. The best you can do is use the fit statistics before the interaction is added and then add the interaction to see if it is significant. See the following FAQ on the website where issues related to this are discussed:

The variance of a dependent variable as a function of latent variables that have an interaction is discussed in Mooijaart and Satorra

I assume you are using TYPE=EFA and the WLSMV estimator, for which the DIFFTEST option is required for difference testing. DIFFTEST is not available for TYPE=EFA, but you can do EFA using ESEM, where it is available. See Example 5.24 (remove the covariates and the indirect effect) and Example 13.12.

I am estimating an ordered categorical CFA. I am confused by my fit indices, since my chi-square is not significant and CFI also looks good, but RMSEA is very high. And there is no modification index value above 4. Does that mean this is an acceptable model?

Hi I would like to test for measurement invariance using the difference in McDonald's non-centrality index (NCI) as recommended by Meade et al (2008) in "Power and Sensitivity of Alternative Fit Indices in Tests of Measurement Invariance" J Appl Psych. I am testing for invariance over time using categorical indicators in a one-factor model and have a sample size of approx 1900. I am using the difftest to get the chi-square test and calculating a difference in the CFI to compare models, but would like to also use the third measure recommended by Meade et al as I suspect the chi-square test may be affected by the large sample size. Thank you
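For reference, McDonald's NCI is not hard to compute by hand from a chi-square value, its df, and the sample size. The sketch below assumes the common form NCI = exp(-0.5 * (chi2 - df) / (N - 1)); some sources use N rather than N - 1, and whether the index behaves as intended with a WLSMV chi-square is not settled in this thread. The function name and input values are hypothetical.

```python
import math


def mcdonald_nci(chi2: float, df: int, n: int) -> float:
    """McDonald's non-centrality index (assumed form with N - 1)."""
    return math.exp(-0.5 * (chi2 - df) / (n - 1))


if __name__ == "__main__":
    # Hypothetical configural vs. constrained model, N = 1900.
    nci_free = mcdonald_nci(55.0, 40, 1900)
    nci_constrained = mcdonald_nci(80.0, 48, 1900)
    print(round(nci_free, 4), round(nci_constrained, 4),
          round(nci_free - nci_constrained, 4))
```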

We are comparing two repeated measures CFA models with two latent factors at two points in time using ordinal data and WLSMV. All latent factors are standardized. In model 1, all other measurement model parameters are freely estimated at both time points. In model 2, the measurement model parameters are constrained to be equal over time. The WLSMV chi-square difference test indicates, not surprisingly, that model 1 (unconstrained) fits relatively better than model 2 (constrained). However, the RMSEA is .047(.043-.050) for model 1 and .042(.039-.045) for model 2. We have checked Technical 1 to verify that all model constraints are exactly as they were intended. Do you have a possible explanation for these seemingly discrepant results of fit indices?

I tested two nested models with categorical variables using WLSMV. The chi-square value was lower in the more restricted (H0) model than in the less restricted (H1) model. I also got a smaller df for H0.

I wonder why this happened, because the more restricted model (H0) should have had a higher chi-square value and more df.

Can we interpret the WLSMV-based chi-square values as that the lower the value the better the model fit?

Dear Professors, I am a beginner user of Mplus and I'm in the process of rerunning the analysis for my dissertation, because LISREL didn't run for some reason, so now I'm trying to learn Mplus to address my research question. I would really appreciate it if you could offer some help. Specifically, I used a secondary dataset, but the sample that will be used in my study is only 469. There's a lot of missing data (ranging from 10% to 40%). The models I'm testing include binary categorical outcome variables. I used WLSMV. My question is: is it true that Mplus by default treats missing data using FIML? FIML works under the MAR missing data assumption; however, I think my data are MNAR. The model fit I got is: chi-square = 22.952, df = 23, p-value = 0.46; RMSEA = 0.000, 90% CI = (0.000-0.038); CFI = 1.000; TLI = 1.001; WRMR = 0.538.

So, this model has good fit, right? My dissertation committee asked for SRMR, but how can I get SRMR using Mplus?
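To see why the RMSEA above is exactly 0.000, it helps to look at how the point estimate is built from the chi-square. The sketch below assumes the single-group form sqrt(max(chi2/(N*df) - 1/N, 0)); some programs use N - 1 instead of N, and with WLSMV the chi-square itself is an adjusted statistic, so treat this as an approximation rather than a reproduction of the program's calculation. The function name is hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate, single group (assumed form using N)."""
    return math.sqrt(max(chi2 / (n * df) - 1.0 / n, 0.0))


if __name__ == "__main__":
    # Values from the post above: the chi-square is below its df,
    # so the estimate is truncated at zero.
    print(rmsea(22.952, 23, 469))
    # Hypothetical poorer fit with the same df and N.
    print(round(rmsea(46.0, 23, 469), 4))
```

Because the chi-square (22.952) is smaller than its degrees of freedom (23), the quantity under the square root is negative and is set to zero.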

Thanks a lot, Professor Muthen! One follow-up question: is it OK to go ahead and interpret the results using these model fit statistics? If I have such good fit (probably due to low power to reject the H0 model), can I still conclude that this model fits the data well?

I have a simple path model with 3 dichotomous variables: 1 exogenous, 1 mediating, and 1 outcome variable (each variable represents change in a variable between two waves). I specified my two endogenous variables as categorical, and Mplus then gives me RMSEA, Chi², CFI and WRMR statistics. RMSEA and CFI are not good at all, but is that surprising for my model? Which are the appropriate fit statistics to consider for my model specification?

Dear professors, Some articles suggest that ULSMV (or ULS?) outperforms WLSMV (or DWLS) (Forero, 2009; Rhemtulla, 2012). Q1: What is your opinion regarding this conclusion? I decided to try ULSMV instead of WLSMV when estimating a CFA with 65 ordinal DVs, 15 latent IVs and N=230 in Mplus 7. When I do the CFA with ULSMV I do not get SRMR or WRMR. I thought it would be preferable to have more information about the fit of the model and shifted to WLSMV, which also provided me with WRMR (but not SRMR). (!) When I switch from the CFA to the ESEM framework I get both SRMR and WRMR with both the WLSMV and ULSMV estimators. My initial rationale for switching from ULSMV to WLSMV has disappeared. So I think I should switch back to ULSMV, but now I am concerned about the lack of studies investigating proper cutoffs for descriptive fit indices with ULSMV estimation. (If you are aware of such studies, can you mention them?) Q2: Is it logical to refer to such studies based on WLSMV estimation (e.g. Yu, 2002) as general recommendations for categorical variable methodologies? If not, should I continue using WLSMV because we (probably) know more about fit indices based on WLSMV? Q3: Can you please comment on this discrepancy in the available fit indices for the same estimators between the CFA and ESEM frameworks? Are the interpretations of the SRMR and WRMR indices in ESEM the same as in CFA? Thank you very much!

q1. I agree that Forero-Maydeu (2009) show a certain advantage for ULSMV over WLSMV in their study, but both are expected to perform reasonably well in most cases, so I wouldn't just switch from WLSMV to ULSMV.

q2. Yes. Perhaps.

q3. I don't think you need SRMR; with CFA you already have chi-square, RMSEA, and CFI. SRMR is more of a descriptive statistic that you typically use with EFA explorations. I would not rely on WRMR very much because it sometimes gives results quite different from other fit indices.

I would also suggest including comparisons of your hypothesized model not only with the unrestricted H1 model but also with other H0 models that are only somewhat more relaxed, what I call "neighboring models".

Modifications improved only descriptive fit indices. Kline (2011) reminds us about the importance of the X^2, while some studies report that WLSMV X^2 values are inflated, particularly in harsh conditions (e.g. Potthast, 1993; Beauducel, 2006). Q1: Do you think it is reasonable to accept the model with the provided fit?

I have also tried to get a non-significant X^2 by saturating the model, as you once suggested a 'sensitivity analysis' for parameter estimates if we think the X^2 is overestimated. I have not been successful at this so far (I get problems with identification or convergence). Q2: Would you recommend more effort in this direction?

Also, thank you for your recommendation regarding "neighboring models".

I wanted to separate one factor from its EFA set in ESEM and test it as a nested model. Brown (CFA for applied research, 2006) says "use of the x^2 diff test is not justified when neither solution provides an acceptable fit to the data". According to this, difference tests are useless in my case, aren't they? (And I cannot make use of "neighboring models".) Q3: So I can take a factor out of the EFA set based only on negligible cross-loadings (at least I have shown negligible cross-loadings, instead of excluding the factor from the EFA set from the beginning because it needed to serve as the DV) and theoretical considerations (DV), can't I?

I think you have already answered that question. I wanted to separate a latent variable from an initial EFA set where it was grouped with some other latent variables. I wanted to justify that decision by the difference test and theory. As you confirmed, the difference test is not useful in this case (poorly fitting parent model), so it is not reasonable to make such modifications until I get a well-fitting (parent) model, right?

I ran a CFA with 26 indicators across four factors. All indicators were ordered polytomous and treated as such in the model. I'm attempting to test alternate models and would like to examine differences in BIC. I know the default estimator (WLSMV) does not provide this and have changed the estimator to ML. However, I'm getting an error message about covariances not being defined with the ML estimator that I don't get with WLSMV. The error is specific to WITH statements for items that are conceptually similar but distinct, and I'd like to model their covariance (e.g., "He slapped me", Violence Victimization, vs. "I slapped him", Violence Perpetration).

*** ERROR in MODEL command
Covariances for categorical, censored, count or nominal variables with other observed variables are not defined. Problem with the statement: UN012 WITH UN025

We don't allow the WITH option to specify residual covariances because each one requires one dimension of integration when maximum likelihood estimation is used. If you don't have too many, you can specify

I have a questionnaire with 55 items (not normally distributed). The items are categorical variables (a three-point Likert scale). I need to find a factor model using EFA. I have looked at a number of solutions.

The parallel analysis suggested a four-factor solution.

WLSMV: Often gave a three-factor solution (no loadings on the fourth factor).

MLR with categorical variables gives the four-factor solution. The fit indices of this four-factor model are better than those of the three-factor solution (CFA).

Is MLR with categorical variables (EFA) more suitable for my data?

I think that I cannot use MLR with continuous variables (EFA), since the data are categorical?

Hi, I am doing a simulation study in which I analyzed simulated categorical data for a CFA model. I used the "WLSMV" option and don't see SRMR being displayed in my output. The input file is a rather simple one:

Dear Linda, We ran a cross-lagged panel model in Mplus with categorical predictors and categorical outcome variables. As we wanted odds ratios for the outcome variables (instead of probit regression coefficients), we used ML as the estimator (estimator is ml; algorithm is integration; integration is montecarlo;). However, after including this command, we do not get fit statistics like RMSEA, chi-square, BIC and SRMR. Is there an option to still get these fit statistics? Or is this only possible for probit regression?

I'm running a latent growth curve model with three-level categorical indicators. I also have time-varying covariates, which are my main variables of interest. Observations are clustered in families, so I'm also using the CLUSTER = command. I ran the model both using WLSMV and MLR. I was planning to use the parameter estimates from MLR and the model fit statistics from WLSMV. I found though that the p-values of the parameter estimates were very different between the two models. I knew the parameter estimates would differ due to logit vs. probit, but wasn't expecting large differences in statistical significance. Do you know why that would be the case? Is it invalid to use the model fit statistics from WLSMV if I'm reporting the parameter estimates from MLR?

Yes, it is invalid to use model fit statistics from WLSMV and parameter estimates from MLR. Differences may be due to different missing data procedures in the two estimators and different models fitting the data differently.

The chi-squares that are referred to here compare observed versus expected frequencies of the multiway frequency table of your categorical factor indicators. They are not useful if you have more than about 8 indicators and if they do not agree. It sounds like you have more than 8 indicators.
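A quick way to see why the frequency-table chi-squares break down with many indicators is to count the cells of the multiway table. The sketch below is only an illustration; the function names and the example sample size are hypothetical.

```python
from math import prod


def frequency_table_size(categories_per_item):
    """Number of cells in the multiway table of all response patterns."""
    return prod(categories_per_item)


def mean_count_per_cell(n, categories_per_item):
    """Average number of observations per cell for a sample of size n."""
    return n / frequency_table_size(categories_per_item)


if __name__ == "__main__":
    n = 1000  # hypothetical sample size
    for k in (4, 8, 12, 16):
        items = [2] * k  # k binary indicators
        print(k, frequency_table_size(items), mean_count_per_cell(n, items))
```

With 8 binary indicators there are 256 cells; with 16 there are 65,536, so almost every cell is empty for any realistic sample. Under that sparseness the Pearson and likelihood-ratio chi-squares can disagree, which is the situation described above.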

The chi-square that you are thinking about and related fit statistics like RMSEA etc. are not available when maximum likelihood estimation is used with categorical dependent variables. In this case, means, variances, and covariances are not sufficient statistics for model estimation.

Now please, a related question is: what is(are) the best way(s) to compare different models in my case (categorical indicators, MLR estimation), as I have also a correction factor for the loglikelihood?

OK, thanks again Linda, I just needed a confirmation. In relation to the same models I am enquiring about, is it normal that the estimation of a second-order factor model (with categorical items) takes 14 hours, and it's not over yet? I would prefer to continue with ML-type estimation. My model is as follows:

Analysis: type is general; estimator is mlr;

Model: ME by x1 x2 x3 x4 x5; GE by y1 y2 y3 y4; BL by z1 z2 z3; G by ME GE BL;

Thank you.

Hello Dr. Muthen I'm new to this board and have been trying to find the answer to how to report fit indices for models with binary outcomes.

I have a path model with 1 continuous exogenous variable, 4 continuous mediator variables and 3 binary dependent variables. Because the outcome variables are binary, I used estimator=mlr.

I have two questions about this: 1. If required, how would I report the fit statistics for this model? The MLR analysis provides me with AIC/BIC and loglikelihood measures, and I am not sure that I (or the reviewers of my paper) know how to interpret these. This is what I get:

MODEL FIT INFORMATION

Number of Free Parameters 58

2. I would like to be able to control for the effect of each dependent variable on each other and was told to use parameterization = theta; However, this gives me the following message: "PARAMETERIZATION=THETA is not allowed for TYPE=MIXTURE or ALGORITHM=INTEGRATION. Setting is ignored."

1. When absolute fit indices are not available, chi-square difference testing using the loglikelihood can be used to compare nested models. BIC can be used to compare neighboring models that have the same set of dependent variables.
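For readers unsure how the information criteria in the output relate to the loglikelihood, here is a minimal sketch of the standard definitions. The sample-size adjusted BIC is assumed to replace n with (n + 2)/24; the function name, loglikelihood and sample size below are hypothetical, and only the 58 free parameters echo the question above.

```python
import math


def information_criteria(loglik: float, n_params: int, n: int):
    """AIC, BIC and sample-size adjusted BIC from a loglikelihood."""
    deviance = -2.0 * loglik
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n)
    # Assumed adjustment: n is replaced by (n + 2) / 24.
    sabic = deviance + n_params * math.log((n + 2.0) / 24.0)
    return aic, bic, sabic


if __name__ == "__main__":
    # Hypothetical loglikelihood and sample size; 58 free parameters.
    aic, bic, sabic = information_criteria(-4000.0, 58, 500)
    print(round(aic, 3), round(bic, 3), round(sabic, 3))
```

Lower values indicate a better trade-off between fit and parsimony. The values have no absolute interpretation and are only meaningful when compared across models fitted to the same dependent variables and the same sample.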

I am running a MIMIC model with 11 categorical variables, x1 through x11, and one latent factor, y1. Mplus does not provide absolute fit indices such as RMSEA and CFI. What could be the reason for this? I could not find information in the Version 7 user's guide.

We conducted a CFA and wanted to see the fit of the measurement model. We did it in AMOS first and got very good model fit (all indices met the rules of thumb). Since AMOS does not differentiate between continuous and categorical data, I re-analyzed the model in Mplus. Before I declared the variables as categorical, the result was almost the same as that in AMOS: RMSEA .046, CFI .98, TLI .975, WRMR .055. But when I declared them as categorical, I got a low TLI and also a relatively low CFI.

You will obtain different results when you treat variables as categorical versus continuous. When categorical variables are treated as continuous, their correlations are attenuated. Lower correlations make it more difficult to reject the model.
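The attenuation mentioned above can be illustrated with a small calculation. The sketch assumes two underlying standard normal variables with correlation rho, each dichotomized at its median; in that special case the Pearson correlation of the two binary variables is (2/pi) * arcsin(rho). Other cut points and more categories give different amounts of attenuation. The function name is hypothetical.

```python
import math


def median_split_correlation(rho: float) -> float:
    """Pearson (phi) correlation of two median-split variables whose
    underlying bivariate normal correlation is rho."""
    return (2.0 / math.pi) * math.asin(rho)


if __name__ == "__main__":
    for rho in (0.3, 0.6, 0.9):
        print(rho, round(median_split_correlation(rho), 4))
```

An underlying correlation of 0.6 shows up as roughly 0.41 between the dichotomized variables, which is the kind of shrinkage that occurs when categorical items are analyzed as if they were continuous.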

I am running path analysis which includes one binary mediator variable and accounts for clustering. Based on my desire for fit statistics and the suggestions from previous discussion posts, I just changed from MLR to WLSMV.

I got the desired model fit statistics, but my means in estimated sample statistics are way off, and I can't figure out why. For example, the mean for my dependent variable, height-for-age z-score, changed from -2.212 to -9.173 (implausible).

Can you offer any explanation or fix for this problem? I'd really like the absolute fit indices, but I don't think I can use the output knowing that my means are off.