For binary data, are thresholds the equivalent of the intercepts used for continuous data? If so, in a multiple-group CFA, given that factor loadings and thresholds must be held equal in tandem, is testing for invariance of the factor loadings across groups equivalent to testing for both metric invariance (factor loadings) and scalar invariance (thresholds/intercepts) at the same time?

I am assuming that I should use the modification indices (MIs) listed under "Means/Intercepts/Thresholds". The output gives an MI for [item1] and an MI for [item1$1]. I am assuming the first is the MI for the factor and the other is the MI for the threshold, for the same item. The value is the same for both. From this information, how would you determine whether it is the factor or the threshold that is non-invariant across groups (see comment above, April 16)?

Also, would the list of MIs include all factors/thresholds that are contributing to the non-invariance? For example, for one analysis the MI output listed two items under "Means/Intercepts/Thresholds", but when I tested each individual item for non-invariance (DIF) I found five items that were non-invariant across groups. Why the discrepancy (i.e., two versus five)? I was wondering whether I needed to apply some type of correction to account for the number of invariance tests I conducted, i.e., I tested three factors and 35 items. That might reduce the number of significant findings.

One additional question: I have one item in the MI output with 15 or 16 suggested relationships ("ON" and "WITH") with other items and factors that would reduce chi-square, and I was wondering whether so many suggested cross-loadings and residual correlations indicate an issue with that particular item. What does that say about the item?

In running an IRT analysis of 7 variables (3 categories each) with the default settings (which I believe give a normal ogive model with WLS), when I look at each of the 2 estimated thresholds per variable in the output, the first is always smaller in magnitude than the second. I assume this is an artifact of the probit model for ordinal polytomous regression. However, when I look at the ICCs, I see that for a majority of the variables the value of the factor at which the probability curves for categories 1 and 2 cross (I assume this 'difficulty' value is the severity at which a jump between adjacent categories is equally likely, computed from the corresponding threshold/loading) is larger than the severity value at which the probability curves for categories 2 and 3 cross. Since higher response categories are intended to indicate higher severity, this is puzzling for all variables where it occurs. Is there an easy explanation for this? It is true that, for many of these variables, the prevalence of the 2nd category is less than that of the 3rd. Thanks.
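[Editor's note: the disordered crossings can be reproduced with a toy calculation. The sketch below (plain Python, entirely hypothetical parameter values) computes the three category probabilities of a normal-ogive item from P(Y >= j | theta) = Phi(a*theta - b_j) and locates the factor values where adjacent category curves cross. When the two thresholds are close together, so that the middle category is rare, the 1-vs-2 crossing lands above the 2-vs-3 crossing, exactly the pattern described.]

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probs(theta, a, b1, b2):
    """Category probabilities for a 3-category normal-ogive item."""
    p_ge2 = Phi(a * theta - b1)   # P(Y >= 2 | theta)
    p_ge3 = Phi(a * theta - b2)   # P(Y >= 3 | theta)
    return 1.0 - p_ge2, p_ge2 - p_ge3, p_ge3

def crossing(f, lo, hi, tol=1e-8):
    """Bisection root of f on [lo, hi]; f must change sign on the bracket."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical item: thresholds close together => a rare middle category
a, b1, b2 = 1.0, 0.0, 0.2

# theta where P(cat 1) = P(cat 2), and theta where P(cat 2) = P(cat 3)
cross_12 = crossing(lambda t: category_probs(t, a, b1, b2)[0]
                              - category_probs(t, a, b1, b2)[1], 0.0, 6.0)
cross_23 = crossing(lambda t: category_probs(t, a, b1, b2)[1]
                              - category_probs(t, a, b1, b2)[2], -6.0, 0.0)

# The 1-vs-2 crossing sits far ABOVE the 2-vs-3 crossing (roughly +3.3 vs -3.1),
# even though the thresholds themselves are ordered.
print(cross_12, cross_23)
```

So ordered thresholds do not force ordered crossings; a low-prevalence middle category is enough to produce this pattern.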

The peaks of the ICCs for the categories are indeed ordered by the threshold values. Is there then information in these curves regarding when and how combining categories is appropriate? For example, for any given item, the ICC curves for the first and last categories intersect at a given factor value, but the peaks of the probability curves for all the categories in between never rise to the level of that intersection. Say an item had 5 categories and the peaks for the 3 intermediate categories occurred at factor values on different sides of the factor value where the probabilities of the first and last categories intersect. Could one then use this as a rule to decide which categories to combine? Thanks again,
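[Editor's note: one informal way to operationalize such a rule — this is an exploratory heuristic, not an established procedure — is to check, over a grid of factor values, whether each category is ever the most likely response. A category that is never modal anywhere on the trait range is a natural candidate for collapsing with a neighbor. A sketch with made-up parameters for a 5-category normal-ogive item:]

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probs(theta, a, b):
    """Probabilities of the k = len(b) + 1 ordered categories of a
    normal-ogive (graded response) item: P(Y >= j+1) = Phi(a*theta - b[j])."""
    cum = [Phi(a * theta - bj) for bj in b]            # P(Y>=2), ..., P(Y>=k)
    probs = [1.0 - cum[0]]
    probs += [cum[j] - cum[j + 1] for j in range(len(b) - 1)]
    probs.append(cum[-1])
    return probs

a = 1.2
b = [-1.5, -0.2, 0.1, 1.4]    # hypothetical thresholds; b[1] and b[2] are close

grid = [i * 0.01 for i in range(-500, 501)]            # theta from -5 to 5
ever_modal = [False] * (len(b) + 1)
for theta in grid:
    p = category_probs(theta, a, b)
    ever_modal[p.index(max(p))] = True

# Category 3 sits between two nearly identical thresholds, so its curve never
# dominates anywhere on the trait range:
print(ever_modal)   # -> [True, True, False, True, True]
```

Here category 3's curve peaks at about .12, while at every theta some other category exceeds .20, so it is never modal — under this heuristic it would be the merge candidate.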

A follow-up question regarding your response on April 15th to my initial posting. I'm trying to reconcile the threshold estimates with physical characteristics of the ICCs. My initial post suggested that they corresponded to where the probability curves for adjacent categories cross, but your response, and the fact that the crossings aren't ordered, ruled that out. You mentioned that the peaks should be ordered by the threshold values, and indeed they are. But given that there are k-2 peaks (the first and last categories don't have them) and k-1 thresholds, I've clearly got something wrong. Is there an actual probability-curve characteristic that corresponds to the values of the estimated threshold parameters? Thanks much again in advance,
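[Editor's note: one candidate answer, assuming the probit graded-response parameterization P(Y >= j+1 | theta) = Phi(a*theta - b_j): the k-1 thresholds live on the *cumulative* curves, not the individual category curves — b_j/a is the theta at which P(Y > j) crosses 0.5. The k-2 interior category curves then peak at the midpoints of adjacent thresholds, which is why there are fewer peaks than thresholds. A numerical check with hypothetical values:]

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical item: loading a, thresholds b (k = 4 categories, k-1 = 3 thresholds)
a = 1.3
b = [-1.0, 0.3, 1.1]

# 1) Each threshold marks where a CUMULATIVE curve P(Y > j | theta) equals 0.5:
for bj in b:
    theta_star = bj / a
    assert abs(Phi(a * theta_star - bj) - 0.5) < 1e-12

def p_cat(theta, j):
    """P(Y = j+1 | theta) for the interior categories j = 1 .. k-2."""
    return Phi(a * theta - b[j - 1]) - Phi(a * theta - b[j])

# 2) The k-2 interior category curves peak at the midpoints of adjacent
#    thresholds, (b[j-1] + b[j]) / (2a) -- hence fewer peaks than thresholds.
for j in (1, 2):
    peak = (b[j - 1] + b[j]) / (2 * a)
    eps = 1e-4
    assert p_cat(peak, j) >= p_cat(peak - eps, j)
    assert p_cat(peak, j) >= p_cat(peak + eps, j)

print("thresholds sit on the cumulative curves; interior peaks at midpoints")
```

Whether this matches the output in question depends on the parameterization actually used, so treat it as a hypothesis to check against the estimates.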

Thanks much for the reference. One more reference request, if you know of one. I have a number of groups (data from different countries), each of which has the seven 5-category AUDIT alcohol-problem items. Do you know of anyone who has used the multilevel functionality of Mplus in the context of a polytomous IRT DIF application paper? I've looked but not found anything satisfactory. Thanks again,

I have 17 countries, but it might make more sense to look at DIF only for meaningful subgroups of countries. Might you know of any references where DIF has been examined with a moderate number of groups (I only recall having found analyses with G=2)? Thanks,

Hello, I wonder whether my interpretation of changes in thresholds across two time points is correct. The thresholds of all items at Time 2 are higher than those at Time 1. Take item 1, for example: its three thresholds are .017, .92, and 1.428 at Time 1, but .460, 1.52, and 2.129 at Time 2. Could I interpret this as the observed scores on this measure being underestimated at Time 2 (because all items become more difficult at Time 2 and participants need more latent traits to endorse the same category)? Thanks a lot.

No, you need a well-fitting model to be able to make those statements. You can't talk about "more latent traits" accounting for the change, nor can you talk about underestimation. All you can say is that the percentages for the observed variable have changed in a certain direction.
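[Editor's note: to make the reply concrete — if the thresholds are on a standardized probit scale (the delta parameterization; an assumption here), each threshold maps directly to a cumulative proportion of the observed variable via the normal CDF. A quick check with the item 1 values quoted above:]

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

t1 = [0.017, 0.92, 1.428]    # item 1 thresholds, Time 1
t2 = [0.460, 1.52, 2.129]    # item 1 thresholds, Time 2

# Implied cumulative proportions P(Y <= category) at each time point
cum1 = [Phi(t) for t in t1]
cum2 = [Phi(t) for t in t2]
print([round(p, 3) for p in cum1])   # -> [0.507, 0.821, 0.923]
print([round(p, 3) for p in cum2])   # -> [0.677, 0.936, 0.983]
```

Every Time-2 proportion is higher, i.e., more of the sample sits in the lower categories at Time 2 — a statement about observed percentages, which is the only kind of statement the thresholds license without a well-fitting, invariant model.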

Using the new Mplus shortcut code for measurement invariance, I checked scalar vs. configural invariance for a 5-item domain with a 3-point Likert scale (2 thresholds) and found non-invariance. To locate the source of the non-invariance I replicated the analysis manually, and among the modification indices I found the following to be problematic: [item1$ ]. In the examples I found in the literature, the MIs usually indicate which threshold is problematic, so I am confused as to what [item1$ ] means. When I free both thresholds of that item [item1$1* item1$2*] the fit improves, so would that be the right way of doing it?

Thank you in advance

(I am reposting this as it does not appear in this thread - so apologies for any duplications)