I have a set of 11 categorical variables (some are binary and the others have up to 7 levels). I am interested in performing factor analysis on these data. All of the multivariate analysis books that I have read say that you cannot perform factor analysis on categorical data. How is it that your program can do what others say is impossible? Is there a paper that you can suggest that would help me understand why this can be done? Also, I could not find out what type of correlation matrix the program uses in the factor analysis. I found that it uses tetrachoric correlations when the data are binary, but what about when a variable has more than two levels? Thanks for your help with this.

I think what these books mean to say is that the methods they are presenting are for continuous outcomes. If you go to www.statmodel.com under References, you will find a set of papers and books related to the factor analysis of categorical outcomes. For ordered polytomous variables, factor analysis uses a polychoric correlation matrix.

We're attempting to run an EFA on 13 categorical items. It lets us run a 2-factor solution, but it won't let us run a 1-factor solution. The error message says: Unrecognized setting for TYPE option: 1.

You may be saying TYPE = EFA 1. You should say TYPE = EFA 1 1; because you need to give both a minimum and a maximum number of factors to extract. If this is not what you are doing, please send the output so that I can take a look at it.
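As a rough illustration of the point about giving both a minimum and a maximum, here is a small Python sketch that writes out a minimal Mplus input. The helper name `efa_input`, the file name `items.dat`, and the item names `u1-u13` are all hypothetical; only the `TYPE = EFA min max;` form comes from the reply above.

```python
def efa_input(data_file, first_item, last_item, min_factors, max_factors):
    """Return the text of a minimal Mplus input for an EFA on categorical items.

    Hypothetical helper: the file and item names are placeholders.
    """
    # TYPE = EFA needs two numbers: the smallest and the largest number
    # of factors to extract. For a single solution, repeat the number.
    if min_factors < 1 or max_factors < min_factors:
        raise ValueError("need 1 <= min_factors <= max_factors")
    lines = [
        "TITLE:    EFA on categorical items",
        f"DATA:     FILE IS {data_file};",
        f"VARIABLE: NAMES ARE {first_item}-{last_item};",
        f"          CATEGORICAL ARE {first_item}-{last_item};",
        f"ANALYSIS: TYPE = EFA {min_factors} {max_factors};",
    ]
    return "\n".join(lines)


# A 1-factor solution only: minimum and maximum are both 1.
one_factor = efa_input("items.dat", "u1", "u13", 1, 1)
print(one_factor)
```

Calling `efa_input("items.dat", "u1", "u13", 1, 3)` instead would request the 1-, 2-, and 3-factor solutions in one run.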

My EFA in Mplus and SPSS ran fine, but I am getting an error message that my correlation matrix is not positive definite when I run it with CEFA and SAS. I have 37 items with n=102. Is there a way I can calculate the determinant for various combinations of items to investigate which items are causing it to go to 0? Also, do you know whether the thresholds for a 0 determinant differ between programs, such that Mplus will run but SAS won't?

I think SAS probably has a function to invert positive definite matrices. You could try to invert the matrix to see where the problem is.
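For the question about determinants of item combinations, one possible approach (sketched here in plain Python rather than SAS, with a made-up 4-item matrix) is to drop one item at a time and recompute the determinant. Items whose removal moves the determinant away from 0 are candidates for the source of the problem. The function names are illustrative assumptions, not part of any package.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def drop_one_determinants(corr):
    """Map each item index to the determinant of corr without that item."""
    n = len(corr)
    results = {}
    for drop in range(n):
        keep = [i for i in range(n) if i != drop]
        results[drop] = determinant([[corr[i][j] for j in keep] for i in keep])
    return results


# Hypothetical matrix: items 0 and 1 correlate perfectly, so it is singular.
corr = [
    [1.0, 1.0, 0.3, 0.2],
    [1.0, 1.0, 0.3, 0.2],
    [0.3, 0.3, 1.0, 0.4],
    [0.2, 0.2, 0.4, 1.0],
]
full_det = determinant(corr)
dets = drop_one_determinants(corr)
for item, value in dets.items():
    print(f"without item {item}: determinant = {value:.3f}")
```

In this toy case the determinant becomes clearly nonzero only when item 0 or item 1 is removed, which points at that pair. With 37 items the same loop applies, although near-singularity spread over many items will be harder to read off.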

The default for EFA in Mplus is the unweighted least squares estimator for which positive definiteness is not required or checked. This may also be the case in SPSS. I don't know. If you use the ML estimator in Mplus, it will check for positive definiteness of the sample correlation matrix.

Unlike ML, there is nothing in the computations for ULS that requires pos def of the sample corr/cov matrix. A factor model implies a pos def model-estimated matrix, but you can fit such a model to a non pos def sample counterpart - the model may still fit well. In this sense, the non pos def of the sample may not be "significant". This is often the case with tetrachoric and polychoric correlation matrices. Currently, there is no printout of the determinant. With EFA, however, there is a printout of the eigenvalues.
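To make the idea of a non pos def sample matrix concrete, here is a minimal Python sketch that tests positive definiteness by attempting a Cholesky factorization. The two 3-item matrices are invented for illustration; the first mimics what can happen when each correlation is estimated separately for its own pair of items, as with tetrachorics and polychorics.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0.0:
                    return False  # factorization breaks down: not pos def
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Each pairwise value is a legal correlation, but the three together are
# inconsistent: item 0 cannot be at +0.9 with item 1 and -0.9 with item 2
# while items 1 and 2 are at +0.9 with each other.
pairwise = [
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
]
consistent = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
print(is_positive_definite(pairwise))
print(is_positive_definite(consistent))
```

A check like this only says whether the sample matrix is pos def. As noted above, a failure does not by itself mean that a ULS-fitted factor model will fit badly.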

I am conducting an EFA on a set of 13 categorical manifest variables. I am new to Mplus and have been struggling with some of the finer aspects of the code. The code below gives me the basic output, but I was also hoping to see the polychoric correlation matrix for my items and the eigenvalue plot; my foray into the code has been sketchy at best. Can you help me out?

See the PLOT command for the eigenvalue plot. See the SAMPSTAT option of the OUTPUT command for the tetrachoric correlations. You can also use TYPE=BASIC in the ANALYSIS command to obtain these. If you have further questions of this type, please send them along with your license number to support@statmodel.com.

I would like to see the polychoric correlation matrix for 24 items with n=334 using SAS. PROC FREQ can only calculate the polychoric correlation coefficient for each pair of items separately; it does not display the full correlation matrix.

Hi; I have done an EFA for a 24-item scale with 4 response points and a sample size of n=400. I obtained 3 solutions by PCF, PAF and ML using the polychoric correlation matrix. Eight factors should be retained under PCF and PAF, and each of these two solutions accounted for only 62% of the variance, but the factors did not contain the same items across the two solutions. When I performed ML, however, only 4 factors should be retained and the solution accounted for 100% of the variance. The ML solution was statistically more acceptable than the others with respect to RSMR & RSMP. I just wanted to know: what is the best method for performing exploratory factor analysis? Why is there a difference between the 3 solutions? Am I going about this the wrong way?

It does not sound like you are using Mplus. In Mplus, EFA on categorical items can be done by ULS, WLSMV, or ML. ULS and WLSMV fit the model to polychorics. I would not expect large differences between the 3 Mplus approaches.

One quick question: why can EFA on categorical items only be done in Mplus by ULS, WLSMV, or ML (with ULS and WLSMV fitting the model to polychorics)? Where are principal components factoring and principal axis factoring as EFA methods? Is it for a theoretical reason or a technical reason?

Principal components analysis is a biased estimator for the factor analysis model (because it assumes zero residuals). It is only used to generate starting values. Principal factoring (with iterated communalities) gives, I believe, the same results as ULS.
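The bias can be seen in a small worked example. The Python sketch below assumes a hypothetical one-factor population in which three items each load 0.6, so every off-diagonal correlation is 0.6 * 0.6 = 0.36. It compares the first principal component with a simple iterated principal-axis routine. All function names are illustrative, and the routine is a bare-bones one-factor version, not the algorithm of any particular package.

```python
import math


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    # Rayleigh quotient of the (unit-length) eigenvector.
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return value, vec


def one_factor_loadings(matrix):
    """Loadings on the first dimension: sqrt(eigenvalue) * eigenvector."""
    value, vec = top_eigenpair(matrix)
    return [math.sqrt(value) * v for v in vec]


def principal_axis(corr, rounds=100):
    """One-factor principal-axis factoring with iterated communalities."""
    n = len(corr)
    communalities = [1.0] * n  # starting with 1s makes round one a PCA
    loadings = []
    for _ in range(rounds):
        reduced = [row[:] for row in corr]
        for i in range(n):
            reduced[i][i] = communalities[i]  # communalities on the diagonal
        loadings = one_factor_loadings(reduced)
        communalities = [l * l for l in loadings]
    return loadings


# Hypothetical population correlations implied by loadings of 0.6.
corr = [
    [1.00, 0.36, 0.36],
    [0.36, 1.00, 0.36],
    [0.36, 0.36, 1.00],
]
pca_loadings = one_factor_loadings(corr)
paf_loadings = principal_axis(corr)
print([round(l, 3) for l in pca_loadings])
print([round(l, 3) for l in paf_loadings])
```

Because the principal component keeps 1s on the diagonal, it treats all item variance as common variance and its loadings come out at about 0.757 instead of 0.6. Iterating the communalities brings the loadings back to 0.6 in this example.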

You very kindly helped me with some advice regarding the rotation of factors. I am e-mailing you with another query, and I would much appreciate it if you could point me in the right direction.

I am interested in running a factor analysis on true binary data but want to avoid some of the pitfalls of linear correlation matrices. Is there any literature on whether it is possible to conduct a factor analysis on matrices generated from distance measures (e.g. Jaccard)?
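For readers unfamiliar with the measure mentioned in the question, the following Python sketch builds a Jaccard similarity matrix between binary items. The data are made up, and treating two all-zero items as having similarity 1.0 is an assumption of this sketch. Whether such a matrix is a sensible input for factor analysis is exactly the open question and is not settled by the code.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 vectors: |both 1| / |at least one 1|."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumed convention: two items that are never endorsed count as identical.
    return both / either if either else 1.0


def jaccard_matrix(items):
    """Symmetric matrix of Jaccard similarities between all item pairs."""
    names = list(items)
    return [[jaccard(items[r], items[c]) for c in names] for r in names]


# Hypothetical responses of 5 people to 3 binary items.
items = {
    "a": [1, 1, 0, 0, 1],
    "b": [1, 0, 0, 0, 1],
    "c": [0, 0, 1, 1, 0],
}
similarity = jaccard_matrix(items)
for name, row in zip(items, similarity):
    print(name, [round(v, 3) for v in row])
```

Note that Jaccard ignores joint absences (0, 0), so unlike a tetrachoric correlation it never takes negative values.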

Dear authors, I read a post in this topic that says "EFA on categorical items can be done by ULS, WLSMV, or ML. ULS and WLSMV fit the model to polychorics." But when I try to estimate an EFA on some categorical indicators with the ML estimator, Mplus (version 4.2) gives me this warning: *** WARNING in Analysis command Estimator ML is not available for EFA analysis with non continuous variables. Default estimator will be used.

Hello, I want to know if several indicators of the same latent factor can be in different metrics. For instance, latent variable X has three continuous indicators x1, x2, and x3: x1 is on a 0-5 scale, x2 is on a 0-100 scale, and x3 is on a 0-1 scale. Would such a composite work? Is there a suggested limit to the difference in metrics, for example that the largest range among the indicators should be less than 10 times the smallest range? Thank you very much!
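One fact that bears on this question is that a Pearson correlation does not change when an indicator is multiplied by a positive constant. The Python sketch below checks this with invented scores: a 0-5 indicator is rescaled to 0-100 and its correlation with a 0-1 indicator stays the same. It says nothing about models fitted to covariances, where the very different variances remain and may still matter numerically.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five people.
x1 = [0.0, 1.0, 2.5, 4.0, 5.0]        # indicator on a 0-5 scale
x3 = [0.1, 0.3, 0.2, 0.8, 0.9]        # indicator on a 0-1 scale
x1_rescaled = [20.0 * v for v in x1]  # the same indicator on a 0-100 scale

r_original = pearson(x1, x3)
r_rescaled = pearson(x1_rescaled, x3)
print(round(r_original, 6), round(r_rescaled, 6))
```

So if the analysis is based on the correlation matrix, the metric of each indicator drops out. Rescaling an indicator by hand, as in `x1_rescaled`, is one simple way to bring variances closer together when covariances are analyzed.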

The only cases that would be excluded in an EFA would be cases with missing on all of the dependent variables. If half are being excluded, you may be reading your data incorrectly, for example, you may have more variable names in the NAMES list causing two records to be read instead of one.