How can I do CFA with binary variables? | Stata FAQ

Let’s say that you have a dataset with a number of binary variables, and you believe that
these binary variables reflect underlying, unobserved continuous variables. In that case you
should not run your confirmatory factor analysis (CFA) directly on the binary variables.
Instead, run the CFA on tetrachoric correlations, which estimate the associations among the
underlying continuous variables. We will demonstrate this by taking data with five continuous
variables and creating a binary version of each one by dichotomizing it at a point a little
above its mean.
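
The setup described above can be sketched as follows. This is only an illustration on simulated data, not the dataset used in this FAQ: the variable names x1 through x5, the common correlation of 0.6, the sample size, the seed, and the cut point of 0.25 are all assumptions made for the example.

```stata
* Hypothetical simulated data: five standard-normal variables that all
* correlate 0.6 with one another (0.6 is an arbitrary choice).
clear
set seed 12345
matrix C = J(5, 5, 0.6) + 0.4*I(5)
drawnorm x1 x2 x3 x4 x5, n(1000) corr(C)

* Dichotomize each variable a little above its mean. The population mean
* is 0 here, so 0.25 is a cut point slightly above it.
foreach v of varlist x1-x5 {
    generate byte b`v' = `v' > 0.25
}

* Pearson correlations: continuous versions first, then binary versions.
correlate x1-x5
correlate bx1-bx5
```

With real data you would replace the simulated variables with your own and choose the cut points from the observed means, for example after running summarize.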

As you can see, the correlations among the binary versions of the variables are much lower than
those among the continuous versions. Pearson correlations computed on binary variables tend to
underestimate the relationships among the underlying continuous variables that give rise to
them. What we need are the tetrachoric correlations, which we can obtain using the
tetrachoric command.
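
A minimal sketch of that step, again on hypothetical simulated data (the names, the 0.6 correlation, and the 0.25 cut point are assumptions, not values from this FAQ). It computes the tetrachoric correlations among the binary variables and saves the resulting matrix, which tetrachoric returns in r(Rho), so that it can be passed to a later analysis.

```stata
* Hypothetical simulated data, dichotomized a little above the mean.
clear
set seed 12345
matrix C = J(5, 5, 0.6) + 0.4*I(5)
drawnorm x1 x2 x3 x4 x5, n(1000) corr(C)
foreach v of varlist x1-x5 {
    generate byte b`v' = `v' > 0.25
}

* Tetrachoric correlations among the binary variables.
tetrachoric bx1-bx5

* Keep the estimated correlation matrix for later use.
matrix R = r(Rho)
matrix list R
```

In a simulation like this one, the tetrachoric estimates should land much closer to the correlation used to generate the continuous variables than the Pearson correlations of the binary variables do.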