I am trying to understand the difference between PCA and FA. Through google research, I have come to understand that PCA accounts for all variance, while FA accounts for only common variance and ignores unique variance.

However, I am having a difficult time wrapping my head around how exactly this occurs. I know PCA rotates the axes used to describe the data in order to eliminate all covariance. Does this step still occur in FA? If not, what differentiates FA from PCA? Thanks in advance.

1 Answer
1

The difference between PCA and FA can be thought of in terms of the underlying statistical models (regardless of estimation methods, although these will change depending on the model used).

Consider $n$ iid observations of a $p$-dimensional (column) vector $X$. Suppose that for each $X_i$, $i \in \lbrace 1, \dots, n\rbrace$, we also had a $k$-dimensional vector $f_i$, with $k \leq p$. These are our "latent factors". A (linear) factor model assumes that $\mbox{E}(X_i \mid f_i) = Bf_i$, where $B$ is a $p \times k$ "factor loadings" matrix, and that $\mbox{Cov}(X_i \mid f_i) = \Psi$, a diagonal matrix. If we further assume that $\mbox{V}(f_i) = \mbox{I}_k$, so that the factors are uncorrelated with unit variance, we see that the marginal covariance is $\Sigma \equiv \mbox{Cov}(X_i) = BB^t + \Psi$.
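The factor model above can be checked numerically: simulate $X_i = Bf_i + \epsilon_i$ with diagonal error covariance $\Psi$ and confirm that the sample covariance approaches $BB^t + \Psi$. This is a minimal sketch with made-up dimensions and parameters, not anything from the answer itself.

```python
import numpy as np

# Toy simulation of the linear factor model (illustrative values only).
rng = np.random.default_rng(0)

p, k, n = 5, 2, 200_000          # observed dim, latent dim, sample size
B = rng.normal(size=(p, k))      # factor loadings matrix (p x k)
psi = rng.uniform(0.5, 1.5, p)   # diagonal of the unique-variance matrix Psi

f = rng.normal(size=(n, k))                   # latent factors, V(f_i) = I_k
eps = rng.normal(size=(n, p)) * np.sqrt(psi)  # unique errors, Cov = Psi
X = f @ B.T + eps                             # X_i = B f_i + eps_i

Sigma_model = B @ B.T + np.diag(psi)   # implied marginal covariance BB^t + Psi
Sigma_hat = np.cov(X, rowvar=False)    # sample covariance of the simulated X

# For large n the two should agree closely, entrywise.
print(np.max(np.abs(Sigma_hat - Sigma_model)))
```

With $n$ this large, the largest entrywise discrepancy is small, which is exactly the statement $\mbox{Cov}(X_i) = BB^t + \Psi$.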

Roughly, you can think of PCA as making the assumption that $\Psi$ is the zero matrix. In both cases the goal is to find/estimate rotations ($B$) that explain covariance patterns.

If we remove the estimation part of the problem and assume we have $\Sigma$ in hand, the difference is between two ways of decomposing a covariance matrix. We either want a "factor decomposition" $\Sigma = BB^t + \Psi$ or a principal component decomposition $\Sigma = BB^t$.
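The principal component decomposition $\Sigma = BB^t$ can be obtained directly from the eigendecomposition $\Sigma = V \Lambda V^t$ by taking $B = V\Lambda^{1/2}$. A short sketch, using a made-up $3 \times 3$ covariance matrix purely for illustration:

```python
import numpy as np

# An arbitrary positive-definite covariance matrix (toy numbers).
Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.5, 0.6],
                  [0.4, 0.6, 1.0]])

eigval, eigvec = np.linalg.eigh(Sigma)       # Sigma = V Lambda V^t
B_pca = eigvec @ np.diag(np.sqrt(eigval))    # B = V Lambda^{1/2}

# B_pca B_pca^t reconstructs Sigma exactly: the PCA decomposition.
print(np.allclose(B_pca @ B_pca.T, Sigma))   # True
```

Truncating $B$ to the columns with the largest eigenvalues gives the usual low-rank PCA approximation; the factor decomposition instead keeps $B$ full height but moves the diagonal "unique" part into $\Psi$.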

I think the key really is this: any covariance matrix will admit either kind of decomposition, but often the rank of $B$ will be substantially smaller if we allow the diagonal elements of $\Psi$ to be non-zero, as in the factor decomposition.
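The rank point is easy to see in a toy example (numbers of my own choosing, not from the answer): build $\Sigma = bb^t + \Psi$ from a single loadings vector $b$. Then $\Sigma$ itself is full rank, so a PCA-style decomposition $\Sigma = BB^t$ needs a full-rank $B$, while allowing the diagonal $\Psi$ leaves a $BB^t$ part of rank one.

```python
import numpy as np

# Rank-1 loadings plus a nonzero diagonal Psi (illustrative values).
b = np.array([[1.0], [2.0], [3.0]])
Psi = np.diag([0.5, 0.5, 0.5])

Sigma = b @ b.T + Psi

# Sigma is full rank, so Sigma = BB^t alone forces rank(B) = 3 ...
print(np.linalg.matrix_rank(Sigma))         # 3
# ... but once Psi absorbs the diagonal, the remaining part has rank 1.
print(np.linalg.matrix_rank(Sigma - Psi))   # 1
```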

Incidentally, finding the factor decomposition for a given covariance that minimizes the rank of $B$ is known as the Frisch problem and is computationally demanding.

PS. I hope this isn't merely a restatement of your remark that "PCA accounts for all variance, while FA accounts for only common variance and ignores unique variance".

The focus of the first sentence (1) and that of the emphasized one (2) are diametrically opposed. In fact it is the statistical model which gives the basis for the correct choice. In (2) it is correctly added that the covariance matrix admits both models, but it then proceeds with a "rank" argument (a purely mathematical one) instead of the statistical argument. (...)
– Gottfried Helms, Oct 6 '14 at 2:21

(...) After we've chosen the FA model and decided to respect the item-specific variance, we have the option to define (from theory or some earlier study) the amount of that part of the variance: as large as possible, possibly even zero, or also slightly correlated. In fact (and this is little known), the programmed estimation of the item-specific error (based on the inversion of the covariance matrix) does not give the maximal possible sum of item-specific error variance; by manually experimenting with this (when I studied the method again) I could arrive at higher sums.
– Gottfried Helms, Oct 6 '14 at 2:27