I have a $20\times100$ matrix $X$, containing my $N=20$ samples in the $D=100$-dimensional space. I now wish to code up my own principal component analysis (PCA) in Matlab. I demean $X$ to $X_0$ first.

I read in someone's code that in such scenarios, where we have more dimensions than observations, we no longer eigen-decompose $X_0$'s covariance matrix. Instead, we eigen-decompose $\frac{1}{N-1}X_0X_0^T$. Why is this correct?

The normal covariance matrix is of size $D\times D$, each element of which tells us the covariance between two dimensions. To me, $\frac{1}{N-1}X_0X_0^T$ does not even have the correct dimensions! It is an $N\times N$ matrix, so what would it tell us? The covariance between two observations?!

$\begingroup$The answer to your question lies in the circumstance that, as your task is posed, you don't need the covariance matrix of the columns for its own sake. You only want it as a path to obtaining the PCs. Right? But the same PCA results can be obtained via eigendecomposition of X'X and of XX' (as well as via the SVD of X and X'). What is called "loadings" in one case will be called "PC scores" in the other, and vice versa, because both are just coordinates (see, for example), and the axes, the "principal dimensions", are the same.$\endgroup$
– ttnphns Apr 23 '15 at 6:56


$\begingroup$(cont.) If so, and you are free to choose which matrix to decompose, it is wise to decompose the one that can be done faster/more efficiently. When n<p it takes less RAM and less time to decompose XX', since it is of smaller size.$\endgroup$
– ttnphns Apr 23 '15 at 7:00

$\begingroup$@ttnphns Great explanation. I see the point now. However, I still have trouble going from the eigendecomposition of XX' to the PCs. Could you please very briefly show me how? Given that the PCs are just eigenvectors of the covariance matrix, I attempted to move from the eigendecomposition of XX' to that of the covariance matrix X'X, but failed.$\endgroup$
– Sibbs Gambling Apr 23 '15 at 7:09


$\begingroup$I have to go. Perhaps @amoeba (who is much more agile in algebra than me) or another reader will look in here soon and help you. Cheers.$\endgroup$
– ttnphns Apr 23 '15 at 7:15

1 Answer

The covariance matrix is of $D\times D$ size and is given by $$\mathbf C = \frac{1}{N-1}\mathbf X_0^\top \mathbf X^\phantom\top_0.$$

The matrix you are talking about is of course not a covariance matrix; it is called the Gram matrix and is of $N\times N$ size: $$\mathbf G = \frac{1}{N-1}\mathbf X^\phantom\top_0 \mathbf X_0^\top.$$

Principal component analysis (PCA) can be implemented via eigendecomposition of either of these matrices. These are just two different ways to compute the same thing.

The easiest and most useful way to see this is to use the singular value decomposition of the demeaned data matrix $\mathbf X_0 = \mathbf {USV}^\top$. Plugging this into the expressions for $\mathbf C$ and $\mathbf G$, we get: \begin{align}\mathbf C&=\mathbf V\frac{\mathbf S^2}{N-1}\mathbf V^\top\\\mathbf G&=\mathbf U\frac{\mathbf S^2}{N-1}\mathbf U^\top.\end{align}
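These identities are easy to verify numerically. Below is a minimal sketch in Python/NumPy (the question is posed in Matlab, but the algebra is identical; the names `X0`, `C`, `G` follow the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 20, 100                      # N samples in D dimensions, as in the question
X0 = rng.standard_normal((N, D))
X0 -= X0.mean(axis=0)               # demean the columns

# Thin SVD: X0 = U S V^T, with U (N x N), s (N,), Vt (N x D) since N < D
U, s, Vt = np.linalg.svd(X0, full_matrices=False)

C = X0.T @ X0 / (N - 1)             # D x D covariance matrix
G = X0 @ X0.T / (N - 1)             # N x N Gram matrix

# C = V (S^2/(N-1)) V^T  and  G = U (S^2/(N-1)) U^T
w = s**2 / (N - 1)
assert np.allclose(C, (Vt.T * w) @ Vt)
assert np.allclose(G, (U * w) @ U.T)
```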

Eigenvectors $\mathbf V$ of the covariance matrix are principal directions. Projections of the data on these eigenvectors are principal components; these projections are given by $\mathbf {US}$. Principal components scaled to unit length are given by $\mathbf U$. As you see, eigenvectors of the Gram matrix are exactly these scaled principal components. And the eigenvalues of $\mathbf C$ and $\mathbf G$ coincide.

The reason why you might see it recommended to use the Gram matrix when $N<D$ is that it is smaller than the covariance matrix, and hence faster to compute and faster to eigendecompose. In fact, if your dimensionality $D$ is too high, there is no way you can even store the covariance matrix in memory, so operating on the Gram matrix is the only way to do PCA. But for manageable $D$ you can still use the eigendecomposition of the covariance matrix if you prefer, even if $N<D$.
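A sketch of the small-matrix recipe itself, again in NumPy rather than Matlab: eigendecompose the $N\times N$ Gram matrix and read off the principal components and eigenvalues without ever forming the $D\times D$ covariance matrix (`np.linalg.eigh` plays the role of Matlab's `eig` for symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 20, 100
X0 = rng.standard_normal((N, D))
X0 -= X0.mean(axis=0)                      # demean

# Small-matrix route: eigendecompose the N x N Gram matrix
G = X0 @ X0.T / (N - 1)
evals, U = np.linalg.eigh(G)               # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]            # reorder to descending
evals, U = evals[order], U[:, order]

# Principal components (projections of the data) are U*S, with S^2 = (N-1)*evals
scores = U * np.sqrt((N - 1) * evals.clip(min=0))

# Sanity check against the covariance route: the eigenvalues coincide
C = X0.T @ X0 / (N - 1)
evals_C = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(evals_C[:N], evals, atol=1e-10)
```

Note that demeaning $N$ rows leaves rank $N-1$, so the smallest eigenvalue of $\mathbf G$ is numerically zero.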

$\begingroup$Great answer! I didn't know it had a name! Thanks a lot! I am now confident to use it to speed up my computation.$\endgroup$
– Sibbs Gambling Apr 23 '15 at 17:54


$\begingroup$My answer assumes that what you want to get is $\mathbf U$, and perhaps also $\mathbf S^2/(N-1)$. If you also want to obtain $\mathbf V$, then you can compute it via $\mathbf V = \mathbf X_0^\top \mathbf U \mathbf S^{-1}$ after you have got $\mathbf U$ and $\mathbf S$.$\endgroup$
– amoeba Apr 23 '15 at 17:55
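A quick NumPy sketch of recovering the principal directions $\mathbf V$ from $\mathbf U$ and $\mathbf S$ along these lines; since demeaning makes the smallest singular value zero, only the first $N-1$ components are kept:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 20, 100
X0 = rng.standard_normal((N, D))
X0 -= X0.mean(axis=0)               # demean; this drops the rank to N-1

U, s, Vt = np.linalg.svd(X0, full_matrices=False)

# Principal directions: V = X0^T U S^{-1}, keeping only nonzero singular values
k = N - 1
V = X0.T @ U[:, :k] / s[:k]

assert np.allclose(V, Vt[:k].T)                      # matches the SVD's right vectors
assert np.allclose(V.T @ V, np.eye(k), atol=1e-8)    # columns are orthonormal
```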

$\begingroup$This answer is clearer than a lot of expositions I have seen in books. Thanks.$\endgroup$
– usεr11852 Dec 13 '15 at 3:14

$\begingroup$For reference purposes: I think the 1969 Technometrics paper by I. J. Good, "Some Applications of the Singular Decomposition of a Matrix", is one of the first to reference this fully.$\endgroup$
– usεr11852 Dec 13 '15 at 4:00