Re: Principal component analysis PCA

You need to create the covariance matrix of people (rather than of SNPs)
for the 0/1/2 genotype at each SNP and take the principal components of
that matrix.

In this case the number of individuals is small enough that you should be
able to create the covariance matrix directly by matrix operations. In
larger data sets where the entire data matrix doesn't fit in memory, you
need some sort of double loop.
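
One way the looping approach could be sketched, assuming the genotypes
can be processed in blocks of SNPs. The function name chunked_people_cov
and the chunk_size argument are made up for illustration, and the blocks
are sliced from an in-memory matrix here only to keep the example
runnable; with real data each block would be read from disk instead.

```r
# Hypothetical helper: accumulate the people-by-people covariance block by block.
# `code` is assumed to be people x SNPs with 0/1/2 entries.
chunked_people_cov <- function(code, chunk_size = 1000) {
  n_people <- nrow(code)
  n_snps <- ncol(code)
  cross <- matrix(0, n_people, n_people)  # running sum of x_i * x_j over SNPs
  sums <- numeric(n_people)               # running sum of each person's genotypes
  for (start in seq(1, n_snps, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, n_snps)
    block <- code[, idx, drop = FALSE]    # in practice: read this block from disk
    cross <- cross + tcrossprod(block)
    sums <- sums + rowSums(block)
  }
  means <- sums / n_snps
  # sum((x_i - m_i) * (x_j - m_j)) = sum(x_i * x_j) - n * m_i * m_j;
  # the divisor n_snps - 1 matches what cov() uses
  (cross - n_snps * tcrossprod(means)) / (n_snps - 1)
}

# Small simulated example: 20 people, 500 SNPs
set.seed(1)
code <- matrix(sample(0:2, 20 * 500, replace = TRUE), nrow = 20)
S <- chunked_people_cov(code, chunk_size = 64)
all.equal(S, cov(t(code)))
```

Only one block of SNPs plus the people-by-people accumulator has to be in
memory at any time, which is what makes this workable for large SNP counts.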

Re: Principal component analysis PCA

Thanks for the advice.

I tried to find the cov of my matrix using R and it ran out of memory.
I am not sure how to do the double loop to create the covariance matrix.
Also, is doing prcomp(covariance matrix) the same as doing
prcomp(original data matrix of SNPs)?

Re: Principal component analysis PCA

>
> Thanks for the advice.
>
> I tried to find the cov of my matrix using R and it ran out of memory.

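How did you do this? The covariance matrix is only 115x115, so it
shouldn't run out of memory. Assuming code has one row per person and
one column per SNP,
    cov(t(code))
should work.

If that doesn't work, then
    tcrossprod(code)/300000 - tcrossprod(rowMeans(code))
might. Here 300000 stands for the number of SNPs, i.e. ncol(code).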
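
A quick check of the two expressions on simulated data, as a sketch only.
It assumes code is people x SNPs; the sizes and the names n_snps, S_cov
and S_tcp are invented for the example. The two results differ only in
the divisor (n_snps - 1 for cov(), n_snps for the tcrossprod form), which
rescales the eigenvalues but does not change the eigenvectors.

```r
set.seed(42)
n_snps <- 2000
code <- matrix(sample(0:2, 10 * n_snps, replace = TRUE), nrow = 10)  # people x SNPs

S_cov <- cov(t(code))                                            # divisor n_snps - 1
S_tcp <- tcrossprod(code) / n_snps - tcrossprod(rowMeans(code))  # divisor n_snps

# Same matrix up to the factor (n_snps - 1) / n_snps
all.equal(S_tcp, S_cov * (n_snps - 1) / n_snps)
```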

> I am
> not sure how to do the double loop to create the covariance matrix.
> Also, is doing prcomp(covariance matrix) the same as doing
> prcomp(original data matrix of SNPs)?

That's the point of the paper behind the EIGENSTRAT software, which is
worth reading. If both covariance matrices are built from the same
centered data matrix, their nonzero eigenvalues are the same up to a
constant scale factor and the eigenvectors are related. One way around
gives the left singular vectors of the data matrix, the other gives the
right singular vectors.
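
The relationship can be checked numerically on a toy matrix. This is a
sketch under stated assumptions: the data are simulated, X is the
genotype matrix centered within each person (the same centering that
cov(t(code)) applies), and the names X, sv, e_people and e_snps are made
up for the example. It uses eigen() on the cross-product matrices, since
prcomp() expects a data matrix rather than a covariance matrix.

```r
set.seed(7)
code <- matrix(sample(0:2, 8 * 50, replace = TRUE), nrow = 8)  # people x SNPs
X <- code - rowMeans(code)   # center each person's genotypes across SNPs

sv <- svd(X)                                        # X = U diag(d) V'
e_people <- eigen(tcrossprod(X), symmetric = TRUE)  # 8 x 8, people by people
e_snps   <- eigen(crossprod(X),  symmetric = TRUE)  # 50 x 50, SNPs by SNPs

k <- nrow(X)

# Nonzero eigenvalues agree with the squared singular values both ways around
all.equal(e_people$values, sv$d^2)
all.equal(e_snps$values[1:k], sv$d^2)

# Eigenvectors agree with the singular vectors up to sign:
# people side -> left singular vectors, SNP side -> right singular vectors
max(abs(abs(e_people$vectors) - abs(sv$u)))
max(abs(abs(e_snps$vectors[, 1:k]) - abs(sv$v)))
```

The practical consequence is that the small people-by-people matrix is
enough to get the components, without ever forming the SNP-by-SNP matrix.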