This project was completed by Syed Rahman. The project was his own idea, which he brought to us and completed as a summer intern in 2015.

The CONCORD algorithm implemented by Syed Rahman

The CONCORD algorithm is a method to estimate the true population co-variance matrix. The co-variance matrix summarizes the relationship between every pair of fields in the data. Co-variance values close to zero indicate that the two fields have no linear relationship. When the fields are scaled to unit variance, values close to 1 indicate a strong positive relationship and values close to –1 indicate a strong inverse relationship.
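
To make the idea of a co-variance matrix concrete, here is a minimal sketch using NumPy. It is only an illustration and is not part of the CONCORD implementation. The helper name `covariance_and_correlation` and the synthetic three-field data set are assumptions made for this example. The sketch also computes the correlation matrix, which is the co-variance matrix rescaled so that every value falls between –1 and 1.

```python
import numpy as np


def covariance_and_correlation(data):
    """Return the sample co-variance and correlation matrices.

    `data` has shape (n_observations, n_fields); this helper is
    hypothetical and exists only for this illustration.
    """
    cov = np.cov(data, rowvar=False)        # fields are columns
    corr = np.corrcoef(data, rowvar=False)  # co-variance rescaled to [-1, 1]
    return cov, corr


# Synthetic data (an assumption for the example): field 1 follows field 0
# closely, while field 2 is generated independently of both.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([
    x,
    2.0 * x + rng.normal(scale=0.1, size=500),
    rng.normal(size=500),
])

cov, corr = covariance_and_correlation(data)
print(np.round(cov, 2))
print(np.round(corr, 2))
```

In the printed correlation matrix, the entry for fields 0 and 1 should be close to 1, and the entries involving field 2 should be close to zero.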

In classical statistics, there are typically many more observations than fields. In this case, the co-variance matrix of the sample is a good estimate for the true co-variance matrix.

Unfortunately, in big data, there are many cases where the number of fields exceeds the number of observations or is close to it. In these cases, the sample co-variance matrix is a very poor estimate for the true co-variance matrix.
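
The contrast between the two regimes can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the CONCORD algorithm itself: the data are drawn from a distribution whose true co-variance matrix is the identity, and the helper name `sample_covariance_error` and the sample sizes are chosen only for this example. It reports how far the sample co-variance matrix is from the truth (in spectral norm) and the rank of the sample matrix.

```python
import numpy as np


def sample_covariance_error(n_observations, n_fields, seed=0):
    """Compare the sample co-variance matrix with a known true one.

    Hypothetical helper for illustration: the data are independent
    standard normal fields, so the true co-variance is the identity.
    Returns (spectral-norm error, rank of the sample co-variance).
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n_observations, n_fields))
    sample_cov = np.cov(data, rowvar=False)
    true_cov = np.eye(n_fields)
    error = np.linalg.norm(sample_cov - true_cov, ord=2)
    rank = np.linalg.matrix_rank(sample_cov)
    return error, rank


# Classical setting: many more observations than fields.
many_obs_error, many_obs_rank = sample_covariance_error(5000, 50)

# High-dimensional setting: more fields than observations.
few_obs_error, few_obs_rank = sample_covariance_error(30, 50)

print("5000 observations, 50 fields:", many_obs_error, many_obs_rank)
print("  30 observations, 50 fields:", few_obs_error, few_obs_rank)
```

With fewer observations than fields, the sample co-variance matrix cannot have full rank, so it cannot be inverted and its error relative to the true matrix is much larger than in the classical setting.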