Clustering biomolecular structures by residue contact similarity

Comparing and clustering three-dimensional protein structures is a commonly employed end-game strategy in protein structure prediction. The root mean square deviation of atomic coordinates (rmsd) between pairs of structures underlies most of these clustering methods, but it becomes slow to compute for large datasets and less accurate for large systems. We present a similarity measure based on atomic contacts that allows clustering in a much shorter time (~100-fold reduction) than rmsd methods without compromising accuracy. Our measure, the fraction of common contacts, handles symmetrical assemblies with ease, where rmsd would otherwise require lengthy iterative calculations. It also works with different types of molecules, such as proteins, nucleic acids and small ligands. Overall, our method offers an all-round, computationally cheap solution to the problem of structural clustering in protein docking.
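To make the idea of a contact-based similarity concrete, the following is a minimal sketch, not the published implementation. It assumes each structure is given as a list of (chain, residue number, x, y, z) atom tuples, counts only inter-chain residue pairs with any two atoms within an assumed 5 Å cutoff, and defines the fraction of common contacts of a model relative to a reference as the share of the reference's contacts that the model also contains. The clustering step is a simple greedy scheme (repeatedly take the structure with the most unassigned neighbours as a cluster centre) chosen here for illustration. The function names residue_contacts, fcc and greedy_cluster are hypothetical.

```python
import math
from itertools import combinations


def residue_contacts(atoms, cutoff=5.0):
    """Inter-chain residue contacts of one structure.

    atoms: list of (chain_id, residue_number, x, y, z) tuples (assumed format).
    Returns a set of ((chain, resnum), (chain, resnum)) pairs, sorted so the
    pair does not depend on atom order.
    """
    contacts = set()
    for a, b in combinations(atoms, 2):
        if a[0] == b[0]:
            continue  # only count contacts between different chains
        if math.dist(a[2:], b[2:]) <= cutoff:
            contacts.add(tuple(sorted([(a[0], a[1]), (b[0], b[1])])))
    return contacts


def fcc(reference, model):
    """Fraction of the reference's contacts also present in the model.

    Note that this is asymmetric: fcc(a, b) need not equal fcc(b, a).
    """
    if not reference:
        return 0.0
    return len(reference & model) / len(reference)


def greedy_cluster(contact_sets, threshold=0.75):
    """Illustrative greedy clustering on precomputed contact sets.

    Two structures are neighbours if each shares at least `threshold` of the
    other's contacts. Returns a list of (centre_index, member_indices).
    """
    n = len(contact_sets)
    neighbours = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        forward = fcc(contact_sets[i], contact_sets[j])
        backward = fcc(contact_sets[j], contact_sets[i])
        if min(forward, backward) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Centre = structure with the most still-unassigned neighbours
        # (ties broken by lowest index because max() keeps the first maximum).
        centre = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = {centre} | (neighbours[centre] & unassigned)
        clusters.append((centre, sorted(members)))
        unassigned -= members
    return clusters


if __name__ == "__main__":
    # Toy two-chain "structures" (coordinates in Å).
    s1 = [("A", 1, 0, 0, 0), ("B", 10, 3, 0, 0), ("A", 2, 10, 0, 0), ("B", 11, 13, 0, 0)]
    s3 = [("A", 1, 0, 0, 0), ("B", 11, 3, 0, 0)]
    c1, c3 = residue_contacts(s1), residue_contacts(s3)
    print(fcc(c1, c1), fcc(c1, c3))
    print(greedy_cluster([c1, c1, c3]))
```

Because each structure's contacts are computed once and pairwise comparison reduces to set intersections, no superposition is needed, which is where a contact-based measure can save time over rmsd. Chain relabelling in symmetrical assemblies is not handled in this sketch.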