Surface similarity-based molecular query-retrieval

Author affiliations

Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA

Citation and License

BMC Cell Biology 2007, 8(Suppl 1):S6
doi:10.1186/1471-2121-8-S1-S6

Published: 10 July 2007

Abstract

Background

Discerning the similarity between molecules is a challenging problem in drug discovery
as well as in molecular biology. The problem is important because the biochemical
characteristics of a molecule are closely related to its structure.
Therefore molecular similarity is a key notion in investigations targeting exploration
of molecular structural space, query-retrieval in molecular databases, and structure-activity
modelling. Determining molecular similarity is related to the choice of molecular
representation. Currently, representations with high descriptive power and physical
relevance, such as 3D surface-based descriptors, are available. Information from such representations
is both surface-based and volumetric. However, most techniques for determining molecular
similarity tend to focus on idealized 2D graph-based descriptors due to the complexity
that accompanies reasoning with more elaborate representations.

Results

This paper addresses the problem of determining similarity when molecules are described
using complex surface-based representations. It proposes an intrinsic, spherical representation
that systematically maps points on a molecular surface to points on a standard coordinate
system (a sphere). Molecular surface properties such as shape, field strengths, and
effects due to field super-positioning can then be captured as distributions on the
surface of the sphere. Surface-based molecular similarity is subsequently determined
by computing the similarity of the surface-property distributions using a novel formulation
of histogram-intersection. The similarity formulation is not only sensitive to the
3D distribution of the surface properties, but is also highly efficient to compute.
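
As a rough illustration of the idea above, the following Python sketch bins a non-negative surface property over angular cells of a sphere and compares two such distributions. It is a minimal sketch under stated assumptions, not the formulation proposed in this paper: the mapping used here is a plain radial projection from the centroid (so, unlike the intrinsic representation described above, it depends on molecular orientation), the comparison is the classic normalised histogram intersection rather than the novel variant, and the names spherical_histogram, histogram_intersection, n_theta and n_phi are hypothetical.

```python
import math


def spherical_histogram(points, values, n_theta=4, n_phi=8):
    """Accumulate a non-negative surface property over angular bins.

    Assumption: each surface point is mapped to the sphere by radial
    projection from the centroid. This is an illustrative stand-in, not
    the intrinsic mapping described in the paper.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    hist = [0.0] * (n_theta * n_phi)
    for (x, y, z), v in zip(points, values):
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a point at the centroid has no direction
        theta = math.acos(max(-1.0, min(1.0, dz / r)))  # polar angle in [0, pi]
        phi = math.atan2(dy, dx) % (2.0 * math.pi)      # azimuth in [0, 2*pi]
        i = min(int(theta / math.pi * n_theta), n_theta - 1)
        j = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
        hist[i * n_phi + j] += v
    return hist


def histogram_intersection(h1, h2):
    """Classic normalised histogram intersection, a score in [0, 1]."""
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        return 0.0
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Six points on the coordinate axes, each carrying a unit property value.
    pts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    h = spherical_histogram(pts, [1.0] * len(pts))
    print(histogram_intersection(h, h))
```

Because the score is a single pass over the bins, its cost is linear in the number of bins, which is consistent with the efficiency argument made above.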

Conclusion

The proposed method obviates the computationally expensive step of molecular pose-optimisation,
can incorporate conformational variations, and facilitates highly efficient determination
of similarity by directly comparing molecular surfaces and surface-based properties.
Retrieval performance, applications in structure-activity modelling of complex biological
properties, and comparisons with existing research and commercial methods demonstrate
the validity and effectiveness of the approach.