The image above is from Jain (2000), which introduced a generalized method for computing molecular similarity. It depicts the differences between the molecular surfaces of two molecules as seen from observer points in space.

ABOUT US
Predictive computational modeling underlies all of the work in the
Jain Lab, primarily in the form of algorithmic approaches to drug
discovery. The main areas of research in the lab are: 1) methods
for docking small molecules to proteins using empirically derived
scoring functions, 2) methods for inducing the shape of a protein
binding pocket given the structures and affinities of ligands that
bind the pocket competitively, 3) generalized surface-based approaches
to computing molecular similarity, both among small molecules and
proteins, 4) approaches for modeling and prediction of
polypharmacology based on molecular structure, and 5) applications of
such methods to cancer drug discovery. All of these approaches are
rooted in sophisticated computational algorithms for object
representation, function optimization, and search.

History
Following a long period of applied research in defense and in speech
understanding, Prof. Jain began a research career focused exclusively
on computational chemistry and computational biology. His
foundational work in computer-aided drug design was done in industry,
beginning with the Compass and Hammerhead techniques (see papers from
1994-1997). Compass introduced a new
representational scheme for capturing the 3D surface properties of
small molecules that made it possible to systematically address a
previously neglected aspect of modeling the activity of small
molecules: the choice of the relative alignment and conformation (or
pose) of competitive ligands, including the detailed relationship of
their hydrophobic shapes. A key insight, made with colleagues, was that the
choice of pose should be directly governed by the function being used
to predict binding affinity (essentially a direct analogy to physics
where the lowest-energy state is sought). The difficulty was that the
function to predict activity was being induced at the same time as the
poses were being chosen. The Compass method overcame this problem and
was one of the foundational methods of the field now known within the
computer science community as multiple-instance learning. This work
led to the development of one of the first published molecular docking
programs to address ligand conformational flexibility. The Hammerhead
docking system built upon the molecular representations,
multiple-instance approach, and search strategy developed for Compass.
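A toy multiple-instance regression can illustrate the pose-selection problem described above. In that problem, each molecule's pose is chosen by the same function that is being learned from the activity data. The sketch below is a generic alternating scheme, not the Compass algorithm. The linear scoring model, the feature vectors standing in for poses, the plain gradient-descent fit, and all function names (`best_pose`, `fit_linear`, `multiple_instance_fit`) are hypothetical choices made for illustration.

```python
"""Toy multiple-instance regression: pose choice and model fitting alternate.

Hypothetical illustration only; this is NOT the Compass algorithm. Each
molecule has several candidate poses, each described by a feature vector,
and one measured activity. The model is linear: score(pose) = w . features.
"""
from typing import List, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def best_pose(w: Sequence[float], poses: Sequence[Vector]) -> Vector:
    # The pose is chosen by the same function that predicts activity,
    # by analogy with seeking the lowest-energy state in physics.
    return max(poses, key=lambda p: dot(w, p))


def fit_linear(xs: Sequence[Vector], ys: Sequence[float],
               lr: float = 0.05, steps: int = 2000) -> Vector:
    # Plain gradient descent on mean squared error, starting from zero.
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = dot(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def multiple_instance_fit(molecules: Sequence[Sequence[Vector]],
                          activities: Sequence[float],
                          rounds: int = 10) -> Vector:
    # Arbitrary starting choice: the first pose of every molecule.
    chosen = [poses[0] for poses in molecules]
    w = fit_linear(chosen, activities)
    for _ in range(rounds):
        # Re-select each molecule's pose under the current model...
        new_chosen = [best_pose(w, poses) for poses in molecules]
        if new_chosen == chosen:
            break  # pose choices are stable, so the model is too
        chosen = new_chosen
        # ...then re-induce the model from the newly chosen poses.
        w = fit_linear(chosen, activities)
    return w


if __name__ == "__main__":
    # Molecule 1 has two candidate poses; molecule 2 has one.
    molecules = [[[1.0], [3.0]], [[1.0]]]
    activities = [3.0, 1.0]
    print(multiple_instance_fit(molecules, activities))  # weight close to [1.0]
```

In this example, the first fit uses each molecule's first pose. That fit makes the second pose of molecule 1 score higher, so the pose is switched and the model is re-induced, after which the choices are stable. Real pose spaces are continuous and real scoring functions are nonlinear, so this alternation only conveys the idea.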

Advances in Molecular Docking and Ligand-Based Modeling
Subsequent work built on the foundation laid by Compass and
Hammerhead, addressing problems in computing molecular diversity and
predicting ADME properties (see papers from 1998-2000). Our most recent
work in computational drug design (see the Surflex methodological
papers from 2003 onward) focuses on advancing the state of the art in
molecular docking and on constructing ligand-based models of protein
active sites when the protein structure is unknown. The Surflex
docking approach is unique in both its scoring function and its
search methodology. Surflex-Dock is competitive with the best and most
widely available methods in docking accuracy and screening utility on
publicly available benchmarks. We have recently extended the
multiple-instance parameter estimation process substantially by
generalizing it to include negative training data: putative inactive
molecules are added to the set of known active molecules when
re-estimating the scoring function for Surflex-Dock. We have continued our
work in ligand-based modeling as well. The Surflex similarity method
has been augmented, in both its search strategy and its objective
function, to support the construction of ligand-based models of
protein activity. These models are competitive with the best docking
methods at identifying novel ligands, and they generalize remarkably
well even across different chemical scaffolds. The Surflex QMOD
approach extends QSAR by transforming it into a problem of molecular
docking: a protein binding site is induced from structure-activity
data using the multiple-instance machine learning paradigm developed
for Compass.

Rational and Predictive Pharmacology
Research within the lab has branched out to larger biological scales,
with studies that consider the multiple effects of small molecules in
the whole organism. Our earliest work in drug discovery focused
exclusively on the therapeutically desired target. At least as
important are off-targets: those that a therapeutic is not intended to
modulate but that it affects at relevant drug concentrations. We are
interested in building accurate predictive models for promiscuous
bad-actor targets such as hERG and the cytochrome P450 enzymes. More
broadly, we are interested in building models for large numbers of
human targets to help guide the design and selection of compounds
during preclinical research. This is challenging both because of the
stringent demands on model accuracy and because of the curation
required to capture the multiple effects of existing therapeutics and
of compounds that have undergone clinical testing.

Wet Applications
The laboratory is purely a dry one. We rely upon our collaborators to
test predictions made by our computational tools. In addition to the
hundreds of laboratories that make use of our software, we have active
collaborations with both academic and industrial partners. We are
particularly interested in applications involving cancer.