Ligand-Guided Modeling of Binding Pocket Conformation

For a structural chemist designing a small molecule modulator of a protein, there are two ideal environments. In the first, all receptor conformations and receptor/ligand complexes would have been crystallized, and in the second, reliable prediction of ligand/receptor structure would be possible. Unfortunately, the reality is far from both situations. Only a handful of GPCRs have been crystallized, and this limited amount of structural data must be extrapolated to describe ligand/receptor interactions for thousands of potential complexes. Ligand docking is highly sensitive to receptor conformation, requiring great accuracy in the predicted structural models. A benchmark study for a large test set of enzyme/inhibitor complexes found that conformational deviations greater than 1.5 Å in the protein active site precluded correct inhibitor docking [28, 29]. As existing GPCR structures for distinct receptor types exhibit an approximately 2.0 Å root mean square deviation (RMSD) within the helical TM domains (see Section 15.3.2), starting GPCR homology models will often be too distorted for meaningful ligand docking. Additionally, alternate binding modes for the same protein with other ligand types can be difficult to infer from an existing structure. For example, agonist/β2AR interactions cannot be properly described by the existing antagonist-bound structures. A protocol for refining and adapting the receptor binding pocket conformation for the recognition of diverse ligand types is therefore needed.

The Preconditions Ligand-guided modeling (LGM) of the receptor pocket conformation requires the following components:

1. One or several approximate starting protein models (M0). These models may be built by homology, collected from crystallographic or NMR structures, derived from an alternative functional state of the same protein (e.g., an agonist-bound model derived from an antagonist-bound conformation), or all of the above.

2. A set of validated ligands for a particular functional state of the protein of interest. Let us call this set L, and divide it into a subset of high-affinity "seed" ligands (Lseed) and the rest of the set (Ltest). At least one potent and validated ligand is required in Lseed.

3. A larger set of closely related nonbinders (designated N) to the same functional state of the same protein. This set may contain ligands for other functional states of the same protein or ligands for related proteins. It is acceptable if a small fraction of molecules in the negative set (say, 5-10%) have not been fully annotated and are in fact uncharacterized binders.

4. One or several restraints specifying receptor atoms or residues that must interact with ligand atoms. Specific interatomic interactions between the best reference binders (Lseed) and protein pocket atoms are the most desirable, such as the well-characterized hydrogen bonding interaction between Asp 3.32 and the positively charged ligand amine for biogenic amine receptors. Such restraints permit more reliable and accurate positioning of the ligand but are not strictly mandatory.

The Procedure A general outline of the LGM approach is shown in Fig. 15.3. It consists of three steps: model generation (G) by conformational optimization with seed ligand(s) (Lseed), an optional model compression step (C), and model selection (S) based upon binder (Ltest) versus nonbinder (N) discrimination. Steps G and S provide two different ways to incorporate ligand guidance in the procedure. It is important, however, to point out that either step may be omitted. Multiple models (M0) may be generated without a seed ligand, or taken from other sources (experimental or computational) and fed directly into step S [30]. Conversely, if only a single ligand is available (e.g., for a new protein with only one characterized ligand or substrate), step S can be skipped, though L versus N discrimination should preferably be assessed using the single known ligand.

Model Generation (G) During step G, receptor conformation is sampled in the presence of a known high-affinity ligand (Lseed). The addition of ligand prevents collapse of the binding pocket and allows the receptor side chains to form stabilizing interactions with the ligand functional groups. Alternatively, the receptor may be sampled in the presence of a "blob" of repulsive density placed within the ligand binding pocket [31]. This technique also prevents pocket collapse and permits reshaping of the pocket. The receptor-flexible docking of ligands was first described in Reference 32 and was based upon previous work with flexible ligand docking in internal coordinates [33, 34]. Cavasotto and Abagyan later applied this procedure to flexibly model kinase/ligand complexes and demonstrated that VLS with a training set can be used as a selection criterion [35].

Figure 15.3 The LGM protocol. A general schematic is shown on the left side of the figure, while the right side describes the specific protocol followed in generating agonist-bound β2AR models for VLS. Experimental data that may be gathered from the literature or outside sources are indicated with small book icons, and computational steps are marked with computers. (Panel label: One LGM of the agonist-bound β2AR binding pocket with 1-Å shift of TM5.)

When generating binding pocket conformations for Class A GPCRs, the following parameters should be considered:

1. Treatment of extracellular loops 2 and 3. These loops may be deleted, conformationally sampled, or modeled after ligand placement (for further discussion, see Section 15.3.3).

3. Selection of side chains for sampling. While the region of sampled side chains should be limited, convergence occurs quickly if internal coordinate mechanics is used for simulation.

4. Choice of distance or positional restraints included during the flexible receptor/flexible ligand docking. If experimental data do not support any specific interatomic restraints, simple nonspecific volume restraints can enforce ligand docking within a known binding pocket.

Compressing the Set of Models (C) The compression step is a straightforward one. Two criteria are applied to reduce the number of conformers: (1) geometrical proximity (typically an RMSD cutoff for pocket heavy atoms ranging between 0.2 and 1.5 Å), and (2) the energy of the conformer, receptor/ligand complex, or a binding energy estimate. The first metric ensures structural diversity, while the second eliminates physically unrealistic or strained conformers. Depending upon the cutoff parameters, compression may reduce the number of conformers dramatically.
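The two compression criteria can be illustrated with a minimal Python sketch. This is not the published protocol: the greedy lowest-energy-first strategy, the function names, the default `rmsd_cutoff` of 0.5 Å (chosen from within the 0.2 to 1.5 Å range quoted above), and the `energy_window` value and its units are all assumptions made for illustration. Pocket heavy-atom coordinates are assumed to be already superimposed in a common frame.

```python
import numpy as np

def pocket_rmsd(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays.
    Assumes both conformers are already superimposed."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff * diff).sum(axis=1).mean()))

def compress_conformers(coords, energies, rmsd_cutoff=0.5, energy_window=10.0):
    """Greedy compression (hypothetical scheme, not the authors' code).

    Conformers are visited from lowest to highest energy. A conformer is
    discarded if its energy exceeds the minimum by more than energy_window
    (criterion 2), or if it lies within rmsd_cutoff of a conformer that
    has already been kept (criterion 1). Returns the kept indices.
    """
    energies = np.asarray(energies, dtype=float)
    order = np.argsort(energies)
    e_min = energies[order[0]]
    kept = []
    for i in order:
        if energies[i] - e_min > energy_window:
            break  # energies are sorted, so all remaining ones are too high
        if all(pocket_rmsd(coords[i], coords[j]) > rmsd_cutoff for j in kept):
            kept.append(int(i))
    return kept

# Toy example: four conformers of a four-atom "pocket".
base = np.zeros((4, 3))
shift = np.array([1.0, 0.0, 0.0])
coords = [base, base + 0.1 * shift, base + 2.0 * shift, base + 5.0 * shift]
energies = [-50.0, -49.0, -48.0, -10.0]
kept = compress_conformers(coords, energies)
```

In this toy case the second conformer is removed as a near duplicate of the first, and the fourth is removed as energetically strained, so the two criteria play the distinct roles described above.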

Selection by VLS (S) The idea to use VLS enrichment as a selection criterion came from the refinement of weights for the docking score energy terms [36]. In that approach, VLS enrichment was calculated for 25 sets of true ligands (L) and nonbinders (N), and the weights providing the best discrimination were selected. While performing the calculations, we realized that different models of the same protein may differ dramatically in their ability to distinguish between binders and nonbinders. The capacity of a given model to distinguish true ligands from nonbinders is a very sensitive criterion, particularly when the true and decoy molecules are similar in size and chemical nature.

During the selection step (S), all molecules from sets L and N are docked and scored with each nonredundant model form. Following docking, standard measures of VLS performance are calculated and used to select a model.

Commonly used measures include the median rank of the true positives, the hit rate, or the "area-under-the-curve" (AUC). In this case, the "curve" is a plot of the "number of true positives" versus the number of top-scoring molecules.
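As a hedged illustration of these measures, the Python sketch below computes the median rank of the true positives, the hit rate among the top-scoring molecules, and a normalized area under the curve of true positives versus number of top-scoring molecules. The function names are hypothetical. Two assumptions are made that the text does not specify: lower docking scores are treated as better, and the area is normalized by the total number of molecules times the number of binders.

```python
from statistics import median

def rank_molecules(scores, is_binder):
    """Return binder flags ordered from best to worst score.
    Assumption: a lower (more negative) docking score is better."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [bool(is_binder[i]) for i in order]

def median_binder_rank(ranked_flags):
    """Median 1-based rank of the true binders in the ranked list."""
    return median(r for r, flag in enumerate(ranked_flags, start=1) if flag)

def hit_rate(ranked_flags, top_n):
    """Fraction of the top_n best-scoring molecules that are true binders."""
    return sum(ranked_flags[:top_n]) / top_n

def normalized_auc(ranked_flags):
    """Area under the curve of cumulative true positives versus number of
    top-scoring molecules, divided by (molecules x binders)."""
    n_total = len(ranked_flags)
    n_binders = sum(ranked_flags)
    area = 0
    found = 0
    for flag in ranked_flags:
        found += flag        # cumulative true positives at this rank
        area += found
    return area / (n_total * n_binders)

# Toy example: six docked molecules, three of them known binders.
scores = [-30.0, -28.0, -26.0, -25.0, -10.0, -5.0]
is_binder = [1, 1, 0, 1, 0, 0]
flags = rank_molecules(scores, is_binder)
```

With this normalization a random ranking gives a value near 0.5 for large sets, so models can be compared on a common scale; other normalizations are equally valid as long as the same one is used for every model.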

The outcome of the LGM procedure can be any of three possibilities:

1. Multiple conformers are highly selective. There is no requirement that a single conformer is selected. This outcome is particularly interesting if different conformers are selective with respect to different ligands in L.

2. A single highly selective conformer is identified.

3. No conformers impart acceptable selectivity.

The first outcome leads to a multiple receptor conformation (MRC) set for future VLS, the second possibility to a traditional single conformer VLS, and the last case requires reconsidering the model generation steps.
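The three outcomes amount to a simple decision rule, sketched below in Python. The text does not define what counts as acceptable selectivity, so the `min_selectivity` threshold, the function name, and the outcome labels are hypothetical; the selectivity value per model could be any of the VLS performance measures, with higher assumed to be better.

```python
def classify_lgm_outcome(selectivity_by_model, min_selectivity):
    """Map per-model selectivity values to one of the three LGM outcomes.

    selectivity_by_model: dict of model name -> selectivity measure
                          (assumption: higher is better).
    min_selectivity:      user-chosen acceptance threshold (assumption).
    Returns (outcome label, selective models sorted best first).
    """
    selective = [name for name, value in selectivity_by_model.items()
                 if value >= min_selectivity]
    selective.sort(key=lambda name: selectivity_by_model[name], reverse=True)
    if len(selective) > 1:
        return "multiple_receptor_conformations", selective
    if len(selective) == 1:
        return "single_conformer", selective
    return "revisit_model_generation", selective

# Toy example with three models and a threshold of 0.8.
outcome, models = classify_lgm_outcome({"m1": 0.90, "m2": 0.60, "m3": 0.85}, 0.8)
```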

Applications The first practical application of LGM was reported in 2007 [37]. In this study, LGM was used to generate antagonist-bound models of the androgen receptor (AR) from an agonist-bound crystal structure. Starting from an agonist-bound conformation and two known antagonists (Lseed), thousands of hypothetical antagonist-bound pocket conformers were generated. Each conformer was then tested for the ability to discriminate known AR antagonists from nonbinders in a panel of 88 nuclear receptor ligands. The two AR conformations providing the best enrichment were then employed for VLS of a marketed drugs database. Interestingly, three antipsychotic drugs were identified as "hits" and, subsequently, were experimentally confirmed to exhibit antiandrogenic activity. These nonsteroidal molecules were rationally repurposed to improve AR antagonism and reduce affinity for the dopaminergic and serotonergic receptors.

LGM has now been applied to several GPCRs. A bRho-based homology model was constructed for melanin-concentrating hormone receptor (MCHR), a target for obesity therapeutics [38]. Sampling of the receptor side chains and ligand conformation for four distinct antagonist/receptor complexes yielded 800 ligand binding pocket conformers. Following clustering, three final models were selected for optimal VLS enrichment on a small test set of ligands. The best models were used to screen a larger compound library; experimental characterization of 129 predicted "hits" identified six novel compounds with low micromolar affinity. In this case, even a relatively distorted initial homology model was sufficient to yield productive conformers. Additionally, no experimental data were available to aid ligand placement, demonstrating that unrestrained LGM can enable the discovery of novel ligand chemotypes, even for receptors with limited experimental data. A similar ICM-based protocol was applied with comparable success to the human muscarinic M2 receptor [39]. Initial homology models were constructed with bRho, and the receptor side chains were sampled with several known antagonists to generate a collection of conformers. More recent work has tested β2AR-based M2 receptor homology models as starting points [39]. Most recently, LGM was assessed in a set of blind predictions for the AA2aR structure [40]. An initial homology model of AA2aR was constructed using β2AR as a template. Roughly 400 additional AA2aR conformers were generated from this model using a combination of heavy-atom elastic network normal mode analysis and flexible sampling of the receptor side chains and docked ligand. These conformers were evaluated for ligand/nonbinder discrimination, and the best models were iteratively refined through the LGM process. Remarkably, LGM yielded an AA2aR model that correctly recapitulates more than 40% of the ligand/receptor contacts [41]. The LGM model predicted the largest fraction of correct ligand/receptor contacts of those models evaluated in the blind assessment [40]. Further, in VLS on a large test set of 14,000 GPCR ligands, this model attains an enrichment factor of 29 for the top 1% of compounds, compared with an enrichment factor of 20 for the AA2aR crystallographic coordinates.
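For readers unfamiliar with the enrichment factor quoted above, the following Python sketch shows one common way to compute it: the fraction of true binders among the top-scoring x% of a ranked library, divided by the fraction of binders in the whole library. The exact definition used in the cited study is not given in the text, so this formula, the function name, and the toy numbers are assumptions for illustration only and do not reproduce the reported values.

```python
def enrichment_factor(ranked_flags, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked library.

    ranked_flags: binder flags (True/False) ordered from best to worst score.
    EF = (binders in top subset / size of top subset)
         / (binders in library / size of library)
    """
    n_total = len(ranked_flags)
    n_binders = sum(ranked_flags)
    if n_binders == 0:
        raise ValueError("library contains no known binders")
    n_top = max(1, round(fraction * n_total))
    binders_in_top = sum(ranked_flags[:n_top])
    return (binders_in_top / n_top) / (n_binders / n_total)

# Toy library: 1000 molecules, 10 binders, 3 of them ranked in the top 10.
toy_ranking = [True] * 3 + [False] * 7 + [True] * 7 + [False] * 983
ef_top_1_percent = enrichment_factor(toy_ranking, fraction=0.01)  # approximately 30
```

An enrichment factor of 1 corresponds to random selection, and the maximum attainable value at a given fraction is limited by the number of binders in the library.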