This blog is meant to allow Fragment-based Drug Design Practitioners to get together and discuss NON-CONFIDENTIAL issues regarding fragments.

12 May 2014

In defense of ligand efficiency – and poll!

Last year we highlighted a provocative article from Michael
Shultz in which he took aim at the concept of ligand efficiency (LE). As we
noted at the time, he raised some good points, and I am the first to argue that
there is value in questioning widespread assumptions.

However, in addition to questioning the utility of LE, Shultz also questioned its mathematical validity. He repeated the attack earlier
this year by asserting that ligand efficiency was a “mathematical
impossibility.”

This is incorrect.

To set the record straight, Chris Murray (Astex), Andrew
Hopkins (University of Dundee), György Keserü (Hungarian Academy of Sciences),
Paul Leeson (GlaxoSmithKline), David Rees (Astex), Charles Reynolds (Gfree Bio),
Nicola Richmond (GlaxoSmithKline) and I have written a response just published
online in ACS Med. Chem. Lett. demonstrating
that ligand efficiency is mathematically valid.

One of the criticisms of LE is that it is more sensitive to
changes in small molecules (such as fragments) than in larger molecules.
However, this is a property of any ratio, and we show that the same behavior
applies to more familiar examples such as fuel efficiency: a few blocks of
stop-and-go traffic have more of an effect on the overall fuel efficiency of a
short trip than a long trip.
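
To make the ratio argument concrete, here is a minimal sketch. All numbers are hypothetical and chosen only to illustrate the arithmetic; the helper `ratio_shift` is not from the paper. It applies the same perturbation to a small and a large case that start with the same ratio.

```python
def ratio_shift(numerator, denominator, d_num, d_den):
    """Return the ratio before and after both parts change by fixed amounts."""
    before = numerator / denominator
    after = (numerator + d_num) / (denominator + d_den)
    return before, after

# Fuel efficiency in km per litre (hypothetical trips).
# The stop-and-go stretch adds 1 km of distance and 0.2 L of fuel to each trip.
short_trip = ratio_shift(5.0, 0.4, 1.0, 0.2)
long_trip = ratio_shift(100.0, 8.0, 1.0, 0.2)

# Ligand efficiency as -dG (kcal/mol) per heavy atom (hypothetical compounds).
# One heavy atom is added that contributes no extra binding energy.
fragment = ratio_shift(6.0, 12, 0.0, 1)
lead = ratio_shift(15.0, 30, 0.0, 1)

for name, (before, after) in [
    ("short trip", short_trip),
    ("long trip", long_trip),
    ("fragment", fragment),
    ("lead", lead),
]:
    print(f"{name}: {before:.3f} -> {after:.3f} (change {after - before:+.3f})")
```

In both pairs the smaller case moves further, which is simply how ratios behave rather than something peculiar to LE.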

Of course, that’s not to say that ligand efficiency and
other metrics are perfect or universally applicable; we discuss a number of
situations where they may be more or less useful.

In this spirit, Practical
Fragments is revisiting a poll from 2011 to see what metrics you use –
please vote on the right-hand side of the page, and share your thoughts here.
Note that you can vote for multiple metrics, and please check the last box
(Polldaddy does not tally individual responses, so this box will track total
number of voters to allow us to calculate percentage of respondents who use a
given metric).

24 comments:

I would like to reprise my comment to the previous post on this: For me, and I am sure I have said this before, LE is not meant to represent reality, which is what I think most metrics are trying to do. LE, OTOH, is a useful guide to help you decide if you are making smart, efficient use of chemistry space, rather than just glomming stuff on.

It's a rubric (and I prefer the LEAN as LE) and if you understand the failings of your metric, have at it. It's not reality, nor should it be.

(3x4)/2 is 6. The fact that the 4 might have been calculated as log(10000) or as (2x2) is neither here nor there. You can divide a number derived from a logarithm by another number.

If the deltaG of binding between protein and ligand is 20kJ/mol, what is half the binding energy? We'll never know; we cannot divide it by 2 as that would be mathematically invalid?

LE is an arbitrary number calculated from deltaG and the number of atoms - no more and no less.

The confusion arises because we view deltaG through the lens of biological assays and experimental IC50. The relationship between IC50 and deltaG is not linear so the relationship between LE and IC50 won't be. It's not wrong, it's just something you have to be aware of.

If all compound affinities were measured with ITC and reported as deltaG's, none of this argument would arise. We'd have a lot of very slow projects though.

Both sides have completely missed the most important point in this debate and it’s unfortunate that the argument has been framed in terms of mathematical validity (which in this case is a complete red herring). Please don’t worry because help is at hand. Think of some points that lie on a straight line when ΔG is plotted against number of heavy atoms (HAC). Do you agree the compounds represented by these points have equal ligand efficiency? Now calculate the ligand efficiencies. Are they all the same value? That depends on whether the line passes through the origin and the problem is that the origin is not really an origin. If the line does not pass through the origin then LE will necessarily show a size dependency. When Mike Shultz asserts, “To be valid, LE must remain constant for each heavy atom that changes potency 10-fold” he is stating in a roundabout way that the line must pass through the origin in order that a linear response to HA translates to constant LE. Personally I would only describe a metric as mathematically invalid if it wasn't possible to calculate it. However, I would still question the value of a metric that considered points on a straight-line plot of ΔG against HAC to represent compounds of different ligand efficiency.
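
The thought experiment above can be sketched in a few lines. The slope, intercept and heavy atom counts are hypothetical, and the function names are only for illustration. Compounds are placed on a straight line of -ΔG against HAC, once through the origin and once with a non-zero intercept, and LE is computed for each.

```python
def neg_delta_g_on_line(hac, slope, intercept):
    """-dG (kcal/mol) for a compound lying exactly on a straight line in HAC."""
    return intercept + slope * hac

def ligand_efficiency(neg_delta_g, hac):
    """LE as -dG divided by the number of heavy atoms."""
    return neg_delta_g / hac

heavy_atom_counts = [10, 20, 30, 40]  # hypothetical compounds

# Line through the origin: every compound gets the same LE.
le_through_origin = [
    ligand_efficiency(neg_delta_g_on_line(n, 0.3, 0.0), n)
    for n in heavy_atom_counts
]

# Same slope, intercept of 3 kcal/mol: LE now falls as the compounds grow.
le_with_intercept = [
    ligand_efficiency(neg_delta_g_on_line(n, 0.3, 3.0), n)
    for n in heavy_atom_counts
]

print("through origin:", [round(v, 3) for v in le_through_origin])
print("with intercept:", [round(v, 3) for v in le_with_intercept])
```

With a non-zero intercept, compounds that sit on one and the same line are assigned different LE values, which is the size dependency described in the comment.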

The Achilles heel of LE is the (arbitrary) assumption that the zero molecular size limit for IC50 and Kd is 1 M. Are you happy to make this assumption? If so then why? Two issues that are usually (always?) overlooked by LE ‘experts’ are units and standard states. If the ‘experts’ had paid more attention to these issues early on, this debate wouldn’t be happening and people wouldn’t be working themselves into such a lather about the best way to ‘correct’ LE for the effects of molecular size. We discussed these issues last year ( http://dx.doi.org/10.1007/s10822-013-9655-5 see third paragraph from end) and even suggested a solution to the problem. Unfortunately neither side in the debate appears to have considered what we had to say.

One delicious piece of irony is that the first equation in the critique of Mike’s study (it’s a couple lines from the top of the section entitled ‘MATHEMATICAL VALIDITY’ and you can’t miss it) is itself mathematically invalid. Is it heresy to state this? Who will be first to kindle the auto-da-fé? Here’s the equation:

LE = (-2.303RT/HAC).logKd

and to convince yourselves of its mathematical invalidity, I’ll invite you to compute logarithms to base 10 of the following quantities:

"One issue that must be addressed when using lipophilicity in modeling is whether logP or logD is more relevant to phenomena of interest. Typically, logP will be more relevant when compounds bind to their targets (and anti-targets) in their ionized forms while logD is more likely to be the measure of choice when the concentration of neutral form is a limiting factor as would normally be the case for aqueous solubility and passive permeation through membranes."

Unfortunately, even monotonicity cannot be assumed. If you want to use standard free energy of binding you do need to specify a concentration to define the standard state. The choice of this concentration is entirely arbitrary and different values of the standard concentration will in general lead to different rankings of compounds when standard free energy differences are transformed to ligand efficiency. If one’s thermodynamic view changes with the concentration used to define the standard state then one is practising voodoo thermodynamics rather than the more traditional variety. Put another way, should our perceptions of a system change with the units in which we express the quantities that describe the system?
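
As a rough illustration of the ranking point, the sketch below computes LE for two hypothetical compounds under two choices of standard concentration. The Kd values, heavy atom counts and the temperature of 298.15 K are assumptions made for the example only.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed temperature

def ligand_efficiency(kd_molar, hac, std_conc_molar=1.0):
    """LE = -dG/HAC with dG = RT ln(Kd / standard concentration)."""
    delta_g = R_KCAL * TEMPERATURE_K * math.log(kd_molar / std_conc_molar)
    return -delta_g / hac

# Hypothetical compounds: name -> (Kd in mol/L, heavy atom count)
compounds = {"A": (1e-4, 10), "B": (1e-7, 20)}

def ranking(std_conc_molar):
    """Compound names ordered from highest to lowest LE."""
    return sorted(
        compounds,
        key=lambda name: ligand_efficiency(*compounds[name], std_conc_molar),
        reverse=True,
    )

print("standard concentration 1 M: ", ranking(1.0))
print("standard concentration 1 mM:", ranking(1e-3))
```

With these numbers, compound A ranks first at 1 M and compound B ranks first at 1 mM, although nothing about the compounds has changed.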

Recently, I published a paper describing the discovery of novel BRD4 bromodomain inhibitors by an in silico fragment-based approach and, most interestingly, using computed LE to prioritise candidate compounds for experimental validation. It turns out that computed LE is quite efficient at discerning actives from inactives, based not only on this single application but also on my five previous high-throughput virtual screens. Here is the link for the paper: http://www.sciencedirect.com/science/article/pii/S0960894X14003539. Please kindly note that two complex crystal structures (PDB codes 4PCI and 4PCE) were solved upon acceptance of the paper.

I would expect LE to distinguish actives from inactives since LE is defined in terms of activity. I can't see the article so I'm probably missing something. Perhaps you could summarise how you used LE in your study?

The way I look at ligand efficiency is nicely summarized by a famous quote by George E. P. Box:

All models are wrong, but some are useful.

As Teddy suggested, LE doesn't necessarily have to map onto some Platonic reality to be useful. Hongtao brings up one example, and we've highlighted a survey of how docking is more successful at finding high-LE compounds. One example (among many) of using LE in the context of a lead optimization program is shown here.

As to Pete's comments, I think we can all agree to use standard state and move on. Just because we live in a relativistic universe doesn't mean we can't usefully talk about times and distances!

Not dividing Kd by the standard concentration before calculating the logarithm is a relevant error and, given the aim of the article is “to correct these mathematical statements and prevent them from propagating through the literature” I thought that the very least that I could do was flag it up. However, the omission of the standard concentration is much less of an issue than the arbitrary choice of 1M as the standard concentration. Suppose I say that we should use 1 mM as a standard concentration to define LE and you say we should stick with 1 M? How do we decide whose LE definition is ‘better’ or more ‘useful’? The rankings of compounds will not in general be the same with the two metrics and if we want to claim a scientific basis for LE then we need to be able to address these questions. The idea of using different values of concentration to define LE cropped up with the definition of LE for antibiotics ( http://practicalfragments.blogspot.com/2009/01/ligand-efficiency-for-antibiotics.html ) and it’s a bit strange to look back at the comments I made then. This was actually the article that got me thinking (although not immediately) about the problems with LE and I wanted to kick myself for not seeing it sooner. I had worried from the start about why we scale by HA and not HA**2 or sqrt(HA) but it took a while for the real issue to become clear. The urge to kick myself was particularly strong because I had been a fan of Mike Gilson (who has been stressing the arbitrary nature of the standard concentration for the last 20 years or so) long before the introduction of LE. I also knew about the problem of setting zero points (Mike Abraham was writing about this 20+ years ago) because of my interest in hydrogen bonding so there really was no excuse in my case. Unfortunately (or fortunately), I lacked the agility to kick myself.

However, there is a much bigger issue than missing units. I believe that we should use the trend actually observed in the data in order to normalize activity with respect to molecular size. This is what we were getting at when we suggested, in JCAMD (2013) 27:389-402 ( http://dx.doi.org/10.1007/s10822-013-9655-5 ), modeling data and using the residuals to quantify the extent to which the activities of compounds beat trends. If the line of fit actually goes through the origin then LE can be regarded as validated (for the compounds in question) but using the residuals means no worries if the line of fit intersects the activity axis elsewhere. LipE/LLE is not without problems in this regard although the issue for offset efficiency metrics is that the assumption is made about slope rather than intercept.

As for moving on, there are some things that we might consider leaving behind. Size dependency of LE looks a bit shaky as a concept and there are questions (e.g. choice of zero molecular size limit for activity) that could be asked of studies of maximal affinity of ligands such as PNAS (1999) 96:9997-10002 ( http://dx.doi.org/10.1073/pnas.96.18.9997 ) and JCIM (2012) 52:2098-2106 ( http://dx.doi.org/10.1021/ci200612f , featured in http://practicalfragments.blogspot.com/2014/02/pushing-limit.html ).

In case folk can't see the JCAMD article to which I referred in the previous (and earlier) comments, here is the relevant paragraph (pasted from the manuscript file). Note the use of the terms 'scale' and 'offset' in the context of ligand efficiency metrics.

The situation is different in lead identification where lipophilicity measures are used to prioritize compounds and structural series for hit-to-lead chemistry and further optimization. The difference, ΔlogP, between logPoct and logPalk can be considered as a measure of the hydrogen bonding capacity of a compound and, outside structural series, cut off values cannot simply be shifted by a constant to account for differences between partitioning systems. Lipophilicity is also used to create efficiency metrics [67] (e.g. pIC50 - logP) which can be used to compare compounds and structural series. Offsetting potency or affinity by lipophilicity in this manner has the effect (at least for neutral compounds) of shifting the reference state for the binding equilibrium from the aqueous to an organic phase. Affinity can also be scaled by molecular size and the original measure of ligand efficiency [68] was obtained by dividing the standard Gibbs free energy of binding (ΔG°) by number of non-hydrogen atoms. Reference states also need to be considered carefully when affinity is scaled because relative values of ligand efficiency for compounds differing in molecular size depend on the standard concentration used to define ΔG° [69]. Whether one scales or offsets affinity or potency, one is implicitly assuming that the relationship with the relevant physicochemical or molecular property is linear. When biological activity is offset by lipophilicity one assumes a unit slope in the linear relationship while scaling by molecular size implies an assumption that Kd and IC50 values in all assays will tend to a single concentration (usually 1M) in the limit of zero molecular size.
An alternative to using efficiency metrics for evaluating compounds that differ in their activity and physicochemical characteristics is to fit affinity or potency to measures of lipophilicity and/or molecular size and use the residuals to quantify the extent to which compounds beat (or are beaten by) the underlying trend in the data. One advantage of analyzing measured biological activity in this manner is that the results are invariant with respect to standard concentration. This is not the case when ligand efficiency is ‘corrected’ for molecular size [70].
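
The residual-based alternative described in the quoted paragraph could be sketched as follows. This is a minimal ordinary least-squares fit in plain Python on hypothetical data, not the procedure from the JCAMD article; the variable names and values are illustrative only. The activities are expressed against two different standard concentrations to show that the residuals do not change.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Observed minus fitted activity: positive means the compound beats the trend."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data set: heavy atom counts and affinities.
hac = [10, 14, 18, 22, 26, 30]
p_kd_vs_1M = [3.1, 4.0, 4.2, 5.6, 5.9, 7.0]   # -log10(Kd / 1 M)
p_kd_vs_1mM = [p - 3.0 for p in p_kd_vs_1M]    # -log10(Kd / 1 mM)

res_1M = residuals(hac, p_kd_vs_1M)
res_1mM = residuals(hac, p_kd_vs_1mM)

for n, r1, r2 in zip(hac, res_1M, res_1mM):
    print(f"HAC {n}: residual {r1:+.3f} (1 M), {r2:+.3f} (1 mM)")
```

Changing the standard concentration shifts every activity by the same constant, so only the fitted intercept moves and the residuals stay the same.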

As I said in my original comment, I consider the mathematical validity issue to be a complete red herring and I continue to hold this view. However, before continuing I do need to correct your assertion:

As equations, both of these are mathematically valid. The first is valid for all values of x and it defines the tangent function. The second is a statement that x is either 45 degrees or 225 degrees. The equations have different meanings but are both mathematically valid.

In a nutshell, the invalidity of LE stems from the fact that differences/ratios between LE values are not invariant with respect to the concentration chosen to define the standard state. This means that rankings of compounds using LE vary with the standard concentration. When a view of a chemical system changes with the concentration chosen to define the standard state, I believe that it would be correct to term that view as thermodynamically, chemically or physically invalid. My preferred term, however, is voodoo thermodynamics.

I don’t know why you thought that we were returning to the mathematical validity theme. As I stated at the outset, I consider it to be a complete red herring and for the record I believe the expression -ΔG/HA to be mathematically valid. At the same time, you need to concede that there is an element of farce when an argument in support of the mathematical validity of LE is presented using a definition of LE that is itself mathematically invalid. Especially when the stated mission is “to correct these mathematical statements and prevent them from propagating through the literature”.

The bigger question concerns how effectively ligand efficiency metrics normalize activity data. When we normalize activity data we are trying to subtract the effect of the risk factor (e.g. HA; ClogP) from the activity. To do this we make assumptions about the underlying relationship between activity and risk factor such as zero intercept (LE) and unit slope (LipE). If our assumptions about the underlying relationship are incorrect the normalization will introduce bias (think what happens when you force a straight line fit to data through the origin when the data says you should be doing otherwise). What we were saying in our paper ( JCAMD 2013 27:389-402 http://dx.doi.org/10.1007/s10822-013-9655-5 ) is that it would be better to normalize activity using the trend actually observed in the data. As I mentioned to Dan in an earlier comment, if the trend in the activity data is a straight line passing through the origin then I’d be happy that LE would normalize the activity properly. However, one can’t simply assume a priori that this is indeed the case and that’s why we need to model the data to establish the underlying trend.

A challenge that can be made to the ligand efficiency framework is to question whether the values of intercept (0 for LE) and slope (1 for LipE) are optimal. For example, I might ask, “would pIC50 – 0.7ClogP not be a better metric than pIC50 – ClogP”? This challenge is analogous to that made of the solubility forecast index in our correlation inflation article ( JCAMD 2013 27:1-13 http://dx.doi.org/10.1007/s10822-012-9631-5 ). These are not easy questions to answer and ‘validation’ of ligand efficiency metrics often consists of a mix of arm-waving, pointing at pictures and asserting usefulness. In the absence of quantitative validation criteria, I believe that modelling the response of activity to risk factor(s) represents a less biased approach to normalizing activity data.

I had thought you were returning to mathematical invalidity with the delicious irony. I don't see what is mathematically invalid about this one either.

I will certainly agree that LE has its dangers and you need to be careful with it. The alternative treatments you describe sound intriguing and a potential step forward but I've not yet understood them fully and will try to do that.

The problem with the equation LE = (-2.303RT/HAC).logKd is that one cannot calculate a logarithm of a quantity with units. The error itself is not too much of a concern for me (although I presume the authors won’t need to be prompted to publish an erratum) and, in any case, the original definition of LE uses ΔG rather than a formula for it.

The problems with ligand efficiency metrics are much more to do with physical chemistry than mathematics. By casting his critique in a framework of mathematical validity, Mike Shultz launched an equivalent of the charge of the Light Brigade (“C'est magnifique, mais ce n'est pas la guerre”) that was as doomed as the original. However, his criticism is implicitly connected to the issues with standard states and intercepts with the activity axis. He is also prepared to criticize PNAS (1999) 96:9997-10002 ( http://dx.doi.org/10.1073/pnas.96.18.9997 ) which may not be as seminal as is often asserted. The problem with this article and the related JCIM (2012) 52:2098-2106 ( http://dx.doi.org/10.1021/ci200612f ) is that lines are drawn through what is effectively an arbitrary point on the activity axis. One question that one needs to ask when presented with analyses like these is where do we think the lines would intersect the activity axis if we used the data to position them without arbitrary assumptions about intercepts.

It is a somewhat distant memory now, but I thought Kd was dimensionless.

Returning to the wider point, I take the view that "invalid" is a very harsh term and even from a physical point of view refers to an equation that ends up saying charge = distance or something like that. I cannot see that LE falls into that category. To me it comes down to usefulness and whether having it was better than not having it and, to me, it is. I very much like the idea of the deeper treatment.

You raise some interesting points, and I think we all agree that LE is a simple model that breaks down under extreme conditions, such as when the number of heavy atoms approaches zero. Although you argue that the mathematical validity statement is a red herring, I think we all agree that it is also factually incorrect, and thus has the potential to confuse researchers who don't dive deeply into the issue.

The first word of this blog's title is "Practical", so for me the most important question remains: is ligand efficiency useful? In my experience LE can be a helpful tool, for example to convince people that a low-affinity fragment may still be a good starting point, or that the modest affinity boost gained by adding a phenyl group may not justify the added mass. Like any tool, LE needs to be used judiciously, but just because it has shortcomings doesn't mean it should be abandoned.

If it has units, it has dimensions and the chaps who are into the size dependency of LE really do need to make very sure that they understand the relevance of this. There’s probably an article entitled ‘Size dependent ligand efficiency in pursuit of a standard state’ waiting to be written although I have no intention of writing it (life really is too short). I don’t really find the term ‘invalid’ particularly useful in the context in which it was used in the article (or the articles that provoked the response). My issue with LE (and LLE/LipE) is the arbitrary nature of the assumptions made when defining the metrics. Fitting a straight line to a plot of deltaG against HA has the effect of allowing the data to ‘locate’ the most relevant standard state and you can think of the procedure as representing a generalized ligand efficiency.

My other criticism of LE/LipE/LLE paradigm is that validation is extremely thin on the ground. In the metrics business, usefulness is the last refuge of the scoundrel. This may be a bit harsh but the purpose of these metrics is to normalize activity data and potential users of them want to know how effective the normalization is. The definitions use parameters (intercept = 0 for LE, slope = 1 for LipE/LLE) and a potential user of the metrics should be asking whether the parameters have been optimized.

If I were tracking molecular size in the course of a project I would still not use LE. Instead, I would fit pIC50 to HA or MW and use the residual as a measure of how much each compound beats the trend. I also think about effects of structural changes in terms of matched molecular pairs (group efficiency is part of the MMP framework and does not suffer from LE’s standard state problem). MMP analysis can be useful to get an idea just how large the effect of a particular structural modification can be, especially when one has access to a lot of potency data.

The other advantage of modelling the activity data is that the influence of molecular size and lipophilicity can be explored simultaneously in a single framework. Currently, we scale by molecular size (LE) but offset by lipophilicity (LipE/LLE) without really saying why. LELP is the exception although I regard that particular metric as straight from the pages of a Mary Shelley novel. All I’m suggesting is that it’d be better to normalize activity data using the trend observed in the data rather than what we think the trend should be.

No. A concentration of 1M is used by convention to specify the standard state but Kd still has dimensions of concentration and the choice of 1M is still arbitrary (i.e. has no physical basis). The equation deltaG = RTln(Kd/StdConc) holds for whatever concentration you choose to adopt to define the standard state. In contrast, the ‘p’ operator has 1M built into it and I prefer to talk about a reference concentration. One can define the ‘p’ operator as taking the log of a concentration that is expressed in molar units although I prefer to just write pIC50 = -log10(IC50/M) since it’s clearer and more concise.

There’s no problem comparing different deltaG values when the same standard concentration is used and differences between deltaG values are invariant with respect to changing (as we are allowed to do in thermodynamics) the standard concentration. However ratios are not and that’s where the trouble starts (be especially wary of statements like ‘70% of the binding energy’ or ‘the contribution of enthalpy to binding is double that of entropy’). This is essentially what Mike Shultz is getting at and, although I think he scored an own goal by charging down the mathematical validity path, he is still on target in an implicit sort of way.
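
A small numerical check of the difference-versus-ratio point, using deltaG = RTln(Kd/StdConc) as written in the comment above. The two Kd values and the temperature of 298.15 K are hypothetical choices for illustration.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed temperature

def delta_g(kd_molar, std_conc_molar):
    """Standard free energy of binding, dG = RT ln(Kd / standard concentration)."""
    return R_KCAL * TEMPERATURE_K * math.log(kd_molar / std_conc_molar)

kd_weak, kd_tight = 1e-6, 1e-9  # hypothetical Kd values in mol/L

summary = {}
for std_conc in (1.0, 1e-3):
    dg_weak = delta_g(kd_weak, std_conc)
    dg_tight = delta_g(kd_tight, std_conc)
    summary[std_conc] = {
        "difference": dg_weak - dg_tight,  # unchanged by the standard state
        "ratio": dg_tight / dg_weak,       # depends on the standard state
    }
    print(
        f"standard concentration {std_conc:g} M: "
        f"difference {summary[std_conc]['difference']:.3f} kcal/mol, "
        f"ratio {summary[std_conc]['ratio']:.2f}"
    )
```

The difference between the two deltaG values is the same under both standard concentrations, while the ratio moves from 1.5 to 2.0, which is why statements about fractions of binding energy depend on the standard state.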