Phase 2a proof-of-clinical-concept trials: the single most important determinant of R&D efficiency in drug development

The future of the pharmaceutical industry depends on improving our approach to early Phase 2 trials. After all, three of the most important decisions relating to a drug product candidate are founded on the data emerging from Phase 2a trials: do we believe it will work in pivotal trials with regulatory end-points that will likely cost hundreds of millions of dollars? What should be the design of those pivotal studies? And perhaps most critically, will this drug likely offer a sufficiently differentiated profile to be a commercial, as well as regulatory, success?

A product candidate clearly needs positive answers to all these questions to progress.

Actually, a product candidate ought to need these three ticks to progress. The recent slew of high-profile Phase 3 failures, with Lilly in the vanguard, suggests that at least in some quarters this critical gatekeeper function is not being efficiently applied.

Phase 2a clinical trial design is the single biggest contributor to declining R&D efficiency in the pharmaceutical industry

Of course, it's not that straightforward. Decision making in early Phase 2 is a tricky balancing act between discarding things that may be valuable versus progressing things that may fail. But this knife-edge decision only makes the choice of the right design for Phase 2a studies all the more critical. When hard data is severely limited, and is being used to predict well beyond its boundaries, the quality of that data is absolutely paramount.

Doing better than at present requires a culture change: early Phase 2 trials need to be “safe experiments on humans” and not “small scale Phase 3 trials”. DrugBaron explains how such a change in thinking will pay dividends (perhaps literally as well as figuratively) by improving the quality of decision making at this critical juncture in the journey from benchtop to bedside.

Obtaining evidence of safety and efficacy is an incremental process, where both certainty and cost go up along the route. In theory at least, certainty should rise quickly at first (putting relatively little cash at the highest risk). It is moving from ‘almost sure’ to ‘certain’ that costs the bulk of the development dollars, but should in theory expose that investment to relatively little risk.

That’s the theory. And in some indications it is also the practice: early stage trials are highly predictive of late stage outcomes and the theoretical paradigm plays out well in the real world. But in other areas, the early studies induce only false confidence and the expensive late stage trials are therefore little more than a bare-faced gamble at very poor odds.

The most recent example was the failure of the anti-amyloid antibodies bapineuzumab (from Pfizer and JnJ) and solanezumab (from Lilly) in massive Phase 3 programmes in Alzheimer’s Disease (AD). As DrugBaron already noted, an entrenched hypothesis (that amyloid deposition causes AD) trumped weak Phase 2 data. Part of the problem, then, lies in overly strong hypotheses that flourish where definitive data is very hard to obtain (a phenomenon that DrugBaron called an “ideas bubble”). But part of the problem also lies in poor design of early studies and a fragile appreciation of the limitations of such studies.

Similar problems led to the failure of CETP inhibitors in cardiovascular disease. Believing that biomarker-based early stage studies de-risked the programme to a much greater extent than they actually did, several companies not only ploughed into huge Phase 3 studies, but repeated the mistake when the first studies failed. So strong was the belief in the flawed Phase 2 data that a second round of trials was initiated after the first failures were blamed on a relatively trivial side-effect of the most advanced compound.

It is interesting, and not a little scary, to draw parallels with the “party line” on anti-amyloid product candidates. The antibodies failed entirely to improve cognition end-points in patients with established AD. But, the argument goes, since we believe so strongly that amyloid causes AD (despite the lack of proof for that hypothesis), the failures must have resulted from treating established disease. Treating earlier, in patients with mild cognitive impairment (MCI), so this argument goes, will presumably reveal the expected efficacy for this mechanism of action. To support this argument, shaky evidence from sub-group analyses of the failed Phase 3 studies is trotted out. For those already addicted to the amyloid hypothesis, such evidence seems strong.

There is more at stake than just the amyloid hypothesis (and the dollars of investors in Pfizer, JnJ and Lilly who are pursuing these gambles). Investors reacted to the positive-sounding noises Lilly made following the failure of solanezumab. Remarkably, for a Phase 3 failure of such proportions, the share price rose, presumably on the assumption of jam tomorrow – when the next round of Phase 3 trials in MCI patients elevates solanezumab to super-blockbuster status. When, as it surely will, this proves to be another mirage, investor confidence in the scientific analysis of pharma and academia will take a big hit.

DrugBaron, then, would be selling Lilly hard – had he ever believed in the programme sufficiently to have bought in the first place.

The problem, with both CETP inhibitors and anti-amyloids, then, lay in the design of the early Phase 2 trials that were meant to de-risk these programmes at modest cost – an objective they demonstrably failed to meet.

And the failure in both cases was due to over-reliance on limited biomarkers. In cardiovascular disease, the Phase 2 biomarker of choice was HDL. In Alzheimer’s Disease, it was amyloid deposition. The companies developing these products assumed, without sufficient evidence, that demonstration of an effect on the biomarker would be predictive of an effect on the clinical end-points required for registration. They were wrong.
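Why an unvalidated biomarker offers so little de-risking can be sketched with Bayes' rule. Every number below is a hypothetical assumption chosen for illustration (the prior chance that the drug truly works, and how often the biomarker moves for drugs that do and do not work); none of it is drawn from the CETP or anti-amyloid programmes, and the function name is invented for this example.

```python
def prob_works_given_biomarker_hit(prior, hit_if_works, hit_if_fails):
    """Chance the drug meets the clinical end-point, given a biomarker 'hit'.

    prior        -- assumed probability the drug works before Phase 2a
    hit_if_works -- assumed chance the biomarker moves when the drug works
    hit_if_fails -- assumed chance the biomarker moves when it does not
    """
    true_hits = prior * hit_if_works
    false_hits = (1 - prior) * hit_if_fails
    return true_hits / (true_hits + false_hits)


# Hypothetical reliable surrogate: rarely moves for drugs that fail clinically
reliable = prob_works_given_biomarker_hit(0.10, 0.80, 0.10)
# Hypothetical weak surrogate: moves half the time even for ineffective drugs
unreliable = prob_works_given_biomarker_hit(0.10, 0.80, 0.50)
print(round(reliable, 2), round(unreliable, 2))  # 0.47 0.15
```

Under these made-up inputs, the same "positive" Phase 2a result lifts the chance of clinical success to nearly one in two with the reliable surrogate, but barely above the starting odds with the weak one.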

A Phase 2a study is like a joke mirror in a fun-house – its purpose is to warp time and space. By using predictive biomarkers the aim is to guess how the drug will affect the regulatory end-point. In almost every case, it is not possible to use the clinical end-points required for approval in the first efficacy study – simply because changes either take too long or the variability between patients is so high that a large number of subjects is required to see a statistically significant effect.
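The link between patient-to-patient variability and trial size can be made concrete with the textbook normal-approximation formula for comparing two group means. The sketch below is illustrative only: the function name `n_per_arm`, the 5% two-sided significance level, the 80% power and the two effect sizes are assumptions chosen for the example, not figures from any trial discussed here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-arm comparison of means.

    effect_size is the standardised difference (mean difference divided by
    the between-patient standard deviation), so higher variability means a
    smaller effect_size for the same absolute change.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# A low-variability biomarker (hypothetical standardised effect of 0.5)
print(n_per_arm(0.5))  # 63 per arm
# A noisy clinical end-point (hypothetical standardised effect of 0.2)
print(n_per_arm(0.2))  # 393 per arm
```

Because the required number scales with the inverse square of the standardised effect, a 2.5-fold drop in effect size multiplies enrolment roughly six-fold under these assumptions. That is the arithmetic behind preferring a sensitive biomarker in the first efficacy study.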

By adopting a reliable biomarker, it should be possible to learn in 8 weeks and 100 patients what might require 12 months and 1000 patients to demonstrate “for real”. And where such a reliable biomarker or surrogate end-point exists, everything (mostly) goes fine. Success in the early Phase 2 studies translates, with reasonable likelihood, into efficacy in the pivotal Phase 3 studies (and safety is then usually the issue rather than lack of efficacy). In asthma, rheumatoid arthritis and many other diseases, the design of the Phase 2 is easy. It's just a mini-Phase 3 with a surrogate end-point.

But for many of the diseases with the largest unmet medical need, things are very different – there is no such reliable surrogate. That’s exactly why clinical development is much harder in cardiovascular disease or neurodegeneration than it is in respiratory disease or autoimmunity. That’s exactly why there is such extreme “asset favoritism” in the pharmaceutical industry (with hundreds of products in development in RA, against a handful in sepsis – another area of high unmet medical need where there is no reliable surrogate to predict effects on the regulatory end-point: 28-day all cause mortality).

What is the answer? Are we stuck in the undesirable scenario where the only way to find new medicines in certain indications with vast unmet medical need is to perform massive Phase 3 studies effectively blindfolded? Clearly that is never going to be viable (even though, in effect, it is what the industry has been doing for the last two decades).

Fortunately, there is another solution: stop treating Phase 2a studies as “mini Phase 3” trials. Too often, a surrogate end-point is selected and the trial design is centred around a statistically significant change in that primary end-point (exactly as a Phase 3 trial is designed only to demonstrate, beyond doubt, that the single primary end-point has been modulated). The dominant concern is to “protect” the statistical power of the trial by focusing exclusively on the primary end-point in the a priori analysis.

It is entirely right and proper that such a “statistical gold standard” applies in pivotal trials. It is the only way to be sufficiently certain of the benefit of a product candidate. To lower that bar would risk public health.

But it is entirely wrong to take that approach in Phase 2.

Statistics is great at determining whether the single question posed by the trial has been robustly answered. But it does nothing at all to help determine if the right question has been asked in the first place.

So if you know you have a predictive surrogate end-point, and you know you have the right patient population, then all you need to do is demonstrate you had the desired effect and you have de-risked the subsequent Phase 3 trial. A “mini Phase 3” trial design, using the surrogate end-point, is fit for purpose.

But the number of indications where there is such a perfectly reliable surrogate end-point is small. For the majority, the available surrogates are imperfect. And in some of the largest indications, the surrogates that have been used have been proven to be plain useless rather than simply “imperfect”.

The more imperfect the surrogate the less the value of the “mini Phase 3” type of design. Instead, you need to know “did I ask the right question?” as well as “did the drug have the intended effect?” And the only way to address the former is to measure lots more things. The weight of biological evidence then provides comfort (or not) that the effect of the drug in patients was as desired.

In effect, such a study design trades off formal statistical power for a much wider appreciation of the effects of the drug on the whole organism. It is very much the right choice to make such a trade. Instead of looking with absolute precision through a pin-hole, one gains a fuzzy view of the whole landscape. Asked to identify the object of a photograph, DrugBaron is certain most readers would perform much better with an out-of-focus version of the whole image than a perfectly focused single pixel – even if the focus was truly terrible. If you don’t agree, try the experiment!
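The trade described above can be put in rough numbers. The sketch below assumes, purely for illustration, a two-arm trial analysed with a normal approximation and a Bonferroni correction when several end-points are tested. The function name `power_per_endpoint`, the 63 patients per arm and the standardised effect of 0.5 are hypothetical, and Bonferroni is only one simple way to handle multiple end-points, not necessarily how any particular multi-objective framework works.

```python
from math import sqrt
from statistics import NormalDist


def power_per_endpoint(n_per_arm, effect_size, n_endpoints=1, alpha=0.05):
    """Approximate power to detect one end-point in a two-arm trial.

    The overall two-sided alpha is split equally across n_endpoints
    (Bonferroni). The negligible chance of a significant result in the
    wrong direction is ignored.
    """
    z = NormalDist()  # standard normal distribution
    alpha_each = alpha / n_endpoints          # stricter threshold per end-point
    z_crit = z.inv_cdf(1 - alpha_each / 2)    # critical value after correction
    signal = effect_size * sqrt(n_per_arm / 2)  # expected test statistic
    return z.cdf(signal - z_crit)


# Same hypothetical trial, analysed with 1, 5 or 20 end-points
for k in (1, 5, 20):
    print(k, round(power_per_endpoint(63, 0.5, n_endpoints=k), 2))
```

Power for any single end-point falls as more end-points share the same error budget. That is the formal price of the wider view, and the argument here is that in Phase 2a the price is worth paying.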

Getting comfortable with such an approach requires a culture-change for many organizations. There is a tendency to view the primary-end-point driven design as the “only way to do clinical trials” or perhaps “the only approach that will be acceptable to the regulators”. This view is dead wrong on both counts.

At Funxional Therapeutics, we put forward an innovative Phase 2 trial design with a robust multi-objective statistical framework pioneered by Total Scientific, who provided the data management and biostatistics expertise. A senior drug developer in a large pharmaceutical company looked at the design and commented “I love the approach, but you will never get approval to actually run it”. In the event, gaining approval was straightforward. And the trial design provided information on many end-points in subjects with different chronic inflammatory diseases – a broad and fuzzy picture rather than a sharply-focused, but incredibly narrow, view.

In effect, we had designed a “safe experiment in humans” rather than a “clinical trial”. The results could never be used to support the approval of the product – but that was never the intention (and, indeed, is never really the objective of a Phase 2a study). Instead they did what a Phase 2a study is intended to do: inform the design of the next clinical trial, both in terms of selecting the right patients and the right end-point, and at the same time de-risk continued investment in the drug.

Ironically, the straitjacket of “regulatory development” infects development programmes in large pharmaceutical companies to a much greater degree than in small companies. They call it “having a better understanding of the needs of regulators” – which is undoubtedly true – but that focus subordinates their own needs. The first responsibility of the company, to its shareholders, is to properly de-risk drug candidates before advancing them. This particular bar for the regulators and ethics committees is quite low: as long as the safety risk is negligible, then they will likely approve the trial even if the risk of failure has not been properly mitigated.

Smaller biotech companies are typically much more innovative in their early stage clinical trial strategies. That innovation is driven by a need to convince a pharmaceutical company to buy their programme, rather than by a need to satisfy the regulator at that early stage. The regulator will need to be satisfied, of course – but not yet. Perhaps if pharma companies applied the same stringency to starting a Phase 3 trial of a programme they already own as they do to acquiring one from outside, they might see the need to generate a different kind of data in Phase 2.

The old mantra that biotechs can do discovery but pharma are the experts at drug development holds true – but the dividing line is not the IND application, but the start of Phase 3. Pharma can source innovation not only in new molecules and new targets in the biotech community – they have a thing or two to learn about early development as well.

Multi-objective study designs in Phase 2a have another advantage, one not to be underestimated. Learning more about the biological profile of the compound informs the judgment of whether the eventual product can be commercially competitive. Even if a robust surrogate end-point exists, it is unlikely to be sufficient to predict the likely competitive positioning of the product.

Of course, it's not black and white. Every trial has secondary end-points and post-hoc analysis of samples to try to fill this void. But that goes only a small way to embracing a true multi-objective statistical framework for early stage clinical trials – almost every trial performed today uses a homogeneous group of patients, for example. That’s great if they are the right group of patients. But if you are developing a first-in-class medicine, do you really know that when you design the first Phase 2 study?

DrugBaron is convinced that Phase 2a clinical trial design is the single biggest contributor to declining R&D efficiency in the pharmaceutical industry. Introducing a robust approach to trial power, proper use of run-in periods and area-under-the-curve measurements for end-point changes, all of which have been recommended on this blog before, will help. But making a meaningful impact on productivity in the pharmaceutical industry can only be achieved by avoiding huge, costly busts in Phase 3 (like the anti-amyloid antibodies and the CETP inhibitors) and costly busts in the marketplace (like Lilly’s Effient™). The latter are molecules that DrugBaron has called “busters” as opposed to “blockbusters”: they achieve regulatory approval without offering sufficient competitive advantage to garner the sales needed to pay back the investment in bringing them to market. And avoiding these big expensive failures needs a complete re-think – a complete culture change – in how we approach early Phase 2 first-in-patient studies.

If you have got this far, and still don’t know what a multi-objective clinical trial design looks like, it's time to find out.