June 15, 2012

I was flying home from Europe yesterday and was watching the Oscar award-winning movie "Good Will Hunting" starring Matt Damon, Robin Williams, Ben Affleck, and Minnie Driver:

One thing I missed the first time I saw the movie was a moment where Matt Damon's character shows up at Minnie Driver's dorm room to ask her out on a date. She complains that she can't because she has to spend the whole night attempting to assign the 1H NMR spectrum of Ibogamine:

Of course, Will Hunting, janitor for MIT, and genius, goes off to the park and quickly scribbles down the structure on a little piece of paper along with his proposed 1H NMR assignments. Who needs the actual spectral data?

For those who don't know, this movie was very critically acclaimed. It received 9 Oscar nominations and won two, most notably Best Original Screenplay. So the moment the plane landed, I was intrigued to look into how much homework Affleck and Damon did to ensure the validity of that one-minute plot line. Too often we see foolish and nonsensical science in movies and on TV. Setting aside the fact that he assigned the spectrum without even seeing the data, I didn't recognize the name Ibogamine off the top of my head, so my first search was whether this compound even existed. A quick check in the ACD/Dictionary, and here it was:

The next question was whether the NMR would even be difficult for this. Searching our database for NMR assignments, I found the 13C NMR assignments for two forms:

Unfortunately, no 1H NMR assignments to be had though.

But, after a little more browsing through the web, I came across a Yale Professor who created this very problem for his class based on the reference in the movie:

And based on the spectrum above, we get a reasonable looking 1H NMR prediction:

So a couple of things on this fun Friday:

1) Kudos to Damon and Affleck for incorporating NMR into their script. It is a critical piece of organic chemistry that often doesn't get the limelight. I think it's pretty obvious they consulted a chemistry-savvy person for this one.

2) NMR predictions and databases might not only help you with your structure elucidations, but they also might enable you to land a date with someone completely out of your league :)

June 07, 2012

PKC Pharmaceuticals is working very hard not only to remedy the problems that have come out of the Bosutinib issue described in previous posts and in other media outlets, but also to really understand what went wrong, what is out there, and how to clear the air on this issue. I believe they should be commended for that.

In the Bosutinib Report #1, issued by PKC on February 20th, 2012, there is an exhaustive review of their analysis up to that date. They plan a report #2 in the near future.

Report #1 is an interesting read and contains several interesting observations about a historical precedent for this kind of issue in the biochemical reagent space, but most interesting for me was the posed question of "What is Bosutinib?"

You see, PKC is now reporting evidence that there are possibly not just 2 different versions of bosutinib being sold, but potentially a 3rd variant!

The report goes on to explain exhaustive efforts on their part to collect melting points, TLC, HPLC data, and full spectroscopic data to establish the identities of the compounds claimed to be bosutinib.

On page 13 comes the big question. What is bosutinib? From the report:

For the question, "what is bosutinib?", our opinion is that, in view of:

(i) the substantial structural ambiguities (for example, the 16 possible isomers for the simple anilinic raw material used to make bosutinib, many of whose structures cannot be rigourously established, distinguished or ruled out in the final molecule by NMR analysis

(ii) the general and specific difficulty of rigourously proving many complex chemical structures such as bosutinib by any methods other than x-ray crystallography and;

(iii) the fact that bosutinib has been or now is being administered to humans in clinical trials.

bosutinib should now be rigorously defined as that x-ray crystal structure that results from x-ray crystallography carried out on a sample taken directly from the actual "active pharmaceutical ingredient" supplies used to prepare the drug product administered to humans in the Pfizer-sponsored clinical trials- "from the same lot"

Now I don't discount, at this point, the need for very rigorous analysis and structure confirmation of bosutinib by any means necessary. And there is no question that the most reliable method for structure elucidation of this compound is X-Ray.

But I also can't help but state that while this will hopefully resolve this particular case, it doesn't address the overall problem, and it certainly does not prevent an instance like this from happening again. It doesn't change the fact that false positives happen, whether they come from degradation products, products from incorrect starting materials, incorrect compounds coming from an external supplier, etc.

X-Ray is a powerful tool, but it is neither fast nor routine, it is difficult to access for some research groups, and it is still relatively expensive. In the industry, X-Ray really becomes the last resort for many of the above reasons. For the more routine cases, this would be like using a hand grenade to kill a housefly.

So the questions come back to the first statement about whether NMR analysis alone can be used to establish, distinguish, or rule out structural candidates. Of course this is a big job because to truly be 100% certain that you've got the correct structure you need to rule out all other possible alternatives. In the case of bosutinib, once you look at the molecular formula and generate all possible isomers, that is clearly a daunting task for a human.
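As a side note on the scale of the problem, the report's figure of 16 isomers for the aniline raw material can be reproduced with a few lines of code. The sketch below assumes the raw material is a dichloro-methoxyaniline (an aniline ring carrying two chlorines and one methoxy group) and that the 16 refers to positional isomers. Neither assumption is spelled out in the report excerpt above, and the function names are purely illustrative.

```python
from itertools import combinations

# Assumption: NH2 sits at ring position 1; the two Cl and one OMe
# substituents occupy three of the remaining positions 2-6.
RING_POSITIONS = [2, 3, 4, 5, 6]


def mirror(position):
    """Reflect a ring position through the C1-C4 axis (2<->6, 3<->5, 4 fixed)."""
    return 8 - position


def canonical(ome, chlorines):
    """Return one representative for a substitution pattern and its mirror image."""
    original = (ome, tuple(sorted(chlorines)))
    reflected = (mirror(ome), tuple(sorted(mirror(c) for c in chlorines)))
    return min(original, reflected)


def dichloro_methoxy_anilines():
    """Enumerate distinct positional isomers, treating mirror images as identical."""
    seen = set()
    for ome in RING_POSITIONS:
        remaining = [p for p in RING_POSITIONS if p != ome]
        for chlorines in combinations(remaining, 2):
            seen.add(canonical(ome, chlorines))
    return sorted(seen)


isomers = dichloro_methoxy_anilines()
print(len(isomers))
```

Under these assumptions the count comes out to 16, and that is only one fragment of the molecule. The number of candidates for the full molecular formula is far larger, which is the point of the paragraph above.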

Enter Computer-Assisted Structure Elucidation (CASE). ACD/Structure Elucidator is the most peer-reviewed CASE system available today with over 30 publications focused around the technology and performance. A full book was recently published by the RSC on CASE and much of it is devoted to ACD/Structure Elucidator as the co-authors are ACD/Labs employees past and present.

Specific to this blog posting, the methodology behind Structure Elucidator is such that it can generate ALL conceivable structures from spectral data and a priori knowledge. And it can do this pretty quickly depending on the NMR data made available. And of course the instrument vendors have made great strides in hardware development to make some of the more sophisticated 2D NMR experiments available in a reasonable amount of time for the special cases.

Another notable case study can be found here. It explains the structure elucidation of a molecule where the authors had to resort to residual dipolar couplings to solve a structure based on their belief that the structure could not be solved by classical spectroscopic means.

Which brings us back to Bosutinib. So far we have obtained NMR data of the bosutinib isomer from our collaborator Phil Keyes, Lexicon Pharmaceuticals. We already ran that through our ASV systems and presented the result. Next up was running the data through ACD/Structure Elucidator. We did our first run with the following experiments:

- 1D 1H NMR

- 1D 13C NMR

- 2D COSY

- 2D HSQC (DEPT-edited)

- 2D HMBC

Lo and behold, following structure generation the software generated 17 unique structures based on the data above. After ranking these structures based on average 13C chemical shift deviation between experimental and predicted values, the #1 best structure was indeed Bosutinib Isomer #1:

Note that the average 13C chemical shift deviation between experimental and predicted values for the bosutinib isomer #1 was 1.980 ppm, which in our experience is very low and suggests a strong level of consistency between the proposed structure and the data. Also worth noting: the actual proposed structure of bosutinib was not among the 17 possibilities generated from the data provided.
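To make the ranking step concrete, here is a minimal sketch of sorting candidate structures by average 13C chemical shift deviation. It assumes the deviation is a plain mean absolute difference over shifts that have already been paired atom by atom. That is my simplification for illustration and not a description of how ACD/Structure Elucidator computes it. The shift values are made-up toy numbers, not the bosutinib data.

```python
def mean_abs_deviation(experimental, predicted):
    """Average absolute difference (ppm) between paired 13C shifts."""
    if not experimental or len(experimental) != len(predicted):
        raise ValueError("shift lists must be non-empty and of equal length")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (name, deviation) pairs sorted from best (lowest) to worst."""
    scored = [
        (name, mean_abs_deviation(experimental, predicted))
        for name, predicted in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy data: three carbons, two candidate structures.
experimental_shifts = [55.0, 110.0, 150.0]
candidate_predictions = {
    "candidate_A": [56.0, 111.0, 149.0],
    "candidate_B": [60.0, 120.0, 140.0],
}

ranked = rank_candidates(experimental_shifts, candidate_predictions)
for name, deviation in ranked:
    print(f"{name}: {deviation:.3f} ppm")
```

The candidate at the top of the list is the one whose predicted shifts sit closest to the experimental ones on average.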

That said, there were other interesting structures proposed and we do have some additional 2D NMR data to put into the software. This was a quick and dirty run to generate some quick results and to make some quick conclusions. More to follow. We are going to take a more rigorous look at all the data provided and generate some additional results. We also hope to soon get our hands on the authentic bosutinib material and run similar tests on that.

Special thanks to Philip Keyes for acquiring and sharing the NMR data, and to my colleague Joe DiMartino for promptly running the data in Structure Elucidator. I'd also like to acknowledge the efforts being put forth by my colleagues Mikhail Elyashberg and Kirill Blinov in conducting some more rigorous tests with this data, and I will share more results as they become available.

June 04, 2012

I've already described the Bosutinib fiasco in my last entry, and finished with a teaser that we would be back for more commentary on this topic.

We plan a series of tests of our algorithms and systems to see if our software could have been used to prevent a situation like this. The main question in this post is:

If we supplied the 1H and HSQC spectra for the incorrect isomer to our system proposing the actual chemical structure of bosutinib, what would the result be?

For this, I'd like to thank my friend and colleague, Philip Keyes from Lexicon Pharmaceuticals for his crucial participation in this study. Phil has kindly purchased the compounds of interest for this study and has acquired the NMR data, tested it on our system in his environment, and supplied us with the data for our own testing and evaluation.

In this article, Levinson and Boxer suggested that acquisition of an HSQC experiment would have clearly ruled out the structure of bosutinib as a possibility. We put our system to the test for this study.

First, some brief methodology on our combined verification approach. When a proposed structure, 1H NMR and HSQC are entered into our system, the software will automatically process and analyze the data without any manual intervention and generate a verification score that we call the Verification Product. In a nutshell, the verification product gives us a measure of the consistency between a proposed structure and the supplied NMR data. More details on the methodology of our system can be found in the 2007 article published here.

The verification product generates a result between 0 and 1, with 1 being the highest possible score suggesting the highest level of confidence that a structure is consistent with a given set of spectra. Based on how the system is deployed in practice, the end-user will define verification product thresholds to determine which compounds should be let through, and which compounds require further investigation.

Based on our numerous publications in this area, we've adopted a traffic light schema where:

Green Light: The structure and spectrum are consistent with each other. No further review is required.

Yellow Light: The structure-spectrum correspondence is questionable. The software has identified one or more issues in an attempt to assign all peaks to atoms in the structure. Review is up to the user's judgement.

Red Light: The structure and spectrum are inconsistent with each other. Review is required.

And again, based on our research to date, we've adopted the following thresholds as an optimal starting point (organizations that deploy ASV will deviate from these thresholds based on whether they are false positive or false negative tolerant). Those thresholds are:

Green Light: When Verification Product exceeds 0.67

Yellow Light: When Verification Product is between 0.5 and 0.67

Red Light: When Verification Product is less than 0.5
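For readers who think in code, the thresholds above boil down to a three-way branch. This is only a sketch of the mapping as described in this post: the function name and parameter names are made up for illustration, and I am assuming that scores exactly at 0.5 or 0.67 fall in the yellow band, since the wording above ("exceeds", "less than") leaves those boundaries to yellow.

```python
def traffic_light(verification_product, green_above=0.67, red_below=0.5):
    """Map a Verification Product (0 to 1) onto a traffic-light category.

    Assumption: boundary values (exactly 0.5 or 0.67) are treated as yellow.
    The default thresholds are the starting points quoted in the post and
    are meant to be tuned per organization.
    """
    if not 0.0 <= verification_product <= 1.0:
        raise ValueError("verification product must lie between 0 and 1")
    if verification_product > green_above:
        return "green"
    if verification_product < red_below:
        return "red"
    return "yellow"


for score in (0.30, 0.52, 0.75):
    print(score, traffic_light(score))
```

An organization that is less tolerant of false positives could raise `green_above`, pushing more borderline compounds into the yellow band for manual review.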

With that out of the way, let's look at the results.

When we ran the 1H and HSQC data through the software against the structure of Bosutinib in completely automated fashion, the reported verification product was 0.52, a yellow light. In essence, the software has flagged this compound as questionable based on the 1H and HSQC NMR data.

What's more, the software does not generate just a black box numerical result. When there is a questionable or inconsistent result, the software specifically describes the issue, and highlights it on the structure and spectrum (see below).

Naturally, we also ran the data through our system with the actual structure for the bosutinib isomer #1. The result was a green light with a verification product of 0.75. Furthermore, as shown below, the software showed no assignment issues:

So that gets the important example out of the way. The next test is what the system would have made of the actual data of bosutinib against both structures. That's coming soon, I hope.

Also, I'll be writing up a post on how our Structure Elucidation software handled this case when given a full set of NMR data.

May 29, 2012

For many years I have been looking for a high profile public case that represents a good example of why an automated NMR structure verification system can help prevent bad things from happening. We all know these types of examples exist, but they don't generally become public knowledge.

Yes, I've blogged about this topic many times over the years. To name a few:

Naturally, the first thing that came to my mind when reading this story was: could an automated NMR software system have caught this mistake? From the C&E News article:

Levinson and Boxer put their publication on hold, ordered bosutinib from a different vendor, and did a battery of tests to determine which material was the genuine bosutinib. They soon figured out the original compound they had done all their research on turned out not to be bosutinib. “We had wasted a huge amount of time and money on the wrong isomer,” Boxer says.

On the basis of multidimensional nuclear magnetic resonance (NMR) experiments, Boxer and Levinson believe that this isomer not only has a chlorine at the 3-position rather than the 2-position, but also that the chloro and methoxy groups that appear in the 4- and 5-positions, respectively, in bosutinib’s aniline moiety have been switched.

The difference between the two molecules is subtle. Mass spectrometry and elemental analysis would be the same for both compounds. There are differences in the aromatic region of the 1H NMR, but one wouldn’t necessarily pick up on them unless they compared the two compounds side by side. It’s only when you use 13C NMR that the symmetrical nature of the aniline group in the isomer becomes clear.

Here are the two molecules in question:

Much of the discussion has been that the average chemist would let the incorrect isomer through with just the LC-MS information and a quick glance at the 1H. Of course the 13C NMR would clearly point out the difference, but this isn't standard practice for many bench chemists.

But would one have even needed to acquire a 13C NMR spectrum for this?

Would a 1H NMR have been enough? At the very worst, an HSQC experiment? The supporting documentation for this article clearly suggests yes to the latter.

Phil Keyes from Lexicon Pharmaceuticals and I are working on this right now and should have some results showing how our automated NMR verification system would have handled this case.

February 15, 2012

It's been a very long time since my last post. I apologize for that, but I can assure you that my posting absence has not been due to a lack of quality things to talk about. Hopefully this post will help me get over the activation barrier of ignoring the blog for months!

My motivation for coming back is to update you on some excellent recent publications that involve ACD/Labs technology.

June 09, 2011

LC/MS has become the primary analytical check for the medicinal chemist. I love NMR, but it's the reality.

So how did we get here? Clearly there are a lot of reasons, but one of my hypotheses is based on the evolution of open access and the walk-up environment. No doubt that moving away from the traditional analytical service departments has been a great thing from a productivity standpoint.

LC-MS lends itself really well to the open access environment. There are good systems out there that can auto-process data well and provide a nice report that gives the chemist a simple answer. In most cases, the chemists can get the answer they are looking for with a quick glance at the report. In addition, for this kind of routine analysis, the interpretation of the results is very easy.

Contrast that to NMR. How often are the printouts from the instrument good enough? Not nearly as often. More and more it is becoming common practice for the chemist to import their raw or processed data into the processing software on the instrument or in some offline processing software. Within these applications, they can get a better look at their data, zoom in on areas of interest, re-integrate the spectrum so it makes sense, perhaps even generate a multiplet report for a potential patent down the road.

Add up those tasks alone, and the chemist may be forced to spend 5-10 more minutes on a routine NMR than on an LC-MS when you factor in interpretation as well! Unfortunately, the NMR instrument is not printing out an answer. It is printing out a spectrum that needs to be interpreted by the chemist, which on a routine basis is a much more difficult and complex task than checking an LC-MS for reaction completion. Furthermore, let's keep in mind that interpreting analytical data is not the medicinal chemist's job. They are evaluated, compensated, etc. for making molecules. The more molecules the better, so the inherent risk here is obvious, and I don't think it's fair to simply "blame the chemists."

Given the above, is it any wonder that LC-MS has become the primary analytical check?

June 01, 2011

LC/MS has become the primary analytical check for the medicinal chemist. I love NMR, but it's the reality.

Reality #2:

More often than not, NMR spectra are merely glanced at:

Sometimes interpretation is based on the presence or absence of one peak (did my reaction go to completion?)

How often are NMR spectra fully assigned by a medicinal/synthetic chemist? Not very often, it's not their job.

Is all the information a 1H NMR spectrum can provide being effectively used? Some chemists are extremely good at NMR, others couldn't be bothered with the intimate details, and I can't say I blame them. It's largely a training/education issue.

How often are 2-dimensional experiments acquired and interpreted? Of course this varies from chemist to chemist and organization to organization, but despite strong advances in hardware, acquisition, etc., I believe the safe answer is "not enough."

Reality #3:

In many organizations, upwards of 95% of compounds are being registered by synthetic/medicinal chemists without having undergone any analytical review by an analytical specialist (NMR spectroscopist).

***

How and why did we get here? I have some hypotheses, and will write about them in the coming days.

May 19, 2011

When I created this blog several years ago, one of the first links I added to my blog was to Stan's NMR Blog. Over the years I have had the pleasure to meet Stan who is a wonderful individual. His blog is very interesting, Stan is incredibly knowledgeable, and I really enjoy his writing style.

Recently, Stan posted about a topic that readers who follow my blog know I am very passionate about: Automated Structure Verification (ASV). The initial entry was very thought-provoking, and while I think the numbers Stan quotes below for how many spectra chemists interpret are too high, I think he's put together an interesting argument nonetheless about the need for and application of ASV in the scientific community:

The typical application area for such a product is drug discovery, meaning Pharmas. I was told that these Companies employ masses of chemists who have to interpret some 50 spectra a day, having only 8h/50 = 9:36 minutes per spectrum. Clearly, they don't wag their tails, nor wish to pursue more balls. They were reduced to the echelon of routine workers and they hope that the new software might give them a few minutes more time to maybe ponder again that strange spectral peak that looked like an interesting challenge. Illusion, of course, because their top CEO's hope that the software will make half of the chemists redundant, leaving the rest doing a spectrum every 4:48 minutes (guess who will prevail).

Stan's entry definitely triggered some interesting comments from the community as well, in addition to a follow up letter from John Hollerton, GSK.

My take is that we don't need to be waiting 3-5 years for this technology to become more relevant...it already is! It is already being used in the industry. However, the secret is finding and recognizing the right application for it within the industry, and that application will vary considerably from lab to lab.

It seems the main discussion about this centers entirely on whether or not ASV can replace the NMR interpretation of the medicinal or synthetic chemist. Of course this is the ultimate goal, but it should not be considered the only goal. John alludes to some other applications in his letter and there are other uses as well.

I will blog more later about my thoughts on the "Reality of the New(ish) Discovery Environment" as it pertains to NMR and more thoroughly justify why I feel the way I do about this technology. But for me, the bottom line is that this is not just a concept, and it's not a dream that we need to wait 3-5 years to see realized. It is also not a new technology on the horizon. Our first publication on this topic was back in 2006 and we've continued to build on those approaches in 2007 and 2008. Work was done by Griffiths et al. in the late 90s and early 2000s. The technology has only gotten better and better. We have been working diligently with industry partners for over a decade on this concept. It is not new and it is not an unsolved area.

Just to be clear, am I saying this technology is ready to completely replace the NMR interpretation responsibilities of the average medicinal/synthetic chemist? Definitely not.

Can it be used to help chemists make better, faster, and more independent decisions? Yes. Can it be used in a complementary way to the chemist's existing interpretation workflow? Yes. Can it be used to validate the quality of registration libraries? Yes. Can it be used to verify compounds purchased from external sources, whether these are assay materials or simple starting materials? Yes. The list of applications goes on for a while.

I believe current ASV applications can improve all the above workflows. The key is identifying the best applications and practicing caution in how this technology is deployed and used by chemists.

In conclusion, my advice is that if we continue to treat this technology as only applicable to the ultimate goal, perhaps it will take longer than 3-5 years, or perhaps NMR will become irrelevant before we get there. The goal of NMR scientists, hardware, and software vendors should be to work hard to continue to make NMR technology relevant in all industries. If we can do this, we all win.

February 21, 2011

Recently I finished a great book that made me think about different things on a lot of different levels. The book is called The Black Swan and the author is Nassim Nicholas Taleb.

In short, a black swan is defined as an event, positive or negative, that is deemed improbable yet causes massive consequences.

From Wikipedia:

What we call here a Black Swan (and capitalize it) is an event with the following three attributes. First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

Anyone who follows this blog knows that I have a passion around the concept of automated structure verification by software methods and the different applications it can play within the pharmaceutical industry.

Reading through this book, one of the things that popped in my mind was the black swan impact as it pertains to compound registration libraries in pharma.

I spend many hours in my role discussing the concepts of automated structure verification (by NMR of course) to spectroscopists, chemists, directors, VPs, Managers of compound management, etc.

In many of these conversations, the risk of having an incorrect compound in the registration library is always a fruitful and interesting discussion. However, this conversation is almost always dominated by low probabilities, starting with the low probability that the compound being registered is wrong in the first place. I agree, this is a low probability. After all, a trained chemist has synthesized said material and has used a variety of analytical methods to confirm its identity. In some organizations, the compounds in this library will be validated further either before they are accepted in the store, or before they are sent off to assay.

So the argument goes that incorrect compounds in the registration library are low probability. And furthermore, there are steps being taken in some organizations to proactively catch incorrect compounds (most often LC-MS). Finally, perhaps the lowest probability event of them all is the compound being identified as a hit during the assay.

So in the end, what are the chances that a meaningful compound that is identified as a "hit" is actually an incorrect compound? Furthermore, in this instance what are the chances that this compound will make it too far downstream without its false identity being exposed?

In short, it's a pretty low probability event, and because in most cases it will eventually be caught it's a stretch to really call this a massive consequence or impact.

That's not the black swan I am talking about in this post.

In my opinion, the Black Swan concept is more relevant to those incorrect compounds that lie in the registration library and never get cherry picked out. A compound that doesn't produce anything interesting from the assay. A compound that effectively hides in the back of the shelves in a compound management cold room, that has been documented as "tested" but never advanced any further because of poor assay results.

Of course this compound is not the Black Swan. However, it's possible that the compound that it was supposed to be is. The compound that the chemist thought they made, and the compound that was referenced in the inventory.

While it's low probability, there are probably tens of thousands of misrepresented compounds that haven't truly been assayed over the last 20 years. Of these thousands, could any of them have turned into the highly coveted blockbuster drug?

Sure, it's doubtful, but the infinitely small possibility exists.

And further, this very idea contains one of the core components of a Black Swan: "nothing in the past can convincingly point to its possibility."

I am not certain that there is an example of a blockbuster drug that was originally missed because of a mistake in synthesis, or a misrepresented registrant. My guess is "not really." And if there were, perhaps as Taleb suggests, it would have been rationalized by hindsight.