Wednesday, 25 May 2011

A recurring thing we are interested in is classifying the natural ligand/partner of a protein. Having a classification of the 'natural' ligand partner for every human protein would be really useful, and would allow us to ask and more importantly answer quite complex questions in ChEMBL like - 'show me all small molecules that modulate a protein-protein interaction (PPI) at better than 100 nM potency?'.

I've had a bit of a dig around the web, but there isn't anything that does exactly what I need (or I'm not smart enough to find it), but I'm sure it must exist, so in the comments section, please post resources and ideas.....

Hepatitis C is a prolonged infection of the liver caused by the hepatitis C virus (HCV), a small positive-sense single-stranded RNA virus that is transmitted by blood-to-blood contact. Chronic hepatitis C is normally asymptomatic, but may lead to liver fibrosis and, if untreated, potentially fatal liver failure. There is currently no vaccine for this type of hepatitis.

Telaprevir is an inhibitor of the hepatitis C virus (HCV) non-structural protein 3 (NS3) protease (ChEMBLID:CHEMBL4893; Uniprot ID:A3EZI9), a viral protein required for the proteolytic cleavage of the HCV encoded polyprotein (UniProt:P27958) into mature forms of the NS4A, NS4B, NS5A and NS5B proteins (NS3 is Uniprot: P27958[1027-1657]). These proteins are involved in the formation of the virus replication complex, and therefore are vital to its proliferation. In a biochemical assay, Telaprevir inhibited the proteolytic activity of the recombinant HCV NS3 protease domain with an IC50 value of 10 nM.

The -vir USAN/INN stem covers antiviral agents, and the substem -previr indicates it is a serine protease inhibitor. Telaprevir is the second approved agent to target HCV NS3, following the approval earlier this month of Merck's Boceprevir (q.v.). Other compounds in this class in late stage clinical development/registration include Tibotec's TMC-435, and Bristol Myers Squibb's Asunaprevir (BMS-650032). Others at earlier stages of development include ABT-450, BI-201335, IDX-320, MK-5172, Vaniprevir (MK-7009), Narlaprevir (SCH-900518), Danoprevir (RG-7227, ITMN-191), BIT-225, VX-500, ACH-1625, GS-9256.

Telaprevir (IUPAC: (1S,3aR,6aS)-2-[(2S)-2-({(2S)-2-cyclohexyl-2-[(pyrazin-2-ylcarbonyl)amino]acetyl}amino)-3,3-dimethylbutanoyl]-N-[(3S)-1-(cyclopropylamino)-1,2-dioxohexan-3-yl]-3,3a,4,5,6,6a-hexahydro-1H-cyclopenta[c]pyrrole-1-carboxamide; SMILES: CCCC(C(=O)C(=O)NC1CC1)NC(=O)C2C3CCCC3CN2C(=O)C(C(C)(C)C)NC(=O)C(C4CCCCC4)NC(=O)C5=NC=CN=C5; PubChem:3010818; ChEMBL ID: CHEMBL231813) has a molecular weight of 679.8 Da, contains 4 hydrogen bond donors, 8 hydrogen bond acceptors, and has an ALogP of 2.69. The inhibitor is clearly peptide-like: it contains four amino acid residues, mimicking the natural substrate of the protease, and includes a 'warhead' - the alpha-ketoamide - which covalently binds to the catalytic serine residue of the target enzyme.

Telaprevir is available as oral film-coated tablets of 375 mg. It has an apparent volume of distribution (Vd/F) of approximately 252 L. In patients who received a dose of 750 mg three times a day (a large recommended daily dose of 2.25 g, equivalent to 3,310 umol), the exposure is characterised by an AUC of 22,300 ng.hr/mL, with a Cmax of 3510 ng/mL. Telaprevir should be administered with a standard fatty meal, since this enhances its bioavailability by 237%. In vitro, plasma protein binding ranges from 59% to 76%.
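As a quick sanity check on the mass-to-molar conversion quoted above, here is a minimal Python sketch. The function name `dose_to_umol` is just an illustrative helper; the only inputs are the 750 mg three-times-daily dose and the 679.8 Da molecular weight given in this post.

```python
def dose_to_umol(dose_mg, mol_weight_da):
    """Convert a dose in mg to micromoles (mg / (g/mol) = mmol; x1000 = umol)."""
    return dose_mg / mol_weight_da * 1000.0

# Telaprevir: 750 mg three times a day, molecular weight 679.8 Da
daily_dose_mg = 750 * 3
daily_dose_umol = dose_to_umol(daily_dose_mg, 679.8)

print(f"{daily_dose_mg} mg/day = {daily_dose_umol:.0f} umol/day")
```

The same helper works for any of the other monographs, provided the molecular weight of the dosed form is used.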

The predominant metabolites of Telaprevir in plasma are the R-diastereoisomer (VRT-127394), which is approximately 30-fold less potent than the parent drug; pyrazinoic acid; and a metabolite formed by reduction of the α-ketoamide bond of Telaprevir (which is, as expected, not active against the target). Telaprevir is also metabolised by CYP3A4, being both a substrate and an inhibitor of the enzyme; other therapeutic agents metabolised by CYP3A4 may therefore show prolonged therapeutic effects or adverse reactions. See the prescribing information for the extensive list of drug-drug interactions and contraindications.

Following administration of a single oral dose of 750 mg, Telaprevir is eliminated with a mean plasma half-life (t1/2) of approximately 4.0 to 4.7 hours, and it has a mean total body clearance (CL/F) of approximately 32.4 L/hr.

Telaprevir was developed almost in parallel with Boceprevir, the first-in-class inhibitor of the HCV NS3 protease. Both drugs require a high daily dose for an effective response, and are generally similar with respect to their pharmacokinetic parameters.

Sunday, 22 May 2011

On May 20th, the FDA approved Rilpivirine (Tradename: Edurant; Research Code: TMC-278, NDA 202022), an HIV-1 non-nucleoside reverse transcriptase inhibitor (NNRTI), for the treatment of HIV infection in treatment-naive patients in combination with other HIV therapies. HIV infection is a serious and, if untreated, fatal infection caused by a lentivirus. However, intensive research has led to a wide variety of antiviral agents, so the disease is now treatable and a substantial increase in quality of life can be anticipated.

Rilpivirine is an inhibitor of the essential reverse transcriptase (RT) enzyme of HIV-1 (ChEMBLID:CHEMBL247; Uniprot ID:Q72547), a viral protein required for the transcription of the single-stranded RNA genome of HIV-1 into double-stranded DNA - this is the opposite of the classical transcription of DNA into RNA. The RT enzyme is translated as part of a long complex gag-pol polyprotein, and requires specific proteolytic cleavage by a virally encoded protease (HIV-1 PR) - this protease is also the target of many successful HIV therapies. There are two distinct binding sites within the RT enzyme that are therapeutically targetable - the first is the catalytic center, targeted by drugs such as AZT and other nucleoside analogues; the second is the 'allosteric' non-nucleoside site, which is only usefully present in the HIV-1 RT sequence, and so NNRTI agents are usually specific for HIV-1. Rilpivirine is a non-competitive inhibitor of HIV-1 RT.

There are many protein structures known for RT in complex with inhibitors, including that of the complex with TMC-278/rilpivirine itself - PDBe:3mee. The RT enzyme has an interesting composition, being a heterodimer of two proteins derived from the same gag-pol polyprotein - one called p66, the other p51. Both contain the polymerase functionality (the polymerase domain is composed of four structural subdomains, which are arranged differently in each of the two chains, forming an asymmetric dimer); p66 additionally contains a further catalytic function - Ribonuclease H.

Rilpivirine (IUPAC: 4-{[4-({4-[(E)-2-cyanovinyl]-2,6-dimethylphenyl}amino)pyrimidin-2-yl]amino}benzonitrile; SMILES: CC1=CC(=CC(=C1NC2=NC(=NC=C2)NC3=CC=C(C=C3)C#N)C)C=CC#N; PubChem:6451164) is an achiral synthetic small molecule drug, and a member of the diaryl pyrimidine (DAPY) class of NNRTIs. It has a molecular weight of 366.4 Da, contains 2 hydrogen bond donors, 4 hydrogen bond acceptors, and has a LogP of 4.5.

Rilpivirine is available as oral tablets containing 27.5 mg of Rilpivirine hydrochloride (equivalent to 25 mg of active ingredient). Rilpivirine should be administered with food, since absorption is significantly lower in fasted patients. Human plasma protein binding (ppb), primarily to serum albumin, is approximately 99.7%. The primary metabolic route of Rilpivirine is oxidative metabolism by CYP3A4; the half-life is ca. 50 hr, and elimination is largely via feces.

The license holder for Rilpivirine is Johnson & Johnson, and the full prescribing information can be found here.

Friday, 20 May 2011

Here is a job advert, not for a ChEMBL position, but as part of the (fabulous) IUPHAR-DB project.

Senior data curator to assist with the development of the IUPHAR Database (http://www.iuphar-db.org) and the British Pharmacological Society Guide to Receptors and Channels (http://www.brjpharmacol.org/view/0/GRAC.html). You should hold a PhD or equivalent in Pharmacology, Medicinal Chemistry or a related discipline together with relevant experience in bioinformatics or chemoinformatics. Based in the Queen's Medical Research Institute, you will work independently on a day-to-day basis but in close contact with Professor Tony Harmar, with the database developer and in liaison with IUPHAR and BPS. Closing date is 22nd June.

A paper has recently been published, which may be of interest; it covers the use of chemogenomics class data in activities such as target prediction, receptor deorphanisation and so forth. A link to the paper is here.

Wednesday, 18 May 2011

On the 26th May 2011 new European regulations will come into force, which are aimed at providing website users with a better understanding of the data being collected when they visit a website. These regulations cover the use of cookies, which are small pieces of text set by some websites and can be used for authentication, saving site preferences/shopping cart selections and storing session information. In short, the cookie is being used to get around the inherently stateless nature of HTTP.

There is currently a bit of confusion over what the final regulations and national implementations will be, so the Information Commissioner’s Office (ICO) has drawn up some guidelines; however, these will not be finalised until after 26th May 2011 :(. In preparation for these changes we thought we would list the cookies which might be set when a user visits the ChEMBL interface:

Cookie Name | Details
ci_session | Generated by the CodeIgniter web application framework and used to store session information for each user
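To make the 'stateless HTTP' point a little more concrete, here is a small Python sketch of a session cookie round trip using the standard library. Only the cookie name `ci_session` comes from the table above; the value `abc123` and the `path` attribute are made-up placeholders, and this is not how CodeIgniter itself encodes its session data.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie header for the session.
server_cookie = SimpleCookie()
server_cookie["ci_session"] = "abc123"   # hypothetical session identifier
server_cookie["ci_session"]["path"] = "/"
header_value = server_cookie["ci_session"].OutputString()
print("Set-Cookie:", header_value)

# Client side: the browser stores the cookie and sends it back on the
# next request, which is how the server recognises the same user again.
client_cookie = SimpleCookie()
client_cookie.load(header_value)
session_id = client_cookie["ci_session"].value
print("Session seen by server on next request:", session_id)
```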

Tuesday, 17 May 2011

On May 2nd, the FDA approved Linagliptin (BI-1356, trade name Tradjenta, ATC code A10BH05, ChEMBL ID 237500, NDA 201280), a dipeptidyl peptidase-4 (DPP-4) inhibitor, to treat type II diabetes (OMIM: 125853). Linagliptin has been approved for monotherapy or in combination with other medications, in conjunction with exercise and dietary modification. Due to a malfunction in the production of, or response to, insulin, patients with type II diabetes suffer from high blood glucose levels.

By inhibiting DPP-4 (Uniprot P27487, OMIM: 102720, EC number 3.4.14.5), a cell surface glycoprotein receptor, Linagliptin stabilizes the level of two of its substrates, the incretins GLP-1 and GIP, gastrointestinal peptide hormones which stimulate insulin release from beta cells of the Islets of Langerhans.

Linagliptin has been shown to have a high affinity (Ki 1 nM) for DPP-4 in cell-based fluorescence assays, and to be highly selective. DPP-4 exists in soluble form (amino acids 39-766) or with an N-terminal signal-anchor domain, linking the extracellular domain to the cell membrane. There are several homodimeric crystal structures available, e.g. PDB 1J2E.

Tradjenta is dosed as a 5 mg tablet, once daily (equivalent to a daily dose of 10.6 umol).

After a single administration, a maximum concentration (Cmax) of 8.9 nmol/L is reached after Tmax=1.5 h. Linagliptin has a long terminal half-life (>100 h) and steady-state plasma concentrations are reached after the third daily dose. At steady state, Cmax is increased by a factor of ~1.3 as compared to the single administration. The mean apparent volume of distribution (Vd) is approximately 1110 L.

Chronic hepatitis C genotype 1 is a prolonged infection that affects the liver and is caused by a small single-stranded RNA virus, which is transmitted by blood-to-blood contact. Chronic hepatitis C is normally asymptomatic, but may lead to liver fibrosis, and thus liver failure.

Boceprevir is a first-in-class inhibitor of the hepatitis C virus (HCV) non-structural protein 3 (NS3) protease (ChEMBLID:CHEMBL4893; Uniprot ID:A3EZI9), a viral protein required for the proteolytic cleavage of the HCV encoded polyprotein (UniProt:P27958) into mature forms of the NS4A, NS4B, NS5A and NS5B proteins (NS3 is Uniprot: P27958[1027-1657]). These proteins are involved in the formation of the virus replication complex, and therefore are vital to its survival. HCV NS3 is a serine proteinase (Pfam:PF02907). Through a reactive center, the (alpha)-ketoamide functional group, boceprevir binds covalently to a serine in the active site of NS3 protease (S139), inhibiting viral replication in HCV-infected host cells. In a biochemical assay, Boceprevir inhibited the activity of recombinant HCV genotype 1a and 1b NS3/4A protease enzymes, with Ki values of 14 nM for each subtype.

Boceprevir is available as oral gelatin capsules of 200 mg. It has an apparent volume of distribution (Vd/F) of approximately 772 L, and, in patients who received a dose of 800 mg three times a day (the recommended daily dose is therefore a large 2.4 g (equivalent to 4,600 umol)), the exposure is characterised by an AUC of 5408 ng.hr/mL, a Cmax of 1723 ng/mL and a Cmin of 88 ng/mL. Boceprevir should be administered with food, since food enhances its bioavailability by up to 65%. Human plasma protein binding (ppb) is approximately 75% following a single dose of boceprevir of 800 mg.

The primary metabolic route of boceprevir is the aldo-keto reductase (AKR; Pfam:PF00248)-mediated pathway to ketone-reduced metabolites that are inactive against HCV. Boceprevir is eliminated with a mean plasma half-life (t1/2) of approximately 3.4 hours, and it has a mean total body clearance (CL/F) of approximately 161 L/hr.

Boceprevir is a strong inhibitor of CYP3A4, and therefore other therapeutic agents primarily metabolised by this enzyme may show prolonged therapeutic effects or adverse reactions. See the prescribing information for the extensive list of drug-drug interactions and contraindications.

The license holder for Boceprevir is Merck & Co., and the full prescribing information can be found here. For more information, please visit the product website here.

Here is a snapshot of the Clinical Development Phase kinase inhibitors we have identified, all 313 of them. Any feedback on the data would be great, so any missing compounds, mismapped synonyms, wrong highest phase data, etc. would really help tune our data discovery approaches for this sort of thing. Please do not send us any proprietary/licensed data that you are not free to share!

The agenda for the first ChEMBL user group meeting is below. We also have some additional speakers on various collaborative projects likely to be of interest to attendees, as extra-meeting items. Many thanks to the speakers and to Brad Sherbourne of Merck for putting this together.

The presentations will be a mix of slides and generous discussion time, giving plenty of opportunity to shape the future development of ChEMBL.

There is still time to register; details are on the LinkedIn ChEMBL User Group. Travel details are on the EBI website, www.ebi.ac.uk. When you arrive, present yourself to Security at the Campus Visitor Center; you will then be sent to the EBI reception, where you'll be met and taken to the meeting room.

Wednesday, 11 May 2011

We are involved in a fascinating collaboration with Prof. Aroon Hingorani from the Clinical Epidemiology Dept of UCL Division of Medicine - working with clinical data to identify new approaches to the treatment of cardiovascular disease. Further details of the position are here. Closing date is May 23rd 2011 - so get your skates on!

Monday, 9 May 2011

Just a reminder of the first ChEMBL User Group, which will be held here on campus on Friday May 27th. Full details are on the LinkedIn group. If you are not a member, just send a join request! Attendees will each get a stick of ChEMBL rock.

The agenda, with speakers, titles, etc. will be posted shortly, both here and on the LinkedIn group.

Thursday, 5 May 2011

So we can run some numbers, to get an idea of the scale of the issue, and then draw things together (in the next post) with a couple of things we are thinking of ourselves to do in ChEMBL.

Stereocenters
For a molecule stored in a database with a single undefined sp3 stereocenter, there are two possible distinct physical molecules (enantiomers). Remember that some properties are invariant w.r.t. the stereochemistry (e.g. logP), while others aren't (e.g. binding energy to a receptor). As further undefined stereocenters are introduced, the number of possibilities increases as a simple combinatoric product. For three stereocenters, there are therefore 2**3 = 8 possibilities. There are a number of programs available to perform this stereo-enumeration - including stereoplex.
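The counting argument is easy to sketch in a few lines of Python. This is only an illustration of the 2**n scaling with plain R/S labels, not a substitute for a real stereo-enumeration tool; it treats every center as independent, so it gives an upper bound (symmetric molecules with meso forms will have fewer distinct stereoisomers). The function name is a hypothetical helper.

```python
from itertools import product

def enumerate_stereo_assignments(n_undefined_centers):
    """All R/S label combinations for n independent undefined stereocenters."""
    return list(product("RS", repeat=n_undefined_centers))

assignments = enumerate_stereo_assignments(3)
for labels in assignments:
    print("".join(labels))

print(len(assignments), "possible assignments")  # 2**3
```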

Tautomers
Enumeration of possible tautomers is a complex issue, and requires the concept of an energy difference (which will reflect how frequently a given tautomer occurs). However, the energy difference between two tautomers is crucially dependent on solvation and on stabilisation of a particular tautomer by complexation in a binding site, so it is not a trivial task to treat this matter both comprehensively and accurately. There is probably an interesting average scaling of the number of 'reasonable tautomers' as a function of molecular weight for typical drug-like molecules in a database such as ChEMBL (we just haven't looked yet). However, it is an area of active research, and there are many tools available to treat these systems - including MN.TAUTOMER.
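The link between an energy difference and how often a tautomer occurs is the Boltzmann distribution. Here is a minimal Python sketch of that relationship; the relative energies used (0.0 and 1.4 kcal/mol) are invented for illustration, and in practice they would have to come from a calculation that accounts for the solvent or binding-site environment discussed above.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(relative_energies_kcal, temperature_k=298.15):
    """Fractional populations of states from their relative energies."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-energy / rt) for energy in relative_energies_kcal]
    total = sum(weights)
    return [weight / total for weight in weights]

# Two hypothetical tautomers, the second 1.4 kcal/mol higher in energy
populations = boltzmann_populations([0.0, 1.4])
for index, fraction in enumerate(populations):
    print(f"tautomer {index}: {fraction:.1%}")
```

Even a modest energy gap of this size leaves the minor tautomer at several percent, which is why a simple 'pick one tautomer' rule can be misleading.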

Ionizable centers
As per tautomers, there is a concept of reasonableness that needs to be applied here - in theory, benzene can act as both a base and an acid, but the pKb and pKa will be so far outside the range encountered in what we currently understand as life-like conditions that it is irrelevant (the pKa of benzene is ~43). However, for molecules with 'regular' basic and acidic groups, there will be two (or more) states for each ionizable center. There can be multiple states for some functional groups (e.g. polyprotic acids), such as phosphates, which will have multiple pKa values, reflecting increasing levels of overall charge. These charge effects will greatly affect binding to a target. A further complicating factor is that the pKa of molecules is often perturbable by their molecular environment, and this shifting of 'standard' pKa values is often a defining feature of catalytic residues in enzymes.

However, for simple groups such as carboxylic acids, there are two states, and for simple aliphatic amines there are two states to be considered. These combine again in a combinatorial fashion, so a molecule with a simple basic and simple acidic center will have four possible states.
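A rough Python sketch of the four-state case follows, weighting each state with the Henderson-Hasselbalch relationship. The pKa values (4.5 for the acid, 10.0 for the conjugate acid of the amine) and the pH of 7.4 are illustrative assumptions, and the two centers are treated as fully independent, which real molecules often are not.

```python
from itertools import product

def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a site that has lost its proton."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

def protonation_states(pkas, ph=7.4):
    """Map each protonation pattern (True = protonated) to its population."""
    states = {}
    for pattern in product((True, False), repeat=len(pkas)):
        weight = 1.0
        for protonated, pka in zip(pattern, pkas):
            lost = fraction_deprotonated(pka, ph)
            weight *= (1.0 - lost) if protonated else lost
        states[pattern] = weight
    return states

# (carboxylic acid, aliphatic amine) with assumed pKa values
states = protonation_states([4.5, 10.0])
for pattern, population in states.items():
    print(pattern, f"{population:.4f}")
```

With these assumed values the zwitterion (acid deprotonated, amine protonated) dominates at pH 7.4, but all four states are still enumerated.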

A key step in treating the ionization state of a molecule from a normalised chemical database is prediction of the pKa/pKb of the molecule. There are many tools available to do this - including ACD/PhysChem Suite.

To further complicate things, the calculation of pKa/pKb depends on the tautomer that the calculation is performed on.

Conformational flexibility
It is tempting to think of storing three-dimensional structures for molecules, since this information can be used in tasks such as docking of a library of rigid molecules to a receptor, or the generation of pharmacophores from a set of molecules that are known to bind to a receptor. However, in general the number of possible conformers is very large, and when combined with the additional complexities above of undefined stereocenters, tautomers and ionizable centers, this becomes a very challenging task. To give an idea of the complexity, a single bond between two sp3 atoms will have three energy minima for rotation around that bond; as the number of rotatable bonds is increased, there will be an approximate combinatoric product, so for three independent sp3-sp3 rotatable bonds there will be 3**3 = 27 plausible conformers. Again these will have different energies (and therefore population frequencies), but again these energies are crucially dependent on tautomers, ionization and the solvent/receptor environment. It is complicated.
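The 3**3 count can be sketched in Python by enumerating torsion angles directly. The staggered minima at 60, 180 and 300 degrees are the idealised textbook values for an sp3-sp3 bond, and the sketch ignores clashes between distant parts of the molecule, so it is a count of candidate conformers rather than a conformer generator.

```python
from itertools import product

# Idealised staggered minima (gauche+, anti, gauche-) in degrees
TORSION_MINIMA = (60, 180, 300)

def enumerate_torsion_combinations(n_rotatable_bonds):
    """All combinations of torsion minima for n independent rotatable bonds."""
    return list(product(TORSION_MINIMA, repeat=n_rotatable_bonds))

conformers = enumerate_torsion_combinations(3)
print(len(conformers), "candidate conformers")
print("first few:", conformers[:3])
```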

A very widely used estimate of the conformational flexibility of a molecule, and the implied entropic cost of ordering a flexible molecule on binding to a receptor, is the number of rotatable bonds.

Again there are many tools for the generation of reasonable three-dimensional conformers for a molecule - including CORINA.

Conclusion
So there is a scale of complexity. Molecules that are rigid, have no tautomeric forms, are not acids or bases and have completely defined (or no) stereochemistry are unambiguous, and can be safely used from a database like ChEMBL for things like docking. It is also possible to calculate a variety of descriptors for these molecules, and these calculations will be 'robust'. However, it is important to appreciate that many property calculation methods require the selection of a single representative structure from the set of possibles.

Other molecules are more complex, and obtaining a physically relevant structure (or set of low energy structures) may require substantial processing/enumeration. The number of possible physical forms can easily extend into the hundreds for drug-like molecules; for example, the simple molecule below has 96 possible forms to consider for something like docking (and this ignores the very large number of conformational states from the 10 rotatable bonds in the structure, which alone are about 3**10 or over 59,000 'states').
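Putting the numbers from this example together gives a feel for the overall scale. The sketch below simply multiplies the 96 discrete forms by the 3**10 torsion combinations; it assumes three minima per rotatable bond and full independence between all the factors, so the result is a rough upper bound, and the function name is an illustrative helper.

```python
def total_forms(discrete_forms, n_rotatable_bonds, minima_per_bond=3):
    """Discrete forms (stereo/tautomer/ionization) times conformational states."""
    return discrete_forms * minima_per_bond ** n_rotatable_bonds

conformer_states = 3 ** 10
total = total_forms(96, 10)

print(f"conformational states: {conformer_states:,}")
print(f"total forms to consider: {total:,}")
```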

An annotated form of the points of interest in this molecule is

In the next post, we will try and bring this together in the context of the ChEMBL database, and how we normalise our chemical structures on registration, and also how we calculate our descriptors, and some of the assumptions we make in this process.

Wednesday, 4 May 2011

There is a publication just out covering some ideas around the prioritisation and development of drug discovery strategies for infectious diseases, involving collaborators from Dundee, UPenn and IBM Almaden. RAPID is one of those recursive acronyms, just like ZINC, or GNU.

I came across this interesting paper from earlier this year - "HIV proteinase inhibitors target the Ddi1-like protein of Leishmania parasites", published in FASEB J. HIV protease inhibitors were known to decrease levels of Leishmania in vivo, but the molecular target was not known. This paper shows that HIV-1 proteinase inhibitors are probably functional inhibitors of Ddi1 from Leishmania spp. Nelfinavir (the structure above) is a 440 nM IC50 inhibitor of L. major Ddi1 (and is weaker against the human ortholog, at 3.3 uM). This is, on the face of it, a lovely example of drug repositioning - the use of a drug for a new, and in this case, a non-obvious use.

HIV-1 PR (UniProt:Q9YQ34) and Ddi1 (UniProt:A4H334) are both aspartyl proteinases (Pfam:CL0129), share a common mechanism, and overall architecture (although HIV-PR is a homodimer, and Ddi1 is a single chain containing two 'copies' of the HIV-PR sequence). There is a human Ddi1 ortholog as well (UniProt:Q8WTU0), and poking around in UniProt shows there is a Tryp orthologue as well, but not an obvious Plasmo one - but I haven't really looked too hard, so far.

It is interesting to speculate what would be the most efficacious/potent launched/clinical stage HIV-1 PR inhibitor for the treatment of Leishmaniasis - and probably homology modelling and docking could allow a pretty good guess at this.

Sunday, 1 May 2011

On 28th April 2011, the FDA approved Abiraterone acetate (Brand name Zytiga, NDA 202379) for the treatment of castrate-resistant prostate cancer (CRPC). Specifically, it is indicated for use in combination with prednisone for the treatment of patients with metastatic CRPC who have received prior chemotherapy containing docetaxel. Response biomarkers are amplified Androgen Receptor (AR), PTEN loss and hormone-driven ERG rearrangement.

Abiraterone acetate (research code: CB-7630) is a steroid derivative with the chemical formula C26H33NO2 (IUPAC: (3β)-17-(3-pyridinyl)androsta-5,16-dien-3-yl acetate; SMILES: CC(=O)O[C@H]1CC[C@]2(C)[C@H]3CC[C@@]4(C)[C@@H](CC=C4c5cccnc5)[C@@H]3CC=C2C1; InChI=1S/C26H33NO2/c1-17(28)29-20-10-12-25(2)19(15-20)6-7-21-23-9-8-22(18-5-4-14-27-16-18)26(23,3)13-11-24(21)25/h4-6,8,14,16,20-21,23-24H,7,9-13,15H2,1-3H3/t20-,21-,23-,24-,25-,26+/m0/s1). It has a molecular weight of 391.55 and a LogP of 5.12. Abiraterone acetate is an ester prodrug which is converted in vivo to the active component, abiraterone.
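The formula embedded in the InChI above (C26H33NO2) can be checked against the quoted molecular weight with a few lines of Python. This is a minimal sketch that only handles simple formulae without brackets, and the atomic masses are standard average values rounded to three decimals; `molecular_weight` is an illustrative helper, not part of any toolkit.

```python
import re

# Standard average atomic masses (rounded), only the elements needed here
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Sum atomic masses for a simple formula such as 'C26H33NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

mw = molecular_weight("C26H33NO2")
print(f"C26H33NO2: {mw:.2f} Da")
```

The result agrees with the 391.55 quoted in the text; a formula with only one oxygen would come out roughly 16 Da lighter.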

Abiraterone irreversibly inhibits 17-α-hydroxylase/C17,20-lyase (CYP17A1; Uniprot:P05093, canSAR link), an androgen biosynthesis enzyme expressed in testicular, adrenal, and prostatic tumor tissue, and a member of the cytochrome P450 family of enzymes (PFAM:P450). CYP17A1 normally catalyses a two-step reaction: the conversion of pregnenolone and progesterone to their 17α-hydroxy derivatives, and then the formation of dehydroepiandrosterone (DHEA) and androstenedione, the precursors of testosterone. No structure of CYP17A1 itself exists, but it is homologous to other cytochrome P450 enzymes such as CYP1A2 (PDBe:2hi4). Abiraterone's safety and effectiveness were established in a clinical study of 1,195 patients with late-stage castration-resistant prostate cancer who had received prior treatment with docetaxel chemotherapy, and showed improved survival. Zytiga is supplied in 250 mg tablets and the recommended dose is 1000 mg administered orally once daily in combination with prednisone 5 mg administered orally twice daily.