Incorporating DrugCentral data in our network

I spoke with @TIOprea and Oleg Ursu from the University of New Mexico. They are constructing a highly curated yet highly integrative database of pharmacology named DrugCentral. They have not yet published a journal article detailing their database. However, they have posted an alpha webapp and data repository, which provide access to select components of the database.

My impression was that the database is similar in concept to DrugBank but has key advantages in certain areas. First, it has integrated types of data which are not currently part of DrugBank. Second, it takes a more clinical approach to curation compared to DrugBank. For example, drug–target relationships in DrugCentral adhere more to the "three pillars [1]" of pharmacological activity.

I created a repository (dhimmel/drugcentral) to process parts of DrugCentral for inclusion in our network. Details of the integration will follow.

Contributions to our hetnet

I processed DrugCentral data and converted it into the identifier systems used by our network (notebook). I have initially added two relationship types from DrugCentral into the hetnet (commit).

Drug targets

I extracted drug–target relationships from DrugCentral and converted them into the DrugBank and Entrez Gene identifiers in our network (dataset). The table below shows the sources from which DrugCentral compiled drug targets and how many relationships each source contributed.

Pharmacologic classes

DrugCentral has compiled the membership of compounds in pharmacologic classes from several sources, which contain the following types of classes:

FDA — Mechanism of Action

FDA — Physiologic Effect

FDA — Chemical/Ingredient

FDA — Established Pharmacologic Class

MeSH — Pharmacological Action

CHEBI — Application

I decided to assign all of these classes to a single node type (Pharmacologic Class). I added a new relationship type for Pharmacologic Class–includes–Compound. DrugCentral contributed 10,959 relationships for 1,262 pharmacologic classes.

Medical indications

In my conversation with DrugCentral team members, we first discussed PharmacotherapyDB, our recently-released physician-curated catalog of indications. One major takeaway was that we needed to more clearly explain that our definition of disease modifying differs from the clinical definition. Also, we need to more clearly state that NOT refers to non-indications.

As part of DrugCentral, they've constructed their own indications catalog. They seeded their catalog from OMOP in 2012 and have since manually added additional indications. OMOP has now become OHDSI and hosts its vocabulary on GitHub at OHDSI/Vocabulary-v5.0. As a side note, we were not aware of OMOP [1] or OHDSI [2] when we assembled our indications for version 1.0 of PharmacotherapyDB.

Aligning indications with PharmacotherapyDB

I converted the DrugCentral indications to the slim sets of DrugBank drugs and Disease Ontology diseases in PharmacotherapyDB 1.0 (notebook, dataset). For each disease, I aggregated direct indications as well as indications for subtypes (referred to as propagation).

In the converted dataset, I included a category column giving the indication's PharmacotherapyDB 1.0 status. Of a total of 671 indications extracted from DrugCentral, 210 were not in PharmacotherapyDB 1.0. Of the 461 indications in PharmacotherapyDB, 359 were classified as disease modifying (78%), 77 were classified as symptomatic (17%), and 25 were classified as non-indications (5%).
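As a quick sanity check on the counts above, here is a minimal sketch that recomputes the breakdown. The numbers are the ones quoted in this post; the variable names are only for illustration.

```python
# Counts quoted in the post; variable names are illustrative.
total_extracted = 671
not_in_pharmacotherapydb = 210
counts = {"DM": 359, "SYM": 77, "NOT": 25}

# Indications that were already in PharmacotherapyDB 1.0.
in_db = total_extracted - not_in_pharmacotherapydb

# The three categories should account for every overlapping indication.
assert sum(counts.values()) == in_db

# Percentage of overlapping indications per category, rounded to whole percent.
percentages = {label: round(100 * n / in_db) for label, n in counts.items()}
print(in_db, percentages)
```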

6 of the non-indications were for anemia and 8 were for hypertension, two diseases for which we have a known problem with their generality. Compared to the four sources of PharmacotherapyDB indications, DrugCentral appears to have a higher percentage of disease modifying indications. However, we're basing this assessment on indications that appeared in DrugCentral and at least one other resource, so it's potentially biased.

@pouyakhankhanian, if you are up for curating the 210 new indications as DM, SYM, or NOT, we could potentially:

add these indications to a future release of PharmacotherapyDB

use these indications to test our predictions

Pouya Khankhanian: I'm up for it. I should have some time either late this week or early next week.

Pharmacologic Classes that are indications

We've noticed that many of the pharmacologic classes are essentially indications. This could be problematic since it could confound our classification approach. Specifically, it could lead to the appearance that our method predicts indications when in reality it just regurgitates indications which were encoded by a pharmacologic class.

@sergiobaranzini and I looked through the 6 sources and found that 3 were less problematic:

FDA — Chemical/Ingredient

FDA — Mechanism of Action

FDA — Physiologic Effect

The other 3 were more problematic:

FDA — Established Pharmacologic Class

MeSH — Pharmacological Action

CHEBI — Application

Therefore, I excluded classes from the 3 more problematic sources. This reduced the number of classes from 1,262 to 345, the number of edges from 10,959 to 1,029, and the number of compounds in a class from 1,423 to 724 (commit).
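The filtering step can be sketched roughly as follows. This is not the code from the commit: the record layout and the placeholder class and compound names are assumptions made for illustration, and only the three retained source names come from this post.

```python
# Sources judged less problematic in this post, as (authority, class type).
KEPT_SOURCES = {
    ("FDA", "Chemical/Ingredient"),
    ("FDA", "Mechanism of Action"),
    ("FDA", "Physiologic Effect"),
}


def filter_edges(edges, kept_sources=KEPT_SOURCES):
    """Keep only class-includes-compound edges whose class source is retained."""
    return [edge for edge in edges if edge["source"] in kept_sources]


def summarize(edges):
    """Count distinct classes, edges, and compounds that belong to a class."""
    return {
        "classes": len({edge["class_id"] for edge in edges}),
        "edges": len(edges),
        "compounds": len({edge["compound"] for edge in edges}),
    }


# Placeholder records (hypothetical, not real DrugCentral rows).
edges = [
    {"class_id": "class_1", "source": ("FDA", "Mechanism of Action"), "compound": "compound_a"},
    {"class_id": "class_1", "source": ("FDA", "Mechanism of Action"), "compound": "compound_b"},
    {"class_id": "class_2", "source": ("MeSH", "Pharmacological Action"), "compound": "compound_a"},
    {"class_id": "class_3", "source": ("CHEBI", "Application"), "compound": "compound_c"},
]

before = summarize(edges)
after = summarize(filter_edges(edges))
print(before, after)
```

Running the same three counts before and after the filter is how figures like "1,262 to 345 classes" would be produced on the real data.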

One next step would be to salvage many of the filtered classes by manual curation. The majority of the removed classes did not overlap with DO Slim diseases and thus shouldn't confound our analysis. If we decide to curate, we'll have to decide whether to exclude all indications or just indications in DO Slim.

@dhimmel, does the bosentan indication for hypertension originate from DrugCentral? If so, there might be an error in your pipeline: the files uploaded to GitHub have pulmonary hypertension as an indication.

Greetings @olegursu! I used transitive closure [1] to convert diseases to the level of specificity in Hetionet. This is what I meant by saying:

For each disease, I aggregated direct indications as well as indications for subtypes (referred to as propagation).

I think you've picked up on an issue that came up during our curation. Specifically the Disease Ontology defines pulmonary hypertension (DOID:6432) as a subtype of hypertension (DOID:10763). However, our curator considered the definition of hypertension to be distinct from pulmonary hypertension.

So in conclusion, DrugCentral included a bosentan indication for pulmonary hypertension, which was translated to an indication for hypertension in Hetionet. In the future, I'd like to make transitive closure a query-time decision rather than a builtin, but for now that's not the case.
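To make the propagation concrete, here is a minimal sketch of the idea, not the actual pipeline code. The single subtype edge is the one mentioned above; the function names and the assumption that the slim set contains only hypertension are made up for illustration.

```python
# Subtype -> parent edges. Only this pair is taken from the thread.
PARENTS = {
    "DOID:6432": ["DOID:10763"],  # pulmonary hypertension is_a hypertension
}


def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable via is_a edges."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(indications, parents, slim):
    """Map (drug, disease) pairs onto slim diseases using transitive closure."""
    result = set()
    for drug, disease in indications:
        for term in ancestors(disease, parents):
            if term in slim:
                result.add((drug, term))
    return result


direct = [("bosentan", "DOID:6432")]  # indication as stated in DrugCentral
slim = {"DOID:10763"}                 # assumed slim set for this example
propagated = propagate(direct, PARENTS, slim)
print(propagated)
```

Making the closure a query-time decision would amount to storing `direct` and calling something like `propagate` only when a user asks for the coarser view.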

I tested it by searching for "lipitor", and was surprised to find hypertension listed as an "indication". I don't think this is right, and I don't immediately see a way to determine how hypertension was assigned as an indication for lipitor.

However, there is a lot of good information here as well. For example, a search for "nifedipine" turned up a contraindication of which I was entirely unaware, and which I could easily confirm by a web search.

Pouya Khankhanian: @mkgilson Would you be interested in doing a case study of hypertension predictions by this algorithm? For example, a case study similar to the case study done for epilepsy here (https://thinklab.com/p/rephetio). The algorithm made a great number of high probability predictions for hypertension (there can be a link to a file in thinklab here), as it did for epilepsy. We chose to do a case study of epilepsy in part because our three physician curators were all neurologists. It would be great to get input from someone who is more experienced in general internal medicine to evaluate the predictions for hypertension.

Mike Gilson: I'd love to contribute, but don't have time, and also am probably too far from clinical practice these days. I'll bet someone at Stanford could look at this with you, though!

Thank you for the feedback! Regarding indications for atorvastatin: most indications for drugs approved before 2012 come from OMOP v4, which in turn imported data from First Data Bank. Although the quality of this dataset appeared high in the few samples we assessed, there are still indications that address either a disease symptom or an associated co-morbidity, where the actual association is unclear. We will amend the data and re-upload.

Atorvastatin for hypertension

@mkgilson, thanks for the feedback. According to the DrugCentral publication, here's their method for compiling indications [1]:

Indications (10,707), contra-indications (27,851) and off-label indications (2496) were initially extracted from OMOP data model version 4.4 (http://omop.org/Vocabularies). Since the OMOP project transitioned to OHDSI (http://www.ohdsi.org), updated drug indication and contra-indication data are covered under a revised license agreement that in turn requires subscription licenses (i.e. it is no longer open-access). Therefore, indications for drugs approved after 2012 (322 pairs) were extracted from approved drug labels and mapped onto SNOMED-CT and UMLS concepts.

Note that we also created a catalog of indications called PharmacotherapyDB. When creating this resource, we had three physicians curate all of our indications. Interestingly, all three of our curators classified atorvastatin (lipitor) as a disease-modifying indication for hypertension. Atorvastatin was also considered disease-modifying for coronary artery disease but not for type 2 diabetes mellitus.

It appears that the verdict is still out on whether statins lower blood pressure [2, 3, 4, 5], but perhaps physicians are prescribing atorvastatin as an off-label treatment for hypertension and this is what our curators picked up on. @pouyakhankhanian, do you remember your reasoning here?

@dhimmel Thanks, Daniel. Before posting my original comment, I did a quick web search for atorvastatin and HTN, and found something very equivocal: there seemed to be supportive statistics, but the mean drop in BP observed was paltry, something like 0.5 - 1 mm Hg, on typical systolic and diastolic values of 120 and 70 mmHg. I'd be interested to know if a more robust effect has in fact been observed.

Though I cannot speak for the other curators, my own clinical suspicion was to call atorvastatin NOT for hypertension (HTN), because, for example, the other two statins in our curation database are also listed as NOT for HTN. As the only curator who was not blind to the other two curators' selections, I saw the choice of DM by the other two reviewers and therefore did a cursory round of research, found [1], which is specific for atorvastatin, and therefore agreed with the other reviewers. Upon more detailed review of this, my thoughts are below.

Hyperlipidemia (HLD) is treated with HMG-CoA reductase inhibitors (the 'statins', such as atorvastatin, simvastatin, lovastatin). The decision to treat HLD with a statin, and the strength of statin to use, is guided by a "risk factor" score that predicts poor cardiovascular outcomes; the ASCVD score is the one that has come into use in the last few years. The ASCVD risk score and many other scoring systems use your blood pressure as a major factor in determining if and how much statin you get for your HLD.

HTN is treated with antihypertensives, commonly guided by the JNC8 paradigm [2]. The decision to treat HTN and the aggressiveness of therapy is also guided toward reducing poor cardiovascular outcomes.

Therefore, in clinical practice, the treatment of HTN and HLD is generally thought to be really two parts of the same battle, with the goal being to decrease the number of poor cardiovascular outcomes (death or major disability from MI or stroke or PVD). And the latest trend is combination treatments which include statins and antihypertensives such as [3].

Given that clinically we are moving toward mixing antihypertensive drugs with statins, I'm not sure we will have more evidence in the future as to the efficacy of a statin alone (in the absence of antihypertensive use) on hypertension alone (in the absence of hyperlipidemia). For example, note the possibility of confounding between HTN and HLD in the articles referenced by Daniel above. Therefore, the best evidence we have would be [1]. In that case, I suppose one could say atorvastatin is DM for HTN, but I wouldn't disagree with calling it NOT for HTN. Furthermore, one could make a case that all three statins should be DM if one of them is DM, but I would personally think that's too much of a stretch. Here is the list of how all of the statins were designated by the three curators. You will note the lack of completeness of the PharmacotherapyDB list (not every statin is listed for every indication), and I would say this was quite usual for other drug classes in the database as well.

| drug | disease | CSH | AJG | PK |
|---|---|---|---|---|
| Lovastatin | atherosclerosis | DM | DM | DM |
| Pravastatin | atherosclerosis | DM | DM | DM |
| Rosuvastatin | atherosclerosis | DM | DM | DM |
| Simvastatin | atherosclerosis | DM | DM | DM |
| Atorvastatin | coronary artery disease | DM | DM | DM |
| Lovastatin | coronary artery disease | DM | DM | DM |
| Pitavastatin | coronary artery disease | DM | DM | DM |
| Pravastatin | coronary artery disease | DM | DM | DM |
| Rosuvastatin | coronary artery disease | DM | DM | DM |
| Simvastatin | coronary artery disease | DM | DM | DM |
| Atorvastatin | hypertension | DM | DM | DM |
| Lovastatin | hypertension | NOT | NOT | NOT |
| Simvastatin | hypertension | NOT | NOT | NOT |
| Pravastatin | prostate cancer | NOT | DM | NOT |
| Atorvastatin | type 2 diabetes mellitus | NOT | NOT | NOT |
| Simvastatin | type 2 diabetes mellitus | NOT | NOT | NOT |

Also of interest is how the statins were ranked to help in each disease.

Tudor Oprea: See my post below about CADUET and the most likely (Occam's razor) explanation for how atorvastatin got annotated as anti-hypertensive. I remain skeptical that this is the case, speaking from a molecular interactions perspective. The algorithm may be biased by the mixtures (combination products) that feed confounding factors into the system.

Thanks, this is very informative! I agree that treatment of HTN and HLD are two parts of the same battle (reduction of cardiovascular risk), but this in itself would not be a good rationale for saying atorvastatin is indicated for HTN; only that elevated cardiovascular risk may be viewed as an indication for both statins and antihypertensives. If being two parts of the same battle were a valid rationale, then one would, by the same token, say that hypercholesterolemia is an indication for antihypertensives!

As to the literature regarding antihypertensive effects of statins: I'm skeptical that any physician would regard HTN as an off-label indication for a statin. I wonder if there is a way to find out...

Pouya Khankhanian: agree that this is not a rationale for saying atorva is indicated in htn. i present this as a rationale for why we may never know the true answer.

i also wholly agree that physicians do not regard HTN as an off-label indication for a statin. (note that this is not how "DM" was defined).

Daniel Himmelstein: It's a shorthand for disease-modifying indication. In PharmacotherapyDB, the three physicians classified each indication as disease modifying (DM), symptomatic (SYM), or non-indication (NOT).

disease modifying (DM): a drug that therapeutically changes the underlying or downstream biology of the disease

symptomatic (SYM): a drug that treats a significant symptom of the disease

non-indication (NOT): a drug that neither therapeutically changes the underlying or downstream biology nor treats a significant symptom of the disease

Guidelines:

reasonable evidence of efficacy is required to be classified as disease modifying or symptomatic. This includes off-label use.

if no classification accurately describes an indication, the most appropriate (although imperfect) classification should be chosen

Amendment 1: if a drug was previously indicated, but is no longer used due to side effects, or because there are better drugs, it is still considered DM

Amendment 2: it doesn't matter whether it is first line or fifth line, it's still considered DM

Assumptions:

Assumption 1: DM trumps SYM. If a drug is clearly both disease modifying and also treats symptoms, then I will call it disease modifying. This is because most disease modifying drugs also treat symptoms.

Assumption 2: SYM trumps NOT. If a drug is clearly symptomatic treatment, but can actually exacerbate the downstream biology of disease, then I chose SYM. I made this choice because this was the choice I saw most often made by AJG and CSH

I can relate to the challenge of arriving at hard definitions for concepts in biology and medicine that turn out to be complicated and case-dependent!

One thing that comes to mind is that, in medicine, a "symptom" is something a patient experiences. Thus, HTN is not a symptom. Instead, it is a "sign", something the physician may observe. There's the further complexity that essential HTN is probably best regarded as its own disease, whereas secondary HTN (e.g. due to renal artery stenosis), might not be best to regard as its own disease.

Since other people may have similar questions to mine, how about putting your definitions/usage of DM, SYM and NOT in, e.g., the FAQ? Sorry if it's there and I'm missing it.

Back to the details... my off the cuff thought would have been that essentially none of the common HTN drugs are disease modifying because they don't treat the underlying cause. They only compensate for it, so if you stop taking them, the HTN is back the same as ever. So the disease isn't modified. In contrast, an antibiotic truly eliminates the root cause of an infection.

Regarding your comment about a drug not being designated as DM because withdrawal of the drug causes relapse of the disease, one could make the same argument for many other diseases: anti-epileptic drugs do not cure epilepsy, immuno-suppressants do not cure auto-immune disease, and chemo-therapies do not cure most cases of cancer. But this decision (DM vs SYM) is actually academic.

When looking at the input to the algorithm, as you do here, recall that the data feeds in essentially as binary (and I believe SYM was essentially treated as "on" in the main report but we also ran it as "off"; @dhimmel would have to confirm this). So switching all of a disease's agents from SYM to DM would not really change the output of the algorithm. Also recall that there are abundant false negatives in the input data. This level of false negatives was unfortunately quite necessary, but it also proved very important in testing the output of the algorithm.

So, assuming a connection is truly DM but we mis-label it as NOT, then that would add to the already abundant false negative rate in the input data and presumably have little effect on the output. Therefore, I would not be strongly against removing any edge (changing DM to NOT) in the input data in general. And I know that @dhimmel tested his algorithm to be robust to such perturbations in the input.

Moving forward, the algorithm is meant to be automatically update-able in the future. I think it would be cool to crowdsource the input, essentially taking a vote as to whether things should be DM or SYM or NOT.
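A vote like this could be tallied along the following lines. This is only a sketch of one possible rule (plain majority, with ties left unresolved); it is not how PharmacotherapyDB consensus was computed, and the function name is hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among votes, or None if the top spot is tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave for discussion rather than guess
    return ranked[0][0]


# Curator calls in the order CSH, AJG, PK, taken from the statin list above.
print(majority_label(["NOT", "DM", "NOT"]))  # pravastatin / prostate cancer
print(majority_label(["DM", "DM", "DM"]))    # atorvastatin / hypertension
```

With many voters, a stricter threshold than a simple majority might be preferable before an edge is added to the training data.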

Daniel Himmelstein: Great points. Just wanted to clarify that symptomatic treatments were not used as positives to train the model. Only disease-modifying (DM) treatments were. In fact, symptomatic treatments were considered negatives, but excluding them all together wouldn't have made a big difference (since there were 29,044 negatives, of which only 390 were symptomatic treatments).

I'm perhaps overly influenced by the use of "disease modifying" in the context of rheumatoid arthritis (https://en.wikipedia.org/wiki/Disease-modifying_antirheumatic_drug). The specific meaning is that such a drug prevents joint damage, rather than just reducing pain. By analogy, I'd agree that antiepileptics are not disease modifying. Other cases get tougher. I'm impressed in any case by the level of care you guys have put into all of this.

Dumb question: what algorithm? I was viewing this only as a database.

Daniel Himmelstein: @mkgilson Just to clear things up. Three different resources have been discussed for whether a drug treats a disease. First, DrugCentral, a resource created by @TIOprea and @olegursu [1], which is the main topic of this thread. Second, PharmacotherapyDB [2, 3], which is a catalog of physician-curated indications that @pouyakhankhanian helped create for this study (Project Rephetio). Third, the Project Rephetio predictions available at http://het.io/repurpose/ [4, 5]. DrugCentral and PharmacotherapyDB both compiled known indications and both considered atorvastatin as a treatment for hypertension. I opened an issue on the PharmacotherapyDB GitHub repository, so future versions will correct this. The Project Rephetio predictions are the result of an algorithm and are themselves just probabilities of treatment, not definitive evidence of drug efficacy.

Should have responded sooner, but got side-tracked with my own work and was told by @olegursu that he had answered this. So here's my two cents as to why atorvastatin got annotated as a treatment for essential hypertension, an indication "bleeding" over from OMOP (now rebranded as OHDSI) that probably should have been carefully revised. First off, I agree with @mkgilson: atorvastatin has no business treating HTN. It simply does not lower blood pressure. Some preliminary results suggested this to be the case, but systematic analysis did not reproduce this. See http://www.medscape.org/viewarticle/494555; in particular, the study from Östra Sjukhuset / Gothenburg / Sweden shows no difference (though the UCSD Statin Study claims a small effect). I went to STITCH and looked at direct evidence for interacting partners between atorvastatin and proteins (http://stitch.embl.de/cgi/network.pl?taskId=JEpR8IjKFotd) but could not piece together any direct (or even indirect) way for this molecule to lower blood pressure. As an additional qualifier, I did my PhD in molecular physiology and studied catecholamines for 5 years, and am somewhat familiar with mechanisms for lowering blood pressure. Second, and here's where I hypothesize that FirstDataBank annotators (hence OMOP and now DrugCentral) got this wrong: atorvastatin is formulated not only as LIPITOR but also as CADUET. And CADUET contains amlodipine besylate in addition to atorvastatin calcium (https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=909fad96-a941-443a-a39f-4f93607410fb). Error understandable, case closed.

@mkgilson: definitely, I would start with 1-active ingredient drugs only, and build my Indications that way. Then go through 2-APIs and match known indications, and look for synergies (e.g., are there new indications for the combo that do not work when taking the 2 drugs separately). And so forth...

Just one additional point of clarification, with respect to the 1-2 mm Hg blood pressure lowering effect of atorvastatin from the UCSD Statin Study group (http://www.medscape.org/viewarticle/494555). In the first 4 years of medical school, I measured blood pressure (manually) for more than 100 patients, as well as 20 healthy volunteers. Differences of 5 mm Hg are found just by shifting from left hand to right hand; measuring the same person the same time, next day, can give that variation; measurements done by someone else (recall this was done using a stethoscope under the cuff) can give even more variations; and so forth. This is important enough that it warrants its own error table... https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3104931/table/T2/. Since that Statin Study group Medscape reference is an abstract at a conference (i.e., no follow-up peer reviewed paper), we can most likely attribute those differences to experimental error, and conclude that the effect is not there.

I agree with you, Tudor. That's why I characterized the drop as "paltry" :-) (Though, in principle, if one averages over enough data, one could resolve a shift in the mean of 2 mm Hg using data with a 5 mm Hg standard deviation.)
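To put a rough number on "enough data", here is a back-of-the-envelope sketch using the textbook normal-approximation sample size formula for comparing two group means. The 5% significance level and 80% power are conventional defaults assumed here, not values from any of the studies discussed, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of a mean estimated from n independent measurements."""
    return sd / sqrt(n)


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a mean difference delta.

    Two-sided test at level alpha, normal approximation, equal group sizes,
    and the same standard deviation sd in both groups (all assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# A 2 mm Hg shift against a 5 mm Hg standard deviation.
print(n_per_group(delta=2, sd=5))
# Averaging 100 readings shrinks a 5 mm Hg spread to a 0.5 mm Hg standard error.
print(standard_error(sd=5, n=100))
```

So on the order of a hundred subjects per arm would be needed under these assumptions, which treats measurement noise as independent and ignores systematic errors such as the arm-to-arm differences Tudor describes.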