PractiKPharma is a collaborative research project funded by the French National Research Agency (ANR). It studies computer science approaches to extract, compare, and validate state-of-the-art knowledge in the biomedical domain of pharmacogenomics (PGx).

We focus on the description of phenotypes related to drug responses in terms of signs and symptoms, e.g., “anemias” or “low RBC”.

A challenge for the extraction of PGx knowledge is to associate each extracted PGx relationship with a validation level that reflects how hypothetical it is. To evaluate knowledge extraction quality, we use the content of the Pharmacogenomics Knowledgebase (PharmGKB) developed at Stanford.

2. Knowledge Extraction from Electronic Health Records (EHRs)

We process Electronic Health Record data from the HEGP (Hôpital Européen Georges Pompidou) and annotate their content with biomedical ontologies. Because these EHRs are in French, we reuse and adapt the SIFR Annotator to annotate French clinical narratives with French ontologies.
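The SIFR Annotator is a full web service; the minimal sketch below only illustrates the underlying idea of dictionary-based concept recognition in French text. It assumes a hypothetical in-memory lexicon mapping French ontology labels to placeholder concept codes (`EX:...`), and the `annotate` function is illustrative, not the SIFR Annotator's API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical French lexicon: ontology label -> concept identifier.
# The EX:... codes are placeholders, not real ontology identifiers.
LEXICON: Dict[str, str] = {
    "anémie": "EX:0001",
    "insuffisance rénale": "EX:0002",
    "neutropénie": "EX:0003",
}

Annotation = Tuple[int, int, str, str]  # (start, end, matched text, concept id)


def annotate(text: str, lexicon: Dict[str, str]) -> List[Annotation]:
    """Find case-insensitive, whole-word occurrences of lexicon labels in text."""
    annotations: List[Annotation] = []
    # Longer labels first, so a multi-word term wins over any shorter label inside it.
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(m.start() < e and s < m.end() for s, e, _, _ in annotations)
            if not overlaps:
                annotations.append((m.start(), m.end(), m.group(0), lexicon[label]))
    return sorted(annotations)


note = "Patiente de 65 ans présentant une Anémie et une insuffisance rénale."
for start, end, surface, concept in annotate(note, LEXICON):
    print(start, end, surface, concept)
```

Matching is case-insensitive so that "Anémie" at the start of a sentence still maps to the lexicon label; a production annotator would also need lemmatization, accent variants, and abbreviation handling.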

3. Comparison of state-of-the-art and observational knowledge

Knowledge previously extracted from the literature and from clinical notes differs in nature and comes from multilingual data. We develop a framework for comparing these knowledge units and detecting concurring, ambiguous, and conflicting ones. We study Formal Concept Analysis (FCA) to establish correspondences between knowledge representations and data sources. Accordingly, we map the ontologies natively used in both worlds (literature and EHRs), given that they can be either in English or in French.
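Formal Concept Analysis groups objects that share the same attributes into formal concepts, i.e., pairs (extent, intent) that are closed under the two derivation operators. As a minimal illustration, assuming hypothetical knowledge units described by the drug, gene, phenotype, and source they involve (placeholders `D1`, `G1`, `P1`, `P2`), the sketch below enumerates all formal concepts of a tiny context by brute force. It is not the project's comparison framework.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)

# Hypothetical formal context: knowledge unit (object) -> attributes.
# D1, G1, P1, P2 are placeholders, not real entities.
CONTEXT: Dict[str, Set[str]] = {
    "lit_1": {"drug:D1", "gene:G1", "pheno:P1", "src:literature"},
    "ehr_1": {"drug:D1", "gene:G1", "pheno:P1", "src:EHR"},
    "ehr_2": {"drug:D1", "pheno:P2", "src:EHR"},
}


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Derivation A': attributes shared by all given objects (all attributes if none)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_having(attributes: FrozenSet[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Derivation B': objects that carry every attribute in B."""
    return frozenset(obj for obj, attrs in context.items() if attributes <= attrs)


def formal_concepts(context: Dict[str, Set[str]]) -> Set[Concept]:
    """Brute force: every concept is (A'', A') for some subset A of objects."""
    concepts: Set[Concept] = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            concepts.add((objects_having(intent, context), intent))
    return concepts


for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this toy context, the concept whose extent is {lit_1, ehr_1} groups a literature unit and an EHR unit that agree on drug, gene, and phenotype, a candidate concurring pair, whereas ehr_2 shares only the drug with them. Enumerating every object subset is exponential, so realistic contexts call for a dedicated algorithm such as NextClosure.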

4. Explaining PGx knowledge using omics data

We consider several omics databases (e.g., DrugBank, UniProt, KEGG) to investigate molecular mechanisms that may explain adverse reactions to pharmacogenomic drugs. Some of these databases are available as Linked Open Data, which facilitates their reuse; others are not. Once these databases are connected, we will evaluate novel data analysis methods to help explain how PGx genes impact drug response.
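Connecting such databases largely amounts to joining records on shared identifiers. The sketch below assumes two hypothetical in-memory extracts with placeholder identifiers: drug-to-target-protein links (the kind of data DrugBank provides) and protein-to-pathway links (the kind KEGG provides). It derives drug-to-pathway paths that could serve as candidate mechanistic explanations; it is an illustration, not one of the analysis methods the project evaluates.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical extracts with placeholder identifiers (not real records):
# drug -> target proteins (DrugBank-like), protein -> pathways (KEGG-like).
DRUG_TARGETS: Dict[str, Set[str]] = {
    "drug:D1": {"prot:P1", "prot:P2"},
    "drug:D2": {"prot:P3"},
}
PROTEIN_PATHWAYS: Dict[str, Set[str]] = {
    "prot:P1": {"pathway:W1"},
    "prot:P2": {"pathway:W1", "pathway:W2"},
}


def drug_pathway_links(
    drug_targets: Dict[str, Set[str]],
    protein_pathways: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Join the two sources on protein identifiers.

    Returns drug -> pathway -> proteins supporting the link, so each
    candidate explanation keeps track of its intermediate evidence.
    """
    links: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for drug, proteins in drug_targets.items():
        for protein in proteins:
            for pathway in protein_pathways.get(protein, set()):
                links[drug][pathway].add(protein)
    # Convert the nested defaultdicts into plain dicts for a clean result.
    return {drug: dict(pathways) for drug, pathways in links.items()}


print(drug_pathway_links(DRUG_TARGETS, PROTEIN_PATHWAYS))
```

Drugs whose targets have no known pathway (here `drug:D2`) simply produce no link, which makes gaps between sources visible.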

We have published in international journals (Journal of Biomedical Semantics, GigaScience, Scientific Reports), international conferences (EKAW, ISMIS), and international peer-reviewed workshops (LOUHI, NETTAB, Bio-Ontologies, etc.), as well as a popular science article in ERCIM News magazine.

Publications associated with the project are referenced on the HAL platform.

Pharmacogenomic Linked Open Data (PGxLOD)

Linking PGx data

We started with six data sources that focus on drugs (DrugBank), genes (ClinVar, DisGeNET), and drug responses (SIDER, MediSpan), and added pharmacogenomic knowledge units extracted from PharmGKB and the literature. These data sources were mapped and transformed into a single RDF graph available at https://pgxlod.loria.fr. PGxLOD is intended to host pharmacogenomic-related data and knowledge units of various provenance.
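A key step in building a single graph is that records from different sources end up using the same IRI for the same entity, so that merging graphs (without blank nodes) reduces to a set union of triples. The minimal sketch below assumes the identifiers have already been aligned across sources, which is the hard part in practice, and uses a hypothetical `example.org` namespace and predicates rather than PGxLOD's actual vocabulary. It serializes the merged result as N-Triples.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and predicates, not PGxLOD's actual vocabulary.
EX = "http://example.org/"


def iri(local: str) -> str:
    """Wrap a local name into an absolute IRI in N-Triples <...> notation."""
    return f"<{EX}{local}>"


def map_rows(rows: Iterable[Tuple[str, str]], subject_type: str,
             predicate: str, object_type: str) -> Set[Triple]:
    """Map (subject_id, object_id) rows of one source to RDF triples.

    Identifiers are assumed to be aligned already, so the same drug gets
    the same IRI whichever source mentions it.
    """
    return {
        (iri(f"{subject_type}/{s}"), iri(predicate), iri(f"{object_type}/{o}"))
        for s, o in rows
    }


def to_ntriples(triples: Set[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


# Two hypothetical sources that both mention drug D1.
drug_targets = map_rows([("D1", "G1")], "drug", "targets", "gene")
drug_effects = map_rows([("D1", "E1"), ("D2", "E1")], "drug", "causes", "effect")

# Merging is a set union of triples; the shared IRI for D1 links both sources.
graph = drug_targets | drug_effects
print(to_ntriples(graph))
```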

Pharmacogenomic Ontology (PGxO)

A simple representation of PGx relationships and their provenance

To structure pharmacogenomic knowledge elements of diverse origins within PGxLOD and allow them to coexist, we propose a minimal ontology called PGxO. This simple schema makes it possible both to represent pharmacogenomic knowledge units and to document their provenance.
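To illustrate how a minimal schema lets knowledge units of different origins coexist, the sketch below reifies each relationship as its own node linked to its drug(s), gene(s), and phenotype(s), and attaches provenance to that node. The class, property, and namespace names are hypothetical placeholders, not PGxO's actual terms; only `rdf:type` and the W3C PROV-O property `prov:wasDerivedFrom` are real vocabulary, and using PROV-O here is our assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"  # W3C PROV-O
EX = "http://example.org/pgx#"  # hypothetical namespace, not PGxO's actual IRIs

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class PGxRelationship:
    """One knowledge unit: drug(s), gene(s), phenotype(s), plus its origin."""
    rel_id: str
    drugs: Tuple[str, ...]
    genes: Tuple[str, ...]
    phenotypes: Tuple[str, ...]
    source: str  # e.g. an article, a PharmGKB annotation, or an EHR study

    def to_triples(self) -> List[Triple]:
        node = EX + self.rel_id
        triples: List[Triple] = [(node, RDF_TYPE, EX + "Relationship")]
        for prop, values in (("drug", self.drugs), ("gene", self.genes),
                             ("phenotype", self.phenotypes)):
            triples.extend((node, EX + prop, EX + v) for v in values)
        # Provenance is attached to the relationship node itself.
        triples.append((node, PROV_WAS_DERIVED_FROM, EX + self.source))
        return triples


# The same drug-gene-phenotype claim from two origins coexists as two nodes,
# each carrying its own provenance.
from_literature = PGxRelationship("rel_1", ("D1",), ("G1",), ("P1",), "article_1")
from_ehr = PGxRelationship("rel_2", ("D1",), ("G1",), ("P1",), "ehr_study_1")
for triple in from_literature.to_triples() + from_ehr.to_triples():
    print(triple)
```

Reifying the relationship, rather than linking drug and gene directly, is what allows two sources to state the same claim without their provenance being mixed up.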

Pharmacogenomic Corpus (PGxCorpus)

Deep Learning for knowledge extraction

We aim to enable deep learning methods to extract PGx relationships from text. Because these approaches require large training sets, we are actively building a manually annotated corpus called PGxCorpus. Meanwhile, we are investigating transfer learning approaches to enable PGx relationship extraction.

SIFR Annotator

Semantic annotation of clinical notes

We have adapted the French Annotator developed within the SIFR project to annotate French clinical text. We added functionalities such as annotation scoring, context detection (negation, experiencer, temporality), new output formats, and coarse-grained concept recognition.
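The sketch below illustrates a NegEx-style rule for the negation part of context detection: a concept is flagged as negated when a trigger term appears shortly before it in the same sentence. It is a simplified stand-in, not the SIFR Annotator's code; the French cue list, the five-word window, and the use of "." and ";" as sentence delimiters are all assumptions.

```python
from typing import List

# Hypothetical, deliberately small list of French negation triggers.
NEGATION_CUES: List[str] = ["pas de", "pas d'", "absence de", "sans", "aucun", "aucune"]


def is_negated(text: str, start: int, window: int = 5) -> bool:
    """NegEx-style rule: is the concept beginning at `start` preceded, within the
    same sentence and at most `window` words back, by a negation cue?"""
    # Look back only to the last sentence or clause delimiter before the concept.
    sentence_start = max(text.rfind(".", 0, start), text.rfind(";", 0, start)) + 1
    preceding_words = text[sentence_start:start].lower().split()[-window:]
    # Pad with spaces so cues only match whole words.
    preceding = " " + " ".join(preceding_words) + " "
    return any(f" {cue} " in preceding for cue in NEGATION_CUES)


note = "Pas de neutropénie. Anémie sévère sans fièvre."
for term in ("neutropénie", "Anémie", "fièvre"):
    print(term, is_negated(note, note.index(term)))
```

Full NegEx/ConText-style rule sets also handle pseudo-triggers and scope terminators (e.g., a clause introduced by "mais"), and apply similar trigger lists for the experiencer and temporality dimensions; this sketch ignores both.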

Clinical Data Bioinformatics Workflow

Facilitating clinical record processing

We developed a bioinformatics pipeline using Nextflow to facilitate the processing and population of HEGP Electronic Health Records. The resulting workflow supports the design of reproducible studies, together with their associated metadata and parameters. Molecular biologists use it to annotate tumor variants.