Informatics Seminar Recordings

Mining the literature for genes associated with placenta-mediated maternal diseases

Automated literature analysis could significantly speed up understanding of the role of the placenta and the impact of its development and functions on the health of the mother and the child. To facilitate automatic extraction of information about placenta-mediated disorders from the literature, we manually annotated genes and proteins, the associated diseases, and the functions and processes involved in the development and function of the placenta in a collection of PubMed/MEDLINE abstracts. We developed three baseline approaches to finding sentences containing this information: one based on supervised machine learning (ML) and two based on distant supervision, using (1) automated detection of named entities and (2) MeSH. We compare the performance of several well-known supervised ML algorithms and identify two approaches, Support Vector Machines (SVM) and Generalized Linear Models (GLM), that yield up to 98% recall, precision, and F1 score. We demonstrate that distant supervision approaches could be used at the expense of missing up to 15% of relevant documents.
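The recall, precision, and F1 figures quoted above can be illustrated with a minimal sketch. The sentence IDs and counts below are hypothetical toy data, not the annotated corpus, and the function name is an assumption made for illustration. The example shows that a system which misses 15% of the relevant items has a recall of 0.85.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of item IDs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: 20 relevant sentences; the system retrieves
# 17 of them and 3 irrelevant ones, so it misses 15% of the relevant set.
gold_ids = set(range(20))
predicted_ids = set(range(17)) | {100, 101, 102}

precision, recall, f1 = precision_recall_f1(gold_ids, predicted_ids)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```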

Public Health Information Credibility in the Era of “Fake News”

Helping the Public Evaluate Health Information Credibility in the Era of “Fake Health News”

With the emergence of new Web media platforms and the ubiquity of social media, critical evaluation of online health information has taken on a new dimension and urgency. At the same time, many established information quality evaluation guidelines address information characteristics other than the content (e.g., authority, currency) and do not address information presented via novel Web technologies. This talk will describe a research program that develops a methodological approach for analyzing diverse online health information sources. It will also present a window into the universe of non-evidence-based online health information, particularly as it pertains to the possibility of curing type 2 diabetes. The presentation will use the above evaluation criteria to describe how these sites portray the complexity of type 2 diabetes, characterize the healthcare establishment, use language and emotional cues, discuss medical research, and convey certainty. It will also address the potential role of technology in supporting users in the changing digital health ecosystem.

As part of the BioCreative VI Track IV, we built a supervised relation extraction model capable of taking a test article and returning a list of interacting protein pairs identified by their Entrez Gene IDs. Such pairs represent proteins participating in a binary protein-protein interaction (PPI) relation where the interaction is additionally affected by a genetic mutation (PPIm). In this study, we explored PPIm relation extraction by deploying a three-component pipeline involving deep learning-based named entity recognition and relation classification models along with a knowledge-based approach for gene normalization. We propose several recall-focused improvements to our original challenge entry, which placed 2nd in the competition. On exact matching, the new system achieved test results of 37.78% micro-F1 with a precision of 38.22% and recall of 37.34%, which corresponds to an improvement of approximately 3 micro-F1 points. When matching on HomoloGene IDs, we report similarly competitive test results at 46.17% micro-F1 with a precision and recall of 46.67% and 45.59%, corresponding to an improvement of more than 8 micro-F1 points over the prior best result.
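Micro-averaged scoring over predicted gene-ID pairs can be sketched as follows. This is an illustration under stated assumptions, not the official BioCreative scorer: pairs are treated as unordered, counts are pooled across articles before computing the metrics, and the article IDs and gene IDs are made-up placeholders.

```python
def micro_prf(gold_by_doc, pred_by_doc):
    """Micro-averaged precision, recall, and F1 over unordered ID pairs."""
    tp = fp = fn = 0
    for doc_id in set(gold_by_doc) | set(pred_by_doc):
        # frozenset makes (a, b) and (b, a) count as the same pair.
        gold = {frozenset(pair) for pair in gold_by_doc.get(doc_id, [])}
        pred = {frozenset(pair) for pair in pred_by_doc.get(doc_id, [])}
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted pairs for two articles.
gold_pairs = {"article_A": [(1, 2), (3, 4)], "article_B": [(7, 8)]}
pred_pairs = {"article_A": [(2, 1), (5, 6)], "article_B": [(7, 8)]}

micro_p, micro_r, micro_f1 = micro_prf(gold_pairs, pred_pairs)
print(f"micro-P={micro_p:.4f} micro-R={micro_r:.4f} micro-F1={micro_f1:.4f}")
```

Matching on HomoloGene IDs instead of Entrez Gene IDs would, under the same assumptions, amount to mapping every ID to its homology group before building the pairs.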

Characterization of Critically Ill Patients

Characterization of Critically Ill Patients: A Clinical Application of the Health Facts Data Set

Using the Health Facts EMR data, critically ill pediatric patients who had at least one admission to the Intensive Care Unit (ICU) were characterized in terms of the number of hours they received drugs usually administered to patients who are intubated or on mechanical ventilation. The study analyzed vectors containing the number of hours each combination of medicines was administered to each patient during different periods of ICU admission and floor admission. The analysis used a class of Bayesian regression models with a Dirichlet-Multinomial distribution for the response and random effects to capture the inherent variability of each encounter and hospital, adjusting for demographic information. During this seminar, we will describe the process of cohort and records selection, the model and the interpretation of the parameters, and the results of the characterization. We will also explain how the model can be used for treatment comparisons for similar patients in different hospitals.
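For readers unfamiliar with the Dirichlet-Multinomial distribution, the sketch below evaluates its log-probability for a single vector of counts. It covers only the response distribution; the regression structure, random effects, and priors of the study are not shown, and the hours and concentration parameters are hypothetical values chosen for illustration.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of integer counts under a Dirichlet-Multinomial."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    total_count = sum(counts)
    total_alpha = sum(alpha)
    # log of n! * Gamma(A) / Gamma(n + A)
    log_p = lgamma(total_count + 1) + lgamma(total_alpha) - lgamma(total_count + total_alpha)
    # log of the product over categories of Gamma(x + a) / (x! * Gamma(a))
    for x, a in zip(counts, alpha):
        log_p += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return log_p


# Hypothetical example: hours on two drug combinations during one period.
hours = [1, 2]
alpha = [1.0, 1.0]  # assumed concentration parameters
prob = exp(dirichlet_multinomial_logpmf(hours, alpha))
print(f"P(hours={hours}) = {prob:.4f}")
```

With both concentration parameters equal to 1 and two categories, every split of the total hours is equally likely, which gives a simple sanity check on the formula.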

Clinical Outcome Prediction through Deep Learning

Accurately predicting clinical outcomes in advance can benefit both healthcare providers and patients, though it remains a challenging task. Artificial Intelligence (AI) has recently generated much excitement due to the breakthroughs made by Deep Neural Networks (DNNs) on many tasks such as image classification and speech recognition. Inspired by those developments, there has been great interest in applying DNNs to the biomedical domain. In this seminar, I will present a DNN-based predictive modeling approach applied to two clinical use cases. DNN models are usually considered black boxes, which can hinder their acceptance by clinicians. Therefore, we also developed a novel method for explaining the predictions of our DNN models.

The FABRIC environment

The FABRIC environment: Architectural Features and Big Data Analytics

The Flexible Architecture for Building Research Informatics Collaborations (FABRIC) is an informatics platform (in development) that offers a service-oriented research toolbox that investigators, clinicians, and patient advocates can use to easily access a wide array of data repositories integrated with customizable query tools. This cloud environment supports the rapid formation of dedicated cross-domain research teams, the sharing of raw and de-identified datasets in secured enclaves, and access to a suite of advanced analytics tools commonly used throughout clinical translational research (CTR). This seminar will focus on some of the architectural features of FABRIC, its connection to the Colonial One HPC cluster, and the integration of the Health Facts database.