A global network of biomedical relationships derived from text

Percha, Bethany;
Altman, Russ B.

This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.

PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path, followed by its score for each theme and an indicator of whether the path belongs to that theme's flagship path set (meaning that it was manually reviewed and determined to reflect the theme). The themes themselves are listed below and are described in our paper (reference below).
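
As a rough illustration of how a Part I record might be read, here is a minimal Python sketch. The file layout is an assumption, not something stated above: the sketch supposes a tab-delimited file with a header row, one score column per theme, and a matching indicator column named with a ".ind" suffix whose value is "1" for flagship paths. The function name read_part1 and the theme names T1 and T2 are hypothetical. Check the header of the actual file before relying on any of this.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple

Part1Row = Tuple[str, Dict[str, float], Dict[str, bool]]


def read_part1(handle: TextIO, ind_suffix: str = ".ind") -> Iterator[Part1Row]:
    """Yield (dependency_path, theme_scores, flagship_flags) per record.

    Assumed layout (not confirmed by the description above):
    tab-delimited, header row first, first column is the path,
    and every theme column X has an indicator column X + ind_suffix.
    """
    header = handle.readline().rstrip("\n").split("\t")
    column = {name: i for i, name in enumerate(header)}
    # Theme columns are all non-path columns that are not indicator columns.
    themes = [name for name in header[1:] if not name.endswith(ind_suffix)]

    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        scores = {t: float(fields[column[t]]) for t in themes}
        # Assumption: "1" marks membership in the flagship path set.
        flagship = {t: fields[column[t + ind_suffix]] == "1" for t in themes}
        yield fields[0], scores, flagship


# Usage with a synthetic two-theme file (values invented for illustration).
sample = "path\tT1\tT1.ind\tT2\tT2.ind\nsome|path\t3.0\t1\t0.5\t0\n"
for path, scores, flagship in read_part1(io.StringIO(sample)):
    print(path, scores, flagship)
```

Returning the scores and the flagship indicators as separate dictionaries keyed by theme keeps the two kinds of value apart, so a caller can rank paths by score and filter on manual review independently.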

PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:

PubMed ID

Sentence number (0 = title)

First entity name, formatted

First entity name, location (characters from start of abstract)

Second entity name, formatted

Second entity name, location

First entity name, raw string

Second entity name, raw string

First entity name, database ID(s)

Second entity name, database ID(s)

First entity type (Chemical, Gene, Disease)

Second entity type (Chemical, Gene, Disease)

Dependency path

Sentence, tokenized
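
The record layout above can be sketched as a small Python parser. This is only an illustration under stated assumptions: it supposes that each record is one tab-delimited line with exactly the 14 fields listed above, in that order. The delimiter, the class name Part2Record, the function name parse_part2_line, and all field names are hypothetical. Entity locations and database IDs are kept as raw strings because their exact format is not specified here.

```python
from typing import NamedTuple


class Part2Record(NamedTuple):
    """One Part II record; field order follows the list above."""
    pubmed_id: str
    sentence_number: int      # 0 = title
    entity1_name: str         # formatted
    entity1_location: str     # characters from start of abstract
    entity2_name: str         # formatted
    entity2_location: str
    entity1_raw: str
    entity2_raw: str
    entity1_db_ids: str
    entity2_db_ids: str
    entity1_type: str         # Chemical, Gene, or Disease
    entity2_type: str         # Chemical, Gene, or Disease
    dependency_path: str
    sentence: str             # tokenized


def parse_part2_line(line: str) -> Part2Record:
    """Parse one line, assuming tab-delimited fields (an assumption)."""
    fields = line.rstrip("\n").split("\t")
    expected = len(Part2Record._fields)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    values = list(fields)
    values[1] = int(values[1])  # sentence number is the only numeric field parsed
    return Part2Record(*values)


# Usage with a synthetic line (all values are placeholders, not real data).
example = "\t".join([
    "00000000", "0", "entity_a", "10,18", "entity_b", "30,38",
    "Entity A", "Entity B", "ID:1", "ID:2", "Chemical", "Gene",
    "placeholder|path", "Entity A binds Entity B .",
]) + "\n"
record = parse_part2_line(example)
print(record.entity1_type, record.entity2_type, record.sentence_number)
```

Raising on an unexpected field count, rather than padding or truncating, makes a wrong delimiter assumption show up immediately on the first line.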

The "with-themes.txt" files contain only the dependency paths that have corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.

This release contains the annotated network for the October 19, 2018 version of PubTator. The version discussed in our paper, below, is an older one, from April 30, 2016. If you are interested in that network, it can be found in Version 1 of this repository. We will be releasing updated networks periodically, as the PubTator community continues to release new versions of named entity annotations for Medline roughly every month.