
This project provides a neural-network-based multi-task learning framework for biomedical named entity recognition (BioNER).

The implementation is based on the PyTorch library. Our model is trained jointly on corpora annotated with different biomedical entity types, building a unified model that benefits the training of each individual entity type and achieves significantly better performance than state-of-the-art BioNER systems.
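To illustrate the idea of sharing one encoder across entity types, here is a minimal PyTorch sketch. The layer sizes, names, and the simple linear output heads are assumptions for illustration only; the actual model also uses character-level layers and CRF decoding.

```python
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    # Hypothetical sketch of hard parameter sharing: one shared word-level
    # BiLSTM encoder with one output head per entity type / dataset.
    # Sizes and structure are illustrative, not the paper's exact model.
    def __init__(self, vocab_size, num_tags_per_task, emb_dim=200, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)            # shared across tasks
        self.heads = nn.ModuleList([nn.Linear(2 * hidden, n)  # task-specific
                                    for n in num_tags_per_task])

    def forward(self, token_ids, task_id):
        h, _ = self.encoder(self.embed(token_ids))
        return self.heads[task_id](h)  # emission scores for task `task_id`
```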


## Installation

For training, a GPU is strongly recommended for speed. CPU is supported, but training can be extremely slow.

### PyTorch

The code is based on PyTorch. You can find installation instructions on the official PyTorch website.

### Dependencies

The code is written in Python 3.6. Its dependencies are listed in `requirements.txt` and can be installed with:

```
pip3 install -r requirements.txt
```

## Quick Start

To reproduce the results in our paper, first download the corpora and the embedding file from here, unzip the archive, and put the resulting folder `data_bioner_5/` under the main folder `./`. Then run the model with the following script:

```
./run_lm-lstm-crf5.sh
```

## Data

We use five biomedical corpora collected by Crichton et al. for biomedical NER. The corpora are publicly available and can be downloaded from here.

## Embedding

We initialize the word embedding matrix with pre-trained word vectors from Pyysalo et al., 2013. These word vectors were trained with the skip-gram model on PubMed abstracts together with all the full-text articles from PubMed Central (PMC) and a Wikipedia dump. You can download the embedding files from here.
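As a minimal sketch of this initialization step (not the project's actual loading code): the word2vec-style text format, the 200-dimensional vector size, and all names below are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def load_embeddings(path, word_to_idx, dim=200):
    # Hypothetical loader: assumes a text file with one
    # "word v1 v2 ... v_dim" entry per line; the real file format and
    # dimensionality are assumptions. Words without a pre-trained vector
    # keep a small random initialization.
    matrix = np.random.uniform(-0.25, 0.25,
                               (len(word_to_idx), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip header or malformed lines
            if parts[0] in word_to_idx:
                matrix[word_to_idx[parts[0]]] = np.asarray(parts[1:],
                                                           dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)
```

The returned `nn.Embedding` can then be fine-tuned during training along with the rest of the network.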

## Usage

`train_wc.py` is the training script for our multi-task LSTM-CRF model; its command-line usage can be accessed as shown below.
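Assuming `train_wc.py` exposes a standard argparse command-line interface, its options can be listed with the usual help flag:

```
python3 train_wc.py -h
```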

Users may incorporate an arbitrary number of corpora into the training process. In each epoch, our model randomly selects one dataset i; it uses training set i to learn the parameters and development set i to evaluate performance. If the current model achieves its best performance so far for dataset i on the development set, we then calculate the precision, recall, and F1 on test set i.
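To make the procedure concrete, here is a minimal sketch of that loop. All names (`multitask_train`, `train_one_epoch`, `evaluate`, and the dataset triples) are hypothetical placeholders for illustration, not the actual API of `train_wc.py`.

```python
import random

def multitask_train(model, datasets, num_epochs, train_one_epoch, evaluate):
    """Sketch of the per-epoch dataset sampling described above.

    `datasets` is a list of (train_set, dev_set, test_set) triples;
    `train_one_epoch(model, data)` and `evaluate(model, data)` (returning
    precision, recall, F1) are hypothetical callbacks, not this repo's API.
    """
    best_dev_f1 = [0.0] * len(datasets)
    for _ in range(num_epochs):
        i = random.randrange(len(datasets))      # randomly select one dataset i
        train_set, dev_set, test_set = datasets[i]
        train_one_epoch(model, train_set)        # learn parameters on training set i
        _, _, dev_f1 = evaluate(model, dev_set)  # evaluate on development set i
        if dev_f1 > best_dev_f1[i]:              # best dev performance so far for i
            best_dev_f1[i] = dev_f1
            p, r, f1 = evaluate(model, test_set) # report on test set i
            print(f"dataset {i}: P={p:.4f} R={r:.4f} F1={f1:.4f}")
    return best_dev_f1
```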

## Benchmarks

Here we compare our model with recent state-of-the-art models on the five datasets mentioned above. We use F1 score as the evaluation metric.
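For reference, here is a minimal sketch of how entity-level precision, recall, and F1 can be computed from predicted and gold entity spans. Exact-match scoring is a common NER convention; the evaluation script actually used in the paper may differ.

```python
def entity_f1(pred_spans, gold_spans):
    """Entity-level precision/recall/F1 under exact-match scoring.

    `pred_spans` and `gold_spans` are sets of (start, end, type) tuples;
    an entity counts as correct only if its boundaries and type both match.
    """
    tp = len(pred_spans & gold_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```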