THE TASK

A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another, for example by raising its blood levels and possibly intensifying its side effects, or by lowering its concentration and thereby reducing its effectiveness. The detection of DDIs is an important research area in patient safety, since these interactions can become very dangerous and increase health care costs. Although several databases support health care professionals in detecting DDIs, these databases are rarely complete, since their update periods can reach three years [1]. Drug interactions are frequently reported in clinical pharmacology journals and technical reports, making the medical literature the most effective source for detecting DDIs. The overwhelming amount of information available on DDIs therefore makes their management a critical issue [2].
Information Extraction (IE) can be of great benefit to the pharmaceutical industry: it allows relevant information on DDIs to be identified and extracted, and it reduces the time health care professionals spend reviewing the literature. Moreover, the development of tools for automatically extracting DDIs is essential for improving and updating drug knowledge databases. Most research has focused on biological relationships (genetic and protein-protein interactions (PPI)), mainly because annotated corpora are available in the biological domain, which facilitates the evaluation of approaches. Few approaches have focused on the extraction of DDIs.

In the last decade, Information Extraction techniques have received increasing interest as a suitable solution for extracting and analysing the huge volume of published documents in the biological domain. The BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenges (http://www.biocreative.org/) have played a key role in improving the Information Extraction techniques applied to the biological domain by providing a common benchmark for evaluating them. Recently, the medical and pharmacological domains have also begun to benefit from this technology. However, there is no forum for comparing the various techniques. Just as the BioCreAtIvE challenge evaluation has provided a common framework for evaluating text mining and has driven progress in text mining techniques applied to the biological domain, this task is intended to provide a benchmark forum that will enable researchers to compare the latest advances in these techniques as applied to the extraction of drug-drug interactions. We think this new task will be appealing to groups studying Protein-Protein Interaction (PPI) extraction, since they could adapt their systems to extract drug-drug interactions.

We have created a specific corpus, the DrugDDI corpus, consisting of a collection of biomedical texts annotated with drug-drug interactions (DDI). The main value of the DrugDDI corpus lies in its annotation: all the documents have been marked up with drug-drug interactions by a pharmacist. Although there may be relations between drugs in different sentences, these have not been annotated in the DrugDDI corpus. We provide the corpus in two different formats: (1) a format based on the information provided by the UMLS MetaMap tool (MMTx) and (2) the unified XML format for Protein-Protein Interaction Extraction proposed in [1]. Participants should choose between these two formats according to their preferences, since some systems may have no use for MMTx information.

Each team participating in this task will initially have access only to the training data. Later, the teams will have access to unlabelled test data (that is, the shallow syntactic and semantic information provided by MetaMap will be included, but drug-drug interactions will not be labelled). The teams will submit their systems' predictions for each pair of drugs in the same sentence. The training dataset contains a total of 2809 sentences that contain two or more drugs, although only 1532 contain at least one interaction. A total of 2421 drug-drug interactions have been identified in the training dataset. The test dataset required for the evaluation part of the task will be available on May 30th. When DDIExtraction2011 is over, the labels for the test data will be released to the public. Systems will be ranked according to their F-scores.
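
To make the evaluation setting concrete, the following is a minimal sketch of how candidate drug pairs within a sentence might be enumerated and how an F-score could be computed from a set of predictions. It is not the official scorer. It assumes that pairs are unordered and that the F-score is the balanced F1 computed over interacting pairs; the task description states only that systems are ranked by F-score. The function names, identifiers and toy data are hypothetical.

```python
from itertools import combinations


def candidate_pairs(sentence_id, drug_ids):
    """Return every unordered pair of drugs mentioned in one sentence.

    Assumption: pairs are unordered, so (d1, d2) and (d2, d1) are the same.
    """
    return {(sentence_id, a, b) for a, b in combinations(sorted(drug_ids), 2)}


def f_score(gold, predicted):
    """Return (precision, recall, F1) over sets of interacting pairs.

    Assumption: balanced F1, counted over pairs labelled as interactions.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only (not taken from the DrugDDI corpus).
pairs = candidate_pairs("s1", ["d1", "d2", "d3"])
gold = {("s1", "d1", "d2"), ("s1", "d2", "d3")}
predicted = {("s1", "d1", "d2"), ("s1", "d1", "d3")}

precision, recall, f1 = f_score(gold, predicted)
print(len(pairs), precision, recall, f1)
```

Under these assumptions, a sentence mentioning n drugs yields n(n-1)/2 candidate pairs, each of which a system must classify as an interaction or not.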

Participants are encouraged to submit a paper describing their DDI extraction systems, to be presented in a regular workshop session together with invited speakers. Submitted papers will be reviewed by our program committee.