

Shared Task

The proposed SMM4H shared tasks involve NLP challenges on social media mining for health monitoring and surveillance. This requires processing imbalanced, noisy, real-world, and substantially creative language from social media. The proposed systems should be able to deal with the many linguistic variations and semantic complexities in the various ways people express medication-related concepts and outcomes. Past research has shown that automated systems frequently underperform when exposed to social media text because of novel/creative phrases, misspellings, and the frequent use of idiomatic, ambiguous, and sarcastic expressions. The tasks will thus act as a discovery and verification process of what approaches work best for social media data.

As in the first three runs of the shared tasks, the data will consist of annotated collections of Twitter posts. The training data is already prepared and will be made available to teams that register to participate. This year, we will standardize the competition platform, using CodaLab competitions.

Task 1: Automatic classification of adverse effect mentions in tweets

The system designed for this sub-task should be able to distinguish tweets reporting an adverse effect (AE) from those that do not, taking into account subtle linguistic variations between adverse effects and indications (the reason for using the medication). This is a rerun of the popular classification task organized in 2016, 2017, and 2018.

For each tweet, the publicly available data set contains: (i) the user ID, (ii) the tweet ID, and (iii) the binary annotation indicating the presence or absence of ADRs. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set.
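Assuming the annotations are released as tab-separated rows, one per tweet (the column order and delimiter here are assumptions, not confirmed by the organizers), a minimal loader might look like the sketch below. Note that the release contains only IDs and labels, so the tweet text must be hydrated separately via the Twitter API.

```python
import csv
import io

# Hypothetical sample in the assumed tab-separated format:
# user_id <TAB> tweet_id <TAB> label (1 = reports an AE, 0 = does not)
sample = """\
12345\t9876543210\t1
67890\t9876543211\t0
"""

def load_annotations(handle):
    """Parse (user_id, tweet_id, label) rows from a file-like object.
    Tweet text is not included and must be fetched using the tweet IDs."""
    reader = csv.reader(handle, delimiter="\t")
    return [{"user_id": u, "tweet_id": t, "label": int(y)}
            for u, t, y in reader]

rows = load_annotations(io.StringIO(sample))
print(rows[0]["label"])  # 1
```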

Task 2: Extraction of Adverse Effect mentions

As a follow-up step to Task 1, this task involves identifying the text span of the reported AEs and distinguishing AEs from similar non-AE expressions. AEs are multi-token, descriptive expressions, so this sub-task requires advanced named entity recognition approaches. The data for this sub-task includes 2000+ tweets that are fully annotated for mentions of AEs and indications. This set contains a subset of the tweets from Task 1 tagged as hasADR, plus a random set of 800 nonADR tweets. The nonADR subset was annotated for mentions of indications, in order to allow participants to develop techniques to deal with this confusion class.

For each tweet, the publicly available data set contains: (i) the tweet ID, (ii) the user ID, (iii) the start and end of the span, and (iv) the annotation indicating an ADR, an Indication, or a Drug. The evaluation data will contain the same information, but without the classes.
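Character-offset span annotations like these are commonly converted to token-level BIO tags before training an NER model. A minimal sketch of that conversion, using whitespace tokenization and an invented example sentence (the offsets and label names are illustrative, not taken from the actual data):

```python
def char_spans_to_bio(text, spans):
    """Convert character-offset annotations (start, end, type) into
    token-level BIO tags, using whitespace tokenization for simplicity."""
    pairs, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)   # locate token in the original string
        end = start + len(tok)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # B- for the first token of a span, I- for the rest
                tag = ("B-" if start == s else "I-") + label
                break
        pairs.append((tok, tag))
    return pairs

# Invented example: "severe headaches" annotated as an ADR span
pairs = char_spans_to_bio("this drug gave me severe headaches",
                          [(18, 34, "ADR")])
```

Real systems would use a proper tweet tokenizer, but the offset-to-tag logic stays the same.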

Task 3: Normalization of adverse drug reaction (ADR) mentions

This is a mapping task in which systems must map colloquial mentions of adverse reactions to standard concept IDs in the MedDRA vocabulary (preferred terms). It requires a concept normalization system that receives ADR mentions, understands their semantic interpretations, and maps them to standard concept IDs. As we saw in the first and second SMM4H shared tasks, this task is more challenging and is likely to require a semi-supervised approach to address successfully. About 9000 annotated mappings will be made available for training and 5000 for evaluation.

For each ADR mention, the publicly available data set contains: (i) an internal ID, (ii) the mention of the ADR, and (iii) the concept ID in the MedDRA vocabulary. The evaluation data will contain the same information, but without the concept IDs.
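A simple baseline for this kind of normalization is a lexicon lookup with a string-similarity fallback for near-miss mentions. The mini-lexicon below is a hypothetical stand-in for the MedDRA preferred-term vocabulary (access to the real vocabulary requires a MedDRA license), and this sketch is a baseline, not the semi-supervised approach the task likely needs:

```python
import difflib

# Hypothetical mini-lexicon mapping preferred terms to MedDRA concept IDs.
lexicon = {
    "headache": "10019211",
    "nausea": "10028813",
    "insomnia": "10022437",
}

def normalize(mention, lexicon, cutoff=0.6):
    """Map a colloquial ADR mention to a concept ID: exact match first,
    then the closest preferred term by string similarity, else None."""
    key = mention.lower().strip()
    if key in lexicon:
        return lexicon[key]
    close = difflib.get_close_matches(key, lexicon.keys(), n=1, cutoff=cutoff)
    return lexicon[close[0]] if close else None

print(normalize("headaches", lexicon))  # near-miss resolves to "10019211"
```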

Task 4: Generalizable identification of first-person health mentions

This binary classification task identifies whether a tweet contains a first-person mention of a health concern or condition [1], for example distinguishing whether someone personally has an illness or is merely conversing or sharing information about an illness. The goal is to build classification models that generalize across different health issues, which would reduce the burden of creating illness-specific models and datasets. Toward this end, this task will provide at least three Twitter datasets in different health domains: detecting if someone has the flu [2], detecting if someone was vaccinated [3], and detecting if someone is changing their travel plans to avoid disease [4]. Each dataset will contain approximately 1,000 labeled tweets. Two datasets will be provided for training, while the third will be held out for testing, in order to evaluate whether the trained models generalize to a completely different health application.
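The leave-one-domain-out protocol described above can be sketched as follows. The toy data and the trivial first-person-cue "model" are invented purely to illustrate the train/held-out split, not to suggest a competitive approach:

```python
# Invented toy data: three health domains, each with (text, label) pairs,
# where label 1 means a first-person health mention.
domains = {
    "flu":     [("I think I caught the flu", 1), ("flu season is coming", 0)],
    "vaccine": [("just got my flu shot", 1), ("vaccines are in the news", 0)],
    "travel":  [("I cancelled my trip over zika", 1), ("zika found in brazil", 0)],
}

def train(examples):
    """'Train' by collecting first-person cue words from positive examples."""
    cues = set()
    for text, label in examples:
        if label == 1:
            cues |= {w for w in text.lower().split() if w in {"i", "my", "me"}}
    return cues

def predict(cues, text):
    return int(any(w in cues for w in text.lower().split()))

# Hold out each domain in turn and train on the other two.
for held_out in domains:
    train_data = [ex for d, exs in domains.items() if d != held_out
                  for ex in exs]
    cues = train(train_data)
    acc = sum(predict(cues, t) == y
              for t, y in domains[held_out]) / len(domains[held_out])
    print(held_out, acc)
```

The point of the protocol is that the held-out domain contributes nothing to training, so the reported score measures cross-domain generalization rather than in-domain fit.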