Extracting side effects from SIDER 4

SIDER is a project to extract side effects from drug labels [1], originally motivated by off-target prediction [2]. We evaluated version 2 and produced an online tutorial. We found that side effect similarity was a weak predictor of chemical and indication similarity.

Version 4 was released just two days ago. Here, we detail our extraction of side effects from SIDER 4.

Initial processing complete

We added the side effects extracted from meddra_all_se.tsv.gz to our network. Overall, the resource contributed 139,235 compound-side effect relationships for 5,745 side effects.
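For readers who want to tally relationships themselves, here is a minimal sketch of one way to count distinct compound-side effect pairs from a tab-separated file such as meddra_all_se.tsv.gz. The column order in COLUMNS, the restriction to preferred terms ("PT"), and the function names are my assumptions for illustration, not a description of the processing we actually ran, so check them against the SIDER README before relying on the counts. The toy rows are invented.

```python
import csv
import gzip
import io

# Assumed column order (hypothetical; verify against the SIDER README).
COLUMNS = [
    "stitch_flat", "stitch_stereo", "umls_label",
    "meddra_type", "umls_meddra", "side_effect_name",
]

def read_rows(handle):
    """Yield one dict per tab-separated line, keyed by the assumed COLUMNS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        yield dict(zip(COLUMNS, fields))

def summarize(rows, term_type="PT"):
    """Return (distinct compound-side effect pairs, distinct side effects)."""
    pairs = set()
    for row in rows:
        # Keep a single MedDRA term type so each concept is counted once.
        if row["meddra_type"] != term_type:
            continue
        pairs.add((row["stitch_flat"], row["umls_meddra"]))
    side_effects = {side_effect for _, side_effect in pairs}
    return len(pairs), len(side_effects)

def summarize_file(path, term_type="PT"):
    """Apply summarize() to a gzipped TSV file on disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return summarize(read_rows(handle), term_type)

# Invented toy rows, including a duplicate and a non-PT row.
sample = (
    "CID1\tCID1s\tC001\tLLT\tC001\tNausea\n"
    "CID1\tCID1s\tC001\tPT\tC001\tNausea\n"
    "CID1\tCID1s\tC002\tPT\tC002\tHeadache\n"
    "CID2\tCID2s\tC001\tPT\tC001\tNausea\n"
    "CID2\tCID2s\tC001\tPT\tC001\tNausea\n"
)
n_pairs, n_side_effects = summarize(read_rows(io.StringIO(sample)))
print(n_pairs, n_side_effects)
```

On the real download, summarize_file("meddra_all_se.tsv.gz") would be the entry point. Whether it reproduces the totals above depends on the deduplication and term-type choices, which this sketch only guesses at.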

Data quality

Compared to version 2, I subjectively noticed a considerable quality improvement. However, many of the problems inherent to label-based NLP extraction remain. I think there are two potential methods for extracting higher-confidence side effects:

Number of labels approach: Most drugs have multiple labels. Side effects reported by more labels may be of higher quality. Amphetamine is a good example.

Frequency approach: Some side effects have associated frequency information. Placebo comparisons are also sometimes present. Thus, filtering on enrichment in frequency relative to placebo, other drugs, or a fixed cutoff is feasible. Ibuprofen is a good example.
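The two approaches above could be sketched roughly as follows. This is only an illustration under assumptions: the record layout (compound, label, side effect), the function names, and the ratio cutoff of 2.0 are all hypothetical, and the toy values are invented rather than taken from SIDER.

```python
from collections import defaultdict

def label_support(records):
    """Map (compound, side_effect) to the fraction of that compound's labels reporting it.

    records: iterable of (compound, label_id, side_effect) tuples (assumed layout).
    """
    labels_by_compound = defaultdict(set)
    labels_by_pair = defaultdict(set)
    for compound, label_id, side_effect in records:
        labels_by_compound[compound].add(label_id)
        labels_by_pair[(compound, side_effect)].add(label_id)
    return {
        pair: len(labels) / len(labels_by_compound[pair[0]])
        for pair, labels in labels_by_pair.items()
    }

def placebo_enriched(drug_freq, placebo_freq, min_ratio=2.0):
    """Return True when the drug frequency is at least min_ratio times the placebo frequency."""
    if placebo_freq == 0:
        # No placebo events: any nonzero drug frequency counts as enriched.
        return drug_freq > 0
    return drug_freq / placebo_freq >= min_ratio

# Invented toy data: drug_a has three labels; se_x is on all three, se_y on one.
records = [
    ("drug_a", "label_1", "se_x"),
    ("drug_a", "label_2", "se_x"),
    ("drug_a", "label_3", "se_x"),
    ("drug_a", "label_1", "se_y"),
]
support = label_support(records)
high_confidence = {pair for pair, fraction in support.items() if fraction >= 0.5}
print(sorted(high_confidence))
```

Using a fraction of labels rather than a raw count keeps drugs with many labels from dominating. The threshold of 0.5 here is arbitrary and would need tuning against real data.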

The current data release may be insufficient to apply these methods, and more documentation is needed. Judging from the webapp, however, the underlying database would support both methods.