Text Mining Improves Chemical-Gene-Disease Curation

NIEHS grantees report that text mining can help rank scientific articles by relevance for inclusion in the Comparative Toxicogenomics Database (CTD). The CTD is a public resource that provides manually curated information on chemical-gene, chemical-disease, and gene-disease interactions drawn from scientific articles.

The researchers used a text-mining approach that assigns each article a document relevancy score, with a higher score indicating that the article is more likely to be relevant to the CTD. They tested this approach on 14,904 articles covering seven heavy metals and found that integrating text mining with their current system of manual curation helped prioritize more relevant articles, increasing productivity by 27 percent and novel data content by 100 percent.
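
The prioritization step can be illustrated with a minimal Python sketch. It assumes each article already carries a numeric relevancy score and simply orders the curation queue from highest to lowest. The `Article` class, the `prioritize` function, and the identifiers and scores shown are hypothetical, chosen for illustration only; the sketch does not reproduce the researchers' scoring method or their data.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str  # hypothetical identifier, not a real record
    score: float     # hypothetical document relevancy score


def prioritize(articles):
    """Return articles ordered from highest to lowest relevancy score."""
    return sorted(articles, key=lambda a: a.score, reverse=True)


# Made-up scores purely for illustration.
queue = [
    Article("article-A", 12.0),
    Article("article-B", 87.5),
    Article("article-C", 40.2),
]

for article in prioritize(queue):
    print(article.article_id, article.score)
```

In a workflow like the one described, curators would read from the top of the ordered queue, so the articles most likely to yield curatable interactions are reviewed first.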