Abstract

Background

Text-mining can assist biomedical researchers in reducing information overload by
extracting useful knowledge from large collections of text. We developed a novel text-mining
method that analyzes the network structure formed by symbol co-occurrences in order to extend
the capabilities of knowledge extraction. The method was applied to
the task of automatic gene and protein name synonym extraction.

Results

Performance was measured on a test set consisting of about 50,000 abstracts from one
year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a
gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and
21.36% recall) and made efficient use of seed pairs.

Conclusion

The method performs comparably to other methods studied, does not rely on sophisticated
named-entity recognition, and requires little initial seed knowledge.