WikiFactMine: Liberating facts for Wikimedia

ContentMine was funded by the Wikimedia Foundation (the organisation which operates
Wikipedia) to run a project called WikiFactMine. WikiFactMine sought to make Wikidata the
central resource for identifying objects in bioscience.

Wikidata is a free and open knowledge base that can be used by machines and humans
alike. It is a store of structured data that is used by Wikipedia and other Wikimedia projects, as
well as by individuals.

Our team extracted facts (words) linked to concepts in bioscience from a variety of sources,
built dictionaries of these words, and then linked them to Wikidata.
We extracted not only the words themselves, but also a small amount of the surrounding text to
provide context. For example, we have made a list of all those plants in Wikidata that yield
cereals.
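
As a rough illustration of the idea, the sketch below matches a small dictionary against a sentence and keeps a few characters of text on either side of each hit. It is not the project's actual code: the function name find_terms, the window parameter, and the item identifiers are all hypothetical placeholders chosen for this example.

```python
import re

# Hypothetical dictionary mapping a term to a Wikidata item identifier.
# The identifiers are placeholders, not real Wikidata IDs.
CEREAL_PLANTS = {
    "Triticum aestivum": "Q_PLACEHOLDER_WHEAT",
    "Oryza sativa": "Q_PLACEHOLDER_RICE",
    "Zea mays": "Q_PLACEHOLDER_MAIZE",
}


def find_terms(text, dictionary, window=30):
    """Return every dictionary hit in text, with surrounding context.

    Each hit records the matched term, its linked item, and up to
    `window` characters of text before and after the match.
    """
    hits = []
    for term, item_id in dictionary.items():
        # Whole-word, case-insensitive match on the literal term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append({
                "term": term,
                "item": item_id,
                "offset": match.start(),
                "pre": text[start:match.start()],
                "exact": match.group(0),
                "post": text[match.end():end],
            })
    # Report hits in the order they appear in the text.
    hits.sort(key=lambda hit: hit["offset"])
    return hits


if __name__ == "__main__":
    sentence = "Yields of Oryza sativa fell sharply, while Zea mays was unaffected."
    for hit in find_terms(sentence, CEREAL_PLANTS, window=10):
        print(hit["item"], repr(hit["pre"]), hit["exact"], repr(hit["post"]))
```

Keeping the text before and after each match lets a reader judge whether a hit is a genuine mention without opening the original paper.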

The tools we developed are extremely powerful; in the cereal example outlined above,
whenever a paper mentions a cereal-producing plant we will be notified, even if we are not
aware that the plant produces cereals. To date, using our Fatemah tool, we have
contributed more than 10 million items to Wikidata from individual scientific papers!