Semantics in Enterprise Search

The role of semantics

Enterprise search is a complex challenge: extracting accurate results from ever-increasing volumes and variety of data.

But (and it’s a big but), how does a search system know what to look for? Hammond’s syndrome, Anton Vogt syndrome and athetosis all refer to the same condition, yet a computer cannot determine this without some help. Try searching for each of them in a web search engine: do you get the same results? No?

This is where semantics comes in, bringing clarity to synonymous data.

Semantics are at the heart of what we do at SciBite. We offer a collection of over 85 scientific vocabularies covering a diverse range of topics across the biopharmaceutical R&D process (genes, adverse events, pathology, indications, etc.). Each vocabulary contains a list of terms (we call them entities) and their various synonyms, which make search more scientifically aware and ultimately simplify the experience for the end user. Nobody wants a system more complicated than it needs to be, right?
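To make the idea of entities and synonyms concrete, here is a minimal Python sketch of how a vocabulary might map every synonym back to a single entity, so that different surface forms resolve to the same concept. The `Entity` class, the `DISEASE:0001` identifier and the helper functions are hypothetical illustrations, not SciBite's actual data format or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """One vocabulary entry: a preferred label plus its synonyms."""
    entity_id: str  # hypothetical identifier, not a real SciBite ID
    preferred_label: str
    synonyms: list[str] = field(default_factory=list)


def build_synonym_index(vocabulary: list[Entity]) -> dict[str, str]:
    """Map every lower-cased label and synonym to its entity ID."""
    index: dict[str, str] = {}
    for entity in vocabulary:
        for term in [entity.preferred_label, *entity.synonyms]:
            index[term.lower()] = entity.entity_id
    return index


def normalise(term: str, index: dict[str, str]) -> Optional[str]:
    """Return the entity ID for a term, or None if the vocabulary lacks it."""
    return index.get(term.strip().lower())


# A toy vocabulary with a single entity (hypothetical example data).
vocabulary = [
    Entity(
        entity_id="DISEASE:0001",
        preferred_label="Athetosis",
        synonyms=["Hammond's syndrome", "Anton Vogt syndrome"],
    ),
]
index = build_synonym_index(vocabulary)

print(normalise("Hammond's syndrome", index))  # DISEASE:0001
print(normalise("athetosis", index))           # DISEASE:0001
```

Because every synonym points to the same ID, a search can match on the entity rather than on the exact string the user happened to type.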

Keeping up to date

So you’re now sold on the power of semantic enrichment. The next step is to ensure that these vocabularies are both exhaustive and well maintained; after all, what value do they have if they are out of date?

Our semantic library is constantly updated and expanded by a team of scientific experts. Across the collection of vocabularies we count over 20 million synonyms, many times more than what is available in the public domain.

Consider MeSH (Medical Subject Headings), a set of controlled vocabularies used to index life sciences literature, as a reference example. Search for Hammond’s syndrome and MeSH has 14 synonyms; SciBite has twice that, at 28. Abetalipoproteinemia has 7 synonyms in MeSH; SciBite has more than 100. A larger reference library yields more extensive results. Simple.
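Why does the size of a synonym list matter? A minimal sketch of synonym-based query expansion shows the effect: the query is expanded into every known surface form, so a document that uses a form missing from the list is simply never found. The documents, synonym sets and function names below are hypothetical toy data, and the plain substring matching is a deliberate simplification of what a real search engine does.

```python
def expand_query(term: str, synonym_sets: list[set[str]]) -> set[str]:
    """Return every lower-cased surface form in the synonym set containing `term`."""
    key = term.lower()
    for forms in synonym_sets:
        lowered = {f.lower() for f in forms}
        if key in lowered:
            return lowered
    return {key}  # no synonyms known: search for the literal term only


def search(documents: dict[str, str], term: str,
           synonym_sets: list[set[str]]) -> list[str]:
    """Return IDs of documents mentioning any expanded form (naive substring match)."""
    forms = expand_query(term, synonym_sets)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(form in text.lower() for form in forms)
    )


# Hypothetical toy corpus: each document names the condition differently.
documents = {
    "doc1": "Patient presented with athetosis of the hands.",
    "doc2": "A case report of Hammond's syndrome in an adult.",
    "doc3": "Features consistent with Anton Vogt syndrome were noted.",
}

smaller = [{"Hammond's syndrome", "athetosis"}]
larger = [{"Hammond's syndrome", "athetosis", "Anton Vogt syndrome"}]

print(search(documents, "Hammond's syndrome", smaller))  # ['doc1', 'doc2']
print(search(documents, "Hammond's syndrome", larger))   # ['doc1', 'doc2', 'doc3']
```

The only difference between the two runs is one extra synonym, yet it recovers a document the smaller list misses entirely.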

Hedgehog – the gene or animal?

Once a system can identify and extract entities, it then needs to determine the correct meaning in cases of ambiguity. EGFR could be Epidermal Growth Factor Receptor or estimated Glomerular Filtration Rate. Our tools not only identify and extract entities in text, they also perform disambiguation to determine, where possible, the correct meaning of such terms.
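As a rough illustration of what disambiguation involves, here is a minimal Python sketch that picks a sense for "EGFR" by counting context cue words in the surrounding sentence. This simple keyword-overlap heuristic is a stand-in chosen for clarity, not a description of how SciBite's tools work; the cue lists and the `disambiguate` function are hypothetical. Returning `None` on a tie or with no evidence mirrors the "where possible" caveat: some mentions cannot be resolved from context alone.

```python
import re
from typing import Optional

# Hypothetical context cues for each sense of "EGFR"; a production system
# would rely on far richer evidence than a hand-written keyword list.
SENSES = {
    "Epidermal Growth Factor Receptor": {
        "receptor", "kinase", "mutation", "tyrosine", "gene", "inhibitor",
    },
    "estimated Glomerular Filtration Rate": {
        "kidney", "renal", "creatinine", "ml/min", "ckd", "filtration",
    },
}


def disambiguate(sentence: str, senses: dict = SENSES) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when no cue matches or two senses tie, leaving the
    mention unresolved rather than guessing.
    """
    tokens = set(re.findall(r"[a-z0-9/]+", sentence.lower()))
    scores = {sense: len(cues & tokens) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(disambiguate("EGFR mutation status guides tyrosine kinase inhibitor therapy."))
# Epidermal Growth Factor Receptor
print(disambiguate("Her eGFR fell below 60 ml/min, indicating reduced kidney function."))
# estimated Glomerular Filtration Rate
```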
