What is an ontology?

An ontology is a set of concepts and categories in a subject area or domain, together with their properties and the relations between them.

Essentially, an ontology’s purpose is to properly define something. For life sciences organizations, ontologies could be created to categorize diseases, drugs, genotypes/phenotypes, mechanisms of action, and other biomedical concepts. By adding a layer of meaning to raw text, an ontology makes a document easier to synthesize and process further.

“For those who do study ontologies, there’s a very famous concept learned in their first year of university: the pizza ontology,” Lee said. “The idea is that pizzas are split up into bases and toppings, and how those relate to each other. It’s really a conceptualization of a particular domain in a computer-readable format.”
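To make the "computer-readable format" idea concrete, here is a minimal Python sketch of a pizza-style ontology. It is an illustration only: the concept names, the `hasBase` and `hasTopping` relations, and the helper functions are hypothetical and are not taken from SciBite or from any published ontology file. Real ontologies are usually written in dedicated formats such as OWL or OBO rather than as Python dictionaries.

```python
# Hypothetical mini-ontology: every name below is illustrative only.

# "is a" hierarchy: each concept maps to its broader category.
IS_A = {
    "PizzaBase": "Food",
    "PizzaTopping": "Food",
    "Pizza": "Food",
    "ThinCrustBase": "PizzaBase",
    "DeepPanBase": "PizzaBase",
    "CheeseTopping": "PizzaTopping",
    "Mozzarella": "CheeseTopping",
    "TomatoTopping": "PizzaTopping",
    "Margherita": "Pizza",
}

# Other relations, stored as (subject, relation, object) triples.
RELATIONS = [
    ("Margherita", "hasBase", "ThinCrustBase"),
    ("Margherita", "hasTopping", "Mozzarella"),
    ("Margherita", "hasTopping", "TomatoTopping"),
]


def ancestors(concept):
    """Walk up the is-a hierarchy and return every broader concept."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if `concept` is `category` itself or one of its descendants."""
    return concept == category or category in ancestors(concept)


def related(subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [o for s, r, o in RELATIONS if s == subject and r == relation]


def has_topping_of_kind(pizza, kind):
    """True if any topping on `pizza` falls under the category `kind`."""
    return any(is_a(t, kind) for t in related(pizza, "hasTopping"))


print(ancestors("Mozzarella"))
print(has_topping_of_kind("Margherita", "CheeseTopping"))
```

Because the hierarchy is explicit, a program can answer a question such as "does this pizza have a cheese topping?" even though the data only says the topping is mozzarella. The same pattern lets a biomedical ontology connect a specific disease or drug to its broader category.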

How are ontologies produced?

Ontologies are produced by the scientific community and funded by both private and public money across the globe. SciBite occupies an unusual position, in that the organization is both a consumer and a producer of ontologies.

“The ontologies we work with aren’t the result of one, two, three, four people,” Lee said. “They’re the result of thousands of experts, everyone contributing a tiny little bit of knowledge to an overall coherent map of a particular set of cells, tissues, diseases, etc. The power to be able to leverage that expertise in a computer-readable format is incredible.”

This collaborative process gets to the heart of why organizations are doing this research in the first place.

“I think the power is in the openness, the fact that they are done in the public domain, they are free to use by everybody,” Lee said. “It promotes data interoperability, and the ability to do these experiments.”

How can ontologies be applied to text?

When ontologies are applied to text, the result is a semantically enriched text document.

Lee breaks down the concept with the example of a hedgehog. If you’re not a life scientist, you’re likely to think of a hedgehog as a little, spiky animal. But to many scientists, hedgehog is the better-known name of a protein that’s critical in cell division, a major process involved in cancer.

“When you say hedgehog to a life scientist, particularly in molecular biology or human genetics, they’re much more likely to be thinking about the hedgehog gene or protein, and not the loveable animal,” Lee said. “When you are trying to apply ontologies to text, and you see the word hedgehog, you’ve got to build systems that say right, OK, this could mean one of two things. I’m not going to annotate it as the hedgehog protein unless I really think it is, and similarly I’m not going to annotate as the hedgehog animal unless there’s something that tells me that it is the animal. That’s disambiguation.”
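The disambiguation Lee describes can be sketched in a few lines of Python. This is a deliberately simple illustration and not SciBite's method: the cue words, the sense labels, and the `annotate_hedgehog` function are all hypothetical, and a production system would draw on far richer evidence than a handful of keywords. The point it shows is the rule in the quote: annotate only when the context supports one meaning, and otherwise leave the word unannotated.

```python
import re

# Hypothetical context cues for each sense; illustrative only.
SENSES = {
    "hedgehog_protein": {"gene", "protein", "signaling", "pathway",
                         "cell", "tumor", "expression"},
    "hedgehog_animal": {"animal", "spines", "nocturnal", "garden",
                        "hibernate", "mammal"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def annotate_hedgehog(sentence, min_evidence=1):
    """Return a sense label for 'hedgehog', or None if the context is unclear."""
    tokens = tokenize(sentence)
    if "hedgehog" not in tokens:
        return None

    # Count how many cue words for each sense appear in the sentence.
    words = set(tokens)
    scores = {sense: len(words & cues) for sense, cues in SENSES.items()}

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]

    # Refuse to annotate without evidence or when the senses tie.
    if best_score < min_evidence or best_score == runner_up:
        return None
    return best


print(annotate_hedgehog("The hedgehog signaling pathway regulates cell division."))
print(annotate_hedgehog("A hedgehog rolled up in the garden, spines out."))
print(annotate_hedgehog("I read about the hedgehog yesterday."))
```

The first sentence contains protein cues and the second contains animal cues, so each gets a label. The third offers no evidence either way, so the function returns `None` instead of guessing.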

Today, when organizations like SciBite apply ontologies to text, they’re providing the ability to search through a document, or thousands of documents, to find relevant terms, ultimately enhancing and accelerating the R&D process.