Abstract

Motivation

In this paper we demonstrate the use of RIO, a framework for detecting syntactic
regularities using cluster analysis of the entities in the signature of an ontology.
Quality assurance in ontologies is vital for their use in real applications, but it is
also a complex and difficult task. Methods and tools for quality assurance are especially
important when the ontology lacks documentation and the user cannot consult the ontology
developers to understand its construction. One aspect of quality assurance is checking how
well an ontology complies with established ‘coding standards’: is the ontology regular in
how descriptions of different types of entities are axiomatised? Is there a similar
way to describe them, and are there any corner cases that are not covered by a pattern?
Detection of regularities and irregularities in axiom patterns should provide ontology
authors and quality inspectors with a level of abstraction such that checking compliance
with coding standards can be automated. However, there is a lack of such reverse ontology
engineering methods and tools.

Results

The RIO framework allows regularities, i.e. repetitive structures in the axioms, to be
detected in an OWL ontology. We describe the use of standard machine learning
approaches to form clusters of similar entities and generalise over their axioms to
find regularities. This abstraction allows matches to, and deviations from, an ontology’s
patterns to be shown. We demonstrate the framework by inspecting three modules
from SNOMED-CT, a large medical terminology, that cover “Present” and “Absent” findings,
as well as “Chronic” and “Acute” findings. The modules contain 5 065, 20 688 and
19 812 asserted axioms. They are analysed in terms of the types and numbers of regularities
and irregularities in their asserted axioms. The analysis showed that some modules of the
terminology, which were expected to instantiate a pattern described in the SNOMED-CT
technical guide, had a high number of deviations from that pattern.
A subset of these deviations was categorised as “design defects” by checking them against
past work on the quality assurance of SNOMED-CT. These were mainly incomplete descriptions.
In the worst case, the expected patterns described in the technical guide were followed
by only 5% of the axioms in the module.
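
As an illustration of the general idea only, and not of RIO's actual implementation, the following minimal sketch clusters entities by the relations used in their axioms and then reports, for a cluster, which members lack a relation that other members have. The triple representation, the entity and relation names, the Jaccard distance, the greedy single-linkage merging and the threshold of 0.5 are all assumptions made for this example; none of them is taken from SNOMED-CT or from the framework.

```python
from itertools import combinations

# Toy axioms as (subject, relation, object) triples. All names are hypothetical.
AXIOMS = [
    ("ChronicPain", "subClassOf", "Pain"),
    ("ChronicPain", "hasCourse", "Chronic"),
    ("ChronicCough", "subClassOf", "Cough"),
    ("ChronicCough", "hasCourse", "Chronic"),
    ("ChronicFatigue", "subClassOf", "Fatigue"),  # no hasCourse axiom
    ("Aspirin", "hasActiveIngredient", "AcetylsalicylicAcid"),
]


def signature(entity, axioms):
    """Set of relations that appear in axioms with `entity` as subject."""
    return frozenset(rel for subj, rel, _ in axioms if subj == entity)


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets (0.0 for two empty sets)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def cluster(entities, axioms, threshold=0.5):
    """Greedy single-linkage grouping (an assumed, simplified stand-in for
    a standard clustering algorithm): repeatedly merge two clusters whose
    closest members are within `threshold` of each other."""
    clusters = [{e} for e in entities]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            d = min(
                jaccard_distance(signature(a, axioms), signature(b, axioms))
                for a in c1
                for b in c2
            )
            if d <= threshold:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


def deviations(members, axioms):
    """Generalise over a cluster: for each relation used by any member,
    list the members whose axioms do not use it."""
    sigs = {e: signature(e, axioms) for e in members}
    all_relations = set().union(*sigs.values())
    return {
        rel: sorted(e for e, s in sigs.items() if rel not in s)
        for rel in all_relations
    }


entities = ["ChronicPain", "ChronicCough", "ChronicFatigue", "Aspirin"]
clusters = cluster(entities, AXIOMS)
largest = max(clusters, key=len)
report = deviations(largest, AXIOMS)
```

In this toy setting the three chronic findings end up in one cluster, and the entity without a `hasCourse` axiom is reported as a deviation from the cluster's generalised pattern, which corresponds to the kind of incomplete description mentioned above.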

Conclusion

It is possible to automatically detect regularities and then inspect irregularities
in an ontology. We argue that RIO is a tool to find and report such matches and mismatches
for evaluation by domain experts. We have demonstrated that standard clustering
techniques from machine learning can offer a tool in the drive for quality assurance
in ontologies.