Development of a VO Registry Subject Ontology using Automated Methods

Thomas, Brian

Ontologies promise a rich user interaction with large amounts of data. They may be used to map the heterogeneous semantics that various data repositories use to label their data into a common ontology (or set of ontologies) that describes the aggregate of all available data. This common ontology may in turn be used to create complex queries that precisely describe the data of interest using concepts familiar to the end-user scientist.

The development of such an ontology is a non-trivial matter, however. One problem is the amount of human effort required both to populate the ontology with individuals (instances) and to keep them up to date, since more data may be added after the initial ontology is developed. Furthermore, there are costs associated with maintaining the ontology itself: the semantics in use at the various data repositories will evolve (e.g., new classes of subjects are added), and the common ontology must evolve to encompass these changes.

The VO Registry presents an ideal test case for developing automated methods to do these tasks. The VO Registry describes many data repositories and, while the registry entries each conform to the data model for the registry, the semantics of the VO Registry model that describe the nature of the data (the "subject" field) are not constrained, and each repository is free to label the subject of its data as it wishes. Taming this jungle of terms into a common subject ontology is a difficult task.

We present our work to automate the capture of subject metadata and our solutions for using this information to develop and populate a VO Registry Subject Ontology for the purpose of querying the VO for data held at repositories.