Infectious diseases inflict a disproportionate burden on developing countries. A lack of adequate treatment and prevention resources has resulted in a higher prevalence of infectious disease among the world's poorest people. This underscores the need for approaches to informatics support that leverage limited resources as effectively and efficiently as possible. Increasingly, response to infectious disease requires the use of information from multiple, constantly changing data sources. Currently, such information is collected using discipline-specific methodologies and is stored in heterogeneous databases, electronic health records, paper charts, and clinical and public health data repositories. Infectious disease data thus often remains only locally accessible, and because it is expressed in incompatible formats, it does not allow for broad-spectrum computational processing, querying, inference, or verification. Data silos hinder translational and comparative research. Aggregating data through use of a common data format and a common terminology (i.e., an ontology) allows a variety of secondary uses of data, such as rapid determination of pathogen type in infections or disease outbreaks, treatment decision support based on genetic characteristics of host and pathogen (e.g., drug resistance), and research to improve understanding of disease pathogenesis leading to development of new types of treatments. The Infectious Disease Ontology (IDO) addresses the problem of data silos by providing a consistent terminology, taxonomy, and logical representation of entities relevant to all infectious diseases. IDO is already being applied to the study of seven diseases, including diseases of bacterial, viral, and eukaryotic origin. The objectives of this workshop are to introduce IDO and the methodology for creating disease-specific IDO extensions, to present applications of the ontologies to the study of malaria, HIV, and influenza, and to open up the IDO enterprise to a wider audience of medical informaticians.

Workshop Description

The workshop will consist of five presentations addressing the scope, design, evolution, and practical utility of IDO. Specifically, the first two will describe the principles of IDO and of the ontologies derived from it, and will also describe existing database efforts in the infectious disease domain. The remaining presentations will describe disease-specific ontologies developed from IDO and their application to specific data integration and processing tasks, including the planning of disease control measures, the integration of data from studies in humans and studies in model organisms, the study of co-infection, disease surveillance, and the study of viral evolution.

Presenters

Talk Abstracts

Infectious Disease Ontology: The Very Idea. Speaker: Barry Smith

When computers are used in the storage and processing of data about infectious diseases and their causes, incidence, and treatment, there are obvious advantages to the use of a common controlled vocabulary or ‘ontology’. By making data comparable even when it derives from different sources, the ontology not only aids information-driven research directed towards elucidation of the disease mechanisms involved and the development of novel therapeutics, it also enhances our ability to use legacy disease data in rapid analysis of data pertaining to novel pathogens or mutations. This talk will describe the IDO strategy for creating a common ontology resource that can support these ends, and outline the work of the IDO Consortium, which is attempting to realize this strategy in a variety of disease domains. The strategy rests on a general-purpose core containing disease-neutral terms (such as ‘host’, ‘pathogen’, ‘virulence’), together with a number of extension vocabularies created by communities of researchers managing data pertaining to specific human, animal, and plant infectious diseases, as well as by vaccine researchers. One advantage of this strategy is that, as each new group of infectious disease researchers confronts the need for a controlled vocabulary to represent the phenomena in its specific domain, it has a ready-made set of terms with which to begin. In applying these terms, the group can draw on the lessons learned by others while at the same time ensuring interoperability of its own data with the data collected in other disease domains.

Introduction to the IDO Core. Speaker: Albert Goldfain

Despite the recent surge of interest in biomedical ontology, there was until recently little ontology coverage of the infectious disease domain. This resulted in both an urgent need for ontology development in this field and the opportunity for a coordinated, community-wide development effort producing broad interoperability across the disease-specific specialties and across the clinical care, public health, and biomedical research domains. To provide the foundation for such an effort, we have developed a general infectious disease ontology (IDO Core) designed to serve as a central ontology (hub) from which disease-specific extensions (spokes) can be built through a process of specialization. IDO Core was designed i) to provide ontology coverage of terms generally relevant to infectious disease research, ii) to ensure interoperability between the IDO extensions, and iii) to achieve these ends on the basis of a W3C standard logical formalism intended to ensure extensibility of the ontologies while preserving their utility for computational applications. Each extension can be developed and maintained by domain experts, allowing for rapid progress towards the needed set of ontologies, ensuring biological accuracy of the extensions, and increasing the likelihood of broad adoption by the infectious disease research community. IDO Core thus provides an invaluable ontology resource for infectious disease researchers, allowing cross-domain data integration and supporting the computation-intensive data processing and analysis tasks that are becoming the basis of biomedical research.

IDO Core makes distinctions between infections, infectious diseases, infectious disease courses, the diagnosis of infectious disease, and the signs and symptoms of infectious diseases. The conflation of any of these entities can lead to incoherent reasoning, inconsistent models of specific diseases, and even medical errors in electronic health records.
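
The hub-and-spoke design and the distinctions drawn above can be illustrated with a minimal sketch. It is not the IDO implementation: a plain Python class stands in for the W3C formalism, and the extension term labels, record identifiers, and the `select` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Term:
    """A term in a toy ontology; `parent` is its is_a superclass."""
    label: str
    source: str                       # "IDO Core" or an extension name
    parent: Optional["Term"] = None

    def is_a(self, other: "Term") -> bool:
        """True if this term equals `other` or specializes it."""
        node: Optional[Term] = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


# Hub: disease-neutral core terms, kept as separate roots so that an
# infection is never conflated with an infectious disease or its course.
infection = Term("infection", "IDO Core")
infectious_disease = Term("infectious disease", "IDO Core")
infectious_disease_course = Term("infectious disease course", "IDO Core")

# Spokes: extension terms specialize core terms (labels are illustrative).
plasmodium_infection = Term("Plasmodium infection", "IDOMAL", infection)
malaria = Term("malaria", "IDOMAL", infectious_disease)
hiv_infection = Term("HIV infection", "IDOHIV", infection)

# Hypothetical annotated records from two different disease domains.
records: List[Tuple[str, Term]] = [
    ("r1", plasmodium_infection),
    ("r2", hiv_infection),
    ("r3", malaria),
]


def select(rows: List[Tuple[str, Term]], core_term: Term) -> List[str]:
    """Return ids of records annotated with `core_term` or a specialization."""
    return [rid for rid, term in rows if term.is_a(core_term)]


print(select(records, infection))           # records from both extensions
print(select(records, infectious_disease))  # disease records only
```

Because both extensions specialize the same core term, a single query over `infection` retrieves records from the malaria and HIV domains together, while the record annotated with the disease itself is returned only when querying `infectious_disease`.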

As a case-study in the use of IDO across diseases, we will describe the IDO Core representation of protective resistance. The resistance of pathogens to certain drugs is a central obstacle to the treatment and management of infectious disease. More generally, resistance can be used to describe phenomena such as the immunity of an individual to specific diseases and the resistance of disorders to specific treatments.

Malaria Ontology (IDOMAL). Speaker: Christos Louis

Hundreds of millions of cases of vector-borne diseases occur annually, the vast majority of which affect populations in tropical regions of the world. Malaria, the most prominent among them, causes more than one million deaths each year, mostly among small children in these areas. For a number of reasons, including the development of resistance against both drugs and insecticides, the numbers of cases and of deaths have not decreased substantially, in spite of the progress achieved in medical sciences in recent decades. There is therefore a need for new tools that will help alleviate this problem. These will include IT tools such as decision support systems (DSSs), which, especially in the case of emerging epidemics, will use data collections to help local authorities plan their disease control measures. This module will describe our work on the IDOMAL malaria ontology, an IDO extension ontology that is designed to advance work in this broad area and will be used to drive both databases and the DSSs that rely on them. IDOMAL will contain terms from all four corners of the malaria domain, which is to say: the biology of vectors and of disease, epidemiology, and clinical features. We have already recruited the participation of expert collaborators in order to ensure the broadest and most accurate coverage of the disease. Our plans for the future include the expansion of IDOMAL to cover other vector-borne diseases.

HIV Ontology (IDOHIV). Speaker: Lindsay G. Cowell

In 2007, 33 million people were living with Human Immunodeficiency Virus (HIV), including 2 million children. Annually, there are approximately 2 million deaths as a result of HIV, with the leading cause of death being co-infection with Mycobacterium tuberculosis (Mtb), a problem that is increasing with the appearance of new, highly virulent, drug-resistant strains of Mtb. A number of factors suggest that real progress towards alleviating the global HIV burden will require development of the terminological and logical infrastructure for broad data interoperability, and in particular for data interoperability across multiple disease domains. These include the central role of secondary infections in the HIV disease course, the importance of model infections to the study of HIV pathogenesis and vaccine efficacy, and the increasing need to synchronize the collection of data from patients enrolled in observational studies and clinical trials being carried out throughout the world. IDO is designed to provide the basis for the needed common terminological and logical architecture. This talk will describe the HIV Ontology (IDOHIV) developed as an IDO extension and will sketch the application of IDOHIV to i) the integration of results from Simian Immunodeficiency Virus (SIV) and HIV studies, ii) the study of HIV co-infection with Mtb and Hepatitis C Virus (HCV), and iii) the creation of an interactive, comprehensive database for the Duke Center for AIDS Research (CFAR).

Intended Audience

The workshop does not assume any prior knowledge of ontologies or the infectious disease domain, and is aimed at a broad audience, including:

medical informaticians

biologists

health care providers

epidemiologists and public health workers

bioinformatics researchers

biomedical researchers

computer scientists

Goals

To demonstrate the advantages of a common approach for annotating infectious disease data.

To show how the IDO terminology can be used as a teaching tool.

To showcase the multidisciplinary nature of IDO.

To enable interested attendees to begin using IDO for annotating their own datasets.

To enable interested attendees to begin creating their own IDO extensions.