
Removing a Barrier to Computer-Based Outbreak and Disease Surveillance: The RODS Open Source Project

Abstract

Introduction: Computer-based outbreak and disease surveillance requires high-quality software that is well-supported
and affordable. Developing software in an open-source framework, which entails free distribution and use of software
and continuous, community-based software development, can produce software with such characteristics, and can do so rapidly.

Objectives: The objective of the Real-Time Outbreak and Disease Surveillance (RODS) Open Source Project is
to accelerate the deployment of computer-based outbreak and disease surveillance systems by writing software
and catalyzing the formation of a community of users, developers, consultants, and scientists who support its use.

Methods: The University of Pittsburgh seeded the Open Source Project by releasing the RODS software under the
GNU General Public License. An infrastructure was created, consisting of a website, mailing lists for developers and users,
designated software developers, and shared code-development tools. These resources are intended to encourage growth of the Open Source Project community. Progress is measured by assessing website usage, number of software downloads, number
of inquiries, number of system deployments, and number of new features or modules added to the code base.

Results: During September–November 2003, the project website received 5,370 page views, and the project recorded 59 software downloads, 20 inquiries, one new deployment, and the addition of four features.

Conclusions: Thus far, health departments and companies have been more interested in using the software as is than in customizing it or developing new features. The RODS Laboratory anticipates that after initial installation has been completed, health departments and companies will begin to customize the software and contribute their enhancements to the public code base.

Introduction

In October 1999, researchers at the University of Pittsburgh began developing the Real-Time Outbreak and Disease Surveillance system (RODS), with the goal of improving public health agencies' capability to detect a specific threat: a large-scale, surreptitious release of Bacillus anthracis. The rate of adoption of this technology, although accelerating, is not commensurate with the severity of the health threats posed by biologic terrorism, emerging infections, and common disease outbreaks. Such threats warrant rapid deployment; therefore, barriers to the technology's adoption need to be identified and removed.

This paper describes the evolution of the RODS system, previous efforts to transition the technology, and
the rationale behind the creation of an open-source project. It also describes how the software is licensed, the infrastructure created to enable growth of the RODS open-source community, efforts to publicize the project, metrics collected to assess its progress, the software architecture of the latest version of RODS, and plans for additional software development.

RODS System Description

The first version of RODS collected patient chief-complaint data in real time from eight hospitals in a single health-care system via Health Level 7 (HL7) messages (1), categorized these data into syndrome categories by using a classifier based on International Classification of Diseases, Ninth Revision (ICD-9) codes, aggregated the data into daily syndrome counts, and analyzed the counts for anomalies possibly indicative of disease outbreaks. The system provided an Internet-based interface that enabled users to view the data in graphs and maps (Figure 1). After demonstrating the feasibility of such a system within a single health-care system in Pittsburgh and conducting research to support the hypothesis that such a system could detect disease outbreaks (2,3), the RODS developers expanded the system to collect additional data types and then deployed RODS in multiple states. The application service provider (ASP) version of RODS at the University of Pittsburgh collects de-identified chief complaints from 76 hospitals in Pennsylvania, Utah, and Ohio (4,5) and also serves as the user interface for the National Retail Data Monitor (NRDM), which collects and analyzes daily sales of over-the-counter (OTC) medications (6,7).

The feasibility of rapid deployment of RODS was demonstrated during the 2002 Winter Olympics in Salt Lake City, Utah (4,8,9). In addition, the capability to integrate other surveillance data types (e.g., electronic laboratory reports [10], free-text chief complaints [11,12], laboratory orders, dictated radiology reports, dictated hospital reports [13–15], and poison control center calls [16]) was added. Much of the code (originally in Perl and C) was rewritten in Java, and basic research was conducted on data and algorithms relevant to this emerging science (17).

Technology Transition

The initial effort to make RODS software available involved licensing it for noncommercial use. In December 2002, the University of Pittsburgh began offering the RODS system as compiled byte code, free of charge, to public health departments. To date, >180 downloads of this version of the RODS system and >200 downloads of the Bayesian parser have been counted. Although successful installations were reported in Hong Kong [David Wong, Hong Kong RODS Team, personal communication, May 15, 2003] and Missouri [Terry Tabor, Missouri Department of Health and Senior Services, personal communication, January 28, 2003], certain state health departments wanted access to the RODS source code.

Giving the software away without providing technical support soon proved insufficient. Using the RODS
software requires expertise in database, network, geographic information system (GIS), HL7, and system management, capabilities
not widely available at that time. Users made multiple requests for customization, support, and assistance with
installations, for which resources were not available. Therefore, in September 2003, the University of Pittsburgh released the RODS software under an open-source license, thereby creating the RODS Open Source Project to catalyze the sharing of knowledge and skills related to the software, including its design, installation, configuration, and customization.

Materials and Methods

This section describes the RODS Open Source Project, including the particular license under which RODS is
distributed, the infrastructure created to enable growth of the RODS open-source community, methods for publicizing the project and recruiting developers, and the metrics collected to assess its progress.

GNU General Public License

RODS is distributed as open-source software under the GNU General Public License (GPL)
(17), the same open-source license under which
Linux® is distributed (18).
Unlike the license under which RODS was initially released in
December 2002, GPL permits anyone to use, copy, and modify RODS freely. GPL allows consultants and companies to use, install, support, and customize RODS and permits these entities to redistribute their enhanced versions of RODS, provided they make the source code available. This requirement fosters continuous software improvement, benefiting all users
and preventing companies from creating proprietary, closed-source versions of RODS.

Support for Developers and Users

To coordinate community-based development of the code, the RODS Laboratory organized the Open Source Project. The RODS modules were classified into six functional areas: data collection, syndrome classification, data warehousing, database encapsulation, outbreak detection, and user interface. Specialists from the laboratory's research and development group were named development leaders for each functional area. These development leaders are responsible for recommending new features based on user requests and for evaluating whether a developer has the qualifications to contribute source code.

Online resources were created to support the Open Source Project, including the RODS Laboratory website (http://www.health.pitt.edu/rods) and a project website hosted on SourceForge (http://openrods.sourceforge.net). The latter site provides standard software project management tools (a Concurrent Versions System [CVS] server and a patch submission area that enable developers to contribute code), e-mail lists that enable developers and users to communicate, a software-bug reporting system, contact information for the development leaders, and source code for stable versions of the system.

Recruitment of Developers and Users

E-mail announcements were sent to 181 persons who had previously downloaded the byte-compiled releases and to all 226 users in the United States who held passwords to the RODS ASP system. Users were offered face-to-face meetings with the core developers at two national conferences, the 2003 National Syndromic Surveillance Conference in New York City and the 2003 American Medical Informatics Association Fall Symposium in Washington, D.C. Project leaders of other computer-based surveillance projects were also invited.

Metrics

The following metrics are collected monthly to manage the project and assess its progress:

cumulative number of installations;

cumulative number of developers who have contributed code;

number of new features;

funding sources;

cumulative number of mailing list subscribers (one general mailing list, one for announcements, and one for
development questions);

total website page views;

total downloads of source code;

number of e-mail announcements sent;

cumulative number of inquiries from consultants and companies;

cumulative number of inquiries from health departments;

cumulative number of inquiries from academics; and

cumulative number of inquiries from other groups.

The number of installations and the number of contributing developers are considered the two most important metrics.

Results

Current Software Architecture of RODS Version 2.0 and Features in Development

A complete technical description of RODS has been published (8). This section describes the system's software architecture and how the modules that make up that architecture can be used to accomplish different surveillance tasks.

RODS 2.0 consists of >42,000 lines of Java code contributed by a team of eight programmers. RODS is a modular system that adheres to CDC's National Electronic Disease Surveillance System (NEDSS) (19) and Public Health Information Network (PHIN) (20) standards so that any of its components can be incorporated into another surveillance system or used to create a native end-to-end RODS system.

The RODS software architecture consists of six functional areas: data collection, syndrome classification, data warehousing, database encapsulation, outbreak detection, and user
interface (Figure 2). Within the following categories, additional modules are being developed under the Open Source Project (Table 1):

Data collection. The data-collection modules consist of 1) an HL7 listener that accepts and maintains connections from a hospital's HL7-integration engine; 2) an HL7 parser that extracts patient-visit data from HL7 messages; and 3) a text-file parser that extracts patient-visit data from text files uploaded in batches by hospitals that cannot send HL7 messages. In addition, modules are being developed to parse microbiology culture results from HL7 messages and to import poison center call data into RODS.
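
To illustrate what the HL7 parser's job involves, the following Java sketch pulls a few visit fields out of a pipe-delimited HL7 version 2 message. It is not RODS code. It assumes the default HL7 delimiters (segments separated by carriage returns, fields by `|`, components by `^`), and it assumes that a hospital sends date of birth in PID-7, sex in PID-8, zip code as the fifth component of PID-11, and the free-text chief complaint in PV2-3; actual placement varies by hospital. The `Visit` and `Hl7VisitParser` names are hypothetical, and the listener's network framing is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical visit record; the field choices are assumptions, not the RODS schema. */
final class Visit {
    final String birthDate;      // PID-7 (date of birth)
    final String sex;            // PID-8 (administrative sex)
    final String zip;            // PID-11, component 5 (zip or postal code)
    final String chiefComplaint; // PV2-3 (admit reason); hospitals differ on where they send this

    Visit(String birthDate, String sex, String zip, String chiefComplaint) {
        this.birthDate = birthDate;
        this.sex = sex;
        this.zip = zip;
        this.chiefComplaint = chiefComplaint;
    }
}

/** Minimal HL7 v2 field extractor assuming the default delimiters | and ^. */
final class Hl7VisitParser {
    /** Splits a message into segments keyed by segment ID; the first occurrence wins. */
    static Map<String, String[]> segments(String message) {
        Map<String, String[]> byId = new HashMap<>();
        for (String seg : message.split("\r")) {
            if (seg.isEmpty()) continue;
            String[] fields = seg.split("\\|", -1); // -1 keeps trailing empty fields
            byId.putIfAbsent(fields[0], fields);
        }
        return byId;
    }

    /** Field n of a non-MSH segment (fields[n] is SEG-n), or "" if absent. */
    static String field(String[] seg, int n) {
        return (seg != null && n < seg.length) ? seg[n] : "";
    }

    /** Component c (1-based) of a field, or "" if absent. */
    static String component(String field, int c) {
        String[] parts = field.split("\\^", -1);
        return c <= parts.length ? parts[c - 1] : "";
    }

    static Visit parse(String message) {
        Map<String, String[]> segs = segments(message);
        String[] pid = segs.get("PID");
        String admitReason = field(segs.get("PV2"), 3);
        // Prefer the coded element's text component; fall back to its first component.
        String complaint = component(admitReason, 2);
        if (complaint.isEmpty()) complaint = component(admitReason, 1);
        return new Visit(field(pid, 7), field(pid, 8), component(field(pid, 11), 5), complaint);
    }
}
```

For example, a registration message whose PV2 segment reads `PV2|||^COUGH AND FEVER` yields the chief complaint `COUGH AND FEVER`. A production parser would also need to handle escape sequences, repeating fields, and nonstandard delimiters declared in MSH-1 and MSH-2, all of which this sketch ignores.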

Another module is proposed that will fully integrate detailed OTC medication sales data from the NRDM. Also planned is an Extensible Markup Language (XML) module that works with proposed or currently used XML document type definitions for public health surveillance data (21,22).

Syndrome classification. RODS Version 2.0 includes a single module for syndrome classification, Complaint Classifier (CoCo) (12). CoCo uses a naïve Bayesian classifier to assign a free-text chief complaint to a syndrome category. The syndrome categories are user-specifiable, and the mappings from complaints to categories are learned automatically from a user-provided training set. The RODS Laboratory has rewritten in Java a module for ICD-9-based classification (8), which it intends to release. Additional classification modules, including keyword-based methods and natural language processing modules to identify radiology reports indicative of inhalational anthrax (15), are in development.
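
The naïve Bayesian approach can be sketched as a small multinomial classifier. This is not CoCo's implementation: it assumes upper-cased, letters-only word tokens and add-one (Laplace) smoothing, it ignores words never seen in training, and the class name `ComplaintClassifier` is hypothetical. Training counts how often each word occurs with each syndrome in the user-provided training set; classification picks the syndrome that maximizes log P(syndrome) plus the sum of log P(word | syndrome) over the complaint's words.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial naive Bayes over chief-complaint words (illustrative, not CoCo). */
final class ComplaintClassifier {
    private final Map<String, Integer> docsPerSyndrome = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Assumed tokenization: upper-case, split on anything that is not a letter A-Z. */
    private static String[] tokenize(String complaint) {
        return complaint.toUpperCase(Locale.ROOT).split("[^A-Z]+");
    }

    /** Adds one labeled example from the training set. */
    void train(String complaint, String syndrome) {
        totalDocs++;
        docsPerSyndrome.merge(syndrome, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(syndrome, s -> new HashMap<>());
        for (String w : tokenize(complaint)) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(syndrome, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the syndrome with the highest log posterior (null if untrained). */
    String classify(String complaint) {
        String[] words = tokenize(complaint);
        int v = vocabulary.size();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docsPerSyndrome.entrySet()) {
            String syndrome = entry.getKey();
            Map<String, Integer> counts = wordCounts.get(syndrome);
            int n = totalWords.getOrDefault(syndrome, 0);
            double score = Math.log((double) entry.getValue() / totalDocs); // log prior
            for (String w : words) {
                if (w.isEmpty() || !vocabulary.contains(w)) continue; // skip unseen words
                // Laplace-smoothed log likelihood of the word given the syndrome
                score += Math.log((counts.getOrDefault(w, 0) + 1.0) / (n + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = syndrome;
            }
        }
        return best;
    }
}
```

Trained on a handful of labeled complaints such as "cough" (Respiratory) and "vomiting" (Gastrointestinal), the sketch assigns "vomiting, diarrhea" to Gastrointestinal. Summing logarithms rather than multiplying probabilities avoids numeric underflow for long complaints.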

Data warehousing. These modules store surveillance data and provide efficient access to it. RODS stores and retrieves time-series data through a data warehouse, and the data-warehousing module consists of a cache table updater that keeps running counts of the number of visits for each syndrome, stratified by age and sex. RODS 2.0 assumes the existence of an Oracle database. However, RODS does not use Oracle-specific structured query language (SQL) features (e.g., database triggers), so a port to an alternative relational database system (e.g., PostgreSQL or Microsoft SQL Server) should be straightforward.

Detection algorithm. The detection-algorithm modules in the current open-source release include an implementation of the recursive least-squares (RLS) algorithm (23) and an initial implementation of a wavelet-detection algorithm. The RLS algorithm can detect sudden increases in daily surveillance data counts (e.g., the increase in respiratory-type visits that would accompany a large-scale, covert release of Bacillus anthracis). The wavelet algorithm can automatically model weekly, monthly, and seasonal data fluctuations. NRDM uses wavelet modeling to identify zip-code areas in which OTC medication sales are substantially increased; this algorithm will also be applied to the analysis of health-care registration data.
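
The idea behind RLS-based detection can be sketched as an adaptive one-day-ahead predictor: each day's count is predicted from an intercept and the previous p days' counts, the weights are updated by recursive least squares with a forgetting factor λ, and a day is flagged when the observed count exceeds its prediction by more than k standard deviations of recent prediction errors. This is a generic textbook RLS filter rather than the RODS implementation; the order p, λ, the initial inverse-correlation matrix, the threshold k, the warm-up period, and the error-variance smoothing are all assumptions, and `RlsDetector` is a hypothetical name.

```java
/**
 * Generic recursive least-squares (RLS) detector for daily counts.
 * Illustrative only: order, forgetting factor, threshold, and error model are assumptions.
 */
final class RlsDetector {
    private static final double ALPHA = 0.1; // weight of the newest squared error (assumed)
    private final int p;             // number of lagged days used as predictors
    private final double lambda;     // forgetting factor, 0 < lambda <= 1
    private final double k;          // alert threshold, in standard deviations
    private final int warmup;        // days to learn before alerts are allowed
    private final double[] w;        // w[0] is an intercept; w[1..p] multiply lagged counts
    private final double[][] invCorr; // running inverse-correlation matrix (the RLS "P")
    private double errVar = 1.0;     // smoothed squared prediction error

    RlsDetector(int p, double lambda, double k, int warmup) {
        this.p = p;
        this.lambda = lambda;
        this.k = k;
        this.warmup = warmup;
        int n = p + 1;
        w = new double[n];
        invCorr = new double[n][n];
        for (int i = 0; i < n; i++) invCorr[i][i] = 100.0; // large initial value = weak prior
    }

    /** One flag per day: true if the count exceeds its one-day-ahead prediction by > k sd. */
    boolean[] scan(double[] counts) {
        int n = p + 1;
        boolean[] alerts = new boolean[counts.length];
        for (int t = p; t < counts.length; t++) {
            double[] x = new double[n];
            x[0] = 1.0;
            for (int i = 1; i <= p; i++) x[i] = counts[t - i];

            double yhat = 0;
            for (int i = 0; i < n; i++) yhat += w[i] * x[i];
            double e = counts[t] - yhat;
            // One-sided test: only increases over the prediction can raise an alert.
            alerts[t] = t >= warmup && e > k * Math.sqrt(errVar);

            // Gain g = P x / (lambda + x' P x)
            double[] px = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) px[i] += invCorr[i][j] * x[j];
            double denom = lambda;
            for (int i = 0; i < n; i++) denom += x[i] * px[i];
            double[] g = new double[n];
            for (int i = 0; i < n; i++) g[i] = px[i] / denom;

            for (int i = 0; i < n; i++) w[i] += g[i] * e; // weight update
            // P <- (P - g (P x)') / lambda, using the symmetry of P
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) invCorr[i][j] = (invCorr[i][j] - g[i] * px[j]) / lambda;

            errVar = (1 - ALPHA) * errVar + ALPHA * e * e;
        }
        return alerts;
    }
}
```

For a series that repeats 10, 11, 9 for 30 days and then jumps to 30, the sketch flags the final day. The forgetting factor lets the weights track slow seasonal drift, and the warm-up period keeps the large errors of the untrained filter from producing early alerts.
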

Another set of modules is planned that will enable any outbreak-detection algorithm to analyze data from the system. Currently, the architecture allows algorithms written in or wrapped in Java to retrieve data directly from the database-encapsulation modules. A module will be released that outputs data as plain text files so that stand-alone algorithms and statistical software packages can be used to analyze the data. This approach was used to analyze RODS data with the What's Strange About Recent Events (WSARE) algorithm during the Salt Lake 2002 Olympic Winter Games (24).
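
A text-export module of the kind planned could be as simple as writing one comma-separated row per day and syndrome, a format most statistical packages can read. The column layout and the `CountExporter` name are assumptions for illustration, not the planned RODS file format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Writes daily syndrome counts as CSV for external analysis tools (illustrative). */
final class CountExporter {
    /**
     * Formats counts as "date,syndrome,count" lines.
     * Input maps an ISO date string to (syndrome -> count).
     */
    static List<String> toCsv(SortedMap<String, SortedMap<String, Integer>> daily) {
        List<String> lines = new ArrayList<>();
        lines.add("date,syndrome,count");
        for (Map.Entry<String, SortedMap<String, Integer>> day : daily.entrySet()) {
            for (Map.Entry<String, Integer> s : day.getValue().entrySet()) {
                lines.add(day.getKey() + "," + s.getKey() + "," + s.getValue());
            }
        }
        return lines;
    }

    /** Writes the CSV to a file, replacing any existing content. */
    static void write(Path file, SortedMap<String, SortedMap<String, Integer>> daily)
            throws IOException {
        Files.write(file, toCsv(daily), StandardCharsets.UTF_8);
    }
}
```

Sorted maps keep the rows in date and syndrome order. Syndrome names containing commas would need quoting, which this sketch omits.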

User interfaces. These modules 1) authenticate users, 2) display surveillance data as time-series graphs, and 3) work with a GIS to depict data spatially. The graphing and GIS modules consist of JavaServer Pages and servlets that use JFreeChart, an open-source graphing package, and the GIS functions of Environmental Systems Research Institute's ArcIMS software.

Certain state health departments have requested Lightweight Directory Access Protocol (LDAP) support to enable the creation of seamless links between existing state surveillance systems and the surveillance functions provided by RODS; outside development of such a module is encouraged.

State, local, or national health departments can use RODS modules to collect, analyze, and view hospital surveillance data and to view OTC medication sales data from NRDM. A health department can use a subset of these modules to accomplish
a specific surveillance task (e.g., receiving and processing free-text chief complaints from hospitals), or it can use all of them (with the RODS database, analytic modules, and user interface) to create an end-to-end surveillance solution. (Examples of how health departments can mix and match RODS modules for different surveillance tasks are available at
http://openrods.sourceforge.net.)
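
The mix-and-match idea can be illustrated with one small Java interface per functional area, wired together for a particular task. The interface and class names below are hypothetical and do not reflect the actual RODS class structure; the sketch only shows how a deployment might chain collection, classification, storage, and detection end to end, or swap in a different implementation of any one step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interfaces, one per functional area (not the RODS API). */
interface VisitCollector { List<String> collectComplaints(); }
interface SyndromeClassifier { String classify(String complaint); }
interface OutbreakDetector { boolean isAnomalous(List<Integer> dailyCounts); }

/** Wires interchangeable modules into a daily end-to-end run. */
final class SurveillancePipeline {
    private final VisitCollector collector;
    private final SyndromeClassifier classifier;
    private final OutbreakDetector detector;
    // Stand-in for the data warehouse: daily counts per syndrome, oldest first.
    private final Map<String, List<Integer>> history = new LinkedHashMap<>();

    SurveillancePipeline(VisitCollector collector, SyndromeClassifier classifier,
                         OutbreakDetector detector) {
        this.collector = collector;
        this.classifier = classifier;
        this.detector = detector;
    }

    /** Collects and classifies one day of complaints, stores the counts, runs detection. */
    Map<String, Boolean> runDay() {
        Map<String, Integer> today = new LinkedHashMap<>();
        for (String complaint : collector.collectComplaints()) {
            today.merge(classifier.classify(complaint), 1, Integer::sum);
        }
        for (String syndrome : history.keySet()) {
            today.putIfAbsent(syndrome, 0); // a syndrome with no visits today still gets a zero
        }
        Map<String, Boolean> alerts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : today.entrySet()) {
            List<Integer> series = history.computeIfAbsent(e.getKey(), s -> new ArrayList<>());
            series.add(e.getValue());
            alerts.put(e.getKey(), detector.isAnomalous(series));
        }
        return alerts;
    }
}
```

Because each step depends only on an interface, a health department could, for example, replace the detector with a wrapper around an external algorithm without changing collection or classification.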

Project Metrics

A total of 480 e-mail announcements about the RODS Open Source Project were sent during the first 3 months of the project. This publicity generated 5,370 page views of the project website, 59 downloads of the source code, and 14 new mailing-list subscribers. One new installation is using the open-source version of RODS.

To date, users have been more interested in using the software "as is" than in collaborative feature development. For example, users have asked when the ICD-9 classifier module will be released or whether the system yet works with Microsoft SQL Server. All four new features (drilldown by age and sex, customized jurisdictions, a simplified GIS interface, and user preferences) were contributed by developers at the RODS Laboratory (Table 2). However, at least one health department and one consulting company have expressed interest in collaborating to develop a module that will import XML data into RODS.

Discussion

The goal of the RODS Open Source Project is to accelerate the deployment of computer-based outbreak and disease surveillance systems by writing high-quality surveillance software and catalyzing the formation of a community of users, developers, consultants, and scientists. In the initial years of computer-based outbreak and disease surveillance system development, the main barriers to deployment appeared to be doubts about its efficacy, the cost of the technology, concerns about the cost and effect of false alerts on the practice of public health, and legal and administrative issues (25,26). Basic research about data and detectability has been conducted to address concerns about efficacy (2,3,27–29). To address concerns about the effects of false alerts, the RODS Laboratory has deployed systems and found that persons working in health departments could incorporate the output of these systems into their workflows (4,7). The deployments also established that the cost and effort of deployment are much lower than expected. Finally, the deployments demonstrated that certain concerns about privacy could be addressed. The Health Insurance Portability and Accountability Act of 1996 (HIPAA), whose Privacy Rule had not yet taken effect, nevertheless had a substantial inhibitory effect on hospitals and other covered entities that held data needed by the project. The issuance of the final Privacy Rule, precedents set by system deployments (4,30–32), and new state laws have helped address certain concerns of data providers (33).

Open-source projects can create a community of like-minded persons (scientists, programmers, consultants, and users) who share the vision of creating innovative, well-supported software. Catalyzing such a community is important because, during planning deliberations, it can strengthen the position of information technology (IT) managers and public health officials who wish to deploy computer-based surveillance systems. They will be able to assure their supervisors that source code is available, that a pool of developers and consultants exists who can be hired to support the health department if needed, and that ongoing projects in other health departments can help them predict project costs and set appropriate timelines.

The RODS Open Source Project enables public health professionals to have a greater role in developing IT solutions to
the problem of early detection. Just as public health researchers publish their results in scientific journals, so can they contribute publicly available IT solutions to the RODS Open Source Project. This role might become more apparent as public health personnel become increasingly knowledgeable about public health informatics and work more closely with IT subcontractors and consultants.

Continued goals for the RODS Open Source Project are to increase the number of deployments, developers, and supporters of the software. The proposed path for RODS software development is to increase the number of data types the system can accept and to implement a range of high-performance outbreak-detection algorithms. One consulting company and one health department have separately expressed interest in collaboratively developing an XML module that can parse non-RODS data sources. The RODS Laboratory and its collaborators at the Auton Laboratory will continue to develop outbreak-detection algorithms (e.g., the wavelet-detection module and WSARE, respectively).

Conclusion

The RODS Open Source Project is making available software modules that span the spectrum of processing tasks involved in public health surveillance. Through open source, the project hopes to accelerate the deployment of real-time public health surveillance by lowering costs, increasing reliability, preventing vendor lock-in, and ensuring software customizability. By catalyzing the formation of a community of advocates for open-source public health surveillance software, the project aims to produce a high-quality software product that achieves mainstream acceptance.

Acknowledgments

The RODS Open Source Project is supported by the Pennsylvania Department of Health Bioinformatics Grant ME-107.

Raymond ES. The cathedral and the bazaar: musings on Linux and Open Source by an accidental revolutionary. Rev. ed. Beijing; Cambridge, MA: O'Reilly, 2001.

CDC. National Electronic Disease Surveillance System: the surveillance and monitoring component of the Public Health Information Network. Atlanta, GA: US Department of Health and Human Services, CDC, 2004. Available at http://www.cdc.gov/nedss/.

Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services. References to non-CDC sites on the Internet are provided as a service to MMWR readers and do not constitute or imply endorsement of these organizations or their programs by CDC or the U.S. Department of Health and Human Services. CDC is not responsible for the content of pages found at these sites. URL addresses listed in MMWR were current as of the date of publication.
