Abstract. The Institute of Medicine's The Computer-Based Patient Record: An Essential Technology for Health Care, Revised Edition, still stands as the most comprehensive set of requirements, analysis, and recommendations for patient record systems [CPR]. It should be adopted as the starting point for all efforts to build consensus around standards in patient record systems. Many years later, a lack of uniform content and format standards remains the biggest barrier to development of a Computer-Based Patient Record (CPR). The experience of recent development in ontology languages and cross-domain ontologies needs to be brought to bear on the recommendations that were developed without benefit of these advances. Through application of several contemporary methodologies in biomedical terminology, knowledge representation, patient record work flow vocabulary, and patient record infrastructure, this position paper proposes a uniform core set of data elements (whose formal semantics are captured in OWL) for the CPR: The CPR ontology.

1. Introduction and Background

The formal definition of a Computer-Based Patient Record as proposed by the Institute of Medicine is:

an electronic patient record that resides in a system specifically designed to support users by providing accessibility to complete and accurate data, alerts, reminders, clinical decision support systems, links to medical knowledge, and other aids.

This definition describes information systems that are much different from most contemporary Electronic Health Record (EHR) systems. Most modern EHR systems are not required to make use of formal languages and methods in their representations, and they are often composed primarily of natural language text entries. We believe it is imperative that modern patient record systems and knowledge bases adopt representation formats that make use of widely adopted formal languages, especially those that are predicate calculus derivatives. In addition, languages such as RDF, RDFS, OWL, Notation 3, RIF, CycML, and SWRL have an intuitive "mapping" to the Architecture of the World Wide Web [WEBARCH]. These are languages often associated with the Semantic Web [SW].

Adoption of a consistent definition of a patient record is critical for the same reason that a clear definition of requirements is critical whenever software development is used to solve a problem. The Institute recognized the importance of this and took care to follow the necessary due diligence in compiling the requirements of a CPR. The user requirements [CPR] for patient records and record systems are:

Control and Access: Easy access for patients and their advocates and safeguards against violation of confidentiality.

Training and Implementation: Minimal training required for system use and graduated implementation possible.

Record Content: Uniform core data elements, standardized coding systems and formats, a common data dictionary, and information on outcomes of care and functional status.

Record Format: "Front-page" problem list, ability to "flip through the record", and integration among disciplines and sites of care.

Linkages: Linkages with other information systems, transferability of information among specialties and sites, linkages with other institutional databases and registries, and linkages with records of family members.

The requirement for a set of uniform core elements is directly addressed by the ontology included in this document. We will refer to this as the CPR Ontology. This ontology is meant as a guide and not as an enforcement of exclusive use of a single dialect. If other vocabularies exist, it should be considered good practice to at least attempt a mapping to this ontology. Doing so ensures a consistent use of language when representing asserted facts (statements) about information that benefits the care of a patient whose longitudinal record (records from different times, providers, and sites of care that are linked to form a lifelong view of a patient's health care experiences [CPR]) is in a patient record system. As a canonical example, this ontology includes a mapping to an OWL-DL serialization of the Unified Medical Language System's (UMLS) "Semantic Network", which provides a consistent categorization of all concepts represented in the UMLS Metathesaurus. The 54 links between the semantic types provide the structure for the network and represent important relationships in the biomedical domain [UMLS].

2. Attributes of a Computer-based Patient Record System

Upon analysis of the requirements and the state of the art (at the time), the committee identified 12 attributes associated with "comprehensive" CPRs. Below are a few of these:

The CPR contains a problem list that clearly delineates the patient's clinical problems and the current status of each.

The CPR states the logical basis for all diagnoses or conclusions as a means of documenting the clinical rationale for decisions about the management of the patient's care. (This documentation should serve as a basis for clinical research.)

The CPR can be linked with other clinical records of a patient to provide a longitudinal record of events that may have influenced a person's health.

The CPR system can be linked to both local and remote knowledge, literature, bibliographic, or administrative databases and systems (including those containing clinical practice guidelines or clinical decision support capabilities) so that such information is readily available to assist practitioners in decision making.

The CPR can assist and, in some instances, guide the process of clinical problem solving by providing clinicians with decision analysis tools, clinical reminders, prognostic risk assessment, and other clinical aids.

The CPR is sufficiently flexible and expandable to support not only today's basic information needs but also the evolving needs of each clinical specialty and sub-specialty.

The attributes above serve as the primary requirements that the proposal set forth in this paper attempts to address.

3. Correspondence with Web Architecture

Amongst the attributes listed above, the third and fourth (linkage with a patient's other clinical records, and linkage with local and remote knowledge sources) are directly addressed by the best practices documented within the larger W3C Web Architecture recommendation [WEBARCH]. An entire generation of web-based application infrastructure has successfully demonstrated the ability to harness linkage across electronic documentation and multimedia over the Hypertext Transfer Protocol (HTTP). In addition, a new breed of architectural styles has emerged that constrains application development with XML-based web frameworks.

In particular, use of Uniform Resource Identifiers (URIs) as a global identification system as well as a mechanism for accessing referenced resources - dereferencing the URI [WEBARCH] - adequately fulfills the need for relevant links between clinical records. Through pervasive use of URIs, CPRs can be deployed either as a centralized data store (a data warehouse) or as a decentralized collection of systems, each of which supports a focused service [CPR].
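
As a minimal sketch of the idea above, the following Python fragment treats URIs as record identifiers and follows links between records held by different systems. The host names, paths, record fields, and the in-memory STORES table are all hypothetical stand-ins for real HTTP services; nothing here is prescribed by [WEBARCH] or [CPR].

```python
from urllib.parse import urlparse

# Hypothetical record stores keyed by host name. In a real deployment each
# host would be an HTTP service; here they are in-memory dictionaries.
STORES = {
    "clinic.example.org": {
        "/records/patient-1/entry-7": {
            "type": "clinical-finding",
            "links": ["http://lab.example.org/results/4411"],
        },
    },
    "lab.example.org": {
        "/results/4411": {"type": "laboratory-result", "links": []},
    },
}


def dereference(uri):
    """Resolve a record URI to its representation, or None if unknown."""
    parts = urlparse(uri)
    return STORES.get(parts.netloc, {}).get(parts.path)


def follow_links(uri, seen=None):
    """Collect the URIs of every record reachable from `uri` via its links."""
    seen = set() if seen is None else seen
    if uri in seen:
        return seen
    record = dereference(uri)
    if record is None:
        return seen  # unresolvable URIs are skipped, not treated as errors
    seen.add(uri)
    for target in record["links"]:
        follow_links(target, seen)
    return seen


if __name__ == "__main__":
    start = "http://clinic.example.org/records/patient-1/entry-7"
    for uri in sorted(follow_links(start)):
        print(uri, dereference(uri)["type"])
```

Because the caller only sees URIs, the same traversal would apply whether the records live in one warehouse or in several focused services.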

4. Post-coordinating Vocabularies

In formulating the core vocabulary proposed here, a methodology for composing medical terminology was followed: post-coordination. Recent systems of medical terminology such as the OpenGALEN Common Reference Model (a freely available version of the General Architecture for Languages, Encyclopedias, and Nomenclatures in Medicine: GALEN [GALEN]) and SNOMED CT are designed to provide an initial set of terminology as well as a decidable logic foundation against which new terms can be formulated. Unlike traditional terminologies where codes or terms are predefined exhaustively (or with the intention of exhaustiveness), post-coordinating vocabularies tend to demonstrate wide, uniform coverage of a particular domain. At the very least, a dialect of Description Logic (such as OWL or OWL 1.1) should be adopted as the basic mechanism for composition.

"General Architecture for Languages, Encyclopædias, and Nomenclatures in Medicine (GALEN) was a consortium of universities and vendors. The project was a three-phase European Union funded project, where GALEN was committed to making reusable and application-independent representations for medical concepts. They developed standards for representing coded patient information to be used in applications for medical records, clinical user interfaces and clinical information systems [67]. But work in GALEN also included systems for natural-language understanding, clinical decision support, and management of coding and classification schemes." - Elisabeth Bayegan (Knowledge Representation for Relevance Ranking of Patient-Record Content in Primary-Care Situations: 4. Efforts in Representing Clinical Information) [POMR_KR]

"Systematized Nomenclature Of MEDicine (SNOMED) is a nomenclature that can be used to index, store, and retrieve information about a patient in a computerized medical record [81]. The SNOMED nomenclature allows clinicians to record entities and observations related to a particular disease [17]. SNOMED is not a classification, it is a coded vocabulary of names and descriptions in health care that allows for multiple coding of terms such as topography (anatomic), morphology, etiology, function, procedures, and occupation. A primary strength of SNOMED can be attributed to the coding complexity of clinical concepts into a multi-axial, multiple-coding nomenclature [93]." - Elisabeth Bayegan (Knowledge Representation for Relevance Ranking of Patient-Record Content in Primary-Care Situations: 4. Efforts in Representing Clinical Information) [POMR_KR]

Through use of the UMLS Metathesaurus, SNOMED (HL7-based vocabularies are often integrated with SNOMED CT), or GALEN, new terms, or terms that exist in these larger vocabularies and have useful meaning for specific domain experts, can be composed (or post-coordinated) from the CPR ontology using Description Logic-based formalisms such as OWL. In this way, new domains can be modeled in a controlled way that accommodates any precedent that may already exist in robust biomedical terminology systems and still adheres to an overall logical consistency. Formal, foundational ontologies can have a large role to play in this.
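
To make post-coordination concrete, here is a minimal Python sketch in which a new term is composed from a primitive base term plus property restrictions, and a purely structural check decides whether one composed term is more general than another. The primitive hierarchy, the has_location property, and the example terms are hypothetical illustrations, not taken from the CPR ontology, SNOMED CT, or GALEN. A real system would delegate this check to a Description Logic reasoner.

```python
from dataclasses import dataclass

# Hypothetical primitive hierarchy (child -> parent), for illustration only.
PARENT = {
    "fracture": "clinical-finding",
    "femur": "bone",
    "bone": "anatomical-structure",
}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or lies below it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


@dataclass(frozen=True)
class Composed:
    """A post-coordinated term: a base primitive plus (property, filler) pairs."""
    base: str
    restrictions: frozenset = frozenset()


def compose(base, **restrictions):
    """Build a composed term, e.g. compose('fracture', has_location='femur')."""
    return Composed(base, frozenset(restrictions.items()))


def subsumes(general, specific):
    """Structural subsumption: the base of `specific` must fall under the base
    of `general`, and every restriction on `general` must be matched on
    `specific` by the same property with an equal or narrower filler."""
    if not is_a(specific.base, general.base):
        return False
    return all(
        any(prop == q and is_a(filler, wanted) for q, filler in specific.restrictions)
        for prop, wanted in general.restrictions
    )


if __name__ == "__main__":
    fracture_of_bone = compose("fracture", has_location="bone")
    fracture_of_femur = compose("fracture", has_location="femur")
    print(subsumes(fracture_of_bone, fracture_of_femur))
    print(subsumes(fracture_of_femur, fracture_of_bone))
```

The point of the sketch is that neither composed term needs to be enumerated in advance: its place in the hierarchy follows from its definition.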

5. Foundational Ontologies

Smith et al. define [OBR] a formal, top-level ontology (in the context of the use of formal biomedical ontology) as one that provides domain-independent theories through a framework of axioms and definitions involving categories. Such ontologies are marked by a high degree of representational adequacy and are used as controls on the remaining two types of ontologies. The other two are domain reference ontologies and terminology-based application ontologies:

“Domain reference ontologies, such as the Foundational Model of Anatomy (FMA), [...] Terminology-based application ontologies, which are systems of terms (or ‘controlled vocabularies’) purpose-built and designed to meet particular needs, such as annotating biological databases (e.g., the Gene Ontology and other OBO ontologies) or the medical record (e.g., ICD-10, SNOMED)” - Smith, Barry et al. (A Strategy for Improving and Integrating Biomedical Ontologies [OBR])

Masolo et al. introduce [WONWEB] a very similar notion of a foundational ontology as an ontology that can be used to negotiate meaning and that demonstrates the explicit representation of ontological commitment [KR] through rich and exhaustive axiomatization. Below is an excerpt from Rector et al. about GALEN's top-level ontology:

"However, just as all ontologies are approximations, all high level ontologies are to some degree arbitrary. Our conceptualisation of the world does not break down into a sequence of disjoint partitions. There are several existing starting points – PENMAN [46], Cyc [17, 47], traditional schemes deriving from Shank [48] and others in the AI/Linguistics field, and those deriving from Pierce via Sowa [45]. GALEN’s is adapted from that of Lenat and Guha." - Alan Rector and Jeremy Rogers (Ontological Issues in using a Description Logic to Represent Medical Concepts: Experience from GALEN [GALEN_EXP])

Modern, formal representations of clinical content should seek alignment with foundational ontologies with a richly-axiomatized usage specification such as DOLCE-Lite, GALEN, DnS (Descriptions and Situation) Ontology, Basic Formal Ontology, Cyc, and the Minimal Ontology of Information Objects and Communication Theory. At various points of composing defining axioms for each term in this OWL ontology, the definitions were vetted against the backdrop of these foundational ontologies. Together, the terms comprise an axiomatic theory for the basic components needed to construct a patient record and populate a patient record system in expressive RDF.

Notice that the approach for composing this collection of terms differs from another (more common) approach where existing, legacy (non-computational) terminology sets are used as the 'foundation' for a mere syntactic mapping to OWL. Such terminology sets are typically merely a list of terms or adhere to Information Models that do not have an axiomatic semantics.

6. Problem Orientation

The motivation for the Problem-oriented Medical Record (introduced by Lawrence L. Weed [POMR]) is the notion that a medical record oriented specifically around the medical problems associated with a patient intuitively accommodates clinical problem solving. Although this approach has yet to gain widespread acceptance, it is heavily endorsed by the Institute of Medicine (it is listed as the first defining attribute of a CPR) as well as within Elisabeth Bayegan's comprehensive knowledge representation methodology for patient records as outlined in her 2002 thesis: Knowledge Representation for Relevance Ranking of Patient-Record Contents in Primary-Care Situations [POMR_KR].

Problem-orientation is incorporated via two classes (cpr:screening-act and cpr:medical-problem) that capture theories for the epistemology of medical problems and their causal relations to other clinical activity and outcomes (such as morbidities). Through application of computational inference, views can be generated automatically that emphasize the list of medical problems in a patient record and do so using the POMR categorization scheme (clinical data, clinical problem data, planning data, and follow-up data [CPR]). The expressiveness of the axioms used in the CPR ontology, in combination with the foundational ontologies it aligns with, affords a useful degree of inference.
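
The kind of problem-oriented view described above can be sketched as follows. This Python fragment stands in for inference with a plain grouping step: it assumes each entry has already been classified with a POMR category and related to a medical problem, which in the approach proposed here would be the job of a reasoner over the CPR ontology. The entry fields ("problem", "category", "text") and the sample data are hypothetical.

```python
from collections import defaultdict

# The four POMR categories named in the text [CPR].
POMR_CATEGORIES = (
    "clinical data",
    "clinical problem data",
    "planning data",
    "follow-up data",
)


def problem_oriented_view(entries):
    """Group record entries by medical problem, then by POMR category.

    Each entry is a dict with hypothetical keys 'problem', 'category',
    and 'text'. An unrecognized category raises KeyError.
    """
    view = defaultdict(lambda: {category: [] for category in POMR_CATEGORIES})
    for entry in entries:
        view[entry["problem"]][entry["category"]].append(entry["text"])
    return dict(view)


if __name__ == "__main__":
    sample = [
        {"problem": "hypertension", "category": "clinical data",
         "text": "elevated blood pressure reading"},
        {"problem": "hypertension", "category": "planning data",
         "text": "schedule follow-up measurement"},
    ]
    for problem, sections in problem_oriented_view(sample).items():
        print(problem)
        for category in POMR_CATEGORIES:
            print("  ", category, sections[category])
```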

7. Overview of the Method

This section introduces the methodology taken in this document. Several considerations were taken into account. First, the CPR recommendations and requirements as set forth by the Institute of Medicine were considered as primary input. The recommendations provide a strong metric for those areas where the possible benefit of adopting Semantic Web technologies is most evident.

The Reference Terminology for Biomedical Ontology Research and Development [REFTERM] was adopted when discussing more fundamental aspects of the CPR ontology (universals, entities, representations, etc.).

Elisabeth Bayegan's approach for modelling clinical processes as a work flow model was considered. This approach corresponds well with both the HL7 RIM design intent (2.1 The Act-Centered View of Healthcare: [HL7]) and with the way in which underlying foundational ontologies model processes and events, the actors involved, and the roles played. The excerpt below from her thesis (5.5 Activities in the Primary-care Process Model - Conclusion: [POMR_KR]) summarizes the approach:

"We have identified a limited set of generic subprocesses and care activities for the primary-care process that enable us to model the primary-care clinic as an organization that performs processes, which are composed of a set of activities, where each activity defines a purpose and is performed by one or more participants."

Smith et al.'s analysis of the ontological merits of the HL7 Reference Information Model emphasizes a critical distinction between what they call primary and secondary acts [HL7]. The primary act associated with an HL7 Act instance (in an HL7 RIM-based message) is an act of documenting some other, secondary act. Typically, in a patient record system, this secondary act is some phenomenon associated with the care of the patient. This distinction (though verbose) is the connection between what they refer to as a 'Reference Ontology of the Healthcare Domain' and a 'Model of Healthcare Information.'

Finally, in selecting a uniform set of terms, correspondence with the UMLS Semantic Network was the primary criterion for measuring the coverage of biomedicine as well as the precision of the definitions. With the exception of representational artifacts, all the terms have a synonym in the UMLS Semantic Network. The human-readable definitions associated with these synonym terms are adopted in the annotations of the OWL classes defined in the CPR ontology.
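
The correspondence criterion can be pictured as a simple lookup table. The Python sketch below pairs CPR terms with the UMLS semantic type identifiers quoted alongside their definitions in Section 8, and reports any term that is neither a representational artifact nor mapped. The '#' separator in term_uri, the helper names, and the example of an unmapped term are assumptions made for illustration; only the base URI and the four pairings come from this document.

```python
# Base URI stated for the ontology in Section 8.
BASE_URI = "http://purl.org/cpr/owl"

# CPR term -> (UMLS semantic type identifier, semantic type name),
# as quoted with the term definitions in Section 8.
UMLS_SYNONYM = {
    "clinical-finding": ("T033", "Finding"),
    "medical-sign": ("T184", "Sign/symptom"),
    "pathological-disposition": ("T047", "Disease or Syndrome"),
    "medication": ("T121", "Pharmacologic Substance"),
}

# Terms exempt from the UMLS correspondence criterion.
REPRESENTATIONAL_ARTIFACTS = {"patient-record", "clinical-description"}


def term_uri(local_name):
    """Build a term URI; the '#' fragment separator is an assumption."""
    return f"{BASE_URI}#{local_name}"


def unmapped_terms(terms):
    """Return, sorted, the terms with no UMLS synonym that are not artifacts."""
    return sorted(
        t for t in terms
        if t not in UMLS_SYNONYM and t not in REPRESENTATIONAL_ARTIFACTS
    )


if __name__ == "__main__":
    candidate = ["clinical-finding", "patient-record", "hypothetical-new-term"]
    print(unmapped_terms(candidate))
    print(term_uri("medication"), UMLS_SYNONYM["medication"])
```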

8. The Core Vocabulary

This section includes a listing of the core terms along with diagrams and screen shots from Protege. The base URI associated with terms in this ontology is: http://purl.org/cpr/owl.

Archetypal Health care Primitives in the UMLS Semantic Network. The UMLS terms below are considered "archetypal primitives". Their synonyms often reappear as the major fields in standard message exchange formats such as SDTM, HL7 RIM, and Clinical Document Architecture (CDA). In addition, they have strong correspondence with the names of code lists and the roots of primitive concept [NORM] hierarchies.

Definitions for Main Terms

clinical-finding ("That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a 'Finding' and is distinguished from the disease itself.")

medical signs and symptoms ("sign/symptom: An observable manifestation of a disease or condition based on clinical judgment, or a manifestation of a disease or condition which is experienced by the patient and reported as a subjective observation.")

health care activity ("An activity of or relating to the practice of medicine or involving the care of patients.")

disease ("disease/syndrome: A condition which alters or interferes with a normal process, state, or activity of an organism. It is usually characterized by the abnormal functioning of one or more of the host's systems, parts, or organs. Included here is a complex of symptoms descriptive of a disorder.")

substance or chemical ("Compounds or substances of definite molecular composition. Chemicals are viewed from two distinct perspectives in the network, functionally and structurally. Almost every chemical concept is assigned at least two types, generally one from the structure hierarchy and at least one from the function hierarchy.")

8.1. Representational Artifact

The primary goal of the CPR ontology is to serve as a minimal reference ontology for the categories of referents most often represented in a Computer-based Patient Record (CPR). The family of HL7 RIM messaging protocols shares a reference model of health care information (need a reference to CDISC / BRIDGE). The CPR ontology is intended to be a foundation that coordinates [OBR] the use of more fine-grained terms such as those found in vocabularies like SNOMED CT. The HL7 family of interchange protocols is primarily concerned with categories such as message, document, record, observation, etc., that specify how information is composed within a patient record system. These are considered Representational Artifacts [REFTERM] that are fixed in some enduring medium in such a way that they communicate information about instances of categories that would appear in a reference ontology for biomedicine (such as UMLS).

8.1.1. cpr:patient-record

The patient record is a class whose members are representational artifacts composed of relevant clinical information about a specific patient. cpr:patient-record instances comprise one or more "entries", or cpr:clinical-descriptions.

8.1.2. cpr:clinical-description

The class whose members are electronic recordings (representational artifacts) of significant natural phenomena by an individual who plays a role in a health care process.

Class: cpr:clinical-description
    EquivalentTo:
        cpr:representational-artifact
        that cpr:representation-of some bfo:Entity
        and cpr:composedBy some cpr:person
        and dol:has-quality some time:TemporalEntity
        and ro:has_part only cpr:representational-artifact

This class maintains a clear separation between primary and secondary acts [HL7]. The primary and secondary acts are related through the cpr:representation-of predicate. The former is the act of recording and the latter is the subject of the recording. Members of this class are the result of carrying out the primary act such that it manifests as an entry in a patient record system (the corresponding bytes). Meta knowledge about the primary act is stored as provenance assertions about the cpr:clinical-description instance. For example, the dc:creator term (from the Dublin Core set of meta data terms) can be used to relate the individual associated with the primary act. In addition, the OWL Time ontology [OWLTIME] can be used to model a point in time as a temporal quality of the primary act, using the DOLCE dol:has-quality relation [WONWEB].
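
As an illustration of the conditions in the class definition, the Python sketch below models an entry as a small record and checks the four conditions directly: it represents some entity, it was composed by some person, it carries a temporal quality, and all of its parts are themselves representational artifacts. The class name Artifact, its field names, and the example URIs are hypothetical. Note that this is a closed-world validation check; under OWL's open-world semantics a missing value would not by itself rule out class membership.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Artifact:
    """A representational artifact (hypothetical stand-in for the OWL class)."""
    represents: Optional[str] = None        # URI of the secondary act or entity
    composed_by: Optional[str] = None       # URI of the person who recorded it
    recorded_at: Optional[datetime] = None  # temporal quality of the primary act
    parts: list = field(default_factory=list)


def is_clinical_description(artifact):
    """Closed-world check mirroring the four conditions of the definition."""
    return (
        artifact.represents is not None
        and artifact.composed_by is not None
        and artifact.recorded_at is not None
        and all(isinstance(part, Artifact) for part in artifact.parts)
    )


if __name__ == "__main__":
    entry = Artifact(
        represents="http://clinic.example.org/events/examination-12",
        composed_by="http://clinic.example.org/staff/clinician-3",
        recorded_at=datetime(2007, 5, 1, 9, 30),
    )
    print(is_clinical_description(entry))
    print(is_clinical_description(Artifact(represents=entry.represents)))
```

Keeping `represents` (the secondary act) apart from `composed_by` and `recorded_at` (provenance of the primary act) is what preserves the distinction drawn in [HL7].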

8.1.3. cpr:clinical-finding

UMLS T033: Finding - "That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a 'Finding' and is distinguished from the disease itself."

8.1.4. cpr:medical-sign

UMLS T184: Sign/symptom - "An observable manifestation of a disease or condition based on clinical judgment, or a manifestation of a disease or condition which is experienced by the patient and reported as a subjective observation."

8.2. cpr:symptom

The symptom class consists of the physiological sensation associated with a medical problem reported by a patient. The primary act of recording the symptom results in a clinical-finding that is a representation-of the symptom.

WordNet definition:

"(medicine) any sensation or change in bodily function that is experienced by a patient and is associated with a particular disease."

8.3. Diagnosis (cpr:clinical-diagnosis)

A scientific hypothesis is “a cardinal part of design specification which states testable cause and effect relationships of certain facts or observations.”

8.4. Disease (cpr:pathological-disposition)

UMLS T047: Disease or Syndrome - "A condition which alters or interferes with a normal process, state, or activity of an organism. It is usually characterized by the abnormal functioning of one or more of the host's systems, parts, or organs."

The Basic Formal Ontology defines a disposition as "A realizable entity that essentially causes a specific process or transformation in the object in which it inheres, under specific circumstances and in conjunction with the laws of nature." By extension, a pathological-disposition is a disposition that causes a specific pathological process that interferes with the normal behavior of an organism.

8.5. cpr:medical-problem

This defined class corresponds with the notion of a medical problem as defined [PO_CPR] within the Problem-oriented Medical Record methodology. In particular, its instances are:

8.6. Medication (Pharmacologic Substance)

UMLS T121: Pharmacologic Substance - "A substance used in the treatment or prevention of pathologic disorders. This includes substances that occur naturally in the body and are administered therapeutically."

The UMLS Semantic Network has a decent coverage of chemical substances. In particular, it has a specific category for chemicals grouped by their "functional characteristics or pharmacological activity." cpr:medication falls under this category.

8.7. Anatomical Structures and Spatial Entities

UMLS T017: Anatomical Structure - "A normal or pathological part of the anatomy or structural organization of an organism."

The two definitions above account for placeholders in the UMLS Semantic Network for both material and immaterial anatomical structures (covering both canonical anatomy and anatomic abnormalities). The FMA was initially intended to enhance the anatomical content of UMLS, but was gradually transformed into a reference ontology for anatomy. All indications are that it is the most appropriate reference ontology for anatomy [OBR]. For this reason, the CPR ontology only specifies a placeholder for the broadest categories in anatomy, with the expectation that ontologies such as FMA should be adopted for formalizing these categories via post-coordination.

9. Discussion

9.1. Stative conditions (Syndromes)

Unfortunately, the BFO framework does not seem to have a decent foundation for the notion of a clinical situation, or a state:

"By physiological and pathological state we mean a certain enduring constellation of values of an objects [sic] aggregate physical properties." [OBR]

The DOLCE framework distinguishes stative from eventive occurrences:

"... according to whether it holds of the mereological sum of two of its instances, i.e. if it is cumulative or not. A sitting occurrence is stative since the sum of two sittings is still a sitting occurrence. In general, events differ from situations because they are not assumed to have a description from which they depend. They can be sequenced by some course, but they do not require a description as a unifying criterion. On the other hand, at any time, one can conceive a description that asserts the constraints by which an event of a certain type is such, and in this case, it becomes a situation. [...] If we want to consider all the aspects of a process together, we need to postulate a unifying descriptive set of criteria (i.e. a description), according to which that process is circumstantiated in a situation. The different aspects will arise as a parts of a same situation." - [WONWEB]

9.2. Negation Inference

Negation inference is problematic for healthcare informatics representation. Rector et al. outline the various challenges of capturing negative information in a medical record for the benefit of an inference mechanism or database query [GALEN_EXP].
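
One facet of the difficulty can be illustrated with a small Python sketch: a record that is silent about a finding is not the same as a record that explicitly states the finding is absent, so a query needs three possible answers rather than two. This is only an illustration of that single point under assumed data structures (the "asserted" and "negated" sets and the finding names are hypothetical); it is not a summary of the analysis in [GALEN_EXP].

```python
from enum import Enum


class Answer(Enum):
    PRESENT = "present"   # the finding was recorded as observed
    ABSENT = "absent"     # the finding was explicitly recorded as not observed
    UNKNOWN = "unknown"   # the record says nothing either way


def query(record, finding):
    """Answer a finding query without treating silence as a negative."""
    if finding in record["asserted"]:
        return Answer.PRESENT
    if finding in record["negated"]:
        return Answer.ABSENT
    return Answer.UNKNOWN


if __name__ == "__main__":
    # Hypothetical record content.
    record = {"asserted": {"cough"}, "negated": {"fever"}}
    for finding in ("cough", "fever", "rash"):
        print(finding, query(record, finding).value)
```

A conventional database query that returns "no rows" for a finding collapses the last two answers into one, which is the kind of conflation a patient record system has to guard against.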

[GALEN_EXP] Rector, A. and Rogers, J. Ontological & Practical Issues in using a Description Logic to Represent Medical Concepts: Experience from GALEN. School of Computer Science, The University of Manchester. December 2005.