Chapter 10 of the book "Clinical Decision Support - The Road Ahead"
The chapter introduces a number of prediction methods that are currently in practical use for Clinical Decision Support.
The authors explain which models are used (or not used) in the community, and how models are evaluated. In particular,
"simple, understandable models" (such as linear and logistic regression) are preferred to "sophisticated models" such as
SVMs and neural networks.
Here is a short summary of the provided case studies:

APACHE (Acute Physiology and Chronic Health Evaluation) series of models
Predicts the individual patient's risk of hospital death, based on a variety of physiological variables
(using linear regression)

Pneumonia Severity of Illness Index
Predicts the risk of death within 30 days for adult patients with pneumonia. The authors claim that
using this model, 26 to 31 percent of patients can be treated safely as outpatients, which would
result in savings of more than 1.2 billion dollars per year in the US.
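The regression-based risk scores described above can be sketched as follows. This is a hedged illustration only: the feature names, coefficients, and data are invented, and are not the actual APACHE or Pneumonia Severity Index variables.

```python
# Hypothetical sketch of a regression-based mortality-risk score, in the
# spirit of APACHE / the Pneumonia Severity Index. Feature names, data,
# and coefficients are invented for illustration, not from the chapter.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Synthetic physiological variables: age, heart rate, systolic BP
X = np.column_stack([
    rng.normal(65, 12, n),    # age (years)
    rng.normal(90, 20, n),    # heart rate (bpm)
    rng.normal(120, 25, n),   # systolic blood pressure (mmHg)
])
# Synthetic outcome: risk grows with age and heart rate, falls with BP
logit = 0.04 * (X[:, 0] - 65) + 0.02 * (X[:, 1] - 90) - 0.03 * (X[:, 2] - 120) - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
# Predicted mortality risk for one hypothetical patient
patient = np.array([[78, 110, 95]])
risk = model.predict_proba(patient)[0, 1]
print(f"estimated risk: {risk:.2f}")
```

The appeal of such models, as the chapter notes, is that each coefficient can be read off and sanity-checked by a clinician.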

Penetration and Availability of Clinical Decision Support in Commercial Systems

This chapter provides an overview of the progress of availability
of commercial Clinical Decision Support systems and their usage
in health care organizations. In particular, the chapter focuses
on the progress of availability and usage of CPOE (Computerized
Physician Order Entry) and "Complex CDS" from 2002 to 2005.
The chapter also discusses a number of challenges for deploying
CDS systems, from the point of view of healthcare professionals.

The HELP system is one such CDSS. It has been developed and tested for more than 25 years and has been installed in about 20 hospitals operated by Intermountain Healthcare (IHC).
Clinical decision support in the HELP system comprises two subsystems: one for defining the data used to make decisions, and the other for encoding the conversion of raw data into decisions. The system includes two types of CDSS. The first focuses on limited medical conditions and uses simple logic that does not require much data. The second tries to predict a diagnosis among a set of diagnostic entities using raw data. The alerting system, critiquing system, and suggestion system are three of these experimental diagnostic applications. All are rule-based systems, and the protocols are created by physicians, nurses, and specialists.
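The "simple logic" style of rule-based alerting described above can be sketched in a few lines. The rule names, thresholds, and patient fields below are hypothetical, not taken from the HELP system.

```python
# Minimal sketch of rule-based CDSS alerting: each rule inspects a few
# raw data values and fires an alert message. Thresholds and field
# names are invented for illustration.

def potassium_alert(patient):
    """Fire on dangerously high serum potassium (hypothetical cutoff)."""
    k = patient.get("potassium_mmol_l")
    if k is not None and k > 6.0:
        return f"ALERT: hyperkalemia (K = {k} mmol/L)"
    return None

def drug_allergy_alert(patient):
    """Fire when an ordered drug matches a recorded allergy."""
    overlap = set(patient.get("orders", [])) & set(patient.get("allergies", []))
    if overlap:
        return f"ALERT: ordered drug(s) {sorted(overlap)} conflict with allergies"
    return None

RULES = [potassium_alert, drug_allergy_alert]

def evaluate(patient):
    """Run every rule and collect the alerts that fired."""
    return [msg for rule in RULES if (msg := rule(patient))]

alerts = evaluate({
    "potassium_mmol_l": 6.4,
    "orders": ["penicillin"],
    "allergies": ["penicillin"],
})
print(alerts)
```

In practice, as the summary notes, the protocols behind such rules are authored by physicians, nurses, and specialists rather than hard-coded by developers.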
Another system in HELP is the diagnostic decision support system (DDSS), which differs from the clinical decision support systems. There are two types of applications in this system: proven diagnostic applications and research into complex diagnostic applications. The proven diagnostic applications include "adverse drug events", a rule-based system, and "nosocomial infections", which is based on several decision algorithms and uses logistic regression to estimate the risk of nosocomial infection. The DDSS also contains an "antibiotic assistant", which collects relevant data for the physician and suggests an appropriate course of therapy for the patient. It also allows the physician to review the hospital's short- and long-term experience with infections. Research into complex diagnostic applications is used to assist in collecting data using questionnaires and to assess the quality of medical reports.

This paper discusses using Bayesian networks for disease outbreak detection. The authors evaluate a system called PANDA (Population-wide ANomaly Detection and Assessment). The focus is on modeling non-contagious outbreak diseases, such as airborne anthrax. The network used in this work is divided into three groups of nodes: global (G), interface (I), and people (P). Two assumptions hold in this network: first, the I nodes d-separate the P subnetworks from each other; second, they also d-separate the nodes in G from the nodes in P.
In the network, people with the same attributes are grouped into the same class to simplify the calculations. Because this technique is applied to millions of people, the calculations would otherwise be time consuming; so instead of recalculating everything each time the status of a person changes, they incrementally update the calculation.
The evaluation is done using a simulator: simulated ED cases are injected into a background of actual ED cases. Given weather conditions (from historical meteorological data for the region) and parameters for the location and amount of airborne anthrax, a Gaussian plume model derives the concentration of anthrax spores estimated to exist in each zip code. Finally, the authors compare a non-spatial model with a spatial model, and find that the spatial data yields better results in terms of false positive rate.
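The equivalence-class trick described above can be illustrated with a toy posterior computation: since people with identical attributes contribute identical likelihood terms, one term raised to the class size replaces a loop over individuals. All probabilities here are invented placeholders, not PANDA's actual parameters.

```python
# Toy sketch of equivalence-class inference for outbreak detection.
# P(symptom | outbreak state) and the prior are invented numbers.
import math

# P(a person shows ED respiratory symptoms | outbreak state)
P_SYMPTOM = {"outbreak": 0.02, "no_outbreak": 0.005}
P_OUTBREAK_PRIOR = 0.001

def posterior_outbreak(n_symptomatic, n_total):
    """P(outbreak | n_symptomatic of n_total people), people i.i.d.

    People split into two equivalence classes (symptomatic or not), so
    each log-likelihood is two terms scaled by class counts, rather
    than a sum over all n_total individuals.
    """
    loglik = {}
    for state, p in P_SYMPTOM.items():
        loglik[state] = (n_symptomatic * math.log(p)
                         + (n_total - n_symptomatic) * math.log(1 - p))
    prior = {"outbreak": P_OUTBREAK_PRIOR, "no_outbreak": 1 - P_OUTBREAK_PRIOR}
    joint = {s: prior[s] * math.exp(loglik[s]) for s in loglik}
    return joint["outbreak"] / sum(joint.values())

print(posterior_outbreak(n_symptomatic=40, n_total=1000))   # elevated case count
print(posterior_outbreak(n_symptomatic=3, n_total=1000))    # baseline case count
```

PANDA's real network conditions on attributes like zip code and symptom type, but the computational saving comes from exactly this kind of grouping.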

The paper discusses applying POMDPs (Partially Observable Markov Decision Processes) to an already
existing outbreak detection method, in order to improve the accuracy of current detection methods.
The main idea is that one can quantify the potential costs and effects of intervention, and use them
to optimize the alarm function. The model consists of:

A cost associated with each action: investigation (false and true positive),
intervention (false and true positive), and each outbreak day (false negative)

The paper concludes that POMDPs can improve the accuracy of current outbreak detection methods. However,
there still seems to be an unacceptable level of false alarms: roughly 3 false alarms in every 100 days!
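The cost-based alarm idea can be sketched as a one-shot decision rule: given a posterior outbreak probability, pick the action with the lowest expected cost. The cost values below are illustrative placeholders, not the paper's numbers, and a real POMDP would also reason over future days rather than a single step.

```python
# Hedged sketch of cost-sensitive alarm decisions. Costs are invented.
COSTS = {
    # (action, outbreak actually occurring?) -> cost
    ("wait", True): 1000.0,         # missed outbreak day (false negative)
    ("wait", False): 0.0,
    ("investigate", True): 300.0,   # investigation delays response somewhat
    ("investigate", False): 10.0,   # cheap false-positive investigation
    ("intervene", True): 100.0,     # early intervention limits the damage
    ("intervene", False): 500.0,    # costly false-positive intervention
}

def best_action(p_outbreak):
    """Pick the action minimizing expected cost under p_outbreak."""
    def expected(action):
        return (p_outbreak * COSTS[(action, True)]
                + (1 - p_outbreak) * COSTS[(action, False)])
    return min(("wait", "investigate", "intervene"), key=expected)

print(best_action(0.005), best_action(0.05), best_action(0.9))
```

With these placeholder costs the policy waits at low probability, investigates at intermediate probability, and intervenes only when an outbreak is nearly certain, which is exactly the kind of graded alarm function the paper argues for.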

Chen et al. investigate the possible effects of multiple drug exposures at different stages of pregnancy on preterm birth, using SmartRule, a data mining technique for generating association rules. Two subsets of the Danish National Birth Cohort (DNBC) dataset are used. The first subset contains 4454 records, including 1000 women who were depressed and/or exposed to various active drugs; this set is used for finding the side effects of anti-depression drugs. The second subset contains 6231 records, including 414 preterm cases; this set is used for finding the side effects of multiple types of drugs. The authors develop a hierarchical tree model for organizing the generated rules, to make it easier for human experts to recognize interesting rules. Using this system, the authors claim to be able to find novel and interesting rules.

Publication: HIKM '06: Proceedings of the international workshop on Healthcare information and knowledge management

Year: 2006

Presenter: Mojdeh

Ordonez applies different classifiers, an associative classifier and decision trees, for predicting the percentage of vessel narrowing (LAD, RCA, LCX, and LM) compared to a healthy artery [35]. The dataset contains 655 patient records with 25 medical attributes. Three main issues with mining association rules in medical datasets are noted in this work: a significant fraction of association rules are irrelevant; most relevant rules with high quality metrics appear only at low support; and the number of discovered rules becomes extremely large at low support. Hence, association rules are mined under constraints. Each item corresponds to the presence or absence of one categorical value or one numeric interval. The first constraint places a limit on the maximum itemset size. Second, the items are grouped, and each association contains at most one item from each group. The third constraint is that each item can appear only in the antecedent or only in the consequent. The results from the associative classifier are compared with two decision tree algorithms, C4.5 and CART. The authors demonstrate that association rules can do better than decision trees for predicting diseased arteries.
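The three constraints can be sketched on a tiny invented transaction set (not the 655-patient dataset). Items are (group, value) pairs; candidate rules are enumerated and kept only if they satisfy the constraints plus minimum support and confidence. The groups, values, and thresholds below are hypothetical.

```python
# Illustrative sketch of constrained association-rule mining.
# Data, groups, and thresholds are invented for illustration.
from itertools import combinations

MAX_ITEMSET = 3                          # constraint 1: cap on itemset size
ANTECEDENT_GROUPS = {"age", "smoker"}    # constraint 3: left-hand-side groups
CONSEQUENT_GROUPS = {"artery"}           # constraint 3: right-hand-side groups

transactions = [
    {("age", "old"), ("smoker", "yes"), ("artery", "LAD_narrowed")},
    {("age", "old"), ("smoker", "yes"), ("artery", "LAD_narrowed")},
    {("age", "young"), ("smoker", "no"), ("artery", "healthy")},
    {("age", "old"), ("smoker", "no"), ("artery", "healthy")},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def one_per_group(itemset):              # constraint 2
    groups = [g for g, _ in itemset]
    return len(groups) == len(set(groups))

rules = []
items = sorted({i for t in transactions for i in t})
for size in range(2, MAX_ITEMSET + 1):
    for itemset in map(frozenset, combinations(items, size)):
        if not one_per_group(itemset):
            continue
        ante = {i for i in itemset if i[0] in ANTECEDENT_GROUPS}
        cons = {i for i in itemset if i[0] in CONSEQUENT_GROUPS}
        if not ante or not cons or support(itemset) < 0.25:
            continue
        confidence = support(itemset) / support(frozenset(ante))
        if confidence >= 0.8:
            rules.append((sorted(ante), sorted(cons), confidence))

for rule in rules:
    print(rule)
```

The constraints prune the search space drastically, which is what makes low-support mining feasible on medical data with many attributes.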

Zhang et al. introduce a framework for video mining of in vivo microscopy images. The goal is to track leukocytes in order to predict inflammatory response. In vivo microscopy allows researchers to capture images of cellular and molecular processes in a living organism. However, automatic mining of the imagery is challenging due to severe noise, background movement of the living organism, and changes of contrast between frames. Zhang et al. first apply a frame alignment technique, using RANSAC, to correct for camera-subject movement, and then apply a number of probabilistic methods to detect moving leukocytes. Adherent leukocytes are detected, after the moving ones are removed, by thresholding contrast values. The experimental results show 1% false positives and 50% recall on detecting moving leukocytes, and 2% false positives and 95% recall on detecting adherent leukocytes.
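The RANSAC alignment step can be sketched in its simplest form: estimate the dominant translation between matched points across two frames while ignoring outliers. A real pipeline would use richer motion models and actual keypoint matches; the matches below are synthetic.

```python
# Rough sketch of RANSAC-style frame alignment on synthetic matches:
# 80 inliers follow a fixed camera shift, 20 matches are random outliers.
import random

random.seed(1)
true_shift = (3.0, -2.0)
matches = [((x, y), (x + true_shift[0], y + true_shift[1]))
           for x, y in [(random.uniform(0, 100), random.uniform(0, 100))
                        for _ in range(80)]]
matches += [((random.uniform(0, 100), random.uniform(0, 100)),
             (random.uniform(0, 100), random.uniform(0, 100)))
            for _ in range(20)]

def ransac_translation(matches, iters=200, tol=1.0):
    """Estimate a 2D translation: hypothesize from one random match,
    score by counting matches consistent with it, keep the best."""
    best_shift, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = random.choice(matches)
        cand = (x2 - x1, y2 - y1)          # model from a minimal sample
        inliers = sum(abs((bx - ax) - cand[0]) < tol and
                      abs((by - ay) - cand[1]) < tol
                      for (ax, ay), (bx, by) in matches)
        if inliers > best_inliers:
            best_shift, best_inliers = cand, inliers
    return best_shift, best_inliers

shift, n_inliers = ransac_translation(matches)
print(shift, n_inliers)
```

Because the consensus set dominates, the recovered shift matches the true camera motion even with 20% gross outliers, which is why RANSAC suits the noisy in vivo setting.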

Brown et al. apply Support Vector Machines to gene expression data to classify genes by functionality. This is based on previous experiments suggesting that genes with similar functionality have similar patterns in microarray data. The authors claim that SVMs are well suited to the problem of microarray gene classification, because they perform well in extremely high-dimensional feature spaces. A training set is generated by combining the DNA microarray data of a set of genes that have a certain functionality (i.e., positive labels) and a set of genes known not to be members of this functional class (i.e., negative labels). Once the SVM is trained on this set, it can determine whether a new gene belongs to the functional class or not. The authors apply SVMs, with a number of different kernels, to gene expression data from the budding yeast Saccharomyces cerevisiae, with 5 predefined functional classes. The prediction performance of the SVM is compared to that of a number of other classification methods, including decision trees, Fisher's linear discriminant, and Parzen windows. The authors claim that the SVM outperforms all the other classification methods.
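The classification setup can be sketched as follows. The data here is synthetic (it is not the yeast microarray data), and the number of expression conditions is a hypothetical placeholder; positive-class genes are simply given a shared, correlated expression pattern.

```python
# Hedged sketch of SVM-based functional classification of genes on
# synthetic expression profiles. Dimensions and data are invented.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_genes, n_conditions = 200, 80   # hypothetical experiment count
# Positive-class genes share a correlated expression pattern
pattern = rng.normal(0, 1, n_conditions)
X_pos = pattern + rng.normal(0, 0.5, (n_genes // 2, n_conditions))
X_neg = rng.normal(0, 1, (n_genes // 2, n_conditions))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * (n_genes // 2) + [0] * (n_genes // 2))

clf = SVC(kernel="rbf").fit(X, y)   # the paper also compares other kernels
new_gene = pattern + rng.normal(0, 0.5, (1, n_conditions))
print("member of class:", bool(clf.predict(new_gene)[0]))
```

One classifier is trained per functional class, so a new gene is tested against each class's SVM independently.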

Author(s): Tasha R. Inniss, John R. Lee, Marc Light, Michael A. Grassi, George Thomas, and Andrew B. Williams

Publication: TMBIO '06: Proceedings of the 1st international workshop on Text mining in bioinformatics, 2006, pp. 7--14

Year: 2006

Presenter: Amit

This paper deals with issues that came up in the initial steps of building an ontology for Age-related Macular Degeneration (AMD). The issues discussed are not confined to a specific area or subfield, and hence apply to ontology building in other areas. A comparison of three popular methods, viz. text mining, Natural Language Processing (NLP), and manual (human) analysis, was carried out. On the basis of descriptions of AMD images by retinal specialists, the NLP results, in the form of log-likelihood ratios for bi-grams, are compared to text mining using SAS Text Miner. It is found that text mining agrees more often with the human expert than NLP does. However, the results of both techniques, NLP and text mining, show that there were some important bi-grams that the human expert missed. The authors provide a schematic diagram of the complete process of ontology generation that they hope to carry out in the future.

Novel approaches for small biomolecule classification and
structural similarity search

In this paper the authors discuss the classification of biomolecules based on structural similarity. It is a well-established fact that biomolecules with similar structure have similar biological and physico-chemical properties. The paper focuses on determining similarity distance measures for biomolecules, and then using a k-NN classifier to classify the biomolecules based on those measures. The authors also propose a data structure for fast similarity search called the DMVP tree, which is an improvement over the previously proposed SCVP tree. Some of the similarity distance measures considered are the Tanimoto coefficient, the Minkowski coefficient, etc. In their experiments the authors find that k-NN does better than Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR) in almost all cases. Also, k-NN was found to be as much as 100 times faster than the other methods. The authors believe that k-NN classification techniques could help rationalize the design and discovery of new drugs.
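The core idea, k-NN over Tanimoto similarity on binary structural fingerprints, can be sketched briefly. The fingerprints (sets of on-bits) and class labels below are made up; real descriptors would be derived from the molecules' structures.

```python
# Sketch of k-NN classification with the Tanimoto coefficient.
# Fingerprints and labels are invented for illustration.

def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints, given as sets
    of on-bit indices: |intersection| / |union|."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def knn_classify(query, training, k=3):
    """Majority vote among the k most Tanimoto-similar training molecules."""
    ranked = sorted(training, key=lambda m: tanimoto(query, m[0]), reverse=True)
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

training = [
    ({1, 2, 3, 4}, "kinase_inhibitor"),
    ({1, 2, 3, 5}, "kinase_inhibitor"),
    ({1, 2, 4, 6}, "kinase_inhibitor"),
    ({7, 8, 9}, "antibiotic"),
    ({7, 8, 10}, "antibiotic"),
]

print(knn_classify({1, 2, 3, 6}, training))   # fingerprint near the first group
```

A naive k-NN scan is linear in the size of the training set; the DMVP tree proposed in the paper exists precisely to prune this search.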

In this paper the authors use real-life case histories to put forward the advantages of a
Clinical Data Warehouse (CDW). The authors consider a CDW that was constructed around a claims
and authorization processing OLTP system. Before the CDW was in use, the company had problems
running aggregate queries because the OLTP procedures for generating reports would slow down
the system. With the CDW, not only were the two processes more streamlined, but the company's
information services department was able to run more time-intensive, higher quality queries
on the CDW. The authors suggest using a separate data mart for each department, and then
feeding the data from the data marts into the CDW; however, in our opinion a CDW should be
able to satisfy needs at the enterprise level. The idea of data marts seems redundant and
expensive, and will probably introduce an extra level of complexity. The authors also
describe the various categories of OLAP, such as Relational OLAP (ROLAP), multidimensional
databases, and Web-based OLAP.

From a paper-based transmission of discharge summaries to
electronic communication in health care regions

This paper discusses the transition from paper-based discharge summaries sent out to general practitioners (GPs) to paperless electronic transmission, and the benefits derived thereof. The health@net project was envisioned, in Austria, as a long-term project to achieve the transition in four steps, with the lessons learned at each step applied to the steps that followed. The systems used by the GPs previously followed the EDIFACT format, which cannot carry images and multimedia content; by changing those systems over a period of time and using the PDF format, this hurdle was overcome. Initially, LDAP-related problems were observed in the transmission of the data; these errors were ironed out by using a public key infrastructure (PKI). Other problems encountered were related to legal aspects, organizational issues, and hesitation on the part of the general practitioners to join the project.

Publication: SDSOA '07: Proceedings of the International Workshop on Systems Development in SOA Environments

Year: 2007

Presenter: Amit

The authors discuss the need for Service Oriented Architecture (SOA) in the field of Health Informatics. SOA helps in the integration of data and services across different operating systems, file types, network protocols, etc. Such a scenario is common in Health Informatics, where data is fed from different sources (hospitals, OLTP systems, data marts, etc.), each of which could have a different architecture. The authors describe the services of Decision Support Systems in terms of a flow with two components: the best-services flow and the customizable-services flow. While the best-services flow represents the best clinical practices developed by researchers and practitioners, the customizable-services flow should allow users to generate new guidelines. The amalgamation of mined knowledge with best-practice workflows will allow professionals to make more accurate decisions.

The Relevance of Data Warehousing and Data Mining in the Field of Evidence-Based Medicine to
Support Healthcare Decision Making

In this paper the authors discuss how data mining and data warehousing can play an important role in the field of evidence-based medicine. They describe how the field of medicine is changing, and how this change calls for clinical treatment pathways based on population aggregations with similar symptoms, coupled with the individual characteristics of the patients. The use of such technologies for controlling clinical treatment pathways also helps make moderately accurate predictions, such as (1) the hospitalization time of the patient, at the time the patient is checked in, (2) the resources that will be required to treat the patient, (3) the amount and kind of medicine that will be needed, and (4) the staff schedule required to ensure smooth operation. The data in the DW, combined with various data mining techniques, can help the clinician at the point of care by generating reports on the fly, as desired by the clinician, leading to quicker and better informed decisions. The authors discuss how one can cut down on various costs, such as operations and treatment costs. They provide an example of how the administration can regulate the number and kind of clinical tests based on the results of data mining for patients with similar medical problems.

In this paper the authors describe the construction of a CDW for analyzing obstetrical data in order to identify the factors that contribute to pre-term births. The authors use Microsoft SQL Server for storing the data, and a method called Exploratory Factor Analysis to carry out the data mining. This method uses statistical analysis to identify data elements that can be combined to account for the variation between the two patient groups, normal and pre-term. The authors mention that this method is ideally suited to cases where the number of data points is large while there is very little information regarding the mutual dependence (or independence) of attributes. The authors go on to say that this study, with such a huge database, would not have been possible without a Clinical Data Warehouse: previously their data resided in legacy systems, and integration issues alone would have discouraged its use.
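The factor-analysis idea, many observed variables driven by a few latent factors, can be sketched on synthetic data. The variable counts and the data below are invented; they are not the obstetrical dataset.

```python
# Illustrative sketch of exploratory factor analysis: eight observed
# variables generated from two hidden factors plus noise. All data
# here is synthetic.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 2))          # two hidden factors
loadings = rng.normal(size=(2, 8))        # how factors drive observations
X = latent @ loadings + rng.normal(scale=0.3, size=(n, 8))

fa = FactorAnalysis(n_components=2).fit(X)
# Variables that load heavily on the same factor "combine" to explain
# shared variation, e.g., between normal and pre-term groups
print(np.round(fa.components_, 2))
```

In the study's setting, the estimated loadings tell the analyst which recorded attributes vary together, without assuming their dependence structure in advance.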

This paper compares various methodologies in practice for building a DW, and discusses the pros and cons of each. The authors also discuss various issues with the data warehousing products of different vendors. Financial issues in shopping for a data warehouse, such as the scalability of the DW, are also addressed; the authors provide insight into the functionality one should look for when shopping for a DW in order to get the most for one's money. Finally, the authors discuss important issues such as change management that are currently not supported by any of the vendors, even though these issues are very important in the lifecycle of a DW and play an important role in its long-term success or failure.

This paper discusses the role of data warehouses in disease management, and how data mining techniques are being applied to clinical data warehouses to identify individuals at risk for targeted diseases such as asthma and diabetes. This is done by analyzing clinical test data obtained from labs, drug prescriptions, etc., looking for factors that are early signs of a disease. By recruiting patients who have been identified as at risk, one can manage diseases better and avoid the expense of treating the disease at a later stage.
A clear distinction is made between the ETL step in ordinary data warehouses and in disease management data warehouses. Data such as a patient's address and workplace could shed light on the health and environmental hazards they face leading up to the disease, and hence needs to be kept; this is a clear distinction from ordinary data warehouses. The author describes the steps involved in building a data warehouse or a data mart, and the differences between the two. For organizations with limited financial resources, the author advises building a Data Mart first, with an open design, so that its scope can be broadened later if need be.

The authors cover, in great detail, the functionalities and requirements of a CDSS and how it can improve the quality of patient health care in the future. Some of the core requirements underlined by the authors in 2001, when the article was published, have become part and parcel of today's Health Information Systems (HIS). The paper describes how a CDSS can play a dual role: providing point-of-care services for each patient, and at the same time carrying out aggregate analyses that can help build better models and make more accurate predictions for the outbreak, treatment, and prevention of diseases. The authors discuss how these approaches can deliver an effective, efficient, and streamlined Health Information System for the masses. They also discuss the personalization of medicine through technologies such as genetic decoding, which can be used to make early predictions about foreseeable events based on a person's genetic make-up. Finally, the article describes how a CDSS can help increase the profitability of insurance companies through the identification of risk factors, should the companies decide to invest in prevention programs.

This paper discusses the issues involved in constructing an in-house clinical data warehouse. The authors stress that a DW is a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of a business. A DW can be considered a pro-active approach to information integration, as compared to the more traditional passive approaches where processing and integration start when a query arrives. The authors describe different methodologies for the construction and design of a DW: they tested five different methodologies, and carry out a comparative analysis of them in the context of the kind of data warehouse they are building. They present a case study of the problems encountered, such as integration and the Extraction-Transformation-Loading (ETL) process, when constructing their own DW.

Towards shared patient records: An architecture for using routine
data for nationwide research

In this paper the authors introduce an architecture, called eardap, for sharing patient health records amongst various clinical trial centers in Germany for research in pediatric oncology. This is a complicated task for the following reasons: (1) the clinical trial centers are geographically spread throughout Germany, (2) the data to be transmitted can be very complex, and sometimes huge, (3) the underlying system architectures at the various centers can be heterogeneous, and (4) security issues. Eardap has a modular architecture with core components such as a Terminology Management System (TMS) and a core system for data recording and management. The remaining modules are used only to extend the data in the core system when necessary, for instance to answer a specific research question for which the data is not relevant for all patients.

Toward Best Practice: Leveraging the Electronic Patient Record
as a Clinical Data Warehouse

This paper puts forward a strong case for building a Data Warehouse (DW). While the transactional database handles clinical transactions in the data entry and data review modes and provides point-of-care solutions, a CDW sits on top of the transactional database, extracting the transactional data and reorganizing it for aggregate analysis. Since the architectural design of OLTP databases is very different from that of CDWs (OLAP), it is imperative that they be separate systems. The author discusses the case of the University Health Network (UHN) in Toronto, where a CDW called Decision 1 has been used for optimizing costs, cutting down unnecessary lab testing, optimizing anti-microbial therapy, promoting best clinical practices, etc. All of these changes are based on a set of rules derived from the feedback obtained from the CDW. Most of this feedback takes the form of alerts, for physicians, nurses, and administrators alike, whenever a practice is being followed that differs from the best practice in that area.

This paper brings out the ways in which a CDW differs from other kinds of data warehouses. Stricter timelines and smaller windows of operation for the ETL process differentiate a CDW from other data warehouses. Another important distinction is the use of materialized views based on aggregate analysis: it is not always possible to create materialized views, because much medical data, such as pulse rate and heart rate, has no meaning if aggregated.
Data modeling is a big challenge for CDWs, and the designer needs to thoroughly understand the operations of the hospital as well as the medical procedures followed by the doctors. Unlike those of other businesses, data models for CDWs are very complex. Data integration is also much more complex, because it too can depend on the procedures followed in the hospital, requiring a thorough knowledge of these procedures beforehand. Most off-the-shelf software designed for ETL tasks does not work for CDWs, and buying specialized software may cost millions of dollars.

Integrating feedback from a clinical data warehouse into practice organisation

This paper touches on all the beneficial aspects of a CDW, but the point emphasized here is that one can learn a lot from the feedback a CDW provides. Also, one should not stop at learning about the deficiencies of the system in place in, e.g., healthcare institutions, but should make changes and track them periodically to analyze whether they result in any net benefit to the system. With a real-life example of a study on the excessive use of blood-gas measurements, the authors show that getting feedback becomes a complex task without a CDW. A CDW can fulfill the demands of administration, such as preventing or predicting overcrowding in hospitals, giving useful feedback by scientifically analyzing the need for clinical tests based on lab test data, and giving physicians better access with a lower turn-around time. The paper also stresses that monitoring the changes applied as a result of the feedback leads to (1) the use of best clinical practices, (2) an opportunity for clinicians and others to improve, and (3) a performance indicator for the clinicians and the nursing staff. Sadly, there is a lot of resistance to its use, because people misconstrue it as something used for monitoring individuals.