Abstract: The substantial growth in the number of clinical images stored as a result of advances in digital imaging technologies necessitates effective methods to query and retrieve relevant images. Medical image retrieval can be performed using text-based methods that use the captions associated with the images, content-based methods, or multimodal methods in which these techniques are combined.

Here, the focus is on image retrieval strategies that are primarily applicable to general image retrieval in databases containing images acquired with a variety of imaging modalities and resolutions and with varying quality of annotations. We present three key aspects of the system developed for the ImageCLEF (www.imageclef.com) challenge evaluation. The first is the use of image processing and machine learning techniques to attach additional tags to the images in the system. The second is a query classifier that can be used to appropriately weight results from the textual and visual components of a fusion image retrieval system. The final aspect of our system is the use of distance learning from relevance feedback to better understand the images the user was searching for.

We have shown that these enhancements improve early precision over a standard text-based retrieval system.
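The weighting of textual and visual components mentioned above can be illustrated with a minimal late-fusion sketch. This is hypothetical code, not the ImageCLEF system itself; the per-query weight `alpha` stands in for the output of the query classifier, and all names are invented.

```python
# Hypothetical late-fusion sketch: per-image scores from a text engine
# and a visual engine are combined with a query-dependent weight alpha
# (standing in for the query classifier's output).
def fuse_scores(text_scores, visual_scores, alpha):
    """Rank images by a weighted sum of normalized component scores.

    alpha near 1.0 trusts the text-based component; near 0.0 trusts
    the content-based (visual) component.
    """
    images = set(text_scores) | set(visual_scores)
    fused = {
        img: alpha * text_scores.get(img, 0.0)
        + (1.0 - alpha) * visual_scores.get(img, 0.0)
        for img in images
    }
    return sorted(fused, key=fused.get, reverse=True)

# A "textual" query: the classifier would assign a high alpha.
text_scores = {"img1": 0.9, "img2": 0.4}
visual_scores = {"img2": 0.8, "img3": 0.6}
ranking = fuse_scores(text_scores, visual_scores, alpha=0.7)
```

In a real system the two score sets would come from separate retrieval engines and would need score normalization before fusing.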

Abstract: Disease is one of the most important concepts in medical documents such as patient medical records, clinical literature, and health-related information on the web. A dictionary of diseases is a critical source of background knowledge for many medical language-processing systems, and the performance of these systems often depends on the comprehensiveness of the underlying dictionary. Most disease dictionaries, however, are manually created and cater to a specific target audience, thereby limiting their comprehensiveness. We have developed an automated, unsupervised, iterative pattern-learning approach for constructing a comprehensive, unbiased, and self-updating medical dictionary of diseases from a large collection of randomized clinical trial (RCT) abstracts. Our dictionary comprises 1,922,283 diseases. When used to identify disease concepts in 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 by 35-88%) over seven of the most widely used disease dictionaries: SNOMED, Clinical Trial, Cochrane reviews, Cochrane Economic review, Cochrane technology assessment, OMIM, and PharmGKB.
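One round of the iterative pattern-learning loop can be sketched as follows. This is an illustrative toy, not the authors' implementation: contexts here are two-word prefixes and candidates are single tokens, whereas a real system learns richer patterns and captures multi-word disease phrases.

```python
import re

# Toy sketch of one bootstrap iteration (not the authors' system):
# seed disease names induce two-word left contexts, and those contexts
# then harvest new single-token candidate terms.
def learn_contexts(sentences, seeds):
    """Collect the two words that immediately precede any seed term."""
    contexts = set()
    for s in sentences:
        for seed in seeds:
            m = re.search(r"(\b\w+ \w+) " + re.escape(seed), s)
            if m:
                contexts.add(m.group(1))
    return contexts

def harvest_terms(sentences, contexts):
    """Extract the token following any learned context as a candidate."""
    candidates = set()
    for s in sentences:
        for c in contexts:
            candidates.update(re.findall(re.escape(c) + r" ([a-z]+)", s))
    return candidates

abstracts = [
    "patients diagnosed with asthma were randomized",
    "patients diagnosed with bronchitis received placebo",
]
contexts = learn_contexts(abstracts, {"asthma"})
new_terms = harvest_terms(abstracts, contexts)
# The learned context "diagnosed with" also pulls in "bronchitis".
```

Iterating the two steps, with filtering of noisy patterns and terms, is what lets the dictionary grow and self-update from new RCT abstracts.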

Abstract: In recent years, cryo-electron microscopy has become a valuable tool in structural biology. The ability to observe large protein complexes in their native environments has opened a new window into the inner workings of cells and viruses. Often, high resolution structures of individual subunits are available, which can be used to assemble an atomic-level model of the complete system. However, existing docking methods are only successful at resolutions down to 15-25 Å. While near-atomic resolution has been achieved for a number of systems, many others can only be observed at intermediate or low resolution due to experimental constraints. I will present both a new pattern matching approach and a user-interactive docking solution which combine to enable atomic structure docking down to 45-60 Å resolution. The pattern matching employs a Hessian filter which boosts both docking accuracy and contrast considerably. The interactive phase of the docking procedure allows the user to explore the pattern matching results and select relevant candidate solutions for further examination. While the present work is aimed at a better understanding of molecular biology, the new techniques are also readily adaptable to clinical tasks such as image and volume processing of X-ray or MRI data.

Abstract: Probabilistic topic models attempt to represent documents as mixtures of topics, with each topic having a different probability distribution over words. These models have been used to process language data for purposes of document classification and information retrieval. This project focuses on using probabilistic topic models for radiology reports. We apply the latent Dirichlet allocation (LDA) model and correlated topic model (CTM) to a corpus of reports from the electronic medical record for patients with brain cancer. Both are generative latent variable models, but they differ in their modeling of the topic distribution. The strength of the models is judged using perplexity, a metric used in the language modeling community. Future work will focus on using these models to classify reports from a patient’s medical record and incorporating imaging data into the models.
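The perplexity metric used to judge the models can be sketched in a few lines. This is illustrative code with toy topics, not the project's implementation: perplexity is the exponentiated negative mean log-likelihood of held-out words, so lower values mean the model is less "surprised" by unseen text.

```python
import math

# Illustrative sketch of perplexity for a topic model (not the
# project's code). theta[k] = P(topic k | document);
# phi[k][w] = P(word w | topic k).
def word_log_prob(theta, phi, w):
    """Log-probability of word w under a topic mixture."""
    return math.log(sum(t * p[w] for t, p in zip(theta, phi)))

def perplexity(log_probs):
    """Perplexity over held-out words given their log-probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Two toy topics over a two-word vocabulary (invented numbers).
theta = [0.5, 0.5]
phi = [{"tumor": 0.9, "edema": 0.1}, {"tumor": 0.1, "edema": 0.9}]
held_out = ["tumor", "edema", "tumor"]
pp = perplexity([word_log_prob(theta, phi, w) for w in held_out])
# Each word has probability 0.5 here, so perplexity is exactly 2.
```

A uniform model over a vocabulary of size V has perplexity exactly V, which is the usual baseline against which LDA and CTM are compared.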

Abstract: The ability to solve chemistry problems ranging from retro-synthetic analysis of natural product drugs and combinatorial library design to reaction prediction and discovery has traditionally been the domain of human experts. Presented here is an expert computer system, based on over 1,200 manually composed reaction pattern rules, that is capable of solving many of these same problems. Applications of this expert system will be demonstrated in reaction prediction, including complete mechanism diagrams for interactive learning in organic chemistry and for validation of synthesis plans, and in retro-synthetic analysis with automated combinatorial library design, available over the Web. Select applications are available at http://cdb.ics.uci.edu.

Abstract: Although power changes have been identified as a consequence of implementing computerized physician order entry (CPOE), our study is the first to quantify those changes. We collected data from healthcare workers both before and after implementation of a CPOE system, and measured changes in their perceptions of power and attitudes toward CPOE. We observed a negative change in both the power and attitude variables, and found a direct correlation between them. Further analysis of the data determined the individual influence of French and Raven’s six power bases (Reward, Coercive, Expert, Informational, Legitimate, and Referent) on CPOE attitudes, and the relationship between power and the four aspects of CPOE attitudes (data, impact, use, and discretion) determined from exploratory factor analysis. Because negative attitudes toward CPOE reflect resistance, our work anticipates identifying modifications to the implementation process that might mitigate the effects on personal power and thereby decrease resistance.

Abstract: The relationship between health informatics and patient safety has many dimensions. Improvements in communication and documentation are sometimes obscured by the development of new sources of error and risk. Clinical environments and implementation experiences are unique and complicated; qualitative methods are therefore ideal for examining the consequences of technology implementation. This ethnographic study examines potential sources of patient-safety risk related to nursing workflow in barcode medication administration. Potential sources of risk discussed here include: 1) changes in communication genre, including a new process for nurse-to-nurse communication and an electronic medication administration record; 2) adaptation of practice to incorporate substantial new physical artifacts, such as mobile computers and scanners, into patient care; and 3) tighter coupling of information processing and language between pharmacy and nursing practice. Organizations can assess the potential local impact of these risks through 1) rapid qualitative assessment of work practices before, during, and after implementation and 2) engagement of nurses in reflective analysis of workflow and identification of these and other potential problems.

Abstract: Outbreaks of respiratory viruses result in dramatic surges in emergency department visits and hospital admissions, leading to overcrowding and associated logistical challenges for hospital administrators. Viral outbreaks are fueled by person-to-person spread, which is facilitated by conditions that keep humans indoors, chiefly inclement weather and/or air pollution. We hypothesized that these environmental variables and several clinical variables could serve as signals for a model predicting the daily inpatient census at a tertiary care children’s hospital. A forecasting tool for patient census allows improved staffing, better resource utilization and mobilization, and better timing of educational campaigns around the disease-control process. Using a neural network approach, we evaluated several models and variables for predicting patient census prospectively. This initial work enabled selection of a subset of predictor variables and showed that different network models and variables must be used depending on the season. Several good indicator variables have been identified that can be studied further to develop this model.

Abstract: Objective: Clinicians who refer patients for specialty evaluation implement only half of consultants’ recommendations, but implementation improves when facilitated. We sought to integrate electronic facilitation into an inpatient computerized provider order entry system. We hypothesized that the system could improve implementation and would be valued by clinicians.

Design and Measurement: We designed and produced a tool that enables consultants to enter detailed recommended orders electronically and allows primary teams to review, modify, and directly implement the recommendations in the system. We trained clinicians in its use. The system was piloted for ten weeks among referring internists and geriatrics consultants. Patients undergoing consultation were in a control (before pilot) or intervention (during pilot) group. For controls, internists received no facilitation. We used intention-to-treat analysis to review records and compare recommendations and implementation for cases and controls, and also surveyed internists about the system.

Results: Two geriatrics consultants provided 439 recommendations to approximately 40 internists for 40 patients (20 controls). Consultants created electronic versions of 190 (77%) of 247 intervention recommendations; among these, implementation was 86%. Internists implemented 192 (78%) of all 247 intervention recommendations, compared to 113 (59%) of 192 control recommendations (p<0.001 by chi-square). All 24 internists who responded to the survey indicated that the new tool improved the quality of consultation and saved time, and all wanted consultants to use the tool to enter future recommendations. Conclusion: Facilitating implementation of recommendations via electronic approval was highly valued and improved implementation rates by more than 30%. This system could lead to improved quality of care.
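The reported chi-square comparison can be sanity-checked directly from the counts given above (192 of 247 intervention vs. 113 of 192 control recommendations implemented); this is a reader's check, not the authors' analysis code.

```python
# Sanity check of the reported result: for a 2x2 table with one degree
# of freedom, a chi-square statistic above 10.83 corresponds to
# p < 0.001.
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: intervention, control; columns: implemented, not implemented.
chi2 = chi_square_2x2(192, 247 - 192, 113, 192 - 113)
assert chi2 > 10.83  # consistent with the reported p < 0.001
```

The statistic works out to roughly 18.2, comfortably above the df=1 critical value for p < 0.001, so the reported significance is consistent with the counts.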

Abstract: The MyHealtheVet (MHV) personal health record has more than 550,000 registered users, who have made more than 16 million visits to the site. MHV offers registered users access to information on health conditions and medications. Authenticated users can access all MHV functions, including refilling VA prescriptions online. Future MHV functions include secure messaging, wellness reminders, managing appointments, and viewing extracts from the VA electronic health record. The purpose of this study was to describe which site functions were being used by current MHV users. A random sample of visitors to the MHV website was offered an online survey during the last two months of 2007, using the American Customer Satisfaction Index methodology; a 13% random sample of users viewing 4 or more pages was prompted to take the survey. A total of 27,406 surveys were received (17% response rate), which compares favorably to the 6.9% mean response rate for online surveys deployed by other organizations. Ninety-three percent of respondents identified themselves as veterans. Half were older than 60, and 17% were over 70; this age distribution is the same as that of all registered MHV users. Most (81%) reported using the system at least once per month. Personal health information was used by 55%, and health education libraries were accessed by 50%. Seventy-seven percent reported a desire to use the system to obtain medication refills. These results indicate that MHV has good potential to meet the health information needs of veterans.

Abstract: Background: Establishing a relationship between medications and diagnoses within a functioning electronic medical record system (EMR) has many practical and valuable applications, such as improving the problem list; identifying non-indicated medications; organizing medication lists (for orders and e-Prescribing); problem-based information visualization; and order entry with decision support. Additionally, these relationships may provide novel insight into clinical practice when applied to historical medical records.

Methods: We evaluated over 1.6 million de-identified patient records from the Regenstrief Medical Record System (RMRS), containing over 90 million diagnoses and 20 million medications. Using the RxNorm, VA National Drug File Reference Terminology, SNOMED-CT (S-CT), and ICD-9 standard terminologies and mappings, we evaluated the linkage of local concept terms for medications and diagnoses.

Results: For medication terms to S-CT, we were able to map 1,885 terms to 1,347 concepts, accounting for 8.9 M and 7.6 M instances, respectively. For diagnosis terms to S-CT, we were able to map 44% of terms and 68% of instances. Overall, we were able to map 38% of diagnosis terms but 65% of instances. Conclusions: Medications can be mapped by machine to a disease/disorder using established terminology standards. This mapping should inform many knowledge-management and decision-support features in an EMR.

Abstract: Studies have shown that ventilator weaning protocols reduce the duration of mechanical ventilation, ventilator-associated pneumonia, and the rate of re-intubation compared to weaning directed by a physician. These studies have focused on the use of a daily screen and subsequent spontaneous breathing trial. This project monitors weaning status and provides continuous feedback on each patient’s status. If the patient qualifies for a change in ventilator settings and no change has been made, the medical staff is alerted by a change in color of a ventilator weaning indicator displayed on the unit’s whiteboard. The specific aim of this project is to improve adherence to the ventilator management and ventilator weaning protocols already in place in the medical intensive care unit. Protocol compliance and the time oxygen saturations remain in the goal range will be investigated. The hypothesis is that both of these outcomes will improve with the use of a ventilator weaning indicator.

Abstract: Clinical trials are important for testing scientific achievements for clinical effectiveness and moving promising discoveries from bench to bedside. However, studies report that only a tiny fraction of patients are enrolled in clinical trials, even in oncology, where clinical trial enrollment is considered particularly beneficial for individual patients. We present a proof-of-concept natural language processing (NLP) tool to automate the extraction of eligibility and exclusion medical-diagnosis criteria from web postings of clinical trial announcements. Our application relies on the automated problem-list extractor we are developing for the University of Washington’s electronic medical record system. Using the National Library of Medicine’s MetaMap program and the UMLS Metathesaurus, together with in-house program modules, we extract medical problems from clinical trial web announcements. We extract medical diagnoses from both the eligibility and exclusion sections of clinical trial study details. During the proof-of-concept phase we focus on cancer clinical trial announcements, but we will not limit our application to predefined diagnostic categories. We hypothesize that an NLP-based automated system could achieve sensitivity and specificity comparable to an oncologist in extracting medical diagnoses from clinical trial announcements. The system we present could serve as the foundation for automated clinical trial recruitment applications.

Abstract: The most common cause of disability in older adults in the United States is osteoarthritis (OA), a disease causing degeneration of articular cartilage and bone changes at the joints. To address the problem of early disease prediction, we have developed Bayesian belief networks composed of knee OA-related symptoms to support prognostic queries. The purpose of this study is to evaluate the utility of a static and a dynamic Bayesian belief network, based on the NIH Osteoarthritis Initiative (OAI) data, to predict the likelihood of a patient having knee OA. Initial validation of the model shows promising results, outperforming a baseline logistic regression model in several scenarios. We can conclude that our model can effectively predict the symptoms that are commonly associated with the presence of knee OA.
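The kind of prognostic query such a network answers can be illustrated with a toy example. All probabilities below are invented, and the structure is deliberately naive (symptoms conditionally independent given disease status); the actual OAI-based networks are larger and need not have this structure.

```python
# Toy prognostic query with invented probabilities: the posterior
# probability of knee OA given observed symptoms, assuming symptoms
# are conditionally independent given disease status.
def posterior_oa(prior, cpt, observed):
    """cpt[s] = (P(s present | OA), P(s present | no OA))."""
    p_oa, p_not = prior, 1.0 - prior
    for s in observed:
        p_given_oa, p_given_not = cpt[s]
        p_oa *= p_given_oa
        p_not *= p_given_not
    return p_oa / (p_oa + p_not)  # normalize over the two hypotheses

cpt = {"knee_pain": (0.8, 0.2), "stiffness": (0.7, 0.3)}
posterior = posterior_oa(0.3, cpt, ["knee_pain", "stiffness"])
# 0.3*0.8*0.7 / (0.3*0.8*0.7 + 0.7*0.2*0.3) = 0.8
```

A dynamic Bayesian network extends the same computation across time slices, letting earlier visits inform the posterior at later ones.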

Abstract: A growing body of evidence links inappropriate methylation of DNA domains to disease phenotypes. Global hypomethylation of genomic domains has been observed in multiple cancers. To facilitate a high-throughput study of the methylation patterns of the human genome, we created a custom microarray platform capable of reading the methylation status at 339,314 distinct genomic loci. The uniqueness of this platform comes from the unprecedented breadth of genomic domains it is able to sample, resulting in the most comprehensive genome-wide study of DNA methylation to date. To design this microarray, an exhaustive list of CpG islands (CpGI) was generated based on the contemporary assembly of the human genome sequence (HG16). Based on available annotation generated using RepeatMasker, Tandem Repeats Finder, microRNA databases, and known gene annotation, the CpGIs were classified according to the nearby (preferably overlapping) genomic element. A library of 37 head and neck squamous cell carcinoma (HNSCC) samples and 17 morphologically normal, “paired” adjacent tissue samples was used to prepare methylation profiles with the custom microarray. In addition, 10 buccal scrapings from unrelated normal individuals were used to provide a set of reference “normal” methylation profiles.

1Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah, 2Institute for Cancer Studies, University of Sheffield Medical School, Sheffield, UK

Abstract: Breast cancer risk is thought to be due to many low-penetrance variants across multiple genes. Despite recent successful associations, follow-up work has yet to identify the underlying susceptibility variants responsible. Haplotype analysis using tagging-SNPs is one method to effectively guide the search for the causal variant. In this study, we perform a follow-up analysis on CASP8, a gene recently identified as convincingly associated with breast cancer risk (Cox et al. 2007). Our analysis was performed using hapConstructor, new software developed to efficiently data-mine haplotypes, which implements a stepwise forward-backward procedure to search for haplotypes associated with disease. Fourteen tagging-SNPs in CASP8, in 2,434 cases and controls from Sheffield, UK, were analyzed. Using hapConstructor, we identified four four-locus haplotypes that reached the stopping significance threshold. These four haplotypes matched at 10 of 14 loci and indicate a single risk haplotype. Additionally, this risk haplotype concurs with the original finding for coding variant D302H (rs1045485) previously identified by the Breast Cancer Association Consortium. Our results indicate that individuals homozygous for this risk haplotype would be valuable to screen, such as with sequencing or comparative genomic hybridization experiments, to further pursue the underlying susceptibility variant disrupting CASP8.

Detecting Coevolution without Evolution? Incorporating Phylogeny is Not Necessarily Helpful

Authors:Gregory Caporaso, Larry Hunter, and Rob Knight, University of Colorado

Abstract: Identifying coevolving positions in protein sequences has applications ranging from understanding and predicting the structure of single molecules to generating proteome-wide protein-protein interaction predictions. Algorithms for detecting coevolving positions (coevolution algorithms) can be classified into two categories: tree-aware, or those that incorporate knowledge of phylogeny; and tree-ignorant, or those that do not. Tree-ignorant methods are generally faster, but thought to be more error-prone. Using a novel approach based on protein alpha helices, three tree-aware and four tree-ignorant coevolution algorithms are systematically compared. Surprisingly, the tree-ignorant methods (particularly Mutual Information) frequently outperform the tree-aware methods. The results suggest that the current tree-aware algorithms may be over-controlling for patterns arising from ancestry. While phylogeny is important for avoiding false positives, the algorithms evaluated here appear too stringent and frequently fail to outperform tree-ignorant methods.
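The simplest tree-ignorant statistic mentioned above, Mutual Information between two alignment columns, can be computed directly. This is the textbook formulation, not necessarily the exact implementation benchmarked in the study.

```python
import math
from collections import Counter

# Mutual Information between two alignment columns:
#   MI = sum over residue pairs (x, y) of
#        p(x, y) * log( p(x, y) / (p(x) * p(y)) )
def mutual_information(col_i, col_j):
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log((c / n) / ((p_i[x] / n) * (p_j[y] / n)))
        for (x, y), c in p_ij.items()
    )

# Perfectly covarying columns score log(2) here; columns whose
# residues vary independently score 0.
covarying = mutual_information("AALL", "DDRR")
independent = mutual_information("ALAL", "DDRR")
```

The statistic's weakness, which motivates tree-aware methods, is that shared ancestry alone can inflate MI between columns; the study's finding is that correcting for this can be overly stringent in practice.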

Abstract: Functional RNA molecules such as ribozymes have complex three-dimensional structures that enable them to play catalytic or structural roles in the cell. Knowing the structure of these molecules is critical to understanding their functions. However, predicting RNA structure from primary sequence remains a significant challenge. We have developed the Nucleic Acid Simulation Tool (NAST), a software package that builds coarse-grained models of RNA structures in a fully automated fashion. We use an RNA-specific knowledge-based potential in a coarse-grained molecular dynamics engine to generate large numbers of plausible 3D structures. We then filter these structures using surface measurements to identify those most compatible with the experimental data. NAST requires no special RNA modeling expertise, uses available information about the secondary and tertiary structure, and can run on either a single computer or a cluster. We have used NAST to address two classes of structure modeling problems for the Tetrahymena thermophila group I intron: modeling missing structural elements in RNA crystal structures and modeling folding intermediate structures.

Abstract: Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. Alternative splicing in higher eukaryotes results in the generation of multiple protein isoforms from gene transcripts. The extensive alternative splicing observed implies a flexibility of the spliceosome to identify exons within a given pre-mRNA. To reach this flexibility, splice-site selection in higher eukaryotes has evolved to depend on multiple parameters such as splice-site strength, splicing regulators, the exon/intron architecture, and the process of pre-mRNA synthesis itself. RNA secondary structures have also been proposed to influence alternative splicing as stable RNA secondary structures that mask splice sites are expected to interfere with splice-site recognition. Using structural and functional conservation, we identified RNA structure elements within the human genome that promote alternative splice-site selection. Their frequent association with alternative splicing demonstrates that RNA structure formation is an important mechanism regulating gene expression and disease.

Abstract: The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method (CC), also known as the Phylogenetic Profiles method, is a well-established computational tool for predicting functional relationships between proteins.

We examined how various aspects of this method affect the accuracy and topology of protein interaction networks. The results showed that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We then organized Co-Conservation (CC) pairs into clusters (Cluster Co-Conservation, CCC); CCC has previously been limited to interactions within a single target species. Finally, we extended CCC to develop protein interaction networks based on co-conservation between protein pairs across multiple species: Cross-Species Cluster Co-Conservation (CS-CCC). Our results showed that CS-CCC provides unique and biologically useful information that is not identified using CCC alone. Moreover, CS-CCC can be used to help uncover systematic errors in annotation and facilitate the blind annotation of new genomes.

Abstract: The majority of emerging infectious diseases that affect humans are zoonotic: diseases transmissible between animals and humans. The health of animals can be a sentinel for zoonotic diseases in humans. Unfortunately, most local and state health department epidemiologists do not have automated access to these data. Using data on animal health to predict the risk of zoonotic diseases in humans could allow epidemiologists to detect public health threats sooner; earlier detection means earlier intervention, which could lead to less morbidity and mortality. This presentation will describe a research plan to explore this by: gaining an understanding of the data and technology needs for zoonotic disease surveillance at the local and state level; applying these needs to the development of a pilot 'animal-human' surveillance system that integrates health data of animals and humans; and evaluating the potential of this novel system for zoonotic disease surveillance. This work will provide a framework for integrating animal and human data and demonstrate the potential of this synergy in zoonotic disease surveillance. It will hopefully lead to the development of powerful surveillance systems in local and state health departments.

Abstract: Selection of amplified genomic segments in particular somatic cellular lineages drives tumor development. However, pinpointing genes under such selection has been difficult due to these regions’ large sizes. We propose a new method, called Amplification Distortion Test (ADT), that identifies specific nucleotide alleles that confer better survival for tumor cells when somatically amplified. ADT draws upon the Transmission Disequilibrium Test from statistical genetics and is extended to evaluate and localize distortion on haplotypes in addition to single markers. It is optimized for performance and includes computational techniques to address the intricate challenges of discerning true biological signals from technology-induced artifacts in data. ADT is used to analyze our pioneering dataset, containing 700 tumor samples from lung cancer patients that are typed for copy number variation and 240K SNPs genome-wide. We detect strong single-SNP and haplotype distortion signals on multiple chromosomes, thus revealing prime target regions for fine mapping. We conclude that this novel mode of genome scanning via ADT constitutes a new paradigm for mapping oncogenes. Analogous datasets currently being accumulated by other organizations, such as the Cancer Genome Atlas, provide compelling avenues for further investigation and application of this new methodology.
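The core of a TDT-style distortion statistic can be written in one line. This is a schematic illustration only; the published ADT additionally handles haplotypes and corrects for technological artifacts, none of which is reproduced here.

```python
# Schematic core of a TDT-style distortion statistic. Among tumors
# heterozygous at a SNP, the null hypothesis is that either allele is
# equally likely to be the amplified one, giving a McNemar-style
# chi-square with one degree of freedom.
def distortion_chi2(n_allele_a_amplified, n_allele_b_amplified):
    b, c = n_allele_a_amplified, n_allele_b_amplified
    return (b - c) ** 2 / (b + c)

# If 70 of 100 informative tumors amplify allele A, chi2 = 16,
# far beyond the df=1 critical value of 3.84 for p = 0.05.
chi2 = distortion_chi2(70, 30)
```

The analogy to the original TDT is direct: heterozygous tumors play the role of heterozygous parents, and preferential amplification plays the role of preferential transmission.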

Abstract: Touch sensation in upper-limb prostheses may one day be created with artificial feedback presented through nerves, but first the transformations that underlie touch sensation must be better understood. For example, the transformation of skin indentation into neural pulse times, which likely carry rich tactile information, cannot be predicted by existing models. To address this gap, a skin-receptor model is described here that predicts the transformation of skin indentation into individual pulse times by decomposing the overall transformation into three coupled sub-transformations. First, skin indentation is transformed, via a solid-mechanics model of the skin, into distortions at the receptor. Second, the distortions at the receptor are transformed, via a sigmoidal transduction function, into the ionic current crossing the receptor membrane. Finally, the current crossing the membrane is transformed, via a leaky integrate-and-fire model of pulse generation, into the times at which the receptor generates neural pulses. This work will provide a computational test bed for studying the transformations underlying touch.
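The three coupled sub-transformations can be composed in a minimal sketch. All parameter values below are invented, and the skin-mechanics stage is reduced to a linear gain rather than a solid-mechanics model; the point is only to show how the stages chain together.

```python
import math

# Minimal composition of the three sub-transformations (invented
# parameters, not the actual model): stage 1 (skin mechanics) is a
# linear gain, stage 2 (transduction) a sigmoid, stage 3 (pulse
# generation) a leaky integrate-and-fire unit.
def receptor_pulse_times(indentation, dt=0.001, gain=5.0,
                         tau=0.02, threshold=0.9):
    """Map an indentation waveform (one sample per dt seconds) to the
    times, in seconds, at which the model receptor fires a pulse."""
    v, times = 0.0, []
    for step, depth in enumerate(indentation):
        strain = gain * depth                           # 1) mechanics (linear proxy)
        current = 1.0 / (1.0 + math.exp(2.0 - strain))  # 2) sigmoidal transduction
        v += (dt / tau) * (current - v)                 # 3) leaky integration
        if v >= threshold:                              #    fire and reset
            times.append(step * dt)
            v = 0.0
    return times

# A sustained 1 mm indentation yields a regular pulse train; no
# indentation yields no pulses.
pulses = receptor_pulse_times([1.0] * 200)
```

Because the membrane state resets after each pulse, a constant input produces evenly spaced pulse times, the simplest example of a rate code emerging from the composed stages.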

Abstract: A variety of high-throughput DNA sequencing experiments are being designed. Except for de novo sequencing, high-throughput sequencing experiments take advantage of reference genome information, specifically data and analysis tools provided by the NLM, as well as some pre- and post-processing for various experimental designs. A few applications of reference genomes include transcript sequencing, genome-wide epigenome sequencing, and template-based bisulfite sequencing. Epigenome sequencing finds candidate methylation regions. Transcript sequencing shows the RNA produced during transcript splicing, and deep sequencing shows the epigenetic variation between cells. Template-based sequencing has shown that cancer tissues have distinct epigenetic modifications at specific genomic loci.

Abstract: Proteomic biomarker discovery from mass spectral data and subsequent protein identification is a difficult problem. The variability in experimental techniques, combined with the large number of possible biomarkers from high-throughput proteomic mass spectral studies, often stymies researchers by requiring long and expensive manual identification and subsequent validation of the identity of any biomarker. While validation is a crucial and necessary step, several inefficiencies exist within the identification process that can be addressed by bioinformatics tools. We developed a system that (1) uses a machine learning algorithm that has been shown to select significant biomarkers from proteomic data, (2) gathers the relevant and important statistics about the putative biomarkers, and (3) finds possible protein identifications using a novel online resource that stores previously identified and validated biomarker-to-protein links. We have successfully used this system to mine a mass-spectral dataset to identify a putative biomarker, and then link it to a protein. The identity of the biomarker was later validated by another lab.

Abstract: The Gene Ontology (GO) is the most widely used controlled vocabulary, organized as a directed acyclic graph, for annotating proteins. In the GO knowledge-base, annotation of proteins with biological concepts further connects proteins to the GO graph. This leads to a complex graph consisting of interleaved biological concepts and their associated proteins. The characteristics and properties of such a graph provide information regarding how proteins are functionally related to each other. Given this framework, graph theory offers novel methods to evaluate the functional coherence of a group of proteins.

Examination of the graph-topological properties of the protein-GO graphs reveals that, for each organism studied, the graph belongs to the family of scale-free networks. The hub-and-spoke nature of the graph indicates that some proteins play pivotal roles in connecting various biological functions, while some biological concepts carry most of the information regarding protein functions. These results lead to the development of a graph-theoretic framework to quantify the statistical significance of protein groups based on functional coherence. The method is then applied, alongside a conventional count-based technique, to microarray data from a retinal degeneration mouse model to assess the functional significance of clusters.
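The conventional count-based baseline referred to above is typically a hypergeometric enrichment test, which can be sketched exactly with integer arithmetic. This illustrates the baseline only; the paper's graph-theoretic method is not reproduced here.

```python
from math import comb

# Count-based functional coherence via a hypergeometric test: given N
# annotated proteins, K of which carry a GO term, how surprising is
# finding k or more carriers among the n proteins of a cluster?
def hypergeom_pvalue(N, K, n, k):
    """Upper-tail P(X >= k) of the hypergeometric distribution."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# 4 of 5 clustered proteins share a term carried by only 10 of 1,000
# proteins overall: a highly significant cluster.
p = hypergeom_pvalue(1000, 10, 5, 4)
```

The count-based test treats each GO term independently; the graph-theoretic framework described in the abstract instead exploits how terms and proteins interconnect, which is precisely what this baseline ignores.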

Authors:Greg Cipriano, George Phillips, Michael Gleicher, University of Wisconsin-Madison

Abstract:Protein surfaces, while important for understanding how a protein interacts with its environment, are complicated and can take significant effort to understand. Existing methods for displaying these surfaces do little to improve this situation. Our work is in constructing visualization techniques that provide an abstracted view of the shape and spatio-physico-chemical properties of complex molecules. Unlike existing molecular viewing methods, our approach suppresses small details to facilitate rapid comprehension, yet marks the location of significant features so they remain visible.

Our approach uses a combination of filters and mesh restructuring to generate a simplified representation that conveys the overall shape and spatio-physico-chemical properties (e.g. electrostatic charge). Surface markings are then used in place of important removed details, as well as to supply additional information. These simplified representations are amenable to display using stylized rendering algorithms to further enhance comprehension.

Our initial experience suggests that our approach is particularly useful in browsing collections of large molecules and in readily making comparisons between them. In the future, we believe that these kinds of abstract representations may improve the performance of automated analysis techniques.

Abstract:Background: Traumatic Brain Injury (TBI) is a signature injury of the current wars in Iraq and Afghanistan. Structured electronic data regarding TBI findings is important for research, population health and other secondary uses, but requires appropriate underlying standard terminologies to ensure interoperability and reuse.

Methods: We developed a comprehensive working case definition of mild traumatic brain injury composed of 68 clinical concepts. An expert panel reviewed the case definition for completeness. Using automated and manual techniques, we mapped TBI case definition concepts into SNOMED CT and MEDCIN and compared the results.

Results: SNOMED CT sensitivity (recall) as a reference terminology for our case definition of mild TBI was 89%, and its positive predictive value (precision) was 100%. MEDCIN sensitivity was 50%, with a positive predictive value of 100%. The superior performance of SNOMED CT over MEDCIN was statistically significant with a p-value of <0.001.
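
The reported sensitivity and positive predictive value follow from standard counts of correctly mapped, missed, and incorrectly mapped concepts. A small sketch of the arithmetic, using hypothetical counts chosen only to illustrate the calculation (not the study’s actual tallies):

```python
def sensitivity_ppv(true_pos, false_neg, false_pos):
    """Sensitivity (recall) and positive predictive value (precision)."""
    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv

# Hypothetical mapping of a 68-concept case definition:
# 60 concepts mapped correctly, 8 unmapped, 0 incorrect mappings.
recall, precision = sensitivity_ppv(60, 8, 0)
print(round(recall, 2), precision)  # 0.88 1.0
```

With zero incorrect mappings, PPV is always 100%, which matches the pattern reported for both terminologies.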

Discussion: SNOMED CT was significantly better able to represent TBI concepts than MEDCIN, the terminology underpinning AHLTA, the U.S. Department of Defense electronic health record system. Traumatic Brain Injury is a serious disorder, and mild cases are often missed. The current study findings may inform data gathering and management strategies for those with mild TBI.

Investigators report on a multi-disciplinary dialog examining convergences of expert opinion across different professions (health information managers, healthcare lawyers, and medical informaticists). The multi-phased study deployed a modified Delphi data collection methodology. A total of 25 healthcare domain experts representing diverse professional perspectives were interviewed via a web-based survey instrument. Data analysis was based upon the inductive approach established in grounded theory. A thematic qualitative analysis was completed based on the domain experts’ responses.

The study provides a resulting specification list to be used as an outcome instrument in evaluating healthcare organizations’ preparedness for legal discovery of electronically stored information in EHRs. The specification list serves as a formative organizational framework to be considered for further development of industry-specific best practices for litigation readiness and response with respect to EHR systems and electronic discovery.

Abstract:Objective: To investigate a novel approach to determining patient travel patterns based on the type of healthcare being sought by the patient. To evaluate two different geographic information systems (GISs) – ESRI ArcGIS and Google Maps – in determining patient travel patterns.

Materials and Methods: We analyzed patient travel patterns, based on billing data, over a one-year period for all outpatient visits within a large academic medical system. Patient travel was divided into two distinct care types – routine care and consult care – derived from Current Procedural Terminology (CPT) billing codes. Patient travel was analyzed using two different GISs.

Results: A total of 487,910 encounters were analyzed. For routine care, the mean and median patient travel distances were 13.6 and 5.7 miles using ESRI ArcGIS and 13.5 and 5.5 miles using Google Maps. For consult care, the mean and median patient travel distances were 17.9 and 8.3 miles using ESRI ArcGIS and 17.7 and 8.2 miles using Google Maps. Differences in mean and median travel distance between care categories were statistically significant; differences between GISs were not. Conclusion: Patient travel patterns, by care type, can be analyzed using different GIS systems and may have significant implications for healthcare delivery generally and telemedicine in particular.
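
For context, the simplest distance a GIS can report between a patient address and a clinic is the straight-line (great-circle) distance between two coordinates. The sketch below is a generic haversine computation, not either vendor’s routing engine, and the coordinates are hypothetical:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree of latitude is roughly 69 miles.
d = haversine_miles(40.0, -111.9, 41.0, -111.9)
print(round(d, 1))  # 69.1
```

Actual GIS travel analyses typically use road-network routing, which yields longer distances than the great-circle baseline.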

Authors:Matthew L Bolton, Ellen J Bass, University of Virginia, Radu I Siminiceanu, National Institute of Aerospace

Abstract:Many modern medical systems are complex in that they depend on the interaction between technical infrastructure (mechanical systems, electrical systems, transportation systems, human-system interfaces, etc.), people (operators, maintenance crews, etc.), and environmental conditions to operate successfully. Even when these subsystems/components are carefully engineered, system failures are often emergent, occurring as a result of subsystem interactions. While formal methods, and particularly model checking programs, have proven useful in predicting system failure in computer hardware and software systems, they have not been extensively used to evaluate one of the largest sources of emergent failure, human error: the error resulting from the interaction between human operators and the system. This paper proposes a framework to fill this gap by using formal models of systems, their human-system interfaces, and their operators’ normative behaviors in order to predict when erroneous human behavior can occur and how this behavior can contribute to system failure. This framework is illustrated using a model of the Therac-25.

Authors:Herbert Chase, David Kaufman, and Eneida Mendonca, Columbia University

Abstract:A significant proportion of physicians’ clinical questions remain unanswered, which seriously impacts patient outcomes. The Context-initiated Query Response (CIQR) project seeks to provide answers to clinicians by facilitating clinical question capture at the point of care and returning answers via automated electronic searching. We studied the feasibility of capturing questions in a clinical milieu during workflow by providing five internal medicine residents with digital recorders and instructing them to record questions as they arose. Each resident recorded between 10 and 20 questions while on the hospital wards or in clinic and then participated in a structured interview. Two major themes emerged from the interviews. First, most questions were recorded during chart documentation, when the patient’s plan was being developed; residents were reluctant to record in front of colleagues during rounds or at the nurses’ station. Second, the recorder prompted residents to focus on their information needs, resulting in their asking more questions than usual. Several residents acknowledged that had their questions not been recorded, they ultimately would not have sought answers to them. Our results suggest that capturing questions during workflow is feasible and may prompt heightened awareness of information needs, which should foster answer seeking and, ultimately, improved patient care.

[This work was supported by NLM Grants 5R01 LM008799-02 and T15LM007079-16]

Abstract:Living with or preventing chronic disease requires information to develop illness coping or avoidance regimens that affect the everyday lives of patients, their families, and others in their social networks. As chronic disease rates and associated human and societal costs increase throughout the world, it is not surprising that seeking health information on behalf of, or because of, others has become one of the most prevalent Internet information-seeking activities. Health care researchers identify patients’ significant others as hidden patients, agents, and the like, and note their unmet information needs despite many previous efforts to address them. Prior information behavior studies describe such interpersonal information seekers or sharers as gatekeepers, proxies, information-acquirers-and-sharers, and so on. However, previous information seeking models intended to inform information service and system design do not explore their behavior in depth. This project proposes that these information seekers are sufficiently related to be conceptualized as one type of information seeker, the lay information mediary (LIM), and introduces an empirically based model of lay information mediary behavior (LIMB) intended to facilitate future empirical and theoretical work to improve information system design and health outcomes, particularly in chronic disease care.

Authors:Yichuan Hsieh, School of Nursing, University of Wisconsin-Madison; Patricia Flatley Brennan, School of Nursing and College of Engineering, University of Wisconsin-Madison

Abstract:Traditional consumer health informatics applications developed for the lay public on the Web were commonly written in Hypertext Markup Language (HTML). This type of webpage design is not equipped to generate individualized information content in a way that is efficient and straightforward for clinicians to construct and modify. As genetics knowledge rapidly advances and requires information to be updated in a timely fashion, a different content structure is needed to facilitate information delivery. I will present the Prenatal Genetic Education Program (PreGEP), specifically designed for pregnant women of advanced maternal age, as an example of how a dynamic, database-driven, web-based consumer health informatics application was developed and evaluated through a design/redesign process.

Abstract:To assess the adequacy of informatics evaluation concepts contributed by the health services research (HSR) literature relative to the informatics evaluation literature, this study updates and broadens a previous systematic review of informatics evaluation studies to include databases in addition to PubMed, and updates the review’s database. 2609 articles were reduced to 126 studies about electronic health records that met inclusion and exclusion criteria. The studies’ content was categorized as either informatics or HSR using HSR evaluation criteria. The studies’ evaluation components (system attributes and contexts) were compared to the criteria used in the systematic review, identifying 8 new evaluation components. When HSR and informatics evaluation studies were delineated by journal type, the HSR journal articles contributed disproportionately more new evaluation concepts than the informatics journal articles, including 2 uniquely different concepts. Across all the journals, there were equal distributions of informatics and HSR studies and of new concepts. Future evaluators of EHR systems should consider adding these 8 new attributes to the suite of evaluation criteria in their conceptual models.

Abstract:Background: Successful implementation of an electronic health record (EHR) presents many challenges; benefits may only be realized in the longer term. We assessed primary care clinician perceptions of a new EHR on quality of care for 12 months following implementation.

Methods: We surveyed 104 primary care clinicians in four medical groups over 12 months following implementation of a common EHR. The instrument assessed clinician perceptions regarding EHR impact on overall quality of care, patient safety, communication, and efficiency. We fit multivariable logistic regression models with generalized estimating equations to assess changes in perceptions over time.

Conclusions: Clinicians report increasing support for electronic health records over time. Health systems and clinicians should consider the longer term benefits of implementing such technology when confronting initial challenges.

Abstract:Potentially avoidable deaths occur each year among hospital patients. In an effort to eliminate avoidable deaths, many hospitals have instituted Rapid Response Teams (RRTs). One of the potential weaknesses of most RRTs is that activation usually relies solely upon human recognition of critically abnormal vital signs. This recognition process is not consistently reliable, and as a result some patients may not receive timely life-saving interventions. We performed a pilot study to determine whether a computer-generated list of patients with critically abnormal vital sign parameters identifies critically ill patients not recognized as such by the patient’s care team.

An alert list of inpatients with critically abnormal vital signs was automatically generated every hour on the hour. Patients in the CCU, ER, and on comfort care were automatically filtered from the list. The list was viewed by a surveillance nurse dedicated to this task.

During the study period, 36% of patients had critically abnormal vital signs that were not recognized by the team caring for the patient. The use of an electronic monitoring system, along with actions generated by a surveillance nurse, may prevent under-detection of critically ill patients and may reduce potentially avoidable deaths.
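
The hourly alert-list pass described above can be sketched as a simple threshold filter. The thresholds, field names, and patients below are illustrative assumptions, not the study’s actual criteria:

```python
# Critically abnormal ranges as (low, high) bounds per vital sign -- illustrative only.
CRITICAL = {
    "heart_rate": (40, 130),  # beats/min
    "sbp":        (90, 200),  # systolic blood pressure, mmHg
    "resp_rate":  (8, 30),    # breaths/min
    "spo2":       (90, 101),  # oxygen saturation, %
}
EXCLUDED_UNITS = {"CCU", "ER"}

def alert_list(patients):
    """Return IDs of patients with any critically abnormal vital sign,
    excluding CCU/ER and comfort-care patients."""
    alerts = []
    for p in patients:
        if p["unit"] in EXCLUDED_UNITS or p.get("comfort_care"):
            continue
        for sign, (low, high) in CRITICAL.items():
            value = p["vitals"].get(sign)
            if value is not None and not (low <= value <= high):
                alerts.append(p["id"])
                break  # one critical value is enough to list the patient
    return alerts

patients = [
    {"id": "A", "unit": "MED", "vitals": {"heart_rate": 145}},
    {"id": "B", "unit": "CCU", "vitals": {"heart_rate": 150}},  # filtered: CCU
    {"id": "C", "unit": "MED", "vitals": {"sbp": 120}, "comfort_care": True},
]
print(alert_list(patients))  # ['A']
```

In practice the filtered list would be refreshed hourly from the vital-signs feed and routed to the surveillance nurse for review.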

A New Approach to Feature Selection in the Analysis of High Throughput Data

Authors:Chad Kimmel, James Lyons-Weiler, University of Pittsburgh

Abstract:When analyzing high throughput data to predict a clinical outcome, the number of features is often substantially greater than the number of samples. One approach to this statistical challenge is to perform feature selection, which attempts to select those features that predict the outcome well. Typically, only predictive accuracy is used to perform feature selection. This research proposes a new feature selection method called Wrapper Consistency Analysis. The method strives to optimize both the predictive accuracy of a set of features and the consistency (stability) of those features when used for prediction across different datasets. The objective of this research is to discover whether using both predictive accuracy and feature consistency leads to predictive models that perform better than using accuracy alone. Preliminary results, which will be presented, have been promising.
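
One way to make the consistency (stability) term concrete is the mean pairwise overlap of the feature sets selected on different resamples of the data. This is a generic sketch of that idea, not necessarily the scoring used in Wrapper Consistency Analysis:

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two feature sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def consistency(selected_sets):
    """Mean pairwise Jaccard overlap of feature sets selected on
    different resamples (1.0 = identical selections every time)."""
    pairs = list(combinations(selected_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def combined_score(accuracy, selected_sets, alpha=0.5):
    """Illustrative objective trading off accuracy against stability."""
    return alpha * accuracy + (1 - alpha) * consistency(selected_sets)

# Features chosen on three bootstrap resamples of a dataset:
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3"}]
print(round(consistency(runs), 2))  # 0.67
```

A wrapper method could then search over candidate feature subsets to maximize `combined_score` rather than accuracy alone.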

Abstract:Therapeutic lifestyle change (TLC) is an effective intervention to reduce risk of cardiovascular disease. However, the psychosocial and behavioral data elements necessary to implement this intervention are absent from existing electronic health record (EHR) systems. Our patient-centric EHR (PC-EHR) project aims to develop an integrated system to facilitate patient-provider interaction, foster cooperative chronic disease management, and promote adherence to guidelines by both providers and patients through evidence-based decision support. The objective of this phase is to identify the necessary data elements, beginning with a list from the National Cholesterol Education Program Adult Treatment Panel III (ATP III) guidelines. An eight-member expert advisory committee, representing diverse healthcare-related backgrounds, refined the list through a modified Delphi method involving two electronic mail rounds and one in-person meeting. In each round, committee members rated the priority for inclusion of each proposed data element. Overall, the committee reviewed a total of 83 proposed data elements, and finalized a set of 30 data elements for inclusion. This list will inform the next steps in the development of the PC-EHR: identification of measurement tools to assess the data elements, and development of the decision support algorithms based on the TLC recommendations contained in the ATP III guidelines.

Voted Best Poster, Day 1

How Physicians Perceive and Interpret Data Using Graphical and Tabular Displays

Authors:David Bauer, Stephanie Guerlain, University of Virginia

Abstract:Presentation format can influence the way physicians perceive and interpret data. The transition to electronic medical records has created an opportunity to present information in new ways that could aid decision-making. Intensive care units collect large amounts of data from distributed sources at various time intervals. Physicians in this setting perform a wide range of tasks by aggregating data in a number of ways depending on the task at hand. Display design can encourage or discourage certain types of comparisons. Optimal design solutions rarely exist due to inevitable trade-offs in emphasizing particular data properties. One approach is to present data in a manner that makes cognitively challenging tasks, such as determining acid-base status, easier at the expense of performance on relatively simplistic tasks. Additionally, multiple representations of the same data may improve performance across many tasks. This project aims to understand how physicians perceive and interpret data from graphical and tabular displays. Many available systems replicate tabular designs seen in paper-based documents, but this format is not well suited for tasks requiring higher levels of data aggregation. We have developed a novel graphical design to support tasks that, based on initial observations, physicians use to make patient care decisions.

Abstract:Biomolecular structures determined by typical X-ray crystallography experiments result in models that best reproduce the sharp Bragg intensities. These models correspond to the average electron density of the unit cell convoluted with the crystal lattice and are improved when the effects of atomic variations on the Bragg intensities are taken into account, via occupancies and temperature factors that correspond to uncorrelated motions. While this approach has been very successful, insights into the ensemble of states available to the crystal and dynamic transitions between these states are lost. Fortunately, much of this information can be regained by studying the diffuse scattering throughout reciprocal space that is due to symmetry-breaking, correlated motions within the crystal. Crystal motions ranging from uncorrelated, random atomic displacements to collective lattice vibrations yield distinct diffuse scattering patterns. In this study, the total X-ray scatter from several biological molecules is investigated where correlated motions are computed with an elastic network model. The effects of treating crystal environment explicitly are highlighted.

Abstract:Single-cell based studies on nuclear receptor (NR) mediated gene regulation demonstrate a range of responses to environmental and physiological stimuli not previously appreciated by population-based studies. Using automated image acquisition and analysis complemented with mathematical modeling we aim to identify and characterize cell-to-cell variation within a population. This will enable us to determine whether this variation is normal in gene regulation or if it is a basis for the development of diseases. Using an established HeLa cell line containing GFP tagged androgen receptor (GFP-AR) we will measure the minimal threshold and variability of nuclear translocation in response to ligand treatment in wild type and mutant AR. Also, a HeLa cell variant having stable integration of a multi-copy prolactin promoter responsive to estrogen receptor (ER) will be used to measure spatiotemporal promoter occupancy to detect key coregulators for gene regulation. We will determine the proteins whose promoter occupancy cycles are synchronous with previously observed cyclic transcriptional activity upon treatment with estradiol. Finally, to further our understanding of how variability relates to the probabilistic interaction of proteins, we will develop a mathematical model based on the experimental data to explore underlying mechanisms for gene regulation control.

Abstract:The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) provides member institutions a risk-adjusted comparison of postoperative mortality and morbidity rates against other institutions, as a means for identifying areas needing surgical quality improvement. The program requires the regular collection and submission (by specially trained clinicians) of approximately 100 pre-, intra-, and post-operative risk factors. To capture clinical data, reduce clinician time and improve data quality and accuracy, we have automated the data collection process. We have built a web interface that organizes and displays all electronically available ACS NSQIP data so that the clinician can review the information from a single point source. The system organizes and displays data only; it provides no computational inference or interpretation. The interface allows our ACS NSQIP clinicians to better utilize their clinical training, experience, judgment, and time to validate and interpret the data rather than searching through various data sources to collect it. Preliminary use of this system has greatly improved efficiency, which we believe will reduce costs associated with data collection and improve both the quality and accuracy of the data, which ultimately can enhance the surgical quality improvement process.

Abstract:The farnesoid X receptor (FXR) is a member of the nuclear receptor superfamily of transcription factors that regulate expression of their specific target genes after ligand binding. In response to concentrations of its natural bile acid ligands, FXR has been shown to regulate cholesterol metabolism and lipid and glucose homeostasis. We used NimbleGen ChIP-on-chip microarrays, which contain tiled arrays representing all ~26,000 annotated genes in the mouse genome, to identify groups of genes that are regulated by the binding of FXR. The results of the hybridization were analyzed using NimbleScan and ChIPOTle. We identified defined groups of genes in mouse liver that are regulated by the binding of the FXR protein. Interestingly, our data contain many genes known to be regulated by FXR as well as many genes previously unrecognized as FXR target genes. These results help validate the methodology but also indicate that we have uncovered novel roles for FXR in gene regulation. Many of the new potential target genes indicate FXR is a key regulator of both phospholipid and glycogen biosynthesis.

Abstract:Modular protein domains have been shown to mediate important interactions such as signal transduction. Prior studies have used various approaches to predict the targets of these peptide-binding domains; however, to date the prediction of their biologically relevant targets has not been addressed in an automated and integrated fashion. Therefore, we have developed a motif analysis pipeline which predicts the target binding peptides of these domains by analyzing peptides identified from experimental screens, or pre-made position weight matrices (PWMs), together with comparative genomic, structural genomic and genomic data. Our system uses an efficient search algorithm to scan the target proteome for potential motif hits and assign a PWM match score to each hit. It complements the motif score with a variety of pre-computed features, such as conservation, surface propensity, and disorder, which have previously been shown to determine biologically relevant targets. It also integrates genomic features such as interaction and localization data to further improve the prediction. Finally, it applies a Bayesian learning algorithm to integrate all scores and give an optimal target prediction based upon a validated training data set. It aims to provide a comprehensive platform for researchers to predict biologically significant targets that are potentially recognized and bound by a particular domain of interest.
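
The core scanning step, scoring each sequence window against a position weight matrix, can be sketched as follows. The toy PWM, its log-odds scores, and the default penalty for unlisted residues are invented for illustration:

```python
def pwm_scores(pwm, sequence):
    """Slide a PWM (one dict of residue -> log-odds score per column)
    along a sequence; return (position, score) for each window."""
    width = len(pwm)
    out = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        # Unlisted residues get a flat penalty of -2.0 (an arbitrary choice).
        score = sum(pwm[j].get(res, -2.0) for j, res in enumerate(window))
        out.append((i, score))
    return out

# Toy 3-column PWM favoring a "PxY"-like motif.
pwm = [
    {"P": 2.0, "A": -1.0},
    {"A": 0.5, "S": 0.5},   # degenerate middle position
    {"Y": 2.0, "F": 1.0},
]
seq = "MKPAYQ"
best = max(pwm_scores(pwm, seq), key=lambda t: t[1])
print(best)  # (2, 4.5): the window "PAY" starting at position 2
```

In the full pipeline, each hit’s PWM score would then be combined with the conservation, surface-propensity, disorder, and interaction features before the Bayesian integration step.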

Abstract:Asthma is the leading chronic childhood disease, causing >1.8 million emergency department (ED) visits annually. This study hypothesizes that implementing a clinical-workflow-embedded reminder system combined with an asthma guideline will increase utilization of and adherence to guideline-driven care, leading to improved patient outcomes. The pediatric ED provides care for >40,000 patients annually, with approximately 10% presenting with asthma exacerbations. A paper-based guideline including a validated asthma severity metric is used for only 7-10% of the asthma cases. In the first phase, a Bayesian network to detect asthma exacerbations will be integrated with the ED triage information system. The effect of the detection algorithm on prompting clinicians to use the paper-based guideline will be evaluated in a randomized controlled trial. In the second phase, the asthma guideline will be computerized and integrated with the information systems available in the ED, such as the triage application, computerized provider order entry, patient tracking system, and respiratory therapy system, to create a workflow-embedded guideline delivery infrastructure. The detection algorithm will prompt providers for eligible patients and will remind physicians about repetitive assessments during the visit. Outcome measures will include parameters associated with clinical outcomes (time to disposition, admission rate, 24-hour ED return rate) and guideline utilization. The project examines a new approach to increasing guideline utilization and adherence, which may provide additional insights into potential approaches for delivering guideline-based care.

Abstract:Physician sign-out is a mechanism for transferring patient information from one group of hospital caregivers to another at shift changes. Support tools are critical to the success of sign-out. To ensure that a tool is effective, designers must collaborate with end users. Collaboration can be difficult when working with end users who are hard to reach. This article reports on a collaborative effort between physicians and engineers to redesign a sign-out support tool. Strategies included focus groups, interviews, “on the fly” feedback, and an iterative design process. Task analysis methods were used to compare the original tool with the prototype in order to quantify any differences in functionality. Last, we discuss general conclusions and offer practical techniques for engaging resident physicians in the design process.

Grocery Store Point-of-Purchase Data as a Nutritional Assessment Method in Public Health Informatics

Authors:Kristina M Brinkerhoff, Kristine C Jordan and John F Hurdle, Department of Biomedical Informatics, University of Utah School of Medicine

Abstract:Nutrition plays a key role in many health outcomes, including diabetes, cardiovascular disease, and cancer. Researchers have developed nutritional assessment methods to better understand the interplay between diet and health. These methods often rely on self-reported eating behaviors and can be extremely inefficient and burdensome for participants and researchers to perform. This project explores using grocery store point-of-purchase data as a surrogate nutritional assessment method. While point-of-purchase data have their own limitations, this method has the potential to be an efficient, indirect measure suitable for large-scale, informatics-based nutritional studies. Over two million purchase records from a grocery store chain in Utah were analyzed to create a profile of the point-of-purchase information that can be garnered from such a data source. Preliminary analysis of the purchases shows that there are 66,080 unique Universal Product Codes (UPCs) in the data set and that 86% of all items purchased are food items. The data were also analyzed according to food groups and a more granular categorization of specific food items, which, once linked to nutritional content, would prove useful in determining the dietary makeup of customers’ purchases. These data will inform the final study that will compare point-of-purchase data to two standard nutritional measures.

Abstract:Many types of cancer can acquire the ability to metastasize, and we hypothesize that this shared behavior can be used to identify the genes, as well as their corresponding functions and pathways, that are involved in metastasis and conserved among different cancers. Gene expression data from different cancer types were analyzed in this study; for each cancer type, we identified the genes that are differentially expressed between the metastatic condition (lymph node or distant organ metastasis) and the primary condition using a pre-defined false discovery rate (FDR). Two methods were used to identify genes that are consistently differentially expressed in the metastatic versus non-metastatic conditions: genes with a conserved pattern of differential expression were identified, and genes with a common differentially expressed pattern were clustered. The binding sites for the transcription factors ETS, GABP, and STAT were consistently enriched in each approach; pathway annotations such as the actin-cytoskeleton pathway and the MAP kinase pathway were also enriched. The meta-analysis of the gene expression data in this study provides a better understanding of the genes that are consistently involved in metastasis across different types of cancer and provides valuable information for studying the mechanisms of cancer progression.
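
Selecting differentially expressed genes at a pre-defined FDR is commonly done with the Benjamini-Hochberg step-up procedure. A minimal sketch of that procedure (the p-values are toy data, and the study may well have used a different FDR method):

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return indices of hypotheses rejected at the given false discovery
    rate using the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = -1
    for rank, i in enumerate(order, start=1):
        # Reject all hypotheses up to the largest rank satisfying p <= rank/m * fdr.
        if pvalues[i] <= rank * fdr / m:
            cutoff = rank
    return sorted(order[:cutoff]) if cutoff > 0 else []

# Toy p-values for five genes (metastatic vs primary comparison):
pvals = [0.001, 0.012, 0.035, 0.2, 0.8]
print(benjamini_hochberg(pvals, fdr=0.05))  # [0, 1]
```

Genes passing the FDR threshold in each cancer type would then feed the cross-cancer comparison and clustering steps.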

Abstract:The National Cancer Institute estimates that this year 28,660 men in the U.S. will die of prostate cancer and 186,320 new cases will be diagnosed. It is a disease of particular importance to American veterans, with prostate cancer accounting for over 25% of the cancers diagnosed in veterans in 2004. The most common surgical treatment for prostate cancer is the prostatectomy, or surgical removal of the prostate. Despite the prevalence of this treatment, an estimated 28% of procedures result in tumor left at the margin of resection, increasing the risk of cancer recurrence in those patients by two to four times. Like many unintended surgical outcomes, positive tumor margins are a consequence of surgical technique. Yet the lack of accessible information describing the surgical procedure has led outcomes assessment researchers to rely on indirect but quantifiable correlates such as hospital volume and surgical experience. This research tests the hypothesis that information extraction techniques can facilitate more robust process-based surgical outcomes assessment by identifying and structuring patient conditions prior to surgery, key variations on procedures, and the result of the surgery from clinical records. Results of using a combination of machine learning and heuristics to extract key quality measures from the reports of two hospitals will be presented, along with discovered inconsistencies in urologists’ approach to the prostatectomy.
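
To give a flavor of the heuristic side of such an extractor, a rule for one quality measure, margin status, might look like the following. The patterns and labels are illustrative assumptions, not the study’s actual system:

```python
import re

# Crude keyword patterns for margin status in free-text report sentences.
POSITIVE = re.compile(r"\b(positive|involved)\b.*\bmargin", re.IGNORECASE)
NEGATIVE = re.compile(r"\bmargins?\b.*\b(negative|free of tumor)\b", re.IGNORECASE)

def margin_status(sentence):
    """Classify a report sentence as describing positive, negative,
    or unknown surgical margin status."""
    if POSITIVE.search(sentence):
        return "positive"
    if NEGATIVE.search(sentence):
        return "negative"
    return "unknown"

print(margin_status("There is a positive surgical margin at the apex."))
print(margin_status("Surgical margins are negative."))
```

Rules like these are brittle on their own (negation and hedging are common in clinical text), which is one motivation for combining them with machine-learning classifiers as the abstract describes.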

Abstract:The drug imatinib is currently the standard first-line treatment for chronic myeloid leukemia (CML), and is responsible for a significant improvement in CML management in recent years. However, approximately 10-15 percent of patients do not respond to the drug. An effective method for predicting non-responders will allow those patients to seek alternative therapies without delay. Methods for using clinical measures to predict a CML patient’s prognosis exist (Sokal, et al., and Hasford, et al.), and a newly developed gene expression-based classifier has been shown to effectively distinguish between patients who respond to imatinib and those who do not. This project will determine whether a gene expression-based classifier improves our ability to predict a patient’s response to imatinib over previous methods that use only clinical information. The possibility of integrating clinical data with gene expression data to further improve the classifier will also be examined. Future research will explore other ways of integrating clinical and genetic/genomic data to better model disease progression, such as: 1) using additional clinical data and genotypes for common mutations to refine a patient’s phenotype, and 2) integrating functional/genomic/clinical information to create relevant gene-gene interaction networks (pathways) to better understand disease processes.

Informatics and Communication in a State Department of Public Health: A Case Study

Authors:Rebecca A Hills, Anne Turner, University of Washington

Abstract:Both state and local health departments are witnessing fast growth in the area of informatics. Some large departments have formed, or are working to form, informatics teams to serve as the center of public health information resource needs in the organization. With the emergence of these new groups comes a need to understand how communication between informatics teams and other teams will be conducted. In Washington State an Electronic Death Registry System (EDRS) project was initiated in 2002. The project ran into several difficulties, and by the end of 2007 it had been implemented only partially, in six Local Health Jurisdictions in the state. Although the reasons for the implementation delays were complex, some employees working on development and support activities for EDRS felt that communication between the health statistics group (in charge of death certificates) and the informatics team was inefficient. We conducted interviews and a survey of the employees involved with the EDRS project on both teams. We analyzed the data gathered to describe how communication works, and fails to work, between these two very different groups in the health department. The use of a role matrix helped clarify roles and illuminate discrepancies in employees' perceptions of responsibilities. After the analysis, recommendations for improving communication between the two groups were made to the group leads.

Abstract:The objective was to develop and characterize a large, representative sample of guideline recommendations that can be used to better understand how current recommendations are written and to test the adequacy of guideline models. We refer to this sample as the Yale Guideline Recommendation Corpus (YGRC). To develop the YGRC, we extracted recommendations from guidelines downloaded from the National Guideline Clearinghouse (NGC). We evaluated the representativeness of the YGRC by comparing the frequency of use of controlled vocabulary terms in the YGRC sample and in the NGC. We examined semantic and formatting indicators that were used to denote recommendation statements. In the course of reviewing 7527 recommendation statements, we extracted 1275 recommendations from the NGC and characterized the guidelines from which they were derived. Both semantic and formatting indicators were used inconsistently to denote recommendations. Recommendation statements were not reliably identifiable in 31.6% (310/982) of the guidelines and many recommendations were not actionable as written. The YGRC provides a representative sample of current guideline recommendations and demonstrates considerable variability and inconsistency in the way recommendations are written.

A Bayesian Framework for Inferring Transcription Factor States from Gene Expression

Authors:Thomas Asbury, Xinghua Lu, Medical University of South Carolina

Abstract:Transcription factors (TFs) play a crucial role in the gene expression network. By binding to a gene’s promoter region, they can either promote or repress its expression. TFs are proteins and, unlike mRNA, are difficult to measure at large scale, where traditional methods such as reporter genes are limited. As a result, the state and network of TFs have to be inferred from gene expression data. We have designed a Bayesian network to determine the state of TFs for a given set of microarray experiments. We use a simple two-layer model to represent the TF-to-gene expression network, and build an input connectivity matrix from known experimental data. Our model is unique in its use of Boolean variables to represent the TF state. This simplifies the output analysis and interpretation to give meaningful biological results. If the TF state is on, the TF is active in the cell for that measured condition, either inhibiting or promoting transcription. We demonstrate the model’s effectiveness by applying it to a series of microarray experiments which subject yeast (S. cerevisiae) to heat shock stress.

Abstract:Recent electrophysiology and imaging studies of mitral cell neurons in the olfactory bulb reveal that inhibitory synapses must be close to the soma of a mitral cell to inhibit its firing. We therefore explored the neural network effects of synaptic location along mitral cell lateral dendrites from inhibitory interneurons, the granule cells, using computational models. We show that synchrony in the firing of two mitral cells can only occur when the mitral cells are proximally close to each other. Additionally, these models show how the mitral-granule circuit can induce spatial and temporal patterning of spiking activity which also has implications for neural coding.

Abstract:The North American Association of Central Cancer Registries (NAACCR) sets data transmission standards for cancer abstract reporting and electronic pathology laboratory reporting. NAACCR uses LOINC (Logical Observations Identifiers Names and Codes) codes for the pathology reporting and is exploring the transmission of the cancer abstract reports using the Clinical Document Architecture (CDA) format and LOINC codes. Since LOINC codes are publicly available and anyone may submit a request for additional codes, there exist LOINC codes with similar concepts which might potentially be miscoded and lead to data submission errors. Our objective was to perform an analysis and identify these potentially miscoded concepts. We systematically reviewed and performed a gap analysis on LOINC codes using Regenstrief LOINC Mapping Assistant (RELMA) software and NAACCR lookup table with LOINC codes or HL7 segment to determine if it enables users to quickly identify appropriate codes. Our primary findings were that a majority of LOINC codes were appropriately assigned to NAACCR data items. However, there were some codes which were apparently misclassified with a wrong class or panel in RELMA. Resolving these misclassifications will assist users in mapping LOINC codes to the appropriate NAACCR data items and hence enhance the quality of cancer data submissions.

Background: Healthcare data is scattered across many healthcare systems with different identifiers, and patient records collected in an institution may have multiple identifiers referring to the same patient.

Record linkage is a key functionality for fully functioning health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching than rule-based approaches, particularly when matching patient records that have no unique identifiers. Theoretically, the relative frequency of specific data elements can enhance the Fellegi-Sunter (F-S) algorithm. However, to our knowledge, no frequency-based weight scaling modification to the F-S algorithm has been implemented and evaluated using real-world clinical data.

Methods: We are developing a frequency-based weight scaling modification using an information theoretic model, and will initially estimate the accuracy using synthetic data. We will formally evaluate the effectiveness of this modification by linking Indiana statewide newborn screening data to registration data from the Indiana Network for Patient Care, an operational health information exchange.

Hypothesis: We expect that the frequency-based weight scaling modification to F-S algorithm will improve overall linkage accuracy as characterized by sensitivity and positive predictive value.
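As a rough illustration of the frequency-based idea (not the study's actual model), consider that in the Fellegi-Sunter framework the agreement weight for a field is log2(m/u), and substituting a value-specific u-probability lets rare values (a surname like "Xiomara") contribute more matching evidence than common ones ("Smith"). The name counts and m-probability below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical last-name frequencies from a registry
names = Counter({"smith": 500, "jones": 300, "xiomara": 2})
total = sum(names.values())
m_prob = 0.95  # P(last name agrees | records truly match) -- assumed value

def scaled_weight(value):
    # Value-specific u-probability: the chance an unmatched pair agrees
    # on this value, approximated here by the value's relative frequency.
    u_value = names[value] / total
    return math.log2(m_prob / u_value)

# Rare values carry more matching evidence than common ones
assert scaled_weight("xiomara") > scaled_weight("smith")
```

The scaled weight for "xiomara" (about 8.6 bits here) far exceeds that for "smith" (about 0.6 bits), which is the intuition behind frequency-based weight scaling.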

Abstract:Inadequate dosing for nephrotoxic or renally cleared drugs in patients with acute kidney injury is common, though recent clinical decision support systems have proven successful in decreasing errors. Within the care provider order entry (CPOE) system, we developed a set of interventions with varying levels of workflow intrusiveness. The interventions alert providers about significant changes in renal function, defined as a 0.5 mg/dl change in serum creatinine, and advise discontinuation or modification of nephrotoxic or renally cleared drugs. Passive alerts appear as persistent text within the CPOE system and on rounding reports, requiring no provider response. More intrusive exit check alerts interrupt the provider at the end of the CPOE session, requiring the provider to modify or discontinue the drug order, assert the current dose as correct, or defer the alert. We evaluated the initial provider response to the interventions, using as our outcomes the resulting actions for alerted orders and the responses selected by providers as required by the exit check alert. Preliminary analysis shows the interventions to be effective in significantly improving provider response to changes in renal function, though initial usage suggests future enhancements to increase success.

Abstract:Advances in genomic technologies have led to an information revolution in both biology and medicine, improving and changing the face of health care. Because of this, large data sets are becoming omnipresent in the biomedical domain. Here, our goal is to define a set of taxonomies for large data sets that highlights the nature of biological data and addresses salient methodological issues. We present a discussion of large data sets including the growth of biomedical databases over time, ways to measure information content and complexity in large data sets, and methodological issues associated with the analysis of large data sets. Both a statistical taxonomy and an information management taxonomy have been defined. In the statistical taxonomy, we understand and analyze data based on both its size and dimensionality, recognizing issues that come in each case such as feature selection and multiple testing. In the information management taxonomy, we discuss data complexity and computational complexity in detail and suggest measures such as principal components, correlation, median skewness and kurtosis. Defining taxonomies for large data sets is critical for guiding and improving analysis and research. The taxonomies given here serve this purpose.
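One of the measures mentioned above, median skewness, can be computed as Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation. A minimal sketch with made-up data:

```python
import statistics as st

def median_skewness(xs):
    # Pearson's second skewness coefficient: positive for a long right
    # tail, negative for a long left tail, zero when mean == median.
    return 3 * (st.mean(xs) - st.median(xs)) / st.stdev(xs)

right_skewed = [1, 2, 2, 3, 3, 4, 15]  # one large outlier pulls the mean up
assert median_skewness(right_skewed) > 0
```

Because it depends only on the mean, median, and standard deviation, this measure is cheap to compute even on very large data sets, which is presumably why it appears among the suggested complexity measures.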

Abstract:The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents are indexed with continuing high quality, it is necessary to develop tools and methods that help NLM indexers in their daily task. We present work addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair indexing recommendations. A combination of statistical, rule-based and Natural Language Processing methods were assessed and the results obtained led to the implementation of a subheading attachment module for NLM’s Medical Text Indexer.

Usability Testing of an Electronic Health Record Form to Monitor Overweight Children

Authors:Barbara Moore1, Julie Wright2, Bonnie Lai Watson2, William Adams2; 1VA Boston Healthcare System and 2Boston University School of Medicine, Boston, MA

Abstract:Overweight management is challenging due to time constraints and minimal training in nutrition, physical activity and brief counseling. The Telephone-linked Care for Healthy Eating and Activity Today system may help physicians counsel by providing home data and effective behavioral theory-based counseling. The physician component (PC-HEAT) of the program aims to inform physicians about their patients’ progress in a telephone-based home overweight education and behavior change program, and to facilitate discussion with those patients about overweight and self-care skills. Goal Directed Task Analysis (GDTA), Display Task Description (DTD) and usability testing helped align the interface with the user’s needs and workflow. GDTA helps produce better user interfaces, but successfully translating GDTA into a user interface is not easy. We used a tool created for the military, DTD, which gathers users’ decision and information requirements determined by GDTA to create the interface. Usability testing included: 1. a pre-testing questionnaire, 2. scenario-based thinking out loud technique with paper mock-ups of the user interface, 3. prototyping an interface from scratch and 4. post-test debriefing session. Usability testing also helped highlight topics for pre-deployment orientation. We recommend routine use of these approaches to improve interface quality and usability.

Abstract:Asthma is a known risk factor for the development of chronic obstructive pulmonary disease (COPD), but a thorough understanding of the progression to COPD among asthma patients has not been achieved. We created a robust predictive model describing which asthma patients develop COPD using electronic medical records. Demographic information and comorbidities from adult asthma patients were extracted from electronic medical records of the Partners Healthcare System using ICD9 billing codes and Natural Language Processing tools of the National Center for Biomedical Computing entitled “Informatics for Integrating Biology to the Bedside” (i2b2). A predictive model of COPD was constructed from 9349 patients (843 cases, 8506 controls) using Bayesian networks. We found a model in which sex, race, smoking history, and 14 comorbidities predict COPD. The comorbidities include diabetes, coronary artery disease, and pulmonary infections, suggesting the systemic nature of COPD. The model’s predictive accuracy was tested in an independent set of asthmatics (992 patients; 46 cases, 946 controls). The area under the Receiver Operating Characteristic curve corresponding to this prediction is 0.80, demonstrating good predictive accuracy. Our results demonstrate the promise of using electronic medical records to create clinically useful predictive models when abstracted using natural language processing methods.
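The AUC reported above has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal rank-based computation on hypothetical predicted risks (not the study's data):

```python
def auc(case_scores, control_scores):
    # Count case/control pairs where the case scores higher; ties count half.
    wins = sum((c > k) + 0.5 * (c == k)
               for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

cases = [0.9, 0.8, 0.6]        # hypothetical predicted COPD risk, true cases
controls = [0.7, 0.4, 0.2, 0.1]
assert abs(auc(cases, controls) - 11 / 12) < 1e-12  # 11 of 12 pairs correct
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which is why the study's 0.80 is read as good discriminative accuracy.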

Authors:Casey L Overby, Michal Galdzicki, John Gennari, University of Washington

Abstract:Biosimulation models have the potential to improve our understanding of biological function, and ultimately could support disease diagnosis and help to prioritize treatment options. Currently, a researcher must select from a wide variety of available biosimulation model development environments, which support different modeling methods and techniques such as statistical modeling, agent-based modeling, or models based on differential equations. There are two ways a researcher may approach choosing a development environment. The most common approach is for a researcher to use the simulation environment with which they are most familiar. The hope is that the environment includes all of the necessary tools and capabilities for implementing the desired computational model; if these needs are not met, the researcher must develop ad hoc workarounds. Another approach to choosing a modeling environment is to first identify the modeling needs. Based on these needs, the investigator may then decide on the most appropriate modeling environment. Here we identify some characterizing features of computational models and a subset of considerations that may be taken into account when determining the appropriate simulation platform. Additionally, we identify some simulation platforms that are commonly used within different research domain areas. The information we provide will aid investigators in determining a simulation platform appropriate for their research needs.

Abstract:Modern high-throughput experimental techniques such as gene knockouts and microarray studies provide the opportunity to greatly accelerate biological discovery. Often the results of these experiments can be interpreted as yielding a (possibly large) set of "interesting" entities (e.g., genes). Tools for the automatic analysis of the interesting set have the potential to help biological scientists to more easily understand and make use of their results. In this work we examine the use of the biomedical literature to answer two natural questions. First, what discriminates the interesting set from the non-interesting set? Second, can we discover meaningful structure within the interesting set (i.e., clustering)? We study the use of latent topic models (such as Latent Dirichlet Allocation) to answer these questions.
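The abstract's first question, what discriminates the interesting set from the non-interesting set, can be illustrated with a much simpler stand-in than latent topic models: a smoothed log-odds ranking of literature terms between the two sets. The term counts below are invented for illustration only.

```python
import math
from collections import Counter

# Toy literature term profiles for the "interesting" genes vs background
interesting = Counter({"apoptosis": 40, "kinase": 25, "membrane": 5})
background = Counter({"apoptosis": 10, "kinase": 20, "membrane": 50})

def log_odds(term):
    # Add-one smoothed log-odds that a term belongs to the interesting set
    n_i, n_b = sum(interesting.values()), sum(background.values())
    p_i = (interesting[term] + 1) / (n_i + 2)
    p_b = (background[term] + 1) / (n_b + 2)
    return math.log(p_i / p_b)

ranked = sorted(interesting | background, key=log_odds, reverse=True)
assert ranked[0] == "apoptosis" and ranked[-1] == "membrane"
```

A topic model such as LDA goes further by discovering shared latent themes, which addresses the second question (structure within the interesting set); the log-odds ranking above only separates the two sets term by term.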

Abstract:Biologists are increasingly recognizing that social interactions are commonplace and have important consequences for predicting the direction and outcome of natural selection. For example, the potential for competitive and cooperative interactions among pathogens alters predictions about the evolution of virulence, and social interactions are also important in understanding biofilm formation in bacteria. We are using the social amoeba Dictyostelium discoideum as a model system in which to explore the genetics and evolution of social behaviors. We are using a whole genome sequencing approach to identify polymorphism and testing the hypothesis that candidate genes involved in social behavior evolve significantly faster than expected by chance. We have completed whole genome sequencing for two wild isolates. We find that they differ at about 0.1% of sites in the genome. More important, the distribution of polymorphism is highly skewed: most genes show little or no genetic change, whereas relatively few appear highly polymorphic. One candidate gene that may be involved in self-nonself recognition has been shown to be among the top 1% of polymorphic genes in the genome. Our work explores how molecular evolution approaches can be applied to whole genome sequences to gain an understanding of the selective forces acting on microorganisms.

Abstract:10,208 Uniform Resource Locator (URL) addresses in MEDLINE records from the National Library of Medicine from 1994 to 2006 were identified and accessed at random times once daily for 30 days. Results showed that URL lengths ranged from 13 to 425 characters, with a mean of 35 characters [Standard Deviation (SD) = 13.51; 95% confidence interval (CI) 13.25 to 13.77]. The most common top-level domains were “.org” and “.edu”, each with 34%. Titles and abstracts were also searched for the presence of archival tools such as WebCite, Persistent URL (PURL) and Digital Object Identifier (DOI). Our survey of archival tool usage showed that since its introduction in 1998, only 519 of all abstracts reviewed had incorporated DOI addresses in their MEDLINE abstracts. About 81% of the URL pool was available 90% to 100% of the time, but only 78% of these contained the actual information mentioned in the MEDLINE record. “Dead” URLs constituted 16% of the total. The rate of URL persistence during the 1-month study period parallels previous studies. As peer-reviewed literature remains the main source of information in biomedicine, we need to ensure the accuracy and preservation of these links.

Abstract:Introduction: We compare user acceptance of a web-based interface for computerized provider order entry (CPOE) to the original client-server interface. The WizOrder CPOE system consists of a C/C++ application server and a Java client interface installed on 4,200 clinical workstations at Vanderbilt University. WizWeb, a web-based replacement user interface employing JBOSS middleware, was developed to enable remote web-based CPOE access while maintaining a similar interface to minimize or eliminate the need for training. Methods: We asked 24 providers who use WizOrder extensively to enter 15 orders on a hypothetical patient with the diagnosis of pulmonary embolism into both interfaces. Participants rated ease of use, functionality, speed, and times required to complete tasks for each system on a 7-point Likert scale. Results: Mean Likert values for ease of use, functionality, speed, and time required to complete task were 6.17, 6.17, 5.79, and 6.04, respectively, for WizOrder and 6.17, 6.17, 5.63, and 6.00, respectively, for WizWeb. There were no statistically significant differences using matched-sample t-test analysis (p<0.05) between the two interfaces. Discussion: The web-based WizWeb interface is comparable to the current client-based WizOrder interface in user acceptance, and may eventually provide a viable replacement for the WizOrder interface.

Abstract:Gene relationships reported in biomedical literature are valuable to the understanding of interaction networks. Biomedical literature is growing rapidly, and scientists need a mechanism to keep pathways up to date with new findings from the literature. In this work, we are developing informatics tools to extract gene relationships from current biomedical research literature using text mining and semantic understanding. Research articles from a selected set of journals are grouped into sets of training and testing articles. Information retrieval techniques are utilized for term parsing, frequency, stemming, proximity, and stopword removal. In addition, terms from the training set are compared and filtered using the lists of pre-defined KEGG gene, protein names, and gene relationship terms. Each phrase is associated with a block, also called a domain-concept (DC), of the WNT signaling pathway. A Domain-Concept Mining (DCM) approach expands the terms in each of the gene relationship phrases. DCM identifies co-occurring terms within each domain concept (block) in order to generate gene relationship phrase thesauruses. Phrases such as “Axin binds to APC” can be parsed into two gene names {Axin}{APC} and the verb {binds to}. These components are mapped to the corresponding gene products and their relationships within the WNT signaling pathway.
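A minimal sketch of the phrase-parsing step described above, using small hypothetical lexicons in place of the KEGG gene and relationship-term lists:

```python
import re

# Hypothetical mini-lexicons; a real system would load KEGG gene names
# and a curated list of relationship verbs.
GENES = ["Axin", "APC", "GSK3B"]
RELATIONS = ["binds to", "phosphorylates", "inhibits"]

pattern = re.compile(
    r"\b(%s)\s+(%s)\s+(%s)\b"
    % ("|".join(GENES), "|".join(RELATIONS), "|".join(GENES))
)

m = pattern.search("Axin binds to APC in the WNT pathway.")
assert m is not None
assert m.groups() == ("Axin", "binds to", "APC")
```

Real biomedical text rarely states relationships this cleanly, which is why the DCM approach expands relationship phrases with co-occurring terms rather than relying on exact patterns alone.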

Abstract:Population genetics provides one approach to identifying genes involved in adaptation. In this study we use population genetic approaches to look for signatures of selection in a set of genes involved in various abiotic stresses such as cold, salinity, drought and heat. We have sequenced a set of genes from 10-15 individuals in six natural wild populations around the globe. By using a variety of coalescent methods and modeling techniques we compare these data to a model of A. lyrata history based on a large set of genes on which we do not expect to see evidence for local adaptation. We use an approximate Bayesian computation method that is based on summary statistics to deal with many of the issues that arise in complex population data sets that have a variety of features such as geographic structure, population size bottlenecks, migration and founder effects. These features can confound our ability to detect selection and here we use a demographic model to deal with some of these factors.

Abstract:The goal of personalized medicine in the ICU is to predict which diagnostic tests, monitoring interventions, and treatments translate to improved outcomes given the variation between patients. Processes such as gene transcription and drug metabolism are dynamic in the critically ill; information obtained during static non-diseased conditions may have limited applicability. We propose an alternative way of personalizing medicine in the ICU on a real-time basis by using information derived from the application of artificial intelligence on a high-resolution database. The Multi-parameter Intelligent Monitoring for Intensive Care (MIMIC II) database consists of data from ICU patients admitted to Beth Israel Deaconess Medical Center. Patients on vasopressor agents for more than 6 hours after admission were identified. Variables that affect fluid requirement or reflect the intravascular volume were extracted. We represented the variables by learning a Bayesian network from the underlying data. Using 10-fold cross-validation repeated 100 times, the accuracy of the model is 77.8%. Based on the model, the probability that a patient will require a certain amount of fluid on day 2 can be predicted. In the presence of a larger database, analysis may be limited to patients with identical clinical presentation, demographic factors, co-morbidities, and current physiomic data.

Abstract:Despite speculation that Telemicroscopy and Digital Microscopy will follow the same diffusion curves as their counterparts in the world of Radiology - Teleradiology and Filmless Radiology, no study has offered definitive evidence in support of this hypothesis. To address this gap in the informatics knowledge base, dual survey instruments were created to measure current opinions on both technologies among Pathologists and Radiologists and administered to both groups via an online survey engine at two academic medical centers. Both surveys were crafted using the diffusion of innovations work of Rogers and the FITT model of Ammenwerth et al. Revision of the survey for Pathologists is nearing completion and the same process is underway for the Radiologists’ survey. Results gathered from 15 respondents to the pilot Pathology survey indicate that at these institutions, 25% had never used a Telemicroscopy system, 85% would not find using a Telemicroscopy system for routine sign-out practical, and 66% indicated its use would improve individual turn-around time. Further work will entail dissemination of the surveys to US Pathologists and Radiologists. In addition, we will validate the diffusion model using data from a retrospective literature search of Radiologists’ opinions on the technology applied to the current Radiology survey instrument.

Abstract:A patient’s electronic medical record can consist of a large number of reports, especially for an elderly patient or for one affected by a chronic disease. It can thus be cumbersome for a physician to go through all of the reports to understand the patient’s complete medical history, leading to the possibility of adverse drug interactions. This poster describes work in progress towards tracking medications and their dosages through the course of a patient’s medical history. 923 reports associated with 11 patients were obtained from a university hospital. Drug names were identified using a dictionary look-up approach. Dosages corresponding to these drugs were determined using regular expressions. The state of a drug (started, discontinued, or continuing) was identified using a statistical classifier and then refined by a hidden Markov model. Preliminary results are presented; although sub-optimal, the simplicity of the algorithm lends itself to significant improvement and is hence promising.
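The dictionary look-up plus regular-expression step described above might look like the following sketch; the drug list and dosage pattern here are purely illustrative, and real clinical text needs far broader unit and frequency coverage:

```python
import re

# Hypothetical drug dictionary; a real system would use a full formulary
DRUGS = ["lisinopril", "metformin", "warfarin"]

# Match a known drug name followed (within the same sentence, up to 20
# characters away) by a numeric dose and a unit
dose_re = re.compile(
    r"\b(?P<drug>%s)\b[^.]{0,20}?(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g)\b"
    % "|".join(DRUGS),
    re.IGNORECASE,
)

text = "Continue lisinopril 10 mg daily; metformin 500 mg was discontinued."
found = [(m["drug"].lower(), m["dose"], m["unit"]) for m in dose_re.finditer(text)]
assert found == [("lisinopril", "10", "mg"), ("metformin", "500", "mg")]
```

Classifying each mention's state (started, discontinued, continuing), as the abstract describes, would then operate on the surrounding context of each match, for example with a statistical classifier smoothed by a hidden Markov model over the report sequence.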

Abstract:Introduction: There are several absolute database models including relational, hierarchical, object, and network. We used VHA medical data of the treatment of age related macular degeneration (ARMD) to explore a relativism conceptual database structure and demonstrate the utility of ancillary administrative data.

Methods: Treatment regimens of age related macular degeneration were grouped by patterns of ancillary data. Grouping was based on the number of attributes, the number of possible values of attributes, relationship of attributes to an event, trigger pattern of ancillary data to an event, and the relationship of entities entering data. A model of probability based on a cumulative scale of associated data was created.

Results: Standard oral regimens utilized as a preventative measure reducing risk of progression in dry ARMD had the least robust ancillary data and therefore least certainty of validity. The bevacizumab intraocular data has the most robust ancillary data and best exemplified the value of a relativistic database model.

Conclusion: Ancillary administrative data is an important adjunct to assessing validity of medical regimens. A relativist logical database structure demonstrates the improved specificity from the use of ancillary data. In an idealized schema with many ancillary data elements, the actual value of the event of interest adds little to the knowledge it occurred.

Abstract:An increasing body of preliminary work suggests that biomedical natural language processing is capable of extracting relationships between biomedical entities (e.g. proteins and protein complexes) directly from literature that are of sufficient quality for subsequent use by biologists and bioinformaticists. Many of the relation extraction systems that have been described include as a key component a syntax parser that provides deep syntactic analysis of the texts that are processed. Syntax is seen as key to relationship extraction because it provides information about how words and concepts are related that is independent of the particular concepts being discussed and captures many generalizations of how biomedical language is structured that can be exploited. Unfortunately, attaining high-quality syntax parsing for biomedical text is difficult because most parsers have been trained on general English texts such as newswire and generally run very slowly (O(n^5)). As such, many of the current relation extraction systems point to syntax parsing as being a source of many errors and a throughput bottleneck. I will discuss strategies for building a fast, high-quality syntax parser for biomedical literature using biomedical corpora, community-curated knowledge resources, and state-of-the-art dependency parsing strategies.

Abstract:Blood pressure (BP) is a predictor of mortality in hemodialysis (HD) patients. However, BP measurements are usually used as fixed, summary values, such as mean or median BP. Utilizing serial systolic BP measurements over six months, we constructed derived variables to predict mortality for the succeeding month using averaging and differencing approaches that reflect BP trends over time. Data consisted of a random sample of 4,500 HD patients (out of 90,000) from a large dialysis provider in the United States. Models were constructed using a Support Vector Machine (SVM). Results were compared to an SVM that used six-month mean BP as a predictor. All models were adjusted for age, gender, race, diabetes, vintage, and BMI. The area under the ROC curve (AUC) for the best derived-predictor BP model using cross-validation was 0.70, compared to 0.63 for a simple average BP model (p<0.00001). The AUC was 0.73 when modeling a pooled data set with each person-month representing individual data. Application of differencing to reflect BP trends significantly improved model performance in this study.
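A sketch of how averaging and differencing predictors could be derived from six monthly systolic BP values; the exact variable definitions used in the study are not given in the abstract, so the features below are assumptions for illustration:

```python
def derived_features(monthly_sbp):
    """Derive trend-aware predictors from six monthly systolic BP means."""
    assert len(monthly_sbp) == 6
    mean_bp = sum(monthly_sbp) / 6.0
    # First differences capture the month-to-month trend
    diffs = [b - a for a, b in zip(monthly_sbp, monthly_sbp[1:])]
    slope = sum(diffs) / len(diffs)            # average monthly change
    recent_change = monthly_sbp[-1] - mean_bp  # last month vs overall mean
    return {"mean": mean_bp, "slope": slope, "recent_change": recent_change}

feats = derived_features([150, 148, 145, 140, 138, 135])
assert feats["slope"] < 0  # declining trend over the six months
```

A fixed summary like the six-month mean discards exactly the trend information that `slope` and `recent_change` retain, which is consistent with the study's finding that differencing-based predictors improved the SVM's AUC over a simple-average model.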

Abstract:Comparative genomics holds promise for improving prediction of protein phosphorylation sites. Here I explore two aspects of this: the high incidence of negatively charged amino acids at the same position as the phosphosite in evolutionarily related proteins; and the appearance of conserved motifs near the phosphosite in conjunction with the switch to a phosphorylatable residue from acidic amino acids. I also examine the suitability of these data for incorporation as novel features for prediction.

Authors:Stephen R Piccolo, Lewis J Frey, and Nicola Camp, Department of Biomedical Informatics, University of Utah School of Medicine

Abstract:Existing methods for predicting an individual's breast-cancer risk are limited in accuracy or scope. Researchers have estimated the discriminatory accuracy of the Gail Model, a risk predictor based on phenotypic variables, at 58-67%. Genetic tests for breast cancer commonly focus on rare alleles with low population attributable fractions. We aim to develop a risk-prediction model for breast cancer more accurate than the Gail Model and more widely applicable than existing genetic tests. This polygenic model characterizes genetic risk as the combined effect of many alleles, each affecting risk in various degrees but together having an additive or multiplicative impact on disease susceptibility. The accuracy of the model is evaluated using publicly available data from a genome-wide association study of sporadic, post-menopausal breast cancer. Each study participant's genotypes are tied to their respective odds ratios, and risk is estimated as the product of the odds ratios for the alleles most highly associated with breast cancer in the study. Preliminary tests reveal that this model performs better than chance at discriminating between cases and controls. Further work will be done to refine the model and compare its value for predicting breast-cancer susceptibility with existing methods.
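The abstract describes risk as the product of the odds ratios for a participant's risk alleles; a minimal sketch of that polygenic score with hypothetical SNPs and odds ratios:

```python
from math import prod  # Python 3.8+

# Hypothetical per-allele odds ratios (not from the cited GWAS)
odds_ratios = {"rs1": 1.26, "rs2": 1.11, "rs3": 0.93}
# One individual's count of risk-allele copies at each SNP
genotype = {"rs1": 2, "rs2": 1, "rs3": 0}

# Multiplicative polygenic score: each risk-allele copy multiplies the
# baseline odds by that SNP's odds ratio
risk_score = prod(odds_ratios[snp] ** genotype[snp] for snp in odds_ratios)
assert risk_score > 1  # this individual's combined odds exceed baseline
```

Under this multiplicative model many small-effect common alleles can jointly produce substantial risk differences, which is the rationale for aiming at broader applicability than rare-allele genetic tests.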

Abstract:The objective of this research is to determine the extent to which clinicians use the cognitive heuristic of representativeness in clinical reasoning. We investigated clinicians’ inference and use of disease base rates when assessing an aggregate of patients; the effect that providing clinicians with base rates has on clinical reasoning; and what type of data clinicians consider in diagnosis. Physicians, engaged in think-aloud techniques, assessed clinical scenarios presented in varying formats. Think-aloud protocols were analyzed, indicating clinicians do not infer disease base rates when assessing patients, nor do they use base rates when available. When provided with disease causal and base rate information, clinicians tend to weigh causal data more heavily than base rate data, often ignoring disease base rates entirely. However, when operating within one's domain of expertise, this trend does not hold. In this study when diagnosing patients, physician specialists considered all the data available to them. These results support that the representativeness heuristic is utilized differently by general practitioners than by specialists. This research provides a method for analyzing cognitive heuristics involved in assessing clinical data, which serves as a first step in understanding what errors may result from those heuristics and how to prevent them.