Can Blockchain Technology and Artificial Intelligence Cure What Ails Biomedical Research and Healthcare?

Curator: Stephen J. Williams, Ph.D.

Updated 12/18/2018

In the effort to reduce healthcare costs, increase patients' access to services, and drive biomedical innovation, many healthcare and biotechnology professionals have looked to advances in digital technology to determine how IT can drive and extract greater value from the healthcare industry. Two areas of recent interest have focused on how best to use blockchain and artificial intelligence technologies to drive greater efficiencies in our healthcare and biotechnology industries.

More importantly, with the substantial increase in ‘omic data generated both in research and in the clinical setting, it has become imperative to develop ways to securely store and disseminate massive amounts of ‘omic data to the relevant parties (researchers or clinicians) in an efficient manner, while protecting personal privacy and adhering to international regulations. This is where blockchain technologies may play an important role.

A recent Oncotarget paper by Mamoshina et al. (1) discussed the possibility that next-generation artificial intelligence and blockchain technologies could synergize to accelerate biomedical research, give patients new tools to control and profit from their personal healthcare data, and assist patients with their healthcare monitoring needs. According to the abstract:

The authors introduce new concepts to appraise and evaluate personal records, including the combination-, time- and relationship value of the data. They also present a roadmap for a blockchain-enabled decentralized personal health data ecosystem to enable novel approaches for drug discovery, biomarker development, and preventative healthcare. In this system, blockchain and deep learning technologies would provide the secure and transparent distribution of personal data in a healthcare marketplace, and would also be useful to resolve challenges faced by the regulators and return control over personal data including medical records to the individual.

Open-source blockchain Exonum and its application to a healthcare marketplace

A blockchain-based platform allowing patients to have control of their data and manage access

How advances in deep learning can improve data quality, especially in an era of big data

Advances in Artificial Intelligence

Integrative analysis of the vast amount of health-associated data from a multitude of large-scale global projects has proven to be highly problematic (ref. 27), as high-quality biomedical data are highly complex and heterogeneous in nature, which necessitates special preprocessing and analysis.

Increased computing power and algorithmic advances have led to significant progress in machine learning, especially machine learning involving Deep Neural Networks (DNNs), which are able to capture high-level dependencies in healthcare data. Some examples of the uses of DNNs are:

Generative Adversarial Networks (GANs) (https://arxiv.org/abs/1406.2661): require good datasets for extensive training but have been used to determine the tumor growth inhibition capabilities of various molecules (7)

Recurrent Neural Networks (RNNs): originally made for sequence analysis, RNNs have proved useful in analyzing text and time-series data, and thus would be very useful for electronic record analysis. They have also been useful in predicting the blood glucose levels of Type I diabetic patients using data obtained from continuous glucose monitoring devices (8); see the sketch after this list

Transfer Learning: focused on transferring information learned in one domain or larger dataset to another, smaller domain; meant to reduce the dependence on the large training datasets that RNNs, GANs, and other DNNs require. Biomedical imaging is an example of a domain that benefits from transfer learning.

One- and Zero-Shot Learning: retains the ability to work with restricted datasets, like transfer learning. One-shot learning aims to recognize new data points based on only a few examples from the training set, while zero-shot learning aims to recognize new objects without seeing examples of those instances within the training set.
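To make the RNN item above concrete, here is a minimal, self-contained sketch (in PyTorch) of the kind of recurrent model that could be trained on continuous glucose monitoring data. The synthetic sine-wave "glucose" series and all hyperparameters are illustrative assumptions, not the setup used in reference (8).

```python
# Minimal sketch: an LSTM that predicts the next glucose reading from a
# short history window. The sine-wave data stands in for a real CGM trace.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "glucose" series (mg/dL): baseline + slow oscillation + noise.
t = torch.arange(0, 500, dtype=torch.float32)
series = 110 + 30 * torch.sin(t / 25) + 5 * torch.randn_like(t)

# Build (window -> next value) training pairs.
WINDOW = 12
X = torch.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
y = series[WINDOW:]
X = X.unsqueeze(-1)  # shape: (samples, time steps, 1 feature)

class GlucoseLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, time, hidden)
        return self.head(out[:, -1])   # predict from the last time step

model = GlucoseLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.2f}")
```

A real application would also need patient-specific train/test splits and careful handling of sensor gaps; this sketch only shows the model shape.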

Highly Distributed Storage Systems (HDSS)

The explosion in data generation has necessitated the development of better systems for data storage and handling. HDSS systems need to be reliable, accessible, scalable, and affordable. This involves storing data in different nodes, and the data stored in these nodes are replicated, which makes access rapid. However, data consistency and affordability remain big challenges.

Blockchain is a distributed database used to maintain a growing list of records, in which records are divided into blocks locked together by cryptographic algorithm(s) to maintain the consistency of the data. Each block contains a timestamp and a link to the previous block in the chain. Blockchain is a distributed ledger of blocks, meaning it is owned, shared, and accessible by everyone. This allows a verifiable, secure, and consistent history of a record of events.
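As a rough illustration of this structure, the minimal Python sketch below hash-links timestamped blocks and verifies the consistency of the chain. It deliberately omits consensus, digital signatures, and peer-to-peer replication, which real blockchains require.

```python
# Minimal sketch of a hash-linked chain of timestamped records.
# Real blockchains add consensus, signatures, and peer-to-peer replication.
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # Hash a canonical JSON rendering of the block contents.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(records: list, prev_hash: str) -> dict:
    block = {"timestamp": time.time(), "records": records, "prev_hash": prev_hash}
    block["hash"] = block_hash({k: block[k] for k in ("timestamp", "records", "prev_hash")})
    return block

def verify_chain(chain: list) -> bool:
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("timestamp", "records", "prev_hash")}
        if block["hash"] != block_hash(body):
            return False  # block contents were altered
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # the link to the previous block is broken
    return True

chain = [make_block(["genesis"], prev_hash="0" * 64)]
chain.append(make_block(["patient consent granted"], chain[-1]["hash"]))
chain.append(make_block(["lab result shared"], chain[-1]["hash"]))

print(verify_chain(chain))            # True
chain[1]["records"] = ["tampered"]    # any edit breaks verification
print(verify_chain(chain))            # False
```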

Data Privacy and Regulatory Issues

The establishment of the Health Insurance Portability and Accountability Act (HIPAA) in 1996 has provided much-needed regulatory guidance and a framework for clinicians and all concerned parties within the healthcare and health data chain. HIPAA has already provided much-needed guidance for the latest technologies impacting healthcare, most notably the use of social media and mobile communications (discussed in this article: Can Mobile Health Apps Improve Oral-Chemotherapy Adherence? The Benefit of Gamification.). The advent of blockchain technology in healthcare offers its own unique challenges; however, HIPAA offers a basis for developing a regulatory framework in this regard. The special standards regarding electronic data transfer are explained in HIPAA’s Privacy Rule, which regulates how certain entities (covered entities) use and disclose individually identifiable health information (Protected Health Information, PHI), and protects the transfer of such information over any medium or electronic data format. However, some of the benefits of blockchain that may revolutionize the healthcare system may be in direct contradiction with HIPAA rules, as outlined below:

Issues of Privacy Specific In Use of Blockchain to Distribute Health Data

Blockchain was designed as a distributed, decentralized database, maintained by multiple independent parties

Linked timestamping: although useful for time-dependent data, proof that third parties have not interfered in the process would have to be established, including accountability measures

Blockchain uses a consensus algorithm, even though end users may have their own private key

Applied cryptography measures and routines are used to decentralize authentication (publicly available)

Blockchain users are divided into three main categories: 1) maintainers of the blockchain infrastructure, 2) external auditors who store a replica of the blockchain, and 3) end users or clients, who may have access to only a relatively small portion of the blockchain but whose software may use cryptographic proofs to verify the authenticity of data.
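The third category — clients verifying authenticity without holding the whole chain — is commonly handled with Merkle proofs. The Python sketch below is a generic illustration of that idea, not code from any particular blockchain implementation.

```python
# Sketch: verify that one record belongs to a block using a Merkle proof,
# without downloading the other records in the block.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:            # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    # Collect the sibling hash (and its side) at every level of the tree.
    proof, level = [], [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        proof.append((level[sibling], "right" if index % 2 == 0 else "left"))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, side in proof:
        node = h(node + sibling) if side == "right" else h(sibling + node)
    return node == root

records = [b"rec0", b"rec1", b"rec2", b"rec3"]
root = merkle_root(records)
proof = merkle_proof(records, 2)
print(verify(b"rec2", proof, root))   # True: rec2 is in the block
print(verify(b"bogus", proof, root))  # False
```

A client needs only the block's Merkle root and a logarithmic-size proof, which is why light clients can audit data they never fully store.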

YouTube video: How #Blockchain Will Transform Healthcare in 25 Years

Richard Bergström has had a hand in the creation of Big Data for Better Outcomes, BigData@Heart, DO->IT, EHDN, the EU data consortia, and yes, even concepts like pay for performance. The former Director General of EFPIA, and now the head of health both at SICPA and their joint-venture blockchain company Guardtime, Richard is always ahead of the curve. In fact, he’s usually the one who makes the curve in the first place.

Please click on the following link for a podcast on Big Data, Blockchain and Pharma/Healthcare by Richard Bergström:

December 5, 2018 | The boom of blockchain and distributed ledger technologies has inspired healthcare organizations to test the capabilities of their data. Quest Diagnostics, in partnership with Humana, MultiPlan, and UnitedHealth Group’s Optum and UnitedHealthcare, has launched a pilot program that applies blockchain technology to improve data quality and reduce administrative costs associated with changes to healthcare provider demographic data.

The collective body, called Synaptic Health Alliance, explores how blockchain can keep only the most current healthcare provider information available in health plan provider directories. The alliance plans to share its progress in the first half of 2019.

“Providing consumers looking for care with accurate information when they need it is essential to a high-functioning overall healthcare system,” Jason O’Meara, Senior Director of Architecture at Quest Diagnostics, told Clinical Informatics News in an email interview.

“We were intentional about calling ourselves an alliance as it speaks to the shared interest in improving health care through better, collaborative use of an innovative technology,” O’Meara wrote. “Our large collective dataset and national footprints enable us to prove the value of data sharing across company lines, which has been limited in healthcare to date.”

O’Meara said Quest Diagnostics has been investing time and resources the past year or two in understanding blockchain, its ability to drive purpose within the healthcare industry, and how to leverage it for business value.

“Many health care and life science organizations have cast an eye toward blockchain’s potential to inform their digital strategies,” O’Meara said. “We recognize it takes time to learn how to leverage a new technology. We started exploring the technology in early 2017, but we quickly recognized the technology’s value is in its application to business to business use cases: to help transparently share information, automate mutually-beneficial processes and audit interactions.”

Quest began discussing the potential for an alliance with the four other companies a year ago, O’Meara said. Each company shared traits that would allow them to prove the value of data sharing across company lines.

“While we have different perspectives, each member has deep expertise in healthcare technology, a collaborative culture, and desire to continuously improve the patient/customer experience,” said O’Meara. “We also recognize the value of technology in driving efficiencies and quality.”

Following its initial launch in April, Synaptic Health Alliance is deploying a multi-company, multi-site, permissioned blockchain. According to a whitepaper published by Synaptic Health, the choice to use a permissioned blockchain rather than an anonymous one is crucial to the alliance’s success.

“This is a more effective approach, consistent with enterprise blockchains,” an alliance representative wrote. “Each Alliance member has the flexibility to deploy its nodes based on its enterprise requirements. Some members have elected to deploy their nodes within their own data centers, while others are using secured public cloud services such as AWS and Azure. This level of flexibility is key to growing the Alliance blockchain network.”

As the pilot moves forward, O’Meara says the Alliance plans to open participation to other organizations. Earlier this week, Aetna and Ascension announced they had joined the project.

“I am personally excited by the amount of cross-company collaboration facilitated by this project,” O’Meara says. “We have already learned so much from each other and are using that knowledge to really move the needle on improving healthcare.”

November 29, 2018 | The US Department of Health and Human Services (HHS) is making waves in the blockchain space. The agency’s Division of Acquisition (DA) has developed a new system, called Accelerate, which gives acquisition teams detailed information on pricing, terms, and conditions across HHS in real-time. The department’s Associate Deputy Assistant Secretary for Acquisition, Jose Arrieta, gave a presentation and live demo of the blockchain-enabled system at the Distributed: Health event earlier this month in Nashville, Tennessee.

Accelerate is still in the prototype phase, Arrieta said, with hopes that the new system will be deployed at the end of the fiscal year.

HHS spends around $25 billion a year in contracts, Arrieta said. That’s 100,000 contracts a year with over one million pages of unstructured data managed through 45 different systems. Arrieta and his team wanted to modernize the system.

“But if you’re going to change the way a workforce of 20,000 people do business, you have to think your way through how you’re going to do that,” said Arrieta. “We didn’t disrupt the existing systems: we cannibalized them.”

The cannibalization process resulted in Accelerate. According to Arrieta, the system functions by creating a record of data rather than storing it, leveraging machine learning, artificial intelligence (AI), and robotic process automation (RPA), all through blockchain data.

“We’re using that data record as a mechanism to redesign the way we deliver services through micro-services strategies,” Arrieta said. “Why is that important? Because if you have a single application or data use that interfaces with 55 other applications in your business network, it becomes very expensive to make changes to one of the 55 applications.”

Accelerate distributes the data to the workforce, making it available to them one business process at a time.

“We’re building those business processes without disrupting the existing systems,” said Arrieta, and that’s key. “We’re not shutting off those systems. We’re using human-centered design sessions to rebuild value exchange off of that data.”

The first application for the system, Arrieta said, can be compared to department stores price-matching their online competitors.

It takes HHS close to a month to collect the amalgamation of data from existing systems, whether that be terms and conditions that drive certain price points, or software licenses.

“The micro-service we built actually analyzes that data, and provides that information to you within one second,” said Arrieta. “This is distributed to the workforce, to the 5,000 people that do the contracting, to the 15,000 people that actually run the programs at [HHS].”

This simple micro-service is replicated on every node related to HHS’s internal workforce. If somebody wants to change the algorithm to fit their needs, they can do that in a distributed manner.

Arrieta hopes to use Accelerate to save researchers money at the point of purchase. The program uses blockchain to simplify the process of acquisition.

“How many of you work with the federal government?” Arrieta asked the audience. “Do you get sick of reentering the same information over and over again? Every single business opportunity you apply for, you have to resubmit your financial information. You constantly have to check for validation and verification, constantly have to resubmit capabilities.”

Wouldn’t it be better to have historical notes available for each transaction, Arrieta asked. This would allow clinical researchers to focus on “the things they’re really good at,” instead of red tape.

“If we had the top cancer researcher in the world, would you really want her spending her time learning about federal regulations as to how to spend money, or do you want her trying to solve cancer?” Arrieta said. “What we’re doing is providing that data to the individual in a distributed manner so they can read the information of historical purchases that support activity, and they can focus on the objectives and risks they see as it relates to their programming and their objectives.”

Blockchain also creates transparency among researchers, Arrieta said, which he says creates an “uncomfortable reality” in that they have to make a decision regarding data, fundamentally changing value exchange.

“The beauty of our business model is internal investment,” Arrieta said. For instance, the HHS could take all the sepsis data that exists in their system, put it into a distributed ledger, and share it with an external source.

“Maybe that could fuel partnership,” Arrieta said. “I can make data available to researchers in the field in real-time so they can actually test their hypothesis, test their intuition, and test their imagination as it relates to solving real-world problems.”

Blockchain-based genomic data hub platform Shivom recently reached its $35 million hard cap within 15 seconds of opening its main token sale. Shivom received funding from a number of crypto VC funds, including Collinstar, Lateral, and Ironside.

The goal is to create the world’s largest store of genomic data while offering an open web marketplace for patients, data donors, and providers — such as pharmaceutical companies, research organizations, governments, patient-support groups, and insurance companies.

“Disrupting the whole of the health care system as we know it has to be the most exciting use of such large DNA datasets,” Shivom CEO Henry Ines told me. “We’ll be able to stratify patients for better clinical trials, which will help to advance research in precision medicine. This means we will have the ability to make a specific drug for a specific patient based on their DNA markers. And what with the cost of DNA sequencing getting cheaper by the minute, we’ll also be able to sequence individuals sooner, so young children or even newborn babies could be sequenced from birth and treated right away.”

While there are many solutions examining DNA data to explain heritage, intellectual capabilities, health, and fitness, the potential of genomic data has largely yet to be unlocked. A few companies hold the monopoly on genomic data and make sizeable profits from selling it to third parties, usually without sharing the earnings with the data donor. Donors are also not informed if and when their information is shared, nor do they have any guarantee that their data is secure from hackers.

Shivom wants to change that by creating a decentralized platform that will break these monopolies, democratizing the processes of sharing and utilizing the data.

“Overall, large DNA datasets will have the potential to aid in the understanding, prevention, diagnosis, and treatment of every disease known to mankind, and could create a future where no diseases exist, or those that do can be cured very easily and quickly,” Ines said. “Imagine that, a world where people do not get sick or are already aware of what future diseases they could fall prey to and so can easily prevent them.”

Shivom’s use of blockchain technology and smart contracts ensures that all genomic data shared on the platform will remain anonymous and secure, while its OmiX token incentivizes users to share their data for monetary gain.

Blockchain will secure the DNA database of 50 million citizens of the eighth-largest state in India. The government of Andhra Pradesh signed a Memorandum of Understanding with Shivom, a German genomics and precision-medicine start-up, which announced that it will start the pilot project soon. The move falls in line with a trend of governments turning to population genomics while securing the sensitive data through blockchain.

With regards to Andhra Pradesh, the start-up will first launch a trial to determine the viability of their technology for moving from a proactive to a preventive approach in medicine, and towards precision health. “Our partnership with Shivom explores the possibilities of providing an efficient way of diagnostic services to patients of Andhra Pradesh by maintaining the privacy of the individual data through blockchain technologies,” said J A Chowdary, IT Advisor to Chief Minister, Government of Andhra Pradesh.

Reporter and Curator: Dr. Sudipta Saha, Ph.D.

MicroRNAs (miRNAs) are a group of small non-coding RNA molecules that play a major role in posttranscriptional regulation of gene expression and are expressed in an organ-specific manner. One miRNA can potentially regulate the expression of several genes, depending on cell type and differentiation stage. They control every cellular process, and their altered regulation is involved in human diseases. miRNAs are differentially expressed in the male and female gonads and have an organ-specific reproductive function. Exerting their effect through germ cells and gonadal somatic cells, miRNAs regulate key proteins necessary for gonad development. The role of miRNAs in the testes is only starting to emerge, though they have been shown to be required for adequate spermatogenesis. In the ovary, miRNAs play a fundamental role in follicle assembly, growth, differentiation, and ovulation.

Deciphering the underlying causes of idiopathic male infertility is one of the main challenges in reproductive medicine. This is especially relevant in infertile patients displaying normal seminal parameters and no urogenital or genetic abnormalities. In these cases, the search for additional sperm biomarkers is of high interest. This study was aimed at determining the implications of sperm miRNA expression profiles in the reproductive capacity of normozoospermic infertile individuals. The expression levels of 736 miRNAs were evaluated in spermatozoa from normozoospermic infertile males and normozoospermic fertile males analyzed under the same conditions. Fifty-seven miRNAs were differentially expressed between populations; 20 of them were regulated by a host gene promoter, which in three cases comprised genes involved in fertility. The predicted targets of the differentially expressed miRNAs unveiled a significant enrichment of biological processes related to embryonic morphogenesis and chromatin modification. Normozoospermic infertile individuals exhibit a specific sperm miRNA expression profile clearly differentiated from normozoospermic fertile individuals. This miRNA cargo has potential implications for the individuals’ reproductive competence.

Circulating or “extracellular” miRNAs detected in biological fluids could be used as potential diagnostic and prognostic biomarkers of several diseases, such as cancer and gynecological and pregnancy disorders. However, their contributions to female infertility and in vitro fertilization (IVF) remain unknown. Polycystic ovary syndrome (PCOS) is a frequent endocrine disorder in women. PCOS is associated with altered features of androgen metabolism, increased insulin resistance, and impaired fertility. Furthermore, PCOS, being a syndrome diagnosis, is heterogeneous and characterized by polycystic ovaries, chronic anovulation, and evidence of hyperandrogenism, as well as being associated with chronic low-grade inflammation and an increased lifetime risk of type 2 diabetes. Altered miRNA levels have been associated with diabetes, insulin resistance, inflammation, and various cancers. Studies have shown that circulating miRNAs are present in whole blood, serum, plasma, and the follicular fluid of PCOS patients and that these might serve as potential biomarkers and a new approach for the diagnosis of PCOS. miRNAs in mammalian follicular fluid have been demonstrated to be enclosed within microvesicles and exosomes, or they can be associated with protein complexes. The presence of microvesicles and exosomes carrying microRNAs in follicular fluid could represent an alternative mechanism of autocrine and paracrine communication inside the ovarian follicle. The investigation of the expression profiles of five circulating miRNAs (let-7b, miR-29a, miR-30a, miR-140 and miR-320a) in human follicular fluid from women with normal ovarian reserve and women with PCOS, and of their ability to predict IVF outcomes, showed that these miRNAs could provide new helpful biomarkers to facilitate personalized medical care for oocyte quality in ART (Assisted Reproductive Treatment) and during IVF (In Vitro Fertilization).

Personalized Medicine – The California Initiative

Curator: Demet Sag, PhD, CRA, GCP

Are we there yet? Life is a journey, and so is science.

Governor Brown announced the Precision Medicine Initiative for California on April 14, 2015. UC San Francisco is hosting the two-year initiative, through UC Health, which includes UC’s five medical centers, with $3 million in startup funds from the state. The public-private initiative aims to leverage these funds with contributions from other academic and industry partners.

With so many campuses spread throughout the state and so much scientific, clinical and computational expertise, the UC system has the potential to bring it all together, said Atul Butte, MD, PhD, who is leading the initiative.

At the beginning of 2015, President Obama launched this initiative and assigned people to work on the project.

Previously, NCI Director Harold Varmus, MD, said that “Precision medicine is really about re-engineering the diagnostic categories for cancer to be consistent with its genomic underpinnings, so we can make better choices about therapy,” and “In that sense, many of the things we’re proposing to do are already under way.”

The proposed initiative has two main components:

a near-term focus on cancers and

a longer-term aim to generate knowledge applicable to the whole range of health and disease.

Both components are now within our reach because of advances in basic research, including molecular biology, genomics, and bioinformatics. Furthermore, the initiative taps into converging trends of increased connectivity, through social media and mobile devices, and Americans’ growing desire to be active partners in medical research.

Since the human genome was sequenced, it has become clear that there are actually fewer genes than expected, and that they are shared among organisms to accomplish the same or similar core biological functions. As a result, knowledge of the biological role of such shared proteins in one organism can be transferred to another organism.

It was necessary to generate a dynamic yet controlled, standardized collection of this ever-changing and accumulating information. This effort became the Gene Ontology Consortium. Its three independent ontologies (http://www.geneontology.org) were developed based on:

biological process,

molecular function and

cellular component.

We need a common annotation language for functional conservation. This genesis of a grand biological unification was made possible by completing the genomic sequences of not only human but also the main model organisms and more:

· the budding yeast, Saccharomyces cerevisiae, completed in 1996

· the nematode worm Caenorhabditis elegans, completed in 1998

· the fruitfly Drosophila melanogaster,

· the flowering plant Arabidopsis thaliana

· fission yeast Schizosaccharomyces pombe

· the mouse, Mus musculus

On the other hand, as we know, there are allelic variations that underlie common diseases, and complete genome sequencing of many individuals with and without disease is required. There are advantages and disadvantages: we can carry out partial surveys of the genome by genotyping large numbers of common SNPs in genome-wide association studies, but there are problems, such as computing the data efficiently and sharing the information without compromising privacy. Therefore, we should be mindful of a few main conditions, including:

models of the allelic architecture of common diseases,

sample size,

map density and

sample-collection biases.

This will lead to cost control and efficiency while identifying genuine disease-susceptibility loci. Genome-wide association studies (GWAS) have progressed from assaying fewer than 100,000 SNPs to more than one million, and sample sizes have increased dramatically as the search for variants that explain more of the disease/trait heritability has intensified.

In addition, we must translate this sequence information from the genomic loci of genes to function, together with the related polymorphisms of these genes, so that possible patterns of gene expression and disease traits can be matched. Then, we may develop precision technologies for:

Diagnostics

Targeted Drugs and Treatments

Biomarkers to modulate cells for correct functions

With the knowledge of:

gene expression variations

insight into the genetic contribution to clinical endpoints of complex disease and

their biological risk factors,

shared etiologic pathways;

this, therefore, requires an understanding of both:

the structure and

the biology of the genome.

These studies demonstrated hundreds of associations of common genetic variants with over 80 diseases and traits, collected in a controlled online resource. However, identifying published GWAS can be challenging, as a simple PubMed search using the words “genome wide association studies” may easily be populated with irrelevant results.

The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (http://www.genome.gov/gwastudies), an online, regularly updated database of SNP-trait associations extracted from published GWAS, was developed for this purpose.

Therefore, sequencing a human genome is quite an undertaking and requires tools to make it possible:

The rapid increase in the number of GWAS provides an unprecedented opportunity to examine the potential impact of common genetic variants on complex diseases by systematically cataloging and summarizing key characteristics of the observed associations and the trait/disease associated SNPs (TASs) underlying them.

With this in mind, several aims can be established:

to describe the features of this resource and the methods we have used to produce it,

to provide and examine key descriptive characteristics of reported TASs such as estimated risk allele frequencies and odds ratios,

to examine the underlying functionality of reported risk loci by mapping them to genomic annotation sets and assessing overrepresentation via Monte Carlo simulations and

to investigate the relationship between recent human evolution and human disease phenotypes.

This procedure has no clear path, and there are several obstacles: the actual functional variant is often unknown. An association may be due to:

trait/disease associated SNPs (TASs),

a well-known SNP in strong linkage disequilibrium (LD) with the TAS,

an unknown common SNP tagged by a haplotype

a rare single-nucleotide variant tagged by a haplotype on which the TAS occurs, or

copy number variation (CNV): a linked copy number variant.

There can be other factors such as

Evolution,

Natural Selection

Environment

Pedigree

Epigenetics

Heritage is another big factor; the concept of heritability and its definition as an estimable, dimensionless population parameter were introduced by Sewall Wright and Ronald Fisher almost a century ago.

As a result, heritability has gained interest, since it allows us to compare the relative importance of genes and environment to the variation of traits within and across populations. Heritability estimation is an ongoing effort and remains a key quantity.

[Footnotes to two allele-frequency tables in the original source, reproduced here without the tables. In both tables, the associated allele is the SNP allele associated with disease, regardless of whether it is the derived or the ancestral allele, and the total number of chromosomes surveyed is given in parentheses after each frequency. The first table drew European and West African allele frequencies from the literature (using either general population frequencies or control-group frequencies from association studies; to reduce bias, when a control frequency was used for Europeans, a control frequency was also used for Africans), citing both the reference claiming a reproducible association and the reference from which the allele frequencies were taken, with δ = the difference in allele frequency between Europeans and Africans. The second table drew European and African American sample frequencies from the Seattle SNPs database (except one SNP, for which allele frequencies from Begovich et al. 2004 were used), citing the reference that reported the association with the listed disease/phenotype, with δ = the difference in allele frequency between African Americans and Europeans.]

They reported that “The SNPs associated with common disease that we investigated do not show much higher levels of differentiation than those of random SNPs. Thus, in these cases, ethnicity is a poor predictor of an individual’s genotype, which is also the pattern for random variants in the genome. This lends support to the hypothesis that many population differences in disease risk are environmental, rather than genetic, in origin. However, some exceptional SNPs associated with common disease are highly differentiated in frequency across populations, because of either a history of random drift or natural selection. The exceptional SNPs are located in AGT, DRD3, ALOX5AP, ICAM1, IL1B, IL4, IL6, IL8, and PON1. Of note, evidence of selection has been observed for AGT (Nakajima et al. 2004), IL4 (Rockman et al. 2003), IL8 (Hull et al. 2001), and PON1 (Allebrandt et al. 2002). Yet, for the vast majority of the common-disease–associated polymorphisms we examined, ethnicity is likely to be a poor predictor of an individual’s genotype.”

In 2002, the International HapMap Project was launched:

to provide a public resource

to accelerate medical genetic research.

Two HapMap phases were completed. In Phase I, the objective was to genotype at least one common SNP every 5 kilobases (kb) across the euchromatic portion of the genome in 270 individuals from four geographically diverse populations. In Phase II of the HapMap Project, a further 2.1 million SNPs were successfully genotyped in the same individuals.

The re-mapping of SNPs from Phase I of the project identified 21,177 SNPs that had an ambiguous position or some other feature indicative of low reliability; these are not included in the filtered Phase II data release. All genotype data are available from the HapMap Data Coordination Center (http://www.hapmap.org) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP).

In the Phase II HapMap, 32,996 recombination hotspots were identified (refs 3, 6, 36; an increase of over 50% from Phase I), of which 68% localized to a region of ≤5 kb. The median map distance induced by a hotspot is 0.043 cM (or one crossover per 2,300 meioses), and the hottest identified, on chromosome 20, is 1.2 cM (one crossover per 80 meioses). Hotspots account for approximately 60% of recombination in the human genome and about 6% of sequence (Supplementary Fig. 6).
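As a sanity check on the quoted rates, genetic map distance converts to crossover frequency as 1 cM = one crossover per 100 meioses:

```latex
1\,\text{cM} = \frac{1\ \text{crossover}}{100\ \text{meioses}}
\quad\Longrightarrow\quad
0.043\,\text{cM} \approx \frac{1\ \text{crossover}}{100/0.043 \approx 2326\ \text{meioses}},
\qquad
1.2\,\text{cM} \approx \frac{1\ \text{crossover}}{100/1.2 \approx 83\ \text{meioses}}
```

which matches the “one per 2,300” and “one per 80” figures above, up to rounding.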

In addition to many regions previously identified in HapMap Phase I, including LARGE, SYT1, and SULT1C2 (previously called SULT1C1), about 200 regions were identified from the Phase II HapMap that include many established cases of selection, such as the genes HBB and LCT, the HLA region, and an inversion on chromosome 17. Finally, in the future, whole-genome sequencing will provide a natural convergence of technologies to type both SNP and structural variation. Nevertheless, until that point, and even after, the HapMap Project data will provide an invaluable resource for understanding the structure of human genetic variation and its link to phenotype.

HMM libraries, such as PANTHER, Pfam, and SMART, are used primarily to recognize and annotate conserved motifs in protein sequences.

In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale.

PANTHER relates protein sequence relationships to function relationships in a robust and accurate way, and has two main parts:

the PANTHER library (PANTHER/LIB): a collection of “books,” each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM), and a family tree.

the PANTHER index (PANTHER/X): an ontology for summarizing and navigating molecular functions and biological processes associated with the families and subfamilies.

PANTHER can be applied in three areas of active research:

to report the size and sequence diversity of the families and subfamilies, characterizing the relationship between sequence divergence and functional divergence across a wide range of protein families.

to use the PANTHER/X ontology to give a high-level representation of gene function across the human and mouse genomes.

to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function.

PRINTS is a compendium of protein motif ‘fingerprints’. A fingerprint is defined as a group of motifs excised from conserved regions of a sequence alignment, whose diagnostic power or potency is refined by iterative database scanning (in this case, the OWL composite sequence database).

The information contained within PRINTS is distinct from, but complementary to, the consensus expressions stored in the widely used PROSITE dictionary of patterns.

However, the position-specific amino acid probabilities in an HMM can also be used to annotate individual positions in a protein as being conserved (or conserving a property such as hydrophobicity) and therefore likely to be required for molecular function. For example, a mutation (or variant) at a conserved position is more likely to impact the function of that protein.

In addition, HMMs from different subfamilies of the same family can be compared with each other, to provide hypotheses about which residues may mediate the differences in function or specificity between the subfamilies.

Several computational algorithms and databases for comparing protein sequences have developed and matured:

The profile has a different amino acid substitution vector at each position in the profile, based on the pattern of amino acids observed in a multiple alignment of related sequences.

Profile methods combine algorithms with databases: A group of related sequences is used to build a statistical representation of corresponding positions in the related proteins. The power of these methods therefore increases as new sequences are added to the database of known proteins.

Multiple sequence alignments (Dayhoff et al. 1974) and profiles have allowed a systematic study of related sequences. One of the key observations is that some positions are “conserved,” that is, the amino acid is invariant or restricted to a particular property (such as hydrophobicity), across an entire group of related sequences.

The dependence of profile and pattern-matching approaches (Jongeneel et al. 1989) on sequence databases led to the development of databases of profiles.

The PANTHER/LIB HMMs can be viewed as a statistical method for scoring the “functional likelihood” of different amino acid substitutions on a wide variety of proteins. Because it uses evolutionarily related sequences to estimate the probability of a given amino acid at a particular position in a protein, the method can be referred to as generating “position-specific evolutionary conservation” (PSEC) scores.
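As an illustration of the idea (not PANTHER’s actual scoring code), the Python sketch below computes a simple position-specific conservation score from a toy multiple sequence alignment: columns with low amino acid entropy are the conserved positions where a substitution is more likely to affect function. The alignment itself is invented for the example.

```python
# Sketch: position-specific conservation from a multiple sequence alignment.
# Low column entropy ~ conserved position (a variant there is more suspect).
import math
from collections import Counter

alignment = [
    "MKTAYIAKQR",
    "MKTAYIAKHR",
    "MKSAYIGKQR",
    "MKTAYIAKQR",
]

def column_entropy(column: str) -> float:
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

for pos in range(len(alignment[0])):
    column = "".join(seq[pos] for seq in alignment)
    ent = column_entropy(column)
    status = "conserved" if ent == 0.0 else "variable"
    print(f"position {pos + 1}: residues={column} entropy={ent:.2f} ({status})")
```

Real methods such as PANTHER's PSEC scores additionally weight sequences by evolutionary relatedness, which this toy column-entropy measure ignores.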

Schematic illustration of the process for building PANTHER families.

Family clustering.

Multiple sequence alignment (MSA), family HMM, and family tree building.

Family/subfamily definition and naming.

Subfamily HMM building.

Molecular function and biological process association.

Of these, steps 1, 2, and 4 are computational, and steps 3 and 5 are human-curated (with the extensive aid of software tools).

The Future of Translational Medicine with Smart Diagnostics and Therapies: PharmacoGenomics

Curator: Demet Sag, PhD

Since the Human Genome Project was completed, we have seen several projects to understand gene function and how genes relate to personal health. These advances promise to improve diagnostics in preventive medicine. The future of medicine may involve a personal wireless unit that tracks vital records together with genomic changes and compares the assumed “healthy” state to the “unhealthy” one to suggest treatment options in the palm of a hand.

Pharmacogenomics is the study of how genes affect a person’s response to drugs. This relatively new field combines pharmacology (the science of drugs) and genomics (the study of genes and their functions) to develop effective, safe medications and doses that will be tailored to a person’s genetic makeup.

The American Medical Association, the Critical Path Institute, and the Arizona Center for Education and Research on Therapeutics developed a brochure for health care providers on pharmacogenomics. The main purpose is to help physicians use this information correctly through a case-based approach. View an electronic version of the brochure.

As always, there are debates and controversies, but in this case the positives outweigh the negatives. For example, some patients with the same gene abnormality may not benefit because of a deficiency or polymorphism in another connected gene, so a systems approach is needed, including the origin of pathways during development. Nothing is simply white or black; as Goethe said, “there are shades of gray.” This shade is light compared to one-size-fits-all drug making.

The main idea is to create safer, more effective, and correctly dosed medication, to gain health and a quality life with less expense and more beneficial outcomes.

By the same token, these developments decrease the cost of making drugs: because the drugs are specific to a small population or group, there is less clinical trial time, less time for approval, and there are fewer adverse effects.

In a nutshell, functional genomics suggests how each piece of information is utilized in the body. The use of this knowledge to develop new drugs created a new area called pharmacogenomics. Thus, the FDA introduced drug-labeling terminology for biomarkers, along with several other factors covering variation in clinical response to drug exposure, possible side or adverse effects, genotype-specific dosing, drug action mechanism, and polymorphic drug target and disposition genes.

What can be on the label: age, sex, origin/ethnicity (Asian, Caucasian, African, South Asian), gene of interest, possible SNPs, variation/polymorphism warnings, dose, etc.
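As a purely hypothetical sketch of how such label fields might be carried in software (the field names and the warfarin/CYP2C9 example values are illustrative, not an FDA schema):

```python
# Hypothetical sketch: fields a pharmacogenomic drug-label entry might carry.
# Field names are illustrative only, not an FDA data standard.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PgxLabelEntry:
    drug: str
    gene: str                                          # gene of interest, e.g. "CYP2D6"
    variants: List[str] = field(default_factory=list)  # relevant SNPs/alleles
    populations: List[str] = field(default_factory=list)  # ethnicity notes
    warning: str = ""                                  # polymorphism-related warning
    dosing_note: str = ""                              # genotype-specific dosing

entry = PgxLabelEntry(
    drug="warfarin",
    gene="CYP2C9",
    variants=["*2", "*3"],
    populations=["Caucasian", "Asian"],
    warning="Reduced-function alleles may increase bleeding risk.",
    dosing_note="Consider a lower initial dose in carriers of *2 or *3.",
)
print(entry.drug, entry.gene, entry.variants)
```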

Here are the FDA-approved drugs with pharmacogenomic information in their labeling:

Based on “Forging a path from companion diagnostics to holistic decision support”, L.E.K.

Companion diagnostics and their companion therapies are defined here as a method for identifying LIKELY responders to therapies that are specific for patients with a specific molecular profile. The result of this statement is that diagnostics restricted to specific patient types give access to novel therapies that might otherwise not be approved or reimbursed in other, perhaps “similar” patients who lack a matching identification of the key identifier(s) needed to permit that therapy, and who thus have a poor expected response.

The concept is new because:

(1) The diagnoses may be closely related by classical criteria, but at the same time they are not alike with respect to the efficacy of treatment with a standard therapy.
(2) The companion diagnostic is restricted to dealing with a targeted, drug-specific question without regard to other clinical issues.
(3) The efficacy issue it clarifies relies on a deep molecular/metabolic insight that is not available except through emergent genomic/proteomic analysis, which has become available at a rapidly declining cost.

The limitation example given is HER2 testing for use of Herceptin in therapy for non-candidates (HER2-negative patients). The problem is that the current format is a “one test/one drug” match, but decision support may require a combination of validated biomarkers obtained on a small biopsy sample (technically manageable), with potentially confusing results. While HER2-negative patients are more likely to be premenopausal with a more aggressive tumor than postmenopausal patients, the HER2-negative designation does not preclude treatment with Herceptin. So the Herceptin would be given in combination, but with what other drug in a non-candidate?

The point that L.E.K. makes is that, beyond providing highly validated biomarkers linked to approved therapies, it is necessary to pursue more holistic decision support tests that interrogate multiple biomarkers (panels of companion diagnostic markers) and to discover signatures for treatments, used together with a broad range of information, such as:

traditional tests,

imaging,

clinical trials,

outcomes data,

EMR data,

reimbursement and coverage data.

A comprehensive solution of this nature appears to be some distance from realization. However, is this the direction that will lead to tomorrow’s treatment decision support approaches?

Surveying the Decision Support Testing Landscape

As a starting point, L.E.K. characterized the landscape of available tests in the U.S. that inform treatment decisions, compiled from ~50 leading diagnostics companies operating in the U.S. between 2004 and 2011. L.E.K. identified more than 200 decision support tests, which were classified by test purpose, and more specifically by whether tests inform treatment decisions for a single drug/class (e.g., companion diagnostics) vs. more holistic treatment decisions across multiple drugs/classes (i.e., multi-agent response tests).

Treatment Decision Support Tests

Companion diagnostics (single drug/class): predict response/safety or guide dosing of a single drug or class.

(For descriptive purposes only; may not map to exact regulatory labeling.)

Most tests are companion diagnostics and other decision support tests that provide guidance on single drug/class therapy decisions.

However, holistic decision support tests (e.g., multi-agent response) are growing the fastest, at 56% CAGR. The emergence of multi-agent response tests suggests diagnostics companies are already seeing the need to aggregate individual tests (e.g., companion diagnostics) into panels of appropriate markers addressing a given clinical decision need. L.E.K. believes this trend is likely to continue as increasing numbers of biomarkers become validated for diseases and as multiplexing tools, enabling the aggregation of multiple biomarker interrogations into a single test, become deployed in the clinic.

Personalized Medicine Partnerships

L.E.K. also completed an assessment of publicly available personalized medicine partnership activity from 2009-2011 for ~150 leading organizations operating in the U.S. to look at broader decision support trends and emergence of more holistic solutions beyond diagnostic tests.

A survey of partnership deals was conducted for:

top-10 academic medical centers/research institutions,

top-25 biopharma,

top-four healthcare IT companies,

top-three healthcare imaging companies,

top-20 IVD manufacturers,

top-20 laboratories,

top-10 payers/PBMs,

top-15 personalized healthcare companies,

top-10 regulatory/guideline entities, and

top-20 tools vendors for the period of 01/01/2009 – 12/31/2011.
Source: Company websites, GenomeWeb, L.E.K. analysis

Across the sample, we identified 189 publicly announced partnerships, of which ~65% focused on more traditional areas (biomarker discovery, companion diagnostics, and targeted therapies). However, a significant portion (~30%) included elements geared toward creating more holistic decision support models.

Partnerships categorized as holistic decision support by L.E.K. were focused on

Implications

L.E.K. believes the likely debate won’t center on which models and companies will prevail; it appears that the industry is now moving along the continuum to a truly holistic capability.
The mainstay of personalized medicine today will become integrated with, and enhanced by, other data.

The companies that succeed will be able to capture vast amounts of information

and synthesize it for personalized care.

Holistic models will be powered by increasingly larger datasets and sophisticated decision-making algorithms.
This will require the participation of an increasingly broad range of participants to provide the

Gil David and Larry Bernstein have developed, in consultation with Prof. Ronald Coifman in the Yale University Applied Mathematics Program, a software system that is the equivalent of an intelligent Electronic Health Records Dashboard that

provides empirical medical reference and

suggests quantitative diagnostics options.

The current design of the Electronic Medical Record (EMR) is a linear presentation of portions of the record

by services

by diagnostic method, and

by date, to cite examples.

This allows perusal through a graphical user interface (GUI) that partitions the information or necessary reports in a workstation, entered by keying on icons.

This requires that the medical practitioner finds the

history,

medications,

laboratory reports,

cardiac imaging and

EKGs, and

radiology in different workspaces.

The introduction of a DASHBOARD has allowed a presentation of

drug reactions

allergies

primary and secondary diagnoses, and

critical information

about any patient whose caregiver needs access to the record.

The advantage of this innovation is obvious. The startup problem is what information is presented and how it is displayed, which is a source of variability and a key to its success.

We are proposing an innovation that supersedes the main design elements of a DASHBOARD and utilizes

the conjoined syndromic features of the disparate data elements.

So the important determinant of the success of this endeavor is that

it facilitates both the workflow and the decision-making process with a reduction of medical error.

Continuing work is in progress to extend these capabilities with model datasets and sufficient data, because the extraction of data from disparate sources will, in the long run, further improve this process.

For instance, consider the finding of ST depression on EKG coincident with an elevated cardiac biomarker (troponin), particularly in the absence of substantially reduced renal function. The conversion of hematology-based data into useful clinical information requires the establishment of problem-solving constructs based on the measured data.

The most commonly ordered test used for managing patients worldwide is the hemogram that often incorporates

the review of a peripheral smear.

While the hemogram has undergone progressive modification of the measured features over time, the subsequent expansion of the panel of tests has provided a window into the cellular changes in the

production

release

or suppression

of the formed elements from the blood-forming organ into the circulation. In the hemogram one can view

data reflecting the characteristics of a broad spectrum of medical conditions.

Progressive modification of the measured features of the hemogram has delineated characteristics expressed as measurements of

size

density, and

concentration,

resulting in many characteristic features of classification. In the diagnosis of hematological disorders

proliferation of marrow precursors, the

domination of a cell line, and features of

suppression of hematopoiesis

provide a two dimensional model. Other dimensions are created by considering

the maturity of the circulating cells.

The application of rules-based, automated problem solving should provide a valid approach to

the classification and interpretation of the data used to determine a knowledge-based clinical opinion.

The exponential growth of knowledge since the mapping of the human genome has been enabled by parallel advances in applied mathematics that have not been a part of traditional clinical problem solving.

As the complexity of statistical models has increased

the dependencies have become less clear to the individual.

Contemporary statistical modeling has a primary goal of finding an underlying structure in studied data sets.
The development of an evidence-based inference engine that can substantially interpret the data at hand and

convert it in real time to a “knowledge-based opinion”

could improve clinical decision-making by incorporating

multiple complex clinical features as well as duration of onset into the model.

An example of a difficult area for clinical problem solving is found in the diagnosis of SIRS and associated sepsis. SIRS (and associated sepsis) is a costly diagnosis in hospitalized patients. Failure to diagnose sepsis in a timely manner creates a potential financial and safety hazard. The early diagnosis of SIRS/sepsis is made by the clinician’s application of defined criteria (a minimal check is sketched after this list):

temperature

heart rate

respiratory rate and

WBC count
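As a concrete rendering of those rules, the Python sketch below applies the commonly published SIRS thresholds (two or more criteria met). The thresholds are the standard textbook values, and the function is illustrative rather than clinical-grade.

```python
# Sketch: the classic SIRS screen -- two or more criteria met.
# Thresholds are the commonly published ones; not for clinical use.
def sirs_criteria_met(temp_c: float, heart_rate: int,
                      resp_rate: int, wbc_k_per_ul: float) -> int:
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,             # temperature
        heart_rate > 90,                            # tachycardia
        resp_rate > 20,                             # tachypnea
        wbc_k_per_ul > 12.0 or wbc_k_per_ul < 4.0,  # leukocytosis/leukopenia
    ]
    return sum(criteria)

def has_sirs(**vitals) -> bool:
    return sirs_criteria_met(**vitals) >= 2

print(has_sirs(temp_c=38.6, heart_rate=104, resp_rate=18, wbc_k_per_ul=13.5))  # True
print(has_sirs(temp_c=36.9, heart_rate=82, resp_rate=16, wbc_k_per_ul=7.0))    # False
```

As the text notes, rules of this kind flag the condition only after it has developed, which is why the biomarkers listed below are of interest for earlier detection.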

The application of those clinical criteria, however, defines the condition after it has developed and

has not provided a reliable method for the early diagnosis of SIRS.

The early diagnosis of SIRS may possibly be enhanced by the measurement of proteomic biomarkers, including

transthyretin

C-reactive protein

procalcitonin

mean arterial pressure

Immature granulocyte (IG) measurement has been proposed as a

readily available indicator of the presence of granulocyte precursors (left shift).

The use of such markers, obtained by automated systems

in conjunction with innovative statistical modeling, provides

a promising approach to enhance workflow and decision making.

Such a system utilizes the conjoined syndromic features of

disparate data elements with an anticipated reduction of medical error.

How we frame our expectations is so important that it determines

the data we collect to examine the process.

In the absence of data to support an assumed benefit, there is no proof of validity at whatever cost.
This has meaning for

hospital operations,

for nonhospital laboratory operations,

for companies in the diagnostic business, and

for planning of health systems.

The problem was stated by L.L. Weed in “Idols of the Mind” (Dec 13, 2006): “a root cause of a major defect in the health care system is that, while we falsely admire and extol the intellectual powers of highly educated physicians, we do not search for the external aids their minds require.” HIT use has been

focused on information retrieval, leaving

the unaided mind burdened with information processing.

We deal with problems in the interpretation of data presented to the physician, and how through better

design of the software that presents this data the situation could be improved.

The computer architecture that the physician uses to view the results is more often than not presented

as the designer would prefer, and not as the end-user would like.

In order to optimize the interface for the physician, the system would have a “front-to-back” design, with the call-up for any patient ideally consisting of a dashboard design that presents the crucial information that the physician would likely act on in an easily accessible manner.

The key point is that each item used has to be closely related to a corresponding criterion needed for a decision.

Feature Extraction.

This further breakdown in the modern era is determined by genetically characteristic gene sequences that are transcribed into what we measure. Eugene Rypka contributed greatly to clarifying the extraction of features in a series of articles, which set the groundwork for the methods used today in clinical microbiology.

The method he describes is termed S-clustering, and

will have a significant bearing on how we can view laboratory data.

He describes S-clustering as extracting features from endogenous data organized into a truth table, with each variable scaled to assign values for each message choice. The number of messages and the number of choices forms an N-by-N table. He points out that the message choice in an antibody titer would be converted from 0, +, ++, +++ to 0, 1, 2, 3 (a toy encoding is sketched below).

Even though there may be a large number of measured values, the variety is reduced by this compression, even though there is a risk of loss of information.

Yet the real issue is how a combination of variables falls into a table with meaningful information. We are concerned with accurate assignment into uniquely variable groups by information in test relationships. One determines the effectiveness of each variable by

its contribution to information gain in the system.
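A toy rendering of this compression, and of scoring a variable by its information gain toward a class assignment, is sketched below; all values are invented for the example.

```python
# Sketch: compress an antibody-titer message (0, +, ++, +++) to ordinal codes
# and score a variable by its information gain toward a class assignment.
import math
from collections import Counter

TITER_CODE = {"0": 0, "+": 1, "++": 2, "+++": 3}  # message-choice compression

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(values, labels):
    # H(class) - H(class | variable): how much the variable tells us.
    base = entropy(labels)
    n = len(labels)
    cond = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        cond += len(subset) / n * entropy(subset)
    return base - cond

titers = [TITER_CODE[t] for t in ["0", "+", "+++", "++", "0", "+++"]]
classes = ["healthy", "healthy", "sick", "sick", "healthy", "sick"]
print(f"information gain of titer for class: {information_gain(titers, classes):.2f} bits")
```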

The reference or null set is the class having no information. Uncertainty in assigning to a classification is

only relieved by providing sufficient information.

The possibility of realizing a good model for approximating the effects of factors supported by data used for inference owes much to the discovery of the Kullback-Leibler distance or “information,” and Akaike found a simple relationship between K-L information and Fisher’s maximized log-likelihood function.
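For reference, the two standard quantities mentioned here are:

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\,\log\frac{p(x)}{q(x)},
\qquad
\mathrm{AIC} = 2k - 2\ln\hat{L},
```

where k is the number of estimated parameters and L̂ is the maximized likelihood; Akaike’s result is that selecting the model with minimum AIC approximately minimizes the expected K-L distance between the fitted model and the data-generating process.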

In the last 60 years, the application of entropy comparable to the entropy of physics, information, noise, and signal processing has been fully developed by Shannon, Kullback, and others, and has been integrated with modern statistics as a result of the seminal work of Akaike, Leo Goodman, Magidson and Vermunt, and work by Coifman.

Gil David et al. introduced AUTOMATED processing of the data available to the ordering physician, and one can anticipate an enormous impact on the diagnosis and treatment of perhaps half of the top 20 most common

Real-time Clinical Expert Support and Validation System

We have developed a software system that is the equivalent of an intelligent Electronic Health Records Dashboard that provides empirical medical reference and suggests quantitative diagnostics options.

The primary purpose is to

gather medical information,

generate metrics,

analyze them in realtime and

provide a differential diagnosis,

meeting the highest standard of accuracy.

The system builds its unique characterization and provides a list of other patients that share this unique profile, therefore utilizing the vast aggregated knowledge (diagnosis, analysis, treatment, etc.) of the medical community. The

main mathematical breakthroughs are provided by accurate patient profiling and inference methodologies

in which anomalous subprofiles are extracted and compared to potentially relevant cases.

As the model grows and its knowledge database is extended, the diagnostic and the prognostic become more accurate and precise. We anticipate that the effect of implementing this diagnostic amplifier would result in

higher physician productivity at a time of great human resource limitations,

The main benefit is a real time assessment as well as diagnostic options based on

comparable cases,

flags for risk and potential problems

as illustrated in the following case, acquired on 04/21/10. The patient was diagnosed by our system with severe SIRS at a grade of 0.61.

The patient was treated for SIRS and the blood tests were repeated during the following week. The full combined record of our system’s assessment of the patient, as derived from the further hematology tests, is illustrated below. The yellow line shows the diagnosis that corresponds to the first blood test (as also shown in the image above). The red line shows the next diagnosis that was performed a week later.