Objective

The vision of IASIS is to turn the wave of data heading our way into actionable knowledge for decision makers. This will be achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. Big Data in healthcare is in its early days, and most of the potential for value creation remains unclaimed. One of the main challenges is the analysis of acquired data. While information is becoming ever easier to obtain, the infrastructure to collect, integrate, share, and mine the data remains lacking. These data are an invaluable resource for deriving insights to improve decision and policy making. The goal is to turn these large amounts of data into actionable information that authorities can use to plan public health activities and policies. The integration and analysis of these heterogeneous sources of information will support better-informed decisions, allowing diagnosis and treatment to be personalised to each individual.

IASIS aims to pave the way towards comprehensive access to data from disparate sources and to the results of their analysis, in the form of actionable knowledge for policy-making. The project will offer a common representation schema for the heterogeneous data sources. The infrastructure will be able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This will facilitate the use of intelligent methods to discover useful patterns across different resources. Semantic integration of data will make it possible to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two different disease categories will be explored: dementia and lung cancer.

Periodic Reporting for period 1 - IASIS (Integration and analysis of heterogeneous big data for precision medicine and suggested treatments for different types of patients)

The use of big data in healthcare is in its early days, and most of the potential for value creation remains unclaimed. To address this, iASiS aims to enable comprehensive access to data from disparate sources and to the results of their analysis, in order to produce actionable knowledge for policy-making within the domain of personalised medicine. The project is developing a system that collects, integrates, and analyses big data from disparate sources, providing useful insights and high-level analysis on an aggregated knowledge graph.

Given the above, the specific objectives of iASiS are:
a) to design a unified conceptual schema to represent all the diverse sources of available data,
b) to build an adaptive system able to manage data and content collected incrementally,
c) to provide actionable knowledge about disease diagnosis, prognosis, and treatment to decision makers,
d) to promote cooperation among clinicians and policy makers, and
e) to define privacy- and trust-aware strategies for the use of the data and the discovered knowledge.

A. Data Acquisition and Unified Representation

In the first months of the project, the available data sources were selected, pilot plans were defined, and end-user requirements were collected for the two project use cases (Dementia and Lung Cancer). The data exploited in iASiS include:
• Electronic Health Records (EHRs) and medical images from patients.
• Genomic data, acquired from various sources such as the EGA, NCBI ClinVar and COSMIC.
• Open literature data from PubMed and open structured data from various databases and ontologies.

Specialized tools were developed for harvesting, semantically indexing and performing an initial analysis of all these datasets. The focus was on establishing interoperability across the different datasets.

Critical to achieving interoperability was the definition and implementation of a unified schema, in the form of the iASiS Knowledge Graph (KG) (figure 1). By bringing different sources of data together, the KG lends itself to further, high-level analysis. Such analysis was performed, aiming to discover latent causal relations between biomedical entities from different sources and to develop powerful medical inference tools.
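To illustrate the general idea of a unified representation, the following is a minimal Python sketch of a toy knowledge graph that stores facts from different source types as triples and follows links across them. It is not the iASiS schema or implementation: the class, the function names, the identifiers (such as patient:001 and variant:V1), the predicates and the facts themselves are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy triple store: (subject, predicate, object) plus the source of each fact."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, subject, predicate, obj, source):
        triple = (subject, predicate, obj, source)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)

    def neighbours(self, subject):
        """Return (predicate, object, source) for every fact about `subject`."""
        return sorted((p, o, s) for (_, p, o, s) in self.by_subject[subject])


def build_example_graph():
    kg = KnowledgeGraph()
    # Hypothetical facts, one per source type mentioned in the text.
    kg.add("patient:001", "diagnosedWith", "disease:lung_cancer", source="EHR")
    kg.add("patient:001", "carriesVariant", "variant:V1", source="genomics")
    kg.add("variant:V1", "associatedWith", "disease:lung_cancer", source="literature")
    return kg


def sources_linking(kg, start, target):
    """Collect the sources of facts on one- or two-hop paths from `start` to `target`."""
    found = set()
    for _, mid, s1 in kg.neighbours(start):
        if mid == target:
            found.add(s1)
        for _, end, s2 in kg.neighbours(mid):
            if end == target:
                found.update({s1, s2})
    return found


kg = build_example_graph()
print(sources_linking(kg, "patient:001", "disease:lung_cancer"))
```

Once facts from separate sources share identifiers in one graph, a single traversal can combine evidence that no individual source holds on its own, which is the kind of cross-source analysis the KG is meant to support.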

B. Development of the First Prototype

The first prototype of the iASiS platform has been designed and implemented, including an adaptive Graphical User Interface that can manage aggregated data and analysis results. It provides user-friendly access to the collected, integrated and analysed data, together with links to outcomes and analysis results. Based on this information, users can efficiently and transparently make decisions that are tailored to individuals (figure 2).

C. Addressing Real User Needs

The usability and usefulness of the iASiS platform were ensured through the continuous involvement of users in all stages of the development process. This involvement started in the first months of the project, with the identification of user requirements. The requirements were derived from illustrative schematic scenarios developed for both pilots, which highlight desirable features and interactions.

As an example, one of the scenarios developed for the lung cancer pilot involves identifying patterns in the data of long-surviving lung cancer patients. Similarly, an example scenario for the dementia pilot studies how symptomatic treatments within a given class of drugs relate to different patient types, based on the patients' genetic (allelic) status.

D. Data Privacy

The data management plans for both use cases have been developed. These plans describe how data processing in iASiS will respect the policies associated with each data source, while also adhering to the EU data regulations. This process is greatly assisted by the Ethics Committee of the project, which was established in the early stages. The Committee is led by an external expert and reviews the pertinent procedures, permissions and documents, together with the privacy- and trust-aware strategies.

A. Progress beyond the state of the art

During this period, the integration of data from the various sources has been achieved. To this end, a unified schema has been defined and the KG has been implemented, which semantically connects all available knowledge. Innovative methods and technologies have been applied to different types of data.

In particular, novel NLP techniques have been applied to extract the rich knowledge encoded in free text in EHRs and to integrate the results into the KG. These techniques reconstruct the medical history of each patient, using semantic annotators for entity recognition.

Moreover, an innovative module for extracting semantic (2D and 3D) and agnostic features (deep features) from CT images has been implemented and applied to an open access image database. The extracted features are used in a predictive modelling process, in which Convolutional Neural Network models search for patterns that support the discrimination of malignant and non-malignant nodules.

Concerning genomics, several candidate molecular interactions of interest, which may have an impact on the diseases studied in the project, have been identified. This was done by combining large-scale data on genetic variants that affect the expression of distant genes (“trans-eQTLs”) with information on protein-RNA interactions and clinically relevant genomic variation.

Lastly, concerning open datasets, text mining and machine learning techniques have been adopted and extended, in order to analyse biomedical literature and combine it with knowledge from structured databases. A textual data analysis module has been developed to provide risk assessment for Alzheimer’s disease for patients who have participated in a cognitive awareness task.

On top of the iASiS Knowledge Graph (KG), mining tools have been developed that extract knowledge and uncover unknown patterns from the combination of the aforementioned data. These mining techniques extend existing community detection approaches to exploit the semantics encoded in the KG, while maintaining scalability and efficiency.

B. Expected results until the end of the Project

By the end of the project, the consortium is planning to integrate knowledge from more datasets and ontologies. All individual modules will be extended. The second prototype of the platform will incorporate changes based on the user evaluation and provide more functionalities, which will hopefully lead to better insights for personalised diagnosis and treatment.

C. Potential Impacts

iASiS aims at a significant impact on the EU healthcare system, the ICT industry, and wider society. The iASiS platform can provide an important tool for supporting patients’ treatment by providing useful knowledge to medical professionals. Moreover, the project results, in the form of detected patterns and trends, can support authorities in better planning public health activities and public health strategy.

ESWC 2018 Project Networking Track Title: iASiS: Big Data for Supporting Precision Medicine and Public Health Policy-Making Description: This extended abstract describes the objectives of the iASiS project, the main characteristics of the iASiS knowledge graph and the components of the iASiS framework. Resources that will be made available to the Semantic Web community are described as well.

GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics Description: This paper presents GARUM, a semantic similarity measure that resorts to machine learning methods for learning the most suitable characteristics to consider during the computation of the relatedness of two entities. Knowledge graphs are used to represent the entities, and semantics encoded in the

Author(s):
Ignacio Traverso-Ribon, Maria-Esther Vidal

Published in:
2018

AAIC '18 - Alzheimer's Association International Conference iASiS: Big Data for Precision Medicine. Automatic Detection of Linguistic Indicators as a Means of Early Prediction of Alzheimer's and of Related Dementias