Big Data Analytics for Unstructured Patient Data

Turning unstructured patient data into meaningful clinical insights

Executive Summary

Much clinical information, such as age and gender, is available in structured formats. But often, the information that can tell doctors the most about a patient's condition is in unstructured formats. This includes patients' free-text clinical notes (e.g., nursing notes, lab reports, and radiology reports), which are generally difficult and time-consuming to analyze. A critical part of providing personalized medicine is being able to efficiently use this unstructured data to provide clinical insights.

Intel® Distribution for Apache Hadoop* software, GraphBuilder*, and GraphLab* are distributed computation frameworks that make it possible to analyze free-text clinical notes in a scalable, efficient way. Both Intel Distribution for Apache Hadoop software and GraphBuilder rely on the independence among patients' records to provide a data-parallel solution for preprocessing, formatting, and normalization. GraphLab uses the dependencies in post-processed records to derive meaningful insights in a parallel way. Together, these technologies can help alleviate the bottleneck of creating topic models documented in the research article "Topic Models for Mortality Modeling in Intensive Care Units."
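The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the Hadoop, GraphBuilder, or GraphLab API; it is a stand-in under stated assumptions. Because each patient's note is independent, normalization can be mapped over records in parallel, and the result can be emitted as (patient, term, count) edges of a bipartite graph, the kind of input a graph-based topic-model step might consume. The function names, the stop-word list, and the two sample notes are hypothetical and made up for illustration.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "with"}


def normalize(note):
    """Map step: lowercase one note, tokenize it, and count its terms.

    Each call touches a single record, so calls can run in any order
    or in parallel.
    """
    tokens = re.findall(r"[a-z]+", note.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def build_edges(notes):
    """Turn {patient_id: note} into sorted (patient_id, term, count) edges."""
    ids = list(notes)
    # A thread pool stands in for a cluster here; the point is only that
    # the per-record work is independent.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(normalize, (notes[i] for i in ids)))
    return [
        (pid, term, n)
        for pid, c in zip(ids, counts)
        for term, n in sorted(c.items())
    ]


# Made-up sample notes, not real patient data.
notes = {
    "p1": "Patient is stable. No chest pain.",
    "p2": "Chest pain and shortness of breath.",
}
edges = build_edges(notes)
for edge in edges:
    print(edge)
```

In this sketch the edge list is where dependencies between records first appear: two patients become connected through the terms they share, which is the structure a graph-parallel engine would then work on.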
