Description

Stroke is the third leading cause of death and the principal
cause of serious long-term disability in the United States.
Accurate prediction of stroke is highly valuable for early intervention
and treatment. In this study, we compare the
Cox proportional hazards model with a machine learning
approach for stroke prediction on the Cardiovascular Health
Study (CHS) dataset. Specifically, we consider the common
problems of data imputation, feature selection, and prediction
in medical datasets. We propose a novel automatic feature
selection algorithm that selects robust features based
on our proposed heuristic: conservative mean. Combined
with Support Vector Machines (SVMs), our proposed feature
selection algorithm achieves a greater area under the
ROC curve (AUC) than both the Cox proportional hazards
model and the L1-regularized Cox model. Furthermore, we
present a margin-based censored regression algorithm that
combines the concept of margin-based classifiers with censored
regression to achieve a better concordance index than
the Cox model. Overall, our approach outperforms the current
state of the art on both AUC and concordance
index. In addition, our work has identified potential
risk factors that have not been discovered by traditional
approaches. Our method can be applied to clinical prediction
of other diseases, where missing data are common and
risk factors are not well understood.
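
The concordance index used above as an evaluation metric can be illustrated with a short sketch. The abstract does not state which variant the study computes, so this assumes the common Harrell's C definition: among all pairs of subjects whose ordering is known despite censoring, the fraction in which the subject with the earlier event was assigned the higher predicted risk. The function name `concordance_index` and the toy data are hypothetical and are not taken from the CHS dataset or from the authors' code.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch only).

    times       -- observed time for each subject (event or censoring time)
    events      -- 1 if the event (e.g. stroke) was observed, 0 if censored
    risk_scores -- predicted risk; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four subjects, the second one is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
good_scores = [0.9, 0.1, 0.5, 0.3]
c_index = concordance_index(times, events, good_scores)
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. Unlike AUC at a fixed time horizon, this metric uses the ordering of event times, which is why the abstract reports it separately.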