Intracerebral hemorrhage (ICH), where a blood vessel ruptures into areas of the brain, accounts for approximately 10-15% of all strokes. X-ray computed tomography (CT) scanning is largely used to assess the location and volume of these hemorrhages.

Manual segmentation of the CT scan using planimetry by an expert reader is the gold standard for volume estimation, but is time-consuming and has within- and across-reader variability. We propose a fully automated segmentation approach using a random forest algorithm with features extracted from X-ray computed tomography (CT) scans.

Methods

The Minimally Invasive Surgery plus rt-PA in ICH Evacuation (MISTIE) trial was a multi-site Phase II clinical trial that tested the safety of hemorrhage removal using recombinant-tissue plasminogen activator (rt-PA). For this analysis, we use 112 baseline CT scans from patients enrolled in the MISTE trial, one CT scan per patient. ICH was manually segmented on these CT scans by expert readers.

We derived a set of imaging predictors from each scan. Using 10 scans, we used a first-pass voxel selection procedure based on quantiles of a set of predictors and then built 4 models estimating the voxel- level probability of ICH. The models used were: 1) logistic regression, 2) logistic regression with a penalty on the model parameters using LASSO, 3) a generalized additive model (GAM) and 4) a random forest classifier. The remaining 102 scans were used for model validation.

For each validation scan, the model predicted the probability of ICH at each voxel. These voxel-level probabilities were then thresholded to produce binary segmentations of ICH hemorrhage. These masks were compared to the manual segmentations using the Dice Similarity Index (DSI) and the correlation of ICH volume of between the two segmentations. We tested equality of median DSI using the Kruskal-Wallis test across the 4 models. We tested equality of the median DSI from sets of 2 models using a Wilcoxon signed-rank test.

Results

All results presented are for the 102 scans in the validation set. The median DSI for each model was: 0.89 (logistic), 0.885 (LASSO), 0.88 (GAM), and 0.899 (random forest). Using the random forest results in a slightly higher median DSI compared to the other models. After Bonferroni correction, the hypothesis of equality of median DSI was rejected only when comparing the random forest DSI to the DSI from the logistic (p < 0.001), LASSO (p < 0.001), or GAM (p < 0.001) models. In practical terms the difference between the random forest and the logistic regression is quite small. The correlation (95% CI) between the volume from manual segmentation and the predicted volume was 0.93 (0.9, 0.95) for the random forest model. These results indicate that random forest approach can achieve accurate segmentation of ICH in a population of patients from a variety of imaging centers. We provide an R package and a webpage for implementing and testing the proposed approach.