The Seizure Prediction competition, hosted by Melbourne University AES, MathWorks, and NIH, challenged Kagglers to accurately forecast the occurrence of seizures using intracranial EEG recordings. Nearly 500 teams competed to distinguish between ten-minute data clips covering the hour prior to a seizure and ten-minute clips of interictal activity. In this interview, Kaggler Gareth Jones explains how he applied his background in neuroscience to a problem with the potential to make a positive impact on the lives of people affected by epilepsy. He discusses his approach to feature engineering with the raw data, the challenge of local cross-validation, and his surprise at the effectiveness of training a single general model as opposed to patient-specific ones.

The basics

What was your background prior to entering this challenge?

I have a PhD in neuroscience and currently work as a post-doc at the Ear Institute, University College London, UK. My work is in sensory processing, multisensory integration, and decision making.

Do you have any prior experience or domain knowledge that helped you succeed in this competition?

I have experience collecting and analysing electrophysiological data, but not with seizure prediction from EEG data specifically.

What made you decide to enter this competition?

My background is in experimental and computational neuroscience, and it was exciting to find a topic that combines neuroscience and machine learning and has the potential to have a direct therapeutic impact on people’s lives. If seizure prediction can be done reliably, particularly without too many false alarms, it may be able to greatly mitigate the danger and inconvenience of seizures to epilepsy sufferers.

Let’s get technical:

What preprocessing and supervised learning methods did you use?

Raw preictal (before seizure) and interictal (normal activity) intracranial EEG data recorded from implants in 3 human patients were provided for this competition (Figure 1). These patients were the lowest scorers (of 15) for prediction accuracy in a previous study using the NeuroVista Seizure Advisory System (Cook et al., 2013).

Some basic pre-processing had already been done, but no ready-to-use features were included. The lack of existing features means more work initially, but isn’t necessarily a bad thing; having the raw data and being able to extract your own features is incredibly powerful, and allows much greater scope for reasoning in the feature engineering stage. This isn’t always possible when working with pre-prepared datasets, which often contain obscured features that require reverse engineering to get the most out of.

The raw data needed a bit of additional pre-processing before feature extraction. In the training set the data were split into 10 minute recordings per file. Some of these 10 minute files were sequential with 5 other files, meaning 60 minutes of consecutive data were available. Other files in the training set (and all of the test set) were isolated 10 minute segments of data.

The sequential files were first concatenated into 60 minute segments, and the other, individual 10 minute files were left as 10 minute segments. These segments were then subdivided into discrete epochs of 50-400 s for feature processing (Figure 2), with each epoch and its extracted features representing one row in the data set with its corresponding feature columns. Features extracted from multiple epoch window lengths were joined together before training.
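The subdivision step can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB used for the actual solution. The function name `split_into_epochs`, the toy sampling rate, non-overlapping windows, and dropping of incomplete trailing epochs are all assumptions made for the example and are not taken from the original code.

```python
def split_into_epochs(segment, sampling_rate_hz, epoch_seconds):
    """Split one segment (a list of samples) into non-overlapping epochs.

    Assumption: trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = int(sampling_rate_hz * epoch_seconds)
    if samples_per_epoch <= 0:
        raise ValueError("epoch length must cover at least one sample")
    n_epochs = len(segment) // samples_per_epoch
    return [
        segment[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Toy example: a 10 minute segment at a hypothetical 4 Hz sampling rate.
toy_rate_hz = 4
toy_segment = list(range(10 * 60 * toy_rate_hz))

# Epoch the same segment with several window lengths (in seconds).
epochs_by_window = {
    window_s: split_into_epochs(toy_segment, toy_rate_hz, window_s)
    for window_s in (50, 200, 400)
}
```

Each epoch would then be passed to feature extraction and become one row of the data set for its window length.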

Figure 2 – Temporal and frequency domain features were extracted from each individual epoch of raw data. In the frequency domain, these included EEG band powers and the correlation of these across channels. Across-channel correlations were also taken in the time domain, along with basic summary statistics for each channel.
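As a rough illustration of the time-domain part of this feature set, the Python sketch below computes per-channel summary statistics and pairwise channel correlations for one epoch. It is not the original MATLAB code: the function names, the choice of mean and standard deviation as the summary statistics, and the feature naming scheme are assumptions, and the frequency-domain band powers are left out for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sample lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat channel as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def time_domain_features(epoch):
    """epoch: list of channels, each a list of samples of equal length."""
    features = {}
    # Basic summary statistics for each channel.
    for i, channel in enumerate(epoch):
        mean = sum(channel) / len(channel)
        variance = sum((v - mean) ** 2 for v in channel) / len(channel)
        features[f"ch{i}_mean"] = mean
        features[f"ch{i}_std"] = math.sqrt(variance)
    # Correlation between every pair of channels.
    for i in range(len(epoch)):
        for j in range(i + 1, len(epoch)):
            features[f"corr_ch{i}_ch{j}"] = pearson(epoch[i], epoch[j])
    return features


# Toy epoch with three channels of four samples each.
toy_epoch = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
row = time_domain_features(toy_epoch)
```

The resulting dictionary corresponds to one row of the data set, with one column per feature.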

For training, data from all three patients were used to train two general (rather than patient-specific) models. I used an ensemble of a quadratic SVM and an RUS boosted tree ensemble with 100 learners (Figure 3), both of which performed well individually in early prototyping, despite the large class imbalances.

Finally, the predictions for each epoch were reduced (by mean) to a single prediction for each 10 minute segment (file) and then the segment predictions were combined across the models.
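The reduction described above can be sketched as follows, in Python rather than the original MATLAB. The helper names and the toy scores are hypothetical, and combining the two models with an equal-weight average is an assumption, since the text only says the segment predictions were combined.

```python
from collections import defaultdict


def mean_by_segment(epoch_predictions):
    """Reduce (segment_id, score) pairs to one mean score per segment."""
    grouped = defaultdict(list)
    for segment_id, score in epoch_predictions:
        grouped[segment_id].append(score)
    return {seg: sum(s) / len(s) for seg, s in grouped.items()}


def combine_models(per_model_segment_scores):
    """Average segment scores across models (assumes equal weights)."""
    segment_ids = per_model_segment_scores[0].keys()
    n_models = len(per_model_segment_scores)
    return {
        seg: sum(m[seg] for m in per_model_segment_scores) / n_models
        for seg in segment_ids
    }


# Toy epoch-level predictions from two models for two files.
svm_epochs = [("file_a", 0.2), ("file_a", 0.4), ("file_b", 0.9), ("file_b", 0.7)]
tree_epochs = [("file_a", 0.1), ("file_a", 0.3), ("file_b", 0.6), ("file_b", 0.8)]

final = combine_models(
    [mean_by_segment(svm_epochs), mean_by_segment(tree_epochs)]
)
```

Here `final` holds one prediction per file, which is the granularity required for the submission.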

What was your most important insight into the data?

Local cross-validation was difficult in this competition and required an approach that grouped epoch data by segment, to prevent information leakage caused by the same segment being represented in both the training and cross-validation sets. This helped local accuracy a lot, but there was still a relatively large error between local and leaderboard scores to work around. The public leaderboard used only 30% of the test data, so overfitting was a huge risk (the final top ten for this competition had a net position gain of more than 100).
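A minimal Python sketch of this kind of grouped split is shown below. It is an illustration of the idea, not the original MATLAB implementation; the function name `grouped_k_fold`, the round-robin assignment of segments to folds, and the toy segment labels are assumptions.

```python
import random


def grouped_k_fold(segment_ids, n_folds, seed=0):
    """Yield (train_idx, valid_idx) so that no segment spans both sides."""
    unique_segments = sorted(set(segment_ids))
    random.Random(seed).shuffle(unique_segments)
    # Assign whole segments, not individual epochs, to folds.
    fold_of_segment = {
        seg: i % n_folds for i, seg in enumerate(unique_segments)
    }
    for fold in range(n_folds):
        train_idx, valid_idx = [], []
        for row, seg in enumerate(segment_ids):
            if fold_of_segment[seg] == fold:
                valid_idx.append(row)
            else:
                train_idx.append(row)
        yield train_idx, valid_idx


# One entry per epoch (row), giving the segment that epoch came from.
row_segments = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s4", "s4"]
folds = list(grouped_k_fold(row_segments, n_folds=2))
```

Because all epochs from a segment land on the same side of each split, a model cannot score well in validation simply by recognising a segment it has already seen during training.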

Training-wise, ensembling the SVM and the RUS boosted tree ensemble had the most significant, above-noise effect on score.

Feature-wise my most valuable insight was to combine features extracted from multiple epoch window lengths, which probably wouldn’t have been possible without having the raw data to work from. Identifying specifically which features were useful was difficult due to the cross-validation noise, but frequency powers, temporal summary statistics, and correlations between channels in both domains were all effective (Figure 2).

Were you surprised by any of your findings?

Two things surprised me: that training a general model rather than multiple patient-specific models worked at all, and that the most predictive frequency power bands were higher than I expected.

Often in seizure detection, models trained on single patients perform better than general models trained on multiple patients’ data. This is partly because there’s large variation between human brains, and partly because there’s not necessarily any correspondence between device channels across patients. These data came from three patients who were all implanted with the same device, with the same channel mapping, but each was implanted in a different location. The models’ predictions do vary noticeably between patients, so it’s clear the models had enough information to identify the patients. It remains to be seen if there’s any advantage in training a general model when testing on totally held out data, or when predicting for unseen patients.

Regarding the frequency power bands, my model included “typical” EEG frequency band powers and higher bands up to 200 Hz as features. The bands covering the range 40-150 Hz were more predictive of seizures than the lower frequency bands, which is not what I expected based on the previous UPenn and Mayo Clinic’s seizure detection competition, where Michael Hill’s winning entry used 1-47 Hz in 1 Hz bins. It’s also surprising given the lack of channel correspondence between patients when training a general model - higher frequency signals are more localised than lower frequency signals, so should be more patient-specific.

Figure 3 – Features extracted from different epoch window lengths were joined into one data set that was used to train a quadratic SVM and RUS Boosted tree ensemble. The test set was processed in the same way as the training set and each model produced predictions for each epoch in the test set. The ensembled predictions for each epoch were reduced to create a prediction for each of the files in the test set.

Which tools did you use?

MATLAB 2016b:

Classifier Learner App

Statistics and Machine Learning Toolbox

Parallel Computing Toolbox

Although I usually use Kaggle as a way of practicing with Python and R, I stuck with MATLAB in this competition as it’s what I mostly use in my professional work. I also really like the Classifier Learner App to quickly try out different basic models. My code is available here.

How did you spend your time on this competition?

About 70% feature processing, split 50/50 between extraction from the raw data and engineering. The rest of my time was spent on developing more accurate cross-validation and training.

What was the run time for both training and prediction of your winning solution?

On a 4 GHz 4-core i7, around 6-12 hours in total (mostly dependent on how many epoch window lengths needed to be extracted and combined), with extracting and processing features taking up 80% of the time. Training and predicting from the SVMs (~10 minutes) was very quick, whereas training and predicting from the tree ensembles was slower (30-60 minutes, depending mostly on the number of cross-validation folds).

Words of Wisdom

Do you have any advice for those just getting started in data science?

It’s important to try to appreciate the difficulties and shortcomings involved in the data collection and experimentation processes. The dataset provided in this competition is remarkable – it’s from chronic implants on human brains! Don’t forget that analysing data is only half the story; a lot of time, effort, and basic science went into getting hold of it.

More generally, start with online courses on Coursera, Udacity, edX, etc., but always practice what you’ve learned in real projects, and try to get hold of raw data whenever possible. It’s very important to have a healthy skepticism of all data; each level of processing inevitably adds mistakes and assumptions that aren’t always obvious.

Just for fun

What is your dream job?

Anything involving neuroscience and machine learning - either using machine learning to guide health decisions, or, conversely, using neuroscience to inform development of machine learning approaches and AI.

Bio

Gareth Jones has a PhD in Neuroscience from The University of Sussex, UK and is currently a post-doc at the UCL Ear Institute, UK. His research uses electrophysiology, psychophysics, and computational modelling to investigate the neural mechanisms of sensory accumulation, multisensory information combination, and decision making.