Traditional cognitive measures struggle to reveal when a participant is compensating for their deficits. At AD/PD 2019 we shared new data on how to overcome this issue: deriving cognitive load from voice data collected on participants’ own devices, and in their own homes.

Background

Cognitive reserve and compensation for deficits can mask the severity of neurodegenerative disease. This can complicate the early detection of impairment, especially for high-functioning individuals. Therefore, enriching cognitive test scores with metrics which indicate cognitive effort may help to increase the sensitivity of cognitive testing to subtle decline.

Previous work has demonstrated that acoustic features of voice are sensitive to cognitive load [1], but this work has typically used recordings obtained in laboratory settings.

Recent developments in automatic speech recognition (ASR) enable the automated and remote administration of verbal cognitive tests [2], and the capture of voice data in the home environment.

This work aims to detect features of cognitive load from audio recordings captured during remote automated verbal cognitive testing on participants’ own devices.

Methods

Two hundred participants aged 18 to 78 completed an automated verbal test of working memory: repeating a series of digits backwards using the Cambridge Cognition Neurovocalix platform. Testing was carried out in participants’ homes, on their own devices. Raw audio data, scored responses and participant demographics were all recorded.

Participants’ responses were scored using ASR. The task terminated when a participant responded incorrectly three times at a given span length. Maximum working memory span ranged from 3 to 8 items.
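The adaptive control loop described above can be sketched as follows. This is an illustration of the termination rule, not the Neurovocalix implementation; the `respond` callback (which would wrap ASR scoring of a live response) and the span range defaults are assumptions for the example.

```python
def digit_span_backwards(respond, start_span=3, max_span=8, max_errors=3):
    """Illustrative control loop for a backwards digit-span task.

    `respond(span)` should return True if the participant correctly
    repeated a `span`-length digit sequence backwards. The task stops
    when a span length accumulates `max_errors` incorrect responses.
    Returns the maximum span achieved.
    """
    span = start_span
    best = 0
    while span <= max_span:
        errors = 0
        while errors < max_errors:
            if respond(span):      # correct response: advance to next span
                best = span
                break
            errors += 1
        else:
            return best            # three errors at this span: terminate
        span += 1
    return best
```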

We calculated cognitive load relative to each participant’s maximum span. A response was categorised as “high load” if its span length exceeded 0.6 of that participant’s maximum span.
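The categorisation rule amounts to a one-line threshold on relative load. A minimal sketch (function name and signature are ours, for illustration):

```python
def load_category(span_length, max_span, threshold=0.6):
    """Label a response "high" load if its span length exceeds the
    given fraction of the participant's maximum achieved span."""
    relative_load = span_length / max_span
    return "high" if relative_load > threshold else "low"
```

For a participant with a maximum span of 8, spans of 5 and above are therefore “high load”, since 5/8 = 0.625 > 0.6.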

Audio features were extracted from each response (Figure 1). Feature vectors were normalised within each participant, expressing within-subjects differences in vocal features across trials of varying load. Only correct responses were included in the analysis.

Figure 1: The figures above illustrate a set of low-level features derived from participants’ verbal responses, using a single utterance as an example.
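Within-participant normalisation of the feature vectors can be sketched as a per-participant z-score across trials. The representation of trials as a list of feature dictionaries is an assumption for illustration; the actual feature set is the one shown in Figure 1.

```python
import statistics

def normalise_within_participant(features_by_trial):
    """Z-score each feature across one participant's trials, so values
    express within-subject differences across trials of varying load
    rather than between-subject differences.

    `features_by_trial` is a list of dicts, one per trial, all sharing
    the same feature keys. Returns a new list; input is not modified.
    """
    keys = features_by_trial[0].keys()
    normalised = [dict(trial) for trial in features_by_trial]
    for key in keys:
        values = [trial[key] for trial in features_by_trial]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0  # guard against zero variance
        for trial in normalised:
            trial[key] = (trial[key] - mean) / sd
    return normalised
```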

Data were divided into training (60%), test (20%) and validation (20%) sets, with different participants in each. The training and test sets were used for model building and hyper-parameter tuning, respectively; the validation set was held out for final model evaluation.
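Because the split is by participant rather than by response, no individual contributes data to more than one set. A minimal sketch of such a split (the seed and function name are arbitrary choices for the example):

```python
import random

def split_by_participant(participant_ids, seed=0):
    """Split participants (not individual responses) 60/20/20 into
    train/test/validation sets, so no participant appears in more
    than one set. Returns three disjoint sets of participant IDs.
    """
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_test = int(n * 0.6), int(n * 0.2)
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_test]),
            set(ids[n_train + n_test:]))
```

Responses are then routed to whichever set their participant was assigned to.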

A Random Forest classifier was trained to predict whether a response came from a “high load” or “low load” condition (Figure 2). Performance was assessed both overall and by span length, using accuracy as the main metric and computing the overall ROC curve.

Figure 2: We used a Random Forest (RF) model to predict cognitive load from voice features. This is an ensemble machine learning algorithm, representing a combination of decision trees.
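The modelling step can be sketched with scikit-learn. The feature vectors below are synthetic stand-ins (random draws with the high-load class shifted), since the real inputs are the normalised vocal features of Figure 1; hyper-parameters are illustrative, not the tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for normalised vocal features: two features,
# with high-load trials shifted upward to make the classes separable.
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)   # 0 = low load, 1 = high load
X_test = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_test = np.array([0] * 50 + [1] * 50)

# Random Forest: an ensemble of decision trees, as in Figure 2.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

The held-out validation set would only be scored once, after hyper-parameter tuning on the test set is complete.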

Results

Overall model accuracy was 0.91 for Test and 0.92 for Validation sets, demonstrating good ability to distinguish between high and low load trials.

ROC curves are shown in Figure 3 for Validation and Test datasets.

Figure 3: A. ROC curves for predicting cognitive load in Test and Validation participants. B. The relationship between model probability, prediction of cognitive load, and the observed cognitive load for each response.

Conclusion

We have demonstrated the ability to derive measures of cognitive load from voice data collected on participants’ own devices and in their own homes.

These data suggest that automatically administered and scored verbal cognitive tests can be used to generate both reliable measures of performance and useful vocal features.

Future work will aim to replicate these findings in patients with neurodegenerative disease, and examine the potential of these digital biomarkers for increasing sensitivity to the presence of neurodegenerative pathology.