Mental health and artificial intelligence: losing your voice

While we still can, let us ask, "Will AI exacerbate discrimination?" as the productive forces of mental health are restructured within a techno-psychiatric complex. Poem.

Sketch, 2018. Flickr/Whinger. Some rights reserved.

'You sound a bit depressed' we might
say to a friend,
Not only because of what they say but how they say it.
Perhaps their speech is duller than usual, tailing off between words,
Lacking their usual lively intonation.

There are many ways to boil a voice down
into data points;
Low-level spectral features, computed from snippets
as short as twenty milliseconds
That quantify the dynamism of amplitude, frequency and energy,
And those longer range syllabic aspects that human ears are tuned to,
Such as pitch and intensity.
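
An aside for the technically curious: a minimal sketch, in plain Python with invented numbers, of what 'boiling a voice down' can mean — slicing a signal into twenty-millisecond snippets and reducing each to a single energy figure. The sample rate, the synthetic tone and the framing scheme are illustrative assumptions, not any product's actual pipeline.

```python
import math

# Synthesise one second of a 220 Hz tone at 16 kHz, standing in for voice,
# then reduce it to per-frame energies using 20-millisecond snippets.
SAMPLE_RATE = 16_000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per snippet

signal = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]

def frame_energies(samples, frame_len=FRAME_LEN):
    """Root-mean-square energy of each non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]

energies = frame_energies(signal)
print(len(energies))  # one second of speech becomes 50 numbers
```

A real system would compute many such features per frame (spectral, pitch, rhythmic), but the reduction is the same in kind: sound in, rows of numbers out.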

A voice distilled into data
Becomes the training material for machine learning algorithms,
And there are many efforts being made to teach machines
To deduce our mental states from voice analysis.

The bet is that the voice is a source of
biomarkers,
Distinctive data features that correlate
to health conditions,
Especially the emergence of mental health problems
Such as depression, PTSD, Alzheimer's and others.

And of course there are the words
themselves;
We've already trained machines to recognise them.
Thanks to the deep neural network method called Long
Short-Term Memory (LSTM)
We can command our digital assistants to buy something on Amazon.

Rules-based modelling never captured the
complexity of speech,
But give neural networks enough examples,
They will learn to parrot and predict any complex pattern,
And voice data is plentiful.

So perhaps machines can be trained to
detect symptoms
Of different
kinds of stress or distress,
And this can be looped into an appropriate intervention
To prevent things from getting worse.

As data, the features of speech become
tables of numbers;
Each chunk of sound becomes a row of digits,
Perhaps sixteen numbers from a Fourier
Transform
And others for types of intensity and rhythmicity.

For machine learning to be able to learn
Each row must end in a classification; a number that tags a known diagnosis.
Presented with enough labelled examples it will produce a probabilistic
model
That predicts the likelihood of a future speaker developing the same condition.
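
A toy illustration of the tables described above, with entirely fictitious data: each snippet becomes a row of sixteen Fourier magnitudes, each training row ends in an invented 0/1 label, and a minimal probabilistic model (logistic regression fitted by gradient descent) turns new rows into likelihoods. Nothing here reflects a real diagnostic system.

```python
import cmath
import math
import random

random.seed(0)

def dft_row(frame, n_bins=16):
    """First n_bins magnitudes of the discrete Fourier transform."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n_bins)]

def snippet(freq, n=64):
    """A noisy sine wave standing in for a chunk of recorded voice."""
    return [math.sin(2 * math.pi * freq * t / n) + random.gauss(0, 0.1)
            for t in range(n)]

# The table: sixteen numbers per row, then a fictitious class label.
rows = ([dft_row(snippet(3)) + [0] for _ in range(20)]
        + [dft_row(snippet(9)) + [1] for _ in range(20)])

# A minimal probabilistic model: logistic regression via gradient descent.
w, b = [0.0] * 16, 0.0
for _ in range(200):
    for row in rows:
        x, y = row[:16], row[16]
        p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        g = p - y  # gradient of the log loss
        w = [wi - 0.5 * g * xi for wi, xi in zip(w, x)]
        b -= 0.5 * g

def predict_proba(frame):
    """Probability the model assigns to the '1' label for a new snippet."""
    x = dft_row(frame)
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

print(round(predict_proba(snippet(9)), 2))
```

The output is a probability, not a fact: the model has learned only the statistical shadow of its labelled examples.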

It's very clever to model the hair cells
in the human ear as
forced damped oscillators
And to apply AI algorithms that learn models through backpropagation,
But we should ask why we want machines to listen out for signs of distress;
Why go to all this trouble when we could do the listening ourselves?

One reason is the rise in mental health
problems
At the same time as available
services are contracting.
Bringing professional and patient together costs time and money,
But we can acquire and analyse samples of speech via our network
infrastructures.

The number of startups looking for
traction on mental states,
Through the machine analysis of voice,
Suggests a restructuring of the productive forces of mental health,
Such that illness will be constructed by a techno-psychiatric complex.

HealthRhythms, for example, was founded
by psychiatrist David Kupfer,
Who chaired the task force that produced DSM-5, the so-called 'bible of
psychiatry',
Which defines mental disorders and the diagnostic symptoms for them.
The HealthRhythms app uses voice data to calculate a "log of sociability" to
spot depression and anxiety.

Sonde Health screens acoustic changes in
the voice for mental health conditions
With a focus on post-natal depression and dementia; "We're
trying to make this ubiquitous and universal" says the CEO.
Their app is not just for smartphones but for any voice-based technology.

Meanwhile Sharecare scans your calls and reports
if you seemed anxious;
Founder Jeff Arnold describes it as 'an
emotional selfie'.
Like Sonde Health, the company works with health insurers
While HealthRhythms' clients include pharmaceutical companies.

It's hardly a surprise that Silicon
Valley sees mental health as a market ripe for Uber-like disruption;
Demand is rising, orthodox services are being cut, but data is more plentiful
than it has ever been.
There's a mental health crisis that costs economies millions
So it must be time to 'move
fast and break things'.

But as Simondon and others have tried
to point out,
The individuation of subjects, including ourselves, involves a certain technicity,
Stabilising a new ensemble of AI and mental health
Will change what it is to be considered well or unwell.

Samaritans Radar

There's little apparent concern among
the startup-funder axis
That all this listening might silence voices.
Their enthusiasm is not haunted by the story of the Samaritans Radar
When an organisation which should have known better got carried away by
enthusiasm for tech surveillance.

This was a Twitter app developed in 2014 by
the Samaritans,
The UK organisation which runs a 24-hour helpline for anyone feeling suicidal.
You signed up for the app and it promised to send you email alerts
Whenever someone you follow on Twitter appeared to be distressed.
If any of their tweets matched a list of key phrases
It invited you to get in touch with them.

In engineering terms, this is light years
behind the sophistication of Deep Learning,
But it's a salutary tale about unintended impacts.
Thanks to the inadequate
involvement of service users in its production,
It ignored the fact that the wrong sort of well-meaning intervention at the
wrong time might actually make things worse,
Or that malicious users could use the app to target and troll
vulnerable people.

Never mind the consequences of false
positives
When the app misidentified someone as distressed,
Or the concept
of consent,
Given that the people being assessed were not even made aware that this was
happening;
All riding roughshod over the basic ethical principle of 'do no harm'.

Although Twitter is a nominally public
space,
People with mental health issues had been able to hold supportive mutual
conversations
With a reasonable expectation that this wouldn't be put in a spotlight,
Allowing them to reach
out to others who might be experiencing similar things.

One consequence of the Samaritans Radar
was that many people with mental health issues,
Who had previously found Twitter a source of mutual support,
Declared their intention to withdraw
Or simply went silent.

As with the sorry tale of the Samaritans
Radar,
Without the voices of mental health users and survivors
The hubris that goes with AI has the potential to override the Hippocratic
oath.

Fairness and Harm

The ubiquitous application of machine
learning's predictive power
In areas with real world consequences, such as policing
and the judicial system,
Is stirring an awareness that its oracular insights
Are actually constrained by complexities that are hard to escape.

The simplest of which is data
discrimination;
A programme that only knows the data it is fed,
And which is only fed data containing a racist bias,
Will make racist
predictions.

This should already be a red flag for our
mental health listening machines.
Diagnoses of mental health are already skewed
with respect to race;
A disproportionate number of people from black and minority ethnic backgrounds get
diagnosed,
And the questions about why are still very much open and contested.

But surely, proponents will say, one
advantage of automation in general
Is to encode fairness and bypass the fickleness of human bias;
To apply empirical and statistical knowledge directly
And cut through the subjective distortions of face-to-face prejudice.

Certainly, as the general dangers of
reproducing racism and sexism have become clear,
There have been conscientious efforts from engineers in one corner of machine
learning
To automate ways to de-bias datasets.

But here's the rub;
Even when you know there's the potential for discrimination
It's mathematically impossible to produce all-round fairness.

If you're designing a parole
algorithm to predict whether someone will reoffend,
You can design it so that the accuracy for high risk offenders is the same for
white and black.
But if the overall base rates are different
There will be more false positives among black people, which can be considered a
harm,
Because more black people who would not go on to reoffend will be refused bail than white people.
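
The arithmetic of that bind can be sketched directly. If precision (PPV, accuracy among those flagged high risk) and recall (TPR) are held equal across two groups, then unequal base rates force unequal false positive rates — a relation derived by Chouldechova. The base rates and accuracy figures below are invented purely for illustration.

```python
# A back-of-envelope sketch with invented numbers: equal precision and
# recall for both groups, different base rates of reoffending.
def implied_fpr(base_rate, ppv=0.7, tpr=0.8):
    """False positive rate forced by fixing precision and recall:
    FPR = base/(1-base) * (1-PPV)/PPV * TPR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

fpr_low = implied_fpr(0.3)   # group with the lower base rate
fpr_high = implied_fpr(0.5)  # group with the higher base rate
print(round(fpr_low, 2), round(fpr_high, 2))  # 0.15 vs 0.34
```

Same 'accuracy', yet more than twice the false positive rate for one group: the unfairness is baked into the mathematics, not into any single design error.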

Machine learning's probabilistic
predictions are the result of a mathematical fit,
The parameters of which are selected to optimise on specific metrics,
There are many mathematical ways to define fairness (perhaps twenty-one of them)
And you can't satisfy them all at the same time.

Proponents might argue that with machinic
reasoning,
We should be able to disentangle the reasons for various predictions,
So we can make policy choices
About the various trade-offs.

But there's a problem with artificial
neural networks,
Which is that their reasoning
is opaque,
Obscured by the multiplicity of connections across their layers,
Where the weightings are derived from massively parallel calculations.

If we apply this deep learning to reveal
what lies behind voice samples,
Taking different tremors as proxies for the contents of consciousness,
The algorithm will be tongue-tied
If asked to explain its diagnosis.

And we should ask who these methods will
be most applied to,
Since to apply machinic methods we need data.
Data visibility is not evenly distributed across society;
Institutions will have much more data about you if you are part of the welfare
system
Than if you come from a comfortable middle-class family.

What's apparent from the field of child
protection,
Where algorithms are also seen as promising objectivity and pervasive
preemption,
Is that the weight of harms from unsubstantiated
interventions
Will fall disproportionately on the already disadvantaged,
With the net effect of 'automating
inequality'.

If only we could rely on institutions to
take a restrained and person-centred approach.
But certainly, where the potential for financial economies is involved,
The history of voice analysis is not promising.
Local authorities in the UK were still applying Voice Stress Analysis to
detect housing benefit cheats
Years after solid scientific evidence showed that its risk predictions were 'no
better than horoscopes'.

Machine learning is a leap in sophistication
from such crude measures,
But as we've seen it also brings new complexities,
As well as an insatiable dependency on more and more data.

Listening Machines

Getting mental health voice analysis
off the ground faces the basic challenge of data;
Most algorithms only perform well when there's a lot of it to train on.
They need voice data labelled as being from people who are unwell and those who
are not,
So that the algorithm can learn the patterns that distinguish them.

The uncanny success of Facebook's facial
recognition algorithms
Came from having huge numbers of labelled faces at hand,
Faces that we, the users, had kindly labelled for them
As belonging to us, or by tagging our friends,
Without realising we were also training a machine;
"if the product is free, you are the training data".

One approach to voice analysis is the
kind of clever surveillance trick
Used by a paper investigating 'The Language of Social Support in Social Media And
its Effect on Suicidal Ideation Risk',
Where they collected comments from Reddit users in mental health subreddits
like
r/depression, r/mentalhealth, r/bipolarreddit, r/ptsd, r/psychoticreddit,
And tracked how many could be identified as subsequently posting in
A prominent suicide support
community on Reddit called r/SuicideWatch.

Whether or not the training demands of
voice algorithms
Are solved by the large scale collection of passive data,
The strategies of the Silicon Valley startups make it clear
That the application of these apps will have to be pervasive,
To fulfill the hopes for scaling and early identification.

The democratic discourse around voice
analysis seems relatively hushed,
And yet we are increasingly embedded in a listening environment,
With Siri and Alexa and Google Assistant and Microsoft's Cortana
And Hello Barbie and My
Friend Cayla and our smart car,
And apps and games on our smartphones that request microphone access.

Where might our voices be analysed for
signs of stress or depression
In a way that can be glossed as legitimate under the General
Data Protection Regulation;
On our work phone? our home assistant? while driving? when calling a helpline?

When will using an app like
HealthRhythms, which 'wakes up when an audio stream is detected',
Become compulsory for people receiving any form of psychological care?
Let's not forget that in the UK we already have Community
Treatment Orders for mental health.

Surveillance is the inexorable logic of
the data-diagnostic axis,
Merging with the beneficent idea of Public Health Surveillance,
With its agenda of epidemiology and health & safety,
But never quite escaping the long history of sponsorship of speech recognition
research
By the Defense Advanced Research Projects Agency (DARPA).

As the Samaritans example made clear,
We should pause before embedding ideas like 'targeting' in social care;
Targeting people for preemptive intervention is fraught with challenges,
And foregrounds the core questions of consent and 'do no harm'.

Before we imagine that "instead of
waiting for traditional signs of dementia and getting tested by the doctor
The smart speakers in our homes could be monitoring changes in our speech as we
ask for the news, weather and sports scores
And detecting
the disease far earlier than is possible today",
We need to know how to defend against the creation of a therapeutic Stasi.

Epistemic Injustice

It might seem far-fetched to say that
snatches of chat with Alexa
Might be considered as significant as a screening interview with a psychiatrist
or psychologist,
But this is to underestimate the aura of
scientific authority
That comes with contemporary machine learning.

What algorithms offer is not just an
outreach into daily life,
But the allure of neutrality and objectivity,
That by abstracting phenomena into numbers that can be statistically correlated
In ways that enable machines to imitate humans,
Quantitative methods can be applied to areas that were previously the purview
of human judgement.

Machinic voice analysis of our mental
states
Risks becoming an example of epistemic injustice,
Where an authoritative voice comes to count more than our own;
The algorithmic analysis of how someone speaks causing others to "give a
deflated level of credibility to a speaker's word".

Of course we could appeal to the
sensitivity and restraint of those applying the algorithms;
Context is everything when looking at the actual impact of AI,
Knowing whether it is being adopted in situations where existing relations of
power
Might indicate the possibility of overreach or arbitrary application.

The context of mental health certainly
suggests caution,
Given that the very definition of mental health is historically varying;
The asymmetries of power are stark, because treatment can be compulsory and
detention is not uncommon,
And the life consequences of being in treatment or missing out on treatment can
be severe.

Mental health problems can be hugely challenging
for everyone involved,
And in the darkest moments of psychosis or mania
People are probably not going to have much to say about how their care should
be organised,
But, in between episodes, who is better placed to help shape specific ideas for
their care
Than the person who experiences the distress;
They have the situated
knowledge.

The danger with all machine learning
Is the introduction of a drone-like
distancing from messy subjectivities,
With the danger that this will increase thoughtlessness
Through the outsourcing of elements of judgement to automated and automatising
systems.

The voice as analysed by machine learning
Will become a technology
of the self in Foucault's terms,
Producing new subjects of mental health diagnosis and intervention,
Whose voice spectrum is definitive but whose words count for little.

For decades, mental health service users and survivors have been organising;
Putting forward demands for new programmes
and services,
Proposing strategies such as 'harm minimization' and 'coping with voices',
Making the case for consensual, non-medicalised ways to cope with their
experiences,
And forming collective structures such as Patients' Councils.

While these developments have been
supported by some professionals,
And some user participation has been assimilated as the co-production of
services,
The validity of user voice, especially the collective voice, is still
precarious within the mental health system
And is undermined by coercive legislation and reductionist biomedical models.

The introduction of machinic listening,
That dissects voices into quantifiable snippets,
Will tip the balance of this wider apparatus,
Towards further objectification and automaticity,
Especially in this era of
neoliberal austerity.

And yet, ironically, it's only the
individual and collective voices of users
That can rescue machine learning from talking itself into harmful
contradictions;
That can limit its hunger for ever more data in pursuit of its targets,
And save classifications from overshadowing uniquely significant life
experiences.

Designing for justice and fairness not
just for optimised classifications
Means discourse and debate have to invade the spaces of data science;
Each layer of the neural networks must be balanced by a layer of deliberation,
Each datafication by caring human attentiveness.

If we want the voices of the users to be
heard over the hum of the data centres,
They have to be there from the start;
Putting the incommensurability of their experiences
Alongside the generalising abstractions of the algorithms.

And asking how, if at all,
The narrow intelligence of large-scale statistical data-processing machines
Could support more Open
Dialogue, where speaking and listening aim for shared understanding,
More Soteria
type houses based on a social model of care,
The development of progressive user-led community mental health services,
And an end
to the cuts.

Computation and Care

As machine learning expands into real
world situations,
It turns out that interpretability is one of its biggest challenges;
Even DARPA, the military funder of so much research in speech recognition and
AI,
Is panicking that targeting judgements will come without any way to interrogate
the reasoning behind them.

Experiments to figure out how AI image
recognition actually works,
Probed the contents of intermediary layers in the neural networks
By recursively applying the convolutional filters to their own outputs,
Producing the hallucinatory
images of 'Inceptionism'.

We are developing AI listening machines
that can't explain themselves,
That hear things of significance in their own layers,
Which they can't articulate to the world but that they project outwards as
truths;
How would these AI systems fare if diagnosed against DSM-5 criteria?

And if objectivity, as some
post-Relativity philosophers of science have proposed,
Consists of invariance under transformation,
What happens if we transform the perspective of our voice analysis,
Looking outwards at the system rather than inwards at the person in distress.

To ask what our machines might hear in
the voices of the psychiatrists who are busy founding startups,
Or in the voices of politicians justifying cuts in services because they paid
off the banks,
Or in the voice of the nurse who tells someone forcibly detained under the
Mental Health Act, "This
ain't a hotel, love".

It's possible that prediction is not a
magic bullet for mental health,
And can't replace places of care staffed by people with time to listen,
In a society where precarity, insecurity and austerity don't fuel generalised
distress,
Where everyone's voice is not analysed but heard,
In a context which is collective and democratic.

The dramas of the human mind have not
been scientifically explained,
And the nature of consciousness still slips the net of neuroscience,
Still less should we restructure the production of truths about the most
vulnerable
On computational correlations.

The real confusion behind the Confusion
Matrix,
That table of machine learning accuracy that tallies the counts of false
positives and negatives,
Is that the impact of AI in society doesn't pivot on the risk of false
positives
But on the redrawing of boundaries that we experience as universal fact.
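
For readers unfamiliar with the term: a confusion matrix is just this tally of hits and misses, sketched here with invented labels and predictions.

```python
# A minimal confusion matrix: count agreements and disagreements between
# actual labels and a classifier's predictions (all values invented).
def confusion_matrix(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
cm = confusion_matrix(actual, predicted)
false_positive_rate = cm["fp"] / (cm["fp"] + cm["tn"])
print(cm, round(false_positive_rate, 2))
```

The table measures error against the chosen labels; it says nothing about whether the labels themselves, or the boundaries they draw, were just.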

The rush towards listening machines tells
us a lot about AI,
And the risk of believing it can transform intractable problems
By optimising dissonance out of the system.

If human subjectivities are intractably co-constructed
with the tools of their time,
We should ask instead how our new forms of calculative cleverness
Can be stitched into an empathic technics,
That breaks with machine learning as a mode of targeting,
And wreathes computation with ways of caring.

About the author

Dan McQuillan is a Lecturer in
Creative & Social Computing, and has a PhD in Experimental Particle
Physics. Prior to academia he worked as Amnesty International's Director of
E-communications. Recent publications include 'Algorithmic States of
Exception', 'Data Science as Machinic Neoplatonism' and 'Algorithmic Paranoia
and the Convivial Alternative'. d.mcquillan@gold.ac.uk | @danmcquillan

This article is published under a Creative Commons Attribution 4.0 International licence.


openDemocracy is an independent, non-profit global media outlet, covering world affairs, ideas and culture, which seeks to challenge power and encourage democratic debate across the world. We publish high-quality investigative reporting and analysis; we train and mentor journalists and wider civil society; we publish in Russian, Arabic, Spanish and Portuguese and English.