Voice tech dystopia? Today's design decisions will shape the future

A new season of Black Mirror, the Netflix anthology series which brings tech into The Twilight Zone, was teased last week with a short trailer. The cast and stories are new, but the premise remains the same: what happens when a brilliant innovation falls out of the user’s control?

Episodes from previous seasons are frequently cited as timely warnings not to become complacent when developing cutting-edge tech. Within the marketing industry, those concerns are most often applied to tracking software and hyper-targeted advertising, but they could apply to the burgeoning field of voice tech too.

There’s no doubting the potential of voice interfaces and assistants to become valuable mainstream utilities. But fulfilling that promise requires us – designers, developers, creatives and brands commissioning the services of these specialists – to produce more thoughtful solutions than the ones which came before. If we don’t continue to make those improvements, the nth iteration of Siri or Alexa could end up doing more harm than good to users whose needs aren’t properly considered.

The earliest warning signs look like snags, or mere teething troubles. Samsung’s Bixby assistant, the long-awaited rival to Siri, Alexa et al, struggled to understand English at first, delaying its launch significantly. Before that, Amazon Echo and Google Home devices fell victim to a series of accidental activations, resulting in varying degrees of inconvenience and confusion, including a spate of unintended dolls’ house orders to US homes.

These moments are embarrassing for the companies behind those systems, and they highlight an issue which affects even the biggest design and build teams in the industry: a gap between designers and users. If this gap persists as the user base continues to grow, many more people risk losing out on services that would benefit them.

In the world of Black Mirror, oversights like this are magnified 100-fold, and that’s a useful exercise for design teams too. If a flawed voice recognition system were employed by a bank, how many customers might lose access to their accounts and miss critical payments? If home speakers were all as suggestible as the Alexa and Google examples, hacking a home could be as simple as tricking its AI master through an open window (Siri has already been shown to be a potential victim of this).

Thankfully, those types of blind spots are easily patched or avoided today. However, design teams must continue to stretch themselves to identify and understand segments of the population which they might never have imagined to be their targets.

In the case of the automatic dolls’ house orders, the pursuit of a seamless purchasing experience on Alexa trumped concerns about practicality – in this case, why you might not want it to be too easy to order items on your account.

Balancing accessibility, usability and security can be tough, but even a conservative survey of the potential use cases for smart speakers – among families, shared-occupancy households, even offices – suggests that teams designing for them would be wise to consciously account for the interplay of all three.

In practice, this may mean that interfaces conspicuously break "seamlessness" in the name of security or propriety, and allow multiple channels of interaction to work in concert. When purchasing, for instance, a spoken interaction could pause to use biometric verification instead of asking a user to speak a password aloud. This is unlikely to be the only time that users must break away from voice interfaces – screens can also be usefully employed to display voluminous or complex information in a simple way – but these interface switches must be signposted and explained to avoid confusion.
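The purchasing pattern above can be sketched in code. This is a minimal, hypothetical illustration – the class and function names (`VoiceCheckout`, `biometric_check`) are assumptions for the sake of the sketch, not any vendor’s real API – showing how a voice flow might signpost a deliberate channel switch before confirming an order:

```python
# Hypothetical sketch: a voice checkout that deliberately breaks
# "seamlessness" by switching to a non-spoken channel (e.g. a
# biometric prompt on a paired phone) before confirming a purchase.
# All names here are illustrative assumptions, not a real vendor API.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceCheckout:
    # Injected verifier, e.g. a fingerprint or face check on a paired phone.
    biometric_check: Callable[[], bool]
    transcript: List[str] = field(default_factory=list)

    def say(self, line: str) -> None:
        # Stand-in for text-to-speech output.
        self.transcript.append(line)

    def purchase(self, item: str) -> bool:
        self.say(f"Adding {item} to your order.")
        # Signpost the channel switch, so the user isn't confused
        # when the assistant falls silent and their phone buzzes.
        self.say("To confirm, please verify on your phone. I'll wait.")
        if not self.biometric_check():
            self.say("Verification failed, so I've cancelled the order.")
            return False
        self.say(f"Verified. Your {item} is on its way.")
        return True


# An accidental activation (a child, or a voice through an open window)
# fails the verification step, so no order is placed.
flow = VoiceCheckout(biometric_check=lambda: False)
assert flow.purchase("dolls' house") is False
```

The design choice worth noting is that the verification step is a separate, explicitly announced channel rather than a spoken password, which would be audible to anyone in the room.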

Refining voice recognition is another ongoing concern for designers and developers of new interfaces. One resource to help this cause is Mozilla’s "Common Voice" project, which aims to collect thousands of hours’ worth of spoken language data in order to improve the accuracy of voice recognition across different accents and voices. Mozilla plans to release an open-source database of their work later in 2017.

But these charitable efforts are only useful if design teams maintain an open mind to the changing faces (and voices) of their target users – most of which will be nothing like their own – and seek solutions for their needs.

Research and testing must be integral to launching and updating voice apps, but the very make-up of development teams could stymie efforts to reach new audiences. As the saying goes, you don’t know what you don’t know, but you can protect against knowledge gaps by forming a team with a range of backgrounds and interests. Doing so will only improve the versatility of your final product.

As diverse mainstream audiences continue to adopt cutting-edge technology, devices and apps which don’t thoughtfully consider their needs – by ignoring accents, flagging false faces or failing to understand user contexts – will be shown up publicly. If social media mockery won’t highlight a shortcoming in your service, a decline in sales and user growth will.

As adoption of these services continues to grow, serving less obvious users or unpredicted situations will only become more important. Success on this front will keep new tech democratic, and ensure that Black Mirror’s vision of the future remains science fiction.