
Really quick blog post today – a continuation of a shower thought (try not to picture that, btw). Building conversational experiences is, in the words of the uber-talented Alexa expert Andy May (@andyjohnmay), “Easy, Hard”. I really like how profound this phrase is, and how many different meanings it has for CUIs.

I always try to explain to non-believers that voice and chatbots look super-easy to use (you talk, it responds, you talk some more, something gets done – easy peasy) but, like all good product design, that simplicity belies a metric shit-ton of work that’s gone on beforehand to hone, refine and hand-whittle that beast into something so singular.

But, like all the best experts’ wisdom, I keep seeing Andy’s words applying to so many more situations. Like, for example, CUI development. I’ve built several Alexa skills, Google Assistant Actions and Facebook chatbots myself now – I’m not a dev, but as the Director of a conversational app studio, I want to make sure I have a firm understanding of every aspect of the business, including having a working understanding of the guts of what we’re selling.

My relative naivety has led me to realise that Andy’s “Easy, Hard.” idea applies to building conversational apps too.

It seems fairly straightforward to knock up a quick skill or chatbot. Dialogflow is lovely – get your head around the concepts of intents, slot values, sample utterances and contexts and you’ve pretty much nailed it. Press the button and ship to Assistant. Chatfuel is beautiful – one of the best WYSIWYG interfaces I’ve ever used, super simple to create a really engaging Facebook chatbot and, again, click the button to deploy to your Facebook Page.
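To make the “Easy” half concrete, here’s a toy sketch of those concepts in plain Node. None of this is the real Dialogflow API – the object shape and the naive matcher are purely illustrative of how intents, sample utterances and slot values hang together:

```javascript
// Toy illustration of the core Dialogflow concepts: an intent groups
// sample utterances (training phrases) and slot values (parameters).
// The shape below is illustrative, not Dialogflow's actual API format.
const getWeatherIntent = {
  displayName: 'get.weather',
  trainingPhrases: [
    "what's the weather like",
    'will it rain in {city} tomorrow',
  ],
  parameters: [{ name: 'city', entityType: '@sys.geo-city' }],
  responses: ["Here's the forecast for {city}..."],
};

// A naive matcher, just to show the idea. Real NLU platforms use
// machine-learned classification, not substring checks like this.
function matchIntent(utterance, intents) {
  return (
    intents.find((intent) =>
      intent.trainingPhrases.some((phrase) =>
        utterance.toLowerCase().includes(phrase.split(' {')[0]),
      ),
    ) || null
  );
}

console.log(matchIntent("what's the weather like today", [getWeatherIntent]));
```

Get your head around those building blocks and the visual tooling does the rest for you.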

But, and here’s where the ‘Hard’ bit comes in, as soon as you want to do anything dynamic (responding to the weather with different logic, allowing a user to log in and access their account history, storing data for next time, etc…), suddenly these tools become too limited.

You can’t specify application logic (beyond contexts) in Dialogflow, which means that you can’t dynamically vary the UX or adapt the copy spoken based on some state or calculation. Those wonderful WYSIWYG tools for designing the Google Assistant widgets have to be entirely replaced, as you plug your intent into a fulfillment (some cloud-based code). It’s all or nothing: either you use the tooling (and build your widgets visually), or you use code fulfillment (and build your responses manually in code).
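To illustrate that all-or-nothing jump, here’s a minimal sketch of a fulfillment handler in Node. The request/response shape loosely follows Dialogflow’s v2 webhook format, but `fetchForecast` and the intent name are hypothetical stand-ins:

```javascript
// Hypothetical stand-in for a real weather API call. Hard-coded here;
// in production this would be an HTTP request to a weather service.
function fetchForecast(city) {
  return city === 'Bristol' ? { condition: 'rain' } : { condition: 'sun' };
}

// A minimal Dialogflow-style fulfillment handler: once you need dynamic
// logic, the response is built entirely in code, not in the WYSIWYG tools.
function handleWebhook(requestBody) {
  const intent = requestBody.queryResult.intent.displayName;
  const params = requestBody.queryResult.parameters;

  if (intent === 'get.weather') {
    const forecast = fetchForecast(params.city);
    // The copy varies on state, which is exactly what the visual
    // tooling can't express.
    const text =
      forecast.condition === 'rain'
        ? `Pack a brolly, it's raining in ${params.city}.`
        : `Good news, it's sunny in ${params.city}.`;
    return { fulfillmentText: text };
  }
  return { fulfillmentText: "Sorry, I didn't catch that." };
}
```

Even a sketch this small shows the shift: you’re no longer configuring, you’re programming.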

There seems to be this hard drop-off – suddenly, adding what seem like relatively basic features means you need to almost wholesale up sticks and start whittling everything by hand. Suddenly you’re in the world of Node, Serverless, DynamoDB and Lambda functions. Proper engineering!
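As a flavour of what that “proper engineering” looks like, here’s a sketch of the “storing data for next time” case in Node. The store is injected so the toy stays self-contained; in a real skill it would wrap a DynamoDB table, and every name here is illustrative:

```javascript
// Factory for a handler that remembers users between sessions.
// The store is injected for testability; in production it would wrap
// DynamoDB (get/put against a table keyed on the user id).
function makeVisitHandler(store) {
  return async function handler(userId) {
    const previous = (await store.get(userId)) || { visits: 0 };
    const visits = previous.visits + 1;
    await store.put(userId, { visits });
    return visits === 1
      ? 'Welcome! Nice to meet you.'
      : `Welcome back! That's visit number ${visits}.`;
  };
}

// An in-memory stand-in for the DynamoDB table.
const memoryStore = {
  data: new Map(),
  async get(key) {
    return this.data.get(key);
  },
  async put(key, value) {
    this.data.set(key, value);
  },
};
```

None of this is hard for an engineer, but it’s a world away from clicking a “deploy” button in a WYSIWYG tool.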

When I was designing conversational experiences, I got used to checking code out of GitHub, editing strings in VSCode and checking back in. I’d design approximations of the Google Assistant widgets in Sketch symbol libraries and design flows the old-fashioned way – as static screen flows. I’d then have to hand these design files over to the engineering team. Sure, we managed the intents and sample utterances in Dialogflow, but the business logic and response widgets would then all be created in code. This can’t be the best workflow. There must be a better way.

I’d be really interested to hear from the community on this: have you found a better way of bridging this gap? How do your designers and engineers work together on conversational experiences? Please do let me know!

I’ll be the first one to admit that if you plug it in, switch it on or put batteries in it, then I’m going to be inappropriately fascinated by it. As a kid, I’d take apart cathode-ray tube TVs to get at the guts, in an attempt to sleuth out how this mess of wizardry could produce something so pure and appealing. I once plugged myself into the mains. I can’t remember why. But this enduring sense of wonder technology gives me has brought with it an equal sense of foreboding and concern in recent years. I could never quite put my finger on why, until now.

I’m a firm believer in the idea that – on balance – our mastery of technology is a massively powerful force for human good. Medical science’s impact on curing diseases, 4k video streams from humans in space inspiring whole generations, revolutionary ways to shift to more sustainable energy usage… these are all great, but tech in the wrong hands is undeniably a powerful force capable of mass control and harm (Facebook & American elections, Brexit, nuclear war, cyber-bullying). We can’t solve those problems overnight, and technology itself will play a massive part in addressing these issues. But at a more human level, our technology can have an effect on us that is clawing and nearly imperceptible, creeping up on us so slowly that we don’t realise it’s a problem. Like boiling a frog. We are the frogs, and we’re being boiled in a soup of distraction.

Erm, ok. That analogy didn’t quite land as I’d hoped… but my point is this: we’re active participants in a war being waged on our focus.

In any typical day, I’m going to be bombarded with adverts selling me stuff I didn’t realise I wanted, music I probably don’t like, attention-grabbing click-bait headlines, The Siren’s Call of YouTube, phone calls selling me boiler replacements on monthly payment plans, my kids, Twitter performing A/B tests on me via notifications to find my news tolerance level like I’m some kind of Pavlovian Dog… the list goes on. And those are the more overt distractions.

The Times, They are A-Changin’

I believe we’re living through a time of massive paradigm shift. Two new interaction mechanisms will fundamentally change the way we interact with technology as users, and adjust the way we design and scope products for future consumers as digital professionals. These waves are VR and Conversation – being totally engulfed in a new virtual world, where every pixel is crafted by a super-sensitive AI attuned to your every whim and desire, and the #voicefirst & chatbot wave that’s bringing new ways for us to interact with technology. These new paradigms bring so much promise. But it’s our duty as digital craftspeople to bring these capabilities into the world formed around the understanding of some of the pitfalls of the past: security, privacy and distraction.

Let’s take an example: Amazon Alexa. Obviously, I’m a big fan of voice interfaces – I’ve built a company specialising in designing and building digital Conversational products. I believe that their requirement to shift what we consider a user interaction is a fundamentally good thing for the future of distraction.

Consider a smartphone app – all that screen real-estate, all those colours, fonts, animations. All wonderfully crafted and hand-whittled by teams of professionals eager to meet a business objective. More distractions are added as the sales teams call for new features, and the impact of the UI is dialled up as the Brand team roll out new digital presences – bolder and louder than your competitor on the homescreen next door. Very quickly, app development becomes an arms race. Who’s got the shoutiest weapons?

And what about the user? They just want to get a thing done. They don’t really care about your app, your brand or your new feature. They just want to get a thing done with the least amount of stress possible. And to make matters worse, we craft these apps in isolation – each one with its own fully-formed style guide and usage idiosyncrasies. Each one carving out a semantic cave in the user’s mind as they remember how it works. Each one having an impact on a user’s cognitive load. Each one multiplying distraction as it forces users in and out of these beautiful rooms we’ve created.

#Voicefirst encourages concentrated user value

But voice apps aren’t like that. The bandwidth you have to communicate with your users is so much narrower. Every syllable you utter, every second of someone’s time you spend telling them words, has to be worth its weight in gold and be put there with meaning and intent. All the regular tools at the designer’s disposal (typography, visual hierarchy, animation, etc…) employed to give meaning just aren’t available when you’re building #voicefirst. All you have is copy: Words. Pauses. Cadence. Personality.

And the great thing about this limitation is it’s actually a revelatory blessing in disguise. The platform itself forces product teams to think more thoroughly about what their app’s real user value is. When all you have to drink from is a dribbling tap, you better hope it’s hipster beer and tastes good.

So, you end up focussing on exactly what your core value proposition is. Why do users come to us in the first place? And why would they use a #voicefirst experience over your existing smartphone app? These are entirely the right questions to be asking (hopefully, directly to your users) and mean that, over time, we produce less wasteful, attention-grabbing experiences and instead deliver much more concentrated user value that demands less of our users.

The other compelling benefit of a conversational UI is that it can only be built on existing platform widgets. This means each experience (while feeling entirely on-brand and useful) has a very familiar look and feel – this reduces cognitive load as users move between our apps, lets them get stuff done quicker and even opens up more opportunities for engagement (if your core journey can now be squeezed in between rushing to pick up the kids!)

Not only do voice apps reduce cognitive load, but they’re also (ironically!) the most conspicuous form of the ambient computing movement. Ambient computing puts technology on the periphery, recognises that tech is a means and not the ends itself, and dials up the focus on utility and delivering concentrated user-value.

Alexa sits in the corner of my living room, taking up very little space while offering little aesthetic distraction. I use it when I need it, then it goes away. It’s not the hulking beast of aluminium and glass of this laptop I’m using right now – demanding to be tethered to a wall socket every few hours and having to sit on my lap so I can’t do anything else.

Wearables = Best Ambient Computer?

My Apple Watch has quickly become an indispensable part of my life, and that’s another example of beautifully ambient technology. Granted, you have to be pretty ruthless with the notification settings (otherwise it’s like strapping a needy child to your wrist), but when I found that sweet spot, I also found I was using my phone much less, being much more present in the moment and just getting shit done more effectively. The small screen and limited input mechanism means it has to be a more focussed device. Sending a text message is quicker via my watch than it is via my phone. Checking the weather is as quick as glancing at my wrist. And don’t get me started on reminders.

Oh. My. God. If this Apple Watch did nothing but allow me to set reminders via voice, and then buzz when I needed to be reminded, it would be worth £400 alone. Setting reminders is by far the most used feature of my watch and it’s because it’s a perfectly designed solution to a common under-served use-case. Almost every day I can be found shouting into my watch as I’m hurtling down hills on my cycle to work. For some reason, the exercise seems to prime my thought pumps and I seem to have some of my best ideas behind the handlebars. But I can’t reach for the sketchbook while I’m doing 25mph along Bath Road. I can, however, bark something into my wrist and then get back to concentrating on not becoming pâté under a bus.

The Future of Ambient: Calm?

Recently, I became aware of a new movement that distills the ideal of ambient computing into a philosophy that feels very fit for 21st century living – Calm Computing.

Now this can sound a bit hippy. Like we’re in love with the movement itself, rather than what it represents. That it’s just the next hipster design trend we can spend hours arguing about over Twitter while drinking expensive coffee. But, you know what? I actually feel like Calm Computing is what I’ve been striving for in my design work all along. Technology that augments humans, makes them better. Is the means, not the ends. A concrete example of this is how Microsoft are using calm tech principles to inform their work on inclusive design:

Truly inclusive design doesn’t stop at optimising for disabilities. It understands that our human abilities are always bending and flexing, as we go about our daily lives:

“As people move through different environments, their abilities can also change dramatically. In a loud crowd, they can’t hear well. In a car, they’re visually impaired. New parents spend much of their day doing tasks one-handed. An overwhelming day can cause sensory overload. What’s possible, safe, and appropriate is constantly changing.”

Get on the Calm Train, Or Be Left on the Platform

Calm computing and ambient computing can be symbiotic. If this is the future #voicefirst digital products are moving us towards, then this is absolutely a future I want to be an active participant in. And you probably do too.

So why not drop StudioFlow a quick message to see how they could help you migrate your core user journeys to a more ambient #voicefirst user experience, and let’s improve our users’ lives for the better together 🙂