In this article I’m going to be talking about a few current trends in digital technology as we move into 2018. It’s not a predictions piece—I’m a technologist, not a futurist. And there’s so much to talk about that this was at risk of turning into an essay, so I’ve limited it to some of the things that are interesting to me and relevant to my job, rather than the fullest/broadest scope of tech. As I said last year, it’s somewhat informed, purposely skimpy on detail, and very incomplete.

Computers with Eyes

One of the most interesting developments over the past couple of years has been the transition from cameras to eyes; from taking pictures to seeing. This has two parts: the first, computer vision, recognises objects in an image; the second, augmented reality, modifies the image before it reaches your eyes.

Computer Vision

Computer vision means understanding the content of photos: who is in them and what they are doing, where they are, and what else is around. This unlocks visual search—that is, finding other images that are thematically similar to your photos, rather than visually similar (‘is this mostly blue?’ becomes ‘is this mostly sky?’).

Amazon, ASOS, eBay, and Pinterest (among others) use visual search to recommend products similar to the one you photograph (‘this picture is of a denim skirt; here is our range of denim skirts’), which helps mitigate the problem of using text input to describe the product you want. Microsoft’s Seeing AI is changing the lives of people with visual impairments by using computer vision to describe their immediate environment (‘three people at a table near a window’).
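
Under the hood, ‘thematically similar’ search is typically built on embeddings: a vision model maps each image to a vector, and similar content means nearby vectors. Here is a minimal sketch of that ranking step, with invented four-dimensional embeddings standing in for a real model’s output (which would have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def visual_search(query, catalogue, top_k=3):
    """Rank catalogue items by how close their embeddings sit to the query photo's."""
    scored = sorted(((name, cosine_similarity(query, emb))
                     for name, emb in catalogue.items()),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings; the product names and values are made up for illustration.
catalogue = {
    "denim skirt A":  [0.9, 0.1, 0.0, 0.2],
    "denim skirt B":  [0.8, 0.2, 0.1, 0.1],
    "leather jacket": [0.1, 0.9, 0.3, 0.0],
    "wool jumper":    [0.0, 0.2, 0.9, 0.4],
}
query = [0.85, 0.15, 0.05, 0.15]  # embedding of the user's photo
```

Because the vectors encode content rather than raw pixels, a photo of a denim skirt lands near the other denim skirts regardless of lighting or background.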

The next step for visual search is to move from classifying objects in an image to providing contextual information about them. Snapchat offers relevant filters based on the content of photos, Pinterest will start offering looks (‘this is a denim skirt; here are products which combine well with this…’). The first mass-market general-purpose visual search is Google Lens which, while fairly limited now—it can recognise landmarks, books/media, and URLs/phone numbers—will get smarter through the year, with recognition of apparel and home goods already teased as coming soon.

People will begin to expect their cameras to be smarter, capable of not just capturing a scene but understanding it. And it’s likely that expectation will be for a single clear answer rather than pages of search results; this will lead to the diminishment of organic search, but becomes monetisable (brands can pay to have their products placed in the result). Google’s years of search experience and expansive knowledge graph give them a huge software lead over Apple, but I wouldn’t be surprised to see a ‘Siri Lens’ sometime—Bing also has a pretty good knowledge graph they can use.

Augmented Reality

Augmented reality, in its current form—placing digital objects into a camera image of a physical environment—has been around for a few years without much impact on public consciousness, but has recently moved into mainstream awareness. Snapchat broke the ground with their face-changing Lenses, then used horizontal plane detection to drop animated 3D digital models into the real world (the dancing hotdog); both were subsequently copied and taken to greater scale by Facebook’s Camera Effects platform.

It’s now being pushed further by deeper integration into the phone OS (Apple’s ARKit and Google’s ARCore both take care of the complex calculations required for AR, reducing the burden on apps), and better hardware—Apple have a major lead here with the new camera setup in the iPhone X, which will doubtless come to all their models in 2018. Google need to rely on their hardware partners to provide the cameras and chips for AR, so will polyfill it with software until that happens (I strongly suspect the Pixel 3 will be heavily optimised through chips and sensors).

IKEA Place and Amazon, amongst others, are using current-stage AR technology to let you see what their products would look like in your home before you buy them. But finding use cases beyond product previews, toys (animoji, AR Stickers), and games (Pokémon Go, the forthcoming Harry Potter) will, I imagine, occupy much of the first part of the year, and possibly beyond; there is much discovery still to be done. It may require an ‘AR Cloud’—a permission/coordinate space that allows digital enhancements to be shareable and persistent, so multiple people can see the same thing, in the same place, in the same state—before it becomes really useful.

The next stage for AR is to provide a map of your immediate environment through infrared scanning—Microsoft’s HoloLens does this, and the required scanners are now in the iPhone X (Apple bought PrimeSense, whose technology powered the Kinect), although not yet enabled. This allows digital objects not just to appear overlaid in two dimensions, but to move around in a space with awareness of the objects in it—this is commonly called mixed reality. It unlocks new categories, such as indoor wayfinding; Google teased this at I/O 2017 with ‘visual positioning service’ (VPS), the indoor equivalent of GPS, but this was a feature of the Tango project, which has since been wound down, and without the required hardware in Android phones Apple could leapfrog them here.

Computers with Ears

Voice recognition has improved massively in recent years, and there’s a growing acceptance among the public of interacting through voice. Voice assistants have moved from phones to smart speakers (Echo, Home, HomePod), to cars (CarPlay, Android Auto), to wrists (Apple Watch, Android Wear), to ears (AirPods, Pixel Buds). Of the major digital assistants, Google’s Assistant is much more useful than the others.

In voice-first (or -only) devices, Amazon’s Echo family has the lead in hardware sales over Google’s Home range, although Assistant has greater reach thanks to its presence on Android phones. Apple’s HomePod will launch soon, but is coming in at a high price in a market being disputed at the low end (Echo Dot and Home Mini are the big sellers) and may come too late. Both Amazon and Google (and competitors such as Microsoft’s Cortana and Samsung’s Bixby) are now competing to get their assistants embedded in devices made by third-party manufacturers. All voice-first devices, however, have two major problems which they’ll need to address this year.

The first problem is discovery: with no visible interface, how do people know what they can do? Alexa currently has around 25,000 skills on its platform, and although Google are prioritising quality over quantity (by working more closely with brands), getting found is still an issue. For now brands will still have to run off-platform advertising/awareness campaigns, although that’s likely to change (I’ll come back to that later).

The second is being proactive; right now, both Alexa and Assistant skills are explicitly invoked, so the user has to ask if anything has changed (‘is there an update on my delivery?’). Both Amazon and Google are in the process of enabling notifications on their devices, but these will need careful consideration to avoid notification overload; it’s already considered a problem on phones, and could be worse on a voice UI if you have to sit and listen to a stack of spoken notifications.

Audio recognition is capable of understanding more than the human voice. Always-on song recognition (running on-device, not sending data to servers) is a major feature of the Pixel 2, and Apple recently acquired Shazam (Siri already has a Shazam service built in). The next stage of audio recognition will be to understand other environmental sounds (TV is an area that’s being actively explored) and provide context about what is being listened to.
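
Real song recognisers hash constellations of spectral peaks; the core idea can be sketched in deliberately toy form (nothing like any shipping implementation) as a per-frame dominant-frequency fingerprint matched against a small on-device database. The sound names and frequencies below are invented:

```python
import math

SAMPLE_RATE = 8000   # Hz
FRAME = 256          # samples per analysis frame (32 ms)

def tone(freq_hz, seconds=0.2):
    """Generate a pure sine wave, standing in for recorded audio."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(int(SAMPLE_RATE * seconds))]

def dominant_frequency(frame):
    """Naive DFT: return the frequency (Hz) of the strongest bin, skipping DC."""
    best_bin, best_power = 1, -1.0
    for k in range(1, FRAME // 2):
        re = sum(s * math.cos(2 * math.pi * k * n / FRAME) for n, s in enumerate(frame))
        im = sum(s * math.sin(2 * math.pi * k * n / FRAME) for n, s in enumerate(frame))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return round(best_bin * SAMPLE_RATE / FRAME)

def fingerprint(signal):
    """A toy fingerprint: the dominant frequency of each successive frame."""
    return tuple(dominant_frequency(signal[i:i + FRAME])
                 for i in range(0, len(signal) - FRAME + 1, FRAME))

# A tiny on-device 'database' of known sounds.
database = {"tuning fork A4": fingerprint(tone(440)),
            "dial tone": fingerprint(tone(350))}

def identify(signal):
    """Match by counting frames whose dominant frequency agrees."""
    fp = fingerprint(signal)
    return max(database, key=lambda name:
               sum(a == b for a, b in zip(fp, database[name])))
```

Because everything here runs locally, no audio ever leaves the device—the privacy property the Pixel 2 feature is built around.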

Computers with Brains

With more devices becoming capable of extracting information about the world around us, we require better tools to provide context and make decisions about what’s useful. This becomes a virtuous circle: tools make more data, and more data makes the tools more useful.

Recommendations based on visual search become more useful by knowing your taste through your photo history; not just what you wear, but your tastes in furniture, home goods… At the moment the visual search of ASOS and Pinterest gives recommendations based on recognising a single product, but given, say, your Instagram history, it could refine those recommendations with inferences from your broader tastes (‘people who like art deco furniture tend to wear…’).

Algorithmic recommendation could help solve one of the problems facing any future mixed reality interface: as you have a potentially unlimited number of things to look at (it’s the whole world around you), how does your interface decide what is the most appropriate contextual information to provide, and who provides it for you? An app-like experience (‘open TopTable and tell me about this restaurant’) limits discovery, so it may be better to take a search engine approach, where the system tries to infer the best content to offer based on a number of ranking factors.

‘Mixed reality is a display problem, a sensor problem and a decision problem. Show an image that looks real, work out what’s in the world and where to put that image, and work out what image you should show.’ — Ben Evans

As I mentioned earlier, voice-first/-only devices suffer from a lack of discoverability. Alexa and Google Assistant are trying to solve this using intent: if a user asks for something that the assistant doesn’t cover, it will recommend a third-party app. Google calls these implicit invocations; a voice action from, say, Nike can suggest itself as appropriate if a user asks for running advice rather than explicitly invoking Nike by name (this works like organic search, but there’s future scope for it to be monetised like paid search using an AdWords-like system).

The Natural User Interface

With computers being more aware of what’s around them through their ‘eyes and ears’, the next step will be to bring them together: using computer vision, audio recognition and mixed reality to create meaningful, contextual connections between the physical and digital—a virtual map of the immediate environment, with an awareness and understanding of the things in it, and contextual information provoking relevant interaction with digital objects.

Placing 3D objects into a scene is one part of this, but images can also be enhanced in different ways, enriching and enlivening the world around us. We can ask the question: what would augment reality? Answers range from providing explanations and instructions for physical objects, to translating foreign languages in situ, to showing user reviews or price comparisons. With motion magnification, almost imperceptible movements (like a pulse, or a baby’s breathing) can be amplified to become visible. Really, we’re just at the start of what’s possible.

Different services, powered by machine learning—computer vision, contextual recommendations, mixed reality, and voice recognition—could eventually come together to create the post-mobile interface: understanding the physical environment, enhancing it with a contextual digital layer, and distributing it to devices beyond the phone. Whether anyone will actually achieve that in 2018 is up for debate (but unlikely).

Closing Social

There were signs this year that open social might have peaked. Sharing on Facebook has been declining for a couple of years, offset somewhat by increased sharing on Messenger and WhatsApp. It’s too soon to say it’s definitely peaked—or why—but certainly in the broader media narrative open social (and Facebook in particular) was blamed as the flashpoint for conflicts between the values of different groups and generations. Facebook can’t have failed to notice the decrease, and recent bouts of soul-searching have led to them deprioritising articles in the News Feed (with a corresponding drop in engagement for publishers) and promoting sharing and personal updates—even to the extent of trialling a separated news feed, with all articles in a separate (hidden) view—splitting the social from the media.

None of the big open social apps do truly sequential timelines any more; Twitter and Instagram have followed Facebook in showing algorithmically sorted timelines so you don’t miss the good stuff (or what they understand to be what you think is the good stuff). More sharing on Instagram is going into direct messages—another experiment is under way to move DMs into their own app, which would become Facebook’s fifth messaging app (after Messenger, Messenger Kids, WhatsApp, and the recently purchased teen-focused app tbh). Instagram’s Stories have been one of its successes, quickly surpassing the usage of Snapchat (from which they stole the format), although Snapchat is increasingly popular with teens—perhaps another reason for the tbh purchase.

The Messenger (bot) platform seems to be settling around customer service, with brands (rather than services) coming to realise that it’s not a great fit for campaigns, but not always able to see another way in. The early promise of conversational interaction in messaging has hit the reality that natural language requires a great investment in training, scripting, and testing, so bots have tended to fall back on button/prompt UIs, which are often a worse experience than a rich Web or native app interface. With many brands unwilling to invest without a clear return, there is a vicious circle (low investment, diminished experience, low user uptake, and repeat) indicating that messaging is likely to take a while longer to fulfil its promise.

Other Notes

Those are the major trends I’m interested in for (early) 2018, but there’s plenty more to be aware of.

Smarter and Cheaper Devices

Machine learning is increasingly being run on-device (mostly phones) rather than on cloud servers. On-device ML is good for getting fast results, lowering network data usage, and improving privacy. Google’s TensorFlow Lite seems set to become the early standard for on-device learning, using pre-trained models accelerated by device APIs (Android 8.1’s Neural Networks API, iOS 11’s Core ML). Many of Apple’s iOS machine learning models, such as face recognition, already run on-device, and Google’s recent photography ‘appsperiments’ (ugh) also show that’s a direction they’re embracing.

On-device learning combined with cheap, miniaturised hardware (a product of the smartphone boom) opens up a new category of smart, single-purpose devices. Google Clips is one example: a camera with a pre-trained computer vision model that detects when an ‘interesting’ moment happens, captures it in a short video clip and sends it to your phone—no operator required.

This could extend to other phone/smart device functions, such as voice-controlled speakers that don’t require the full power of Alexa or Assistant, instead using pre-trained models to control music playback. Research repeatedly shows that some of the most-used functions on smart speakers are setting alarms and timers, and unit conversion (for cookery), so it’s not a stretch to imagine a cheap kitchen timer with the limited smarts to carry out those core functions.
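
The ‘limited smarts’ such a device needs are genuinely small—once speech is transcribed, the two core jobs reduce to pattern matching. A rule-based sketch (the phrasings and conversion figures are illustrative; cup-to-gram values are approximate):

```python
import re

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
GRAMS_PER_CUP = {"flour": 120, "sugar": 200, "butter": 227}  # approximate

def handle(utterance):
    """Rule-based 'smarts' for the two most-used jobs: timers and conversions."""
    text = utterance.lower()
    m = re.search(r"timer for (\d+) (second|minute|hour)", text)
    if m:
        # Normalise every timer request to seconds.
        return ("timer", int(m.group(1)) * UNIT_SECONDS[m.group(2)])
    m = re.search(r"(\d+) cups? of (\w+) in grams", text)
    if m and m.group(2) in GRAMS_PER_CUP:
        return ("convert", int(m.group(1)) * GRAMS_PER_CUP[m.group(2)])
    return ("unknown", None)
```

Anything outside these patterns returns ‘unknown’, which is exactly the trade-off of a cheap single-purpose device: it does two things well and nothing else.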

The Decline of the Ad-funding Model

The steady growth of ad-blockers indicates that users are tired of ads and—in particular—invasive tracking, leading to more device-native ad-blocking; Apple’s Safari browser recently started blocking a number of third-party tracking scripts (the impact of that is already being felt), and from early 2018 Google’s Chrome will start to blacklist sites that persistently violate the Better Ads Standards. The EU’s General Data Protection Regulation (GDPR) will come into force in May 2018, which should make it harder for companies to (legally) track users and share their data with other services. All of this may have a knock-on effect on advertising revenue (especially for those operating in murky areas, who deserve punishment).

It seems strange to talk about a decline when digital ad spend continues to grow (and, in 2017, overtook TV spend for the first time), but the problem is that Google and Facebook already take around two-thirds of advertising spend, and Amazon (including Alexa) is on course to join them (as it becomes the de facto pre-purchase search engine). This leaves digital media publishers with less revenue, and 2017 saw businesses relying on the ad-funding model—such as BuzzFeed and Vice—facing job cuts and restructuring.

Many publishers have opted for paywalls/paygates, but these limit reach and have a natural cap—how many people can afford one or more subscriptions? A few publishers are trying reader donation services to make up for the drop in ad revenue—the Guardian and New York Times have had some success with this model. With better payment methods arriving in browsers (Apple Pay, the Payment Request API), it’s possible that some of the lost ad revenue could be offset by micropayments.

Finance

The UK’s Open Banking API standard rolls out in early 2018, with the EU’s Second Payment Services Directive (PSD2) following shortly after. The two are set to have a huge impact on banking and personal finance in Europe, bringing a wave of new banks and savings applications and shaking up the existing institutions.

As for cryptocurrencies… I have a hard time with these. The leading cryptocurrency, Bitcoin, has basically failed to meet every one of its promises, and only really works as an investment vehicle. The underlying blockchain technology promises more benefit, but most of the use cases seem to be B2B—I haven’t yet seen any convincing consumer applications. One area I am intrigued by is using blockchains to create digital scarcity, as CryptoKitties does; playful use cases can often lead to more interesting outcomes, and adding value to digital art sounds useful. For everything else… I’ll wait and see.

VR

Although there is growing opportunity in VR gaming, I still can’t see it breaking into the mainstream. Phone-based VR has serious technical limitations to overcome, and tethered headsets are too expensive and cumbersome (and don’t seem to have sold well, although recent price cuts have helped a little). The next generation of standalone headsets (Oculus Go, Vive Focus, Daydream) could open the market a little more, but VR still has to overcome its biggest problems: isolation, and the exceptional behaviour it requires (it’s not as easy as watching TV or using a phone). These may be mitigated by future technology, but I can’t see any immediate signs of that happening.

Machine Learning

There’s little point in talking about machine learning as a separate technology; it’s the fuel powering much of everything interesting that’s happening. One area of particular interest for 2018 will be authenticity: ‘fake’ images and audio generated with machine learning algorithms are getting increasingly convincing, and it seems alarmingly easy to use adversarial examples to ‘trick’ computer vision models into seeing something other than we do.

The Web

There’s little point in talking about the Web as a separate technology; it’s the data layer connecting much of everything interesting that’s happening. While the major operating systems and platforms refuse to cooperate, the Web still provides the broadest reach, especially in developing markets where lower-powered devices are common and closed app stores aren’t always accessible. It’s interesting to see previously closed platforms like Instagram and Snapchat become more willing to go to the Web as they move to scale.

Thanks for reading. If you’re interested in stories about technology’s role in culture, society, science, and philosophy, you might want to subscribe to my newsletter, The Thoughtful Net.