The “Magic Pipe” Fallacy: Privacy Protection in the Smart Home

Intelligent Digital Assistants (IDAs), or voice-activated smart devices such as Amazon’s Echo and Google Home, have become an essential part of today’s smart life. We use them in our homes (e.g. online searches, checking the weather, asking for directions) as well as in our offices (e.g. recording meetings) to make everyday tasks more convenient.

It’s a Matter of Convenience

Indeed, voice technology is sweeping our world and transforming our lives. IDAs and voice-activated televisions (smart TVs) will soon be commonplace in daily life. Recent forecasts suggest that 50% of all internet searches will be voice searches by 2020 [14] and that there may be more digital assistants than humans by 2021 [15]. Research by J. Walter Thompson and Mindshare [13] shows that efficiency is the main reason for using voice: users’ brain activity is lower when using voice than when using touch or typing, indicating that voice is more intuitive than other means of interaction. Common tasks for regular voice users (i.e. those who use voice services at least once a week) are “online searches, finding information about a specific product, asking for directions, asking questions, finding information about a specific brand or company, playing music, checking travel information, setting alarms, checking news headlines and home management tasks” [13].

The Right to Privacy.


Privacy issues in technology were raised as far back as 1890 by two legal scholars, Samuel Warren and Louis Brandeis, in possibly the most influential privacy article, “The Right To Privacy”, which examined whether the laws of the time protected the individual’s privacy [8]. They wrote the article mainly in response to the rise of the “snapshot” camera and its use in photographing people secretly or without their consent: “Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life.” “The Right To Privacy” is considered the foundation of American privacy law [5], and since its publication privacy laws have been passed in a number of US states to protect individuals. Today, more than a century later, camera-equipped drones allow anyone to spy from above, and new privacy laws are being passed in the US to limit and govern their use [11].

Today’s technology affects the privacy of individuals on a daily basis through smartphones and social media: photos captured on smartphones are shared on social media websites, making them susceptible to breach by hackers. Beyond these concerns about photos shared in the cloud, the rising use of cloud-based voice recognition systems such as IDAs and smart TVs has added another layer of privacy issues, sneaking up on people right inside their homes.

For many, there is a belief that a “magic pipe” exists between their Alexa-type device and the ultimate provider of information, much like typing text into a browser and getting a webpage directly from, say, a weather website.

The main privacy problem with voice is that voice data is processed in the cloud, which enables the cloud provider to record and store it. This makes the data vulnerable to breaches by external hackers as well as misuse by the cloud server itself. In fact, the cloud provider acts as the conduit for all information to and from the consumer, which could include sensitive financial and health information. The SSL “padlock” that we see on many websites, protecting data in transit, has no end-to-end equivalent in the voice-activated world: the provider itself sees the content.

What Risks can Voice Really Present?


Voice adds an extra layer of potential privacy intrusion over Plain Old Text communications. Recent progress in voice forensics, driven by advances in AI speech processing by researchers at institutions such as Carnegie Mellon University, makes it possible to profile speakers from their voice data: researchers can estimate a speaker’s bio-relevant parameters (e.g. height, weight, age, physical and mental health) as well as environmental parameters (e.g. the speaker’s location and surrounding objects). These findings have recently been applied to help the US Coast Guard identify hoax callers [6]. This shows how much information can be leaked about speakers when their recordings are breached by hackers, or even when they are used for data mining by cloud voice providers.

So, online speech recognition raises privacy issues not only because the cloud server learns the speaker’s transcribed text, but also because voice data reveals the speaker’s emotions (e.g. joy, sorrow, anger, surprise) and biological features. Voice data contains biometric data that can be used to identify the speaker. In fact, applications for speaker verification (used for authentication) and speaker identification (used to pick out a speaker from a set of individuals) are currently being deployed or are already in use in banking and other sectors.
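To make the distinction concrete, here is a minimal sketch of the two tasks mentioned above, using toy hand-written “voiceprint” vectors and a hypothetical similarity threshold; in a real system the embeddings would come from a trained speech model, but the verification/identification logic follows the same pattern.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(claimed_embedding, sample_embedding, threshold=0.8):
    """Speaker verification: does the sample match the claimed identity?"""
    return cosine(claimed_embedding, sample_embedding) >= threshold

def identify(sample_embedding, enrolled):
    """Speaker identification: which enrolled speaker is the closest match?"""
    return max(enrolled, key=lambda name: cosine(enrolled[name], sample_embedding))

# Toy "voiceprint" embeddings -- in practice these come from a neural model.
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.1, 0.8, 0.5],
}
sample = [0.85, 0.15, 0.35]   # a new utterance, embedded

print(verify(enrolled["alice"], sample))   # True: sample matches Alice's voiceprint
print(identify(sample, enrolled))          # alice
```

The point is that any party holding raw audio (or embeddings derived from it) can run exactly this kind of matching, which is why breached recordings are a biometric risk and not just a transcript risk.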

It has been reported [4] that recent patents by Amazon and Google covering use cases of their digital assistants, Echo and Home respectively, reveal privacy problems that could affect smart home owners. In particular, “a troubling patent”, as noted in [4], describes the use of security cameras embedded in smart devices (e.g. an IDA, see Fig. 1) to send video shots that identify a user’s “gender, age, fashion-taste, style, mood, known languages, preferred activities, and so forth.” [4].

Fig. 1. Consent vs Amazon’s Echo Look and Google Home Mini

Recently, there has been a rise in privacy concerns among users of Amazon Echo and Google Home, as shown in a recent paper [7] analysing online user reviews. Amazon’s Echo received negative reviews, mostly concerned with privacy, after recordings from an Echo were sought as evidence in an Arkansas murder trial [10]. The paper also shows that Google Home reviews were not affected by news reports warning that the devices listen even when not activated [12]. Of course, these devices need to listen in order to detect their activation keywords (e.g. “Alexa” or “OK Google”), but they should not record anything before they spot those keywords.

General Data Protection Regulation (GDPR) vs Voice Data.


The EU GDPR [16], which came into force on May 25th, 2018, defines biometric data as “personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allows or confirms the unique identification of that natural person, such as facial images or dactyloscopic data”. GDPR thus categorises biometric data as sensitive personal data. Sensitive personal data must be protected, and may only be processed with consent or in certain cases of necessity. Speakers’ voice data clearly relates to their physical, physiological and behavioural characteristics as defined above.

Therefore, voice data, like all other forms of personal data, needs to be protected when outsourced to the cloud, and any subsequent processing should be done with consent. Otherwise, if the data is breached by hackers, it can be exploited with the severe consequences described above.

Achieving Privacy in Voice-activated Applications.


Fortunately, there are solutions that allow us to enjoy the use of IDAs whilst at the same time achieving some measure of privacy. One possible solution is an on-device speech recognition system combined with searchable encryption [3, 2, 1], one of the practical methods for performing secure search over encrypted data. An alternative is to combine on-device speech recognition with on-device intent matching, eliminating the need for any cloud intermediary.

In this case the IDA device could be the user’s smartphone, laptop or desktop computer. The on-device approach avoids the data-in-use protection needed when performing computation in the cloud, and it is well suited to IDAs since they normally process short-duration voice data in real time.

Performing speech recognition offline at the client side rather than in the cloud means that, at a minimum, the transcription hides the biological and environmental voice features noted above, revealing only the transcribed text to the cloud server so that it can respond to the speaker’s queries.

The cloud server can use a search engine or any other convenient method to respond to queries that depend on dynamic data such as news headlines, weather forecasts, travel information and shopping. However, some very private tasks, such as making phone calls, home management and calendar management, can be handled locally on the user’s device without involving a cloud server.
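The split described above — private intents handled entirely on the device, generic queries leaving it only as text — can be sketched as a small intent table. The intent names, patterns and the `local` flag below are hypothetical illustrations, not any vendor’s actual API:

```python
import re

# Hypothetical intent table: pattern -> (intent name, handled locally?)
INTENTS = [
    (re.compile(r"\bcall (?P<contact>\w+)\b"),             ("place_call", True)),
    (re.compile(r"\bset an? alarm for (?P<time>[\w: ]+)"), ("set_alarm", True)),
    (re.compile(r"\bweather in (?P<city>\w+)\b"),          ("get_weather", False)),
]

def match_intent(transcript):
    """Map a locally transcribed utterance to an intent.

    Returns (intent, slots, local). If `local` is True the request never
    leaves the device; otherwise only the transcribed text -- never the
    audio -- would be sent on to a service of the user's choosing.
    """
    text = transcript.lower()
    for pattern, (intent, local) in INTENTS:
        m = pattern.search(text)
        if m:
            return intent, m.groupdict(), local
    return "unknown", {}, True  # default: keep unrecognised requests local

print(match_intent("Call Mom"))                       # ('place_call', {'contact': 'mom'}, True)
print(match_intent("What's the weather in Paris?"))   # ('get_weather', {'city': 'paris'}, False)
```

Even this toy router shows the privacy property being argued for: the “call” and “alarm” requests produce no network traffic at all, and the weather request leaks a short text string rather than a voiceprint.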

Our on-device solution can also perform generic speech recognition, for example to transcribe recorded office meetings or customer service calls. Privacy and security concerns aside, outsourcing data storage to the cloud is attractive for a number of reasons: professional cloud hosting offers robust backup services and essentially unlimited capacity, and it is cheaper and more convenient than maintaining on-premise, in-house databases.

If stored data is always encrypted in the cloud, then many of these concerns disappear, since encrypted data can still be searched with state-of-the-art searchable encryption techniques. These enable users to search their encrypted data stored in the cloud without costly download-decrypt-re-upload protocols.

Third-party queries, such as those required by the court in the Alexa murder case, could be issued privately through multi-client searchable encryption schemes [17, 3], in which the data owner (i.e. the user who recorded the meeting or conference call) writes the encrypted data and grants query access to an authorised third party (e.g. the court) according to a policy agreement between them. The cloud server storing the encrypted audio data cannot read the encrypted queries or the encrypted audio, because it does not hold the data owner’s secret keys. It can only learn whether two encrypted queries are the same, but never ‘sees’ the actual plaintext queries.
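The key property — the server can test whether two queries match without learning what they say — can be illustrated with a deliberately minimal sketch of symmetric searchable encryption, using deterministic HMAC search tokens. This is an assumption-laden toy, not the schemes of [3] or [17] (which add encrypted payloads, access policies and tighter leakage control), but the equality-only leakage is the same idea:

```python
import hashlib
import hmac
import os

def token(key, word):
    """Deterministic search token: the server can test equality, nothing more."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()

class EncryptedIndex:
    """Toy server-side index: it stores opaque tokens, never plaintext keywords."""
    def __init__(self):
        self.postings = {}          # token -> list of document ids

    def add(self, tok, doc_id):
        self.postings.setdefault(tok, []).append(doc_id)

    def search(self, tok):
        return self.postings.get(tok, [])

# Client side: the data owner keeps the key; the server only ever sees tokens.
key = os.urandom(32)
index = EncryptedIndex()
for doc_id, words in [("mtg-001", ["budget", "merger"]),
                      ("mtg-002", ["budget", "hiring"])]:
    for w in words:
        index.add(token(key, w), doc_id)

# An authorised third party handed token(key, "budget") can query,
# but the server never learns the word "budget" itself.
print(index.search(token(key, "budget")))   # ['mtg-001', 'mtg-002']
print(index.search(token(key, "merger")))   # ['mtg-001']
```

Because the token is a keyed pseudorandom function of the keyword, anyone without `key` — including the cloud server — sees only which opaque tokens repeat, which is exactly the “same query or not” leakage described above.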

Path of most resistance


Whilst these cryptographic approaches are exciting, they represent a threat to the current order: Google, Apple and Amazon are all building business models that insert themselves into the transaction loop between consumer and brand.

“Alexa, get me a taxi to the airport” represents a major source of potential revenue to Amazon, who act as the arbiter of your intent. You want a cheap taxi, so you don’t care if it is Uber, Lyft or a local cab company. The lucky company pays a small commission to Amazon for being chosen. If Amazon acts as the payment provider, that represents a second source of income.

What is required is an in-home device powerful enough to provide the cloud’s power of speech recognition and intent matching, allowing consumers to interact directly with the internet, yet cheap enough to stand as a bulwark against the low-cost devices of the major providers. The teardown cost of the second-generation Echo Dot, as reported by ABI Research, is $34.87 [18]. It retails at $49.99 for one device, or $40 for two, and has been seen for as low as $30. Clearly it is being used as a loss leader for other services.

The question is, in a world where privacy is regularly sacrificed by consumers for access to free services and content, who will blink first, the internet giants who depend on our data to fund their businesses, or the consumers who provide it?