Constraints on the Cloud: why we need machine learning at the Edge

ML has heavy processing demands, but the communications overheads mean we ultimately need it to happen at the very edge: on smartphones

August 23, 2019

We are entering a new wave of technological innovation driven by artificial intelligence (AI), with machine learning (ML) at the forefront. Even today, ML is an important part of the device experience, powering all kinds of tasks, features, and applications. From on-device security – face unlock, face pay, and fingerprint recognition – to smartphone camera and audio functions that enable more intelligent and fun experiences through apps such as Socratic, Snapchat, FaceApp, and Shazam, consumers regularly use a wide variety of ML-based features.

However, ML-based tasks that create massive amounts of data are often shifted to the cloud for processing, with the result then sent back to the device. For example, Socratic and Shazam both use ML processing in the cloud, not on the device. This raises the question: wouldn’t it be simpler and quicker for ML processing to happen on the device?

Being able to perform ML-based tasks on the device – at the edge – instead of sending them to the cloud for processing is described by many as the “next stage of the ML evolution.” Significant constraints make the back and forth of ML data between the cloud and the device impractical: power, cost, latency, and privacy. While ML at the edge seemed like a pipe dream a few years ago, recent innovation has made devices capable of processing compute-intensive tasks.

Should we do machine learning in racks like these at Google, stuffed with TPUs? – Google

The cloud constraints

All the constraints with ML processing on the cloud are interlinked.

First, the power and cost of processing the massive amounts of data required by ML tasks in the cloud are enormous, not to mention the sheer volume of network traffic generated by ever-growing bandwidth demands. Back in 2017, it was noted that if everybody used their Android voice assistant for three minutes per day, Google would have to double the number of data centers it owned.

Today in 2019, Google may have already solved this particular challenge – at its recent I/O event, Google described shrinking its 100GB voice recognition model to under 0.5GB so that it runs fully on the device (phone). Nevertheless, the example illustrates the huge infrastructure and economic demands of ML processing in the cloud. ML at the edge can alleviate these burdens by reducing the reliance on costly cloud services and the supporting infrastructure that cloud-based ML would require.
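Shrinkage on that scale typically combines new model architectures with compression techniques such as weight quantization. As a rough, hypothetical illustration – the parameter count below is invented, and this is not Google's actual method – storing each weight as an 8-bit integer instead of a 32-bit float alone cuts model size by about 4x:

```python
# Hypothetical parameter count for a toy speech model -- illustrative only.
total_params = 6_500_000

BYTES_FP32, BYTES_INT8 = 4, 1  # bytes per weight before/after 8-bit quantization
fp32_mb = total_params * BYTES_FP32 / 1e6
int8_mb = total_params * BYTES_INT8 / 1e6

print(f"float32 model: {fp32_mb:.1f} MB")  # 26.0 MB
print(f"int8 model:    {int8_mb:.1f} MB")  # 6.5 MB
```

Real deployments stack several such techniques, which is how much larger overall reductions become possible.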

The economic cost of ML processing in the cloud is perhaps best illustrated through the developer experience. One second of compute on the server costs around $0.000028 per device. This is obviously a very small number, but developers target the largest audience possible, and the price soon escalates. For one million devices, one second of server compute costs $28 – that is $1,680 a minute. For extremely successful apps reaching 100 million users, developers would be paying $168,000 a minute!
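The scaling above is simple multiplication; a quick sketch (using the per-device figure implied by the text, $0.000028 per device-second) shows how the bill grows with audience size:

```python
COST_PER_DEVICE_SECOND = 0.000028  # USD, implied by the figures in the text

def server_cost(devices, seconds=1):
    """Total server compute cost in USD for a fleet of devices."""
    return devices * COST_PER_DEVICE_SECOND * seconds

for devices in (1_000_000, 100_000_000):
    print(f"{devices:>11,} devices: ${server_cost(devices):,.0f}/s, "
          f"${server_cost(devices, 60):,.0f}/min")
```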

On top of this, using the cloud for ML-based tasks is often simply impractical. Transmitting data wirelessly is costly in energy terms even over a few meters, let alone all the way to the cloud; processing the same data on the device needs far less power.

Sending data back and forth between devices and the cloud produces a noticeable lag, which most ML-based applications – particularly time-critical ones – cannot tolerate. Reducing latency opens up new possibilities for ML-based applications. For example, the Snapchat AR experience would not be possible if ML processing were done in the cloud, given the time it would take to send, process, and receive the enhanced video.

The latency cost of ML processing in the cloud is best illustrated step by step, taking one video frame at 60fps to be roughly 16ms. Streaming data one way – from the device to the cloud – over the current 4G network incurs a theoretical latency of 50ms, a lag of around three video frames. Processing the ML data in the cloud adds roughly one more frame at 16ms. Finally, streaming the result back from the cloud to the device costs another three frames at 50ms. The overall round trip therefore lags by around seven video frames (roughly 116ms), far higher than on-device ML, which delivers immediate responsiveness. 5G has the potential to change this, as the rollout promises 1ms latency, but widespread adoption will still take some time.
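The frame arithmetic above can be checked directly (assuming 60fps video, so one frame is about 16.7ms):

```python
FRAME_MS = 1000 / 60  # one video frame at 60 fps, about 16.7 ms

uplink_ms, compute_ms, downlink_ms = 50, 16, 50  # 4G figures from the text
total_ms = uplink_ms + compute_ms + downlink_ms
print(f"4G round trip: {total_ms} ms, about {total_ms / FRAME_MS:.0f} frames")

# 5G's theoretical 1 ms air latency each way shrinks only the radio legs:
total_5g_ms = 1 + compute_ms + 1
print(f"5G round trip: {total_5g_ms} ms, about {total_5g_ms / FRAME_MS:.1f} frames")
```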

Finally, the constant interaction between the cloud and the device creates a scenario that is more vulnerable to privacy threats. Users are understandably more comfortable having their personal data interpreted on their device rather than sent to the cloud. If the data produced for ML tasks has to travel hundreds of miles to the cloud and back again with the decision, there are far more opportunities for it to be compromised. Ultimately, the best way to protect users’ privacy is to ensure sensitive data never leaves the device.

... or on a phone? – Google

Developers want ML at the edge

One group whose requirements are best suited to ML processing at the edge is developers. ML is transforming the way developers write their algorithms. Previously, algorithms were hand-coded for specific use cases, which was time-consuming and, at times, error-prone. With ML, developers no longer need to hand-code every case. Instead, they create a model and train it on a dataset, which yields a far more robust approach than hand-coding.
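The contrast can be sketched with a deliberately tiny, hypothetical example – the task, names, and "training" rule below are all invented for illustration. Instead of the developer guessing a fixed cutoff, the threshold is derived from labeled data:

```python
# Hand-coded approach: the developer guesses a fixed brightness cutoff.
def is_daylight_hand_coded(brightness):
    return brightness > 0.5  # brittle: fails on cameras with different exposure

# "ML" approach: derive the cutoff from labeled examples instead.
def train_threshold(samples):
    """samples: (brightness, is_daylight) pairs.
    Returns the midpoint between the two classes' mean brightness."""
    day = [b for b, label in samples if label]
    night = [b for b, label in samples if not label]
    return (sum(day) / len(day) + sum(night) / len(night)) / 2

data = [(0.90, True), (0.80, True), (0.75, True),
        (0.20, False), (0.30, False), (0.10, False)]
threshold = train_threshold(data)
print(f"learned threshold: {threshold:.3f}")
```

A real model would have many parameters learned by optimization rather than a single averaged threshold, but the principle is the same: the data, not the developer, fixes the decision rule.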

ML has been well known to developers for many years, but it was mainly used on high-powered machines in data centers. Only recently has it reached mobile devices, as they have become far more compute-efficient and capable of running complex ML models. Combining recent device innovation with the requirements of developers makes ML at the edge the preferred option. ML processing on the device gives developers instantaneous results through real-time processing and low latency, while also allowing them to perform ML tasks without a network connection.

Cloud ML processing can be appropriate … sometimes

Despite all these challenges, the cloud model of data processing means that ML algorithms can be continuously updated and upgraded, so any device that interacts with the cloud can also improve how it processes ML tasks. In addition, some tasks require massive amounts of compute but are not especially time-sensitive, so the latency of the cloud matters far less. Examples include processing data from drones that produce detailed images or video feeds for research, industry (often agriculture) and the military, or medical imaging such as radiology and x-ray examinations. So while ML at the edge remains important, it is not a ‘one-size-fits-all’ solution for every ML-based task. All the groups involved in ML processing – from developers to OEMs – need to work out which tasks stand to benefit most from ML at the edge and which can remain cloud-based.
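That division of labor can be summarized as a simple routing rule. This sketch is purely illustrative – the criteria and their ordering are assumptions, not an established heuristic:

```python
def choose_processing_site(latency_sensitive, compute_heavy, network_available):
    """Toy edge-vs-cloud routing heuristic for an ML task (illustrative only)."""
    if latency_sensitive or not network_available:
        return "edge"   # real-time AR, offline use: must run on the device
    if compute_heavy:
        return "cloud"  # drone imagery, radiology: heavy, batch-style compute
    return "edge"       # otherwise, keep the data on the device

print(choose_processing_site(True,  True,  True))   # AR filter -> edge
print(choose_processing_site(False, True,  True))   # medical imaging -> cloud
print(choose_processing_site(False, False, False))  # offline task -> edge
```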

ML at the edge is a future requirement

Mobile devices that utilize a variety of ML-based tasks and applications are already being used by billions of people worldwide. Most will require the real-time, on-device response that ML at the edge provides. Users, and developers, do not want to be dependent on the cloud, given its power, cost, privacy, and speed constraints. It is still early days for intelligent devices, but ML at the edge will be the best bet moving forward.