AI Fits Best in Your Pocket

Artificial Intelligence is mainly run in massive datacenters these days, but some of the best emerging thought on the subject suggests it should be done at the edge of the network, close to where data is collected.

Since the 1800s, people have predicted a revolution in machine intelligence. In that time we have gone from fear of robots taking our jobs to concern about the cloud processing data from our devices using deep-learning algorithms. For the industry, the key technical question is how to overcome the challenges of latency, bandwidth, and compute power.

The efforts of Google and Facebook in deep neural networks (DNNs) are well known. More recently, Dr. Ren Wu of Baidu's Cupertino-based Institute for Deep Learning showed in a presentation how using ARM-based server farms has allowed him to achieve massive improvements in speech-recognition and image-recognition tasks. His approach reduced the error rate in speech and optical character recognition by 25% to 30% and achieved 94% accuracy in face detection.

Baidu has deployed five DNN applications, with more in the pipeline. The scale of its training is truly awe-inspiring, with runs lasting weeks or months at a time. It uses hundreds of millions of images, OCR and click-through-rate data, and tens of billions of speech samples. What's more, these data sets are projected to grow 1,000% per annum over the near term. Interestingly, the DNNs trained in the Baidu cloud are now being deployed in a mobile app that leverages mobile GPUs and OpenCL.

While the main DNN players are focused on server-based solutions with the beginnings of a mobile strategy, Max Versace, director of the Boston University Neuromorphics Lab and CEO of Neurala, takes a different approach. He asserts that today's machines would require a nuclear power plant to deliver processing power equivalent to the human brain.

Indeed, power efficiency is a huge issue. Up to a million times more energy per operation is required to process data in the cloud than to process it locally on the device, according to Mark Horowitz of Stanford University. Looking even farther out, Eugenio Culurciello of Teradeep advocates a hardware approach to deep learning aimed at maximizing the power efficiency of DNNs in mobile devices.
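To see why the gap is so large, a back-of-envelope comparison helps. The figures below are illustrative assumptions (radio energy per bit, on-device energy per operation, per-frame workload), not measured values:

```python
# Rough comparison: energy to radio a raw frame to the cloud versus
# analysing it on-device. All constants are illustrative assumptions.

ENERGY_PER_LOCAL_OP_J = 10e-12    # ~10 pJ per arithmetic op on an embedded processor (assumed)
ENERGY_PER_RADIO_BIT_J = 100e-9   # ~100 nJ per bit over a cellular link (assumed)

frame_bits = 1920 * 1080 * 12     # one 1080p frame at 12 bits per pixel
ops_per_frame = 500e6             # assumed cost of analysing the frame locally

offload_energy_j = frame_bits * ENERGY_PER_RADIO_BIT_J
local_energy_j = ops_per_frame * ENERGY_PER_LOCAL_OP_J

print(f"offload one frame:  {offload_energy_j:.2f} J")
print(f"process it locally: {local_energy_j:.4f} J")
print(f"ratio:              {offload_energy_j / local_energy_j:.0f}x")
```

Even with these rough numbers the radio dominates by orders of magnitude, which is the core of Horowitz's argument.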

Another question for interactive and safety-critical services such as self-driving cars is latency. It is already a challenge for Web search, with Google issuing a call to action for the semiconductor industry at ISSCC in February 2014. On this front, it's interesting to note that the software Neurala is developing for NASA's next-generation Mars rover must cope with a round-trip communication latency of 28 minutes. Latency is also an issue for the human brain, where distributed processing powers our reflexes without involving the frontal cortex.

Cloud-based services only work when we can scale to millions of users simultaneously, but how are we to stream video from mobile devices when bandwidth is currently a major limitation even for services like Netflix? Princeton University’s SignalGuru project shows how local video pre-processing cuts bandwidth requirements 1,000-fold in a cloud-based application.
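A quick sanity check, with assumed figures, shows why shipping metadata instead of video changes the scaling picture entirely:

```python
# Aggregate bandwidth for one million devices streaming video versus
# uploading only detection metadata. Bitrates and event sizes are
# assumptions for illustration only.

devices = 1_000_000
video_bitrate_bps = 4e6           # ~4 Mbit/s compressed 1080p stream (assumed)
metadata_bytes_per_event = 100    # object class, bounding box, timestamp (assumed)
events_per_second = 5             # detections reported per second (assumed)

metadata_bitrate_bps = metadata_bytes_per_event * 8 * events_per_second

print(f"video:    {devices * video_bitrate_bps / 1e12:.1f} Tbit/s aggregate")
print(f"metadata: {devices * metadata_bitrate_bps / 1e9:.1f} Gbit/s aggregate")
print(f"per-device reduction: {video_bitrate_bps / metadata_bitrate_bps:.0f}x")
```

Under these assumptions a million raw streams would need terabits per second of aggregate uplink, while metadata fits in a few gigabits per second, a reduction of roughly the same order as SignalGuru's reported figure.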

As we have seen, a monolithic cloud model doesn’t really stack up from the perspective of power efficiency, latency, and bandwidth scalability. If this is true, we must ask how we can overcome the limitations of today’s AI model so we can enrich our lives and provide enduring value.

There is an opportunity for the industry to distribute computation and build systems that are able to "think locally and act globally" by preprocessing data within personal devices using low-power processors. In this model, data is processed as close to the sensor as possible, much as the brain does. Only metadata, rather than video, is streamed to the cloud, resolving the power-efficiency, latency, and security issues while requiring radically less expensive infrastructure.
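A minimal sketch of what such a "think locally, act globally" loop could look like on a device is shown below; the capture, detector, and upload functions are hypothetical placeholders rather than any specific Movidius or cloud API:

```python
# Sketch of an edge device that analyses frames locally and uploads only metadata.
# `capture_frame`, `run_local_detector`, and `post_json` are hypothetical stand-ins
# for a camera driver, an on-device DNN, and an HTTP client.
import json
import time

def process_frames(capture_frame, run_local_detector, post_json,
                   endpoint="https://example.invalid/events"):
    while True:
        frame = capture_frame()                 # raw pixels stay on the device
        detections = run_local_detector(frame)  # e.g. [{"label": "person", "bbox": [...], "score": 0.93}]
        if detections:                          # only compact metadata goes upstream
            event = {"timestamp": time.time(), "detections": detections}
            post_json(endpoint, json.dumps(event))
        # the frame itself is discarded (or kept in a local ring buffer), never streamed
```

The point of the structure is that the heavy lifting (pixels in, labels out) happens next to the sensor, and the cloud sees only the distilled result.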

— David Moloney is Senior Vice President and Chief Technology Officer at Movidius, a vendor of vision processors.

The article is really interesting since it combines ideas and aspects from the "global" and "local" worlds, i.e., "cloud" vs. "processor". The major part of the energy goes to communication costs rather than computation costs. In other words, it is not necessary to flood the network with tons of data and spend so much energy on data communication. The energy should be spent transferring only useful data (David says "metadata"), which is the result of a computation process.

I'll try to make an analogy from real life. A meeting is effective if the participants exchange well-proven results (i.e., processed locally by human beings), not just ideas and data (i.e., data and apps in the "cloud" of the meeting). Thus, a successful meeting requires less energy (i.e., fewer working/meeting hours).

The real challenge is to find the "golden ratio" between "local" and "global" activities, which is time-dependent and thus calls for AI algorithms. In other words, the hardware/software co-design problem should be redefined with an eye to which tasks should run in the cloud and which ones locally.

From centralised to distributed, to centralised, and now back to distributed...

The war goes on.

From the centralised mainframes we moved to localised desktops, with home and office computers doing almost everything in the comfort of our office or home, without the need to submit our "jobs" to those batch-processing mainframes and wait for the output at the end of the day!

Then the time came when distributed was the buzzword.

With the proliferation of the internet and server farms, we moved back to the centralised model, now touted as cloud computing.

Now bandwidth is again becoming the limiting factor for cloud-based data processing to happen in real time, and we are thinking of ways to do localised processing as far as possible.

We are talking about great strides in vision and audio processing and think that these two will lead us to great progress in AI.

But hardly any progress has been made in sensing, processing, and reproducing smell, imitating the nose, which in my opinion is as important as the other two senses.

When will we be able to replace those trained dog squads with smart AI systems that can sniff out bombs or other suspicious objects?

Tango and Kinect are indeed thought-leading developments in this space, but they are by no means the only ones, as we've seen from the Amazon smartphone, Oculus Rift, and the new HTC One with dual cameras and Lytro-like focus-later capability. 2015 will be a key inflection point for the transition of such products to the mass market; hopefully many of them will leverage Myriad technology.

Your suggestion of an open platform for robotics that can leverage Myriad and offer Tango-like capabilities is an intriguing one, and certainly food for thought as we roll out our next-generation product. We will be presenting Myriad 2 on August 12th at Hot Chips in the Flint Center in Cupertino.

Chris Anderson of 3D Robotics (and former Editor-in-Chief at Wired) likes to talk about leveraging the "peace dividends of the Smartphone Wars" in disruptive ways, applied to new markets. Formerly unapproachable levels of processing power are now available at consumer price points... IMO, Movidius could help general-purpose APs react to visual stimuli in real time at low cost and power. This would be a boon to autonomous cars, UAVs, industrial robots, etc., all of which need to be able to take action to protect human safety without resorting to the cloud. Leveraging DNNs as Baidu and Neurala recommend makes sense, especially if the learnings could be openly curated and shared. I'd love to see an embedded version of the Movidius/Tango architecture open to hobbyists and hackers...

My two cents on this interesting piece of writing: vision is by far the most challenging sensor in terms of required processing. It may also be the most valuable. "Distributed" vision, or wireless vision sensors all around us, seems to have been an elusive goal, despite the clear trend of computer vision moving out of the factory. Google's Project Tango and the miniaturisation effort that brought us Kinect may be the two most prominent first efforts in this respect.

I agree that much of the convenience of our current programming abstractions will have to be abandoned as Moore's law slows and we can no longer rely on the 18-month 2x ratchet in performance. This is a huge challenge for the software industry; as David Patterson said, "parallelism is the biggest challenge in 50 years because industry is betting its future that parallel programming will be useful." We are all going to have to think more locally (embedded) in order to keep the whole economy going, compensating for the slowing of Moore's law by writing more optimised software while still achieving rapid time-to-market. Is the solution more APIs like OpenVX for computer vision and similar efforts for AI, or will we see more fragmentation, similar to Apple's break away from OpenGL in the form of its Metal graphics API?

My own view is that grand efforts to understand the massive human brain, with its 80B neurons and countless synapses, such as the EU Human Brain Project or the US BRAIN program, are doomed to failure when we can't even model the 302-neuron connectome of the C. elegans worm. I believe that more modest efforts like the OpenWorm project will tackle brain function at a micro level, allowing us to gradually scale up to complex behaviour, just as Kilby's very primitive IC in 1959 paved the way to the multi-billion-transistor SoCs we have today. In fact, AI at the level of an insect like a cockroach or an ant would already allow us to build very capable autonomous machines that could improve our daily lives by performing useful tasks at very modest cost.

I suspect the (re)integration trend may be slowing due to the increasing cost per transistor as technologies scale below 28nm. At least to me, a distributed model makes the most sense, as energy consumption, heat dissipation, and bandwidth can be allocated in such a way as to arrive at something close to a global minimum, allowing mobile-to-cloud solutions to scale better.

I agree that pushing computational ability and "intelligence" towards sensors and mobile devices in a power-efficient way makes a lot of sense for future large-scale cloud applications. This is particularly true when media data analysis and transferral is involved. It's another interesting example of the regular "integration-redistribution" cycle that has been repeating in many areas of electronics, engineering and computing in recent years.