It makes sense. The smart devices in our pockets are being fitted with more and more sensors. In parallel, the Internet of Things heralds a new tsunami of data. This won't all fit in the network bandwidth on the way up to the cloud, and it would not be intelligent to push it that way either. Instead, the natural push would be to make the smart devices really smarter.

What would make a device smarter if not extra intelligence at a reasonable cost? A reasonable cost also means reasonable power consumption.

Is AI the way to go? What neural network architectures/solutions will make this happen? What would be the HW/SW breakdown to minimize latency but provide as much intelligence as possible to the cloud? Is this a new era for data mining too?

In the end I like this parallel to the human brain: "Latency is also an issue for the human brain, where distributed processing powers our reflexes without involving the frontal cortex." Yet another dot connected! How many dots are left to connect until we replicate the human brain? How about an uber-cloud overseeing other brain replicas "socializing" with their data?

My own view is that grand efforts to understand the massive human brain, with its 80B neurons and countless synapses, such as the EU Human Brain Project or the US BRAIN program, are doomed to failure when we can't even model the 302-neuron connectome of the C. elegans worm. I believe that more modest efforts like the OpenWorm project will tackle brain function on a micro level, allowing us to gradually scale up to complex behaviour, just as Kilby's very primitive IC in 1958 paved the way to the multi-billion-transistor SoCs we have today. In fact, AI on the level of an insect like a cockroach or an ant would already allow us to build very capable autonomous machines that could improve our daily lives by performing useful tasks at very modest cost.

As computing shifts more and more to the devices closer to the sensor, this will also translate into a need for more and more research work at the embedded level. If I look at the world today, most processors close to the sensors that interface to the external world implement relatively simple algorithms, maybe just an FIR filter or such, and leave the decisions to other, stronger APs or, in this case, the cloud.
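To make the "just some FIR filter" case concrete, here is a minimal sketch of the kind of per-sample smoothing a sensor-side processor might run before handing data upstream. The class name `FIRFilter` and the four equal taps are my own illustrative assumptions, not taken from any particular device or library, and a real part would do this in fixed-point C or dedicated hardware rather than Python.

```python
from collections import deque


class FIRFilter:
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Hypothetical helper for illustration only.
    """

    def __init__(self, taps):
        self.taps = list(taps)
        # history[0] is the newest sample; start with zeros (no input seen yet).
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def step(self, sample):
        # appendleft on a full bounded deque drops the oldest sample on the right.
        self.history.appendleft(sample)
        return sum(t * x for t, x in zip(self.taps, self.history))


# Assumed example: a 4-tap moving average applied to a step input.
fir = FIRFilter([0.25, 0.25, 0.25, 0.25])
outputs = [fir.step(4.0) for _ in range(4)]
print(outputs)
```

The output ramps up to the input level as the history fills, which is all the "intelligence" this class of device typically contributes today; the decision about what the smoothed signal means is made elsewhere.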

What I see is that the main processors (or, let's talk about the cloud now) are usually based on some architecture which supports programming in very abstract ways, while the processors closer to the sensors are usually still used in a very "bare-metal" way to squeeze out the maximum optimization possible in terms of power usage, speed, etc.

So this shift in where the processing power sits leads, I think, either to today's researchers having to abandon part of the abstractions they are used to and get closer to the metal or, alternatively, to a new strand of development aimed at improving tools for embedded in general, which would enable research labs and universities to program embedded devices at the abstraction levels they are used to.

The interesting problem I see with the latter is that while main processors tend to stay within one or two families, which allows development of their tools to be spread across scores of teams around the world, the processors closer to the sensor are a lot more diverse. Many of them have dedicated instruction set architectures and therefore require separate toolchains in principle. In practice, though, we are seeing the rise of tools which try to abstract the architecture away to some extent, as LLVM-based compilers do, for example. So this new trend of moving processing closer to the sensor will, I think, open new doors for such portability tools.

I agree that much of the convenience of our current programming abstractions will have to be abandoned as Moore's law slows and we can no longer rely on the 18-month 2x ratchet in terms of performance. This is a huge challenge for the SW industry; as David Patterson said, "parallelism is the biggest challenge in 50 years because industry is betting its future that parallel programming will be useful". We are all going to have to think more locally (embedded) in order to keep the whole economy going, compensating for the slowing of Moore's law by writing more optimised software while at the same time achieving rapid time-to-market. Is the solution more APIs like OpenVX for computer vision and similar efforts for AI etc., or will we see more fragmentation, similar to Apple's break away from OpenGL in the form of its Metal graphics API?

I agree that pushing computational ability and "intelligence" towards sensors and mobile devices in a power-efficient way makes a lot of sense for future large-scale cloud applications. This is particularly true when media data analysis and transfer are involved. It's another interesting example of the regular "integration-redistribution" cycle that has been repeating in many areas of electronics, engineering and computing in recent years.

I suspect the (re)integration trend may be slowing due to the increasing cost per transistor as technologies scale below 28nm. At least to me, a distributed model makes most sense, as energy consumption, heat dissipation and bandwidth can be allocated in such a way as to arrive at something close to a global minimum, which allows mobile-to-cloud solutions to scale better.

My two cents related to this interesting piece of writing: vision is by far the most challenging sensor in terms of required processing. It may also be the most valuable. "Distributed" vision, or wireless vision sensors all around us, seems to have been an elusive goal, despite the clear trend of computer vision moving out of the factory. Google's Project Tango and the miniaturisation effort that brought us Kinect may be the two most prominent first efforts in this respect.

Tango and Kinect are indeed thought-leading developments in this space, but they are by no means the only ones, as we've seen from the Amazon smartphone, Oculus Rift and the new HTC One with dual cameras and Lytro-like focus-later capability. 2015 will be a key inflection point for the transition of such products to the mass market; hopefully many of them will leverage Myriad technology.

Chris Anderson of 3D Robotics (and former Editor-in-Chief at Wired) likes to talk about leveraging the "peace dividends of the Smartphone Wars" in disruptive manners, applied to new markets. Formerly unapproachable levels of processing power are now available at consumer price points... IMO, Movidius could help general-purpose APs react to visual stimuli in real time and at low cost/power. This would be a boon to autonomous cars, UAVs, industrial robots, etc., all of which need to be able to take action to protect human safety without resorting to the Cloud. Leveraging DNNs as Baidu & Neurala recommend makes sense, especially if the learnings could be openly curated and shared. I'd love to see an embedded version of the Movidius/Tango architecture open to hobbyists and hackers...

Your suggestion of an open platform for robotics that can leverage Myriad and offer Tango-like capabilities is an intriguing one and certainly food for thought as we roll out our next-generation product. We will be presenting Myriad2 on August 12th at HotChips in the Flint Centre in Cupertino.

From centralised to distributed to centralised and back again to distributed...

The war goes on.

From the centralised mainframes we moved to localised desktops, with home and office computers doing almost everything in the comfort of our office or home, without the need to submit our "jobs" to those batch-processing mainframes and wait for the output at the end of the day!

Then the time came when distributed was the buzzword.

With the proliferation of the internet and server farms, we moved back to the centralised model, now touted as cloud computing.

Now bandwidth is again becoming the limiting factor for cloud-based data processing to happen in real time, and we are thinking of ways to do localised processing as far as possible.

We are talking about great strides in vision and audio processing and think that these two will lead us to great progress in AI.

But hardly any progress has been made in sensing, processing and reproducing smell, that is, in imitating the nose, which in my opinion is as important as the other two sensory organs.

When will we be able to replace those trained dog squads with smart AI systems that would sniff out bombs or other such suspicious objects?

The article is really interesting since it combines ideas and aspects from the "global" and "local" worlds, i.e., "cloud" vs. "processor". The major part of the energy goes to communication rather than computation. In other words, it is not necessary to flood the network with tons of data, spending so much energy on data communication. The energy should be spent on transferring useful data only (David says "metadata"), which is the result(s) of a computation process.
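As a rough illustration of the "send metadata, not data" point, here is a minimal sketch that reduces a window of sensor readings to a small summary and compares the number of bytes each option would put on the network. Everything in it is an assumption of mine for the sake of the example: the function names, the float32 wire format for raw samples, the JSON encoding of the summary and the threshold of 8.0. It counts bytes only and does not model energy directly.

```python
import json
import struct


def summarize(samples, threshold):
    """Reduce a window of raw readings to a small metadata record (hypothetical fields)."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
        "events": sum(1 for s in samples if s > threshold),
    }


def raw_payload(samples):
    # Assumed wire format: each sample sent as a 4-byte little-endian float32.
    return struct.pack(f"<{len(samples)}f", *samples)


def metadata_payload(summary):
    # Assumed wire format: the summary sent as UTF-8 encoded JSON.
    return json.dumps(summary).encode("utf-8")


# Synthetic window of 1000 readings cycling through 0.0 .. 9.0.
samples = [float(i % 10) for i in range(1000)]
summary = summarize(samples, threshold=8.0)

print("raw bytes:     ", len(raw_payload(samples)))
print("metadata bytes:", len(metadata_payload(summary)))
```

Under these assumptions the raw window is 4000 bytes while the summary is well under a tenth of that, so if communication dominates the energy budget, the local computation pays for itself quickly.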

I'll try to make an analogy from real life. A meeting is effective if the participants exchange well-proven results (i.e., processed locally by human beings), not just ideas and data (i.e., data and apps in the "cloud" of the meeting). Thus, a successful meeting requires less energy (i.e., fewer working/meeting hours).

The real challenge is to find the golden ratio between "local" and "global" activities, which is time-dependent, and thus there is a need for AI algos. In other words, the hardware/software codesign problem should be redefined with in mind which tasks should be run in the cloud and which ones locally.
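To show what a time-dependent local/cloud split could look like in its simplest form, here is a sketch that places a task by comparing an estimated local compute energy against an estimated upload energy. The `Task` fields, the `place` function and all the per-cycle and per-byte energy figures are hypothetical numbers I picked for illustration, not measurements, and the sketch ignores latency, which a real decision would also have to weigh.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    local_cycles: int   # estimated CPU cycles to run the task on the device
    upload_bytes: int   # bytes that must be sent if the task runs in the cloud


def place(task, joules_per_cycle, joules_per_byte):
    """Return 'local' or 'cloud', whichever has the lower estimated energy cost."""
    local_cost = task.local_cycles * joules_per_cycle
    cloud_cost = task.upload_bytes * joules_per_byte
    return "local" if local_cost <= cloud_cost else "cloud"


task = Task("detect", local_cycles=1_000_000, upload_bytes=100_000)

# The same task at two moments: the link cost per byte changes over time
# (for example with signal quality), so the best placement changes too.
good_link = place(task, joules_per_cycle=1e-9, joules_per_byte=1e-9)
bad_link = place(task, joules_per_cycle=1e-9, joules_per_byte=1e-7)
print(good_link, bad_link)
```

A fixed rule like this is only a starting point; the need for AI algos comes in when the cost estimates themselves have to be learned and updated as conditions change.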