The Internet’s been abuzz with news that Intel’s long-term roadmap for manycore devices like Xeon Phi, or its predecessor Intel Labs design, the so-called Single-Chip Cloud Computer (SCC), would likely include 48-core devices for the smartphone market.

Why would a smartphone need manycore? Lots of reasons, potentially.

The reason most often cited by Intel engineers is the need for parallel support of applications, alongside a vastly expanded palette of OS-provided UI and I/O services, many of which are computationally intensive. The usual suspects are speech recognition and gesture recognition, but little concrete has been said about how these features might evolve over the next few years. Herewith, some speculation…

Growing importance of speech recognition …

Speech recognition is already becoming an important aspect of mobile I/O, both for high-level command/control and application-launching for search, navigation, and other applications that can be driven with terse, relatively-unambiguous commands (e.g., “Navigate to (address),” “Search for Alabama football score,” etc.), as well as voice-dictation supplanting keyboard input for email, text messaging, IM, note-taking and similar apps. At the moment, most of these functions require network-side support and an Internet connection. That plays to their benefit in some respects, both in terms of overall efficiency and because centralization potentially speeds up machine learning: exposing the app to more and more-diverse inputs (accents, phrasings, etc.) and enabling it to resolve more transactions around those inputs.

The relatively small size of speech data makes centralization of speech recognition practical, given connectivity. But it’s less obvious how gesture recognition, which involves realtime image processing, can be done centrally, even over advanced wideband mobile networks like LTE. Pushing recognition down onto endpoint devices will mean re-architecting them to improve performance without compromising the ‘crowdsourcing’ benefits of centralization. That, in turn, requires significant signal-processing capability on the endpoint device, plus the managerial capacity and bandwidth to report frequently to, and derive updates from, centralized resources. More and better-coordinated CPUs are clearly a great idea here.

… and parallelism in endpoints

Accelerating this revolution are several obvious convergent and divergent trends in device design, which collectively point towards increasing multi-modality of I/O, and concomitant need for vastly increased parallelism at endpoints.

On the one hand, you have increasing dependence on touchscreens and gradual elimination of pointing devices – a trend that implies a soon-to-be-powerful need for grafting contact-based and more remote recognition modes together (e.g., touch, multi-touch, local, and more-remote gesture) to permit fluent input, prevent ergonomic issues, and enable UI and data to keep sharing screen space.

This multi-modality can also be applied to solve problems on the other side of the trend-curve, where we’re talking about wearable mobile devices shaped like glasses that lack any form of touch interface. There are payoffs here as well: for example, head-worn devices enable correlation of attention (i.e., what you’re looking at, hence what’s important) with gesture recognition for improved disambiguation, provided you have the cores to keep the whole sensorium lit up.

The question of augmented reality is also important, as this technology is likely to move quickly from its current state (dropping internet-derived informational overlays onto select objects in the visual field) towards a more sophisticated interactivity. That will require maintaining and updating an increasingly powerful realtime model of the world, including positions and states of nearby and remote devices, as well as many layers of recognizer-generated information. That means cores, cores, cores.

NFC and Personal Area Networks

Another clear trend pushing manycores to mobile is the emergence of NFC and the growing market potential of peripheral devices in personal-area networks. Long predicted, this is now really happening. But it will happen much faster if peripheral devices themselves can be relatively dumb. The emerging model is very much like cloud computing, where peripherals exploit processing and internet connectivity on the host mobile device. In that light, having a mobile device that can work like a cloud computer is a very good thing.

Finally, it seems pretty obvious – all marketing hype aside – that mobile computing will always struggle, in some contexts, with bandwidth and connectivity. And most of the now-on-the-napkin-or-drawing-board technologies for improving connectivity dramatically and affordably are computation-dependent.

One example is Rearden’s Distributed Input/Distributed Output (DIDO) wireless concept, described in this white paper and much discussed in 2011. It should probably still be taken with a grain of salt until there is more proof that it works and can be productized. Whether or not this or other schemes move forward, however, it seems certain that the availability of strong parallelism on mobile endpoints can only make functional, economical solutions easier to realize.

Posted on November 6, 2012 by John Jainschigg, Geeknet Contributing Editor

Maybe the day will come when we have so many processors that the global memory architecture becomes a nasty bottleneck. At that point, maybe we will need to change the paradigm and go back to the '80s to implement a cellular-automata-type processor in which there is only local data and no global data.
