Establishing A New Frontier In Embedded Multicore Programming

David Stewart | Sep 22, 2008

Over the last few years, the electronics industry has seen a varied range of exotic new multicore processor architectures, 54 at our last count. While many of the inventors of these fascinating devices have promoted them as general solutions, only a few have seen success, and then in very specific application spaces.

To date, multicore processing has been restricted to desktop machines, where for various reasons it tends to be under-utilized, and to embedded platforms where the individual cores perform self-contained functions with limited interaction. That is what makes ARM’s new Cortex A9 multicore processor so interesting. For the first time, a standard, low-power, multicore processor will be available for use in embedded systems. It has arrived at a time when new embedded applications require dramatic increases in processing horsepower alongside ever-present requirements to keep power consumption very low.

So what is the tipping point for an embedded multicore processor to achieve broad acceptance across multiple industry sectors? Changes in other industries suggest a combination of well-targeted technology advancements, an organization with the brand muscle to drive adoption, and an ecosystem that transforms technology into a solution.

Technologically, two Cortex A9 advancements stand out. First, the opportunity to exploit parallelism has been significantly expanded, suggesting that the potential performance of this device should exceed that of previous ARM processors. Second, although ARM has included DSP-style instructions in previous devices, none compares to the functionality of the NEON instruction set. ARM has previously focused on the embedded control processor space and has left data processing to DSPs and other specialized cores from different suppliers. The Cortex A9 is a departure from this model and will likely be leveraged for a range of new DSP- and graphics-based applications.
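To make the idea of DSP-style instructions concrete, here is a minimal sketch of the kind of data-parallel kernel NEON targets: a saturating add over 8-bit samples, as used when mixing audio or brightening pixels. The function name `saturating_add_u8` is hypothetical and chosen for illustration. The code assumes a compiler that defines `__ARM_NEON` and ships `arm_neon.h` when NEON is available, and it falls back to plain scalar C everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = min(a[i] + b[i], 255) for n 8-bit samples. */
void saturating_add_u8(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n)
{
    size_t i = 0;

    /* Precondition: the buffers must be valid unless there is nothing to do. */
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

#if defined(__ARM_NEON)
    /* NEON path: one 128-bit register holds 16 samples, so each
       iteration loads, adds with saturation, and stores 16 results. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb));
    }
#endif

    /* Scalar path: handles the tail, or every sample when NEON is absent. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Both paths produce the same results; the difference is that the NEON loop handles sixteen samples per iteration where the scalar loop handles one. That per-instruction width is what lets a control-oriented core take on work that would otherwise go to a separate DSP.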

This device could well be an early indicator of embedded processors to come, combining parallel architectural elements with instruction sets that enable a broad range of applications. ARM certainly has the brand muscle and industry connections to open up this opportunity and spread adoption across the embedded sectors. The company has already publicly revealed fifteen licensees of the Cortex A8/9, a “who’s who” of the mobile and consumer electronics industry sectors.

So the embedded processing ecosystem is going to have to take account of this expansion, providing capability that appeals to data as well as control applications, and above all, enabling engineers to take full advantage of the opportunity provided by this, and other, parallel architectures.

There has been plenty of commentary on multicore programming issues. It is hard enough for desktop application programmers to get to grips with the new thinking required for multicore. At least on the desktop, such constraints as power consumption and system performance take a back seat to functionality. This is not the case in the embedded world, and especially with data-centric applications.

Programming applications at this level can be a very different challenge. Performance and power consumption constraints are an important part of the overall design. Ensuring smooth data flow and efficient system operation is critical. The traditional ecosystem required to produce these applications (specialized assemblers, optimized functional libraries, and instruction- and cycle-accurate models, for example) is reasonably well understood.

However, the new ecosystem for embedded multicore will need to take account of the intricacies of parallelism together with the harsh constraints of real-time processing. The Cortex A9, and processors like it, will require a two-dimensional ecosystem that combines performance-centric programming with concurrent operation.

A methodology is required which utilizes multiple environments, from desktop programming, through virtual platforms, to working with the final system. Programming requirements include assembly code routines, cache profiling, and pipeline behavioral analysis. Real-time constraint monitoring will have to be combined with cross-application communication optimization, data race and deadlock avoidance, load balancing, and stalled execution signaling on an unprecedented scale.
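As one small illustration of the load-balancing and data-race concerns listed above, the sketch below statically divides a block of work items across cores so that no core receives more than one item beyond any other. The type `work_range` and the function `partition_work` are hypothetical names introduced here, not part of any ARM toolchain, and the sketch assumes every item costs roughly the same to process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a half-open range [begin, end) of item indices. */
typedef struct {
    size_t begin;
    size_t end;
} work_range;

/* Hypothetical helper: return the slice of n_items owned by `core`,
   where cores are numbered 0 .. n_cores - 1. */
work_range partition_work(size_t n_items, size_t n_cores, size_t core)
{
    work_range r;
    size_t base, extra;

    assert(n_cores > 0 && core < n_cores);

    base  = n_items / n_cores;  /* every core gets at least this many */
    extra = n_items % n_cores;  /* the first `extra` cores get one more */

    r.begin = core * base + (core < extra ? core : extra);
    r.end   = r.begin + base + (core < extra ? 1 : 0);
    return r;
}
```

With 10 items and 4 cores, for example, the ranges are [0, 3), [3, 6), [6, 8), and [8, 10). Because the ranges never overlap, each core can write its own results without locks. That avoids data races on the output buffer, but it does nothing for shared state elsewhere or for workloads whose items vary in cost, which need dynamic scheduling.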

Some may feel that data processing does not have the same level of indirection, branching, and interrupt-based operation inherent in control-centric applications, so parallel coding is less complex and less error-prone. Today, there is some truth to this. However, the applications of the future, running on processors with diverse instruction sets, will serve to break down the barriers between control and data, culminating in a mesh of intercommunicating, optimized code streams.

The multicore methodology requirements of the desktop will be magnified in the embedded applications of the future. As the performance needs of these systems expose the limitations of today’s flows, a multicore-centric environment becomes essential. As small code blocks, a product of initial forays into parallel programming, give way to extensive multimillion-line parallel applications with intertwined threads of execution, parallel programming issues will be amplified, driving delays and quality problems. It is essential that these issues are recognized and prevented up front.

The ARM Cortex A9, with its broad applicability and parallel performance, could blaze a path to a new level of reusable platforms, with economies-of-scale resulting from fewer specialized processors and reduced custom hardware. With the complexity contained in the latest wave of communications protocols and multimedia standards, the timing couldn’t be better. We’ve just got to get the ecosystem right!