7 Tips for Getting Started with Multicore

Introduction
Multicore processors are everywhere. In desktop computing, it is almost impossible to buy a computer that does not have a multicore CPU inside. Multicore technology is also impacting the embedded space, where increased performance per watt presents a compelling case for migration.

Developers are increasingly turning to multicore because they either want to improve the processing power of their product, or they want to take advantage of some other technology that is 'bundled' within the multicore package.

This article offers seven tips to help make that first step towards using such devices.

Tip 1. Don't Fix What Works
It is natural to want to use the latest technology in our favorite embedded design. It is tempting to make a design a technological showcase, using all the latest bells and whistles. However, it is worth reminding ourselves that what is fashionable today will be 'old hat' within a relatively short period.

If you have an application that works just fine, and is likely to keep performing adequately within the lifetime of the product, then maybe there is no point in upgrading.

Tip 2. Use Multicore to Take Advantage of the Low Thermal Footprint
One of the benefits of recent processor design trends has been the focus on power efficiency. Prior to the introduction of multicore, new performance levels were reached by producing silicon that could run at faster and faster clock speeds. An unfortunate by-product of this speed race was that the heat dissipated from such devices made them unsuitable for many embedded applications.

As clock speeds increased, the physical limits of transistor technology drew ever closer. Researchers looked for new ways to increase performance without further increasing power consumption. It was discovered that by turning down the clock speeds and adding additional cores to a processor, it was possible to achieve a much improved performance-per-watt figure.

The introduction of multicore, along with new gate technology and the redesign of power-hungry parts of the CPU, has led to processors that use significantly less power, yet are capable of more raw processing than their antecedents.

An example is the Intel® Atom™ processor, a low power Intel® architecture processor which uses 45nm Hi-K transistor gates. By implementing an in-order pipeline, adding additional deep sleep states, supporting SIMD (Single Instruction Multiple Data) instructions and using efficient instruction decoding and scheduling, Intel has produced a powerful but not power-hungry piece of silicon.

Taking advantage of the lower power envelope could in itself be a valid reason for using multicore devices in an embedded design, even if the target application is still single-threaded.

Tip 3. Utilize the Advanced Architectural Extensions
All the latest generation CPUs have various architectural extensions that are 'free' and should be taken advantage of. One very effective, but often underused, extension is support for SIMD, that is, conducting several calculations in one instruction.

The Intel® Atom processor, for example, has dedicated SIMD execution units, as can be seen in Figure 1.

Figure 1. The Internals of the Low Power Intel® Architecture

Often developers ignore these advanced operations because of the perceived effort of adding such instructions to application code. While it is possible to use these instructions by adding macros, inline assembler, or dedicated library functions to the application code, a favorite of many developers is to rely on the compiler to automatically insert such instructions into the generated code.

Auto-vectorization and SIMD
One technique, known as 'auto-vectorization', can lead to a significant performance boost for an application. In this technique the compiler looks for calculations that are performed in a loop. By replacing such calculations with, say, Streaming SIMD Extensions (SSE) instructions, the compiler effectively reduces the number of loop iterations required. Some developers have seen their applications run twice as fast simply by turning on auto-vectorization in the compiler.

The potential performance boost from this technique should not be underestimated: in one recent case, a speedup of over 10 was achieved, all by the flick of a compiler switch. Targeting the specific Intel® architecture with the appropriate compiler options yields the best benefit.

Like the power gains discussed previously, using these architectural extensions may be a valid reason in itself for using a multicore processor even if you are not developing threaded code.