ARM’s secret recipe for power-efficient processing

Several different companies design microprocessors: Intel, AMD, Imagination (MIPS) and Oracle (Sun SPARC), to name a few. However, none of these companies is known exclusively for power efficiency. That isn’t to say they don’t have designs aimed at power efficiency, but it isn’t their specialty. One company that does specialize in energy-efficient processors is ARM.

While Intel might be making the chips needed to break the next speed barrier, ARM has never designed a chip that doesn’t fit into a predefined energy budget. As a result, all of ARM’s designs are energy efficient and ideal for running in smartphones, tablets and other embedded devices. But what is ARM’s secret? What is the magic ingredient that helps ARM to keep producing high-performance processor designs with low power consumption?

In a nutshell, the ARM architecture. Based on RISC (Reduced Instruction Set Computing), the ARM architecture doesn’t need to carry a lot of the baggage that CISC (Complex Instruction Set Computing) processors include to perform their complex instructions. Although companies like Intel have invested heavily in the design of their processors so that today they include advanced superscalar instruction pipelines, all that logic means more transistors on the chip, and more transistors mean more energy usage. The performance of an Intel i7 chip is very impressive, but here is the thing: a high-end i7 processor has a maximum TDP (Thermal Design Power) of 130 watts. The highest-performance ARM-based mobile chip consumes less than four watts, oftentimes much less.

And this is why ARM is so special: it doesn’t try to create 130W processors, or even 60W or 20W ones. The company is only interested in designing low-power processors. Over the years, ARM has increased the performance of its processors by improving the micro-architecture design, but the target power budget has remained basically the same. In very general terms, you can break down the TDP of an ARM SoC (System on a Chip, which includes the CPU, the GPU, the MMU and so on) as follows: a maximum of two watts for the multi-core CPU cluster, two watts for the GPU and maybe 0.5 watts for the MMU and the rest of the SoC. If the CPU is a multi-core design, then each core will likely use between 600 and 750 milliwatts.

These are all very generalized numbers because each design that ARM has produced has different characteristics. ARM’s first Cortex-A processor was the Cortex-A8. It only worked in single-core configurations, but it is still a popular design and can be found in devices like the BeagleBone Black. Next came the Cortex-A9 processor, which brought speed improvements and support for dual-core and quad-core configurations. Then came the Cortex-A5 core, which was actually slower (per core) than the Cortex-A8 and A9 but used less power and was cheaper to make. It was specifically designed for low-end multi-core applications like entry-level smartphones.

At the other end of the performance scale came the Cortex-A15 processor, ARM’s fastest 32-bit design. It was almost twice as fast as the Cortex-A9 processor, but all that extra performance also meant it used a bit more power. In the race to 2.0 GHz and beyond, many of ARM’s partners pushed the Cortex-A15 core design to its limits. As a result, the Cortex-A15 processor has a bit of a reputation as a battery killer, which is probably a little unfair. Nonetheless, to compensate for the Cortex-A15 processor’s higher power budget, ARM released the Cortex-A7 core and the big.LITTLE architecture.

The Cortex-A7 processor is slower than the Cortex-A9 processor but faster than the Cortex-A5 processor. However, it has a power budget akin to its low-end brothers. When combined with the Cortex-A15 in a big.LITTLE configuration, the Cortex-A7 core allows a SoC to use the low-power Cortex-A7 core when it is performing simple tasks and switch to the Cortex-A15 core when some heavy lifting is needed. The result is a design that conserves battery yet offers peak performance.
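
The switching behaviour described above can be sketched in a few lines of Python. This is only an illustration and not ARM’s actual switching logic: the load thresholds, the function names choose_cluster and simulate, and the use of two separate thresholds (hysteresis) are all assumptions made for the example.

```python
# Illustrative sketch of big.LITTLE cluster switching; not ARM's real logic.
# The thresholds below are made-up values chosen for the example.
UP_THRESHOLD = 0.80    # assumed: switch to the big cluster above 80% load
DOWN_THRESHOLD = 0.30  # assumed: fall back to LITTLE below 30% load


def choose_cluster(current, load):
    """Return "big" or "LITTLE" for a CPU load between 0.0 and 1.0."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"       # heavy lifting: hand over to the Cortex-A15 cluster
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"    # simple tasks: drop back to the Cortex-A7 cluster
    return current         # otherwise stay put to avoid switching back and forth


def simulate(loads, start="LITTLE"):
    """Replay a sequence of load samples and record the active cluster."""
    cluster = start
    history = []
    for load in loads:
        cluster = choose_cluster(cluster, load)
        history.append(cluster)
    return history
```

Calling simulate([0.1, 0.9, 0.5, 0.2]) walks through a light load, a spike, a moderate load and a return to idle. The gap between the two thresholds keeps a moderate load from bouncing the work between clusters.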

64-bit

ARM also has 64-bit processor designs. The Cortex-A53 is ARM’s power-saving 64-bit design. It won’t have record-breaking performance; however, it is ARM’s most efficient application processor ever. It is also the world’s smallest 64-bit processor. Its bigger brother, the Cortex-A57, is a different beast. It is ARM’s most advanced design and has the highest single-thread performance of all of ARM’s Cortex processors. ARM’s partners will likely be releasing chips based on just the A53, on just the A57, and on the two in a big.LITTLE combination.

One way ARM has managed this migration from 32-bit to 64-bit is by giving the processor two modes, a 32-bit mode and a 64-bit mode. The processor can switch between these two modes on the fly, running 32-bit code when necessary and 64-bit code when necessary. The silicon that decodes and starts to execute the 64-bit code is therefore separate (although there is reuse to save area) from the 32-bit silicon, which keeps the 64-bit logic isolated, clean and relatively simple. The 64-bit logic doesn’t need to try to understand 32-bit code and work out what is the best thing to do in each situation. That would require a more complex instruction decoder, and greater complexity in these areas generally means more energy is needed.

A very important aspect of ARM’s 64-bit processors is that they don’t use more power than their 32-bit counterparts. ARM has managed to go from 32-bit to 64-bit and yet stay within its self-imposed energy budget. In some scenarios the new range of 64-bit processors will actually be more energy efficient than previous-generation 32-bit ARM processors. This is mainly due to the increase in the internal data width (from 32 to 64 bits) and the addition of extra internal registers in the ARMv8 architecture. The fact that a 64-bit core can perform certain tasks more quickly means it can power down sooner and hence save battery life.
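
The claim that finishing sooner saves battery comes down to simple arithmetic: energy is power multiplied by time. The Python sketch below works through it with invented figures. The wattages, durations and the task_energy helper are hypothetical placeholders, not measurements of any ARM core.

```python
# Energy = power x time. All figures below are hypothetical, for illustration only.

def task_energy(active_watts, active_seconds, idle_watts, window_seconds):
    """Energy in joules used over a fixed time window: run the task, then idle."""
    idle_seconds = window_seconds - active_seconds
    if idle_seconds < 0:
        raise ValueError("task does not fit in the window")
    return active_watts * active_seconds + idle_watts * idle_seconds


# Assumed: both cores draw the same power while active, but the 64-bit core
# finishes the same job in 0.5 s instead of 1.0 s and then powers down.
slow = task_energy(active_watts=0.75, active_seconds=1.0,
                   idle_watts=0.05, window_seconds=1.0)
fast = task_energy(active_watts=0.75, active_seconds=0.5,
                   idle_watts=0.05, window_seconds=1.0)
```

With these made-up numbers the slower core spends about 0.75 joules over the one-second window and the faster core about 0.4 joules, even though neither draws more power than the other while running.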

This is where the software also plays a part. big.LITTLE processing technology relies on the operating system understanding that it is running on a heterogeneous processor. This means the OS needs to understand that some cores are slower than others. This generally hasn’t been the case with processor designs until now. If the OS wanted a task to be performed, it would just farm it out to any core; it didn’t matter (in general), as they all had the same level of performance. That isn’t so with big.LITTLE. Fortunately, Linaro hosts and tests the big.LITTLE MP scheduler, developed by ARM for the Linux kernel, which understands the heterogeneous nature of big.LITTLE processor configurations. In the future, this scheduler could be further optimized to take into account things like the current running temperature of a core or the operating voltages.
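
To make the idea of a heterogeneity-aware scheduler more concrete, here is a toy placement routine in Python. It is not the big.LITTLE MP scheduler and does not reflect Linux kernel code. The Core class, the capacity numbers and the place function are all hypothetical.

```python
# Toy heterogeneity-aware placement; NOT the real big.LITTLE MP scheduler.
# Core names, capacities and the placement rule are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    capacity: int   # assumed relative performance (higher = faster core)
    load: int = 0   # demand already placed on this core


def place(task_demand, cores):
    """Put the task on the slowest core that still has room for it."""
    candidates = [c for c in cores if c.capacity - c.load >= task_demand]
    if not candidates:
        # Nothing fits: fall back to the core with the most spare capacity.
        chosen = max(cores, key=lambda c: c.capacity - c.load)
    else:
        chosen = min(candidates, key=lambda c: c.capacity)
    chosen.load += task_demand
    return chosen.name


cores = [Core("LITTLE0", 40), Core("LITTLE1", 40), Core("big0", 100)]
placements = [place(d, cores) for d in (10, 70, 35, 20)]
```

A scheduler that assumed identical cores could hand the demand-70 task to a LITTLE core, where it would not fit. Knowing each core’s capacity lets small tasks stay on the LITTLE cores and reserves the big core for work that needs it.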

There is also the possibility of more advanced big.LITTLE processor configurations. MediaTek has already proven that the big.LITTLE implementation doesn’t need to be adhered to rigidly. Its current 32-bit octa-core processors use eight Cortex-A7 cores split into two clusters. There is nothing to stop chip makers from trying other combinations that include different sizes of LITTLE cores in the big.LITTLE hardware and software infrastructure, effectively delivering big, little and even smaller compute units. For example, a chip could combine two to four Cortex-A57 cores, two performance-tuned Cortex-A53 cores, and two smaller implementations of the Cortex-A53 CPU tuned towards lowest leakage and dynamic power, effectively resulting in a mix of six to eight cores with three levels of performance.

Think of the gears on a bicycle: more gears means greater granularity. The extra granularity allows the rider to pick the right gear for the right road. Continuing the analogy, the big and LITTLE cores are like the gears on the crank shaft, and the voltage level is like the gears on the back wheel; they work in tandem so the rider can choose the optimum performance level for the terrain.
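
The gear analogy maps onto choosing an operating point, meaning a core type paired with a voltage level. What follows is a rough Python sketch in which every performance and power figure is an invented placeholder rather than data for any real chip, and the pick_operating_point function is hypothetical.

```python
# Picking a "gear": the cheapest (core type, voltage level) pair that meets demand.
# Every number here is a made-up placeholder, not data for a real SoC.

# (core type, voltage level, relative performance, power in milliwatts)
OPERATING_POINTS = [
    ("LITTLE", "low", 20, 100),
    ("LITTLE", "high", 40, 250),
    ("big", "low", 60, 500),
    ("big", "high", 100, 750),
]


def pick_operating_point(required_performance):
    """Return the lowest-power point whose performance covers the demand."""
    feasible = [p for p in OPERATING_POINTS if p[2] >= required_performance]
    if not feasible:
        # Demand exceeds the top gear: run flat out.
        return max(OPERATING_POINTS, key=lambda p: p[2])
    return min(feasible, key=lambda p: p[3])
```

With two core types and two voltage levels there are four gears to choose from. Adding a third core size, as in the three-tier example above, would simply add more rows to the table and so give finer steps between the lowest and highest performance levels.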

The future is looking brighter than ever for mobile computing. ARM will continue to optimize and develop its CPUs around a fairly fixed power budget. Manufacturing processes are improving, and innovations like big.LITTLE will continue to give us the benefits of peak performance with lower overall power consumption. This isn’t the world of desktops and big cooling fans; this is the world of ARM and its energy-efficient architecture.