ARM's Lead Engineer Discusses Inexact Processing

ARM's lead processor architect has told us that the company has thought about creating an inexact processor -- a processor that curtails precision in the interest of saving power.

Richard Grisenthwaite, lead processor architect with ARM, has said the company has thought about creating an inexact processor -- a processor that curtails precision to save power. The technique has echoes of fuzzy logic and probabilistic processing: multiplication and addition are performed with reduced accuracy, while the probability of errors accumulating is kept in check.

Grisenthwaite, vice president of technology and an ARM Fellow, has been responsible for the ARM architecture for the last ten years or so. He is associated with the ARMv7 instruction set architecture and the Cortex range of processor cores, with the move to 64-bit computing with ARMv8, and the "big-little" strategy now being deployed in both ARMv7 and ARMv8. Big-little pairs a performance-optimized processor core (big) with a power-optimized core (little) and the software load is allowed to transition between them as part of a dynamic voltage and frequency scaling regime.

I met Grisenthwaite recently at an ARM-organized analysts' conference and so took the opportunity to ask him about inexact computing and the possibility of its deployment by ARM.

Inexact processing is not new. It addresses applications that do not require the full precision that is conventional in software -- precision that is therefore responsible for much of the power consumption without delivering any benefit. The technique is relevant in audio and graphics subsystems, where the ear or eye often cannot perceive the full computed resolution and where human beings are good at compensating for errors. It is also applicable to some database and networking applications and to control algorithms.

Different strokes for different apps
In the past, because some applications required full precision -- and software was written using full-precision data types -- a computer's single processor had to be capable of full precision. But with heterogeneous multicore processing and hardware offload entering the mainstream, there is now an opportunity to provide different processor cores within a single system-chip, each optimized for a different class of application and providing the associated precision.

Earlier this year EE Times reported on a prototype inexact processor built by an academic team that demonstrated energy savings of 90 percent for particular applications where answers deviated from the correct value by 0.25 percent on average. (See: Inexact processor is more power efficient.)

So is there scope for ARM to produce a processor architecture or micro-architecture that somehow uses "pruned" ALUs and holding registers with fewer bits to achieve acceptable answers at much lower power consumption?
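For readers unfamiliar with the idea, the effect of a "pruned" ALU can be sketched in software. The model below is purely illustrative (the real savings come from omitting hardware, not from masking in software): it mimics an integer adder that simply does not compute the low-order bits, and measures the relative error that pruning introduces.

```python
def pruned_add(a, b, dropped_bits=8):
    """Toy model of a 'pruned' adder: the low-order bits are simply
    not computed, trading accuracy for (in real hardware) power and
    die area. The number of dropped bits is an assumed parameter."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)

exact = 123456 + 654321
approx = pruned_add(123456, 654321)
rel_err = abs(exact - approx) / exact
print(f"exact={exact}  pruned={approx}  relative error={rel_err:.4%}")
```

Even dropping eight low-order bits from each operand leaves the answer within a small fraction of a percent here -- the kind of deviation that audio and video pipelines can often tolerate.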

"ARM has thought about that. But it is quite hard to see how we would deploy it under our business model. It tends to be application area specific and we have to build cores that go to many applications and markets," Grisenthwaite told me. "And you would have to put it alongside a core designed to run conventional precision and legacy code."

My response was that this could be a good fit with ARM's computing philosophy of having multiple cores offering different required levels of application performance and energy efficiency and that an inexact processor could be run with other cores under a big-little or heterogeneous computing scheme. "It is something ARM has looked at but there's nothing to say about it" was Grisenthwaite's reply and clearly a concluding remark.

Does that mean ARM is working on an inexact processor? Your interpretation of Grisenthwaite's remarks is probably as good as mine -- but perhaps the company should be?

"the block tends to be specified for the toughest requirement and then operated at that level."

I can understand that, but in cell phones the toughest requirement is usually power. I don't know the impact of inexact calculations on video quality, but I'm willing to bet that a 90% power reduction would trump exact calculations in many cases.

I agree that three identical low resolution data engines running the same software would tend to produce identical inexact results making a voting regime redundant. Nonetheless I think there is scope for more creative thinking here.

How about this?

Three (or more) low-resolution data engines running different algorithmic approaches to achieve the same functionality might produce different inexact results that could be averaged to produce a higher resolution final result, but with a significant reduction in energy consumption.
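The averaging idea in this comment can be checked with a quick simulation. The error model below is an assumption (independent Gaussian perturbations of roughly 0.25%, matching the prototype figure cited in the article); under that assumption, averaging three independent inexact results should shrink the mean error by about the square root of three.

```python
import random

TRUE_VALUE = 1000.0
SIGMA = 0.0025  # ~0.25% deviation per engine, per the cited prototype

def inexact_engine(x, rng):
    # Assumed error model: each engine (running a different algorithm)
    # produces an independently perturbed result.
    return x * (1.0 + rng.gauss(0.0, SIGMA))

def trial(rng):
    results = [inexact_engine(TRUE_VALUE, rng) for _ in range(3)]
    single_err = abs(results[0] - TRUE_VALUE)
    avg_err = abs(sum(results) / 3 - TRUE_VALUE)
    return single_err, avg_err

rng = random.Random(0)
trials = [trial(rng) for _ in range(10_000)]
mean_single = sum(s for s, _ in trials) / len(trials)
mean_avg = sum(a for _, a in trials) / len(trials)
print(f"mean error, one engine:       {mean_single:.3f}")
print(f"mean error, three averaged:   {mean_avg:.3f}")
```

The key caveat is the independence assumption: as noted above, three identical engines running the same code would produce correlated (or identical) errors, and averaging would then buy nothing.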

They do indeed put compression/decompression engines in graphics processors.

In fact some systems have ways to offload compression/decompression to the GPU in a heterogeneous processor system.

But as with the examples given above, the accuracy or degree of losslessness required may depend on the application (video versus communications), and as such the block tends to be specified for the toughest requirement and then operated at that level.

Don't they put video decompression engines in processors/video chips nowadays to help with handling MPEG video streams? This seems like the perfect application for inexact calculation.

Also, VoLTE is the up-and-coming voice standard that all the major cell carriers are headed toward. This is another application that uses a reasonable amount of CPU power, where inexact processing would work and the power savings would be very desirable.

If the power savings were 50% or less I would say it wasn't very interesting, but at 90% you have the option of running multiple cores on the same data and voting for the correct answer. Three cores with only a 0.25% variance should be pretty close to reliable, depending on the distribution of the errors. Granted, it would "only" save 70% of the power, but it would definitely be worth exploring.

It is also worth looking at where the errors show up. If programmatic logic is not affected, then network routers and many control systems would definitely benefit. I would expect that data collection systems that do advanced math would be leery of it unless the error range could be bounded and manageable.
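The power arithmetic behind the triple-redundancy suggestion above is worth making explicit. Using only the figures quoted in this thread (an inexact core at 10% of a conventional core's power), three redundant inexact cores still come in well under one conventional core:

```python
# Back-of-the-envelope power budget for triple-redundant inexact
# execution, using the figures quoted in the comments above.
conventional = 1.00
inexact_core = 0.10                    # 90% saving per core
triple_redundant = 3 * inexact_core    # three cores voting/averaging
saving = 1.0 - triple_redundant / conventional
print(f"triple-redundant power: {triple_redundant:.0%} of conventional")
print(f"net saving:             {saving:.0%}")
```

That is the 70% figure from the comment: three cores at one-tenth power each consume 30% of the conventional budget, leaving a 70% net saving even with full redundancy.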