London Calling: Are ARM's big-little days numbered?

The pace of change in IC manufacturing is becoming so fast that ideas and techniques may struggle to last without significant re-invention. What next for big-little?

In the early days of digital electronics a good architectural idea could be implemented and the advantage it granted could be expected to apply for all foreseeable manufacturing generations. But now the complexity and pace of change of chip manufacturing are such that ideas and techniques sometimes struggle to last without significant re-invention.

"Big-little" is the idea, from processor licensor ARM Holdings plc (Cambridge, England), of pairing a performance-optimized processor core with a low-power standby-optimized processor core. It enables application software to switch between the cores for an overall energy saving in typical use where equipment spends much of its time in standby mode. ARM has prepared two processor cores, the Cortex-A15 and the Cortex-A7, to help implement the strategy, and the idea is starting to come through in commercial products. Two examples are the Exynos 5 Octa applications processor from Samsung and the MP6530 from Renesas Mobile.

But can big-little last and, if not, how must it be re-invented?

For a start we must consider that big-little is itself a reinvention of, or complement to, dynamic voltage and frequency scaling (DVFS).

DVFS is the idea that you can wind the voltage and clock frequency down on a given core to save dynamic power consumption at low performance and then wind them up to achieve a necessary application performance level. So an application starts on the Cortex-A7 and goes up through the DVFS "gears" until it is at top voltage and clock frequency. It then jumps across to the Cortex-A15, where it resumes at low voltage and relatively low power and again goes up through the DVFS gears to reach top performance. As the application load diminishes the process is followed in reverse until the equipment is once again idling at the lowest DVFS point on the "little" core.
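The gear-shifting sequence described above can be sketched as a single ladder of operating points, with the little core's gears at the bottom and the big core's gears on top. This is only an illustration: the voltages, frequencies, load thresholds and the names `OperatingPoint`, `LADDER`, `next_gear` and `run` are assumptions made for the example, not figures or interfaces from ARM or any real governor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    core: str     # "little" (Cortex-A7 role) or "big" (Cortex-A15 role)
    volts: float  # supply voltage, illustrative value only
    mhz: int      # clock frequency, illustrative value only


# Hypothetical ladder: little-core gears first, then big-core gears.
LADDER = [
    OperatingPoint("little", 0.90, 400),
    OperatingPoint("little", 1.00, 800),
    OperatingPoint("little", 1.10, 1200),
    OperatingPoint("big", 0.90, 1000),
    OperatingPoint("big", 1.00, 1400),
    OperatingPoint("big", 1.10, 1800),
]


def next_gear(index, load, up=0.8, down=0.3):
    """Move one rung up or down the ladder for a utilisation value in [0, 1]."""
    if load > up and index < len(LADDER) - 1:
        return index + 1
    if load < down and index > 0:
        return index - 1
    return index


def run(loads, start=0):
    """Return the operating point chosen after each load sample."""
    index = start
    trace = []
    for load in loads:
        index = next_gear(index, load)
        trace.append(LADDER[index])
    return trace


if __name__ == "__main__":
    # Sustained heavy load followed by idling.
    for point in run([0.9] * 5 + [0.1] * 5):
        print(point)
```

Under sustained load the trace climbs through the little-core gears, crosses to the lowest big-core gear and keeps climbing; when the load drops it retraces the same path in reverse. The narrower the usable voltage range becomes, the fewer distinct rungs each core can offer.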

However, as we have ridden Moore's Law down to smaller dimensions we have also reduced the voltage ranges and thereby reduced DVFS scalability. So ARM's bright idea was to buy back some DVFS trade-off with additional silicon real-estate.

But if we follow Moore's Law on to 20-nm bulk CMOS and 16/14-nm FinFET processes, voltage scalability is likely to be reduced further, narrowing the scope for the use of DVFS within big-little. So what's to be done?

What's next for big.Little is that it just got 20 licensees, and Renesas will even take it into the car chip market, which should be another booming chip market, soon:
http://techdomino.com/renesas-big-little-r-car-h2

Mhhh, more interesting techniques exist, like body biasing. Check the conference literature, I would say. Though such techniques are not always available to everyone at a foundry; they are an advantage for firms with their own process and modelling departments.

I'll also throw this into the mix. At highest voltage there also tends to be an increase in non-dynamic (leakage) power consumption.
BUT leakage consumption is a bigger proportion of overall IC power consumption when a chip is idling at low voltage.

ARM has no choice. When your dual-core A15 power (just for the cores) comes in at 6W, how do you compete against Intel Atom, when they can match performance and have lower power? ARM is losing the power battle. ARM is no longer power-performance efficient. Intel took the lead.

Unfortunately, from the perspective of energy consumption, DFS really doesn't help, even though it does lower the power (energy consumption per second). In fact, slowing down the frequency will only increase the total energy consumed for the same task as it increases the execution time (due to the extra energy consumed in the "supporting" blocks to keep the core running).
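A back-of-envelope model makes the point concrete. It assumes the usual first-order approximation that dynamic power scales as C x V^2 x f, plus a fixed static power for leakage and the "supporting" blocks. All the numbers and the function names `average_power_watts` and `task_energy_joules` are made up for illustration.

```python
def average_power_watts(freq_hz, volts, c_eff, static_watts):
    """Dynamic power (C * V^2 * f) plus a fixed static/support power."""
    return c_eff * volts ** 2 * freq_hz + static_watts


def task_energy_joules(cycles, freq_hz, volts, c_eff, static_watts):
    """Energy to finish a fixed number of cycles at one operating point."""
    seconds = cycles / freq_hz
    return average_power_watts(freq_hz, volts, c_eff, static_watts) * seconds


# Illustrative numbers only: 1e9 cycles of work, 1 nF effective capacitance,
# 1.0 V supply held constant (frequency scaling only), 0.2 W static power.
WORK, C_EFF, VOLTS, STATIC = 1e9, 1e-9, 1.0, 0.2

if __name__ == "__main__":
    for freq in (1.0e9, 0.5e9):
        p = average_power_watts(freq, VOLTS, C_EFF, STATIC)
        e = task_energy_joules(WORK, freq, VOLTS, C_EFF, STATIC)
        print(f"{freq / 1e9:.1f} GHz: {p:.2f} W, {e:.2f} J")
```

With these assumed figures, halving the clock at constant voltage drops the power from about 1.2 W to 0.7 W, but the task takes twice as long and the energy rises from about 1.2 J to 1.4 J. The dynamic energy per task is unchanged; the static part is simply paid for longer.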

I would hesitate to claim that it's just DVFS. DFS works primarily on dynamic power, and does little for overall energy consumption, while DVS is where the real meat is, affecting both leakage and dynamic consumption strongly. At the same time, as voltage drops, frequency must often be lowered as well to assure proper operation. Big.Little is different in that by having two entire cores, you can also use separate fabrication parameters for both.
Because the two cores are entirely independent, it's possible for the .Little processor to be fabricated with a high threshold voltage in mind for low leakage during the expected long on-times. The Big. processor can then be fabricated with a more aggressive process and less concern about leakage. Rather than designing for the average case, you can design for the expected case for both processors.

How about use of digital power management (Intel's term is IVR, for integrated voltage regulator, to be used in the multicore Haswell processor family) to fine-tune each core?
How many PMUs/PMICs is Samsung's Exynos 5 Octa using?

Oh come on...there ought to be a system-wide solution to this problem. Some types of code (simple logic, event handling etc.) can probably run on the baby processor.
Some larger code (iterating through large data structures, block data handling, signal processing etc.) can be done on the larger core.
These are 2 different types of programming...one needs to optimize latency while the other might need to optimize throughput...but due to the prevailing convention we use a single programming language and a single processor for both types of code. If ARM wants to tackle this they should invent a new type of programming language or virtual machine or something, and assign threads to different processors. Then let the VM or OS decide which core to turn on, based on software demand.
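Something like the following toy sketch is what I have in mind. It is not any real OS or VM interface: the "latency" and "throughput" tags, the `CORE_FOR_KIND` table and the functions `assign` and `cores_to_power_on` are all invented here just to show the idea of tagging work by type and powering only the cores that have something to do.

```python
from collections import defaultdict

# Hypothetical policy: latency-sensitive work goes to the little core,
# throughput-heavy work goes to the big core.
CORE_FOR_KIND = {"latency": "little", "throughput": "big"}


def assign(tasks):
    """Group task names by target core; tasks is an iterable of (name, kind)."""
    placement = defaultdict(list)
    for name, kind in tasks:
        placement[CORE_FOR_KIND[kind]].append(name)
    return dict(placement)


def cores_to_power_on(placement):
    """Only cores that actually received work need to be switched on."""
    return sorted(placement)


if __name__ == "__main__":
    work = [("ui_event", "latency"), ("fft_block", "throughput"), ("timer", "latency")]
    placement = assign(work)
    print(placement)
    print(cores_to_power_on(placement))
```

With only event-handling work queued, the big core never appears in the placement and can stay off.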

Bunch of bollocks! Peter has got to get his facts straight and realize it was not ARM who invented Big Little! It was really an ARM customer that started it; then ARM took the concept, enhanced it and started a guerrilla marketing campaign.