Multicore Madness

By Mark LaPedus
Smartphones and tablets are migrating towards new and faster application processors, basebands, graphics chips and memories.

In the cell-phone chipset arena alone, there is a multitude of options and design considerations. Some devices integrate the application processor and modem on the same chip, while in others they are separate devices. In addition, architectures range from single-core to eight-core designs.

On top of that, devices eventually will migrate from planar transistors to finFETs. There are also two major technology platforms to choose from: bulk CMOS and fully-depleted silicon-on-insulator (FD-SOI). Startup SuVolta has also garnered some attention with its dual-gate 2D transistor, but so far it remains an underdog in a highly competitive market.

Needless to say, OEMs face some tough choices and challenges. The prevailing wisdom is that next-generation smartphones and tablets will require more cores and new transistor architectures. After all, consumers want more performance, bandwidth and battery life.

In reality, however, there is no one-size-fits-all technology. As the market continues to splinter into various sub-segments, such as entry-level phones, smartphones, superphones and tablets, there is room for several different architectures and technologies.

Still, OEMs must rethink their design and product choices before jumping on the device with the most cores built on the usual processes. In fact, there is a common and oversimplified message that more processor cores translate into better performance in mobile systems, said Marco Cornero, a fellow and head of advanced computing at ST-Ericsson, a cell-phone chipmaker.

The reality is that mobile designs are much more complex. There are several factors at play in determining the efficiency of multicore systems, such as software, chip frequency, area and power, Cornero said. And generally, quad-core and beyond may be overkill for today’s systems, thereby creating unnecessary costs for OEMs and consumers. “It makes sense to add more cores if the software can utilize them,” he said. “The problem is the software can’t utilize them.”

Going against the grain, he contends that the optimum solution for mobile platforms involves two main technologies: dual-core and FD-SOI. Dual-core architectures provide enough horsepower for mobiles and are more efficient than today’s slower quad-core devices. And bulk CMOS is rapidly running out of gas, propelling the need for a new technology like FD-SOI, he added.

Still, there are various tradeoffs between bulk and SOI, not to mention the implications in moving from planar devices to finFETs. “FinFETs move the line quite nicely in terms of power efficiency,” said John Goodacre, director of program management within ARM’s CPU group. “With finFETs, unfortunately, we lose the dynamic range.”

Multicore: Fact vs. fiction
In any case, there are some parallels between the development of processors for PCs and mobile systems. The prevailing wisdom changed almost overnight in 2001, when Pat Gelsinger, then Intel’s chief technology officer, proclaimed that if chip scaling continued at its current pace, processors could reach the power density of the sun’s surface by 2015.

At the time, IBM, Intel, AMD and others were racing each other to develop faster microprocessors by boosting the clock frequencies. In those days, Intel claimed that its Pentium 4 processor would scale up to 10 GHz, but in reality, heat dissipation issues limited the clock speeds to only 3.8 GHz.

Then, a decade or so ago, Intel and others moved away from the “megahertz race,” focusing instead on multicore designs. That put far greater emphasis on core efficiency and power consumption, but multicore also caused a fundamental disruption in PC software. Applications had to be written in a concurrent, parallelized fashion to map the programs efficiently onto multiple processors. Even today, parallelism remains a challenge for software.

To some degree, history is repeating itself on the mobile processor front, at least according to ST-Ericsson. To make its case, the cell-phone chipmaker examined the performance of Apple’s iPhone 4S and 5. The 4S is based on Apple’s A5 application processor, which includes two 800-MHz Cortex-A9 cores from ARM. The iPhone 5 is based on Apple’s A6, which has two custom 1.3-GHz cores based on ARM’s technology.

The software performance of the iPhone 4S and 5 was benchmarked using Browsermark, Geekbench and SunSpider. The benchmarks were conducted by AnandTech, a hardware review Web site.

Drawing on AnandTech’s benchmark data, ST-Ericsson reached two conclusions. First, the dual-core processors in the iPhones ran below their theoretical peak performance. Neither iPhone’s processor showed signs of being “saturated” in terms of CPU efficiency and frequency, according to ST-Ericsson.

Second, the iPhone 5 ran faster than the 4S. Both phones use dual-core chips, so the gain has little to do with core count. ST-Ericsson instead attributes the faster speeds to “other hardware optimizations, such as an improved memory sub-system.”

As with the PC, the problem is that software performance does not scale in proportion to the number of cores in multicore mobile designs, according to the firm. The dual-core design in the iPhone also takes a toll on clock frequency, due to conflicts over shared resources in the system.

Now, the market is turning up the volume on quad-core. “There has been a lot of marketing about quad-core,” said ST-Ericsson’s Cornero, “but quad-core doesn’t bring a lot of benefits.”

In Web browsing, for example, a system can run 30% faster when moving from single- to dual-processor designs. But a system only shows a 0 to 11% improvement when moving from the dual- to quad-processor designs, according to the company.
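One way to see why the returns shrink so quickly is Amdahl’s law, which caps the speedup from adding cores by the fraction of the work that can actually run in parallel. The sketch below is an illustration only, not ST-Ericsson’s methodology: it assumes the 30% browsing gain is an ideal Amdahl speedup of 1.3x, backs out the implied parallel fraction, and predicts the dual-to-quad gain. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core under Amdahl's law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def fit_parallel_fraction(dual_speedup: float) -> float:
    """Solve amdahl_speedup(p, 2) == dual_speedup for p."""
    # 1 / (1 - p/2) = s  =>  p = 2 * (1 - 1/s)
    return 2.0 * (1.0 - 1.0 / dual_speedup)


# Assumption: the 30% single-to-dual gain is treated as an ideal 1.3x speedup.
p = fit_parallel_fraction(1.30)
dual = amdahl_speedup(p, 2)
quad = amdahl_speedup(p, 4)

print(f"implied parallel fraction: {p:.2f}")
print(f"predicted quad-over-dual gain: {quad / dual - 1:.1%}")
```

Under these assumptions the model implies that a bit under half of the browsing workload is parallel, and it predicts roughly an 18% gain from going to quad-core. That is already modest, yet it is still above the 0 to 11% range cited; one plausible reason is that the ideal model ignores contention for shared resources such as memory.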

Second opinion
Regarding the multicore debate, ARM has a different viewpoint. “With multicore, I spread the work over two cores,” ARM’s Goodacre said. “In aggregate, the number of instructions are about the same or potentially even less. Then, I can start playing a power game with the voltage. You can lower the frequency with the associated cores for that given workload.”
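Goodacre’s “power game” rests on the fact that CMOS switching power scales roughly with voltage squared times frequency. A minimal sketch, assuming a perfectly parallel workload, no leakage, and hypothetical numbers (1 nF of switched capacitance, 1.0 V at 1 GHz on one core versus 0.8 V at 500 MHz on each of two cores), compares the two setups at the same throughput.

```python
def dynamic_power(cap: float, volts: float, freq_hz: float, activity: float = 1.0) -> float:
    """Classic CMOS switching power: P = activity * C * V^2 * f (leakage ignored)."""
    return activity * cap * volts ** 2 * freq_hz


C = 1e-9  # hypothetical effective switched capacitance per core, in farads

# One core doing the whole workload at full speed.
single = dynamic_power(C, volts=1.0, freq_hz=1.0e9)

# Two cores splitting a perfectly parallel workload at half the clock each,
# assuming (hypothetically) the lower clock lets the supply drop to 0.8 V.
dual = 2 * dynamic_power(C, volts=0.8, freq_hz=0.5e9)

print(f"single core: {single:.2f} W")
print(f"dual core:   {dual:.2f} W ({dual / single:.0%} of single)")
```

The saving comes entirely from the lower voltage: at 1.0 V on both cores, the dual-core setup would draw the same power as the single core. That is why the available voltage range matters so much, and why Goodacre’s concern that finFETs flatten the power-efficiency curve bears directly on this approach.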

The real question is whether the software can take advantage of multicore devices and execute in parallel. “We’ve got applications that are single-threaded. They run perfectly well on a single thread,” he said. “The question is why do I need multiple cores if the existing workloads can work on one? What we’re really seeing today is a lot of individual sub-systems. This is where we can leverage dual- and quad-core. What that means is that everything runs faster and smoother.”

ARM refers to this as explicit concurrency. Multicore designs are likely to be required in two booming segments: gaming and social media. “In the user interface, scrolling up and down on Facebook can resource four CPUs flat out. Going back and forth on applications in Android is another one,” he said. “When we look at LTE bandwidths, those management threads are fairly significant in terms of IP traffic. So, you might have three dominant threads, plus management. So, four for the design phase is reasonable. One keeps the management in place. Another puts the network tasks in place. The user interface and the applications each require one.”

There are also tradeoffs on the process and transistor fronts. “Bulk, through voltage and frequency scaling, can get your power/performance ratio down plus or minus 50%,” Goodacre said. “FD-SOI has a specific body-bias technique, which allows it to stretch the voltage down even further while still delivering performance.”

When asked about the shift towards finFETs, he said: “The interesting thing with finFETs is that the steepness of the curve is greatly reduced. It’s much flatter. I don’t have much dynamic range in terms of power efficiency.”

In response to the dynamic range issues, ARM proposes a move towards its heterogeneous architecture, dubbed big.LITTLE. In January, Samsung rolled out an eight-core application processor based on the big.LITTLE architecture. The Exynos 5 Octa from Samsung consists of four Cortex-A15s to handle processing-intensive tasks, while four Cortex-A7s are used for lighter workloads.

“Why would I go to eight cores? Some threads would be much better off using a smaller processor. That would represent our ‘LITTLE’ core. In effect, we have about four to six times power efficiency difference between our ‘big’ and ‘LITTLE’ cores,” Goodacre said.
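The idea behind big.LITTLE can be illustrated with a toy placement policy: put a thread on a LITTLE core when its demand fits there, and on a big core otherwise. This is a hypothetical sketch, not ARM’s or Samsung’s scheduler. The capacity figure, the per-unit energy costs (a 5x gap, chosen from within the four-to-six-times range Goodacre quotes) and the thread loads, loosely modeled on the four threads he describes, are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical figures for illustration only.
LITTLE_CAPACITY = 0.4          # LITTLE core throughput, relative to one big core
BIG_ENERGY_PER_UNIT = 5.0      # energy per unit of work on a big core (arbitrary units)
LITTLE_ENERGY_PER_UNIT = 1.0   # assumed 5x more efficient than a big core


@dataclass
class Thread:
    name: str
    demand: float  # required throughput, relative to one big core


def place(thread: Thread) -> str:
    """Choose a LITTLE core if the thread's demand fits there, else a big core."""
    return "LITTLE" if thread.demand <= LITTLE_CAPACITY else "big"


def energy(thread: Thread, cluster: str) -> float:
    """Energy for one interval of work, using the assumed per-unit costs."""
    per_unit = LITTLE_ENERGY_PER_UNIT if cluster == "LITTLE" else BIG_ENERGY_PER_UNIT
    return thread.demand * per_unit


threads = [
    Thread("management", 0.1),
    Thread("network", 0.3),
    Thread("user interface", 0.9),
    Thread("application", 0.7),
]

mixed = sum(energy(t, place(t)) for t in threads)
all_big = sum(energy(t, "big") for t in threads)

for t in threads:
    print(f"{t.name:<15} -> {place(t)}")
print(f"big.LITTLE placement: {mixed:.1f} units vs. all-big: {all_big:.1f} units")
```

In this toy model the two light threads move to LITTLE cores, trimming total energy by 16%, while the demanding user-interface and application threads still get big cores. The 16% follows from the assumed numbers, not from measurements; a real scheduler must also account for migration cost and for threads whose demand changes over time.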

Is big.LITTLE best served by bulk or FD-SOI? “Does FD-SOI stretch the voltage further with big.LITTLE? This is yet to be confirmed until we measure it,” he said.

In any case, today’s dual-core architectures, combined with FD-SOI, are potentially a powerful combination. ST-Ericsson itself has rolled out an integrated, dual-core cell-phone chipset based on 28nm FD-SOI. The FD-SOI part is 30% faster than bulk devices, said Joel Hartmann, executive vice president of front-end manufacturing and process R&D at STMicroelectronics. “We have demonstrated a 50% power reduction,” he said.

One company has put a new twist on the FD-SOI debate. “FD-SOI is a simpler technology,” said Asen Asenov, chief executive of Gold Standard Simulations, a provider of simulation services. “There are fewer process steps than bulk.”

At 32nm/28nm, the statistical variability introduced by random discrete dopants is lower in FD-SOI than in bulk, Asenov said. With FD-SOI, threshold voltage variation is reduced by more than a factor of six and leakage by a factor of five, for almost equivalent drive current, he said.

STMicroelectronics’ FD-SOI technology is based on a gate-first process. With gate-first FD-SOI, chipmakers could reduce the supply voltage to below 0.7V. However, Gold Standard’s simulations revealed that gate-last FD-SOI at 28nm would enable a supply voltage below 0.5V.

“If you take metal-gate-first FD-SOI, and compare with 28nm bulk, FD-SOI will still have a lower variability. If you develop metal-gate-last, this will bring additional benefits,” Asenov said. “FD-SOI in general has a very good chance to deliver a low-power extension for 28nm. A lot of the infrastructure has been put in place. There is enough evidence to make the big fabless companies think very seriously about moving to FD-SOI.”