NVIDIA details Variable SMP, the brain of quad core mobile computing

The folks at NVIDIA are coming out with a quad core processor for mobile devices this year, and they're not joking about it happening sooner rather than later. We've just seen a Windows 8 tablet said to be running on the SoC already, and we're pretty much betting the farm on there being an Android tablet and/or smartphone with the new CPU before the end of 2011. Today NVIDIA takes us on a short tour through vSMP, or Variable Symmetric Multiprocessing, the technology which makes Kal-El work as well as it does. With the details of this tech comes a bombshell: Project Kal-El will have a fifth CPU core, called the "Companion" core, which will handle low frequency tasks in the background.

Invisible Companion

Each of the five (count them: five, not four) cores is an architecturally identical ARM Cortex A9 CPU, and each is enabled and disabled via aggressive power gating based on workload (aka when they're needed, they're fired up). Unlike current Asynchronous SMP architectures, the Companion core is OS transparent. This means that the operating system and its applications are not aware of the fifth core, but automatically take advantage of it - saving both software effort and new coding requirements.

Active Standby mode

While you are walking to the bus, crossing a street, driving, or eating cereal in the morning, one must assume that you're not on your smartphone. While you are not on your smartphone, processes are still running - this is what's called "Active Standby" in this report by NVIDIA. Current processors all use one or more of their main cores for the tasks running at this time, such as syncing and searching for Wi-Fi networks. Kal-El instead hands them to its Companion core, which works at lower frequencies and saves power in a giant way.

Power Consumption

One of the most important points to a consumer in a smartphone or tablet is how long it'll last without needing to get charged up. For this answer you must dig deep into the silicon, where the power consumption of a device is the sum of leakage power and dynamic power. Leakage power is determined by the silicon process technology used, while dynamic power is determined by the silicon process technology and by operating voltages and frequencies.

To find the dynamic power piece of this equation, one must know that it is proportional to the operating frequency and also proportional to the square of the operating voltage. So you see:

Total Power = Leakage Power + Dynamic Power
and
Dynamic Power ∝ Frequency × Voltage^2

At play, a device's power consumption is based mostly on dynamic power consumption, while at rest, power consumption is mostly based on leakage power.
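
To make that relationship concrete, here is a minimal Python sketch of the power model as described. The function names, the proportionality constant `k`, and every number in it are illustrative assumptions, not NVIDIA figures.

```python
def dynamic_power(frequency_hz, voltage_v, k=1.0e-9):
    """Dynamic power is proportional to frequency times voltage squared.

    k is a hypothetical proportionality constant, not a measured value.
    """
    return k * frequency_hz * voltage_v ** 2


def total_power(leakage_w, frequency_hz, voltage_v, k=1.0e-9):
    """Total power = leakage power + dynamic power."""
    return leakage_w + dynamic_power(frequency_hz, voltage_v, k)


# Doubling frequency at a fixed voltage doubles dynamic power...
base = dynamic_power(500e6, 1.0)
double_freq = dynamic_power(1000e6, 1.0)
# ...while a 20% higher voltage at the same frequency costs about 44% more,
# because 1.2 squared is 1.44.
higher_volt = dynamic_power(500e6, 1.2)
```

The squared voltage term is why running at a lower voltage pays off so well, and why a core that sits idle still costs you its leakage power.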

So the difference, again, between the four main cores and the Companion core on Project Kal-El is the silicon process technology on which each was built: the fifth CPU was built on a low power process technology while the other four were built on a high power process technology. This results in two different monsters:

NVIDIA’s Project Kal-El is the world’s first mobile SoC device to implement a patented Variable Symmetric Multiprocessing (vSMP) technology that not only minimizes active standby state power consumption, but also delivers on-demand maximum quad core performance. In addition to four main Cortex A9 high-performance CPU cores, Kal-El has a fifth low power, low leakage Cortex A9 CPU core called the ‘Companion’ CPU core that is optimized to minimize active standby state power consumption, and handle less demanding processing tasks.

Two more technologies used in NVIDIA's Project Kal-El to intelligently manage workload distribution between the five cores are DVFS (Dynamic Voltage and Frequency Scaling) and NVIDIA's CPU Hot-Plug management software. These technologies work based on application and operating system requirements, and do not require any sort of special mods to the operating system to work.

Companion Core

The Companion core has the same internal architecture as the rest of the Cortex A9 CPU cores but is designed on a low power process technology, which caps the core at 500 MHz. This core also delivers higher performance per watt than the rest of the cores at operating frequencies below 500 MHz. Different cores for different chores.

Activating Cores

As the four main CPU cores are made to scale up to high operating frequencies at lower operating voltage ranges, they're able to deliver high performance without major dynamic power consumption. As higher performance is needed, more power is needed - and while that cost climbs at a fast rate with the Companion core, it increases relatively gradually with the main cores. You can see this situation taking place in the chart below. Once you need a certain amount of performance, your Companion core is shut off and your main cores are turned on, they being the main guns until max quad core performance is reached.

A general core activation chart has been whipped up here by NVIDIA to show how the CPU Governor and CPU management logic works in general with basic examples. As more demanding apps and processes are activated, more cores are turned on. Simple stuff, really.
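
For a rough feel of how that decision could go, here is a hedged Python sketch. It is not NVIDIA's CPU governor: the function name, the two input metrics, and the `main_max_mhz` ceiling are all hypothetical. Only the 500 MHz Companion cap, the four main cores, and the rule that the Companion and main cores never run together come from NVIDIA's description.

```python
COMPANION_MAX_MHZ = 500  # Companion core frequency cap, per NVIDIA


def select_cores(required_mhz, busy_threads, main_max_mhz=1000):
    """Pick which cores to power on for a given demand.

    required_mhz: clock speed the workload needs (hypothetical metric)
    busy_threads: number of runnable threads (hypothetical metric)
    main_max_mhz: assumed main core ceiling, purely illustrative
    Returns a (core_type, core_count, clock_mhz) tuple.
    """
    if busy_threads <= 1 and required_mhz <= COMPANION_MAX_MHZ:
        # Light, single-threaded work: Companion only, main cores gated off.
        return ("companion", 1, required_mhz)
    # Otherwise the Companion is switched off and 1 to 4 main cores come up.
    count = max(1, min(4, busy_threads))
    # Every active main core runs at the same clock (synchronous operation).
    return ("main", count, min(required_mhz, main_max_mhz))
```

With these made-up inputs, `select_cores(300, 1)` stays on the Companion core, while `select_cores(900, 3)` wakes three main cores at a shared clock.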

Architectural Advantage

Caching information is made easy by the fact that the Companion core and the main cores are never active at the same time, which allows the system to use the same L2 cache for all of them. The cache is programmed to return data to both the Companion and the main cores in the same number of nanoseconds, though more main core cycles end up taking place than Companion core cycles because the main cores run at a higher frequency.
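
The nanoseconds-versus-cycles point is simple arithmetic, sketched here in Python. The 10 ns latency and the 1000 MHz main core clock are made-up values for illustration only; the 500 MHz figure is the Companion core's cap.

```python
def latency_in_cycles(latency_ns, core_mhz):
    """Convert a fixed latency in nanoseconds into core clock cycles."""
    return latency_ns * core_mhz / 1000.0


l2_latency_ns = 10.0  # hypothetical L2 latency, not an NVIDIA figure

# Same wall-clock wait for both core types, different cycle counts.
companion_cycles = latency_in_cycles(l2_latency_ns, 500)   # 5.0 cycles
main_cycles = latency_in_cycles(l2_latency_ns, 1000)       # 10.0 cycles
```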

vSMP technology maintains a synchronous operating frequency on all active cores so that CPU management logic gets a completely seamless transition, not visible to the user in the least. CPU cores running at asynchronous frequencies run the risk of slow transitions and other OS scheduling inefficiencies - don't get caught in the bind!

In contrast with the whoever-needs-it model vSMP uses, other multi-core approaches to saving power share a single voltage rail across all cores, which forces every core to run at the voltage required by the fastest core. vSMP does away with this by power gating each core individually. Don't need the power? Don't use it!
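
The cost of a shared voltage rail follows from the dynamic power relation given earlier, where power scales with the square of the voltage. The Python sketch below is only an illustration of that arithmetic: the function name and the voltages are hypothetical, and it models the shared-rail designs described above, not vSMP itself.

```python
def shared_rail_penalty(needed_voltages):
    """Return, per core, how many times more dynamic power it burns on a
    shared rail than it would at the voltage it actually needs.

    needed_voltages: per-core voltage requirements (hypothetical values).
    The rail must be set to the highest of them, and at a fixed frequency
    dynamic power scales with voltage squared.
    """
    rail = max(needed_voltages)
    return [(rail / v) ** 2 for v in needed_voltages]


# One fast core needing 1.0 V and one slow core needing only 0.8 V.
penalties = shared_rail_penalty([1.0, 0.8])
# The fast core sets the rail, so it pays no penalty (factor 1.0).
# The slow core burns roughly 1.56 times its ideal dynamic power.
```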

NVIDIA has implemented advanced circuits and logic as well to enable high speed switching efficiency. The NVIDIA team reports that switching time, including the time to switch cores within the chip and the time to completely stabilize voltage rails (for the activated core or cores), is less than 2 milliseconds - far too brief for the average human to notice.

Power Advantage

The vSMP technology included in NVIDIA Project Kal-El enables and disables CPU cores on the fly depending on what's needed to attain user goals. NVIDIA has provided a chart of "power savings" as calculated between their original NVIDIA Tegra 2 dual-core processor and the brand new quad-core NVIDIA Project Kal-El. I think you'll see the superiority instantly. NVIDIA adds: "Power measured as a sum of application processor power and DRAM power after normalizing for other system variables. LP0 is the lowest power state of the respective Tegra devices."

NVIDIA reminds us that what makes the quad-core NVIDIA Project Kal-El processor so great is that even though there are more cores than ever, there's also less power consumed than ever. Project Kal-El (shown twice, at two different core clock speeds) is compared against a couple of competitor cores. NVIDIA takes contestant number one and body slams it, then takes contestant two and suplexes it. Destruction!

You'll also notice some CoreMark scores in there. These are also shown in the two charts below, with power and performance compared and scored galore.