NVIDIA Tegra 3

NVIDIA showed us early Tegra 3 silicon at Mobile World Congress 2011. Back then, it was codenamed “Kal-El” (the real name of Superman), and the demonstration we saw hinted at cool products by “late 2011”. Fast forward nine months, and Tegra 3 is embedded in the most anticipated Android tablet of 2011, the ASUS Transformer Prime, while the rumor mill already mentions an HTC Edge powered by Tegra 3. Undoubtedly, you will hear about Tegra 3 a lot, so this overview is worth reading (and you may want to read about systems on a chip too).

Built for performance

At its very core, Tegra 3 has been designed to bring many times the performance of Tegra 2, within a comparable or lower power envelope. We’ll get back to the energy consumption soon, but let’s start with the performance.

NVIDIA’s specifications mention that Tegra 3 has twice the CPU performance and 3X the graphics performance of Tegra 2. To achieve this, four high-performance CPU cores have been integrated into the chip. They can work at different frequencies (up to 1.4GHz if a single core is active), and they should be able to shut down completely (thus draining little or no power) when not in use. Depending on the workload, one or all cores can be summoned, with the goal of getting the task done as soon as possible before returning to a state of (deep) sleep.

Parallel programming preferred

For developers, the biggest challenge is to find tasks that can be split into smaller chunks and sent independently to multiple cores at once. Things like photo processing or scientific computations are natural candidates, but tasks that are more sequential in nature can be hard, if not impossible, to split.
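As a minimal sketch of what "splitting into chunks" looks like in practice, the Python snippet below divides a list of pixel values into pieces and hands each piece to a worker. It is an illustration only: Android apps would use Java or native code, and the `brighten` operation, the function names and the default of four workers are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(chunk, amount=10):
    """Hypothetical per-pixel operation: add `amount`, clamp at 255."""
    return [min(255, p + amount) for p in chunk]

def split(data, parts):
    """Split `data` into at most `parts` contiguous chunks of similar size."""
    if not data:
        return []
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def brighten_parallel(pixels, workers=4):
    """Process each chunk independently, then stitch the results in order."""
    chunks = split(pixels, workers)
    # Threads show the structure; in CPython, swapping in
    # ProcessPoolExecutor is what actually spreads CPU-bound
    # work across several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(brighten, chunks)  # map() preserves chunk order
    return [p for chunk in results for p in chunk]
```

This works because each pixel can be computed without looking at its neighbors. A task where every step depends on the previous result offers no such seams, which is why sequential work is so hard to spread across cores.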

Can Tegra 3 tablets run as fast as an Intel Core 2 Duo PC? We'll have to see it for ourselves

PC-class CPU? In fact, NVIDIA compares the CPU speed of Tegra 3 with the Intel Core 2 Duo T7200 (2GHz, 667MHz bus). I believe that NVIDIA’s benchmarks show that Tegra 3 can crunch numbers as fast as the T7200 in a synthetic test, but we have yet to see a mobile SoC power a computer-like setup with ease. The perceived user experience requires more than pure math computations and involves the whole system, including data transport and storage.

Power efficiency

Of course, when you hear “quad-core”, it is logical to worry about battery life. It’s been known for a while that virtually every chip maker has committed to a certain power envelope/budget because battery capacity isn’t improving as fast as performance. Even better, NVIDIA says that Tegra 3 can use “up to” 61% less power than Tegra 2.

As mentioned above, the fact that cores can be shut down when they are not needed helps a lot. The frequency of each core can also be tweaked automatically to find the best ratio between power and performance. It’s great, but there’s something even more radical in Tegra 3’s design…

A 5th “companion” core

The five ARM A9 cores (yellow, center)

To give you some context, cores that are optimized for absolute performance tend to be less power-efficient under a low-intensity workload than cores optimized for the lowest possible power. On the other hand, cores optimized for low power don’t perform as well when computing demand is high. The problem is: we want high performance AND low energy usage.

This graph provided by NVIDIA explains the power vs. performance conundrum

To solve this problem, NVIDIA has decided to include a 5th “companion core”. This is the true secret to Tegra 3’s power efficiency. It is optimized for ultra-low power, and it is the one taking care of all the “boring” (but important!) tasks like keeping the OS running, checking email and so on. In fact, this is the core that will be online the most often, simply because your phone spends most of its time… in your pocket/purse. Also, you may think of HD video playback as a demanding task, but it isn’t anymore: the companion core and a special video decode unit can handle that without waking up the faster cores.

Because this companion core is optimized for low power, NVIDIA doesn’t want it to handle heavy workloads, or it would start consuming too much. To prevent that, its frequency has been limited to a range of 0 to 0.5GHz. Whenever the companion core is overwhelmed by work, one or several high-performance cores wake up and pick up the work. This is NVIDIA’s definition and implementation of Variable Symmetric Multiprocessing (vSMP), which it has patented.

Automatic Core Switching

This graph shows the different combinations of ON/OFF cores

In its paper, NVIDIA says that the operating system (Android 3.0, aka Honeycomb) assumes that all CPU cores in the chip are identical instances, which is not true in this case. Therefore, special management had to be devised at both the hardware level and the software level to make this heterogeneous group of cores completely transparent to the OS.

Cores are switched ON and OFF depending on a real-time analysis of the workload, as the diagram above shows. The only “limitation” seems to be that the companion core cannot be active while any of cores 1-4 are. NVIDIA says that not allowing the companion core and the high-performance cores to run at the same time simplifies cache memory management and avoids performance penalties that would have hindered the high-performance cores.
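The switching rules described in this section can be summarized in a toy model. This is only a sketch: the load unit, the `COMPANION_LIMIT` threshold, the function name and the return format are assumptions made for illustration, and NVIDIA's real logic lives in hardware and low-level software. The two rules taken from the description above are that light work stays on the companion core, and that the companion core and the main cores are never on at the same time.

```python
import math

MAIN_CORES = 4           # high-performance cores, from the article
COMPANION_LIMIT = 0.25   # hypothetical: load the companion core can absorb,
                         # expressed as a fraction of one main core

def select_cores(load):
    """Toy vSMP policy.

    `load` is the current demand, where 1.0 means "one main core fully
    busy" (an invented unit). Returns which cores would be powered on.
    """
    if load < 0:
        raise ValueError("load must be non-negative")
    if load <= COMPANION_LIMIT:
        # Light work: the companion core runs alone, main cores stay off.
        return {"companion": True, "main": 0}
    # Heavier work: hand over to as many main cores as needed (max 4)
    # and switch the companion core off, so the two never overlap.
    return {"companion": False, "main": min(MAIN_CORES, math.ceil(load))}
```

For example, `select_cores(0.1)` keeps only the companion core on, while `select_cores(2.3)` turns it off and wakes three main cores. No input ever produces a state with both kinds of core active.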

Making this transparent to the OS is very important for many reasons, but for end-users, it means that OS updates don’t have to wait for NVIDIA to tweak some code.

And the best part in all of this is that Android apps don’t need to be modified. Everything is automatic, and apps can run “as is”.

vSMP Power benefits

This graph provided by NVIDIA shows power consumption relative to Tegra 2

Logically, Tegra 3 shows power benefits even when compared to the current generation Tegra 2 processor. According to NVIDIA, that is true during sleep state (LP0), media playback and even gaming. The graph above shows the power savings that NVIDIA has seen during its own tests. Remember that this shows only the power saved at the chip level, not at the system (including display) level.

NVIDIA also provides perf/Watt comparisons with other high-profile chips that are on the market such as the OMAP4 and the Qualcomm QC8660. Note that NVIDIA is using Coremark, a well-known benchmark that is very multi-core friendly (performance is more or less expected to scale with the number of cores). A quad-core Tegra 3 chip won’t have any difficulties winning the absolute score, but I find it very interesting to see that at comparable performance, Tegra 3 can consume only 1/3 of the electric power.
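To make the perf/Watt arithmetic concrete, here is a small sketch. The scores and wattages in the usage note are placeholders chosen to mirror the "same performance, one third of the power" case, not measured values, and the function name is invented for this example.

```python
def efficiency_ratio(score_a, watts_a, score_b, watts_b):
    """How many times better chip A's perf/Watt is than chip B's.

    perf/Watt is score divided by watts; the ratio of the two is
    (score_a / watts_a) / (score_b / watts_b), rearranged here to
    avoid dividing twice.
    """
    if min(score_a, watts_a, score_b, watts_b) <= 0:
        raise ValueError("scores and wattages must be positive")
    return (score_a * watts_b) / (score_b * watts_a)
```

With two chips posting the same placeholder score of 1000, one at 1 W and the other at 3 W, `efficiency_ratio(1000, 1.0, 1000, 3.0)` comes out to 3.0: equal performance at a third of the power is a 3X advantage in perf/Watt.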

Memory bandwidth: it’s never enough

As you have seen, a system on a chip (SoC) can be incredibly complex, and with so many components working on so many things at the same time, it is easy to hit yet another barrier: bandwidth. Computing performance requires a lot of data, and it doesn’t matter how fast the CPU cores are if they have to wait for packets of data to process. Hence, the real question is: “how fast can you move it”?

To provide higher bandwidth, Tegra 3 can use the super-fast DDR3L-1500, or the older LPDDR2-1066. The frequency is a bit higher, but all in all, it’s not *that* different from Tegra 2. Given the increase in CPU cores and pixel processors, I am a bit concerned that the bandwidth may become a limiting factor at some point…
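For a rough sense of scale, theoretical peak memory bandwidth is the transfer rate multiplied by the width of the memory bus. The sketch below assumes a single 32-bit memory interface, which is not stated in this article, and it ignores real-world overheads, so treat the results as upper bounds rather than measurements.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=32):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    `mega_transfers_per_s` is the number in the memory's name
    (e.g. 1500 for DDR3L-1500). The 32-bit default bus width is an
    assumption made for this example.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

ddr3l = peak_bandwidth_gb_s(1500)    # DDR3L-1500
lpddr2 = peak_bandwidth_gb_s(1066)   # LPDDR2-1066
```

Under that 32-bit assumption, DDR3L-1500 works out to 6.0 GB/s and LPDDR2-1066 to roughly 4.3 GB/s, and that budget is shared by the CPU cores, the GPU, the video units and the display.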

3X the graphics performance

The graphics unit of Tegra 3 has been created using the same building blocks as Tegra 2. However, it has received 50% more pixel computing units and runs at a “much higher frequency”, says NVIDIA, which is keeping the actual number under wraps for now.

With the extra pixel processing power, full-screen effects like this Motion Blur are possible

In ShadowGun for Tegra 3, the water can use a lot more geometry and physics to feature ripples

Automatic stereo 3D: it is clear that most of today’s games aren’t using the full potential of the hardware, so NVIDIA has introduced an OpenGL 3D driver that can convert any app to stereo 3D. This is something that is branded as NVIDIA 3D Vision in the PC world. It’s interesting to see how years ago, a few driver engineers at NVIDIA started what would become NVIDIA 3D Vision.

WebGL: WebGL may not be the sexiest use of OpenGL, but the web is a killer app, and companies like Google will do everything they can to move us into a browser, so you may be interested to know that WebGL is hardware accelerated with Tegra 3.

40Mbps video decode: This is actually outside of the GPU, but I’ll mention it while we’re talking about pixels. Tegra 2 was limited to a 5Mbps bitrate during 1080p MPEG-4 video playback. Tegra 3 can handle 40Mbps Blu-ray streams, and we’ve been told that 60Mbps is the actual limit (probably for variable bitrates). This means that Tegra 3 can now be integrated into set-top boxes (I think that Boxee didn’t end up using Tegra 2 because of this).

2X performance boost for the camera processor: this is also outside of the GPU, but NVIDIA has told us that the image signal processor used for the camera is now twice as fast. This may have many implications, but off the top of my head, I can think of continuous auto-focus, burst shots, better panoramas.

Software has matured

In any hardware endeavor, the software is often a critical aspect of the project. First of all, it communicates with the OS and applications, and software can even “fix” minor chip design issues by using workarounds. NVIDIA probably has about 2X more software engineers than hardware engineers.

Because Tegra 3 has been in NVIDIA’s labs for so long, and because the underlying shader architecture is similar to Tegra 2, software engineers have had enough time to write a mature graphics driver. We will have to see how much further they can push it, but I suspect that sometime in Q1 2012, a small group will start branching out to “Wayne”, NVIDIA’s next Tegra chip.

The NVIDIA Glowball 2 demo uses per-pixel shading profusely

Games: at launch time, NVIDIA expects to have 15 games optimized for Tegra 3 (it usually means adding special features), and while some games won’t be exclusive to Tegra 3, they will be released on Tegra hardware first, most likely because they have been developed and tested with Tegra hardware.

Support for controllers: beyond the performance, the Tegra 3 software can also handle existing controllers (PS3, Xbox, Wii, Logitech etc…). This is a great way to play with a real controller, and this makes it possible to use the tablet as a small console connected to the TV. Some work is required by game developers, but games featured in TegraZone should be compatible.

10-foot interface: NVIDIA has also added a way to use the controllers to navigate a user interface optimized for the big screen. I have not seen a demo yet, but I’m curious to see what this is going to look like.

Android 4.0: I’ve kept the best for last: when asked about Android Ice Cream Sandwich (4.0) support, NVIDIA said that we should see such devices “very quickly”, which suggests that ICS support is under control. NVIDIA typically cannot show or announce anything without the consent of its customers and partners.

Conclusion

NVIDIA was serious when it showed its multi-year roadmap earlier in 2011, and so far, it has executed on it. Architecturally, Tegra 3 is very interesting in the sense that it tries to address both extreme performance and extremely low energy use. It will make a remarkable entrance on the market by shipping in the ASUS Transformer Prime, which is the best (and probably the last) Android tablet of the year. Now, Tegra 3 needs to pass the real-world test of independent reviews and benchmarks. We can’t wait to see some real-world results.