
Late last night at CES 2014, Nvidia surprised everyone by revealing that its in-house 64-bit ARM core, Project Denver, would arrive this year in the Tegra K1 (Tegra 5/Logan), rather than next year in Tegra 6 (Parker). The Tegra K1 will ship in two flavors: First there’ll be a fairly standard K1 with a quad-core (4+1) Cortex-A15 CPU (just like Tegra 4), but then in the second half of 2014 there’ll be a K1 that features a dual-core 64-bit ARMv8 Denver CPU. Aside from the CPU, and the larger caches on the 64-bit version, the two variants of the K1 SoC appear to be identical — they both have the monstrous 192-core (1 SMX) Kepler GPU. While the GPU is exciting, and we’ll be discussing its ramifications in due course, it is Nvidia’s 64-bit Denver core that we’ll be looking at in this story.

ARMv8, 7-way superscalar, up to 2.5GHz

Nvidia first mentioned Project Denver way back in 2011, also at CES. At the time, Nvidia teased Denver as some kind of super core that would revolutionize PCs and servers — but curiously, not smartphones and tablets. It would now seem that mobile devices are back on the table, but that will very much depend on how much power the Denver cores suck down (more on that later). The Denver cores (and the rest of the SoC) are fabricated on TSMC’s 28nm HPM process and will be clocked at up to 2.5GHz. It sounds like both cores will share 128KB of L1 instruction cache and 64KB of L1 data cache.

So far, so good. Much more interesting than clock speeds and caches, though, is Denver’s support for the 64-bit ARMv8 instruction set and an insanely wide “7-way superscalar” architecture. Superscalar, in computing terms, is a kind of CPU architecture that allows for instruction-level parallelism — that is, it can carry out multiple instructions in a single clock cycle. A simple superscalar processor might be capable of fetching and decoding two instructions per clock cycle. To do this, the processor needs to have multiple units that are capable of fetching/decoding/executing/etc simultaneously.
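The limits of that instruction-level parallelism are easy to see in a toy model. The sketch below is a hypothetical, heavily simplified in-order issue model (invented for illustration — real hardware also deals with execution latencies, structural hazards, and register renaming): each cycle the core issues up to `width` instructions, stopping early if an instruction reads a register written earlier in the same cycle’s group.

```python
# Toy model of in-order superscalar issue. An instruction is a tuple
# (dest_register, [source_registers]). Each cycle we issue up to
# `width` instructions, but stop at the first read-after-write hazard
# within the group. Purely illustrative; not any real core's logic.

def cycles_to_issue(instructions, width):
    cycles = 0
    i = 0
    while i < len(instructions):
        issued_dests = set()
        slots = 0
        while i < len(instructions) and slots < width:
            dest, srcs = instructions[i]
            if any(s in issued_dests for s in srcs):
                break  # RAW hazard inside the group: wait a cycle
            issued_dests.add(dest)
            slots += 1
            i += 1
        cycles += 1
    return cycles

# Four independent adds: a 1-wide core needs 4 cycles, a 4-wide core 1.
independent = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
# A fully dependent chain gains nothing from the extra width.
chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]

print(cycles_to_issue(independent, 1))  # 4
print(cycles_to_issue(independent, 4))  # 1
print(cycles_to_issue(chain, 4))        # 4
```

The dependent chain is the crux of the skepticism about very wide designs: extra issue slots only pay off when the instruction stream actually contains that many independent operations.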

When Nvidia says that each Denver core is 7-way superscalar, it means that each core has the hardware resources to execute up to seven instructions per clock cycle. Nvidia hasn’t said exactly what those hardware resources are (if it can decode seven instructions per cycle we’d be stunned), but it’s pretty clear at this point that Team Green has built an absolutely monstrous chip that should be capable of impressive performance. Maybe Nvidia’s claim that Denver is a “Super Core” isn’t just marketing fluff?

Such performance comes at a cost, though — both in terms of power consumption and die size. We don’t have an exact die size yet, but the Denver core is going to be huge. Considering the two Tegra K1 variants are going to be pin-compatible, and going by the slides published by Nvidia, a Denver core is 2x the size of a Cortex-A15 core — which itself is 3-4x larger than a Cortex-A9 core. Add 192 GPU cores, a memory controller, and all the other bits and pieces, and the Tegra K1 is going to be a very large chip. In terms of power consumption, seven-way instruction-level parallelism is going to be very costly.

Tegra K1 die shot (stylized). This is the Cortex-A15 version (4+1 cores), but it’s so pretty that we’re including it anyway.

Is Denver the reincarnation of Nvidia’s x86 efforts?

If all that wasn’t exciting enough, there’s an interesting theory — proposed by Charlie Demerjian and seconded by AnandTech — that Denver is actually a reincarnation of Nvidia’s plans to build an x86 CPU, which was ongoing in the mid-2000s but never made it to market. To get around x86 licensing issues, Nvidia’s chip would essentially use a software abstraction layer to catch incoming x86 machine code (from the operating system and your apps) and convert/morph it into instructions that can be understood by the underlying hardware. This isn’t an entirely new idea: Transmeta tried and failed at it with its Crusoe and Efficeon CPUs.

Transmeta’s Crusoe CPU. RIP.

In this case, of course, the abstraction layer would catch ARMv8 machine code, rather than x86. Furthermore, if you take that abstraction layer and insert a lot of scheduling and parallelism intelligence, you can correspondingly simplify the hardware, which reduces the die size and power consumption. The 7-way superscalar pipeline would also make more sense in such a setup.
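The core idea of such a code-morphing layer — translate once, cache the result, replay the cached host code thereafter — can be sketched in a few lines. Everything below is invented for illustration (the instruction format, the function names, the use of Python closures as stand-in “host micro-ops”); it is emphatically not Nvidia’s or Transmeta’s actual design:

```python
# Hypothetical sketch of code morphing: guest instructions are
# translated into host micro-ops on first sight, cached, and replayed
# from the cache on later executions, amortizing the translation cost.
# A real translator would also reschedule and optimize the block here.

translation_cache = {}

def translate(block):
    """'Morph' a tuple of guest ops into host micro-ops (closures)."""
    host_ops = []
    for op, dst, src in block:
        if op == "mov":
            host_ops.append(lambda regs, d=dst, s=src: regs.__setitem__(d, regs[s]))
        elif op == "add":
            host_ops.append(lambda regs, d=dst, s=src: regs.__setitem__(d, regs[d] + regs[s]))
    return host_ops

def execute(block, regs):
    if block not in translation_cache:          # translate on first sight
        translation_cache[block] = translate(block)
    for micro_op in translation_cache[block]:   # replay cached host code
        micro_op(regs)
    return regs

regs = {"r0": 1, "r1": 2}
execute((("mov", "r2", "r0"), ("add", "r2", "r1")), regs)
print(regs["r2"])  # 3
```

The point of the cache is that hot code — loops, in particular — is translated once but executed millions of times, which is why the translation step can afford to spend effort on scheduling and extracting parallelism.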

If Denver really is a funky code-morphing/emulating CPU, then the 64-bit version of the Tegra K1 could be a very interesting chip indeed. Given the size of the die and the (expected) complexity of the Denver core, Nvidia will have to do something truly magical (such as a really efficient abstraction layer) to make it fit into a smartphone or tablet’s power envelope. In reality, as Nvidia hasn’t yet specified what market it will target with the Denver-powered Tegra K1, the company itself is probably still carrying out lots of testing and optimization to work out whether it’s a mobile chip or a server chip.

The Denver-based Tegra K1 is expected to hit the market in the second half of 2014. Let’s hope it’s as exciting as it sounds, and not just another high-performance power hog — it’s easy to build those.


Comments

massau

i don’t really see how a 7-way dual core can be faster than just a quad core. ok, it might rule in ST workloads and the gpu units can do the parallel work, but an instruction-level parallelism of 7 is hard to find. They also seem to forget that gpu programming, even with OpenCL, is hard to do (it’s really verbose, so i’m going to try to make some wrappers).

Hi

The idea of making it superscalar is that programmers don’t have to do any extra work; the chip can just run 7 instructions at a time. With dual core this means there are 14 instructions in flight, rather than 4 with a quad core.

massau

you mean 12 instructions for the quad, but if there is a branch within those 7 instructions it will not work at all (branch prediction can help).
maybe it has SMT, that might help, but i’m rather sure that 4 cores at 3-way is more efficient than one 7-way processor. if it wasn’t, then everyone would have used more instruction-parallel cores. also, instead of comparing the A15 to their Denver cores, they should compare against the A57, which is also 64-bit.

Shaun Walsh

Depends on the quad core, simple quad cores can only handle 1 instruction at a time per core

Maybe some older ARM multi-cores were, but i doubt it. if you know an example i would like to hear it.

Steve

Superscalar is almost the alternative to Intel’s hyper-threading. Nvidia tried doing some heavy work on it in the mid-2000s using x86 hardware but had some licensing issues.

massau

there is still a difference between superscalar (doing multiple instructions in parallel, like VLIW) and hyper-threading aka SMT, which is running multiple threads on one core to fill in the pipeline bubbles created by things like branches.

the ARM cores and most advanced processor cores are already superscalar but don’t use SMT.

deltatux

Superscalar is NOT an alternative. All modern processor designs are superscalar. It simply means that the CPU can do more than one operation per clock cycle. The wider the superscalar design, the more the CPU can do in parallel. This doesn’t mean it can run more threads, but that it can perform more micro-ops per cycle.

Intel’s HyperThreading is just the fancy marketing name for their implementation of “Simultaneous MultiThreading” (SMT for short).

Dozerman

Please, please, please be true. I was hoping for a while that someone would do this. We need more competition in the x86 market, and Nvidia has the resources to make it happen. If we can get something running along current Atom/Kabini levels, as unlikely as that is, it would be a monumental achievement. Here’s to hoping that Nvidia also puts aside any malice against AMD and includes HSA support in the future, too.

Steve

I know, right? I really hope it turns out to be true too. I think it will turn out well over time, but I don’t think it will be perfect until the next iteration. They probably sacrificed some efficiency in the abstraction-layer code to hit the release date. They’re probably trying to keep a lot of their investors satisfied for the next 8 months.

AaronM

Concerning the hope that it’s not a power hog: NVIDIA has been leading the way in supercomputing by reducing power, which is the current limiting factor. NVIDIA is also beating AMD in the GPU power game. I would expect that NVIDIA has developed the Denver Tegra K1 along the same lines to reduce power consumption.
