Pascal technology will have as many as 17 billion transistors under the bonnet, Fudzilla can exclusively reveal.

Pascal is the successor to the Maxwell Titan X GM200 and we have been tipped off by some reliable sources that it will have more than double the number of transistors. The huge increase comes from Pascal's 16nm FinFET process, whose transistor size is close to two times smaller.

Nvidia and AMD are making their GPUs at TSMC and the Taiwanese foundry has announced 16nm FinFET production runs. Intel and Samsung / GlobalFoundries call their process 14nm. Our sources told us the branding depends on which side of the transistor you look at, the longer or the shorter. The size of the gate is almost identical for both the 16nm and 14nm processes.

Pascal has 17 billion transistors and it will be significantly smaller silicon than the Maxwell 28nm based GM200.

Nvidia will use second generation HBM for its Pascal GPU to get to 32GB on the highest end card. This is 2.7 times more than the already impressive 12GB used on Titan X. The second generation HBM, or HBM 2.0, will enable 8Gb per DRAM die, 2Gbps speed per pin and 256GB per second of bandwidth per stack.

The first generation offers 2Gb density per DRAM die, 1Gbps speed per pin, 128GB per second of bandwidth per stack and a maximum of 4-Hi stacks, with 4GB per HBM card. You saw this with Fiji cards.

HBM2 enables cards with four HBM 2.0 chips at 4GB per chip, or four HBM 2.0 chips at 8GB per chip, resulting in 16GB and 32GB respectively. Pascal has the power to do both, depending on the SKU.
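The capacity figures above follow from simple multiplication, which can be sketched as below. This is a rough illustration only: the function names are ours, and we assume that the 4GB and 8GB HBM2 chips correspond to 4-Hi and 8-Hi stacks of 8Gb dies.

```python
# Rough capacity arithmetic for HBM cards, using the figures quoted in the text.
# Function names are illustrative, not part of any vendor tool or API.

def stack_capacity_gb(die_density_gbit, dies_per_stack):
    """Capacity of one HBM chip (stack) in gigabytes; 8 gigabits = 1 gigabyte."""
    return die_density_gbit * dies_per_stack / 8


def card_capacity_gb(die_density_gbit, dies_per_stack, stacks=4):
    """Total card capacity, assuming `stacks` identical HBM chips on the package."""
    return stack_capacity_gb(die_density_gbit, dies_per_stack) * stacks


# HBM1: 2Gb dies, 4-Hi, four chips per card
hbm1_card = card_capacity_gb(2, 4)
# HBM2: 8Gb dies, assumed 4-Hi (4GB per chip) and 8-Hi (8GB per chip)
hbm2_card_4hi = card_capacity_gb(8, 4)
hbm2_card_8hi = card_capacity_gb(8, 8)

print(f"HBM1, 4 x 4-Hi: {hbm1_card:.0f}GB")
print(f"HBM2, 4 x 4-Hi: {hbm2_card_4hi:.0f}GB")
print(f"HBM2, 4 x 8-Hi: {hbm2_card_8hi:.0f}GB")
```

With four chips per card this gives 4GB for first generation HBM, and 16GB or 32GB for HBM2.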

The GPU looks great but it is coming in 2016, and not before. After Pascal comes the Volta GPU, but that will take a few years.

Nvidia has reportedly successfully taped out Pascal, its next generation GPU based on a new FinFET node, with new memory to boot.

Pascal, which is apparently codenamed GP100, is Nvidia’s first FinFET design and will be manufactured on TSMC’s 16nm node, although we are not entirely sure which one. 3DCenter.org reports that the design has taped out, but details are sketchy.

It is also the first Nvidia GPU to use second-generation High Bandwidth Memory (HBM2). Nvidia is skipping HBM1, which will be used on AMD’s upcoming Fiji GPU. However, AMD’s decision to proceed with an HBM1 design will give it a head start, since Fiji is about to launch in a week or so, while Pascal is still a couple of quarters away.

The biggest downside to HBM1 is the limited amount of memory that can be integrated in the design, which is why AMD’s flagship Fiji VR card is expected to ship with two GPUs and a total of 8GB of HBM1, 4GB per GPU.

Nvidia’s Pascal on the other hand could address as much as 32GB, although it’s simply not necessary. It might be a possibility on some professional graphics cards, or compute cards, but it would be overkill for gamers.

Greenland is the successor of Fiji, and we don’t think it will be a radically new core. The main goal for Greenland is to bring more performance per watt to AMD GPUs in 2016.

AMD's highly anticipated Fiji GPU is a 28nm design, and so are the GM200 based GPUs that ended up in Nvidia's latest Titan X cards. The Geforce GTX 980, based on the GM204 GPU, is also manufactured on the 28nm node at TSMC. Back in November, we said there was simply no place for 20nm GPUs in 2015. The yields are horrible and this is one of the reasons why neither Nvidia nor AMD went for TSMC's 20nm node.

The future is at least a bit brighter, as the successor to AMD Fiji, codenamed Greenland, will focus on lower power. AMD hopes to win the hearts of notebook manufacturers and decrease the overall power consumption of its desktop GPUs too. It worked well for Nvidia's Maxwell and AMD hopes it can get the same effect. We hear that second generation High Bandwidth Memory (HBM2) is also part of the Greenland spec and, as we have mentioned before, HBM2 doubles the bandwidth and doubles the maximum memory size for future cards.

14nm, HBM 2, Pascal, Greenland, interposer

Our early information is that Greenland will be made in GlobalFoundries / Samsung's 14nm manufacturing process and it is not clear if AMD will stay loyal to TSMC for its GPUs and use its 16nm manufacturing process. Korea Times thinks that Nvidia might go after Samsung's 14nm instead of 16nm TSMC manufacturing process for its next generation GPUs, but we are not convinced.

Nvidia's Pascal is also a lower power HBM part that is scheduled to come in 2016, and one can only speculate whether this is a TSMC 16nm GPU or a 14nm GlobalFoundries / Samsung one. The bottom line is that after many years of 28nm products, the GPU industry will finally move to a smaller manufacturing node, and if you think about it, 14nm lets you put four times the transistors on the same die size. GM200 has 8 billion transistors on a 601 square millimeter 28nm die and, with some rough math, you would be able to place 32 billion transistors on the same area in 14nm. However, it's not that simple and we won't see such a linear improvement, since we are dealing with a non-planar node.
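The rough math above can be sketched as follows. It assumes ideal scaling, where transistor density grows with the square of the node ratio, which as noted does not hold for real FinFET nodes. The function name is ours, and the 17 billion figure is the rumoured Pascal count rather than a confirmed one.

```python
# Idealised node scaling: density scales with the square of the feature-size ratio.
# Real 16nm/14nm FinFET nodes fall well short of this; treat it as an upper bound.

def ideal_density_scaling(old_node_nm, new_node_nm):
    """Ideal transistor density multiplier when shrinking from old to new node."""
    return (old_node_nm / new_node_nm) ** 2


GM200_TRANSISTORS = 8e9          # 28nm, 601 square millimeter die
RUMOURED_PASCAL_TRANSISTORS = 17e9  # rumoured figure, not confirmed

scale = ideal_density_scaling(28, 14)
ideal_transistors = GM200_TRANSISTORS * scale
rumoured_ratio = RUMOURED_PASCAL_TRANSISTORS / GM200_TRANSISTORS

print(f"Ideal density multiplier, 28nm to 14nm: {scale:.1f}x")
print(f"Ideal transistor count on a GM200-sized die: {ideal_transistors / 1e9:.0f} billion")
print(f"Rumoured Pascal count relative to GM200: {rumoured_ratio:.3f}x")
```

The rumoured 17 billion transistors is a little over twice the GM200 count, well under the ideal fourfold figure.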

With Fiji, and Greenland later in 2016, we expect AMD to become more competitive and regain some of its lost GPU market share, but Nvidia won't simply stand by and hand over the market. Nvidia will fight Fiji with a faster GM200 based Geforce GTX 980 Ti card, a performance driver, and later with a Pascal graphics card that supports High Bandwidth Memory too. It will be a fun second part of 2015, and 2016 is bound to be a very eventful year for the GPU market.

We had a chance to see the Hynix HBM memory of the future, and even see a next generation HBM2 wafer with many of these next-generation dies.

High Bandwidth Memory (HBM) is what will be used in the GPUs of the future, and at this point the concept hardly needs an introduction. The first generation of HBM will be used in AMD Fiji, a GPU that is expected to come in the next few months.

AMD uses HBM1, Nvidia HBM2

Nvidia will use 2nd generation HBM that will enable its Pascal GPU to get to a whopping 32GB on the highest end card, 2.7 times more than the already impressive 12GB used on Titan X cards.

The 4-Hi HBM1 has a 1024-bit interface, can handle two prefetch operations per IO and has a maximum bandwidth of 128GB per second. The tRC is 48ns, with a tCCD of 2ns (1tCK), and the VDD voltage is 1.2V. For comparison, GDDR5 runs at 1.35 to 1.5V and tops out at 28GB/s of throughput per chip.

In addition, 4-Hi HBM1 at 16Gb (2GB per chip) and 8-Hi HBM1 at 32Gb (4GB per chip) are possible, and HBM2 will get double the bandwidth and density. With first generation HBM memory it is at least theoretically possible to design a card with 8GB to 16GB of memory, assuming that the company would be using four HBM chips on an interposer, resulting in 512GB/s of bandwidth on a four-chip HBM card.

Boosting density and capacity

HBM uses Through-Silicon Via (TSV) technology, as it stacks the memory cores on top of each other on the same base die. It is amazing that SK Hynix manages to make these layered interconnections through the silicon. It’s one thing to read research papers about it, but seeing the finished product is something else. As you can see, HBM chips will be significantly smaller than GDDR5 and DDR3 chips.

HBM1 acts as a stack of eight 2Gb chips, resulting in 16Gb (2GB) per chip, while second generation HBM doubles the density to 32Gb (4GB) per chip with 4-Hi HBM2 modules, or even 64Gb (8GB) with 8-Hi HBM2 memory. This is how Nvidia will be able to get 32GB with four chips on Pascal.

So if you do the math you can get 512GB/s with four HBM1 chips and 1024GB/s with four second-generation HBM2 chips, which means AMD might have a second generation Fiji with HBM2 and Nvidia will use the same technology on Pascal, sometime in 2016.

In any case, HBM represents the biggest change on the memory front in years. Coupled with new FinFET nodes, 2016 promises to be a very eventful year for the GPU industry.

Nvidia’s senior vice president of GPU engineering, Jonah Alben, didn’t want to comment on the manufacturing process, or if the chip has already taped out. He was clear that the Pascal uses 2.5D HBM memory, which you can tell from the Pascal renders that we saw at GTC in March 2014 and again just hours ago.

He didn’t want to comment if the Volta card with new architecture will use the real 3D memory, where the memory chips are stacked on top of the GPU. Volta according to latest Nvidia roadmaps can be expected around 2018.

Alben did mention that you can expect Pascal to use a “future node” and that it is too early to talk about the readiness of the chip.

This time CEO Jen-Hsun Huang didn’t hold up a mock-up board. Nvidia tells us that this time it is all about the Titan X, as this is the new GPU that is setting the benchmark, and probably getting the performance crown once again.

Nvidia didn’t comment on the 2016 timing for Pascal, but we have learned that this is the timeframe we can expect the first Pascal products, so let’s say roughly a year from now, or a month earlier at best.

Pascal is Nvidia’s next generation architecture and it is coming after Maxwell of course. The company says it will launch next year, but details are still sketchy.

According to Nvidia CEO Jen Hsun Huang, it is coming with Mixed Precision and this is the new architecture that will succeed Maxwell. Nvidia claims that the new GPU core has its own architectural benefits.

3D memory, or High Bandwidth Memory (HBM), is a big thing and Jen Hsun Huang claims 32GB is possible with the new architecture, compared to 12GB on the new Maxwell-based Titan X. This is a staggering increase from the current standard of 4GB per card, to 12GB with Titan, and probably up to 32GB with Pascal. NV Link should enable a very fast interconnect that has 5 times the performance of PCI Express, which we all use right now. More memory and more bandwidth are obviously needed for 4K/UHD gaming.

Huang also shared some very rough estimates, including that convolution compute performance will be four times faster with FP16 precision in mixed precision mode. The 3D memory offers a six-fold increase in GPU to memory bandwidth.

Convolution and bandwidth at the front, and bandwidth to convolution at the back of the GPU, should be five times faster than on Maxwell cards. It is complex fuzzy logic that is hard to explain with so few details shared by Nvidia about the Pascal architecture.

The width update interconnect with NV Link should get you a twofold performance increase, and when you multiply these two numbers, Nvidia ends up with a 10x compute performance increase compared to Maxwell, at least in what the Nvidia CEO calls the “CEO bench”.

He warned the audience that this is a very rough estimate. This 10X number mainly targets deep learning, as it will be able to teach the deep learning network ten times faster. This doesn’t mean that the GPU offers 10 times the GPU performance for gaming compared to Maxwell; not even close, we predict.

Volta made it back to the roadmap and currently it looks like the new architecture will be introduced around 2018, or about three years from now.

In early January we heard a thing or two about an AMD card codenamed Fiji, the one that comes with High Bandwidth Memory (HBM).

The card is expected in the second part of 2015, and it will beat Nvidia Pascal to market. Nvidia is also using High Bandwidth Memory (HBM) for its 2016, 16nm GPU, but the company is using another technique, quite different compared to AMD's Fiji. Nvidia also announced Pascal Unified memory with 3D memory, an NVLink GPU in 2016 pointing to a Stacked DRAM powered Volta, the successor of Maxwell. Pascal is next year, as Nvidia needs to have a 16nm TSMC node operational and ready before it goes after this next big thing.

AMD is using what it calls a 2.5D-IC silicon interposer, which means that there will be two separate chips on the same silicon interposer and package substrate. Fiji in 28nm will be one of these chips, and the second batch of chips will be the High Bandwidth Memory (HBM) memory designs. However, there is a catch with AMD's approach.

From what we've learned, Fiji is limited to 4GB memory. With the current memory technology the GPU would simply be too big to put on an interposer and package. The interposer should be viewed as a stack of conductors that lets the GPU and HBM memory communicate at much higher speeds than ever before. The interposer then gets into the package that goes on PCB. You could say the interposer is the middle-man that makes things faster.

Hynix has HBM memory with a 1024-bit wide interface and a 1Gb/s per pin data rate. This results in 128GB/s of bandwidth per memory chip. In the case of four 8Gb (1GB) chips with a 1000MHz core clock, you end up with a total bandwidth of 512GB/s. There are indications that HBM memory on Fiji might work at 1.25GHz, which would result in 640GB/s. The Geforce GTX 980 has 224GB/s of bandwidth, while the Geforce GTX Titan Black has 336GB/s.
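The bandwidth numbers in this paragraph come from multiplying interface width by per-pin data rate. A minimal sketch, assuming the 1.25GHz clock translates directly into a 1.25Gb/s per pin rate (the function name is ours):

```python
# Peak HBM bandwidth: interface width (bits) x per-pin rate (Gb/s) / 8 bits per byte.
# Figures are the ones quoted in the text; the function name is illustrative.

def hbm_bandwidth_gbs(bus_width_bits, pin_rate_gbps, chips=1):
    """Peak bandwidth in GB/s for `chips` HBM chips with identical interfaces."""
    return bus_width_bits * pin_rate_gbps / 8 * chips


per_chip = hbm_bandwidth_gbs(1024, 1.0)        # one HBM1 chip at 1Gb/s per pin
four_chips = hbm_bandwidth_gbs(1024, 1.0, 4)   # four chips on the interposer
four_chips_fast = hbm_bandwidth_gbs(1024, 1.25, 4)  # rumoured 1.25GHz operation

GTX_980_GBS = 224
TITAN_BLACK_GBS = 336

print(f"Per chip:            {per_chip:.0f}GB/s")
print(f"Four chips:          {four_chips:.0f}GB/s")
print(f"Four chips, 1.25GHz: {four_chips_fast:.0f}GB/s")
print(f"vs GTX 980:          {four_chips / GTX_980_GBS:.2f}x")
print(f"vs Titan Black:      {four_chips / TITAN_BLACK_GBS:.2f}x")
```

This reproduces the 128GB/s, 512GB/s and 640GB/s figures and shows how they compare with the two Geforce cards mentioned.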

SK Hynix has listed that 1GB, 128GB/s chips with 1.0Gbps speed packaged in 5mKGSD are available now, January of 2015. These HBM chips feature a 4-Hi stack and VDD/VDDQ of 1.2V. The old GDDR5 needs 1.5V to work, meaning that HBM is not only faster, it is more power efficient as well. Fiji could end up having more than twice the bandwidth of Nvidia's current and upcoming Titan cards.

Nvidia on the other hand is using what is called Vertical stacking 3D, or on-package stacked DRAM for its Pascal 2016 GPUs. Nvidia gives a straightforward explanation of the meaning of 3D memory on Pascal: "3D memory: Stacks DRAM chips into dense modules with wide interfaces, and brings them inside the same package as the GPU."

The clear benefits are a massive increase in bandwidth and quadruple the energy efficiency. Nvidia is waiting for 16nm to make such a chip possible, and 3D memory is a better approach than the interposer.

Back to the AMD Fiji approach with 2.5D stacking and an interposer. Our sources claim that you cannot put more than four memory chips on the interposer, meaning that Fiji is limited to 4GB of memory. Eight chips next to a massive GPU would result in a massively big chip (remember, the GPU and memory chips sit on the same board, the interposer) which then has to be put on the package. This is why we don’t think an 8GB HBM Fiji will happen with this generation, but with time Hynix will come up with denser HBM memory chips, making 8GB 2.5D cards possible.

The only thing that comes to mind is that AMD could be using 4GB of HBM memory on the interposer and then put some additional GDDR5 chips on the PCB. Think of it as L2 and L3 cache on some older CPUs. Level two cache would be much faster, while level three cache would be slower, but would be able to get some important things done.

We don’t think that this will happen, as GDDR5 memory would be significantly slower than the HBM part, making it a similar case to the Geforce GTX 970 memory. Fiji has a good chance to beat whatever Nvidia comes up with in 2015, but it will probably have a hard time fighting against Pascal. Pascal is coming in 2016 and one can only hope that it will happen in early 2016.

We believe Fiji and Pascal will enable single GPU 4K gaming at acceptable frame rates and prices.

We managed to meet up with Sumit Gupta, the General manager of Nvidia's Tesla business and we managed to learn a few more details about the future products such as Pascal. In addition to novel features like 3D stacked RAM, we learned a bit more about the new form factor called Mezzanine. In case you are not familiar with this connector, you can see it here.

The next generation Pascal computational Tesla-market card will come in two formats, one in PCIe 3.0 and the second in the Mezzanine form factor, which is flat, more stable and significantly smaller than the PCIe 3.0 graphics card form factor. This was the card that Jen-Hsun Huang showed at the keynote, and he was so happy about the fact that it is one third the size of a PCIe card.

The new Pascal card will be used in servers and we are not sure if it will make its way to Pascal gaming and workstation hardware, at least not in Mezzanine form. The new format becomes interesting as with the bottom connector you can feed the board with up to 300W power. This means that there won’t be any necessity for external power connectors even for the highest end computer modules, graphics cards, Tesla cards, Quadro cards. This sounds rather interesting and innovative.

We could not find out if the Mezzanine will make it to the desktop graphics card market but we can certainly see a potential use for such a format in small form factor machines.

All the memory is on the package of the card surrounded by some power elements but 3D stacked memory and Pascal GPU are in one packaging, and this is the reason why bandwidth will be up to 2.5 times faster than anything before. The Mezzanine connector and 300W maximum is a nice touch too.

When we asked Sumit Gupta, the General manager of Tesla, if Pascal 2016 will use memory made by Nvidia or if they will buy it somewhere else, he confirmed that the stacked memory comes from memory manufacturers.

There are many memory manufacturers that Nvidia uses today for its computational parts and GPUs and the memory will come from one or more of them. This is not a big surprise, but it is good to clear this up, as Nvidia doesn’t want to produce its own memory when other companies are quite good at that.

The memory will come on the package together with the Parker chip and we can expect the usual amount of stacked memory that will be appropriate for a 2016 GPU or computational part. In other words, we will see multiple gigabytes of memory placed in the package.

Stacked 3D RAM will be the only memory on the board and we were told that there is no need for additional memory in the form of BGA chips that we see on today’s graphics cards. All the memory comes as 3D stacked RAM and it's powered by NV Link, so it should be able to "talk" to the GPU at very high speeds.

The exact figure is unclear, but it should be 2.5 times faster than the memory we see today, which will get us to 800GB to 1000GB per second, quite a step up from the current 288GB/s that Nvidia has with its pricey GTX Titan card today.
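As a rough check on how these figures relate, the sketch below uses only the numbers quoted above. The variable names are ours, and the 288GB/s GTX Titan figure is taken as the baseline by assumption, since the text does not say which card the 2.5x multiplier refers to.

```python
# Relating the quoted 2.5x multiplier to the quoted 800GB/s to 1000GB/s range.
# Baseline choice is an assumption: the GTX Titan's 288GB/s mentioned in the text.

BASELINE_GBS = 288
QUOTED_MULTIPLIER = 2.5
QUOTED_RANGE_GBS = (800, 1000)

projected = BASELINE_GBS * QUOTED_MULTIPLIER
implied_low = QUOTED_RANGE_GBS[0] / BASELINE_GBS
implied_high = QUOTED_RANGE_GBS[1] / BASELINE_GBS

print(f"2.5x of 288GB/s: {projected:.0f}GB/s")
print(f"800GB/s is {implied_low:.2f}x and 1000GB/s is {implied_high:.2f}x of 288GB/s")
```

Against that baseline, 2.5 times works out to 720GB/s, while the 800GB to 1000GB per second range corresponds to roughly 2.8 to 3.5 times.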

Volta was previously supposed to follow in the footsteps of Maxwell, which is rolling out this year, at least this was the case last time we saw Nvidia's roadmap.

Things changed today at Nvidia's GPU Technology Conference. Jen Hsun Huang, the CEO of Nvidia, just showed an updated roadmap with Pascal replacing the Maxwell architecture at some point in 2016.

Volta is currently scheduled to come after Pascal, so definitely from late 2016 onwards. Nvidia told us that the Pascal got pulled in and the module that was shown at the keynote is meant for the increasingly popular HTPC form factor.

To clear up any possible confusion, Pascal will make it to mobile, desktop and graphics card form factors, so there is nothing to worry about. Just like Maxwell, it will show up in all segments where Nvidia needs an up to date GPU.

Volta is now coming after Maxwell, that is the official line. Pascal comes in a unique form factor that opens up a lot of opportunities, but again this very unique chip with stacked memory and NVlink communication is happening in late 2016, quite some time from now.