GDDR5X Memory Standard Gets Official with JEDEC

Though information about the technology has been making the rounds over the last several weeks, GDDR5X finally gets official with an announcement from JEDEC this morning. The JEDEC Solid State Technology Association is, as Wikipedia tells us, an "independent semiconductor engineering trade organization and standardization body" that is responsible for creating memory standards. Getting the official nod from the org means we are likely to see implementations of GDDR5X in the near future.

The press release is short and sweet. Take a look.

ARLINGTON, Va., USA – JANUARY 21, 2016 – JEDEC Solid State Technology Association, the global leader in the development of standards for the microelectronics industry, today announced the publication of JESD232 Graphics Double Data Rate (GDDR5X) SGRAM. Available for free download from the JEDEC website, the new memory standard is designed to satisfy the increasing need for more memory bandwidth in graphics, gaming, compute, and networking applications.

Derived from the widely adopted GDDR5 SGRAM JEDEC standard, GDDR5X specifies key elements related to the design and operability of memory chips for applications requiring very high memory bandwidth. With the intent to address the needs of high-performance applications demanding ever higher data rates, GDDR5X is targeting data rates of 10 to 14 Gb/s, a 2X increase over GDDR5. In order to allow a smooth transition from GDDR5, GDDR5X utilizes the same, proven pseudo open drain (POD) signaling as GDDR5.

“GDDR5X represents a significant leap forward for high end GPU design,” said Mian Quddus, JEDEC Board of Directors Chairman. “Its performance improvements over the prior standard will help enable the next generation of graphics and other high-performance applications.”

JEDEC claims that by using the same signaling type as GDDR5, it is able to double the per-pin data rate to 10-14 Gb/s. In fact, based on leaked slides about GDDR5X from October, JEDEC actually calls GDDR5X an extension to GDDR5, not a new standard. How does GDDR5X reach these new speeds? By doubling the prefetch from 32 bytes to 64 bytes. This will require a redesign of the memory controller for any processor that wants to integrate it.

As for usable bandwidth, though no figures are quoted directly, it would likely see a much lower increase than the per-pin statements from the press release suggest. Because the memory bus width would remain unchanged, and GDDR5X just grabs twice the chunk size in prefetch, we should expect an incremental change. Power efficiency isn't mentioned either, and that was one of the driving factors in the development of HBM.
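For a rough sense of scale, theoretical peak bandwidth is just the per-pin data rate multiplied by the bus width, divided by 8 bits per byte. The sketch below assumes a 256-bit bus and a 7 Gb/s GDDR5 baseline purely for illustration; neither number comes from the press release, and these are peak figures, not usable bandwidth.

```python
def peak_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s for a per-pin rate and bus width."""
    return pin_rate_gbps * bus_width_bits / 8  # 8 bits per byte


# Assumptions for illustration only (not from the JEDEC release):
BUS_WIDTH_BITS = 256   # hypothetical card with a 256-bit memory bus
GDDR5_PIN_RATE = 7.0   # Gb/s, assumed speed for fast GDDR5

# Per-pin targets quoted in the press release for GDDR5X.
GDDR5X_PIN_RATES = (10.0, 14.0)

baseline = peak_bandwidth_gbs(GDDR5_PIN_RATE, BUS_WIDTH_BITS)
print(f"GDDR5  @ {GDDR5_PIN_RATE:g} Gb/s: {baseline:g} GB/s")
for rate in GDDR5X_PIN_RATES:
    bw = peak_bandwidth_gbs(rate, BUS_WIDTH_BITS)
    print(f"GDDR5X @ {rate:g} Gb/s: {bw:g} GB/s ({bw / baseline:.2f}x)")
```

Under those assumptions, the 10 Gb/s end of the range is only about 1.4x a fast GDDR5 configuration; the full 2X needs the 14 Gb/s ceiling.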

Performance efficiency graph from AMD's HBM presentation

I am excited about any improvement in memory technology that will increase GPU performance, but I can tell you that from my conversations with both AMD and NVIDIA, no one appears to be jumping at the chance to integrate GDDR5X into upcoming graphics cards. That doesn't mean it won't happen with some version of Polaris or Pascal, but it seems that there may be concerns other than bandwidth that keep it from taking hold.

Medical leeches have never gone anywhere, and will not go anywhere for several decades to come. I dunno where you live (guessing 'Murica, huh), but in the majority of the rest of the world medical leech treatment is still a very essential treatment method these days.

Essential for what? There are a very small number of medical conditions where leeches will help. They are used for those conditions in the US. Most uses of leeches in the past, and still in other countries, are uses for which the leeches will have no effect.

HBM for the win, as GDDR5X is still a space and power HOG! It's going to still be GDDR5 for most middle end desktop graphics cards that can get by fine on GDDR5's bandwidth, with HBM2 gradually taking over. And HBM2 even comes with some latency enhancements over HBM!

"One of the key enhancements of HBM2 is its Pseudo Channel mode, which divides a channel into two individual sub-channels of 64 bit I/O each, providing 128-bit prefetch per memory read and write access for each one. Pseudo channels operate at the same clock-rate, they share row and column command bus as well as CK and CKE inputs. However, they have separated banks, they decode and execute commands individually. SK Hynix says that the Pseudo Channel mode optimizes memory accesses and lowers latency, which results in higher effective bandwidth."(1)

Bye GDDR-Anything, that HBM on the Interposer eats your bandwidth lunch, and does so at about 1/7th the memory clock speed so on the power usage metric and the space saving metric HBM/HBM2 rules! I can't wait for APUs on an interposer to add some Zen cores to the interposer package, along with the HBM, for some wide thousands of traces directly from the Zen cores to the Polaris GPU's cores, and that's in addition to the wide 1024 bit traces/data paths to each HBM stack. HBM/HBM2 is going to be a must for laptops because of those power usage and space savings metrics for both APUs and discrete mobile GPUs!

I didn't think about it before but re size, if that's the case, then going forward the mid range being physically larger than the high end will be the norm (390/390 X vs Fury/Fury X); even for nVidia.

Or maybe not, given the need to keep thermals in check. I don't imagine AMD would go the water loop route at the high end again with Polaris' new efficiencies. It would be a tragedy yet again to have the more powerful dual GPU solution not recommended across the board because of the unwieldy prospect of two radiators in a case.

Hopefully it can move down into the mid-range market. The die sizes for each market are going to change a bit. The giant die we see on 28 nm may not be economical on 14 nm FinFET due to lower yields. Also, HBM2 die are more than double the size of HBM1; about 40 square mm compared to about 92 square mm. This means that 4 stacks will take around 400 square mm on the interposer. This will only leave around 400 square mm for the GPU, but that is huge on 14 nm. This is assuming the max size of the interposer is still around the size of the one used in the Fury.

It would be great if the economics work out to allow 1, 2, and 4 HBM stack devices. A single stack of HBM2 can still provide 4 GB at 256 GB/s, which is faster than an Nvidia 980 at 224 GB/s. A 2 stack device would still be 8 GB capacity at 512 GB/s, so that would be more powerful than the current high-end. A 980 is around 400 square mm, but a similarly powerful GPU at 14 nm FinFET could be maybe half that size or less. This would allow the interposer to only be around 300 square mm (very rough estimate) for a single stack device. To get over 256 GB/s with GDDR5 it currently takes a 384-bit bus. Depending on the economics, these small interposer devices may compete with GDDR5 solutions even for mid-range devices. With the higher cost and lower power requirements for mobile devices, I would think smaller interposer based devices may show up in mobile first.
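A quick sanity check on those bandwidth numbers, as a sketch rather than anything official. The 1024-bit stack interface is mentioned earlier in the thread; the 2 Gb/s HBM2 pin rate, the 7 Gb/s GDDR5 pin rate, and the list of common GDDR5 bus widths are assumptions made for the example.

```python
def peak_bandwidth_gbs(pin_rate_gbps, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (8 bits per byte)."""
    return pin_rate_gbps * bus_width_bits / 8


HBM2_STACK_WIDTH_BITS = 1024   # per-stack interface width
HBM2_PIN_RATE_GBPS = 2.0       # assumed HBM2 per-pin rate
GDDR5_PIN_RATE_GBPS = 7.0      # assumed fast GDDR5 per-pin rate
COMMON_GDDR5_BUS_WIDTHS = (128, 256, 384, 512)  # assumed typical widths


def hbm2_bandwidth_gbs(stacks):
    """Peak bandwidth of a device with the given number of HBM2 stacks."""
    per_stack = peak_bandwidth_gbs(HBM2_PIN_RATE_GBPS, HBM2_STACK_WIDTH_BITS)
    return stacks * per_stack


def narrowest_gddr5_bus(target_gbs):
    """Smallest common GDDR5 bus width that exceeds the target, or None."""
    for width in sorted(COMMON_GDDR5_BUS_WIDTHS):
        if peak_bandwidth_gbs(GDDR5_PIN_RATE_GBPS, width) > target_gbs:
            return width
    return None


for stacks in (1, 2, 4):
    print(f"{stacks} HBM2 stack(s): {hbm2_bandwidth_gbs(stacks):g} GB/s")
print("GDDR5 bus needed to beat one stack:",
      narrowest_gddr5_bus(hbm2_bandwidth_gbs(1)), "bits")
```

With a 256-bit bus, 7 Gb/s GDDR5 tops out at 224 GB/s, so under these assumptions the next common width up (384-bit) is what it takes to pass a single stack.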

HBM is great and all but considering we only finally got rid of low end DDR3 GPUs during this past year we're not going to see HBM on low end GPUs for quite some time. Doubling the bandwidth is nice but will a $100 GPU need anything more than GDDR5 anyways? I don't see there being 3 types of memory used in a product stack so unless GDDR5X is just as cheap to produce as GDDR5 I think it will have a hard time justifying its existence.

GDDR5X has twice the bandwidth of GDDR5. So it doesn't have to be just as cheap to produce; as long as it is much less than twice as expensive as GDDR5, it will have a place on low end cards that don't want the expense of a very wide memory interface.

A doubling of bandwidth is not needed at the low end, so this is meant as an HBM alternative at the high end. Maybe they're betting on HBM availability issues and plan on offering sweetheart deals (Intel is good at that) while keeping production costs down.

Bandwidth won't be doubled initially (maybe 1.5x) so if these make a showing, it will likely be in the high-mid SKUs.

So...if GDDR5 is "10" and HBM2 is "35+", then where is GDDR5X in that? Something like "22" or "25", I guess. Either way, I think GDDR5X would be feasible only for the so-called "re-brands" of the low and mid price segment, while high and top tier cards will all now come with HBM on their board. That's the only reasonable way I can see it being at all feasible. If not - GDDR5X will be, pretty much, useless, and would become obsolete quite fast. Not even home console makers are stupid and naive enough to use this in their future offerings instead of HBM (hell, it looks like even the quite-not-so-far-away-already upcoming Nintendo's NX will be an HBM-based platform, let alone PS 4 or next X-BAAAAAWWWKZ), so I really can't see this being implemented in anything else but just low and mid tiered GPU "re-brands", or mobile devices. Everyone's tried the hell out of GDDR5, that is a fact. Even if you update specs somewhat and do it on a new tech-process, it still has "GDDR5" in its name, and people would definitely not like that when there will be full-blown HBM and HBM2 offerings to get, lying nearby.

Re-brands are NOT necessarily 1-to-1 exactly the same cards each and every time, so I don't really mean the same cards exactly. If a Radeon is being called R9 380X, it doesn't mean it's exactly the same as R9 280X, because it's not (R9 280X is a Tahiti XT, which is a slightly updated HD 7970 GHz, but R9 380X is based on Tonga XT, which in itself is an update of R9 285's Tonga, and that one's quite different since it's a completely different system than the previously used Curacaos, Tahitis, and all that. There's only three Tonga cards so far - R9 285, R9 380, and R9 380X, and none of them has anything to do with the previous generations, Tonga is its own separate and original animal, essentially).

I think the term you want to use is "refresh" and not re-brand. Think of rebrand as something that various vendors do when they purchase products from another vendor and then "rebrand" them as their own, where basically the only thing that changed was the label and the box it came in.
You used the 290, 290x, 390 and 390x as good examples of what a product refresh is, where the original product is still there, but tweaked to make it better.

Not really, since the absolute majority of the time "re-branding" and "refreshing" are being used in exactly the same context, being thrown back and forth interchangeably. Also, no, "re-brand" doesn't necessarily mean that a third-party manufacturer just slapped its sticker on a reference card and put it in its own box, since sometimes even "reference cards" might have tweaks by those said third parties. In this modern day and age "re-brand" mainly means that a manufacturer performs slight tweaks to its older hardware to up its default performance a bit (be that either through firmware update, or through OverClocking, or through something else, like change of the thermal paste for a better one, redesign of the cooler, and etc) AND then releases it as a separate product while changing numbering on the box/in the official spread sheets. Releases tweaked OLD hardware and sells it as a new one. That is a RE-BRAND. This is not quite exactly the same thing as a REFRESH, though. I didn't say anything about R9 390 or R9 390X - those ARE straight out re-brands of R9 290 and R9 290X. I've clearly made a comparison between R9 280X and R9 380X, which are two quite different things. It's true that R9 280X is essentially a "polished-to-the-max" and tweaked HD 7970 GHz at its core, and thus R9 280X can be considered a re-brand, but R9 380X is NOT. R9 380X is an absolutely different kind of animal, from a different, separate family. You CAN say about R9 380X that it's a refresh, but you CAN'T call it a re-brand, because R9 285 was THE ONLY Tonga-based video card out there before R9 380X and R9 380 were made, and R9 380X was done on Tonga XT silicon, NOT the first Tonga, and they're both from completely different pricing segments also. You really can't compare R9 285 and R9 380X, because the gap in specs and features is too wide, so R9 380X is NOT a "re-brand of R9 285" at all.
But you CAN compare R9 280X with R9 380X, because R9 380X was developed and designed to be a straight out replacement for R9 280X, in the Radeon line of video cards. And that IS a refresh. So that's basically that. Maybe I should've been more thorough with it from the get-go, indeed. Basically, what I'm trying to say here, is that things essentially go like this: R9 370X is a re-brand, because it's a straight out "revamp" of R9 270X, while R9 270X is a straight out "revamp" of HD 7870 GHz, so R9 270X = slightly tweaked HD 7870 GHz -> R9 370X = slightly tweaked R9 270X = even further tweaked HD 7870 GHz. R9 380X, on the other hand, is NOT a re-brand, but it's a REFRESH aimed to completely replace R9 280X (which is a slightly tweaked HD 7970 GHz), and R9 380X is its own original thing, since it's an update of an original platform that is Tonga, unlike R9 280X which was a Tahiti card, and Tahiti has nothing in common with Tonga, Tonga is its own thing. That is my point, basically.

Is this a Micron attempt to delay the adoption of HBM? Are AMD or Nvidia interested in this? Perhaps Intel is? Micron and Intel have HMC (Hybrid Memory Cube) but HMC is not suited to consumer level graphics cards at all. A lot of people seem to think that HMC is competition for HBM, but it is not. HMC uses stacked memory, but the bottom logic die converts the wide connections to a narrow serial interface that can run through a PCB. The interface is only 8 or 16 bits wide, but it is clocked very high. It can do 20 to 30 GB/s read bandwidth per channel. That sounds great when comparing with DDR3 or DDR4. You can replace the 2 or 4 channels of DDR4 with 2 or 4 channels of HMC and still achieve higher bandwidth and lower pin count.

This isn't that great when comparing with HBM though. To do 512 GB/s, you would need around 20 to 25 16-bit channels. To do 1 TB/s, you would need around 40 to 50 16-bit channels. Intel has talked about over 400 GB/s with Knights Landing, but that seems to be aggregate (read and write), so it is really only around 200 GB/s read bandwidth. That works out to around 25 GB/s for the 8 channels it uses. It is not a cheap chip, so I don't know if a consumer grade part could even use 8 channels economically. They also seem to have HMC defined as only going between board mounted components, not socket or slot connections. This doesn't seem like a good replacement for DDR4 either. It would be lower power for the bandwidth but probably a lot higher price and not upgradable.
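The channel counts above are simple division, sketched here under stated assumptions: per-channel read bandwidth of 20 to 25 GB/s (the low end of the range quoted earlier and the Knights Landing estimate), 1 TB/s taken as 1000 GB/s, and an aggregate figure that splits evenly between read and write.

```python
import math


def channels_needed(target_gbs, per_channel_gbs):
    """Whole number of channels needed to reach a target read bandwidth."""
    return math.ceil(target_gbs / per_channel_gbs)


def read_bandwidth_per_channel(aggregate_gbs, channels):
    """Per-channel read bandwidth, assuming an even read/write split."""
    return aggregate_gbs / 2 / channels


# Assumed per-channel read bandwidths in GB/s (illustrative range).
PER_CHANNEL_LOW = 20
PER_CHANNEL_HIGH = 25

for target in (512, 1000):  # GB/s; 1 TB/s taken as 1000 GB/s
    fewest = channels_needed(target, PER_CHANNEL_HIGH)
    most = channels_needed(target, PER_CHANNEL_LOW)
    print(f"{target} GB/s needs roughly {fewest} to {most} channels")

# Knights Landing estimate: ~400 GB/s aggregate over 8 channels.
print(read_bandwidth_per_channel(400, 8), "GB/s read per channel")
```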

It's more of an "attempt to delay" than a way "to make a smoother transition from GDDR5", as they state. Personally, I don't believe that PR BS, since it's still a piece of hardware which they'll ask a crapton of monnies for, so I really don't see it as a "smooth transition" simply because people aren't THAT stupid as to buy this crap instead of HBM/HBM2 when they'll be moving from GDDR5. You have to be VERY naive to buy a GDDR5X-based video card instead of the upcoming HBM2 beasts or at least HBM low/mid tiered offerings.

You don't have to be naive to buy GDDR5X. GDDR5X offers about double the bandwidth of GDDR5. HBM offers much more than that. However, there is going to be quite some time when affordable level cards (probably <$300) will not be able to afford to implement an interposer and HBM. In those cards, it will be better to use GDDR5X than GDDR5. Of course if you are looking at $500+ GPUs next year you will want HBM, but it will be too expensive for the whole market initially.

It doesn't sound like GDDR5X is implemented in upcoming Nvidia or AMD GPUs. This pushes products out for quite a while unless other companies have plans to use it (who?). Given that AMD seems to be able to use HBM1 in a $500 card, I don't think there will be that much of an issue for it to come down into the $300 range with HBM2. HBM2 will be produced by multiple manufacturers and is apparently already ramping up at Samsung.

HBM2 will be a larger die though. In the Anandtech article, HBM1 is listed as having a die size of around 40 square mm while HBM2 is around 92 square mm. A high end GPU will need to be considerably smaller than AMD's massive 592 square mm Fury die. The max size for current interposers is over 800 square mm, but with HBM2, 4 stacks will limit the size of the GPU to quite a bit under 400 square mm. For mid range chips, it will allow for much smaller interposers though. Using HBM2, they could have an interposer half the size of current interposers by using a ~200 square mm GPU and only two stacks of HBM2. This would still allow 8 GB of memory at 512 GB/s, which will still be faster than GDDR5X. Note that a 200 square mm GPU on 14 nm FinFET will still be a very powerful device which may rival the current high-end. I don't know if it will be economical enough to allow a single stack device, although it could be very small. You need to realize that an Nvidia 980 only has 224 GB/s. A single stack of HBM2 can provide 4 GB of memory at 256 GB/s.
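The area budget can be roughed out from the die sizes quoted above. This is only a sketch: the 800 square mm interposer is an assumed round figure for "over 800", and the sums count raw die area with no allowance for spacing or routing between dies.

```python
HBM1_STACK_MM2 = 40    # approximate HBM1 die size quoted above
HBM2_STACK_MM2 = 92    # approximate HBM2 die size quoted above
FURY_GPU_MM2 = 592     # Fury die size quoted above
INTERPOSER_MM2 = 800   # assumed round figure for the max interposer size


def package_area_mm2(gpu_mm2, stacks, stack_mm2):
    """Raw die area of a GPU plus its memory stacks (no spacing)."""
    return gpu_mm2 + stacks * stack_mm2


def gpu_budget_mm2(interposer_mm2, stacks, stack_mm2):
    """Raw area left for the GPU after placing the memory stacks."""
    return interposer_mm2 - stacks * stack_mm2


print("Fury + 4x HBM1:", package_area_mm2(FURY_GPU_MM2, 4, HBM1_STACK_MM2))
print("GPU budget with 4x HBM2:",
      gpu_budget_mm2(INTERPOSER_MM2, 4, HBM2_STACK_MM2))
print("200 mm2 GPU + 2x HBM2:", package_area_mm2(200, 2, HBM2_STACK_MM2))
```

Raw die area alone leaves a little over 400 square mm next to four HBM2 stacks; whatever spacing and routing the real package needs, which this sketch ignores, comes out of that.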

Please tell me you're joking.
"Quite some time before affordable HBM cards would be available"? PUH-EFFING-LEASE. With the cut Nano had just recently and with the upcoming Fury price cut, they are ALREADY very affordable and absolutely competitive. And you KNOW that Nvidia won't be ANYWHERE even remotely near that price level with their first HBM-based cards, simply because they're waaay late into the game with all of this, so, please, don't fool yourself and stop living in delusional denial of the reality. HBM-based video cards are ALREADY extremely competitive and VERY affordable, if you take into consideration how Nvidia milks its "general audience" of brainless shilling sheeple.

Not joking. By 'affordable' I don't mean for people who read PCPer who have a PC upgrade budget. I mean that, like previous generations, nVidia and AMD will need to make a chip that can produce $199, $249 and $299 SKUs, and those parts will not be able to afford HBM in 2016.

A Fury Nano is not an 'affordable' card, it is 'enthusiast' tier.

I don't see why you are singling out AMD vs. nVidia here. Both AMD and nVidia will probably release GDDR5 or GDDR5X parts this year.

If GDDR5X is not designed into the upcoming parts, and it sounds like it is not, then you will not see GDDR5X parts from AMD or Nvidia this year. If they ever support it, it will be in the next iteration. I don't know if we will see a new part in the first half of 2017, other than maybe a larger HBM part. This gives HBM2 probably at least a year and a half to ramp up production and reduce cost. We may not have HBM in the mid-range in 2016, but in 2017 it will probably be much cheaper and much more common. In 2017, they should even be making HBM based APUs, although it is yet to be seen how affordable those will be. With a single stack of HBM2 capable of 4 GB at 256 GB/s (more than a current 980), it isn't clear what is going to happen in the GPU market. I would expect the mobile market to definitely go with the HBM APUs.