Why are you complaining about a workstation/enthusiast CPU on the TR2 platform being inappropriate for server applications when the CPU is not a server CPU, and the platform is not a server platform? It makes no sense.

Member

Why are you complaining about a workstation/enthusiast CPU on the TR2 platform being inappropriate for server applications when the CPU is not a server CPU, and the platform is not a server platform? It makes no sense.

If you follow the whole chain of comments, you will realize that I am not complaining that TR2 is inappropriate for server applications; it obviously wasn't intended for that in the first place. I am saying that a CRIPPLED DESIGN (referring to the earlier-mentioned 8C/16T CPU with 1ch DDR4, and also TR2) is not acceptable for server CPUs.

I thought you were deliberately being snarky with your first comment. Apologies.

Lifer

I agree that they will not move to a design that would effectively limit a CPU like Matisse to a single-channel configuration. In the end, their decision whether or not to differentiate between server and client CCX designs will come down to cost/benefit analysis: what can they afford, and how much do they stand to profit from the additional expense of maintaining two CCX designs?

Member

I agree that they will not move to a design that would effectively limit a CPU like Matisse to a single-channel configuration. In the end, their decision whether or not to differentiate between server and client CCX designs will come down to cost/benefit analysis: what can they afford, and how much do they stand to profit from the additional expense of maintaining two CCX designs?

I don't really have a very strong opinion on this. I'm sure AMD knows what's best for them.

I am mainly interested to find out if anyone here has thought about what ROME might look like given the 9-die conundrum. Since a few people here have dismissed it as BS, I presume they must have at least given it some serious thought. I'd like very much to know what they think is wrong with the diagram.

Lifer

The only thing I see "wrong" with the Rome layout is that they have moved the memory controller away from the CCX. That effectively prevents them from using the same dice in a Matisse product, unless they intend to go with a "chiplet" design in the client CPUs as well.

Right now AMD has two dice: the CPU die you get in everything except their APUs, and the APU die. And they had to make a separate APU die just to include Vega. I do not think AMD was seriously entertaining the notion of something along the lines of Kaby Lake-G for their own products.

Anyway, setting aside the APUs, all AMD products are nothing more than constant repetition of the CPU dice. Want more cores? Then add more dice. It allows them to keep the CPUs relatively simple in terms of packaging. The 2990WX is sort of an outlier since it is basically an EPYC with two of the dice not linked to DIMM slots on the board (yay product differentiation). But it's still just four Zen+ dice, regardless.

If we are to believe the diagram from the OP, now you have a situation where every CPU based on Zen2 will have a minimum of two dice, assuming AMD wants to stick with the "interchangeable parts" strategy. For example, they can ill afford to produce one Zen2 die for Matisse that is one "chiplet" plus a dumbed-down version of the central die from the diagram (one without L4, no SERDES support, and a memory controller with two channels instead of eight). The cost appeal of Zen from the beginning is, again, repetition of the same die design, over and over again. Rome itself would have nine dice (8 CCX dice and the central L4/IMC die), none of which they could use in client products.

AMD would need to use common CCX dice while altering the central "control" die based on the application. So, for example, we get the heavy I/O and major memory bandwidth of the Rome die, but the Matisse die would be smaller and more pedestrian. Then they would link it (the Matisse "central"/SoC die) to a single CCX die via IF, meaning a minimum of two dice for any Zen2 product. Moving all the SoC functions to a separate die connected by IF introduces the potential for higher memory latency and other "fun" latency effects, the likes of which we currently only see on the 2990WX when a thread pegged to one of the dice with a crippled DDR4 interface attempts to access main memory.

Member

The only thing I see "wrong" with the Rome layout is that they have moved the memory controller away from the CCX. That effectively prevents them from using the same dice in a Matisse product, unless they intend to go with a "chiplet" design in the client CPUs as well.

Right now AMD has two dice: the CPU die you get in everything except their APUs, and the APU die. And they had to make a separate APU die just to include Vega. I do not think AMD was seriously entertaining the notion of something along the lines of Kaby Lake-G for their own products.

Anyway, setting aside the APUs, all AMD products are nothing more than constant repetition of the CPU dice. Want more cores? Then add more dice. It allows them to keep the CPUs relatively simple in terms of packaging. The 2990WX is sort of an outlier since it is basically an EPYC with two of the dice not linked to DIMM slots on the board (yay product differentiation). But it's still just four Zen+ dice, regardless.

If we are to believe the diagram from the OP, now you have a situation where every CPU based on Zen2 will have a minimum of two dice, assuming AMD wants to stick with the "interchangeable parts" strategy. For example, they can ill afford to produce one Zen2 die for Matisse that is one "chiplet" plus a dumbed-down version of the central die from the diagram (one without L4, no SERDES support, and a memory controller with two channels instead of eight). The cost appeal of Zen from the beginning is, again, repetition of the same die design, over and over again. Rome itself would have nine dice (8 CCX dice and the central L4/IMC die), none of which they could use in client products.

AMD would need to use common CCX dice while altering the central "control" die based on the application. So, for example, we get the heavy I/O and major memory bandwidth of the Rome die, but the Matisse die would be smaller and more pedestrian. Then they would link it (the Matisse "central"/SoC die) to a single CCX die via IF, meaning a minimum of two dice for any Zen2 product. Moving all the SoC functions to a separate die connected by IF introduces the potential for higher memory latency and other "fun" latency effects, the likes of which we currently only see on the 2990WX when a thread pegged to one of the dice with a crippled DDR4 interface attempts to access main memory.

Your comments are right on. That's why it is such a conundrum. I started with the assumption that ROME is 8 CPU dies + 1 I/O die (as the very credible rumors go), and tried to guess what it might look like. I still cannot figure any other way that would make sense. I explained the problems and how I arrived at this diagram in an earlier comment if you haven't seen it.

With regard to Ryzen (Matisse) being a completely different die: that is what I believe. By now, AMD is ready and able to spend a little more R&D money and take on some more risk.

Diamond Member

The only thing I see "wrong" with the Rome layout is that they have moved the memory controller away from the CCX. That effectively prevents them from using the same dice in a Matisse product, unless they intend to go with a "chiplet" design in the client CPUs as well.

As soon as I heard the rumors I wondered the same thing, but then I thought the CCX may still keep the MC and only use it in low-chip-count SKUs. Something along the lines of the current TR2 SKUs, where some dies have their MCs disabled and are fed externally through IF.

Normally it would seem like a waste, but considering all consumer products would use that silicon area, "wasting" it for high margin products like high-end server chips would not be a problem.

The latency hit is too much at this point to have it off-die. It's inevitable that it will be separated out eventually, once they can deal with the latency using something like EMIB or an active interposer.

Member

The latency hit is too much at this point to have it off-die. It's inevitable that it will be separated out eventually, once they can deal with the latency using something like EMIB or an active interposer.

Member

As soon as I heard the rumors I wondered the same thing, but then I thought the CCX may still keep the MC and only use it in low-chip-count SKUs. Something along the lines of the current TR2 SKUs, where some dies have their MCs disabled and are fed externally through IF.

Normally it would seem like a waste, but considering all consumer products would use that silicon area, "wasting" it for high margin products like high-end server chips would not be a problem.

And it would only be wasted on the high-end server products. It's either that, or make separate dies for server, or risk even more by using the "chiplet" approach in consumer products as well. Every option comes with its own set of risks/restrictions.

Senior member

IMO, the main issue with the chiplet design would be latency, but this could be reduced by using an interposer instead of connecting everything via the substrate.

It would mean that those chiplets could not be used for Ryzen, as they would be stripped of all the needed SoC parts (MC, VDD, IO, ...), but it would be great for costs and flexibility (CPU only, CPU + GPU, CPU + FPGA, ...).

Obviously, those chiplets would not be used for Ryzen, but I could see AMD scrapping the CPU-only Ryzen design and producing only a Ryzen 8C APU on 7nm.

Anyway, setting aside the APUs, all AMD products are nothing more than constant repetition of the CPU dice. Want more cores? Then add more dice. It allows them to keep the CPUs relatively simple in terms of packaging. The 2990WX is sort of an outlier since it is basically an EPYC with two of the dice not linked to DIMM slots on the board (yay product differentiation). But it's still just four Zen+ dice, regardless.

The 2990WX is not even an outlier; the uncore on Zeppelin is essentially a Swiss Army knife where not everything is ever used. Even in the best case of EPYC, one SerDes per die is unused for optimal routing length. And the lower down the food chain the die goes, the more of the uncore is gated off. The 2990WX was an outlier in that a memory controller is disabled, but the Athlon 200GE has joined that approach.

Member

For me, when the rumor that ROME would move from 4 dies to 9 dies first surfaced, I was not skeptical but troubled. Not skeptical, because multiple sources with impeccable track records said the same thing. Troubled, because I couldn't make sense of the technical trade-offs that would make moving from 4 dies to 9 dies feasible or worthwhile.

AdoredTV's latest video added a few pieces of info: (i) ROME will be a completely new design from the ground up, (ii) AMD will drop NUMA altogether, and (iii) ROME will support 4P configuration.

Piecing all the rumors together, I was finally able to come up with an architecture that explains why AMD would choose to move from 4 dies to 9 dies. If ROME is really like what I described in the diagram, it would give Intel's Cascade Lake and Cooper Lake a very serious run.

Of course, there are a million ways to do the same thing. So in all likelihood, I will be completely wrong.

IO. And not the exact opposite. Ryzen, for example, only has 24 PCIe lanes available (a socket decision) and carries a bunch of interconnect stuff on die for talking to other dies, none of which is useful on Ryzen. So remove everything: the memory controller, the PCIe connections, pretty much all uncore features. Put them on an IO or communication chip, and have that chip be only as large as it needs to be for Ryzen's feature set. Ryzen 4k (because I doubt Zen 2 is chiplet-ready, IMHO) would have a much smaller die size for both the core chip and the comm chip than, let's say, Ryzen 3k with Zen 2, which would be a shrunk-down SR/PR (with more cores, again my opinion). So whatever the Zen 2 dies look like, Ryzen 4k would have the same feature set, but the die size of both chiplets would be smaller than Zen 2.

So then AMD can make a comm/IO chip for Threadripper (still probably no cache, 4 memory controllers, 64 lanes). Then they can make a couple of different ones for EPYC: maybe one with 8 controllers, 128 PCIe lanes, and no cache; the next one adds 256MB of L4; the next one adds 512MB of L4. This allows AMD to continue to maximize the flexibility of Zen dies by market demand, while giving them the ability to configure, adjust, and specialize the CPU for the market. The IO chiplet would also be much less complex than one that includes CPU cores, meaning an easier design, cheaper production, and stock that is easier to control. To top it off, it wouldn't even have to be 7nm. This could be how they maintain the WSA: all the IO chiplets could be 12nm parts from GF.

Ryzen Desktop and Notebooks: Different 8C/16T APU, i.e. bring back integrated GPU to mainstream desktop CPUs. A competitive CPU paired with a superior GPU would be a win for AMD. Fuse off features for product segmentation.

Member

IO. And not the exact opposite. Ryzen, for example, only has 24 PCIe lanes available (a socket decision) and carries a bunch of interconnect stuff on die for talking to other dies, none of which is useful on Ryzen. So remove everything: the memory controller, the PCIe connections, pretty much all uncore features. Put them on an IO or communication chip, and have that chip be only as large as it needs to be for Ryzen's feature set. Ryzen 4k (because I doubt Zen 2 is chiplet-ready, IMHO) would have a much smaller die size for both the core chip and the comm chip than, let's say, Ryzen 3k with Zen 2, which would be a shrunk-down SR/PR (with more cores, again my opinion). So whatever the Zen 2 dies look like, Ryzen 4k would have the same feature set, but the die size of both chiplets would be smaller than Zen 2.

So then AMD can make a comm/IO chip for Threadripper (still probably no cache, 4 memory controllers, 64 lanes). Then they can make a couple of different ones for EPYC: maybe one with 8 controllers, 128 PCIe lanes, and no cache; the next one adds 256MB of L4; the next one adds 512MB of L4. This allows AMD to continue to maximize the flexibility of Zen dies by market demand, while giving them the ability to configure, adjust, and specialize the CPU for the market. The IO chiplet would also be much less complex than one that includes CPU cores, meaning an easier design, cheaper production, and stock that is easier to control. To top it off, it wouldn't even have to be 7nm. This could be how they maintain the WSA: all the IO chiplets could be 12nm parts from GF.

That's a lot of different dies to make. You know why I disagree: I think desktop is extremely cost-sensitive, and you can't beat a reasonably small monolithic die when it comes to performance and cost. Time will tell.

Diamond Member

That's a lot of different dies to make. You know why I disagree: I think desktop is extremely cost-sensitive, and you can't beat a reasonably small monolithic die when it comes to performance and cost. Time will tell.

Yeah, but you can't base the idea of desktop CPU pricing on Intel's pricing. AMD is making a pretty decent amount of money on competitively priced products with a 200+mm² die. They could still do a 16C die on 7nm at around 150mm², plus a 30-40mm² communication die, and still be smaller than they are now, die-wise. Or go slightly larger with a ~60mm² communication die on 12nm and be much cheaper than doing a 190-200mm² monolithic die on 7nm.

It's several smaller, less complex dies to make, which don't need to be on the same process, while still letting them bin and get great yields on the main core dies. I don't see how they couldn't do that and possibly be more profitable than they are now.

Member

Yeah, but you can't base the idea of desktop CPU pricing on Intel's pricing. AMD is making a pretty decent amount of money on competitively priced products with a 200+mm² die. They could still do a 16C die on 7nm at around 150mm², plus a 30-40mm² communication die, and still be smaller than they are now, die-wise. Or go slightly larger with a ~60mm² communication die on 12nm and be much cheaper than doing a 190-200mm² monolithic die on 7nm.

It's several smaller, less complex dies to make, which don't need to be on the same process, while still letting them bin and get great yields on the main core dies. I don't see how they couldn't do that and possibly be more profitable than they are now.

Believe me, multiple dies will not be cheaper or better performing than monolithic dies of reasonable size (~200mm² or less). Also, I think desktop CPUs would be limited to 8C because of 2ch memory bandwidth constraints. As a rule of thumb, you need 2.5GB/s per core. You could push 12C, but that would be really stretching it to the extreme.

Senior member

I think we're on to something with this discussion. I see their product stack shaping up like this:
A pair of IO chips with a smaller one for desktop usage and a larger one for TR/Server usage.
A base CPU chip with 8 cores and the needed glue logic to connect them to an I/O chip
An APU with 4-8 cores and an iGPU

Desktop AM4 will be a tiny 7nm CPU chip with a 12nm small I/O chip
TR will be 2-4 CPU chips with a large 12nm I/O chip
Epyc will be 4-8 CPU chips with a large 12nm I/O chip
APUs are stand alone monolithic designs and cover the low to mid part of the market.

Ryzen Desktop and Notebooks: Different 8C/16T APU, i.e. bring back integrated GPU to mainstream desktop CPUs. A competitive CPU paired with a superior GPU would be a win for AMD. Fuse off features for product segmentation.

All in all, just 3 unique dies are needed:

1. 8C/16T cores-only CPU die
2. SoC die
3. 8C/16T APU die

I wouldn't be offended one bit even if you did. As long as you have actually thought about the problem and are willing to share with me why you believe it is BS.