Xeon Gold & Platinum Officially Confirmed


[Edit 04/27/2017: A newer thread that includes more solid information directly from Intel (non-rumor info) was merged with this one, see this post for additional information.]

So there is an increasingly solid set of rumors about the next generation of Skylake Xeons for workstations that is leaking in various locations including a recent post over in the Anandtech forums.

In brief: There's a "Gold" series that goes up to 22 cores (18 cores with the highest clockspeeds) and the "Platinum" series that goes all the way up to 28 cores (although you can buy a cut-down version with a mere 24 cores too). I have no direct confirmation, but I strongly suspect that at least the "Gold" series is designed for the new LGA-2066 socket with quad-channel RAM. That's the "smaller" socket and a direct successor to LGA-2011. As for the Platinum series, I'm on the fence as to whether these chips are also on LGA-2066 or maybe they are designed for the big-time LGA-3647 socket that's shared by the very high-end parts like the 32 core Skylake EP chips.

Here's the leaked purported lineup:

A few interesting points:

1. There have been rumors about a radically different cache hierarchy in these chips floating around since at least January. I'm not talking about the assumption that the L2 cache would go to 512KB/core; I'm talking about something much more unusual. At this stage these are still rumors, but I've seen enough consistent information online to at least play around with the possibility that they are real.

First: Each core now has not 256KB, not 512KB, but 1MB of L2 cache. Second: Each slice of the L3 cache is now only 1.375MB, compared to 2.5MB in Broadwell Xeon & Broadwell HEDT parts. Third: Putting those two numbers together on a core-for-core basis actually gives you a total L2 + L3 cache pool that's roughly 14% smaller than what you would get on an existing Broadwell Xeon (or a Broadwell-E HEDT part). To put things into perspective, an 8-core RyZen part has a total of 20 MB of L2 (4 MB) and L3 (16 MB) cache. The equivalent 8-core Skylake Xeon Gold/Platinum part would only have a total of 19 MB (8 MB L2 + 1.375 * 8 = 11 MB of L3). Fourth: I have a few theories, but with this cache hierarchy it looks like Intel has abandoned a fully-inclusive L3 cache that maintains a complete copy of the contents of the L2 cache. Either that, or the L3 on these chips actually stores an extremely small amount of data beyond what's already in the L2 caches...
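The cache-size comparison above is simple arithmetic, so here is a minimal sketch that reproduces it, assuming the rumored figures quoted in the post (1 MB L2 plus a 1.375 MB L3 slice per core for Skylake, 256 KB plus 2.5 MB for Broadwell, and 512 KB L2 per core plus 16 MB L3 for an 8-core Ryzen). The function name is illustrative only. The per-core shrink works out to about 13.6%.

```python
def cache_total_mb(cores: int, l2_per_core_mb: float, l3_mb: float) -> float:
    """Total L2 + L3 capacity in MB for a chip with `cores` cores."""
    return cores * l2_per_core_mb + l3_mb


# Per-core comparison: one core's private L2 plus one L3 slice.
broadwell_per_core = cache_total_mb(1, 0.25, 2.5)    # 2.75 MB
skylake_per_core = cache_total_mb(1, 1.0, 1.375)     # 2.375 MB (rumored)
shrink = 1 - skylake_per_core / broadwell_per_core

# 8-core comparison: Ryzen has 512 KB L2 per core and 16 MB of L3.
ryzen_8c = cache_total_mb(8, 0.5, 16.0)              # 4 MB + 16 MB
skylake_8c = cache_total_mb(8, 1.0, 8 * 1.375)       # 8 MB + 11 MB

print(f"Per-core L2+L3 pool shrinks by {shrink:.1%}")
print(f"8-core totals: Ryzen {ryzen_8c} MB vs. Skylake Gold/Platinum {skylake_8c} MB")
```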

2. Notice how there's an 18 core Xeon Gold 6154 CPU with a 200 watt(!) TDP in there? Now look down the list to the 18 Core Xeon Gold 6134 with the higher base clock speed but the rather pedestrian 130W TDP... At first you might be thinking that can't be right, but there's a reason for it: it appears that some Xeon Gold parts don't have AVX-512 activated while others do have it turned on. One major factor affecting the maximum theoretical power consumption beyond the core counts and the clockspeeds is AVX-512.

3. All of these parts, while clearly capable of being run in servers, at least have the branding of workstation-type CPUs and not traditional Skylake server parts. These parts are clearly not merely Skylake-EP, which incidentally goes up to 32 cores, "only" has a TDP of up to 165 watts or so, and is already publicly available in a limited manner via Google Cloud.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Sat Mar 25, 2017 2:26 am

by Welch

Trolllololoolololoooolll..... Right? I mean this would be some amazing timing wouldn't it? If even a smidgen true, then we know what Intel was holding back on all of these years while milking the market for all it's worth.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 2:42 am

by synthtel2

chuckula wrote:

1. There have been rumors about a radically different cache hierarchy in these chips floating around since at least January. I'm not talking about the assumption that the L2 cache would go to 512KB/core, I'm talking something much more unusual.

[...]

Now that's interesting. Everyone else seems to be using more passive L3s these days (victim L3 with L2 doing the heavy stuff), and if this is true, it looks like Intel is thinking the same way.

I find it just as interesting that this is rumored for Skylake variants rather than Coffee/Cannon/whateverlake. This seems like the sort of thing that would usually happen in a tock (for whatever value the term has these days), not something keeping the same name.

I'm trying to figure out why this is a useful change. I have a tough time envisioning this being faster in the average workload (considering that a 1MB L2 is liable to be a touch slower), and the current Intel uncore design doesn't use much power. Some power savings seem likely, but performance can't be dropping by much if this is to make sense. What if the current architecture has trouble scaling with high core counts? Alternately, what if the new way is better for AVX-512? Under 28 cores worth of AVX-512 load, I could see going more directly to bigger L2s being better than trying to run stuff through a unified cache level. (How unified are the L3s on the biggest Broadwell dies, anyway?)

chuckula wrote:

2. Notice how there's an 18 core Xeon Gold 6154 CPU with a 200 watt(!) TDP in there? Now look down the list to the 18 Core Xeon Gold 6134 with the higher base clock speed but the rather pedestrian 130W TDP... At first you might be thinking that can't be right, but there's a reason for it: it appears that some Xeon Gold parts don't have AVX-512 activated while others do have it turned on. One major factor affecting the maximum theoretical power consumption beyond the core counts and the clockspeeds is AVX-512.

That's why I'm skeptical that AVX-512 will show up on S parts anytime soon. I didn't expect them to be selective about it on Xeons though - I thought they'd just do more of the deal where they downclock under AVX loads.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 3:08 am

by NTMBK

That gaudy branding is perfect for Intel's Trump-loving CEO.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 3:15 am

by Krogoth

Looks like Intel marketing is trying to move away from E5 and E7 brand into "Gold" and "Platinum". I suspect they are trying to appeal to PHB-types.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 11:08 am

by the

chuckula wrote:

So there is an increasingly solid set of rumors about the next generation of Skylake Xeons for workstations that is leaking in various locations including a recent post over in the Anandtech forums.

In brief: There's a "Gold" series that goes up to 22 cores (18 cores with the highest clockspeeds) and the "Platinum" series that goes all the way up to 28 cores (although you can buy a cut-down version with a mere 24 cores too). I have no direct confirmation, but I strongly suspect that at least the "Gold" series is designed for the new LGA-2066 socket with quad-channel RAM. That's the "smaller" socket and a direct successor to LGA-2011. As for the Platinum series, I'm on the fence as to whether these chips are also on LGA-2066 or maybe they are designed for the big-time LGA-3647 socket that's shared by the very high-end parts like the 32 core Skylake EP chips.

I would expect the lower core counts to appear on LGA-2066 due to bandwidth concerns with the high core count models. By lower core counts, I'm implying <20 cores here for LGA 2066. I think all of these are going to be LGA 3647 with the LGA 2066 parts taking the Silver title. Bronze I presume would be Xeon D or replace the E3 line up.
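The bandwidth argument can be made concrete with peak theoretical DRAM bandwidth per core. This is a rough sketch: DDR4-2666 is an assumed memory speed (the thread doesn't state one), and the six-channel case reflects the "six DDR4 memory channels per die" mentioned later in this post. Each DDR4 channel is 64 bits (8 bytes) wide; real sustained bandwidth is lower than these peaks.

```python
def peak_mb_per_s(channels: int, mega_transfers: int) -> int:
    """Peak theoretical DRAM bandwidth in MB/s: channels * 8 bytes * MT/s."""
    return channels * 8 * mega_transfers


DDR4_MTS = 2666  # hypothetical memory speed in MT/s

configs = [
    ("quad-channel, 18 cores", 4, 18),
    ("quad-channel, 28 cores", 4, 28),
    ("six-channel, 28 cores", 6, 28),
]

for name, channels, cores in configs:
    total = peak_mb_per_s(channels, DDR4_MTS)
    per_core = total / cores
    print(f"{name}: {total / 1000:.1f} GB/s total, {per_core / 1000:.2f} GB/s per core")
```

Under these assumptions, a 28-core part on a quad-channel socket gets roughly a third less peak bandwidth per core than an 18-core part on the same socket, which is the kind of imbalance the post suggests would keep the highest core counts off LGA-2066.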

As for what distinguishes Platinum from Gold, I'd expect it to be the use of memory buffers and 8+ socket support on the Platinum parts. This mirrors the difference between E5 and E7 today.

chuckula wrote:

A few interesting points: 1. There have been rumors about a radically different cache hierarchy in these chips floating around since at least January. I'm not talking about the assumption that the L2 cache would go to 512KB/core, I'm talking something much more unusual. At this stage these are still rumors but I've seen enough consistent information online to at least play around with the possibility that they are real.

First: Each core now has not 256KB, not 512KB, but 1MB of L2 cache. Second: Each slice of the L3 cache is now only 1.375MB, compared to 2.5MB in Broadwell Xeon & Broadwell HEDT parts. Third: Putting those two numbers together on a core-for-core basis actually gives you a total L2 + L3 cache pool that's roughly 14% smaller than what you would get on an existing Broadwell Xeon (or a Broadwell-E HEDT part). To put things into perspective, an 8-core RyZen part has a total of 20 MB of L2 (4 MB) and L3 (16 MB) cache. The equivalent 8-core Skylake Xeon Gold/Platinum part would only have a total of 19 MB (8 MB L2 + 1.375 * 8 = 11 MB of L3). Fourth: I have a few theories, but with this cache hierarchy it looks like Intel has abandoned a fully-inclusive L3 cache that maintains a complete copy of the contents of the L2 cache. Either that, or the L3 on these chips actually stores an extremely small amount of data beyond what's already in the L2 caches...

Going to 1 MB of L2 cache is a very big change. I was expecting 512 KB since they halved the associativity in the 256 KB consumer Skylake. This would imply a 16-way associative design. I think the more likely scenario for the L3 cache is that Intel simply ran out of die space and scaled it back. More L2 cache per core, AVX-512 per core, higher core counts and six DDR4 memory channels per die imply a very large die for the 32-core part.
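One way to read the 16-way inference: a cache's set count is capacity / (ways × line size), and if Intel keeps client Skylake's L2 set count while quadrupling capacity, the ways must quadruple too. The sketch below assumes 64-byte lines (Intel's line size) and the known L2 geometries (Haswell/Broadwell 256 KB 8-way, client Skylake 256 KB 4-way); the 1 MB/16-way entry is the rumored configuration, not a confirmed one.

```python
LINE_BYTES = 64  # Intel cache line size


def num_sets(size_kb: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache."""
    return size_kb * 1024 // (ways * line_bytes)


print(num_sets(256, 8))    # Haswell/Broadwell L2: 256 KB, 8-way -> 512 sets
print(num_sets(256, 4))    # client Skylake L2: 256 KB, 4-way -> 1024 sets
print(num_sets(1024, 16))  # rumored 1 MB L2 at 16-way -> 1024 sets
```

Keeping the set count fixed means the set-index bits stay the same and only the number of tag comparisons per lookup grows.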

The new cache design could tie into a new on-die fabric. Intel's old ring bus topology is showing its age with these high core counts. A 2D grid topology (a mesh, or a torus if the edges wrap around) would cut down on latency for on-die cache-to-cache coherency. This is where I see the big performance gains from the new cache topology.
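To illustrate why a grid scales better than a ring, here is a minimal sketch that counts average hops between distinct nodes. It assumes an idealized single bidirectional ring and a hypothetical 4×7 arrangement of 28 stops; real Intel rings (e.g. the dual rings joined by switches on big Broadwell dies) and real fabrics differ, and hop count ignores per-hop latency, bandwidth and contention.

```python
from itertools import product


def avg_ring_hops(n: int) -> float:
    """Mean shortest-path hop count between distinct stops on one bidirectional ring."""
    total = sum(
        min(abs(i - j), n - abs(i - j))
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def avg_grid_hops(rows: int, cols: int, wrap: bool = False) -> float:
    """Mean hop count between distinct nodes of a rows x cols 2D mesh
    (wrap=False) or 2D torus (wrap=True), using shortest paths per axis."""

    def axis_hops(a: int, b: int, size: int) -> int:
        d = abs(a - b)
        return min(d, size - d) if wrap else d

    nodes = list(product(range(rows), range(cols)))
    total = sum(
        axis_hops(r1, r2, rows) + axis_hops(c1, c2, cols)
        for (r1, c1) in nodes
        for (r2, c2) in nodes
        if (r1, c1) != (r2, c2)
    )
    n = rows * cols
    return total / (n * (n - 1))


ring = avg_ring_hops(28)
mesh = avg_grid_hops(4, 7)
torus = avg_grid_hops(4, 7, wrap=True)
print(f"28 stops: ring {ring:.2f} hops, 4x7 mesh {mesh:.2f}, 4x7 torus {torus:.2f}")
```

This prints roughly 7.26, 3.67 and 2.81 hops respectively: the ring's average distance grows linearly with the stop count, while a grid's grows roughly with its square root.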

The inclusive nature of the caches helped keep coherency sane, as only the L3 cache had to be queried for data. With so little L3 cache, it wouldn't make sense to keep it inclusive, but going exclusive adds considerable complexity to coherency. More so if they go to a new on-die fabric.

Having less cache than Ryzen is odd, but it may not be an issue depending on the topology. Ryzen can have duplicate entries on-die between its two clusters. If Intel has low enough latency to not need duplication, then the capacity difference isn't going to be an issue.

chuckula wrote:

2. Notice how there's an 18 core Xeon Gold 6154 CPU with a 200 watt(!) TDP in there? Now look down the list to the 18 Core Xeon Gold 6134 with the higher base clock speed but the rather pedestrian 130W TDP... At first you might be thinking that can't be right, but there's a reason for it: it appears that some Xeon Gold parts don't have AVX-512 activated while others do have it turned on. One major factor affecting the maximum theoretical power consumption beyond the core counts and the clockspeeds is AVX-512.

Not sure if AVX-512 is going to be the big power dictator. Granted, AVX does have an impact on clock speeds, but not enough to account for that much extra power. What doesn't make sense is the Gold 6150 consuming nearly twice as much power as the Gold 6136 at a lower clock speed, despite having the same core count.

Intel has been toying with additional on-package logic, which would account for some of the power difference. Skylake-EP is going to have several interesting options like an on-package FPGA or Omni-Path fabric. This could also explain why Intel is going the color route to help distinguish what is what in model naming. Secret decoder ring still required.

chuckula wrote:

3. All of these parts, while clearly capable of being run in servers, at least have the branding of workstation-type CPUs and not traditional Skylake server parts. These parts are clearly not merely Skylake-EP, which incidentally goes up to 32 cores, "only" has a TDP of up to 165 watts or so, and is already publicly available in a limited manner via Google Cloud.

I wouldn't use the Google Cloud chips as a prime example: those are custom SKUs specific to Google. Amazon and Microsoft get similar customer-only SKUs for their data centers too.

Intel has been consolidating their socket and chipset infrastructure. The EP and EX lineups are to share a common socket at the high end, but the EX line was to continue to use memory buffers. (Note that Intel could do an IBM thing here and include L4 cache with the buffers. There have been some very specific cache changes for consumer Skylake's eDRAM cache on GT3e/GT4e parts that would make more sense here.)

Similarly, Intel is purportedly going to use the same chipset interface, so the I/O hub from the Z270 chipset (or rather its server version) can be used. The only thing missing there for servers is 10 Gbit Ethernet, which would hang off a CPU socket anyway for both latency and bandwidth reasons.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 1:50 pm

by synthtel2

the wrote:

Not sure if AVX-512 is going to be the big power dictator. Granted AVX does have an impact on clock speeds but not that much for that much power.

256-bit AVX already dominates power use when it's active, and this should be roughly doubling it at the same clocks and core counts. 18 cores at 3.0 GHz doing AVX-512 works out to nearly 3.5 SP TFLOPS. That looks like pretty weak FLOPS/W by GPU standards, but it destroys every current Intel CPU (excepting those clocked at a snail's pace for efficiency, and excepting their GPU-ish weirdness).
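The 3.5 TFLOPS figure follows from lanes × FMA units × cores × clock. A minimal sketch, assuming two 512-bit FMA units per core (as on the higher-end Skylake server parts; some SKUs have only one) and counting each FMA as two floating-point operations:

```python
def peak_sp_gflops(cores: int, ghz: float, vector_bits: int, fma_units: int) -> float:
    """Peak single-precision GFLOPS assuming every FMA unit retires one FMA per cycle."""
    lanes = vector_bits // 32                  # 32-bit single-precision lanes per vector
    flops_per_cycle = lanes * 2 * fma_units    # one FMA = multiply + add = 2 FLOPs
    return cores * ghz * flops_per_cycle


avx512 = peak_sp_gflops(18, 3.0, 512, fma_units=2)
avx2 = peak_sp_gflops(18, 3.0, 256, fma_units=2)
print(f"AVX-512: {avx512:.0f} GFLOPS, AVX2 at the same clocks: {avx2:.0f} GFLOPS")
```

That gives 3456 GFLOPS for AVX-512 versus 1728 GFLOPS for 256-bit AVX2 at identical clocks and core counts, which is the "roughly doubling" in question.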

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 3:31 pm

by the

synthtel2 wrote:

the wrote:

Not sure if AVX-512 is going to be the big power dictator. Granted AVX does have an impact on clock speeds but not that much for that much power.

256-bit AVX already dominates power use when it's active, and this should be roughly doubling it at the same clocks and core counts. 18 cores at 3.0 GHz doing AVX-512 works out to nearly 3.5 SP TFLOPS. That looks like pretty weak FLOPS/W by GPU standards, but it destroys every current Intel CPU (excepting those clocked at a snail's pace for efficiency, and excepting their GPU-ish weirdness).

Power consumption does indeed go up with AVX2 on Broadwell-EP, but only a 200 MHz drop from base clock is needed. Turbo still works, but at full AVX load it's difficult to push beyond the AVX base clock unless some aggressive cooling is used. The Turbo figures are where the real impact of AVX is felt.

The clock speeds listed on the chart appear to be regular base clocks and in line with the previous Haswell and Broadwell generations.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 6:51 pm

by synthtel2

200 MHz can buy a fair bit of power headroom, and the power use gap between normal operation and AVX2 is relatively small (1.5x or a bit more?). It's no doubling over non-FMA AVX because you've still got overhead for decoding, scheduling, etc. It is however clearly enough math that the power use of the ALUs (and probably all the shuffling data around) is starting to drown out the other stuff. That's only going to get more clear with AVX-512.

I can't see the image with the purported lineup for some reason, so I am relying on text-format info for that.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Mon Mar 27, 2017 11:08 pm

by the

200 MHz isn't that much power as long as voltage remains constant. At the upper echelons of clock speeds, the extra voltage needed for that 200 MHz would matter. Here, voltage isn't cranked up to boost clock speeds; that power budget is better spent on increasing core count while keeping the entire chip at the lowest possible voltage.

A ~50% increase in power consumption for AVX FMA does seem about right, considering there are now three operands used in the actual computation.

Re: Skylake Xeon: Now in Gold & Platinum

Posted: Tue Mar 28, 2017 12:48 am

by synthtel2

Voltage won't be constant with a change of 200 MHz, so long as the original voltage is more than ~700 mV. AFAIK, it'll be about a 50 mV difference (25 mV per 100 MHz is a typical slope). Starting in the ballpark of 1000 mV and ignoring all kinds of complexities, that should represent a 12-15% difference in power consumption (kV^2 for leakage plus kFV^2 for switching power). That's a bit less than I expected before doing the math, but still enough to help.
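The estimate can be reproduced with the two simplified models given in the post (leakage ∝ V^2, switching ∝ F·V^2). This sketch assumes a hypothetical 3.0 GHz starting clock and 1.0 V starting voltage, and uses the quoted 25 mV per 100 MHz slope; real voltage/frequency curves and leakage behavior are far messier.

```python
def relative_power(f0_ghz: float, v0: float, f1_ghz: float, v1: float):
    """Return (switching, leakage) power at (f1, v1) relative to (f0, v0),
    using the post's simplified models: switching ~ k*F*V^2, leakage ~ k*V^2."""
    switching = (f1_ghz / f0_ghz) * (v1 / v0) ** 2
    leakage = (v1 / v0) ** 2
    return switching, leakage


SLOPE_V_PER_100MHZ = 0.025  # 25 mV per 100 MHz, the "typical slope" from the post
DROP_MHZ = 200
F0_GHZ = 3.0                # hypothetical starting clock
V0 = 1.0                    # ~1000 mV starting voltage, as in the post

f1 = F0_GHZ - DROP_MHZ / 1000
v1 = V0 - SLOPE_V_PER_100MHZ * DROP_MHZ / 100   # 50 mV lower

switching, leakage = relative_power(F0_GHZ, V0, f1, v1)
print(f"switching power saved: {1 - switching:.1%}, leakage saved: {1 - leakage:.1%}")
```

Switching power alone drops by about 16% and leakage alone by about 10%, so a blend of the two lands in the 12-15% range quoted above.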

Operand count isn't going to be any real predictor of an instruction's power use, but now that you mention it, I'm curious what addition:multiplication ratios are like in typical pieces of high-intensity vector code. FMA exists for a reason, but it won't always be balanced. This matters because multiplication is a whole lot more complex to implement than addition and I presume uses a great deal more power. If that's true though (and ALUs outweigh data movement power use), it should be possible to construct an AVX1 pathological case that makes IVB burn nearly as much power as HSW + FMA. (In normal cases that would benefit from FMA, the IVB code would only be MULing every other cycle.) Now I'm also really curious how much power is used by the ALUs themselves and how much is just getting the data to and from them.

Skylake Gold & Platinum Xeons CONFIRMED

Posted: Thu Apr 27, 2017 7:05 am

by chuckula

New information from a German website about a product update notification that officially confirms the new Xeon Gold & Platinum SKUs. Incidentally, you can get a PDF of this information directly from Intel's website here, so this actually counts as "confirmed"!

The most interesting part is that, if I am interpreting the second-from-the-right column correctly, all of these parts use socket "R3", which is the new LGA-2066 socket that replaces the old LGA-2011 socket. Even the "Platinum" series parts do not appear to be using the larger LGA-3647 socket that we already know is hosting the biggest of the Skylake Xeons, including the 32-core Purley platform systems that companies like Google have already deployed for cloud services. [Edit: However, when you look at the table more carefully, all the Xeon Phi parts, which we know use LGA-3647, are listed using the "R2" nomenclature, so maybe that column does not directly identify the socket. I'll consider it an open possibility that at least some of the bigger Xeons use LGA-3647 until we know more.]

From the branding and the socket, it looks like the Xeon Gold/Platinum lineup is clearly targeted at 1/2-socket servers and (maybe more interestingly) workstations, while the LGA-3647 parts are in the bigger servers. This table appears to be pretty consistent with some of the earlier rumors that have been floating around, although it also lists some additional variants of the processors, like the "T" and "M" variants, that weren't expressly listed in the earlier rumors.

You can see the official product update table here [Edit: Same table as from the German site but from Guru3d since their image is easier to read]:

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 7:34 am

by just brew it!

The branding seems a little silly to me for a product aimed at the professional/datacenter market. What's next -- Xeon Fatal1ty Extreme Edition?

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 7:36 am

by chuckula

just brew it! wrote:

The branding seems a little silly to me for a product aimed at the professional/datacenter market. What's next -- Xeon Fatal1ty Extreme Edition?

Don't give them ideas!

Then again, maybe they were jealous of RyZen and felt the need to have their own stupid product name.

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 7:42 am

by Topinio

just brew it! wrote:

The branding seems a little silly to me for a product aimed at the professional/datacenter market. What's next -- Xeon Fatal1ty Extreme Edition?

Oh, don't.

(I actually deployed an ASRock Fatal1ty 990FX Killer ("World's First PCIe x2 M.2 Socket Motherboard") in a professional setting, for actual work, and am still cringing at the memory of the manual's intro... so these Xeons won't be the most cringe-worthy thing I procure this decade...)

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 7:47 am

by just brew it!

Years ago when I was doing the independent contracting thing full-time, I walked into a new client's office, and discovered that their "data center" consisted of some wire shelves along the wall, with rows of Alienware PCs (yes, with the alien-styled cases) sitting on them.

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 1:21 pm

by Vhalidictes

just brew it! wrote:

The branding seems a little silly to me for a product aimed at the professional/datacenter market. What's next -- Xeon Fatal1ty Extreme Edition?

I agree about the branding weirdness, but more importantly I don't see an Extreme Gaming / SOHO business use case for these CPUs. That's just more cores than any application I can think of is going to need. I guess homebrew VMware hosts?

I can squint really hard and see RyZen 8C/16T as "futureproofing", but these SKUs?

EDIT: Reading comprehension - it helps! I got distracted by the socket discussion; these are clearly not HEDT chips. The Xeon branding should really have made it clear to me.

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 1:49 pm

by Mr Bill

just brew it! wrote:

The branding seems a little silly to me for a product aimed at the professional/datacenter market. What's next -- Xeon Fatal1ty Extreme Edition?

Even better, after gold and platinum are passé they can come out with a titanium edition.

Re: Xeon Gold & Platinum Officially Confirmed

Posted: Thu Apr 27, 2017 1:57 pm

by Redocbew

Yeah, then we'd have gamers buying Xeon Titaniums to go with their Ti cards from Nvidia.

A few interesting bits:

1. L1 cache sizes appear to be unchanged from existing Xeons at 32KB each for instruction/data.
2. L2 cache sizes once again show the 1MB (1024KB) numbers that have been widely reported.
3. L3 cache sizes once again show 1.375 MB per core (18 cores * (1024KB * 1.375)/core = 25344 KB).
4. The L3 is listed as being... 11-way associative, which is just weird because it's not a power of 2 like you'd normally expect. [Edit: 25344 KB and 1.375 MB are both evenly divisible by 11, FWIW, although the associativity of individual cache entries does not strictly have to map to the overall size of the cache like that]
5. Base clocks at 3.0 GHz.
6. Reading the instruction feature dump at the top of the screen, you see all kinds of AVX-512 goodies.
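A possible explanation for the odd 11-way figure, offered only as a hypothesis: if each 1.375 MB slice is split into 11 ways, each way is 128 KB, and with 64-byte lines that leaves a power-of-two set count per slice, so set indexing stays simple even though the way count is odd. The sketch below just checks that arithmetic; the slice and way figures are the ones reported in the screenshot.

```python
LINE_BYTES = 64        # Intel cache line size
SLICE_KB = 1408        # 1.375 MB = 1.375 * 1024 KB per L3 slice (one slice per core)
WAYS = 11              # associativity listed in the screenshot
CORES = 18

assert SLICE_KB == int(1.375 * 1024)

total_l3_kb = CORES * SLICE_KB                              # whole-chip L3
kb_per_way = SLICE_KB // WAYS                               # capacity of one way in a slice
sets_per_slice = SLICE_KB * 1024 // (WAYS * LINE_BYTES)     # sets in one slice

print(f"total L3: {total_l3_kb} KB, {kb_per_way} KB per way, {sets_per_slice} sets per slice")
```

This gives 25344 KB total, 128 KB per way and 2048 sets per slice; 2048 is 2^11, so the non-power-of-two factor lives entirely in the associativity.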

At this point if the L2/L3 cache sizes are fake then somebody is going to a hell of a lot of trouble to report the exact same numbers all over the internet to the point that I'll be pretty surprised if the launched chips don't actually have that cache structure.