Intel unveils Xeon Cascade Lake Advanced Performance Platform

Ahead of the Supercomputing conference next week, Intel has announced a new market segment for Xeons called Cascade Lake Advanced Performance (CLX-AP). This represents a new, higher-core-count option in the Xeon Scalable family, which currently tops out at 28 cores.

Through the use of a multi-chip package (MCP), Intel will now be able to offer up to 48 cores and 12 DDR4 memory channels per socket. Cascade Lake AP is targeted at dual-socket systems, bringing the total core count up to 96 cores.
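The per-socket figures above multiply out as follows. This is a minimal Python sketch. The even two-dies-per-package split is an assumption based on the multi-chip package description, and the function name `platform_totals` is hypothetical.

```python
# Back-of-envelope platform totals for a dual-socket Cascade Lake AP system,
# using the per-socket figures quoted above (48 cores, 12 DDR4 channels).
# ASSUMPTION: each package holds two identical dies, so cores and memory
# channels split evenly between them.

SOCKETS = 2
DIES_PER_SOCKET = 2          # assumed: two dies per multi-chip package
CORES_PER_SOCKET = 48
CHANNELS_PER_SOCKET = 12


def platform_totals(sockets: int, dies_per_socket: int,
                    cores_per_socket: int, channels_per_socket: int) -> dict:
    """Return system-wide totals plus the per-die split."""
    dies = sockets * dies_per_socket
    return {
        "total_cores": sockets * cores_per_socket,
        "total_channels": sockets * channels_per_socket,
        "dies": dies,
        "cores_per_die": cores_per_socket // dies_per_socket,
        "channels_per_die": channels_per_socket // dies_per_socket,
    }


if __name__ == "__main__":
    totals = platform_totals(SOCKETS, DIES_PER_SOCKET,
                             CORES_PER_SOCKET, CHANNELS_PER_SOCKET)
    for name, value in totals.items():
        print(f"{name}: {value}")
```

Run as a script, it prints 96 total cores, 24 memory channels, and four dies of 24 cores each. Those four dies are why a 2S Cascade Lake AP system is often compared with a four-socket layout.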

Intel's Ultra Path Interconnect (UPI), introduced in Skylake-EP for multi-socket communication, is used both to connect the two dies within each multi-chip package and to link the two processors in a 2S configuration.

When asked about this, Intel said that the issues it had previously pointed out with AMD's Epyc are not inherent to multi-die designs but instead stem from the quality of the interconnect. By using UPI for the interconnect, Intel claims its MCP design will provide performance consistency not found in other solutions. Intel was also quick to point out that this is not its first multi-chip Xeon design.

Intel provided some performance claims against the current 32-core Epyc 7601, of up to 3.4X greater performance in Linpack, and up to 1.3x in Stream Triad.
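Stream Triad is a memory-bandwidth test built around a single loop, a[i] = b[i] + q*c[i]. The Python sketch below only illustrates that kernel and the byte-counting convention: two reads and one write of 8-byte doubles per element. It is not the official STREAM code, and the helper names are hypothetical.

```python
# Minimal illustration of the STREAM Triad kernel: a[i] = b[i] + q * c[i].
# This is NOT the official STREAM benchmark (which is C/Fortran and times
# many repetitions over large arrays); it only shows what the kernel computes
# and how bytes moved per element are counted.

BYTES_PER_DOUBLE = 8


def triad(b: list[float], c: list[float], q: float) -> list[float]:
    """Compute a = b + q*c element-wise."""
    if len(b) != len(c):
        raise ValueError("b and c must have the same length")
    return [bi + q * ci for bi, ci in zip(b, c)]


def triad_bytes(n: int) -> int:
    """Bytes counted per Triad pass: read b, read c, write a (3 doubles per element)."""
    return 3 * BYTES_PER_DOUBLE * n


def bandwidth_gb_per_s(n: int, seconds: float) -> float:
    """Convert elements processed and elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return triad_bytes(n) / seconds / 1e9


if __name__ == "__main__":
    print(triad([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 3.0))
    print(triad_bytes(10_000_000))
```

Triad does almost no arithmetic per byte moved, so its results mostly track memory bandwidth. The number of memory channels per socket is therefore likely a bigger factor in this result than core count.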

As usual, whether these claims hold up will come down to independent testing once people have these Cascade Lake AP processors in hand, which is expected in the first half of 2019.

More details on the entire Cascade Lake family, including Cascade Lake AP, are set to come at next week's Supercomputing conference, so stay tuned for more information as it becomes available!

Looking further, Intel disabled SMT on the Epyc system and used its own compiler rather than GCC. Intel's compiler carries a notice that it may not optimize to the same degree for non-Intel processors. Intel is also known to do a lot of very specific optimization for Linpack as a synthetic benchmark.

In short, I would take it with a grain of salt, much like some of their other recent benchmarks, such as those for the i9-9900K.

I'm sure this will be faster than the current Epyc, but at what cost and power draw? I just hope the press continues to keep Intel's disingenuous benchmarks in check.

"Intel provided some performance claims against the current 32-core Epyc 7601, of up to 3.4X greater performance in Linpack, and up to 1.3x in Stream Triad."

Is that with or without Hyper-Threading(TM) enabled? Intel's implementation of SMT (Hyper-Threading) has side-channel attack vector issues, with PortSmash landing a smash right in the execution ports.

And sure, Intel's AVX-512 should perform better in some workloads, but maybe AMD is better off with smaller AVX units that draw less power for the many workloads that do not use AVX. And if a customer wants more FP horsepower, AMD's 7nm Vega 20 will deliver far more DP and SP FP throughput than any Xeon. Ditto for Nvidia's Volta-based Tesla SKUs paired with x86 or Power9 processors.

If TSMC's 7nm can reduce power usage enough, then maybe it will be easier for AMD to engineer a variant with two Vega 20 dies on a single PCIe card that still fits within a single card's power/thermal budget. That would mean two 7nm Vega 20 dies wired together across the card's PCB via xGMI (Infinity Fabric-based) for a dual-GPU variant that appears to software as one larger logical GPU. Nvidia does something similar with its NVLink IP, which is fully enabled on Nvidia's professional GPU products.

So that's a lot of shader cores if one Vega 20 die still offers 4096 shaders and AMD can link two Vega 20 dies over xGMI on a single card's PCB. Pro compute/AI cards are not really about high clock rates. They are more about GFLOPS/watt or GInferences/watt and server density. So if TSMC's 7nm saves enough on power and thermal load, Vega 20 could be doubled up, and that is with Vega 20's 1:2 DP-to-SP FP ratio.
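The "lot of shader cores" can be turned into a rough theoretical peak. This Python sketch assumes 4096 shaders per die, one fused multiply-add (two FLOPs) per shader per clock, and the 1:2 DP:SP ratio mentioned above. The 1.8 GHz clock and 300 W board power are placeholder assumptions, not specifications from the comment.

```python
# Rough peak-throughput estimate for the hypothetical dual Vega 20 card
# described above. ASSUMPTIONS (not specifications): 4096 shaders per die,
# one fused multiply-add (2 FLOPs) per shader per clock, a placeholder
# 1.8 GHz clock, a placeholder 300 W board power, and a 1:2 DP:SP ratio.

FLOPS_PER_SHADER_PER_CLOCK = 2   # one FMA counted as two floating-point ops
DP_TO_SP_RATIO = 0.5             # 1:2 double- to single-precision rate


def peak_tflops(dies: int, shaders_per_die: int, clock_ghz: float) -> tuple[float, float]:
    """Return theoretical (single-precision, double-precision) peak in TFLOPS."""
    # shaders * FLOPs/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS
    sp = dies * shaders_per_die * FLOPS_PER_SHADER_PER_CLOCK * clock_ghz / 1000.0
    return sp, sp * DP_TO_SP_RATIO


def gflops_per_watt(tflops: float, board_watts: float) -> float:
    """Efficiency metric the comment emphasizes over raw clock speed."""
    return tflops * 1000.0 / board_watts


if __name__ == "__main__":
    CLOCK_GHZ = 1.8      # hypothetical clock
    BOARD_WATTS = 300.0  # hypothetical board power, same for both cases
    for dies in (1, 2):
        sp, dp = peak_tflops(dies, 4096, CLOCK_GHZ)
        print(f"{dies} die(s): {sp:.1f} TFLOPS SP, {dp:.1f} TFLOPS DP, "
              f"{gflops_per_watt(sp, BOARD_WATTS):.0f} SP GFLOPS/W")
```

Theoretical peak scales linearly with die count. The GFLOPS/watt case for a dual-die card therefore depends on whether 7nm keeps board power from doubling too, which the fixed `BOARD_WATTS` value simply assumes.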

But Intel is in with the Glue now, under the influence of AMD's Zen/Zeppelin modular die designs for Epyc/Naples, which are about to be doubled up with Epyc/Rome on TSMC's 7nm process node. And here we are in the midst of the greatest Core Wars of the microprocessor era, with the battle cry of Moar Cores echoing through the hills along with the sounds of advancing marketing monkey divisions.

P.S.

I sure hope that some in the press ask AMD for more information on xGMI (Infinity Fabric-based), as well as more questions about Epyc/Rome and its xGMI support. Even first-generation Zen Epyc/Naples supports cross-socket Infinity Fabric links, and xGMI is just the eXternal Global Memory Interconnect. So that should also enable Infinity Fabric traffic between Epyc and a Vega 20 GPU, just as NVLink can be used to interface a CPU directly to an Nvidia GPU.

"Other decisions about configuring the systems under test will likely raise louder objections. Intel didn't note whether Hyper-Threading would be available from Cascade Lake-AP chips, and indeed, its comparative numbers against that dual-socket Epyc 7601 system were obtained with SMT off on the AMD platform. 64 active cores is nothing to sniff at, to be sure, but when a platform is capable of throwing 128 threads at a problem and one artificially slices that number in half, eyebrows are going to go up." (1)

This is the expensive marketing part that tries to make it look like Intel can compete on core count. It is the exact equivalent of a four-socket system, and about all it allows is a smaller form factor. Four-socket boards are ridiculously large, so this is somewhat important, but it still isn’t a very good solution. There will be no power savings from placing two dies on a package. Intel isn’t going to design completely new UPI links for a single, overpriced marketing part, whereas AMD has specialized on-package links that are significantly lower power than its off-package links while also being faster. AMD also has massive economies of scale, since the same die goes into a $100 Ryzen 3 and a 32-core Epyc. There aren’t the same economies of scale with a 24-core Xeon die. Since Intel will still be stuck on 14 nm, they could get quite good yields on a several-year-old process, even though it is a much larger die. It will be interesting to see what AMD’s Zen 2 die size is if they go with 16 cores per die at 7 nm.

Epyc/Rome is eight Zen 2 chiplets around one big I/O die, and I think there may even be some L4 cache on that I/O die for the Zen 2-based Epyc/Rome SKUs. I suspect AMD has stuck with the 4-core CCX at two CCXs per Zen 2 die, but who knows. I'll bet the Zen 2 L3 cache per CCX is much larger as well, given all the space saved on the chiplets by moving the memory controllers to the I/O die. 256-bit AVX is also going to take up space, as will doubling the FP pathways. That I/O die is still fabbed at 14nm, while TSMC's 7nm is used for the Zen 2 chiplets. Those chiplets will most likely have nice ~80% die yields per wafer, just as the Zen/Zeppelin die in Epyc/Naples had high yields due to its small size.
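The yield argument in the last two comments can be illustrated with the simplest textbook model, a Poisson yield Y = exp(-A * D0), where A is die area and D0 is defect density. The areas and defect density below are hypothetical placeholders, not published figures for any AMD or Intel die.

```python
import math

# Simple Poisson die-yield model: Y = exp(-A * D0).
# A  = die area in cm^2, D0 = defect density in defects per cm^2.
# ASSUMPTION: the areas and defect density used below are illustrative
# placeholders, not published figures for any AMD or Intel product.


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of dies with zero defects."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.2  # hypothetical defect density, defects per cm^2
    dies = (
        ("small chiplet, hypothetical 75 mm^2", 75.0),
        ("large monolithic die, hypothetical 700 mm^2", 700.0),
    )
    for label, area in dies:
        print(f"{label}: {poisson_yield(area, D0):.1%} defect-free")
```

Real fabs use more refined models, such as negative-binomial yield models. Partially defective dies can also be sold as lower-core-count parts, so this sketch overstates the gap and only shows the direction of the effect.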