What is “SKL-X”?

“Skylake-X” (E/EP) is the server/workstation/HEDT version of the desktop/mobile Skylake CPU, the 6th-gen Core/Xeon replacing the current Haswell/Broadwell-E designs. It naturally does not contain an integrated GPU; what it does contain is more cores, more PCIe lanes and more memory channels (up to six 64-bit channels) for huge memory bandwidth.

While it may seem an “old core”, the 7th-gen Kabylake core is not much more than a stepping update, and even the future 8th-gen Coffeelake is rumored to use the very same core again. What SKL-X does add is the much-anticipated 512-bit AVX512 instruction set (ISA), which is not enabled in the current desktop/mobile parts.

SKL-X supports not only DDR4 but also NVM-DIMMs (non-volatile memory DIMMs) and PMem (Persistent Memory), which should revolutionise future computing: no memory refresh is needed, and sleep/resume becomes near-immediate because memory no longer has to be saved to and restored from storage.

In this article we test CPU Cache and Memory performance; please see our other articles on:

Hardware Specifications

We are comparing the top-end desktop Core i9 with the current competing architectures from both AMD and Intel, as well as with its predecessor (Haswell-E).

CPU Specifications

Intel i9 7900X (Skylake-X)

AMD Ryzen 1700X

Intel i7 6700K (Skylake)

Intel i7 5820K (Haswell-E)

Comments

TLB 4kB pages

i9 7900X: 64 4-way / 64 8-way; 1536 8-way
Ryzen 1700X: 64 full-way; 1536 8-way
i7 6700K: 64 4-way / 64 8-way; 1536 6-way
i7 5820K: 64 4-way; 1024 8-way

Ryzen has comparatively ‘better’ TLBs than all Intel CPUs.

TLB 2MB pages

i9 7900X: 8 full-way; 1536 2-way
Ryzen 1700X: 64 full-way; 1536 2-way
i7 6700K: 8 full-way; 1536 6-way
i7 5820K: 8 full-way; 1024 8-way

Again, Ryzen has ‘better’ TLBs than all the Intel versions.

Memory Controller Speed (MHz)

i9 7900X: 800-3300
Ryzen 1700X: 600-1200
i7 6700K: 800-4000
i7 5820K: 1200-4000

Intel’s uncore (UNC) clock runs higher than Ryzen’s.

Memory Speed (MT/s), tested / official max

i9 7900X: 3200 / 2667
Ryzen 1700X: 2400 / 2667
i7 6700K: 2533 / 2667
i7 5820K: 2133 / 2133

SKL-X officially supports 2667MT/s, the same as Ryzen and normal SKL, but runs happily at 3200MT/s.

Memory Channels / Width

i9 7900X: 4 / 256-bit (max 6 / 384-bit)
Ryzen 1700X: 2 / 128-bit
i7 6700K: 2 / 128-bit
i7 5820K: 4 / 256-bit

SKL-X has 2 memory controllers, each with up to 3 channels, for massive memory bandwidth.

Memory Timing (clocks)

i9 7900X: 16-18-18-36 6-54-19-4 2T
Ryzen 1700X: 14-16-16-32 7-54-18-9 2T
i7 6700K: 16-18-18-36 5-54-21-10 2T
i7 5820K: 14-15-15-36 4-51-16-3 2T

SKL-X can run timings as tight as normal SKL or Ryzen.
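The TLB rows matter because a TLB can only map a limited amount of memory (its "reach") before page walks start. A minimal sketch, assuming reach is simply entries × page size and using the second-level entry counts from the TLB rows; it ignores associativity conflicts and any split between instruction and data entries, so real reach is lower:

```python
KIB = 1024
MIB = 1024 * KIB

def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes a TLB can map at once: entries x page size (associativity ignored)."""
    return entries * page_size

# Second-level TLB entry counts from the 4kB and 2MB rows (same count for both page sizes here).
l2_tlb_entries = {"i9 7900X": 1536, "Ryzen 1700X": 1536, "i7 6700K": 1536, "i7 5820K": 1024}

for cpu, entries in l2_tlb_entries.items():
    small = tlb_reach(entries, 4 * KIB) // MIB  # reach in MiB with 4kB pages
    large = tlb_reach(entries, 2 * MIB) // MIB  # reach in MiB with 2MB pages
    print(f"{cpu}: {small} MiB with 4kB pages, {large} MiB with 2MB pages")
```

For the i9 7900X this gives 6 MiB with 4kB pages and 3072 MiB (3 GiB) with 2MB pages, one reason a random walk over a large buffer using 4kB pages pays for page walks that an in-page walk largely avoids.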

Core Topology and Testing

Intel has dropped the (dual) ring bus(es) and opted instead for a mesh interconnect between cores. On desktop parts this should not cause latency differences between cores (of the kind seen on Ryzen), but on high-end server parts with many cores (up to 28) this may not be the case. The much larger L2 cache (1MB vs. the old 256kB) should alleviate this issue, though the L3 cache seems to have been reduced quite a bit.
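Why a mesh can produce uneven inter-core latency on many-core parts can be illustrated by counting hops. A simplified sketch, assuming a single bidirectional ring of 28 stops versus a hypothetical 4 × 7 grid of core tiles with XY routing; real dies mix core, memory-controller and I/O tiles, and latency depends on more than hop count:

```python
from itertools import product

def ring_hops(a: int, b: int, n: int) -> int:
    """Shortest hop count between stops a and b on a bidirectional ring of n stops."""
    d = abs(a - b) % n
    return min(d, n - d)

def mesh_hops(a: tuple, b: tuple) -> int:
    """Hop count between tiles a and b on a 2D mesh with XY routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def avg_hops_to_others(node, nodes, hops) -> float:
    """Average hop count from `node` to every other node."""
    others = [m for m in nodes if m != node]
    return sum(hops(node, m) for m in others) / len(others)

CORES = 28
ring = list(range(CORES))
ring_avg = [avg_hops_to_others(s, ring, lambda a, b: ring_hops(a, b, CORES)) for s in ring]

ROWS, COLS = 4, 7  # hypothetical 4 x 7 grid of core tiles (not a real die layout)
mesh = list(product(range(ROWS), range(COLS)))
mesh_avg = [avg_hops_to_others(t, mesh, mesh_hops) for t in mesh]

print(f"ring: {min(ring_avg):.2f} to {max(ring_avg):.2f} average hops per core")
print(f"mesh: {min(mesh_avg):.2f} to {max(mesh_avg):.2f} average hops per core")
```

With these inputs every ring stop averages the same 7.26 hops to the other cores, while mesh tiles range from about 2.81 (centre) to 4.67 (corner): fewer hops on average, but no longer uniform across cores.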

Native Performance

We are testing bandwidth and latency performance using all the available SIMD instruction sets (AVX, AVX2/FMA, AVX512) supported by the CPUs.

The large L2 caches also have 2x more bandwidth than either HSW-E or Ryzen.

Aggregated L3 Bandwidth (GB/s)

i9 7900X: 289
Ryzen 1700X: 392 [+35%]
i7 6700K: 247
i7 5820K: 205

Ryzen’s two L3 caches (one per CCX) have higher aggregate bandwidth than all the Intel CPUs.

Aggregated Memory (GB/s)

i9 7900X: 69.3 [+2.4x]
Ryzen 1700X: 28.5
i7 6700K: 31
i7 5820K: 42.5

With its 4 channels, SKL-X reigns supreme with almost 2.5x more bandwidth than Ryzen.

The widened ports on the L1 and L2 caches allow SKL-X to demolish the competition with over 2x more bandwidth than either Ryzen or the older HSW-E; only the smaller L3 cache falters. Its 4 channels running at 3200MT/s yield huge memory bandwidth that greatly helps streaming algorithms. SKL-X is a monster; we can only speculate what the 6-channel server version would score.
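The memory bandwidth figures can be put against each platform's theoretical ceiling: channels × 8 bytes per transfer × transfer rate. A sketch, assuming the first Memory Speed figure for each CPU is the speed it was tested at (the text confirms this only for SKL-X at 3200MT/s):

```python
def peak_dram_gbs(channels: int, mts: int, bus_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    Each channel moves bus_bits / 8 bytes per transfer at `mts` million transfers per second.
    """
    return channels * (bus_bits // 8) * mts * 1e6 / 1e9

# (channels, MT/s, measured aggregate GB/s); MT/s assumed to be the tested speed.
systems = {
    "i9 7900X": (4, 3200, 69.3),
    "Ryzen 1700X": (2, 2400, 28.5),
    "i7 6700K": (2, 2533, 31.0),
    "i7 5820K": (4, 2133, 42.5),
}

for name, (channels, mts, measured) in systems.items():
    peak = peak_dram_gbs(channels, mts)
    print(f"{name}: peak {peak:.1f} GB/s, measured {measured} GB/s = {measured / peak:.0%} of peak")

# Hypothetical 6-channel DDR4-2667 configuration: a theoretical ceiling, not a predicted score.
print(f"6 x DDR4-2667: peak {peak_dram_gbs(6, 2667):.1f} GB/s")
```

On these assumptions SKL-X reaches roughly 68% of its 102.4 GB/s peak. The last line only gives the ceiling of a 6-channel DDR4-2667 setup (about 128 GB/s); what a real server part would score remains speculation.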

Data In-Page Random Latency (ns)

i9 7900X: 26 [1/2.84x] (4-13-33)
Ryzen 1700X: 74 (4-17-36)
i7 6700K: 20 (4-12-21)
i7 5820K: 25 (4-12-26)

SKL-X has latency comparable to SKL and HSW-E, and much better than Ryzen.

Data Full Random Latency (ns)

i9 7900X: 75 [-21%] (4-13-70)
Ryzen 1700X: 95 (4-17-37)
i7 6700K: 65 (4-12-34)
i7 5820K: 72 (4-13-52)

Full random latencies are a bit higher than expected but on par with HSW-E and better than Ryzen.

Data Sequential Latency (ns)

i9 7900X: 5.4 [+28%] (4-11-13)
Ryzen 1700X: 4.2 (4-7-7)
i7 6700K: 4.1 (4-12-13)
i7 5820K: 7 (4-12-13)

Strangely, SKL-X does not do as well here as either SKL or Ryzen, but at least it beats HSW-E.

If you were hoping for SKL-X to match normal SKL, that is sadly not the case: even at a similar Turbo clock its latencies are higher across the board, even allowing Ryzen a win. Perhaps further platform optimisations are needed.
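The difference between the in-page and full random tests comes down to the access pattern. The exact method used by the benchmark is not described here, so the following is a generic pointer-chasing sketch under stated assumptions (64-byte lines, 4kB pages, one chain slot per line): in-page random shuffles the lines inside each page but visits pages in turn, while full random shuffles every line in the buffer:

```python
import random

LINE = 64                      # assumed bytes per cache line (one slot per line)
PAGE = 4096                    # 4kB page
SLOTS_PER_PAGE = PAGE // LINE  # 64 slots per page

def link(order: list) -> list:
    """Turn a visiting order into a chain: nxt[order[k]] = order[k + 1], wrapping at the end."""
    nxt = [0] * len(order)
    for k, slot in enumerate(order):
        nxt[slot] = order[(k + 1) % len(order)]
    return nxt

def full_random_chain(n_slots: int, rng: random.Random) -> list:
    """Any step may land anywhere, so large buffers also miss the TLB on most steps."""
    order = list(range(n_slots))
    rng.shuffle(order)
    return link(order)

def in_page_random_chain(n_slots: int, rng: random.Random) -> list:
    """Random order inside each page, pages visited one after another: few TLB misses."""
    order = []
    for base in range(0, n_slots, SLOTS_PER_PAGE):
        page = list(range(base, min(base + SLOTS_PER_PAGE, n_slots)))
        rng.shuffle(page)
        order.extend(page)
    return link(order)

def walk(nxt: list, steps: int, start: int = 0) -> int:
    """The dependent-load loop a native benchmark would time: each load needs the previous one."""
    i = start
    for _ in range(steps):
        i = nxt[i]
    return i

def cycle_length(nxt: list, start: int = 0) -> int:
    """Number of steps until the chain returns to `start`."""
    steps, i = 1, nxt[start]
    while i != start:
        i = nxt[i]
        steps += 1
    return steps
```

In a real benchmark the chain would live in a native array and only the walk loop would be timed; Python is used here solely to build and check the two access patterns.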

Code In-Page Random Latency (ns)

i9 7900X: 12 [-27%] (4-14-28)
Ryzen 1700X: 16.6 (4-9-25)
i7 6700K: 10 (4-11-21)
i7 5820K: 15.8 (3-20-29)

With code, SKL-X performs better, though not enough to catch normal SKL.

SKL-X again does not manage to match normal SKL but soundly trounces both Ryzen and its older HSW-E sibling, delivering a good result overall. Code access seems to perform more consistently than data access, for reasons we still need to investigate.
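The bracketed figures compare SKL-X with Ryzen (e.g. [-27%] for 12 vs 16.6 ns). A small sketch of that arithmetic; the published values are consistent with truncation rather than rounding (5.4 vs 4.2 ns is +28.6%, shown as +28%), but that is our reading of the numbers, not a documented rule:

```python
import math

def pct_change(value: float, baseline: float) -> float:
    """Percentage difference of `value` relative to `baseline` (negative means lower)."""
    return (value / baseline - 1.0) * 100.0

def trunc_to(x: float, digits: int = 0) -> float:
    """Truncate toward zero to `digits` decimal places (no rounding)."""
    scale = 10 ** digits
    return math.trunc(x * scale) / scale

# SKL-X vs Ryzen, using figures from the latency rows.
print(f"code in-page: {trunc_to(pct_change(12, 16.6)):+.0f}%")
print(f"data sequential: {trunc_to(pct_change(5.4, 4.2)):+.0f}%")
print(f"data in-page: 1/{trunc_to(74 / 26, 2)}x")
```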

Memory Update Transactional (MTPS)

i9 7900X: 52.2 [+12x] HLE
Ryzen 1700X: 4.23
i7 6700K: 32.4 HLE
i7 5820K: 7

SKL-X with working HLE is over 12 times faster than Ryzen and about 7.5 times faster than the older HSW-E.

Memory Update Record Only (MTPS)

i9 7900X: 57.2 [+13.6x] HLE
Ryzen 1700X: 4.19
i7 6700K: 25.4 HLE
i7 5820K: 5.47

SKL-X is king of the hill with nothing getting close.

Yes, Intel has finally fixed HLE/RTM, which leaves owners of HSW-E and BRW-E feeling very hard done by, considering it was “working” before being disabled due to errata. So after all these years we have HLE, RTM and AVX512! Great!

If there was any doubt, SKL-X does not disappoint: massive aggregate cache (L1D and L2) and memory bandwidths, with the server versions likely to offer even more. The smaller L3 cache does falter, though, which is a bit of a surprise; the larger L2 caches must have forced some compromises.

Latency is a bit disappointing compared to the “normal” SKL/KBL we have on desktop, but still better than the older HSW-E and the competing Ryzen. Again, the L1 and L2 latencies in clocks are fine (despite the L2 being 4 times bigger), with the L3 and memory controller being the source of the increased latencies.
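Where the extra latency comes from can be checked level by level. We read the bracketed triples such as (4-13-70) as L1D-L2-L3 latencies in clocks; that interpretation, and the 4.0 GHz clock used below to convert clocks to nanoseconds, are assumptions for illustration:

```python
def parse_clocks(triple: str) -> dict:
    """Parse an 'L1-L2-L3' clock triple such as '4-13-70' (our reading of the brackets)."""
    l1, l2, l3 = (int(x) for x in triple.split("-"))
    return {"L1": l1, "L2": l2, "L3": l3}

def clock_deltas(new: str, old: str) -> dict:
    """Per-level difference in clocks between two triples."""
    a, b = parse_clocks(new), parse_clocks(old)
    return {level: a[level] - b[level] for level in a}

def clocks_to_ns(clocks: float, ghz: float) -> float:
    """At f GHz one clock lasts 1/f ns."""
    return clocks / ghz

# Data full random latency: SKL-X (4-13-70) vs normal SKL (4-12-34).
deltas = clock_deltas("4-13-70", "4-12-34")
print(deltas)

ASSUMED_GHZ = 4.0  # hypothetical clock; the test clock is not stated in the article
print({level: clocks_to_ns(c, ASSUMED_GHZ) for level, c in deltas.items()})
```

The deltas come out as 0, 1 and 36 clocks for L1, L2 and L3, which matches the observation that L1/L2 are fine and the L3 accounts for most of the increase.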

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

After the strong CPU performance, we did not expect the cache and memory performance to disappoint, and it does not. SKL-X is a big improvement over both the older version (HSW-E) and the competition, with few weaknesses.

The mesh interconnect does seem to exhibit higher inter-core latencies with only a small increase in bandwidth; perhaps this can be fixed.

The much-reduced L3 cache disappoints in both bandwidth and latency; the memory controllers provide huge bandwidth but at the expense of higher latencies.

All in all, if you can afford it, there is no question that SKL-X is worth it. But you had better wait to see what AMD’s Threadripper has in store before making your choice… 😉