The Nvidia DGX-1 is a new HPC system (not just a server) built around the Tesla P100 accelerators for GPU computing. It includes 2x Intel Xeon E5-2698 v3 (16 core, Haswell-EP) and 8 P100s, for a total of 28,672 CUDA cores and 128GB of combined VRAM. The DGX-1 is rated at 170 FP16 TFLOPS of peak performance (or 85 FP32 TFLOPS) inside of 3U.
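As a quick sanity check, the headline figures follow directly from the published per-GPU peak rates of the P100 (SXM2); a minimal sketch, assuming ~21.2 FP16 / ~10.6 FP32 TFLOPS per GPU:

```python
# Back-of-the-envelope check of the DGX-1 peak throughput figures,
# using the published per-GPU P100 (SXM2) peak rates.
NUM_GPUS = 8
FP16_TFLOPS_PER_P100 = 21.2
FP32_TFLOPS_PER_P100 = 10.6

fp16_total = NUM_GPUS * FP16_TFLOPS_PER_P100  # ~170 TFLOPS for the system
fp32_total = NUM_GPUS * FP32_TFLOPS_PER_P100  # ~85 TFLOPS for the system
print(round(fp16_total, 1), round(fp32_total, 1))
```

The FP16 number is exactly double the FP32 number because GP100's FP16 rate is twice its FP32 rate.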

The P100 has a new form factor and connector, requiring a completely new infrastructure to run. The 8 P100s are installed in a hybrid mesh cube configuration, making full use of the NVLink interconnect to offer a significant amount of memory bandwidth between the GPUs. Each NVLink is bidirectional, offering 20GB/sec up and 20GB/sec down, with 4 links per GP100 GPU, for an aggregate bandwidth of 80GB/sec up and another 80GB/sec down.
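The aggregate figure is simple arithmetic over the per-link numbers above:

```python
# Aggregate NVLink bandwidth per GP100 GPU, from the per-link figures.
LINKS_PER_GPU = 4
GB_PER_SEC_PER_DIRECTION = 20  # each link carries 20GB/s up and 20GB/s down

per_direction = LINKS_PER_GPU * GB_PER_SEC_PER_DIRECTION  # 80 GB/s each way
total_bidirectional = 2 * per_direction                   # 160 GB/s combined
print(per_direction, total_bidirectional)
```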

The DGX-1 system runs Canonical’s Ubuntu Server along with Nvidia-supplied drivers for the Pascal GPUs. Note that most hyperscalers deploying large CPU / GPU clusters to train their neural networks are doing so on Ubuntu. The system also includes Nvidia’s Deep Learning SDK, its DIGITS GPU training system, the CUDA programming environment, and a set of machine learning frameworks, all bundled and tuned for the Pascal GPUs. Nvidia invested heavily in NVLink, its high-speed interconnect, to enable fast memory access between GPUs and unified memory between the GPU and CPU.

The downside is that you are locked into the Intel / Nvidia combo, with its inefficient x86 CPU / GPU integration (Nvidia doesn't have an x86 license). The lack of competition in this space is disconcerting.

The upside is that Intel has fast x86 CPUs and fast storage in Optane, and Nvidia has a nice, accessible language in CUDA.

The DGX-1 delivers high performance for deep learning and neural network applications. Features include:

2x Intel Xeon E5-2698 v3 (16 core, Haswell-EP)

8 P100s for 28,672 CUDA cores and 128GB of shared VRAM

High speed, high bandwidth interconnect for maximum application scalability

The Radeon Open Compute Platform (ROCm) is an open source platform for GPU computing that is language independent and brings modular software development to GPU computing. This provides a genuinely cheaper alternative to Nvidia's CUDA and helps developers write compute-oriented software for AMD Radeon GPUs, as well as convert existing CUDA software to run on GCN hardware.

In the past you almost had to purchase the Intel / Nvidia combo for serious GPU computing (Nvidia doesn't have an x86 license). Now you have AMD Zen as an option on the CPU side and AMD Radeon on the GPU side.

For now the compute cards from AMD are the S-series, like the FirePro S9300 x2 (2x 4096 GCN cores). The fastest APU is stuck at 384 cores. AMD will soon release server-based APUs, like Raven Ridge, which will be a much lower cost solution than buying a server CPU and discrete server GPU separately. Code written for earlier AMD hardware should port over easily.

With Zen, AMD could offer a nice HPC ecosystem with good communication between x86 CPU and GPU, large memory bandwidth, and excellent I/O. While Intel may have faster x86 CPUs and a faster storage medium in Optane, and Nvidia may have faster GPUs along with a more accessible language in CUDA, AMD could have a better-integrated HPC system where the sum exceeds the performance of the individual Intel / Nvidia combo parts.

Of course this can be used for deep learning and neural network applications more efficiently and at lower cost. Features include: