New HPC-targeted cloud virtual machines

Azure HC-series Virtual Machines are now generally available in the West US 2 and East US regions. HC-series virtual machines (VMs) are optimized for the largest-scale, most computationally intensive HPC applications. For this class of workload, HC-series VMs deliver the highest performance, scalability, and price-performance of any VMs ever launched on Azure or elsewhere on the public cloud.

With Intel® Xeon® Scalable processors (codenamed Skylake), the HC-series delivers up to 3.5 teraFLOPS of double-precision compute via AVX-512 instructions, 190 GB/s of memory bandwidth, rich support for the Intel® Parallel Studio XE HPC software suite, and SR-IOV-based 100 Gb/s InfiniBand. Within a single VM scale set, a customer can utilize up to 13,200 physical CPU cores and more than 100 TB of memory for a single distributed-memory workload.
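As a rough sanity check, the quoted 3.5 teraFLOPS peak can be reproduced from the core count and per-core AVX-512 throughput. The sketch below rests on two assumptions not stated in the post: each Skylake-SP core has two AVX-512 FMA units, and the all-core AVX-512 frequency is roughly 2.5 GHz (AVX-512 clocks typically run below the 2.7 GHz base clock):

```python
# Back-of-the-envelope estimate of peak double-precision FLOPS per HC-series VM.
cores = 44                   # physical cores exposed per VM (from the post)
avx512_ghz = 2.5             # assumed all-core AVX-512 frequency (our assumption)
flops_per_cycle = 2 * 2 * 8  # 2 FMA units x 2 ops (mul+add) x 8 DP lanes per unit

peak_tflops = cores * avx512_ghz * flops_per_cycle / 1000
print(f"{peak_tflops:.2f} TFLOPS")  # ~3.52, consistent with the quoted 3.5
```

With the base clock of 2.7 GHz instead, the same formula gives about 3.8 TFLOPS, so the quoted figure is consistent with a modest AVX-512 frequency reduction.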

HC-series VMs extend Azure's commitment to delivering supercomputer-class scale and performance for tightly coupled workloads on the public cloud, at price points every customer can afford. Today, we're pleased to share that Azure has once again reached a new milestone in cloud HPC scalability.

HC-series VMs expose 44 non-hyperthreaded CPU cores and 352 GB of RAM, with a base clock of 2.7 GHz, an all-core Turbo speed of 3.4 GHz, and a single-core Turbo speed of 3.7 GHz. HC VMs also feature a 700 GB local NVMe SSD and support up to four Managed Disks, including the new Azure P60/P70/P80 Premium Disks.

A flagship feature of HC-series VMs is 100 Gb/s InfiniBand from Mellanox. HC-series VMs expose the Mellanox ConnectX-5 dedicated back-end NIC via SR-IOV, meaning customers can use the same OFED driver stack that they're accustomed to in bare-metal environments. HC-series VMs deliver MPI latencies as low as 1.7 microseconds, with consistency, bandwidth, and message rates in line with bare-metal InfiniBand deployments. For context, this is 8x to 16x lower network latency than is found elsewhere on the public cloud.

Molecular dynamics beyond 20,000 cores

The Azure HPC team benchmarked many widely used HPC applications to reflect the diverse needs of our customers. One common class of applications is molecular dynamics: simulations of the physical and chemical properties of molecules. To see how far HC-series VMs could scale, we benchmarked them with CP2K. We chose CP2K for several reasons. First, it is widely used in both academia and industry; in fact, CP2K is one of 13 applications PRACE uses in the Unified European Applications Benchmark Suite to drive acceptance testing of supercomputers deployed in Europe. Second, CP2K benefits from AVX-512, making it a good demonstration of what is possible when the latest hardware and software capabilities come together. Anyone can install and run CP2K as we have tested by following the procedure in our documentation.

Our results from this scaling exercise are as follows:

| Nodes | Ranks/Node | Threads/Rank | Cases/Day | Time to Solution |
|-------|------------|--------------|-----------|------------------|
| 8     | 8          | 5            | 101       | 852.715          |
| 16    | 4          | 11           | 210       | 410.224          |
| 32    | 8          | 5            | 390       | 221.202          |
| 64    | 8          | 5            | 714       | 121.192          |
| 108   | 4          | 11           | 1028      | 84.723           |
| 128   | 8          | 5            | 1289      | 67.876           |
| 192   | 12         | 3            | 1515      | 57.827           |
| 256   | 4          | 11           | 3756      | 23.789           |
| 288   | 2          | 22           | 3927      | 22.009           |
| 392   | 2          | 22           | 4114      | 21.818           |

For the H2O-DFT-LS benchmark (Figure 1), a single-point energy calculation using linear-scaling DFT on 2,048 water molecules, HC-series VMs successfully scaled to 392 VMs and 17,248 cores. Most impressively, at the largest scale HC VMs delivered a 40.7x improvement in cases-per-day throughput over our 8-VM baseline, from a 49x increase in VM resources. Here, 288 VMs offer the best balance of price and performance at large scale.
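The scaling-efficiency claim can be derived directly from the table above, comparing the largest run against the 8-node baseline:

```python
# Parallel efficiency for the H2O-DFT-LS runs, from the table's figures:
# throughput speedup divided by the increase in VM count.
baseline_nodes, baseline_cases = 8, 101
largest_nodes, largest_cases = 392, 4114

speedup = largest_cases / baseline_cases          # ~40.7x throughput gain
resource_ratio = largest_nodes / baseline_nodes   # 49x more VMs
efficiency = speedup / resource_ratio
print(f"{speedup:.1f}x speedup on {resource_ratio:.0f}x VMs "
      f"-> {efficiency:.0%} parallel efficiency")
```

Roughly 83 percent parallel efficiency at a 49x scale-out, which is strong for a tightly coupled quantum chemistry workload.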

| Nodes | Ranks/Node | Threads/Rank | Cases/Day | Time to Solution |
|-------|------------|--------------|-----------|------------------|
| 24    | 6          | 7            | 55        | 1556.201         |
| 36    | 4          | 11           | 86        | 1002.111         |
| 44    | 11         | 4            | 219       | 394.847          |
| 64    | 8          | 5            | 294       | 293.091          |
| 108   | 4          | 11           | 482       | 179.469          |
| 112   | 7          | 6            | 482       | 179.344          |
| 128   | 8          | 5            | 530       | 163.095          |
| 176   | 11         | 4            | 685       | 126.899          |
| 256   | 4          | 11           | 960       | 90.14            |
| 324   | 4          | 11           | 1016      | 85.871           |
| 512   | 2          | 22           | 1440      | 60.176           |

For the LiHFX benchmark, a single-point energy calculation simulating a 216-atom lithium hydride crystal with 432 electrons, HC-series VMs successfully scaled to 512 VMs and 22,528 cores. Most impressively, at the largest scale and compared to our baseline of 24 VMs, HC VMs delivered a 26.2x improvement in cases-per-day throughput from only a 21.3x increase in VM resources.
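The same arithmetic applied to the LiHFX table gives a throughput-to-resource ratio above 1.0, i.e. superlinear scaling:

```python
# Scaling for the LiHFX runs: the table's largest configuration (512 nodes)
# versus the 24-node baseline.
speedup = 1440 / 55          # cases/day improvement, ~26.2x
resource_ratio = 512 / 24    # increase in VM count, ~21.3x
efficiency = speedup / resource_ratio
print(f"{speedup:.1f}x speedup on {resource_ratio:.1f}x VMs "
      f"-> ratio {efficiency:.2f}")
```

A ratio above 1.0 usually indicates superlinear speedup, often because the per-node working set shrinks enough at scale to fit in cache; we state that as a plausible explanation rather than a measured one.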

Delighting HPC customers on Azure

The unique capabilities and cost-performance of HC-series VMs are a big win for scientists and engineers who depend on high-performance computing to drive their research and productivity to new heights. Organizations spanning aerospace, automotive, defense, financial services, heavy equipment, manufacturing, oil and gas, the public sector, and academic and government research can now use HC-series VMs to increase HPC application performance and deliver faster time-to-insight.