The Apollo server is an Apollo 6500 Gen10: a dual Xeon SP CPU server equipped with up to eight Nvidia Tesla V100 GPUs (16GB versions) and the NVLink GPU-to-GPU interconnect. Mellanox EDR (100Gbit/s) InfiniBand linked the Apollo 6500 to a cluster of eight ProLiant DL360 servers running the WekaIO Matrix filesystem. Each DL360 contained four NVMe SSDs.

HPE noted that “the WekaIO shared file system delivers comparable performance to a local NVMe drive and, in all but one test, WekaIO is faster than the local file system when scaled to four or eight GPUs.”

Resnet-50 results at varying GPU counts. Not all suppliers provide values at each GPU count level, which is why there are gaps in the chart.

However, HPE came second to IBM's Spectrum AI system at the 4- and 8-GPU levels in the alternative Resnet-152 workload, though it beat Spectrum AI at the 1-GPU level.

There is no Spectrum AI result available at the 2-GPU level.

Why Spectrum AI beats the Apollo 6500/WekaIO combination at Resnet-152 but falls behind it at Resnet-50 is something for machine learning experts to ponder. An HPE technical paper describes the hardware and software used in the tests above.

For reference, the Resnet-50 and Resnet-152 results we have recorded for suppliers' systems are below.

Not all suppliers provide values at each GPU count level, which explains the gaps in the tables.