This year, the same team set out to upgrade the system from 20 NVIDIA K40 GPUs to the more powerful NVIDIA P100, which was released in April of last year. We ran into a constraint, and (spoiler alert) it wasn’t the hardware.

The CUDA 8 software stack did not support more than 16 devices in a single system — forcing us to limit our build to 16 cards. That is still a capacious configuration, giving us 57,344 CUDA cores. When the software evolves to support 20 NVIDIA P100 cards, with 71,680 CUDA cores, the system will deliver 186 teraFLOPS of single-precision (FP32) performance and 94 teraFLOPS of double-precision (FP64) performance. Based on what we know today, that should make it a contender for the most powerful single-node deep learning supercomputer on the market.
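As a sanity check on those figures, the arithmetic is simple per-card multiplication. This is a sketch; the per-card numbers are NVIDIA's published peak specifications for the Tesla P100 PCIe 16GB, not our own measurements:

```python
# Aggregate CUDA cores and peak throughput for N Tesla P100 (PCIe 16GB) cards.
CORES_PER_P100 = 3584        # CUDA cores per card (published spec)
FP32_TFLOPS_PER_P100 = 9.3   # peak single-precision TFLOPS per card
FP64_TFLOPS_PER_P100 = 4.7   # peak double-precision TFLOPS per card

def aggregate(n_cards):
    """Return (total CUDA cores, peak FP32 TFLOPS, peak FP64 TFLOPS)."""
    return (n_cards * CORES_PER_P100,
            round(n_cards * FP32_TFLOPS_PER_P100, 1),
            round(n_cards * FP64_TFLOPS_PER_P100, 1))

print(aggregate(16))  # the 16-card configuration CUDA 8 allows
print(aggregate(20))  # the full 20-card configuration
```

Sixteen cards already give 57,344 cores; the jump to 20 cards is where the 186/94 teraFLOPS figures come from.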

Source: CocoLink

We also wanted to ensure that the platform relied on open standards for inter-device communication, and that means PCIe. We wanted to give our users the flexibility of choosing any PCIe-based accelerator card for deep learning, in any quantity up to the maximum of 16 slots, with the maximum compute per slot. Putting that all together — again based on what we know of the market today — we chose the fastest PCIe-based GPU for deep learning: the NVIDIA Tesla P100 16GB.

For optimal peer-to-peer (P2P) performance in deep learning workloads, all GPUs were placed under a single PCIe root complex, giving a bidirectional P2P bandwidth of approximately 25 GB/s. Driving that many powerful GPUs requires significant CPU power, so we chose two Intel Xeon E5-2699 v4 processors at 2.2 GHz (88 CPU threads in total) with 512 GB of system memory.
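That ~25 GB/s figure is consistent with a PCIe 3.0 x16 link running in both directions at once. A back-of-the-envelope check, using the published PCIe 3.0 signaling rate and 128b/130b line encoding (a sketch, ignoring protocol overhead):

```python
# Rough peak bandwidth of one PCIe 3.0 x16 link.
GT_PER_S = 8.0            # PCIe 3.0 signaling rate per lane (gigatransfers/s)
LANES = 16
ENCODING = 128.0 / 130.0  # 128b/130b line-encoding efficiency

# Each transfer carries one bit per lane, so divide by 8 for bytes.
one_way_gbps = GT_PER_S * LANES * ENCODING / 8.0  # raw GB/s in one direction
bidirectional = 2 * one_way_gbps                  # raw GB/s, both directions

print(round(one_way_gbps, 2), round(bidirectional, 2))
```

The raw figure is about 15.75 GB/s per direction, or ~31.5 GB/s bidirectional; packet headers and flow control bring the achievable P2P number down toward the ~25 GB/s we observed.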

Designed by Lumenir

We replaced the standard heatsinks with specially engineered single-width heatsinks to pack all the GPUs into a 4RU chassis, and used high-CFM fans to provide enough airflow to keep the GPUs at an optimal operating temperature at full throttle.

To test the capability of the system, we took it for a spin with Caffe2, the recently released deep learning framework created by a Facebook research team in collaboration with NVIDIA. We benchmarked it by training ResNet-50 on the ImageNet dataset and measuring image throughput during the training phase. All the P100 GPUs were overclocked to 1,328 MHz (base clock 1,189 MHz).
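The throughput metric is simply images processed per wall-clock second during training. A minimal timing harness of the kind used for such a benchmark might look like this — `train_step` here is a hypothetical stand-in for one Caffe2 forward/backward pass, so the whole snippet is illustrative rather than our actual benchmark code:

```python
import time

def measure_throughput(train_step, batch_size, n_iters):
    """Return images/sec averaged over n_iters training iterations."""
    start = time.perf_counter()
    for _ in range(n_iters):
        train_step()  # one forward/backward pass over a batch
    elapsed = time.perf_counter() - start
    return batch_size * n_iters / elapsed

# With real Caffe2, train_step would run one ResNet-50 iteration;
# a dummy step here just exercises the harness.
rate = measure_throughput(lambda: None, batch_size=32, n_iters=100)
```

In practice you would discard the first few warm-up iterations before timing, since framework initialization and CUDA kernel compilation skew the early measurements.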

With an image training throughput of 2,050 images/sec, this is like crossing the (figurative) Mach 2 barrier, making this the fastest single-node deep learning supercomputer on the market (as of this writing) built out of commercially available components.

With the upcoming Volta GPUs (PCIe version), the same system can be upgraded to 20 V100s, and it will be exciting to see the acceleration from the new Tensor Cores and Caffe2’s FP16 support.

The prospects with the NVIDIA V100 (Volta) and its FP16 support are very promising. If we can load our system with 20 PCIe V100 GPUs, and they are supported by the CUDA 9 and Caffe2 software stacks, it may be possible to exceed 5,000 images/sec on a single node. That’s “Mach 5.” We can’t wait to see that happen!
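The 5,000 images/sec projection is straightforward per-GPU arithmetic. A sketch — the roughly 2x per-GPU factor is our assumption about FP16/Tensor Core gains on V100, not a measurement:

```python
# Measured: 16 overclocked P100s sustained 2,050 images/sec on ResNet-50.
per_p100 = 2050 / 16                 # ~128 images/sec per P100

# Assumed: FP16 + Tensor Cores roughly double per-GPU throughput on V100.
assumed_speedup = 2.0
per_v100 = per_p100 * assumed_speedup

projected = 20 * per_v100            # projected images/sec for a 20-GPU node
print(round(per_p100, 1), round(projected))
```

Under that assumption, a 20-card V100 node lands at about 5,125 images/sec, comfortably past the 5,000 images/sec mark — which is why we describe it as "Mach 5."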

Editor’s note: Deep learning researcher Alexandre Delteil collaborated on the project and also contributed to this report.


About Orange Silicon Valley

Orange Silicon Valley (OSV) is the San Francisco Bay Area presence of Orange, one of the world’s leading telecommunications operators. We actively engage with San Francisco and Silicon Valley, and participate in the disruptive innovations changing the way we communicate.