Supercomputer Module Doubles Performance for Faster Neural Nets

NVIDIA’s Jetson TX1 was an amazing platform that delivered supercomputer performance in a compact module. The 256-core GPU was based on NVIDIA’s Maxwell architecture and paired with 64-bit ARM cores.

The latest Jetson TX2 (Fig. 1) doubles the performance of the Jetson TX1. Alternatively, developers can get TX1-level performance out of the TX2 while cutting power consumption in half to just 7.5 W. Each approach has its merits. The low-power option might double the runtime of a battery-powered Jetson TX2 device, while the full-performance option could do twice the work in the same power envelope. For example, it might track twice as many objects or handle two camera input streams instead of one.
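The runtime trade-off can be made concrete with a little arithmetic. The sketch below is only illustrative: the 60-Wh battery is a hypothetical value, the 15-W TX1 figure is an assumption inferred from the "half to 7.5 W" claim, and the function name `runtime_hours` is made up for this example. It also ignores conversion losses and everything else drawing power in a real device.

```python
def runtime_hours(battery_wh: float, power_w: float) -> float:
    """Ideal runtime in hours for a battery of battery_wh watt-hours."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return battery_wh / power_w


BATTERY_WH = 60.0    # hypothetical battery capacity, not from the article
TX1_POWER_W = 15.0   # assumption: twice the 7.5 W quoted for the TX2
TX2_POWER_W = 7.5    # from the article, at TX1-level performance

tx1_hours = runtime_hours(BATTERY_WH, TX1_POWER_W)
tx2_hours = runtime_hours(BATTERY_WH, TX2_POWER_W)

print(f"TX1 at {TX1_POWER_W} W: {tx1_hours:.1f} h")
print(f"TX2 at {TX2_POWER_W} W: {tx2_hours:.1f} h")
```

Under these assumptions, halving the module power doubles the ideal runtime for the same workload.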

1. The Jetson TX2 doubles its predecessor's performance.

The Jetson family supports the CUDA programming environment, as well as deep learning and deep neural nets (DNN), courtesy of the CUDA Deep Neural Network (cuDNN) library. The cuDNN software can support DNN frameworks like the open-source TensorFlow. The modules are small enough to work in midsize and larger drones, providing image recognition and planning support that would not be possible with a less powerful microprocessor.

The Jetson TX2 has the same 50-mm by 87-mm module size as its predecessor. This allows it to plug into the same carrier boards, like ConnectTech’s Orbitty (Fig. 2), that I tested with the Jetson TX1. The Orbitty provides connections for gigabit Ethernet, USB 3.0, USB 2.0 OTG, HDMI, two 3.3-V UARTs, I2C, and four GPIOs. It also has a microSD socket. It is designed to operate in temperatures from −40°C to +85°C. The input voltage is between 9 VDC and 14 VDC.

2. The Jetson TX2 can be plugged into carrier boards like ConnectTech’s Orbitty.

The Jetson TX2 gets its performance from the 256-core NVIDIA Pascal GPU. The architecture was first released in the NVIDIA Tesla P100. The P100 uses High Bandwidth Memory 2 (HBM2) and Chip-on-Wafer-on-Substrate (CoWoS) technology for 5 TFLOPS of double-precision performance and almost 10 TFLOPS of single-precision performance. The Jetson TX1 delivers 1 TFLOPS of single-precision performance while the Jetson TX2 doubles that. Of course, the P100 uses 250 W when running full out.
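A rough performance-per-watt comparison puts those numbers in context. This sketch uses only figures quoted in this article: roughly 10 TFLOPS at 250 W for the P100, and TX1-level performance (1 TFLOPS) at 7.5 W for the Jetson TX2. The P100 value is rounded up from "almost 10," and the helper name `tflops_per_watt` is invented for illustration, so treat the result as a ballpark, not a benchmark.

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    """Single-precision throughput per watt of power drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tflops / watts


# Figures as quoted in the article; the P100 value is rounded ("almost 10").
p100_eff = tflops_per_watt(10.0, 250.0)
# Jetson TX2 running at TX1-level performance in its 7.5 W mode.
tx2_eff = tflops_per_watt(1.0, 7.5)

ratio = tx2_eff / p100_eff

print(f"P100: {p100_eff:.3f} TFLOPS/W")
print(f"TX2:  {tx2_eff:.3f} TFLOPS/W")
print(f"TX2 advantage: {ratio:.1f}x")
```

By this crude measure, the module does more work per watt than the data-center part, though the P100 delivers far more absolute throughput.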

The Jetson TX2 also has a pair of 64-bit NVIDIA Denver 2 ARM-compatible cores, plus four 64-bit ARM Cortex-A57 cores. The module doubles the memory to 8 Gbytes of LPDDR4 and the storage to 32 Gbytes of eMMC flash. It retains the 802.11ac WLAN, Bluetooth, and 1-Gbit Ethernet links. The video subsystem can now handle 2160p (4K by 2K) at 60 frames/s for encode and decode. It can also accept up to 12 CSI lanes, supporting up to six cameras at 2.5 Gbits/s per lane. The system runs Linux for Tegra, based on Ubuntu.
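The camera numbers can be sanity-checked with a quick bandwidth budget. The sketch below assumes the per-lane rate is 2.5 Gbits/s and that the 12 lanes are split evenly across six cameras. The 1080p, 30 frames/s, 10-bit raw stream is a hypothetical sensor chosen for illustration, the function names are invented, and protocol overhead and blanking intervals are ignored.

```python
def csi_bandwidth_gbits(lanes: int, gbits_per_lane: float) -> float:
    """Aggregate raw CSI bandwidth in Gbit/s for a number of lanes."""
    return lanes * gbits_per_lane


def raw_stream_gbits(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Uncompressed pixel data rate in Gbit/s, ignoring protocol overhead."""
    return width * height * fps * bits_per_pixel / 1e9


TOTAL_LANES = 12       # from the article
MAX_CAMERAS = 6        # from the article
LANE_RATE_GBITS = 2.5  # assumption: per-lane rate in Gbit/s

lanes_per_camera = TOTAL_LANES // MAX_CAMERAS
per_camera = csi_bandwidth_gbits(lanes_per_camera, LANE_RATE_GBITS)
total = csi_bandwidth_gbits(TOTAL_LANES, LANE_RATE_GBITS)

# Hypothetical sensor: 1080p at 30 frames/s with 10-bit raw pixels.
stream = raw_stream_gbits(1920, 1080, 30, 10)

print(f"Lanes per camera: {lanes_per_camera}")
print(f"Per-camera budget: {per_camera} Gbit/s, total: {total} Gbit/s")
print(f"1080p30 RAW10 stream: {stream:.3f} Gbit/s")
```

With six cameras attached, each one gets two lanes, which leaves ample headroom for a stream like the hypothetical one above.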

The Jetson TX2 will be shown at embedded world. It will likely be used to update many existing Jetson TX1-based solutions, like Cisco’s 70-in., 4K Spark Board, since it is essentially a plug-in replacement. Some tweaking may be needed to take full advantage of the new hardware. The module is available for $399 in quantities of 1,000. The Jetson TX2 Developer Kit is $599, with an education version available for $299.