At GTC Japan today, NVIDIA announced the Tesla T4, a new GPU “powered by Turing Tensor cores” and designed for AI inference. It is touted to enhance user experiences in current AI applications by speeding up inference on trained deep learning models. According to NVIDIA, as these models grow more accurate and complex, the Tesla T4 has the compute capability to handle multi-precision computation for real-time inference applications such as video analytics and conversational AI.

With a relatively low power requirement of 75W and a PCIe form factor, the Tesla T4 is readily deployable in servers and can be scaled to deliver up to 1 petaflops of inference performance in a single scaled-up server. The Tesla T4 packs 320 Turing Tensor cores and 2,560 NVIDIA CUDA cores, and features 16GB of GDDR6 VRAM with more than 320GB/s of memory bandwidth.

NVIDIA also took the chance to introduce the new TensorRT Hyperscale Platform, positioned as its next-generation AI data center platform. The platform pairs hardware, primarily the Tesla T4 GPU, with a “comprehensive set of new inference software.” The NVIDIA TensorRT 5 inference optimizer and runtime engine leverages the Turing Tensor cores and supports multi-precision workloads with improved performance.
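
To give a feel for what multi-precision inference means, here is a minimal NumPy sketch (not NVIDIA's API) of the trade-off: casting weights and activations from FP32 down to FP16 halves memory traffic while keeping results close to the full-precision answer for well-scaled data. The matrix sizes and data here are illustrative.

```python
import numpy as np

# Illustrative sketch of mixed-precision inference: lower precision trades a
# small amount of numeric accuracy for less memory traffic and faster math.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
activations = rng.standard_normal((1, 256)).astype(np.float32)

# Full-precision (FP32) reference result.
out_fp32 = activations @ weights

# Same computation with inputs cast to half precision, in the spirit of a
# Tensor Core-style reduced-precision pass.
out_fp16 = (activations.astype(np.float16)
            @ weights.astype(np.float16)).astype(np.float32)

# Half precision stores each value in 2 bytes instead of 4.
assert weights.astype(np.float16).nbytes == weights.nbytes // 2

# The FP16 result stays close to the FP32 reference for this input.
max_err = np.max(np.abs(out_fp32 - out_fp16))
print(f"max abs error, FP16 vs FP32: {max_err:.4f}")
```

In a real TensorRT deployment the optimizer chooses per-layer precision (FP32, FP16, INT8) automatically; this sketch only shows why the reduced precision is attractive.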

As part of the new software solution from NVIDIA, the TensorRT Inference Server “encapsulates” data models and frameworks for easy deployment in a cloud computing environment. The latest version of the TensorRT Inference Server container is also available now from NVIDIA GPU Cloud for developers to “experiment with.”
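
As a rough sketch of what that deployment looks like, a containerized inference server is typically pulled from the NGC registry and pointed at a directory of models. The registry path, tag, and model-store location below are illustrative placeholders, not confirmed by the announcement:

```shell
# Pull the TensorRT Inference Server container from NVIDIA GPU Cloud.
# (registry path and tag are illustrative -- check NGC for the actual ones)
docker pull nvcr.io/nvidia/tensorrtserver:<tag>

# Run the server, mounting a local model repository into the container
# and exposing the inference endpoint on port 8000.
docker run --runtime=nvidia -p 8000:8000 \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tensorrtserver:<tag> \
    trtserver --model-store=/models
```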