NVIDIA Tesla T4 aims at the inferencing market

Over at GTC Japan, NVIDIA introduced the Tesla T4 card. This is a new Turing-based product which seems to feature the TU106 GPU. It has 2560 CUDA cores, 320 Tensor cores, 16GB GDDR6 memory, over 320GB/s memory bandwidth, and a 75W TDP. The Tesla T4 aims specifically at the inferencing market, NVIDIA claims it's world's most advanced AI inference platform. Basically, this is not a product to train machine learning systems, but to actually run the trained algorithms.

The NVIDIA TensorRT™ Hyperscale Inference Platform features NVIDIA® Tesla® T4 GPUs based on the company’s breakthrough NVIDIA Turing™ architecture and a comprehensive set of new inference software.

Delivering the fastest performance with lower latency for end-to-end applications, the platform enables hyperscale data centers to offer new services, such as enhanced natural language interactions and direct answers to search queries rather than a list of possible results.

“Our customers are racing toward a future where every product and service will be touched and improved by AI,” said Ian Buck, vice president and general manager of Accelerated Business at NVIDIA. “The NVIDIA TensorRT Hyperscale Platform has been built to bring this to reality — faster and more efficiently than had been previously thought possible.”

Every day, massive data centers process billions of voice queries, translations, images, videos, recommendations and social media interactions. Each of these applications requires a different type of neural network residing on the server where the processing takes place.

To optimize the data center for maximum throughput and server utilization, the NVIDIA TensorRT Hyperscale Platform includes both real-time inference software and Tesla T4 GPUs, which process queries up to 40x faster than CPUs alone.

NVIDIA estimates that the AI inference industry is poised to grow in the next five years into a $20 billion market.

NVIDIA Tesla T4 GPU – Featuring 320 Turing Tensor Cores and 2,560 CUDA® cores, this new GPU provides breakthrough performance with flexible, multi-precision capabilities, from FP32 to FP16 to INT8, as well as INT4. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 teraflops of peak performance for FP16, 130 teraflops for INT8 and 260 teraflops for INT4.