Developers can use TensorRT to deliver fast inference using INT8 or FP16 optimized precision that significantly reduces latency, as demanded by real-time services such as streaming video categorization on the cloud or object detection and segmentation on embedded and automotive platforms.

With TensorRT developers can focus on developing novel AI-powered applications rather than performance tuning for inference deployment. TensorRT runtime ensures optimal inference performance that can meet the most demanding latency and throughput requirements.

TensorRT can be deployed to Tesla GPUs in the datacenter, Jetson embedded platforms, and NVIDIA DRIVE autonomous driving platforms.

What's New in TensorRT 3?

TensorRT 3 is the key to unlocking optimal inference performance on Volta GPUs. It delivers up to 40x higher throughput in under 7ms real-time latency vs. CPU-Only inference.Highlights from this release include: