It’s aimed at enterprises running heavy AI workloads that need to be shared across multiple GPU cloud clusters. Large datasets and models take a long time to train, so orchestrating the work with Kubernetes can speed up both training and inference. It automatically deploys, maintains and schedules GPU-accelerated containers across a cluster of nodes.
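In GPU-aware Kubernetes, a pod requests GPUs the same way it requests CPU or memory, via the device-plugin resource `nvidia.com/gpu`, and the scheduler places it on a node with a free GPU. A minimal pod spec (the image name and pod name are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-pod
spec:
  containers:
    - name: trainer
      # Illustrative image; any CUDA-enabled training image works
      image: nvcr.io/nvidia/tensorflow:18.06-py3
      resources:
        limits:
          nvidia.com/gpu: 1   # schedule onto a node with one free GPU
```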

“With increasing number of AI powered applications and services and the broad availability of GPUs in public cloud, there is a need for open-source Kubernetes to be GPU-aware. With Kubernetes on NVIDIA GPUs, software developers and DevOps engineers can build and deploy GPU-accelerated deep learning training or inference applications to heterogeneous GPU clusters at scale, seamlessly,” Nvidia said in a blog post.

TensorRT 4

CEO Jensen Huang introduced TensorRT 4 onstage at GTC in Santa Clara in March as an ‘inference engine’.

The framework speeds up the intense number crunching behind applications such as neural machine translation and recommender systems. It also supports the open-source ONNX format, so developers can import models written in other deep learning frameworks, and it is integrated with TensorFlow via an API.

You can now download it here for free if you’re part of Nvidia’s Registered Developer Program.

Nvidia DALI + Nvidia nvJPEG

These are both libraries for computer vision.

Nvidia DALI makes it quicker to load, decode, and resize images in a dataset so they can be fed to models written in different frameworks, including Amazon’s MXNet, Google’s TensorFlow, and PyTorch.
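DALI itself offloads decode and resize work to the GPU; as a rough plain-Python sketch of the underlying idea — preparing the next batch in the background while the current one is consumed — here is a bounded prefetch pipeline (all names, and the toy "decode" step, are illustrative, not DALI's API):

```python
import queue
import threading

def decode_and_resize(raw):
    # Stand-in for JPEG decode + resize; DALI runs these steps on the GPU
    return [x * 2 for x in raw]

def prefetcher(batches, out_q):
    # Producer thread: prepare batches ahead of the consumer
    for raw in batches:
        out_q.put(decode_and_resize(raw))
    out_q.put(None)  # sentinel: no more batches

def run(batches):
    q = queue.Queue(maxsize=2)  # bounded buffer of ready batches
    threading.Thread(target=prefetcher, args=(batches, q), daemon=True).start()
    results = []
    while (batch := q.get()) is not None:
        results.append(batch)  # the "training step" consumes a ready batch
    return results
```

The bounded queue is the key design choice: the producer stays at most two batches ahead, so preprocessing overlaps with training without unbounded memory growth.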

Nvidia nvJPEG is a library for decoding JPEG files using a hybrid of CPUs and GPUs.

DALI is on GitHub here. The download for the pre-release of nvJPEG is here.

Apex for PyTorch

It’s an open-source extension that lets PyTorch developers optimise their neural network models for Nvidia’s Volta GPUs.

It includes tools to speed up the training process for translation networks, sentiment analysis, and image classification.

“Apex offers automatic execution of operations in either FP16 or FP32, automatic handling of master parameter conversion, and automatic loss scaling, all available with 4 or fewer line changes to the existing code,” Nvidia explained in a blog post.
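The loss scaling Nvidia mentions exists because many gradient values underflow to zero in FP16; multiplying the loss (and hence the gradients) by a constant shifts them into FP16’s representable range, and the FP32 master parameters divide the scale back out. A minimal NumPy illustration of the underflow problem — the scale factor 1024 is illustrative, and Apex picks and adjusts the scale automatically:

```python
import numpy as np

grad = np.float32(1e-8)          # a typical small gradient value

# Cast straight to FP16: the value underflows to zero and the update is lost
assert np.float16(grad) == 0.0

# Scale before the FP16 cast: the gradient now survives
scale = np.float32(1024.0)
scaled_grad = np.float16(grad * scale)
assert scaled_grad != 0.0

# Unscale in FP32 when updating the master parameters
recovered = np.float32(scaled_grad) / scale
```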