Challenge: Portability, speed, and cost

Application: Infrastructure for deep learning

Solution: Kubernetes on Azure

Scale experiments 10x – 50x

Scale to hundreds of GPUs

2-3 months down to 2-3 days

Significantly reduced costs for idle nodes

Grow Kubernetes clusters to more than 2,500 nodes

“Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters.” – Christopher Berner, Head of Infrastructure for OpenAI

“Research teams can now take advantage of the frameworks we’ve built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage.” – Christopher Berner, Head of Infrastructure for OpenAI