GPUs on Compute Engine

Compute Engine provides graphics processing units (GPUs) that you can
add to your virtual machine instances. You can use these GPUs to accelerate
specific workloads on your instances such as machine learning and data
processing.

If you have graphics-intensive workloads, such as 3D visualization,
3D rendering, or virtual applications, you can create virtual workstations that use
NVIDIA® GRID® technology. For information on GPUs for graphics-intensive
applications, see GPUs for graphics workloads.

This document provides an overview of GPUs on Compute Engine. For more
information about working with GPUs, review the following resources:

Introduction

Compute Engine provides NVIDIA® Tesla® GPUs
for your instances in passthrough mode so that your virtual machine instances
have direct control over the GPUs and their associated memory.

Note: GPU instances cannot
live migrate
and must terminate for host maintenance
events. These maintenance events typically occur once each month. Maintenance
events can also occur more frequently when necessary. For information on
handling maintenance events, read GPU restrictions.

For compute workloads, GPU models are available in the following stages:

NVIDIA® Tesla® T4: nvidia-tesla-t4: Generally Available

NVIDIA® Tesla® V100: nvidia-tesla-v100: Generally Available

NVIDIA® Tesla® P100: nvidia-tesla-p100: Generally Available

NVIDIA® Tesla® P4: nvidia-tesla-p4: Generally Available

NVIDIA® Tesla® K80: nvidia-tesla-k80: Generally Available
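
The model identifiers above (for example, nvidia-tesla-t4) are the accelerator-type names used when you attach a GPU through the API or gcloud. As an illustrative sketch, they can be collected into a small lookup table; the marketing-name keys here are just a convenience, not part of any API:

```python
# Compute-workload GPU models listed above, mapped to the
# accelerator-type identifiers used by Compute Engine.
COMPUTE_GPU_MODELS = {
    "NVIDIA Tesla T4": "nvidia-tesla-t4",
    "NVIDIA Tesla V100": "nvidia-tesla-v100",
    "NVIDIA Tesla P100": "nvidia-tesla-p100",
    "NVIDIA Tesla P4": "nvidia-tesla-p4",
    "NVIDIA Tesla K80": "nvidia-tesla-k80",
}

def accelerator_type(model_name: str) -> str:
    """Return the Compute Engine accelerator-type identifier for a model."""
    return COMPUTE_GPU_MODELS[model_name]
```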

For graphics workloads, GPU models are available in the following stages:

Network bandwidths and GPUs

Higher network bandwidths can improve the performance of distributed workloads.
On Compute Engine, network bandwidth depends on the machine type and
the number of vCPUs. For VM instances that have attached GPUs, the GPU count,
vCPU count, and memory also affect the network bandwidth.
Also, to achieve the 50-100 Gbps rates that are now available in Beta, your
VM instances must use the
Compute Engine virtual network interface (gVNIC).

The maximum bandwidths available on Compute Engine are as follows:

For VM instances that have P100, P4, and K80 GPUs attached, a maximum
bandwidth of 32 Gbps is available. This is similar to the maximum rate
available to VM instances that do not have GPUs attached. For more information
about network bandwidths, see maximum egress data rate.

For VM instances that have V100 or T4 GPUs attached, you can now get a maximum
bandwidth of up to 50 or 100 Gbps, depending on the GPU count. To create VM
instances with V100 or T4 GPUs that use up to 100 Gbps, see
using network bandwidths of up to 100 Gbps.

Bandwidth configurations

The following tables summarize the available network bandwidth for different VM
configurations of T4 and V100 GPU types.

Network bandwidth is automatically applied based on the VM instance
configuration. For example, if you have a VM instance that has a single V100
GPU, 12 vCPUs, and 78 GB of memory, then the maximum network bandwidth is 24 Gbps.
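
The bandwidth ceilings stated above can be summarized in a small helper. This is a sketch that encodes only what this document says: P100, P4, and K80 instances top out at 32 Gbps, while V100 and T4 instances can reach up to 100 Gbps (the actual rate for V100 and T4 also depends on the GPU count, vCPUs, and memory, which the full bandwidth tables cover):

```python
# Upper bounds on network bandwidth (in Gbps) by accelerator type,
# per the text above. For V100 and T4 this is only a ceiling; the
# achievable rate depends on the full VM configuration.
MAX_BANDWIDTH_GBPS = {
    "nvidia-tesla-p100": 32,
    "nvidia-tesla-p4": 32,
    "nvidia-tesla-k80": 32,
    "nvidia-tesla-v100": 100,
    "nvidia-tesla-t4": 100,
}

def max_bandwidth_gbps(accelerator_type: str) -> int:
    """Return the maximum possible network bandwidth for a VM with this GPU."""
    return MAX_BANDWIDTH_GBPS[accelerator_type]
```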

GPUs on preemptible instances

When you add a GPU to a preemptible instance, you use
your regular GPU quota. If you need a separate quota for preemptible GPUs,
request a separate Preemptible GPU quota.

Note: If you are requesting a Preemptible GPU quota for NVIDIA® Tesla® V100
GPUs, in the justification for the request, specify that the request is for
preemptible GPUs.

During maintenance events, preemptible instances with GPUs are preempted by
default and cannot be automatically restarted. If you want to recreate your
instances after they have been preempted, use a
managed instance group.
Managed instance groups recreate your instances if the vCPU, memory, and
GPU resources are available.

If you want a warning before your instance is preempted, or if you want to
configure your instance to automatically restart after a maintenance event, use
a non-preemptible instance with a GPU. For non-preemptible instances with GPUs,
Google provides one hour of advance notice before preemption.

Compute Engine does not
charge you for GPUs if their instances are preempted in the first
minute after they start running.
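
The first-minute billing rule above can be expressed as a simple predicate. This is an illustrative sketch of the rule as stated, not billing code:

```python
def gpu_usage_billable(runtime_seconds: float, preempted: bool) -> bool:
    """Sketch of the rule above: GPU usage on a preemptible instance is
    not charged if the instance is preempted within its first minute of
    running; otherwise usage is billable."""
    if preempted and runtime_seconds < 60:
        return False
    return True
```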

Reserving GPUs with committed use discounts

To reserve GPU resources in a specific zone, see
Reserving zonal resources.
Reservations are required for committed use discounted pricing for GPUs.

GPU comparison chart

Review this section to compare performance specifications, feature
availability, and the workload types best suited to each of the GPU models
available on Compute Engine.

The maximum CPU and memory available for any GPU type depends on the
zone in which the GPU resource is running. For more information about memory,
CPU resources, and available regions and zones,
see the GPU list.

Local SSD is supported for GPUs running in
all the available regions and zones with the exception of P4 GPUs.
P4 GPUs support local SSD in us-central1-c and us-central1-f zones only.

To compare GPU pricing for the different GPU types and regions that are available on Compute Engine,
see GPU pricing.

1. To allow FP64 code to work correctly, a small number of FP64
hardware units are included in the T4 and P4 GPU architectures.

2. This performance is achieved by using Tensor cores.

3. TeraOperations per second.

Restrictions

Instances with GPUs have specific restrictions that make them behave
differently than other instance types.

If you want to use Tesla K80 GPUs with your instances, the instances cannot
use the Intel Skylake or later CPU platforms.

GPUs are currently only supported with general-purpose N1 machine types.

GPU instances must
terminate for host maintenance events,
but can
automatically restart.
These maintenance events typically occur once per month,
but can occur more frequently when necessary. You must configure your
workloads to handle these maintenance events cleanly. Specifically,
long-running workloads like machine learning and high-performance
computing (HPC) must handle the interruption of host maintenance events.
Learn how to
handle host maintenance events
on instances with GPUs.
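
One common way to handle these events is to poll the instance metadata server for a pending maintenance notice and run cleanup (checkpointing, flushing buffers) before the instance terminates. The sketch below assumes the conventional endpoint http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event (queried with the header Metadata-Flavor: Google, returning NONE when no event is pending); check the current Compute Engine documentation for the exact path and values. The fetch function is injected so the loop can be exercised without a real metadata server:

```python
import time
from typing import Callable

def watch_maintenance(fetch_event: Callable[[], str],
                      on_event: Callable[[], None],
                      poll_seconds: float = 1.0) -> None:
    """Poll for a maintenance event and run a cleanup callback.

    fetch_event stands in for an HTTP GET against the metadata server's
    maintenance-event path (an assumed endpoint; see the lead-in note).
    It should return "NONE" while no event is pending.
    """
    while True:
        if fetch_event() != "NONE":
            on_event()  # checkpoint work, flush buffers, etc.
            return
        time.sleep(poll_seconds)
```

In production, fetch_event would perform the metadata request with the required header, and on_event would checkpoint the long-running workload so it can resume after the instance restarts.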

To protect Compute Engine systems and users, new projects have a
global GPU quota, which limits the total number of GPUs you can create in
any supported zone. When you request a GPU quota, you must request a quota
for the GPU models that you want to create in each region, and an additional
global quota for the total number of GPUs of all types in all zones.

Instances with one or more GPUs have a maximum number of vCPUs for
each GPU that you add to the instance. For example, each
NVIDIA® Tesla® K80 GPU allows you to have up to eight vCPUs
and up to 52 GB of memory in your instance machine type. To see
the available vCPU and memory ranges for different GPU configurations,
see the GPUs list.
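
These per-GPU ceilings scale with the number of attached GPUs, so a configuration check multiplies the limit by the GPU count. The sketch below encodes only the K80 figures given above (8 vCPUs and 52 GB of memory per GPU); limits for the other models would come from the GPUs list:

```python
# Per-GPU vCPU and memory ceilings. Only the K80 figures are stated in
# this document; other models would be filled in from the GPUs list.
PER_GPU_LIMITS = {
    "nvidia-tesla-k80": {"max_vcpus": 8, "max_memory_gb": 52},
}

def config_allowed(accelerator_type: str, gpu_count: int,
                   vcpus: int, memory_gb: int) -> bool:
    """Check a machine configuration against the per-GPU ceilings."""
    limits = PER_GPU_LIMITS[accelerator_type]
    return (vcpus <= gpu_count * limits["max_vcpus"]
            and memory_gb <= gpu_count * limits["max_memory_gb"])
```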

GPUs require device drivers in order to function properly. NVIDIA GPUs running
on Compute Engine must use the following driver versions:

Linux instances:

NVIDIA 410.79 driver or greater

Windows Server instances:

NVIDIA 411.98 driver or greater
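
A startup script can verify that the installed driver meets these minimums before launching GPU work. As a sketch, NVIDIA driver versions of the form "major.minor" compare cleanly as integer tuples:

```python
# Minimum NVIDIA driver versions required on Compute Engine, per the
# list above.
MIN_DRIVER = {"linux": (410, 79), "windows": (411, 98)}

def driver_ok(os_family: str, version: str) -> bool:
    """Return True if a driver version string such as '418.40' meets
    the minimum for the given OS family ('linux' or 'windows')."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= MIN_DRIVER[os_family]
```

For example, the installed version could be read from nvidia-smi output and passed to driver_ok before starting the workload.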

Instances with a specific attached GPU model are covered by the
Compute Engine SLA only if that attached GPU model is
available in more than one zone in the same region where the instance is
located. The Compute Engine SLA does not cover specific GPU models in
the following zones:

NVIDIA® Tesla® T4:

asia-northeast1-a

asia-south1-b

asia-southeast1-b

southamerica-east1-c

NVIDIA® Tesla® V100:

asia-east1-c

NVIDIA® Tesla® P100:

us-west1-b

europe-west4-a

NVIDIA® Tesla® K80:

us-west1-b

us-central1-c

Instances with NVIDIA® Tesla® P100 GPUs in europe-west1-d cannot
use Local SSD devices.