
With recent news that Google’s Cloud TPU is more cost-efficient than Nvidia’s Volta, you might expect Google to start replacing competitors' GPUs with its own hardware. Instead, the company is adding Nvidia’s Volta to its GPU lineup and offering it in more of its cloud services. More than anything, this speaks to the breadth of support for machine learning, AI, and general-purpose compute workloads now available through cloud services.

According to Google’s latest blog post, Nvidia Tesla V100 GPUs are now available in both Compute Engine and Kubernetes Engine. Kubernetes is typically used to scale containerized applications across the cloud, while Google’s Compute Engine provides the virtual machines that run workloads offloaded to the cloud for processing and computation. With this upgrade, you can stack eight Tesla V100 GPUs, 96 vCPUs (Google’s term for a virtualized CPU core), and 624GB of memory in a single VM, which is considerably more horsepower than a typical individual server can muster. Next-generation NVLink offers up to 300GB/s of GPU-to-GPU bandwidth, and the benefits apparently scale well: Google claims this boosts performance in deep learning and HPC workloads by up to 40 percent.

Google writes that “NVIDIA V100s are available immediately in the following regions: us-west1, us-central1 and europe-west4. Each V100 GPU is priced as low as $2.48 per hour for on-demand VMs and $1.24 per hour for Preemptible VMs. Like our other GPUs, the V100 is also billed by the second and Sustained Use Discounts apply.”
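Per-second billing makes it easy to estimate what a short job costs. The sketch below is a rough illustration, not Google's billing logic: it uses only the "as low as" V100 rates quoted above, covers the GPUs alone (no vCPU, memory, or disk charges), and ignores Sustained Use Discounts and any minimum billing increment. The function name and the 90-minute job are hypothetical.

```python
# Rough sketch: GPU-only cost of a job under per-second billing.
# ASSUMPTIONS: uses the "as low as" V100 rates quoted by Google; excludes
# vCPU, memory, and disk charges, Sustained Use Discounts, and any minimum charge.
V100_ON_DEMAND_PER_HOUR = 2.48
V100_PREEMPTIBLE_PER_HOUR = 1.24


def gpu_cost(seconds: int, num_gpus: int, hourly_rate: float) -> float:
    """Return the GPU cost in dollars for a job billed by the second."""
    return round(hourly_rate * seconds * num_gpus / 3600, 2)


if __name__ == "__main__":
    # A hypothetical 90-minute job on a VM with eight V100s.
    job_seconds = 90 * 60
    print("on-demand:  ", gpu_cost(job_seconds, 8, V100_ON_DEMAND_PER_HOUR))
    print("preemptible:", gpu_cost(job_seconds, 8, V100_PREEMPTIBLE_PER_HOUR))
```

Because billing is per second, a job that finishes a few minutes early simply costs proportionally less, and the preemptible rate halves the GPU bill in exchange for the risk of the VM being reclaimed.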

The company also offers some light workload guidance. It recommends Nvidia’s Pascal-based P100 GPUs for workloads that need a balance between price and performance, while Kepler-based K80 GPUs remain available for less-demanding workloads.

This chart indirectly highlights why cloud computing can make financial sense for both companies and users. In consumer parlance, K80 is ancient. It’s a dual-GPU part with 2x 2,496 cores and 2x 12GB of GDDR5. It’s thoroughly outclassed and outperformed by GPUs like the V100, but it’s also available for as little as 45 cents per hour. Compare that hourly rate with the cost of buying an entire server when the K80 was new, and it's clear you could save a great deal of money by tapping the cloud only for the workloads you actually need to process.

Based on the price we’re seeing for basic servers and K80 itself, the break-even point between the cost of a new K80 and using Google’s Compute Engine is around 10,000 hours. So if you think you’re going to need the equivalent of 416 days of GPU time to process a workload, it might be better to just buy the card. And Nvidia, of course, doesn’t really care about the difference between selling a card to a research institution that’ll deploy it in its own servers, and selling one to Google or Amazon or Microsoft to integrate as part of a cloud server offering. Nvidia just wants to sell the card.
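The break-even estimate is simple division: the purchase cost divided by the hourly rental rate gives the number of hours at which cumulative cloud spending matches the up-front outlay. The article doesn't state the purchase price it used, so the $4,500 figure below is purely hypothetical, picked to line up with the roughly 10,000-hour estimate at the 45-cents-per-hour K80 rate. The sketch ignores power, hosting, maintenance, and resale value, all of which would shift the real break-even point.

```python
# Break-even between buying a GPU server and renting an equivalent GPU by the hour.
# HYPOTHETICAL: the article gives no purchase price; $4,500 is an illustrative
# figure chosen to match its ~10,000-hour estimate. Power, hosting, and
# maintenance costs are ignored.
K80_CLOUD_PER_HOUR = 0.45             # "as little as 45 cents per hour"
HYPOTHETICAL_PURCHASE_COST = 4500.00  # assumed card plus basic server, in dollars


def break_even_hours(purchase_cost: float, hourly_rate: float) -> float:
    """Hours of rental at which cumulative cloud spend equals the purchase cost."""
    return purchase_cost / hourly_rate


hours = break_even_hours(HYPOTHETICAL_PURCHASE_COST, K80_CLOUD_PER_HOUR)
print(f"{hours:,.0f} hours, or about {hours / 24:.1f} days of continuous use")
```

If your expected GPU time falls well below the break-even figure, renting wins; well above it, owning the hardware starts to look attractive.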

Interestingly enough, there’s a hint on the same page that Google might soon extend its cloud offerings to encompass cheaper cards. One reader, Seth Price, writes: “I don’t need a top-shelf GPU for the work I do; I only need cheap FLOPS/$. Will we ever see consumer-level GPUs offered on GCP that are worth less than $1,000 each… I’m just looking for fast, efficient, GPU processing at a low cost. I don’t need them to be the best CUDA boxes available. I don’t need scientific-level double floating point precision. I don’t need ridiculous amounts of VRAM. It’s not cost effective for me to pay 100x more per hour for a GPU instance than a CPU instance and I really don’t want to have to build my own boxes. Will GCP ever be able to fill these requirements?”