Move Fast and Break the Cloud

A Benchmark of VM Boot Times on the Google Cloud

For an overview of Google Cloud vs. AWS in terms of pricing and features, see my previous article, A Tale of Two Clouds.

A good rule of thumb on The Cloud™ is that VMs typically boot in 1 to 2 minutes. In this article, we’ll test this hypothesis on the Google Cloud by launching ~5,000 VMs and timing how long it takes for each one to boot.

To that end, the metric we’ll use is time-to-SSH, i.e. the number of seconds from the time we request a new VM to the time we can SSH into it. The scripts used to perform these automated tests are shared towards the end of the article.
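To make the metric concrete, here is a minimal Python sketch of one way to approximate time-to-SSH. It is not the script used for the benchmarks in this article; the function name `time_to_port` and its parameters are hypothetical. It also assumes that "port 22 accepts a TCP connection" is a good enough proxy for "we can SSH into it", whereas a stricter test would attempt a full login.

```python
import socket
import time


def time_to_port(host: str, port: int = 22, timeout: float = 300.0,
                 poll_interval: float = 0.5) -> float:
    """Return the seconds elapsed until a TCP connection to host:port succeeds.

    Hypothetical helper: the clock starts when this function is called, so it
    should be called at the moment the VM is requested.
    Raises TimeoutError if the port never opens within `timeout` seconds.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            # A successful TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=poll_interval):
                return time.monotonic() - start
        except OSError:
            # Connection refused or timed out: the VM is not ready yet.
            time.sleep(poll_interval)
    raise TimeoutError(f"{host}:{port} not reachable after {timeout} seconds")
```

A monotonic clock is used so that wall-clock adjustments during the wait cannot skew the measurement.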

Starting Small

We’ll start by measuring the boot time of a relatively small VM, and keep track of how boot time fluctuates over time.

Using a VM with 8 CPUs and 30 GB of RAM (with a 20 GB disk and Debian 9 as the boot OS), I measured time-to-SSH once every 10 minutes, over the course of a few days. Although there was some fluctuation, booting a VM is surprisingly fast: it consistently takes only ~23 seconds for an 8-CPU VM to boot, regardless of the time of day:

Launching VMs on GCP is very fast! This test ran between 07/25/2018 and 07/29/2018 in us-west1-b.

The Bigger Picture

At this point, you’re probably wondering how a VM’s boot time would scale with increasing amounts of CPU and RAM. Well, you’re in luck, because the next benchmark measures just that, using the following VM configurations:

For each of the machine types above, I launched a VM of that type every 10 minutes, for a week. Interestingly, boot time does not increase linearly with VM size! Instead, it consistently averages ~25 seconds for VMs with ≤32 CPUs, and stays under a minute for VMs with ≤96 CPUs:

VMs with ≤32 CPUs boot in under 25 seconds on GCP. Overall, VMs with ≤96 CPUs feature a median boot time of under a minute! This test was run in zone us-west1-b between 07/31/2018 and 08/07/2018.
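For readers who collect their own timings, a per-machine-type median like the one quoted above can be computed in a few lines of Python. This is only a sketch: the function name, the `(machine_type, seconds)` sample format, and the labels `type-a` and `type-b` are assumptions, and the numbers are placeholders rather than measurements from this benchmark.

```python
from collections import defaultdict
from statistics import median


def median_by_machine_type(samples):
    """Group (machine_type, seconds) pairs and return the median per type."""
    grouped = defaultdict(list)
    for machine_type, seconds in samples:
        grouped[machine_type].append(seconds)
    return {mt: median(values) for mt, values in grouped.items()}


# Placeholder values for illustration only, NOT results from this article.
samples = [
    ("type-a", 1.0), ("type-a", 3.0), ("type-a", 2.0),
    ("type-b", 4.0), ("type-b", 6.0),
]
medians = median_by_machine_type(samples)
```

The median is less sensitive than the mean to the occasional slow boot, which is why it makes a reasonable headline number for this kind of test.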

It’s interesting that VM boot times stop being constant past 32 CPUs. I’d love to hear from someone at Google Cloud about why that might be.

Don’t Take My Word for It

If you’d like to run the benchmarks above in your own environment, using a different VM size, zone, image, etc., the code to automatically launch thousands of benchmarks on GCP is available on GitHub. Simply clone the repo on a small VM in GCP, define benchmark settings in config.json, and run ./run.sh config.json.

Depending on the VM sizes you’re launching and how many you launch at once, you may need to request a CPU quota increase. Also, if you’re launching benchmarks with thousands of VMs, make sure to keep an eye on the billing! That said, VMs created by the benchmark are programmed to self-destruct after 2 minutes, which avoids runaway VMs.

We’ve already seen that VM size is a big factor in boot time. The other important factor is the contents of your boot image. Here I used a standard Debian 9 image, but if you’re using large, custom boot images, you may see a slowdown.

Conclusion

Overall, launching virtual machines on the Google Cloud is very fast. The ability to launch 96-CPU VMs in under a minute is certainly great news for load balancing: fast boot times mean you can react and auto-scale more rapidly in response to changing traffic patterns. Moreover, fast boot times and per-second usage billing support a model where launching VMs on the fly for time-consuming analyses (more than, say, 10 minutes) is more cost-effective than keeping large VMs waiting around for the next job to process.
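To put a rough number on that last point, the small Python sketch below estimates how much of a job's billed time goes to booting. It assumes, as a simplification, that billing covers the boot time plus the job itself; the function name is hypothetical, and the inputs are the boot times reported above (under 60 seconds for the largest VMs, ~23 seconds for an 8-CPU VM) paired with a 10-minute job.

```python
def boot_overhead_fraction(boot_seconds: float, job_seconds: float) -> float:
    """Fraction of billed time spent booting, assuming billing = boot + job."""
    return boot_seconds / (boot_seconds + job_seconds)


# A 10-minute job on a VM that takes a full minute to boot.
large_vm = boot_overhead_fraction(60, 600)
# The same job on a VM that boots in ~23 seconds.
small_vm = boot_overhead_fraction(23, 600)

print(f"{large_vm:.1%}")  # prints 9.1%
print(f"{small_vm:.1%}")  # prints 3.7%
```

Under these assumptions, booting costs less than a tenth of the bill for a 10-minute job, and the share shrinks as jobs get longer.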