Moving Fulcrum

Negative Impact of AVX Workloads on Cloud VMs and Kubernetes Clusters

Published August 15th 2018

Intel processors offer the AVX-512 instruction set for high performance on vectorized workloads. You would be right to be tempted to use it in your applications and databases deployed in the cloud.

However, there is a flip side to it.

Running AVX instructions can cause the entire processor to be clocked down! This has huge implications.

Effect on Cloud VMs

The AVX slowdown doesn't care about VM boundaries. When you rent a VM on AWS, GCP, or another cloud, you are getting access to just a few of the many cores of a physical processor.

Let's say a processor on AWS has 4 cores, and you request 2 for your VM. Another AWS account, B, spins up a VM and gets assigned 2 of the remaining cores of that same processor. Now B starts running some AVX-heavy workload. Well, what do you know: your VM gets slowed down too!

AVX-512 is architecturally transparent to VT-x (or the other way around, depending on how you view these things), so virtualization does nothing to contain the slowdown.

This means your own Docker containers running AVX workloads can slow down your other containers, despite resource limits being set. Not only that: a different account's Kubernetes cluster whose pods are scheduled on a different VM, but on the same physical processor as your VM, can impact your containers!
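To see why resource limits don't help, consider a standard Kubernetes pod spec (the names and values below are illustrative, not from any real deployment). A CPU limit caps how much CPU *time* the container's processes may consume; it says nothing about the *frequency* the core runs at, which is exactly what AVX-induced downclocking degrades:

```yaml
# Hypothetical pod spec; names and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-app
spec:
  containers:
  - name: app
    image: example/app:latest
    resources:
      limits:
        cpu: "2"      # caps CPU time slices, not clock frequency
        memory: 1Gi
```

Even with this limit enforced, a neighboring container (or a neighboring account's VM) running AVX-heavy code can lower the clock under `app` without violating any quota.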

This was pointed out by Kelly Sommers yesterday:

"So here's a real question. What does Amazon and Microsoft and other kubernetes cloud services do to prevent your containers from losing 11ghz of performance because someone deployed some AVX optimized algorithm on the same host?"

This is a Catch-22 situation all around. Cloud vendors want to offer VMs with AVX-512 enabled so that their users can get better performance, and it is in each individual user's best interest to use it. However, doing so may impact not only their own VMs and containers but also another account's VMs.