Monitoring Kubernetes HPA Utilization

Over the past few weeks, I worked on migrating a legacy microservice to Kubernetes. The migration was relatively simple, mainly porting the code from .NET Framework 4.5 to .NET Core 2.2. After making sure the service was deployed and working as expected, I started to gradually move production traffic to the new instance. The new service handled the traffic well, and I was happy: it looked like this task was about to be completed!

After a few days of gradual rollout, I felt good enough to move all the traffic to the new service. And then it hit me: will the new service be able to handle the load of production traffic? I mean, I configured a Horizontal Pod Autoscaler (HPA) for this service, but is it enough? Apparently not. But before I explain why, let’s do a quick recap on HPA.

Kubernetes Horizontal Pod Autoscaler (HPA)

The HPA is responsible for scaling our service up (or down) by adding (or removing) pods, based on specific metrics. For example, the HPA for my new service is set to scale based on CPU usage, using the following settings:

maxReplicas and minReplicas define the maximum and the minimum number of pods for my new service. The number of pods will always be between 2 and 10.

targetCPUUtilizationPercentage defines the CPU utilization threshold: the HPA will change the number of pods if the average CPU utilization across all pods goes above (or below) 50%. The actual algorithm is a bit more complex.

scaleTargetRef defines the HPA target – in this case, the deployment of my new service.
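Put together, a manifest matching this description could look like the sketch below. It assumes the autoscaling/v1 API (the version that exposes targetCPUUtilizationPercentage) and a hypothetical Deployment named my-new-service; only the replica bounds and the CPU target come from the settings described above.

```yaml
# Sketch of an HPA matching the settings described above.
# The names "my-new-service" are hypothetical placeholders.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-new-service
spec:
  minReplicas: 2                      # never fewer than 2 pods
  maxReplicas: 10                     # never more than 10 pods
  targetCPUUtilizationPercentage: 50  # target average CPU across pods
  scaleTargetRef:                     # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: my-new-service
```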

As I said, I configured the HPA – so, theoretically, my service should be able to scale and handle the load of production traffic. This is correct – with one small problem: What will happen if my service needs more than 10 pods? The HPA will not scale it up – because this will breach the maxReplicas setting. This could cause a serious production issue – so it’s better to ensure we have good monitoring in place!
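To make the cap concrete, here is the core calculation from the Kubernetes documentation, followed by the clamping to the configured bounds. This is a simplified sketch: the real controller also applies a tolerance, accounts for pod readiness, and uses stabilization windows. The name actualReplicas is just shorthand for this example, and the numbers (10 pods averaging 90% CPU against the 50% target) are made up.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\text{actualReplicas} = \min\bigl(\text{maxReplicas},\ \max(\text{minReplicas},\ \text{desiredReplicas})\bigr)
\]
\[
\left\lceil 10 \times \frac{90}{50} \right\rceil = 18
\quad\Rightarrow\quad
\min\bigl(10,\ \max(2,\ 18)\bigr) = 10
\]
```

In this scenario the service would need 18 pods to get back to its target, but it stays at 10 and keeps running hot.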

Measuring HPA Utilization

To monitor the HPA, I want to measure its “utilization”: the number of running pods divided by the maximum number of pods defined. For example, 2 out of 10 is low utilization, but 9 out of 10 is high and should trigger an alert.

Calculating the utilization is simple using kube-state-metrics, an agent that exposes various metrics about the objects in the cluster, including HPAs. A single Prometheus query that divides the current number of replicas by the configured maximum is all it takes.
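As a rough sketch, that ratio could be captured in a Prometheus recording rule like the one below. The metric names assume kube-state-metrics v2; older releases exposed kube_hpa_status_current_replicas and kube_hpa_spec_max_replicas instead. The group name and the recorded series name are hypothetical.

```yaml
# Sketch of a recording rule for HPA utilization.
# Assumes kube-state-metrics v2 metric names; group and record names are made up.
groups:
  - name: hpa-utilization
    rules:
      - record: hpa:replicas_utilization:ratio
        # Current replicas divided by maxReplicas, per HPA (0.0 to 1.0).
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            / kube_horizontalpodautoscaler_spec_max_replicas
```

The expression on its own can also be pasted straight into the Prometheus UI. An HPA running 9 of its 10 allowed pods would show up as 0.9.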

Wrapping Up

Now I can create a simple alert that is triggered when the HPA utilization goes over 80%, giving me enough time to investigate and take action. With this alert configured, I can move all the production traffic to the new service and feel safe about it! What about you? Do you already monitor the utilization of your HPA?
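For reference, the alert described above could be sketched as a Prometheus alerting rule along these lines. It assumes kube-state-metrics v2 metric and label names; the alert name, the 10-minute "for" duration, and the severity label are illustrative choices rather than recommendations.

```yaml
# Sketch of an alert on HPA utilization above 80%.
# Assumes kube-state-metrics v2 names; alert name, duration and severity are made up.
groups:
  - name: hpa-alerts
    rules:
      - alert: HpaUtilizationHigh
        # Fires when an HPA uses more than 80% of its maxReplicas.
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            / kube_horizontalpodautoscaler_spec_max_replicas
            > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} in {{ $labels.namespace }} is above 80% of maxReplicas"
```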