Scaling Based on CPU or Load Balancing Serving Capacity

The simplest form of autoscaling is to scale based on the CPU utilization of a
group of virtual machine instances. You can also choose to scale based on the
HTTP(S) load balancing serving capacity of a group of instances.

Read the "Before you begin" section of the Autoscaling Overview topic for
important setup steps.

Scaling based on CPU utilization

You can autoscale based on the average CPU utilization of a managed
instance group. With this policy, the autoscaler collects the CPU
utilization of the instances in the group and determines whether it needs
to scale. You set the target CPU utilization, and the autoscaler works to
maintain that level.

The autoscaler treats the target CPU utilization level as a fraction of the
average use of all vCPUs over time in the instance group. If
the average usage of your total vCPUs exceeds the target utilization, the
autoscaler will add more virtual machines. For example, setting a 0.75 target
utilization tells the autoscaler to maintain an average usage of 75% among all
vCPUs in the instance group.
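
To make the arithmetic concrete, here is a minimal Python sketch of a proportional sizing calculation. It is not the autoscaler's actual algorithm, which this page does not specify. The function name recommended_size and the assumption that total load stays constant and spreads evenly across instances are illustrative only.

```python
import math

def recommended_size(cpu_utilizations, target):
    """Estimate a group size that brings average CPU use back to `target`.

    cpu_utilizations: per-instance CPU use as fractions (0.0 to 1.0).
    target: target utilization as a fraction, for example 0.75.
    Assumption: total load is constant and spreads evenly across instances.
    """
    if not cpu_utilizations:
        raise ValueError("need at least one instance")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total_load = sum(cpu_utilizations)  # load in "fully busy instance" units
    # round() guards against float noise before taking the ceiling
    return max(1, math.ceil(round(total_load / target, 9)))

# Four instances averaging 90% CPU with a 0.75 target
print(recommended_size([0.9, 0.9, 0.9, 0.9], 0.75))  # prints 5
```

Under these assumptions, a group of four instances at 90% would grow to five, and the same group at 30% would shrink to two.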

Enable autoscaling based on CPU utilization

Console

If you have an instance group, select it and click Edit group. If you
don't have an instance group, click Create instance group.

Under Autoscaling, select On.

Under Autoscale based on, select CPU usage.

Enter the Target CPU usage. This will be treated as a percentage. For
example, for 60% CPU usage, enter 60.

Provide a number for the maximum number of instances that you want in
this instance group. You can also set the minimum number of instances and
the cool down period. The cool down period is the number of seconds the
autoscaler should wait after a virtual machine has started before the
autoscaler starts collecting information from it. This accounts for the
amount of time it can take for a virtual machine to initialize, during
which the collected usage is not reliable for autoscaling. The default
cool down period is 60 seconds.

Save your changes.

gcloud

Use the set-autoscaling sub-command to enable autoscaling for a managed
instance group. For example, passing --target-cpu-utilization 0.75 creates
an autoscaler that has a target CPU utilization of 75%. Along with the
--target-cpu-utilization parameter, the --max-num-replicas parameter is
also required when creating an autoscaler.
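
As a hedged sketch of such a command, the Python below assembles the argument list rather than running it. The group name example-managed-instance-group and the values 20 and 90 are hypothetical; the helper set_autoscaling_command is not part of any library. Depending on your configuration, gcloud may also need a zone or region for the group.

```python
import shlex

def set_autoscaling_command(group, max_replicas, target_cpu, cool_down_sec=None):
    """Build the argv for `gcloud compute instance-groups managed set-autoscaling`."""
    if not 0 < target_cpu <= 1:
        raise ValueError("target_cpu is a fraction, for example 0.75 for 75%")
    argv = [
        "gcloud", "compute", "instance-groups", "managed", "set-autoscaling",
        group,
        "--max-num-replicas", str(max_replicas),      # required
        "--target-cpu-utilization", str(target_cpu),  # fraction, not percent
    ]
    if cool_down_sec is not None:
        argv += ["--cool-down-period", str(cool_down_sec)]  # optional, seconds
    return argv

# Hypothetical group name and limits, for illustration only.
argv = set_autoscaling_command("example-managed-instance-group", 20, 0.75,
                               cool_down_sec=90)
print(shlex.join(argv))
```

Building the argument list separately keeps the example testable without a configured gcloud installation; passing the list to subprocess.run would execute it.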

Optionally, you can use the --cool-down-period flag, which tells the
autoscaler how many seconds to wait after a new instance has started before
it starts to collect usage. After the cool-down period passes, the
autoscaler begins to collect usage information from the new instance and
determines if the group requires additional instances. This accounts for
the amount of time it might take for the instance to initialize, during
which the collected usage is not reliable for autoscaling. The default
cool down period is 60 seconds.

You can verify that autoscaling was successfully enabled using the
instance-groups managed describe sub-command, which describes the
corresponding managed instance group and provides information about
any autoscaling features for that instance group.

When you use the API, you can optionally set the coolDownPeriodSec
parameter, which tells the autoscaler how many seconds to wait after a new
instance has started before it starts to collect usage. After the cool-down
period passes, the autoscaler begins to collect usage information from the
new instance and determines if the group requires additional instances. This
accounts for the amount of time it might take for the instance to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool down period is 60 seconds.
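
The effect of the cool down period can be modeled with a short Python sketch. This only illustrates the behavior described above, ignoring readings from instances that started too recently; the function usable_utilizations and the tuple layout are assumptions, not the autoscaler's internals.

```python
def usable_utilizations(instances, now, cool_down_sec=60):
    """Return CPU readings only from instances past the cool down period.

    instances: list of (start_time_sec, cpu_utilization) tuples.
    now: current time in seconds, on the same clock as start_time_sec.
    """
    return [cpu for started, cpu in instances if now - started >= cool_down_sec]

# Two warmed-up instances and one that started 20 seconds ago.
readings = usable_utilizations([(0, 0.80), (10, 0.70), (280, 0.05)], now=300)
average = sum(readings) / len(readings)
print(readings, average)
```

Without the filter, the near-idle reading from the instance that is still initializing would pull the average down and could hide the fact that the group is at its target.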

How autoscaler handles heavy CPU utilization

During periods of heavy CPU utilization, if utilization approaches
100%, the autoscaler estimates that the group may already be heavily
overloaded. In these cases, the autoscaler increases the number of virtual
machines by at least an extra 50% or by a minimum of 4 instances, whichever
is higher. In general, CPU utilization within a managed instance group will
not exceed 100%.

Note: Although this is the current behavior, this might change in the future
and it is not recommended that you rely on this behavior.
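
One reading of the rule above, expressed as a small Python sketch. Rounding the 50% figure up is an assumption, as is the function name overloaded_scale_up; as the note says, the behavior itself might change.

```python
import math

def overloaded_scale_up(current_size):
    """Smallest new group size under the stated rule: add at least 50% more
    instances or 4 instances, whichever is higher."""
    extra = max(math.ceil(current_size * 0.5), 4)  # rounding up is assumed
    return current_size + extra

for size in (4, 9, 20):
    print(size, "->", overloaded_scale_up(size))
```

For small groups the 4-instance floor dominates (4 becomes 8), while for larger groups the 50% term takes over (20 becomes 30).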

Scaling based on HTTP(S) load balancing serving capacity

Compute Engine provides support for load balancing within your
instance groups. You can use autoscaling in conjunction with load balancing by
setting up an autoscaler that scales based on the load of your instances.

An HTTP(S) load balancer spreads load across backend services, which
distribute traffic among instance groups. Within the backend service, you
can define the load balancing serving capacity of the instance groups
associated with the backend as maximum CPU utilization, maximum requests per
second (RPS) per instance, or maximum requests per second of the group. When
an instance group reaches the serving capacity, the backend service will
start sending traffic to another instance group.

When you attach an autoscaler to an HTTP(S) load balancer, the autoscaler will
scale the managed instance group to maintain a fraction of the load balancing
serving capacity.

For example, assume the load balancing serving capacity of a managed instance
group is defined as 100 RPS per instance. If you create an autoscaler with
the HTTP(S) load balancing policy and set it to maintain a target utilization
level of 0.8 or 80%, the autoscaler will add or remove instances from the
managed instance group to maintain 80% of the serving capacity, or 80 RPS per
instance.
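
The numbers in this example can be worked through in a few lines of Python. This is a sketch of the arithmetic only; the function size_for_rps and the even-spread assumption are illustrative, not the autoscaler's documented algorithm.

```python
import math

def size_for_rps(total_rps, max_rate_per_instance, target_utilization):
    """Instances needed so each one serves about target_utilization of its
    capacity, assuming requests spread evenly across the group."""
    per_instance_target = max_rate_per_instance * target_utilization  # 80 RPS here
    # round() guards against float noise before taking the ceiling
    return max(1, math.ceil(round(total_rps / per_instance_target, 9)))

# Capacity of 100 RPS per instance with a 0.8 target utilization
print(size_for_rps(800, 100, 0.8))   # prints 10
print(size_for_rps(1000, 100, 0.8))  # prints 13
```

At 800 RPS, ten instances each serve 80 RPS, which is exactly the target. At 1000 RPS, 12.5 instances would be needed, so the sketch rounds up to 13.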

The following diagram illustrates how the autoscaler interacts with a managed
instance group and backend service:

The autoscaler watches the serving capacity of the managed instance group,
which is defined in the backend service, and scales based on the target
utilization. In this example, the serving capacity is defined by the
maxRatePerInstance value.

Applicable load balancing configurations

You can set one of three options for your load balancing serving capacity
when you first create the backend: maximum CPU utilization, maximum requests
per second per instance, or maximum requests per second of the whole group.
Autoscaling only works with maximum CPU utilization and maximum requests per
second/instance because the value of these settings can be controlled by
adding or removing instances. For example, if you set a backend to handle 10
requests per second/instance, and the autoscaler is configured to maintain 80%
of that rate, then the autoscaler can add or remove instances when the
requests per second/instance changes.

Autoscaling does not work with maximum requests per group because this setting
is independent of the number of instances in the instance group. The load
balancer continuously sends the maximum number of requests per group to the
instance group, regardless of how many instances are in the group.

For example, if you set the backend to handle a maximum of 100 requests per
second for the group, the load balancer will continue to send 100 requests
per second to the group, whether the group has two instances or 100
instances. Because adding or removing instances cannot change this value,
autoscaling does not work with a load balancing configuration that uses
maximum requests per second per group.
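
The difference between the two rate settings can be shown with a short Python sketch. The mode names per_instance and per_group and the function capacity_signal are hypothetical labels for this illustration, not API values.

```python
def capacity_signal(mode, limit, instance_count):
    """Total serving capacity of the group in RPS under each backend setting."""
    if mode == "per_instance":
        return limit * instance_count  # grows as instances are added
    if mode == "per_group":
        return limit                   # fixed, whatever the group size
    raise ValueError(f"unknown mode: {mode}")

for n in (2, 100):
    print(n,
          capacity_signal("per_instance", 10, n),
          capacity_signal("per_group", 100, n))
```

With a per-instance limit, changing the group size changes the capacity, so the autoscaler has something to control. With a per-group limit, the value stays at 100 for both group sizes.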

Enable autoscaling based on load balancing serving capacity

Console

If you have an instance group, select it and click Edit group. If you
don't, click Create instance group.

Under Autoscaling, select On.

Under Autoscale based on, select HTTP load balancing usage.

Enter the Target load balancing usage. This will be treated as a
percentage. For example, for 60% HTTP load balancing usage, enter 60.

Provide a number for the maximum number of instances that you want in
this instance group. You can also set the minimum number of instances and
the cool down period. The cool down period is the number of seconds the
autoscaler should wait after a virtual machine has started before the
autoscaler starts collecting information from it. This accounts for the
amount of time it might take for the instance to initialize, during
which the collected usage is not reliable for autoscaling. The default
cool down period is 60 seconds.

Save your changes.

gcloud

To enable an autoscaler that scales on serving capacity, use the
set-autoscaling sub-command. For example, passing
--target-load-balancing-utilization 0.6 creates an autoscaler that scales
the target managed instance group to maintain 60% of the serving capacity.
Along with the --target-load-balancing-utilization parameter, the
--max-num-replicas parameter is also required when creating an autoscaler.

Optionally, you can use the --cool-down-period flag, which tells the
autoscaler how many seconds to wait after a new virtual machine has started
before the autoscaler starts collecting usage information from it. This
accounts for the amount of time it might take for the virtual machine to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool down period is 60 seconds.

You can verify that your autoscaler was successfully created using the
describe sub-command.

When you use the API, you can optionally set the coolDownPeriodSec
parameter, which tells the autoscaler how many seconds to wait after a new
instance has started before it starts to collect usage. After the cool-down
period passes, the autoscaler begins to collect usage information from the
new instance and determines if the group requires additional instances. This
accounts for the amount of time it might take for the instance to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool down period is 60 seconds.
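
For orientation, the Python sketch below assembles what an autoscaler request body for this policy might look like. Only coolDownPeriodSec appears on this page; the other field names (target, autoscalingPolicy, maxNumReplicas, loadBalancingUtilization, utilizationTarget) are recalled from the Compute Engine autoscaler resource and should be checked against the API reference. The autoscaler name and the placeholder group URL are hypothetical.

```python
import json

def lb_autoscaler_body(name, group_url, max_replicas, target, cool_down_sec=60):
    """Assemble an autoscaler request body that scales on load balancing
    utilization. Field names are assumptions to verify against the API docs."""
    if not 0 < target <= 1:
        raise ValueError("target is a fraction, for example 0.6 for 60%")
    return {
        "name": name,
        "target": group_url,  # URL of the managed instance group to scale
        "autoscalingPolicy": {
            "maxNumReplicas": max_replicas,
            "coolDownPeriodSec": cool_down_sec,
            "loadBalancingUtilization": {"utilizationTarget": target},
        },
    }

# Hypothetical name and placeholder URL, for illustration only.
body = lb_autoscaler_body("example-autoscaler", "<managed-instance-group-url>",
                          20, 0.6, cool_down_sec=90)
print(json.dumps(body, indent=2))
```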

Caution: The autoscaler cannot perform autoscaling when there is a backup
target pool attached to the primary target pool because when the autoscaler
scales down, some instances will start failing health checks from the load
balancer. If the
number of failed health checks reaches the failover ratio, the load balancer will
start redirecting traffic to the backup target pool, causing the utilization of
the managed instance group in the primary target pool to drop to zero. As a
result, the autoscaler won't be able to accurately scale the managed instance
group in the primary target pool. For this reason, we recommend that you do not
assign a backup target pool when using autoscaler.