Before you begin

Read the Before you begin section of the Autoscaling Overview topic for important setup steps.

Per-instance metrics

Per-instance metrics provide resource utilization data separately for each
instance in the managed instance group. With per-instance metrics, the
instance group cannot scale below a size of 1 because the autoscaler requires
metrics from at least one running instance in order to operate.

If you need to scale using other Stackdriver metrics that are not specific to
individual instances, or if you need to scale your instance groups down to zero
instances from time to time, you can configure your instances to scale using
per-group metrics instead.

Standard per-instance metrics

Stackdriver Monitoring has a set of standard metrics that
you can use to monitor your virtual machine instances. However, not all standard
metrics are valid utilization metrics that the autoscaler can use.

A valid utilization metric for scaling meets the following criteria:

The standard metric must contain data for the gce_instance monitored resource.
You can use the
timeSeries.list
API call to verify whether a specific metric exports data for this resource.

The standard metric describes how busy an instance is, and the
metric value increases or decreases proportionally to the number of virtual
machine instances in the group.

The following metric is invalid because its value does not change based on
utilization, so the autoscaler cannot use the value to scale proportionally:
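The original code sample is not included here; as an assumed illustration, a standard metric that reports a static allocation, such as the number of reserved CPU cores, stays constant regardless of how busy the instance is:

```
compute.googleapis.com/instance/cpu/reserved_cores
```

Because the value does not rise or fall with load, adding or removing instances would not move it toward any target.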

Custom metrics

You can create custom metrics using
Stackdriver Monitoring and write your own monitoring data to the Stackdriver
Monitoring service. This gives you side-by-side access to standard Cloud
Platform data and your custom monitoring data, with a familiar data structure
and consistent query syntax. If you have a custom metric, you can choose to
scale based on the data from these metrics.

The metric data must be labeled with instance_id, set to the unique numerical ID
assigned to the instance.

The metric must export data at least once every 60 seconds. You can export data
more frequently than every 60 seconds, and the autoscaler will be able to
respond faster to load changes. If you export data less frequently than every
60 seconds, the autoscaler might not be able to respond quickly enough to load
changes.

The metric must be a valid utilization metric, which means that data from the
metric can be used to proportionally scale up or down the number of virtual
machines.

The metric must export int64 or double data values.

For the autoscaler to work with your custom metric, you must export data for
this custom metric from all the instances in the managed instance group.

Note: You can get an instance's numerical ID by requesting the metadata
server's id property from within the instance. For example, you can do this
with curl:
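A sketch of that request; it only works when run from inside a Compute Engine instance, because the metadata server is reachable only there:

```shell
# Query the metadata server for this instance's unique numeric ID.
# The Metadata-Flavor header is required; without it the server rejects the request.
curl "http://metadata.google.internal/computeMetadata/v1/instance/id" \
    -H "Metadata-Flavor: Google"
```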

Configuring autoscaling using per-instance monitoring metrics

The process of setting up an autoscaler for a standard or custom metric is the
same. To create an autoscaler that uses Stackdriver Monitoring metrics, you must
provide the metric identifier, the desired target utilization level,
and the utilization target type. Each of these properties is described
briefly below:

Metric identifier: The name of the metric to use. If you use a
custom metric, you defined this name when you initially created the metric.
The identifier has the following format: example.googleapis.com/path/to/metric

Target utilization level: The target utilization level that the
autoscaler must maintain for this metric. This must be a positive number.
For example, both 24.5 and 1100 are acceptable values. Note that this is
different from CPU and load balancing utilization, which must be a float
value between 0.0 and 1.0.

Target type: This defines how the autoscaler computes the data collected
from the instances. The possible target types are:

GAUGE: The autoscaler computes the average value of the data collected
over the last few minutes and compares that to the target utilization value
of the autoscaler.

DELTA_PER_MINUTE: The autoscaler calculates the average rate of growth
per minute and compares that to the target utilization.

DELTA_PER_SECOND: The autoscaler calculates the average rate of growth
per second and compares that to the target utilization.

If you expressed your desired target utilization in seconds, use
DELTA_PER_SECOND; likewise, use DELTA_PER_MINUTE if you expressed your target
utilization in minutes, so that the autoscaler can perform accurate
comparisons.

Console

The instructions for configuring autoscaling are different for regional
versus single-zone managed instance groups. Regional managed instance groups
do not support filtering for per-instance metrics.

To configure autoscaling for a single-zone managed instance
group:

If you do not have an instance group, create one.
Otherwise, click the name of an instance group to open the instance group
details page. The instance group must be single-zone.

On the instance group details page, click the Edit Group button.

Under Autoscaling, select On to enable autoscaling.

In the Autoscale based on section, select Stackdriver monitoring
metric.

In the Metric export scope section, select Time series per
instance to configure autoscaling using per-instance metrics.

In the Metric identifier section, enter the metric name in the
following format: example.googleapis.com/path/to/metric.

In the Additional filter expression section, optionally enter a
filter to use individual values from metrics with multiple streams or
labels. See Filtering per-instance metrics
for more information.

In the Utilization target section, specify the target value.

In the Utilization target type section, verify that the target type
corresponds to the metric's kind of measurement.

Save your changes when you are ready.

gcloud

For example, in gcloud, the following command creates an autoscaler that
uses the GAUGE target type. Along with the --custom-metric-utilization
parameter, the --max-num-replicas parameter is also required when creating
an autoscaler:
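The original sample is not shown here; a sketch of such a command follows, where the group name, metric name, zone, and target values are assumptions for illustration:

```shell
# Enable autoscaling on a single-zone managed instance group, scaling on a
# custom metric with a GAUGE target. --max-num-replicas is required.
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --zone us-central1-f \
    --max-num-replicas 20 \
    --custom-metric-utilization metric=custom.googleapis.com/example_metric,utilization-target=10,utilization-target-type=GAUGE
```

You could append --cool-down-period 90, for example, to give new instances 90 seconds to initialize before their usage data is collected.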

Optionally, you can use the --cool-down-period flag, which tells the
autoscaler how many seconds to wait after a new virtual machine has started
before the autoscaler starts collecting usage information from it. This
accounts for the amount of time it might take for the virtual machine to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool down period is 60 seconds.

For multi-zonal managed instance groups, use the --region flag to specify
where to find the instance group. For example:

API

Your request body must contain the name, target, and autoscalingPolicy
fields. In autoscalingPolicy, provide the maxNumReplicas and the
customMetricUtilizations properties.

Optionally, you can use the coolDownPeriodSec parameter, which tells the
autoscaler how many seconds to wait after a new instance has started before
it starts to collect usage. After the cool-down period passes, the
autoscaler begins to collect usage information from the new instance and
determines if the group requires additional instances. This accounts for
the amount of time it might take for the instance to initialize, during
which the collected usage is not reliable for autoscaling. The
default cool-down period is 60 seconds.

gcloud

The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must specify a metric filter and individual flags for
the utilization target and target type. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name and loadbalanced labels. To filter
based on the loadbalanced boolean, specify the
--stackdriver-metric-filter filter flag with the
'metric.label.loadbalanced = true' value. Include the
utilization target and target type flags individually.
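The steps above can be sketched as the following command, where the group name, zone, replica limit, and target value are assumptions for illustration:

```shell
# Autoscale on received_bytes_count, but only count load-balanced traffic.
# The filter, target, and target type are passed as individual flags.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
    --zone us-central1-f \
    --max-num-replicas 20 \
    --update-stackdriver-metric compute.googleapis.com/instance/network/received_bytes_count \
    --stackdriver-metric-filter 'metric.label.loadbalanced = true' \
    --stackdriver-metric-utilization-target 1000 \
    --stackdriver-metric-utilization-target-type DELTA_PER_SECOND
```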

Note: If autoscaling is already enabled for a managed instance group, the
set-autoscaling command will update the existing autoscaler to the
new specifications.

API

Note: Although autoscaling is a feature of
managed instance groups, it is a separate API resource. Keep that in mind
when you construct API requests for autoscaling.

The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must specify a metric filter and individual flags for
the utilization target and target type. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name and loadbalanced labels. To filter
based on the loadbalanced boolean, specify the filter parameter
with the "metric.label.loadbalanced = true" value.

In the API, make a POST request to the following URL, replacing
myproject with your own project ID and us-central1-f with the
zone of your choice. The request body must contain the name, target,
and autoscalingPolicy fields. In autoscalingPolicy, provide the
maxNumReplicas and the customMetricUtilizations properties.

This example configures autoscaling to use only the loadbalanced
traffic data as part of the utilization target.
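A sketch of that request, assuming the beta Autoscalers API endpoint and illustrative values (the autoscaler name, group name, and utilization target are assumptions):

```
POST https://www.googleapis.com/compute/beta/projects/myproject/zones/us-central1-f/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 20,
    "customMetricUtilizations": [
      {
        "metric": "compute.googleapis.com/instance/network/received_bytes_count",
        "filter": "metric.label.loadbalanced = true",
        "utilizationTarget": 1000,
        "utilizationTargetType": "DELTA_PER_SECOND"
      }
    ]
  }
}
```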

Per-group metrics

Beta

This is a Beta release of Autoscaling using per-group metrics. This feature
is not covered by any SLA or deprecation policy and may be subject to
backward-incompatible changes.

Per-group metrics allow autoscaling with a standard or custom metric that does
not export per-instance utilization data. Instead, the group scales based on
a value that applies to the whole group and corresponds to how much work is
available for the group or how busy the group is. The group scales based on
the fluctuation of that group metric value and the configuration that you
define.

Note: Regional managed instance groups do not support autoscaling using
per-group metrics.

When you configure autoscaling on per-group metrics, you must indicate how
you want the autoscaler to provision instances relative to the metric:

Instance assignment: Specify an instance assignment to indicate that you
want the autoscaler to add or remove instances depending on how much work
is available to assign to each instance. Specify a value for this parameter
that represents how much work you expect each instance can handle.
For example, specify 2 to assign two units of work to each instance, or
specify 0.5 to assign half a unit of work to each instance. The
autoscaler adds enough instances to the managed instance group to ensure
that there are enough instances to complete the available work as indicated by
the metric. If the metric value is 10 and you assigned 0.5 units of
work to each instance, the autoscaler creates 20 instances in the managed
instance group. Scaling with instance assignment allows the instance group to
shrink to 0 instances when the metric value drops to 0, and to grow again when
it rises above 0. The following diagram shows the proportional relationship
between the metric value and the number of instances when scaling with an
instance assignment policy.

Utilization target: Specify a utilization target to indicate that you
want the autoscaler to add or remove instances to try to maintain the metric
at a specified value. When the metric is above the specified target, the
autoscaler gradually adds instances until the metric decreases to the target
value. When the metric is below the specified target value, the autoscaler
gradually removes instances until the metric increases to the target value.
Scaling with a utilization target cannot shrink the group to 0 instances.
The following diagram shows how autoscaler adds and removes instances in
response to a metric value in order to maintain a utilization target.
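The proportional sizing that instance assignment implies can be sketched as a ceiling division. This rule is inferred from the worked numbers above (a metric value of 10 with an assignment of 0.5 yields 20 instances); it is an illustration, not the autoscaler's published algorithm:

```shell
# Assumed sizing rule for instance assignment:
#   instances = ceil(metric_value / single_instance_assignment)
# awk handles the fractional assignment values that shell arithmetic cannot.
ceil_div() {
  awk -v m="$1" -v a="$2" 'BEGIN { q = m / a; print ((q == int(q)) ? q : int(q) + 1) }'
}

ceil_div 10 0.5   # metric value 10, assignment 0.5 -> 20 instances
ceil_div 10 2     # metric value 10, assignment 2   -> 5 instances
```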

Each option has the following use cases:

Instance assignment: Scale the size of your managed instance groups
based on the number of unacknowledged messages in a Google Pub/Sub
subscription or a total QPS rate of a network endpoint.

Utilization target: Scale the size of your managed instance groups
based on a utilization target for a custom metric that does not come from
the standard per-instance CPU or memory use metrics. For example,
you might scale the group based on a custom latency metric.

When you configure autoscaling with per-group metrics and you specify an
instance assignment, your instance groups can scale down to 0 instances. If your
metric indicates that there is no work for your instance group to complete, the
group will scale down to 0 instances until the metric detects that new work is
available. In contrast to per-group instance assignment, per-instance
autoscaling requires resource utilization metrics from at least one instance, so
the group cannot scale below a size of 1.

Filtering per-group metrics

You can apply filters to per-group Stackdriver metrics, which allows you to
scale managed instance groups using individual values from metrics that have
multiple streams or labels.

For each selector, you can use only the = direct equality comparison operator;
you cannot use functions or other comparison operators.

You can specify a metric type selector of metric.type = "..." in the
filter and also include the original metric field, or you can use only the
metric field. The metric must meet the following requirements:

The metric must be specified in at least one of the two places.

If the metric is specified in both places, the two values must be equal.

You must specify the resource.type selector, but you cannot set it to
gce_instance if you want to scale using per-group metrics.

For best results, the filter should be specific enough to return
a single time series for the group. If the filter returns
multiple time series, they are added together.

Console

If you do not have an instance group, create one.
Otherwise, click the name of an instance group to open the instance group
details page. The instance group must be single-zone.

On the instance group details page, click the Edit Group button.

Under Autoscaling, select On to enable autoscaling.

In the Autoscale based on section, select Stackdriver monitoring
metric.

In the Metric export scope section, select Single time series per
group.

In the Metric identifier section, specify the metric name in the
following format: example.googleapis.com/path/to/metric.

Specify the Metric resource type.

Provide an additional filter expression to use individual
values from metrics that have multiple streams or labels. The filter must
meet the
autoscaler filtering requirements.

In the Scaling policy section, select either Instance assignment
or Utilization target.

If you select an instance assignment policy, then provide a Single
instance assignment value that represents the amount of work to assign
to each instance in the managed instance group. For example, specify 2
to assign two units of work to each instance. The autoscaler maintains
enough instances to complete the available work (as indicated by the
metric). If the metric value is 10 and you assigned 2 units of work
to each instance, the autoscaler creates 5 instances in the managed
instance group.

If you select a utilization target policy:

Provide a Utilization target value that represents the metric
value that the autoscaler should try to maintain.

Select the Utilization target type that represents the value
type for the metric.

Save your changes when you are ready.

gcloud

Create an autoscaler for a managed instance group similarly to the
per-instance autoscaler, but specify the
--update-stackdriver-metric flag. You can specify how you want the
autoscaler to provision instances by including one of the following
flags:

Specify a metric that you want to measure and specify the
--stackdriver-metric-single-instance-assignment flag to indicate
the amount of work that you expect each instance to handle. You must also
specify a filter for the metric using the
--stackdriver-metric-filter flag.
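The original sample is not shown here; a sketch follows, where the group name, zone, replica limit, and filter are assumptions for illustration (a real filter must satisfy the requirements above, including a resource.type other than gce_instance):

```shell
# Per-group autoscaling with an instance assignment policy.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
    --zone us-central1-a \
    --max-num-replicas 100 \
    --update-stackdriver-metric example.googleapis.com/path/to/metric \
    --stackdriver-metric-filter 'resource.type = "global"' \
    --stackdriver-metric-single-instance-assignment [INSTANCE_ASSIGNMENT]
```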

[INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance
in the managed instance group. For example, specify 2 to assign two
units of work to each instance, or specify 0.5 to assign half a unit
of work to each instance. The autoscaler adds enough instances to
the managed instance group to ensure that there are enough instances
to complete the available work, which is indicated by the metric. If the
metric value is 10 and you've assigned 0.5 units of work to each
instance, the autoscaler provisions 20 instances in the managed
instance group.

Utilization target:

In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of instances relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify the target
and target type with the --stackdriver-metric-utilization-target and
--stackdriver-metric-utilization-target-type flags. You must also specify a
filter for the metric using the --stackdriver-metric-filter flag.
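A sketch of such a command, with the group name, zone, replica limit, and filter assumed for illustration:

```shell
# Per-group autoscaling with a utilization target policy.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
    --zone us-central1-a \
    --max-num-replicas 100 \
    --update-stackdriver-metric example.googleapis.com/path/to/metric \
    --stackdriver-metric-filter 'resource.type = "global"' \
    --stackdriver-metric-utilization-target [TARGET_VALUE] \
    --stackdriver-metric-utilization-target-type [TARGET_TYPE]
```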

[TARGET_VALUE] is the metric value that the autoscaler attempts to
maintain.

[TARGET_TYPE] is the value type for the metric: you can set the autoscaler to
monitor the metric as a GAUGE, as a DELTA_PER_MINUTE of the value, or as a
DELTA_PER_SECOND of the value.

To see a full list of available autoscaler gcloud commands and flags
that work with per-group autoscaling, see the
gcloud beta reference.

Note: If autoscaling is already enabled for a managed instance group, the
set-autoscaling command will update the existing autoscaler to the
new specifications.

API

Note: Although autoscaling is a feature of
managed instance groups, autoscalers
are a separate API resource. Keep that in mind when you construct API
requests for autoscaling.

Create an autoscaler for a managed instance group. You can specify how
you want the autoscaler to provision instances by including one of the
following parameters:

Instance assignment: Specify the singleInstanceAssignment parameter.

Utilization target: Specify the utilizationTarget parameter.

Instance assignment:

In the API, make a POST request to create an autoscaler.
In the request body, include the normal parameters that you would use to
create a per-instance autoscaler, but specify the
singleInstanceAssignment parameter. The parameter specifies the amount
of work that you expect each instance to handle.
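In sketch form, assuming the beta Autoscalers API endpoint and illustrative names (the autoscaler name, group name, zone, and filter are assumptions):

```
POST https://www.googleapis.com/compute/beta/projects/myproject/zones/us-central1-a/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/us-central1-a/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 100,
    "customMetricUtilizations": [
      {
        "metric": "example.googleapis.com/path/to/metric",
        "filter": "resource.type = \"global\"",
        "singleInstanceAssignment": [INSTANCE_ASSIGNMENT]
      }
    ]
  }
}
```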

[INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance
in the managed instance group. For example, specify 2 to assign two
units of work to each instance, or specify 0.5 to assign half a unit
of work to each instance. The autoscaler adds enough instances to
the managed instance group to ensure that there are enough instances
to complete the available work, which is indicated by the metric. If the
metric value is 10 and you've assigned 0.5 units of work to each
instance, the autoscaler provisions 20 instances in the managed
instance group.

Utilization target:

In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of instances relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify the target
with the utilizationTarget parameter. You must also specify a filter for the
metric using the filter parameter.
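A sketch of the corresponding request body, with the same illustrative names and filter assumed as above:

```
POST https://www.googleapis.com/compute/beta/projects/myproject/zones/us-central1-a/autoscalers

{
  "name": "example-autoscaler",
  "target": "zones/us-central1-a/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 100,
    "customMetricUtilizations": [
      {
        "metric": "example.googleapis.com/path/to/metric",
        "filter": "resource.type = \"global\"",
        "utilizationTarget": [TARGET_VALUE],
        "utilizationTargetType": "[TARGET_TYPE]"
      }
    ]
  }
}
```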

Example: Using instance assignment to scale based on a Pub/Sub queue

Assume the following setup:

An active Google Cloud Pub/Sub subscription is connected to a topic in a
pull configuration. The subscription is named our-subscription.

A pool of workers is pulling messages from that subscription and processing
them. The pool is a single-zone managed instance group named
our-instance-group and is located in zone us-central1-a. The pool must not
exceed 100 workers, and should scale down to 0 workers when there are no
messages in the queue.

On average, a worker processes a single message in one minute.

To determine the optimal instance assignment value, consider several approaches:

To process all messages in the queue as fast as possible, you can choose 1
as the instance assignment value. This creates one instance for each message
in the queue (limited to the maximum number of instances in our group). However,
this can cause overprovisioning. In the worst case, an instance is created to
process just one message before the autoscaler shuts it down, spending more
resources on startup and shutdown than on actual work.

Note that if the workers were able to process multiple messages
concurrently, it would make sense to increase the value to the number of
concurrent processes.

Note that, in this example, it does not make sense to set the value below
1 because one message cannot be processed by more than one worker.

Alternatively, if processing latency is less important than resource
utilization and overhead costs, you can calculate how many messages each
instance must process within its lifetime to be considered efficiently
utilized. Take into account startup and shutdown time and the fact that
autoscaling does not immediately delete instances. For example, assuming that
startup and shutdown time takes about 5 minutes in total and assuming that
autoscaling deletes instances only after a period of approximately 10 minutes,
you calculate that it is efficient to create an additional instance in the
group as long as it can process at least 15 messages before the autoscaler
shuts it down, which results in at most 25% overhead due to the total time
it takes to create, start, and shut down the instance. In this case, you can
choose 15 as the instance assignment value.

Both approaches can be balanced out, resulting in a number between 1 and
15, depending on which factor takes priority: processing latency or
resource utilization.

Looking at the available Pub/Sub metrics,
we find a metric that represents the subscription queue length:
subscription/num_undelivered_messages.

Note that this metric exports the total number of messages in the queue,
including messages that are currently being processed but that are not yet
acknowledged. Using a metric that does not include the messages being processed
is not recommended because such a metric can drop down to 0 when there is still
work being done, which prompts autoscaling to scale down and possibly interrupt
the actual work.
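Putting the example together, a sketch of the configuration follows. The filter uses the pubsub_subscription monitored resource and its subscription_id label; the assignment value of 15 comes from the efficiency calculation above, and --min-num-replicas 0 allows the group to shrink to zero workers:

```shell
# Scale our-instance-group on the Pub/Sub backlog of our-subscription,
# assigning ~15 messages of work per instance lifetime.
gcloud beta compute instance-groups managed set-autoscaling our-instance-group \
    --zone us-central1-a \
    --min-num-replicas 0 \
    --max-num-replicas 100 \
    --update-stackdriver-metric pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-filter 'resource.type = "pubsub_subscription" AND resource.label.subscription_id = "our-subscription"' \
    --stackdriver-metric-single-instance-assignment 15
```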

Example: Using a utilization target to scale based on average latency

There might be a situation when the metric providing a relevant signal does not
represent a total amount of available work or another resource applicable to the
group, as in the previous example, but instead an average, a percentile, or some
other statistical property. For this example, assume you will scale based on the
group's average processing latency.

Assume the following setup:

A managed instance group named our-instance-group is assigned to perform a
particular task. The group is located in zone us-central1-a.

You have a Stackdriver Monitoring custom metric
that exports a value that you would like to maintain at a particular level. For
this example, assume the metric represents the average latency of processing
queries assigned to the group.

The custom metric is named:
custom.googleapis.com/example_average_latency.

The custom metric has a label with a key named group_name and value
equal to the instance group's name, our-instance-group.

You have determined that when the metric value goes above some specific value,
you need to add more instances to the group to handle the load, and when it
goes below that value, you can free up some resources. Autoscaling gradually
adds or removes instances at a rate that is proportional to how far the metric
is above or below the target. For this example, assume that the calculated
target value is 100.

You can now configure autoscaling for the group using a per-group utilization
target of 100, which represents the metric value that the autoscaler must
attempt to maintain:
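A sketch of that configuration follows. The replica limit and the resource.type of "global" in the filter are assumptions for illustration (the filter must name a resource.type other than gce_instance, and should be specific enough to return a single time series):

```shell
# Maintain the group's average latency metric at a target of 100,
# filtering the custom metric to this group via its group_name label.
gcloud beta compute instance-groups managed set-autoscaling our-instance-group \
    --zone us-central1-a \
    --max-num-replicas 20 \
    --update-stackdriver-metric custom.googleapis.com/example_average_latency \
    --stackdriver-metric-filter 'metric.label.group_name = "our-instance-group" AND resource.type = "global"' \
    --stackdriver-metric-utilization-target 100 \
    --stackdriver-metric-utilization-target-type GAUGE
```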