One of the appeals of cloud computing is the idea of using what you need when you need it. One of the ways Amazon provides for this is through autoscaling. In essence, this allows you to vary the number of (related) running instances according to some metric that is being tracked.

In this article, we look at how you can trigger a change in the number of running instances using a custom Cloudwatch metric – including the setup of said metric, and a brief look at the interactions between the various autoscaling commands used.

Setting up a custom Cloudwatch metric

Autoscaling uses Cloudwatch alarms to trigger events. To use it, we therefore need a functioning Cloudwatch metric and alarm.

Cloudwatch’s basic monitoring is in 5 minute increments and measures a number of parameters that are independent of the operating system and user data, including CPU utilization, disk I/O (in operations and bytes) and network usage. Additional metrics are made available for other services (e.g. EBS volumes, SNS, etc.)

Cloudwatch metrics do not have to be precreated, nor is it necessary to allocate space for them: a metric is created automatically when data is first added to it. It is worth mentioning that you cannot delete a metric – any data saved is retained for 2 weeks.

Custom metrics are created using the ‘PutMetricData’ request. This is available as one of the CLI tools for AWS, mon-put-data:

Note: Metrics that differ in name, namespace, or dimensions (all case sensitive) are treated as different metrics.

Pass the command a metric name, namespace, credentials, and a value and you are good to go. It might take a few minutes for the results to initially show up on Cloudwatch, but in my experience it is usually only a few seconds.
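As a rough sketch of what such a call looks like, the Python snippet below assembles the equivalent command for the newer 'aws' CLI ('aws cloudwatch put-metric-data') rather than mon-put-data; that substitution is an assumption about which tool you have installed. The namespace, metric name, and dimension are the ones used later in this post, and the instance ID is a made-up placeholder.

```python
import shlex


def put_metric_command(instance_id, value):
    """Build the argument list for one 'aws cloudwatch put-metric-data' call."""
    return [
        "aws", "cloudwatch", "put-metric-data",
        "--namespace", "System/Linux",
        "--metric-name", "UsedMemoryPercent",
        # Names are case sensitive: changing the case of any of them
        # would start a different metric.
        "--dimensions", "InstanceID=" + instance_id,
        "--unit", "Percent",
        "--value", str(value),
    ]


# 'i-0123456789abcdef0' is a placeholder, not a real instance.
command = put_metric_command("i-0123456789abcdef0", 42)
print(shlex.join(command))

# To actually send the data point (needs the CLI and credentials configured):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the arguments as a list, rather than one long string, avoids shell quoting problems if the script is later run from cron.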

It is worth noting here that the AWS command line tools are Java based and have to load Java before they can do anything, which makes them quite slow. Nonetheless, they are easy to use and a good starting point (we’ll look at an alternate approach later).

I’ll use the example of used memory throughout this post, since it is a fairly common use case (and it can be easily adapted for other metrics).

Amazon has posted a bash script to get us started, modified slightly below:

The script takes the number from the ‘-/+ buffers/cache‘ row under the ‘free‘ column, as a percent of ‘total‘ (under the ‘Mem‘ row), and sets up one metric (UsedMemoryPercent), in the namespace ‘System/Linux’, with a single dimension (InstanceID).

Notes:

AWS_CLOUDWATCH_HOME/bin contains the cloudwatch command line tools

The paths I have used, above, are for Amazon’s Linux AMI

As a personal preference, I have used curl instead of wget.

It should also be mentioned that the bash math used above will only yield integer results.
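The calculation the script performs can be sketched in Python as follows. This is not Amazon's script: it assumes the older 'free' layout that still has a '-/+ buffers/cache' row (newer versions of 'free' drop that row), it assumes the used percentage is derived as 100 minus the free share, and the sample numbers are invented. Integer division is used on purpose, to mirror the integer-only bash math.

```python
def used_memory_percent(free_output):
    """Return used memory as an integer percentage from older `free` output."""
    total = None
    free_after_cache = None
    for line in free_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            total = int(fields[1])            # 'total' column of the Mem row
        elif fields[0] == "-/+":
            # Row looks like: "-/+ buffers/cache: <used> <free>"
            free_after_cache = int(fields[-1])
    if total is None or free_after_cache is None:
        raise ValueError("unexpected `free` output format")
    # Integer arithmetic only, so the fractional part is discarded.
    return 100 - (free_after_cache * 100) // total


# Invented sample in the older `free -m` layout.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:          1700        600       1100          0         50        300
-/+ buffers/cache:        250       1450
Swap:            0          0          0
"""

print(used_memory_percent(SAMPLE))  # prints 15
```

The exact figure for the sample is about 14.7%, which shows how much the integer-only arithmetic can shift the reported value.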

To use the script, make it executable:

chmod +x /path/to/script.sh

Set it up to run every 5 minutes with crontab -e:

*/5 * * * * /path/to/script.sh

The project ‘aws-missing-tools’, hosted on Google Code, has a few more scripts, similar to the one above, for gathering other metrics.

Due to the poor performance of the AWS CLI tools, it is far more efficient to call the API directly. This can be accomplished using any of the available SDKs (e.g. PHP, Ruby, etc.). However, even an SDK seems to be overkill for one command. I came across a simple python script, from Loggly, that signs the passed parameters, and can easily be set up to put the Cloudwatch metrics. I have modified it to be a single script that accepts a value as a command line argument:

The absolute paths are used to avoid errors with cron. Of course, the above scripts have no real error checking in them – but they do serve my purposes quite well.
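To give an idea of what "signing the passed parameters" involves, here is a minimal sketch of Signature Version 2 query signing in Python. It is not the Loggly script, and the choice of Signature Version 2 is an assumption about the scheme in use when this was written; newer AWS tooling uses Signature Version 4, so treat this purely as an illustration. The access key, secret, timestamp, and endpoint host below are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def canonical_query(params):
    """Sort parameters by name and percent-encode names and values."""
    return "&".join(
        quote(str(k), safe="-_.~") + "=" + quote(str(v), safe="-_.~")
        for k, v in sorted(params.items())
    )


def sign_v2(params, secret_key, host, path="/", method="GET"):
    """Return the base64 HMAC-SHA256 signature over the canonical request."""
    string_to_sign = "\n".join(
        [method, host.lower(), path, canonical_query(params)]
    )
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder request parameters for a PutMetricData query call.
params = {
    "Action": "PutMetricData",
    "Version": "2010-08-01",
    "Namespace": "System/Linux",
    "MetricData.member.1.MetricName": "UsedMemoryPercent",
    "MetricData.member.1.Unit": "Percent",
    "MetricData.member.1.Value": "42",
    "AWSAccessKeyId": "AKIDEXAMPLE",
    "SignatureMethod": "HmacSHA256",
    "SignatureVersion": "2",
    "Timestamp": "2012-01-01T00:00:00Z",
}
signature = sign_v2(params, "not-a-real-secret", "monitoring.amazonaws.com")
print(signature)
```

The resulting value would be appended to the query string as the 'Signature' parameter before the request is sent.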

Hopefully, once you are up and running, you can see something like the following, in CloudWatch:

Setting up Autoscaling

The setup of autoscaling is the same for custom metrics or existing instance-metrics.

For ease of use (i.e. so we don’t have to pass them to every command), we should set (export) either:

AWS_CREDENTIAL_FILE or

both: EC2_PRIVATE_KEY and EC2_CERT

The CLI tools for autoscaling are sufficient for our needs, since we only have to run them once – from the command line – and not multiple times.

Create the launch config

This step sets up the EC2 instance to launch – therefore, it resembles the call to ec2-run-instances. As with the run command, you must pass an AMI and instance type, but can also specify additional parameters such as a block device mapping, security group, or user data.

For example, to create a launch config called ‘geek-config’ which will launch an m1.small instance based on the 32-bit Amazon Linux AMI (ami-31814f58), using the keypair ‘geek-key’ into the security group ‘geek-group’, we would use the following:

Note: it is acceptable to use the security group name unless launching into VPC, in which case you must use the security group id.
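For readers on the newer 'aws' CLI, a launch configuration with the same settings could be assembled along these lines. This is a sketch that assumes the 'aws autoscaling create-launch-configuration' command rather than the older autoscaling tools used in this post; the helper function name is my own.

```python
import shlex


def launch_config_command(name, image_id, instance_type, key_name, security_group):
    """Build the argument list for creating a launch configuration."""
    return [
        "aws", "autoscaling", "create-launch-configuration",
        "--launch-configuration-name", name,
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        # Use the security group id instead of the name when launching into VPC.
        "--security-groups", security_group,
    ]


command = launch_config_command(
    "geek-config", "ami-31814f58", "m1.small", "geek-key", "geek-group"
)
print(shlex.join(command))

# To run it (needs the CLI and credentials configured):
#   import subprocess
#   subprocess.run(command, check=True)
```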

Create the autoscaling group

Here we define the parameters for scaling, for instance the availability zone(s) to launch into and the (upper and lower) limits on the number of instances, and we associate the group with the launch configuration we created previously. This command also gives us a chance to set up load balancers if needed, to specify a freeze time on scaling (i.e. while the group size is being adjusted), and to start with a number of instances other than the minimum value.

For example, to have a group (‘scaly-geek’) start with 2 instances and scale between 1 and 5 instances based on the above launch config, all of them in the us-east-1a availability zone, with a 3 minute freeze on scaling, we would use:

Note: if you do not specify a --desired-capacity then the --min-size number of instances will be used.
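The same group could be described with the newer 'aws' CLI roughly as follows. Again this is a sketch under the assumption that 'aws autoscaling create-auto-scaling-group' is available; the helper and its sanity checks are my own additions, and the 3 minute freeze is expressed as a 180 second default cooldown.

```python
import shlex


def autoscaling_group_command(name, launch_config, zones, min_size, max_size,
                              desired_capacity=None, cooldown_seconds=None):
    """Build the argument list for creating an autoscaling group."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    command = [
        "aws", "autoscaling", "create-auto-scaling-group",
        "--auto-scaling-group-name", name,
        "--launch-configuration-name", launch_config,
        "--availability-zones", *zones,
        "--min-size", str(min_size),
        "--max-size", str(max_size),
    ]
    if desired_capacity is not None:
        if not min_size <= desired_capacity <= max_size:
            raise ValueError("desired_capacity must lie between min and max")
        command += ["--desired-capacity", str(desired_capacity)]
    # When desired_capacity is omitted, the group starts at min_size.
    if cooldown_seconds is not None:
        command += ["--default-cooldown", str(cooldown_seconds)]
    return command


command = autoscaling_group_command(
    "scaly-geek", "geek-config", ["us-east-1a"],
    min_size=1, max_size=5, desired_capacity=2, cooldown_seconds=180,
)
print(shlex.join(command))
```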

Create a policy to scale with

This command allows us to define a new capacity – either via a change (numerical or percent) or by specifying an exact number of instances, and associates itself with the scaling group we have created previously. Negative numbers are used to represent a decrease in the number of instances. This policy will be referenced by its Amazon Resource Name (ARN) and used as the action of a Cloudwatch alarm. We can create multiple policies depending on our needs, but at least two policies – one to scale up and one to scale down – are common.
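To make the three kinds of adjustment concrete, here is a small Python model of how a policy changes the group size. It is a simplification for illustration, not AWS's implementation: the adjustment type names are the ones the API uses, the rounding of percentage changes reflects my understanding of the documented behaviour, and the result is clamped to the group's limits.

```python
def apply_policy(current, adjustment, adjustment_type, min_size, max_size):
    """Return the new group size after applying one scaling policy."""
    if adjustment_type == "ExactCapacity":
        target = adjustment
    elif adjustment_type == "ChangeInCapacity":
        target = current + adjustment          # negative values scale down
    elif adjustment_type == "PercentChangeInCapacity":
        delta = current * adjustment / 100
        if 0 < delta < 1:
            delta = 1                          # small increases still add one
        elif -1 < delta < 0:
            delta = -1                         # small decreases still remove one
        else:
            delta = int(delta)                 # otherwise drop the fraction
        target = current + delta
    else:
        raise ValueError("unknown adjustment type: " + adjustment_type)
    # The group never leaves its configured limits.
    return max(min_size, min(max_size, target))


# A scale-up and a scale-down policy on a group limited to 1..5 instances.
print(apply_policy(2, 1, "ChangeInCapacity", 1, 5))    # prints 3
print(apply_policy(1, -1, "ChangeInCapacity", 1, 5))   # prints 1
```

The second call shows why a scale-down policy is harmless at the lower bound: the requested size of 0 is clamped back to the minimum of 1.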

Create Cloudwatch Alarms

This is the final step – tying everything together. We have collected data in Cloudwatch, and we can now set up an alarm to be triggered when our metric breaches the target value. This alarm will then be set up to perform one or more actions, specified by their ARN(s). In our case, the alarm will trigger a scaling policy – which will then change the number of instances in our scaling group.

As with our scaling policies, there can be multiple alarms – in our case, two – one to define the lower bound (and trigger our scale-down policy) and one to define the upper bound (and trigger our scale-up policy).
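A pair of alarms like this could be assembled for the newer 'aws' CLI as sketched below. The thresholds (80 and 40), alarm names, period settings, policy ARNs, and instance ID are all hypothetical placeholders, and the use of 'aws cloudwatch put-metric-alarm' instead of the older tools is an assumption. The dimension must be whatever the data was actually published under.

```python
import shlex


def alarm_command(alarm_name, comparison, threshold, policy_arn,
                  dimension_name, dimension_value,
                  period_seconds=300, evaluation_periods=2):
    """Build the argument list for one 'aws cloudwatch put-metric-alarm' call."""
    return [
        "aws", "cloudwatch", "put-metric-alarm",
        "--alarm-name", alarm_name,
        "--namespace", "System/Linux",
        "--metric-name", "UsedMemoryPercent",
        "--dimensions", "Name={},Value={}".format(dimension_name, dimension_value),
        "--statistic", "Average",
        "--period", str(period_seconds),
        "--evaluation-periods", str(evaluation_periods),
        "--comparison-operator", comparison,
        "--threshold", str(threshold),
        # The ARN returned when the scaling policy was created.
        "--alarm-actions", policy_arn,
    ]


# Placeholders: substitute the ARNs of your own scale-up and scale-down policies.
high = alarm_command("geek-high-memory", "GreaterThanThreshold", 80,
                     "<scale-up-policy-arn>", "InstanceID", "i-0123456789abcdef0")
low = alarm_command("geek-low-memory", "LessThanThreshold", 40,
                    "<scale-down-policy-arn>", "InstanceID", "i-0123456789abcdef0")
print(shlex.join(high))
print(shlex.join(low))
```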

--metric-name and --namespace must match those used to create the original Cloudwatch metric.

--period and --evaluation-periods are both required. The former defines the length of one period in seconds, and the latter defines the number (integer) of consecutive periods that must match the criteria to trigger the alarm.

Just to clarify, the aggregation (--statistic) is performed over a single period – and its result must compare (--comparison-operator) to the threshold for the specified number of consecutive periods (--evaluation-periods).
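The interaction between these three options can be modelled in a few lines of Python. This is a simplified illustration, not how Cloudwatch is implemented: it assumes an 'Average' statistic and a greater-than comparison, and it treats a period without data as simply breaking the run, whereas the real service has more nuanced handling of missing data. The sample values are invented.

```python
from statistics import mean


def alarm_fires(samples_per_period, threshold, evaluation_periods):
    """True if the per-period average exceeds the threshold for the
    required number of consecutive periods."""
    consecutive = 0
    for samples in samples_per_period:
        if not samples:
            consecutive = 0              # no data: the run is broken
            continue
        if mean(samples) > threshold:    # statistic over one period
            consecutive += 1
            if consecutive >= evaluation_periods:
                return True
        else:
            consecutive = 0
    return False


# Four periods of invented memory readings (percent used).
periods = [[70, 75], [85, 90], [82, 88], [60]]
print(alarm_fires(periods, threshold=80, evaluation_periods=2))  # prints True
print(alarm_fires(periods, threshold=80, evaluation_periods=3))  # prints False
```

Raising the number of evaluation periods makes the alarm slower to react but less sensitive to short spikes.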

The following two AWS CLI commands are helpful in debugging errors with autoscaling:

Absolutely – thanks for pointing that out – ampersands don’t always seem to make it into my posts intact. Hopefully that is the only error that there is in it (I should put up a downloadable copy of the script actually).

This doesn’t tie together.
When “Setting up a custom Cloudwatch metric” you need to push that data into the autoscaling group (rather than an instance’s metrics) so the AutoScaling group alarm has data to work off. I am getting the “Insufficient Data” warning for this setup. Perhaps something has changed enough since this tutorial was made to affect this, but the logic wouldn’t have changed.

(or the aws cloudwatch put-metric-data if one is using the new awscli tools)

What you have is good for capturing a single instance’s metrics, and I am using that (thanks!), but for autoscaling the metrics from all the machines in the AutoScaleGroup need to be put into the same dimension.