Automating AWS EC2 Management with Python and Boto3

Introduction

In this article I demonstrate how to use Python with Boto3, the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which lets Python programmers use the AWS REST APIs to manage their cloud resources. Because the AWS REST API and its associated cloud services are vast, I will focus only on the AWS Elastic Compute Cloud (EC2) service.

Here are the topics I will be covering:

Starting an EC2 instance

Stopping an EC2 instance

Terminating an EC2 instance

Backing up an EC2 instance by creating an image

Creating an EC2 instance from an image

Scheduling backup and clean up using cron on a server and AWS Lambda

Dependencies and Environment Setup

To start, I need to create a user in my AWS account that has programmatic access to the REST APIs. For simplicity I will grant this user admin rights, but please note that this is only to keep the tutorial simple. If you are following along, you should consult your organization's IT security policies before using this user in a production environment.

Step 1: In my AWS console I go to the IAM section under the Services menu, then click the Users link and finally click the Add user button, which takes me to the screen shown below. In this screen I give the user the name "boto3-user" and check the box for Programmatic access before clicking the next button.

Step 2: In the permissions screen I click the Attach existing policies directly tile and then select the checkbox for AdministratorAccess before clicking next as shown below.

Step 3: Click through to next since I am not adding any optional tags.

Step 4: I review the user about to be created and then click Create user.

Step 5: Finally, I download credentials as a CSV file and save them.

Next up I need to install the necessary Python 3 libraries locally within a virtual environment, like so:

Creating an EC2 Instance to Work On

In this section I am going to go over how to create an AWS region specific boto3 session as well as instantiate an EC2 client using the active session object. Then, using that EC2 boto3 client, I will interact with that region's EC2 instances managing startup, shutdown, and termination.

To create an EC2 instance for this article I take the following steps:

Step 1: I click the EC2 link within the Services menu to open the EC2 Dashboard and then click the Launch Instance button in the middle of the screen.

Step 2: In the Choose Amazon Machine Image (AMI) page I click the Select button next to the Amazon Linux AMI.

Step 4: On the review page I expand the Tags section and click Edit Tags to add tags for Name and BackUp, then click Review and Launch again to go back to the review page before finally clicking the Launch button to launch the instance.

I now have a running EC2 instance, as shown below.

Boto3 Session and Client

At last, I can get into writing some code! I begin by creating an empty file, a Python module, called awsutils.py and at the top I import the library boto3 then define a function that will create a region-specific Session object.

If I fire up my Python interpreter and import the module just created above I can use the new get_session function to create a session in the same region as my EC2 instance, then instantiate an EC2.Client object from it, like so:
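The original listings are not reproduced here, so what follows is only a minimal sketch of what awsutils.py and the interpreter session described above might look like. It assumes AWS credentials for the boto3-user are already configured somewhere boto3 can find them, and the region name 'us-east-1' in the usage lines is a placeholder of mine, not a value from the article.

```python
# awsutils.py (sketch)


def get_session(region):
    """Return a boto3 Session bound to the given AWS region."""
    # Assumption: the import lives inside the function only so this sketch
    # can be loaded on a machine without boto3 installed; the article
    # places "import boto3" at the top of the module.
    import boto3

    return boto3.session.Session(region_name=region)
```

In the interpreter, the session and the EC2 client could then be created along these lines:

>>> import awsutils
>>> session = awsutils.get_session('us-east-1')
>>> client = session.client('ec2')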

I can then use this EC2 client object to get a detailed description of the instance by calling describe_instances, using pprint to make the output a little easier to read.

>>> import pprint
>>> pprint.pprint(client.describe_instances())
...

I am omitting the output as it is quite verbose, but know that it contains a dictionary with a Reservations entry, which is a list of data describing the EC2 instances in that region and ResponseMetadata about the request that was just made to the AWS REST API.

Retrieving EC2 Instance Details

I can also use this same describe_instances method along with a Filters parameter to filter the selection by tag values. For example, if I want to get my recently created instance with the Name tag with a value of 'demo-instance', that would look like this:
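As a hedged sketch of such a tag filter, the helper below (find_instances_by_name is a name I made up for illustration) passes a tag:Name filter to describe_instances and flattens the Reservations list into a plain list of instance dictionaries. It expects the EC2 client created earlier as its first argument.

```python
def find_instances_by_name(client, name):
    """Return the instance dicts whose Name tag equals `name`."""
    response = client.describe_instances(
        Filters=[{"Name": "tag:Name", "Values": [name]}]
    )
    # Each reservation holds one or more instances, so flatten them.
    return [
        instance
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
```

Calling find_instances_by_name(client, 'demo-instance') should then give a list from which the InstanceId of the demo instance can be read.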

There are many ways to filter the output of describe_instances and I refer you to the official docs for the details.

Starting and Stopping an EC2 Instance

To stop the demo-instance I use the stop_instances method of the client object, which I previously instantiated, supplying the instance ID as a single-entry list for the InstanceIds parameter as shown below:
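A minimal sketch of stopping and restarting an instance with the client follows. The two wrapper names are hypothetical; stop_instances and start_instances are the boto3 client calls, and both take a list of IDs even when there is only one.

```python
def stop_instance(client, instance_id):
    """Ask EC2 to stop a single instance and return the raw API response."""
    return client.stop_instances(InstanceIds=[instance_id])


def start_instance(client, instance_id):
    """Ask EC2 to start a single stopped instance and return the response."""
    return client.start_instances(InstanceIds=[instance_id])
```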

Alternative Approach to Fetching, Starting, and Stopping

In addition to the EC2.Client class that I've been working with thus far, there is also an EC2.Instance class that is useful in cases such as this one where I only need to be concerned with one instance at a time.

Below I use the previously generated session object to get an EC2 resource object, which I can then use to retrieve and instantiate an Instance object for my demo-instance.
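One way this could look, assuming the session object from earlier and an instance ID already looked up with describe_instances (get_instance is a helper name of my own):

```python
def get_instance(session, instance_id):
    """Build an EC2.Instance resource object for one instance ID."""
    ec2 = session.resource("ec2")     # EC2 service resource for the region
    return ec2.Instance(instance_id)  # attributes load lazily on first access
```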

In my opinion, a major benefit of using the Instance class is that you are working with actual objects instead of a point-in-time dictionary representation of the instance, but you lose the ability to perform actions on multiple instances at once that the EC2.Client class provides.

For example, to see the state of the demo-instance I just instantiated above, it is as simple as this:

>>> instance.state
{'Code': 16, 'Name': 'running'}

The Instance class has many useful methods, two of which are start and stop which I will use to start and stop my instances, like so:
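A small sketch of that stop and start cycle, written as a hypothetical helper that takes the Instance object. The waiter calls and reload are additions of mine so that the reported state is current; they are not something the article itself describes.

```python
def stop_then_start(instance):
    """Stop an instance, start it again, and return its final state name."""
    instance.stop()
    instance.wait_until_stopped()   # block until EC2 reports 'stopped'
    instance.start()
    instance.wait_until_running()   # block until EC2 reports 'running'
    instance.reload()               # refresh the cached attributes
    return instance.state["Name"]
```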

Creating a Backup Image of an EC2.Instance

An important topic in server management is creating backups to fall back on in the event a server becomes corrupted. In this section I am going to demonstrate how to create an Amazon Machine Image (AMI) backup of my demo-instance, which AWS will then store in its Simple Storage Service (S3). This can later be used to recreate that EC2 instance, just like how I used the initial AMI to create the demo-instance.

To start I will show how to use the EC2.Client class and its create_image method to create an AMI of demo-instance by providing the instance ID and a descriptive name for the image.
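Here is a hedged sketch of the client call. The wrapper name and the example image name in the usage lines are placeholders of mine. Note that, unless NoReboot=True is passed, create_image by default tries to shut the instance down and reboot it while the image is taken.

```python
def create_backup_image(client, instance_id, name):
    """Create an AMI from an instance and return the new image ID."""
    response = client.create_image(InstanceId=instance_id, Name=name)
    return response["ImageId"]
```

In the interpreter that might be used like this, where instance_id is assumed to hold the ID of the demo-instance:

>>> name = 'demo-instance-backup'
>>> image_id = create_backup_image(client, instance_id, name)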

Similarly, I can use the Instance class's create_image method to accomplish the same task, which returns an instance of an EC2.Image class that is similar to the EC2.Instance class.

>>> image = instance.create_image(Name=name + '_2')

Tagging Images and EC2 Instances

A very powerful, yet extremely simple, feature of EC2 instances and AMIs is the ability to add custom tags. You can add tags both via the AWS management console, as I showed when creating the demo-instance with tags Name and BackUp, as well as programmatically with boto3 and the AWS REST API.

Since I have an EC2.Instance object still floating around in memory in my Python interpreter I will use that to display the demo-instance tags.
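The tags attribute of an Instance object is a list of Key/Value dictionaries (or None when there are no tags). As a small illustration, the hypothetical helper below turns that list into an ordinary dictionary, which I find easier to read.

```python
def tags_to_dict(tags):
    """Convert boto3's [{'Key': ..., 'Value': ...}] list into a dict."""
    return {tag["Key"]: tag["Value"] for tag in (tags or [])}
```

It would be called as tags_to_dict(instance.tags).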

Both the EC2.Instance and the EC2.Image classes have an identically functioning set of create_tags methods for adding tags to their represented resources. Below I demonstrate adding a RemoveOn tag to the image created previously, which is paired with a date at which it should be removed. The date format used is "YYYYMMDD".
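A sketch of how the RemoveOn tag could be applied, assuming an Image (or Instance) resource object. Both helper names are mine; create_tags with a Tags list is the resource method described above.

```python
from datetime import date, timedelta


def remove_on_value(days_ahead, today=None):
    """Return the date `days_ahead` days from today as YYYYMMDD."""
    today = today or date.today()
    return (today + timedelta(days=days_ahead)).strftime("%Y%m%d")


def tag_for_removal(resource, days_ahead, today=None):
    """Add a RemoveOn tag to an EC2.Image or EC2.Instance object."""
    value = remove_on_value(days_ahead, today)
    resource.create_tags(Tags=[{"Key": "RemoveOn", "Value": value}])
    return value
```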

Again, the same can be accomplished with the EC2.Client class by providing a list of resource IDs, but with the client you can tag both images and EC2 instances at the same time if you desire by specifying their IDs in the Resources parameter of the create_tags method, like so:
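For illustration only, a thin wrapper around the client call might look like the following; tag_resources is a hypothetical name, and the IDs passed in can mix image IDs and instance IDs.

```python
def tag_resources(client, resource_ids, key, value):
    """Apply one tag to several EC2 resources (images and/or instances)."""
    return client.create_tags(
        Resources=list(resource_ids),
        Tags=[{"Key": key, "Value": value}],
    )
```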

Creating an EC2 Instance from a Backup Image

I would like to start this section by giving you something to think about. Put yourself in the uncomfortable mindset of a system administrator, or even worse a developer pretending to be a sys admin because the product they are working on doesn't have one (admission... that's me), and one of your EC2 servers has become corrupted.

Eeek! It's scramble time... you now need to figure out what OS type, size, and services were running on the down server... fumble through setup and installation of the base server, plus any apps that belong on it, and pray everything comes up correctly.

Whew! Take a breath and chill because I'm about to show you how to quickly get back up and running, plus... spoiler alert... I am going to pull these one-off Python interpreter commands into a workable set of scripts at the end for you to further modify and put to use.

Ok, with that mental exercise out of the way let me get back to work. To create an EC2 instance from an image ID I use the EC2.Client class's run_instances method and specify the number of instances to kick off and the type of instance to run.
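A minimal sketch of that call is below. The wrapper name and the default instance type of t2.micro are assumptions of mine; ImageId, InstanceType, MinCount, and MaxCount are the run_instances parameters involved.

```python
def launch_from_image(client, image_id, instance_type="t2.micro", count=1):
    """Launch `count` instances from an AMI and return their instance IDs."""
    response = client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,  # assumed size; pick what the server needs
        MinCount=count,
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in response["Instances"]]
```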

I am omitting the output again due to its verbosity. Please have a look at the official docs for the run_instances method, as there are a lot of parameters to choose from to customize exactly how to run the instance.

Removing Backup Images

Ideally, I would be making backup images at a fairly frequent interval (i.e., daily at the least), and along with all these backups come three things, one of which is quite good and the other two somewhat problematic. On the good side, I am making snapshots of known states of my EC2 server, which gives me a point in time to fall back to if things go bad. However, on the bad side I am creating clutter in my S3 buckets and racking up charges with each additional backup I put into storage.

A way to mitigate the downsides of clutter and rising storage charges is to remove backup images after a predetermined amount of time has elapsed, and that is where the tags I created earlier are going to save me. I can query my EC2 backup images, locate the ones that have a particular RemoveOn tag, and then remove them.

I can begin by using the describe_images method on the EC2.Client class instance along with a filter for the 'RemoveOn' tag to get all images that I tagged to remove on a given date.
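The sketch below shows one way the lookup and removal could fit together. Restricting the query with Owners=['self'] and removing images with deregister_image are my assumptions about how this would be done, not steps confirmed by the article. Deregistering an EBS-backed AMI does not delete its underlying snapshots, so those may need separate cleanup.

```python
def remove_images_tagged(client, remove_on):
    """Deregister my own AMIs whose RemoveOn tag equals `remove_on`."""
    response = client.describe_images(
        Owners=["self"],  # assumption: only look at images I own
        Filters=[{"Name": "tag:RemoveOn", "Values": [remove_on]}],
    )
    removed = []
    for image in response["Images"]:
        client.deregister_image(ImageId=image["ImageId"])
        removed.append(image["ImageId"])
    return removed
```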

Terminating an EC2 Instance

Well, having covered starting, stopping, creating and removing backup images, and launching an EC2 instance from a backup image, I am nearing the end of this tutorial. Now all that is left to do is clean up my demo instances by calling the EC2.Client class's terminate_instances and passing in the instance IDs to terminate. Again, I will use describe_instances with a filter for the name of demo-instance to fetch its details and grab its instance ID. I can then use it with terminate_instances to get rid of it forever.
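Put together, the lookup and termination might be sketched as follows. The function name is hypothetical, and the guard against an empty ID list is my addition so that nothing is sent to terminate_instances when no instance matches.

```python
def terminate_by_name(client, name):
    """Terminate every instance whose Name tag equals `name`."""
    response = client.describe_instances(
        Filters=[{"Name": "tag:Name", "Values": [name]}]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:  # only call the API when there is something to terminate
        client.terminate_instances(InstanceIds=instance_ids)
    return instance_ids
```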

Note: Yes, this is a forever thing so be very careful with this method.

Pulling Things Together for an Automation Script

Now that I have walked through these functionalities by issuing commands one by one in the Python interpreter (which I highly recommend readers do at least once on their own to experiment with things), I will pull everything together into two separate scripts called ec2backup.py and amicleanup.py.

The ec2backup.py script will simply query all available EC2 instances that have the tag BackUp, then create a backup AMI for each one and tag each image with a RemoveOn tag set to a date 3 days in the future.
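The original script is not reproduced here, so the following is only a sketch of the core of ec2backup.py under a few assumptions: the function name, the image naming scheme (instance ID plus date), and the use of the tag-key filter to find instances carrying a BackUp tag are all mine. It expects an EC2 client such as the one created with get_session earlier.

```python
from datetime import date, timedelta


def backup_tagged_instances(client, today=None, keep_days=3):
    """Create an AMI for every instance tagged BackUp; return the image IDs."""
    today = today or date.today()
    remove_on = (today + timedelta(days=keep_days)).strftime("%Y%m%d")
    response = client.describe_instances(
        Filters=[{"Name": "tag-key", "Values": ["BackUp"]}]
    )
    image_ids = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            # Hypothetical naming scheme: <instance id>_<YYYYMMDD>
            name = f"{instance['InstanceId']}_{today:%Y%m%d}"
            image = client.create_image(
                InstanceId=instance["InstanceId"], Name=name
            )
            # Mark the image so a cleanup job can find it later.
            client.create_tags(
                Resources=[image["ImageId"]],
                Tags=[{"Key": "RemoveOn", "Value": remove_on}],
            )
            image_ids.append(image["ImageId"])
    return image_ids
```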

Cron Implementation

A relatively simple way to implement the functionality of these two scripts would be to schedule two cron tasks on a Linux server to run them. In an example below I have configured a cron task to run every day at 11PM to execute the ec2backup.py script then another at 11:30PM to execute the amicleanup.py script.

AWS Lambda Implementation

A more elegant solution is to use AWS Lambda to run the two as a set of functions. There are many benefits to using AWS Lambda to run code, but for this use-case of running a couple of Python functions to create and remove backup images the most pertinent are high availability and avoidance of paying for idle resources. Both of these benefits are best realized when you compare using Lambda against running the two cron jobs described in the last section.

If I were to configure my two cron jobs to run on an existing server, then what happens if that server goes down? Not only do I have the headache of having to bring that server back up, but I also run the risk of missing a scheduled run of the cron jobs that control the EC2 server backup and cleanup process. This is much less of a concern with AWS Lambda, as it is designed with redundancy to provide extremely high availability.

The other main benefit of not having to pay for idle resources is best understood in an example where I may have spun up an instance just to manage these two scripts running once a day. Not only does this method fall under the potential availability flaw of the last item, but an entire virtual machine has now been provisioned to run two scripts once a day constituting a very small amount of compute time and lots of wasted resources sitting idle. This is a prime case for using AWS Lambda to improve operational efficiency.

Another operational efficiency resulting from using Lambda is not having to spend time maintaining a dedicated server.

To create an AWS Lambda function for the EC2 instance image backups follow these steps:

Step 1. Under the Services menu click Lambda within the Compute section.

Step 2. Click the Create function button.

Step 3. Select the Author from scratch option, type "ec2backup" as a function name, select Python 3.6 from the run-time options, then add the boto3-user for the role and click Create Function as shown below:

Step 4. In the designer select CloudWatch Events and add a schedule expression of cron(0 23 * * ? *), which will cause the function to run every day at 11PM UTC.

Conclusion

In this article I have covered how to use the AWS Python SDK library Boto3 to interact with EC2 resources. I demonstrated how to automate the operational management tasks of creating AMI backups for EC2 instances and subsequently cleaning up those backup images, using scheduled cron jobs on a dedicated server or using AWS Lambda.

If you are interested in learning how to use Boto and AWS Simple Storage Service (S3) check out Scott Robinson's article here on StackAbuse.

As always, thanks for reading and don't be shy about commenting or critiquing below.