Automating the discovery of unused AWS Lambda functions

In 2017, Kyle Somers explained how you can gain visibility into the execution of your AWS Lambda functions in his blog post announcing AWS CloudTrail data events for AWS Lambda. In this post, I’ll expand on Kyle’s work to show how you can combine CloudTrail data events for AWS Lambda with the power of the Amazon Athena SQL engine to answer the question, “Do I have any Lambda functions that haven’t been used in the past 30 days?”

Whether you are a large financial institution or a start-up, understanding which functions are being invoked and which are not can help you keep your Lambda environment up to date and control costs and risks by removing unused functions from production.

In addition to helping identify which Lambda functions have been invoked, CloudTrail Lambda data events can be used to detect and automatically act on invocations of Lambda functions across your AWS account. For example, you can meet your IT auditing and compliance requirements by validating that your functions were invoked by permitted users, roles, and services. Customers with regulatory audit and compliance requirements can maintain the same level of visibility and auditability of Lambda function invocations as they do for other AWS services.

To help identify unused Lambda functions, we’ll use a simple Python script that demonstrates an example workflow. The script requires Python 2.7+ or 3.3+. Before we dive in, let’s make sure all the prerequisites are set up.

Recording the AWS Lambda Invoke API

Enabling CloudTrail data events to record Lambda Invoke API activity is a simple setup that can be applied to an existing CloudTrail trail or configured when you create a new trail in your account. To capture activity for every function, make sure you enable the Log all current and future functions option during setup.

Kyle’s blog does a great job of providing step-by-step instructions. For the Python script to return accurate results based on the past 30 days, you’ll need to ensure that CloudTrail data events for AWS Lambda have been enabled for all functions for at least that period of time. Data events are charged at the rate of $0.10 per 100,000 events. See the CloudTrail pricing page for more information.
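If you prefer to enable the data events programmatically rather than through the console, the sketch below shows one way to do it with Boto3’s CloudTrail client. The trail name is a placeholder, and the helper function names are mine, not part of any AWS API; the bare `arn:aws:lambda` value is how CloudTrail expresses “all current and future functions.”

```python
def lambda_data_event_selectors():
    """Event selectors equivalent to the "Log all current and future functions" option."""
    return [{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::Lambda::Function",
            # The bare service ARN prefix matches every function, present and future.
            "Values": ["arn:aws:lambda"],
        }],
    }]

def enable_lambda_data_events(trail_name):
    """Attach the selectors to an existing trail, e.g. trail_name="my-trail"."""
    import boto3  # imported lazily so the selector helper works without the SDK installed
    boto3.client("cloudtrail").put_event_selectors(
        TrailName=trail_name,
        EventSelectors=lambda_data_event_selectors(),
    )
```

Calling `enable_lambda_data_events("my-trail")` requires CloudTrail permissions and an existing trail; the selector helper itself is pure and safe to inspect anywhere.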

Boto3 setup and configuration

The sample Python script depends on Boto3, the AWS SDK for Python. To install Boto3, you can clone the repository and run:

pip install -r requirements.txt

Or you can install Boto3 directly with pip:

pip install boto3

Before you can begin using Boto3, you need to set up authentication credentials. Credentials for your AWS account can be found in the IAM console. You can create a new user or use an existing user that has the required permissions described below. Go to the Users -> Security credentials -> Access keys page and copy the existing keys or generate a new set of keys for the chosen IAM user.

If you have the AWS CLI installed, then you can use it to configure your credentials file using the command:

aws configure

Alternatively, you can create the credentials file yourself. By default, it is located at ~/.aws/credentials for Mac and Linux users or C:\Users\USER_NAME\.aws\credentials for Windows users. Add the following lines to the file:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

See the Security Credentials page for more information on getting your keys. For more information on configuring Boto3, check out the Quickstart section in the developer guide.
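Before moving on, it can be worth sanity-checking that Boto3 can actually resolve the credentials you configured. The sketch below is illustrative: the helper names are mine, the first helper simply renders the same INI layout shown above, and the second asks Boto3 whether any credential source (file, environment variables, or instance role) resolves.

```python
def render_credentials_file(access_key, secret_key, profile="default"):
    """Render the INI layout that `aws configure` writes to ~/.aws/credentials."""
    return (f"[{profile}]\n"
            f"aws_access_key_id = {access_key}\n"
            f"aws_secret_access_key = {secret_key}\n")

def credentials_resolve():
    """True once Boto3 can find credentials from any configured source."""
    import boto3  # imported lazily so the rendering helper needs no SDK installed
    return boto3.Session().get_credentials() is not None
```

If `credentials_resolve()` returns False, revisit the `aws configure` step before running the script.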

2. Name of the Amazon S3 bucket where Amazon Athena will store the query history when running the script. If this bucket doesn’t already exist, it will be created in the Region where the script is executed, and the Region name will be appended to the name you provide. Example:

ATHENA_S3_BUCKET_NAME = "s3://athena-history-bucket-demo"

3. Name of the Athena table to create for CloudTrail logs. This table will be created in the ‘default’ Athena database. Example:

TABLE_NAME = "cloudtrail_logs"

4. Location of the Amazon S3 bucket where CloudTrail logs are stored for your CloudTrail Lambda data events. You can find this location by viewing the CloudTrail trail and copying the S3 bucket where the log files are delivered. This is in the format s3://{BucketName}/AWSLogs/{AccountID}/.
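Taken together, the script variables might look like the fragment below. The first two names and values come from the examples above; the last variable’s name and its bucket and account values are placeholders I’ve invented for illustration, following the documented format.

```python
# The first two variables follow the examples above; the CloudTrail log location
# is a placeholder in the documented s3://{BucketName}/AWSLogs/{AccountID}/ format.
ATHENA_S3_BUCKET_NAME = "s3://athena-history-bucket-demo"
TABLE_NAME = "cloudtrail_logs"
CLOUDTRAIL_LOG_LOCATION = "s3://my-trail-bucket/AWSLogs/111122223333/"  # placeholder values
```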

Running the script

With all the prerequisites met and the variables configured, you can now run the script. Before you do, note that while there is no cost for the script to create the CloudTrail table within Athena, the Athena query that searches for Lambda invocations will incur standard Athena charges. See the Athena pricing page for more information.

To run the script:

python unusedlambda.py

This performs the following actions in order:

1. Retrieves a list of the current Lambda functions found in the Region specified in the configuration file or set using the AWS CLI. The script will output the total count of functions found.

The next three steps each run a query within Athena. When a query is launched you’ll see a Query Execution ID, and the script prints “Running” every 5 seconds while it waits for the query to finish.

2. Creates an Athena table with the name specified in the ‘TABLE_NAME’ variable. The table is created using the AWS CloudTrail SerDe and the specific DDL required to query AWS CloudTrail logs.

3. Creates a partition in the newly created CloudTrail table for the year 2018. This limits the amount of data that Amazon Athena needs to scan to return results for the past 30 days.

4. Using the list of functions retrieved in step 1, creates and executes an Amazon Athena query that returns the list of functions invoked in the past 30 days.

Note: If you run the script more than once in the same Region with the same script variables, the queries in steps 2 and 3 will fail. This is expected behavior, because the CloudTrail table and partition already exist.

5. Finally, outputs the difference between the list of functions that exist within the Region and the query results.
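The five steps above can be sketched with Boto3 as follows. This is illustrative rather than the actual script’s code: the helper names are mine, and the SQL for steps 2 through 4 is passed in by the caller.

```python
import time

def unused_functions(all_functions, invoked_functions):
    # Step 5: functions that exist in the Region but do not appear in the
    # query results for the past 30 days.
    return sorted(set(all_functions) - set(invoked_functions))

def list_function_names(region):
    # Step 1: page through every Lambda function in the Region.
    import boto3  # imported lazily so the pure helper above needs no SDK installed
    client = boto3.client("lambda", region_name=region)
    names = []
    for page in client.get_paginator("list_functions").paginate():
        names.extend(f["FunctionName"] for f in page["Functions"])
    return names

def run_athena_query(sql, output_location, region):
    # Steps 2-4: start a query in the 'default' database and poll every
    # 5 seconds, mirroring the script's "Running" messages.
    import boto3
    athena = boto3.client("athena", region_name=region)
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        print("Running")
        time.sleep(5)
```

The `unused_functions` helper is a plain set difference, which is all step 5 amounts to once the list of invoked functions comes back from Athena.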

Example output:

The output represents all the functions within the designated Region that have NOT been invoked in the past 30 days.

Conclusion

By combining CloudTrail data events for Lambda with Athena’s SQL engine, we can now easily automate an answer to the question, “Do I have any Lambda functions that haven’t been used in the past 30 days?”

About the Author

Bob O’Dell is a Sr. Product Manager for AWS CloudTrail. AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of AWS accounts. Bob enjoys working with customers to understand how CloudTrail can meet their needs and continue to be an integral part of their solutions. In his spare time, he enjoys spending time adventuring through the Pacific Northwest.