Deprecation

Sunday, 14 October 2018

Introduction

Earlier this year the S3 team announced
that S3 will stop accepting API requests signed
using AWS Signature Version 2 after June 24th, 2019. Customers will need to
update their SDKs, CLIs, and custom implementations to use AWS
Signature Version 4 to avoid impact after this date. It can be
difficult to find older applications or instances using outdated versions of
the AWS CLI or SDKs that need to be updated, so the purpose of this post is to
explain how AWS CloudTrail data
events and Amazon Athena can be used to help identify applications
that may need to be updated. We will cover the setup of the CloudTrail data
events, the Athena table creation, and some Athena queries to filter and refine
the results to help with this process.

Setting up CloudTrail data events in the AWS console

The first step is to create a trail to capture S3 data
events. This should be done in the region in which you plan on running your Athena
queries in order to avoid unnecessary data transfer charges. In the CloudTrail console
for the region, create a new trail, specifying the trail name. The 'Apply trail
to all regions' option should be left as 'Yes' unless you plan on running
separate analyses for each region. Given that we are creating a data events
trail, select 'None' under the Management Events section and check the 'Select
all S3 buckets in your account' checkbox. Finally, select the S3 location where
the CloudTrail data will be written; we will create a new bucket for simplicity.

Setting up CloudTrail data events using the AWS CLI

If you prefer to create the trail using the AWS CLI then you
can use the create-subscription
command to create the S3 bucket and trail with the correct permissions,
then update it to be a global trail and add the S3 data event
configuration:
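# A sketch of the three steps; the trail and bucket names are placeholders:
aws cloudtrail create-subscription --name s3-data-events --s3-new-bucket my-cloudtrail-data-bucket
aws cloudtrail update-trail --name s3-data-events --is-multi-region-trail
aws cloudtrail put-event-selectors --trail-name s3-data-events --event-selectors \
    '[{"ReadWriteType": "All", "IncludeManagementEvents": false, "DataResources": [{"Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::"]}]}]'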

A word on cost

Once the trail has been created, CloudTrail will start
recording S3 data events and delivering them to the configured S3 bucket. Data
events are currently priced at $0.10 per 100,000 events, with the storage costs
being the standard S3 storage charges for the (compressed) events; see the CloudTrail pricing page for additional details. It is recommended that you disable the data event trail once you are satisfied that you have gathered sufficient
request data; it can be re-enabled if further analysis is required at a later stage.

Creating the Athena table

The CloudTrail team simplified the process for using Athena
to analyse CloudTrail logs by adding
a feature that allows customers to create an Athena table directly from
the CloudTrail console event history page, simply by clicking on the 'Run
advanced queries in Amazon Athena' link and selecting the corresponding S3
CloudTrail bucket:
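-- An abbreviated sketch of the generated DDL; the console emits the full
-- column list, and the LOCATION below is a placeholder for your bucket:
CREATE EXTERNAL TABLE cloudtrail_logs (
    eventversion STRING,
    useridentity STRUCT<type: STRING, principalid: STRING, arn: STRING, accountid: STRING, username: STRING>,
    eventtime STRING,
    eventsource STRING,
    eventname STRING,
    awsregion STRING,
    sourceipaddress STRING,
    useragent STRING,
    additionaleventdata STRING
    -- remaining columns omitted
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-cloudtrail-data-bucket/AWSLogs/ACCOUNT_ID/CloudTrail/';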

Analysing the data events with Athena

We now have all the components needed to begin searching for
clients that may need to be updated. Starting with a basic query that filters
out most of the AWS service requests (for example those from the AWS Console, CloudTrail, Athena, Storage
Gateway, and CloudFront):
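-- A sketch; the table name and the excluded user agent values are
-- illustrative and should be adjusted to match your environment:
SELECT eventtime, awsregion, eventname, sourceipaddress, useragent
FROM cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
  AND useragent NOT LIKE '%Console%'
  AND useragent NOT LIKE '%aws-internal%'
  AND useragent NOT IN ('cloudtrail.amazonaws.com', 'athena.amazonaws.com',
                        'AWS Storage Gateway', 'Amazon CloudFront')
LIMIT 100;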

These results should mostly be client API/CLI requests, but the
large number of requests can still be refined by only including regions that
actually support AWS Signature Version 2. From the region
and endpoint documentation for S3 we can see that we only need to
check eight of the regions. We can safely exclude the AWS Signature Version 4
(SigV4) only regions, as clients would not work correctly against these regions if
they did not already have SigV4 support. Let's also look at distinct user
agents and extract the version from the user agent string:
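-- A sketch; the eight regions listed are the pre-2014 SigV2-capable regions
-- and should be verified against the S3 region and endpoint documentation:
SELECT DISTINCT useragent,
       regexp_extract(useragent, '([0-9]+\.[0-9]+\.[0-9]+)') AS version
FROM cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
  AND awsregion IN ('us-east-1', 'us-west-1', 'us-west-2', 'eu-west-1',
                    'ap-northeast-1', 'ap-southeast-1', 'ap-southeast-2', 'sa-east-1');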

We are unfortunately not able to filter on the calculated
'version' column, and as it is a string it is also difficult to perform direct
numerical version comparison. We can, however, use some arithmetic to create a version
number that can be compared. Using the AWS CLI requests as an example for the moment, and
adding back the source IP address and user identity:
The version comparison number (10110108) translates to the
version string 1.11.108, which is the first version of the AWS CLI supporting SigV4
by default. This results in a list of clients accessing S3 objects in this
account using a version of the AWS CLI that needs to be updated.

The same query can be applied to all the AWS CLI and SDK
user agent strings by substituting the corresponding agent string and version
number for SDK versions using SigV4 by default (see the example after the table):

AWS Client      | SigV4 default version | User Agent String | Version comparator
Java            | 1.11.x                | aws-sdk-java      | 10110000
.NET            | 3.1.10.0              | aws-sdk-dotnet    | 30010010
Node.js         | 2.68.0                | aws-sdk-nodejs    | 20680000
PHP             | 3                     | aws-sdk-php       | 30000000
Python Botocore | 1.5.71                | Botocore          | 10050071
Python Boto3    | 1.4.6                 | Boto3             | 10040006
Ruby            | 2.2.0                 | aws-sdk-ruby      | 20020000
AWS CLI         | 1.11.108              | aws-cli           | 10110108
PowerShell      | 3.1.10.0              | AWSPowerShell     | 30010010

Notes:

For .NET and PowerShell the SigV4 default applies to the .NET35, .NET45, and
CoreCLR platforms only; the PCL, Xamarin, and UWP platforms do not support SigV4 at all.

All versions of the Go and C++ SDKs support SigV4 by default.
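For example, to find outdated Java SDK clients the same sketch becomes:

SELECT DISTINCT sourceipaddress, useridentityarn, useragent
FROM (
    SELECT sourceipaddress,
           useridentity.arn AS useridentityarn,
           useragent,
           CAST(regexp_extract(useragent, 'aws-sdk-java/(\d+)\.(\d+)\.(\d+)', 1) AS INTEGER) * 10000000
         + CAST(regexp_extract(useragent, 'aws-sdk-java/(\d+)\.(\d+)\.(\d+)', 2) AS INTEGER) * 10000
         + CAST(regexp_extract(useragent, 'aws-sdk-java/(\d+)\.(\d+)\.(\d+)', 3) AS INTEGER) AS versionnumber
    FROM cloudtrail_logs
    WHERE eventsource = 's3.amazonaws.com'
      AND useragent LIKE '%aws-sdk-java%'
) AS versions
WHERE versionnumber < 10110000;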

Tracing the source of the requests

The source IP address will reflect the private IP of the EC2
instance accessing S3 through a VPC endpoint or the public IP if accessing S3
directly. You can search for either of these IPs in EC2 AWS Console for the
corresponding region. For non-EC2 or NAT access you should be able to use the
ARN to track down the source of the requests.

Saturday, 25 August 2018

Introduction

S3 has had event notifications since 2014, and for individual object notifications these events work well with Lambda, allowing you to perform an action on every object event in a bucket. It is harder to use this approach when you want to perform an action a limited number of times or at an aggregated bucket level. An example use case would be refreshing a dependency (like Storage Gateway RefreshCache) when you are expecting a large number of object events in a bucket. Performing a relatively expensive action for every event is not practical or efficient in this case. This post provides a solution for aggregating these events using Lambda, DynamoDB, and SQS.

The problem

We want to call RefreshCache on our Storage Gateway (SGW) whenever the contents of the S3 bucket it exposes are updated by an external process. If the external process is updating a large number of (small) S3 objects then a large number of S3 events will be triggered. We don't want to overload our SGW with refresh requests so we need a way to aggregate these events to only send occasional refresh requests.

The solution

The solution is fairly simple and uses DynamoDB's Conditional Writes for synchronisation and SQS Message Timers to enable aggregation. When the Lambda function processes a new object event it first checks to see if the event falls within the window of the currently active refresh request. If the event is within the window it will automatically be included when the refresh executes and the event can be ignored. If the event occurred after the last refresh then a new refresh request is sent to an SQS queue with a message timer equal to the refresh window period. This allows for all messages received within a refresh window to be included in a single refresh operation.
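As a rough sketch of that logic in Python/boto3 (illustrative rather than the repository code; the LastRefresh attribute name and the queue lookup are assumptions):

import os
import time
import boto3

dynamodb = boto3.client('dynamodb')
sqs = boto3.client('sqs')

REFRESH_DELAY_SECONDS = int(os.environ.get('REFRESH_DELAY_SECONDS', '30'))
QUEUE_URL = sqs.get_queue_url(QueueName='S3EventAggregatorActionQueue')['QueueUrl']

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        now = int(time.time())
        try:
            # Conditional write: only succeeds if no refresh is pending for this
            # bucket, i.e. the last refresh was scheduled before the current window
            # (or the item does not exist yet).
            dynamodb.update_item(
                TableName='S3EventAggregator',
                Key={'BucketName': {'S': bucket}},
                UpdateExpression='SET LastRefresh = :now',
                ConditionExpression='attribute_not_exists(LastRefresh) OR LastRefresh < :cutoff',
                ExpressionAttributeValues={
                    ':now': {'N': str(now)},
                    ':cutoff': {'N': str(now - REFRESH_DELAY_SECONDS)},
                })
        except dynamodb.exceptions.ConditionalCheckFailedException:
            # A refresh covering this event is already queued; nothing to do.
            continue
        # First event in a new window: queue a refresh with a message timer so
        # that all events arriving within the window share a single refresh.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=bucket,
                         DelaySeconds=REFRESH_DELAY_SECONDS)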

Implementation

At a high level we need to create resources (SQS queue, DynamoDB table, Lambda functions), set up permissions (create and assign IAM roles), and apply some configuration (linking Lambda to S3 event notification and SQS queues). This implementation really belongs in a CloudFormation template (and I may actually create one) but I was interested to try to do this entirely via the AWS CLI, masochistic as that may be. If you are not interested in the gory implementation details then skip ahead to the 'Creation and deletion scripts' section.

Let's start with the S3 event aggregation piece. We need:

A DynamoDB table to track state

An SQS queue as a destination for aggregated actions

A Lambda function for processing and aggregating the S3 events

IAM permissions for all of the above

As the DynamoDB table and SQS queue are independent we can create these first:

aws dynamodb create-table --table-name S3EventAggregator \
    --attribute-definitions AttributeName=BucketName,AttributeType=S \
    --key-schema AttributeName=BucketName,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=5

aws sqs create-queue --queue-name S3EventAggregatorActionQueue
Naturally this needs to be done with a user that has sufficient permissions, and assumes your default region is set. The Lambda function is a bit trickier as it requires a role to be created before the function can be created, so let's start with the IAM permissions. First let's create a policy allowing DynamoDB GetItem and UpdateItem to be performed on the DynamoDB table we created earlier. To do this we need a JSON file containing the necessary permissions. The dynamo-writer.json file looks like this (reconstructed here from the description; see the repository for the exact file):
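{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/S3EventAggregator"
        }
    ]
}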

We need to replace REGION and ACCOUNT_ID with the relevant values. As we are aiming at using the command line for this exercise, let's use STS to retrieve our account ID, set our region, and then use sed to substitute both variables:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
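# Assumed region; use your own. The dynamo-writer.json URL mirrors the
# repository layout used for sqs-writer.json below, and the policy name
# matches the attach-role-policy commands later in the post.
export AWS_DEFAULT_REGION=eu-west-1
wget -O dynamo-writer.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/dynamo-writer.json
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" dynamo-writer.json
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" dynamo-writer.json
aws iam create-policy --policy-name S3EventAggregatorDynamo --policy-document file://dynamo-writer.json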

We now have a policy that allows the caller (our soon-to-be-created Lambda function in this case) to update items in the S3EventAggregator DynamoDB table. Next we need to create a policy to allow the function to write messages to SQS. The sqs-writer.json policy file contents are similar to the DynamoDB policy (reconstructed here; sqs:GetQueueUrl is included on the assumption that the function looks the queue up by name):
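{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:SendMessage",
                "sqs:GetQueueUrl"
            ],
            "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:S3EventAggregatorActionQueue"
        }
    ]
}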

Retrieving the file and substituting the ACCOUNT_ID and REGION using the environment variables we created for the DynamoDB policy:

wget -O sqs-writer.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/sqs-writer.json
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" sqs-writer.json
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" sqs-writer.json

We can now create the Lambda role and attach the SQS and DynamoDB policies to it. We also need the AWSLambdaBasicExecutionRole, which is an AWS managed policy providing access to CloudWatch logs for Lambda function execution:
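# The S3EventAggregatorSqsWriter policy and S3EventAggregatorLambdaRole role
# names are illustrative where the post does not name them:
aws iam create-policy --policy-name S3EventAggregatorSqsWriter --policy-document file://sqs-writer.json
wget -O lambda-trust.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/lambda-trust.json
aws iam create-role --role-name S3EventAggregatorLambdaRole --assume-role-policy-document file://lambda-trust.json
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole --role-name S3EventAggregatorLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorDynamo --role-name S3EventAggregatorLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorSqsWriter --role-name S3EventAggregatorLambdaRole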

The function concurrency is set to 1 as there is no benefit to having the function process S3 events concurrently, and 'single threading' the function will limit the maximum concurrent DynamoDB request rate, reducing DynamoDB capacity usage and costs.
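A sketch of the function creation and concurrency setting (the zip contents, handler name, and runtime are assumptions; see the repository scripts for the canonical versions):

zip function.zip s3_event_aggregator.py    # hypothetical source file name
aws lambda create-function --function-name S3EventAggregator \
    --runtime python3.6 --handler s3_event_aggregator.handler \
    --role arn:aws:iam::$ACCOUNT_ID:role/S3EventAggregatorLambdaRole \
    --zip-file fileb://function.zip \
    --environment Variables="{LOG_LEVEL=INFO,REFRESH_DELAY_SECONDS=30}"
aws lambda put-function-concurrency --function-name S3EventAggregator \
    --reserved-concurrent-executions 1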

All that is left now is to give S3 permission to execute the Lambda function and link the bucket notification events to the S3EventAggregator function. Giving S3 permission on the specific bucket:
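aws lambda add-permission --function-name S3EventAggregator \
    --statement-id s3-invoke --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn arn:aws:s3:::BUCKET_NAME
# (reconstructed; the statement ID is arbitrary and BUCKET_NAME is a placeholder)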

Interestingly, the --source-arn can be omitted to avoid needing to add permissions for each bucket you want the function to operate on, but it is required (and must match a specific bucket) for the Lambda console to display the function and trigger correctly. The S3 event.json configuration creates an event on any object creation or removal events (reconstructed below with REGION and ACCOUNT_ID placeholders) and is applied with put-bucket-notification-configuration:
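{
    "LambdaFunctionConfigurations": [
        {
            "Id": "s3-event-aggregator",
            "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT_ID:function:S3EventAggregator",
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
        }
    ]
}

aws s3api put-bucket-notification-configuration --bucket BUCKET_NAME \
    --notification-configuration file://event.json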

Moving onto the final part of the solution, we need a Lambda function that processes the events that the S3EventAggregator function sends to SQS. For the function's permissions we can reuse the S3EventAggregatorDynamo policy for DynamoDB access but will need to create a new policy for reading and deleting SQS messages and refreshing the Storage Gateway cache.
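The SQS read policy (a reconstruction; Lambda's SQS polling needs receive, delete, and queue attribute access) looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:S3EventAggregatorActionQueue"
        }
    ]
}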

The sgw-refresh.json is as follows; note that SMB file shares are included, but the current Lambda execution environment only supports boto3 1.7.30, which does not actually expose the SMB APIs (more on working around this later):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "storagegateway:RefreshCache",
                "storagegateway:ListFileShares",
                "storagegateway:DescribeNFSFileShares",
                "storagegateway:DescribeSMBFileShares"
            ],
            "Resource": "*"
        }
    ]
}
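Creating both policies from the JSON files above, with names matching the attach commands below:

aws iam create-policy --policy-name S3EventAggregatorSqsReader --policy-document file://sqs-reader.json
aws iam create-policy --policy-name StorageGatewayRefreshPolicy --policy-document file://sgw-refresh.json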

And then creating the role and adding the relevant policies:

wget -O lambda-trust.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/lambda-trust.json
aws iam create-role --role-name S3AggregatorActionLambdaRole --assume-role-policy-document file://lambda-trust.json
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorSqsReader --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorDynamo --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/StorageGatewayRefreshPolicy --role-name S3AggregatorActionLambdaRole
Next we will create the Lambda function; this will however depend on whether or not you require SMB file share support. As mentioned earlier, the current Lambda execution environment does not expose the new SMB file share APIs, so if you have SMB shares mapped on your Storage Gateway you will have to include the latest botocore and boto3 libraries with your deployment. The disadvantage of this is that you are not able to view the code in the Lambda console (due to the deployment file size limitation). If you are only using NFS shares then you only need the code without the latest libraries, but it will break if you add an SMB share before the Lambda execution environment supports it. Including the dependency in the deployment is the preferred option, so that is what we are going to do:
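# A sketch of the packaging and creation steps; the source file and handler
# names are assumptions (the repository's create script is the real version):
pip install boto3 -t package/
cp s3_storage_gateway_refresh.py package/    # hypothetical source file name
cd package && zip -r ../refresh-function.zip . && cd ..
aws lambda create-function --function-name S3StorageGatewayRefresh \
    --runtime python3.6 --handler s3_storage_gateway_refresh.handler \
    --role arn:aws:iam::$ACCOUNT_ID:role/S3AggregatorActionLambdaRole \
    --zip-file fileb://refresh-function.zip \
    --environment Variables="{LOG_LEVEL=INFO}"
# Link the queue to the function so SQS messages trigger it:
aws lambda create-event-source-mapping --function-name S3StorageGatewayRefresh \
    --batch-size 1 \
    --event-source-arn arn:aws:sqs:$AWS_DEFAULT_REGION:$ACCOUNT_ID:S3EventAggregatorActionQueue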

Creation and deletion scripts

For convenience a script to create (and remove) this stack is provided on GitHub. Clone the s3-event-aggregator repository and run the create-stack.sh and delete-stack.sh scripts respectively. You need to have the AWS CLI installed and configured, and sed and zip must be available. Be sure to edit the BUCKET variable in the script to match your bucket name, and change the REGION if appropriate.

Note that the delete stack script will not remove the S3 event notification configuration by default. There is no safe and convenient way to remove only the S3EventAggregator configuration (other than removing all configuration which may result in unintended loss of other event configuration). If you have other events configured on a bucket it is best to use the AWS Console to remove the s3-event-aggregator event configuration. If there are no other events configured on your bucket you can safely uncomment the relevant line in the deletion script.

Configuration

The two Lambda functions both have a LOG_LEVEL environment variable to control the detail logged to CloudWatch Logs. The functions were created with the level set to INFO, but DEBUG may be useful for troubleshooting and WARN is probably appropriate for use in production.

The S3EventAggregator function also has an environment variable called REFRESH_DELAY_SECONDS for controlling the event aggregation window. It was initialised to 30 seconds when the function was created, but it may be appropriate to change it depending on your S3 upload pattern. If the uploads are mostly small and complete quickly, or if you need the Storage Gateway to reflect changes quickly, then 30 seconds may be a reasonable value. If you are performing larger uploads, or the total upload process takes significantly longer, then the refresh window would need to be increased to be longer than the total expected upload time.

The DynamoDB table was created with 5 write capacity units, and as the entries are less than 1KB this should be sufficient as long as you are not writing more than 5 objects a second to the Storage Gateway S3 bucket. Writing more than this will require additional write capacity to be provisioned (or auto scaling to be enabled).

The same code can be used for multiple buckets by simply adding additional bucket event configurations via the CLI put-bucket-notification-configuration as above or using the AWS Console.

Cost

There are three component costs involved in this solution: the two Lambda functions, DynamoDB, and SQS. The Lambda and DynamoDB costs will scale fairly linearly with usage, with both the S3EventAggregator function and DynamoDB being charged for each S3 event that is triggered. To get an idea of the number of events to expect you can enable S3 metrics on the bucket and check the PUT and DELETE counts. The S3StorageGatewayRefresh function and SQS message counts will be a fraction of the total S3 event counts and dependent on the REFRESH_DELAY_SECONDS configuration. A longer refresh delay will result in fewer SQS messages and S3StorageGatewayRefresh function executions.

As an example, let's assume 1,000 objects uploaded a day, with these being aggregated into 50 refresh events. For simplicity we will also assume that the free tier has been exhausted and that there are 30 days in the month. The total Lambda request count will then be:
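(1,000 aggregator invocations + 50 refresh invocations) x 30 days = 31,500 requests a month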

As Lambda requests are charged in 1 million request increments this will result in a charge of $0.20 for the requests.

The compute charges are based on duration, with the S3EventAggregator executing in less than 100ms for all aggregated events and around 300-600ms for the refresh events. The S3StorageGatewayRefresh function takes between 400ms and 800ms. Giving us:
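Assuming 128MB functions (an assumption; the memory size is not stated above) and taking the upper bounds: 28,500 x 0.1s + 1,500 x 0.6s is roughly 3,750 seconds a month for the S3EventAggregator, plus 1,500 x 0.8s = 1,200 seconds for S3StorageGatewayRefresh. That gives (3,750 + 1,200) x 0.125GB, around 619 GB-seconds, or roughly $0.01 a month at $0.00001667 per GB-second.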

SQS charges $0.40 per million requests, with the 1,500 send message requests and a further 1,500 receive and delete requests all falling well under this limit (and thus only costing $0.40 for the month). It is worth noting that Lambda does poll the SQS queue roughly 4 times a minute, and this will contribute to your total SQS request costs, adding around 172,800 SQS requests a month.

There are some other costs associated with CloudWatch Logs and DynamoDB storage but these should be fairly small compared to the request costs and I would not expect the total cost of the stack to be more than $10 - $15 a month.

Conclusion

And so ends this post; well done for reading to the end. I quite enjoyed building this solution and will look at converting it to a CloudFormation template at a later stage. Feel free to log issues or pull requests against the GitHub repo.