How To Monitor a Serverless Application

Andreas Wittig – 17 Aug 2018

Compared to a typical web application deployed to EC2, a Serverless Application (consisting of an API Gateway and a Lambda function) needs less monitoring because you are outsourcing most of the operations to AWS. However, there are still some metrics you should keep an eye on. This article shows you how to monitor a Serverless Application with CloudWatch.

Typically a Serverless Application consists of an API Gateway forwarding incoming requests to Lambda. Lambda executes your business logic and makes use of S3 to store objects, DynamoDB to store and query data, and SES to send emails, for example. The following figure illustrates the architecture and shows which CloudWatch metrics need your attention.

You should create a CloudWatch dashboard showing all metrics and define CloudWatch alarms for all highlighted metrics.

| Namespace | Metric Name | Description |
| --- | --- | --- |
| AWS/ApiGateway | 5XXError | Number of requests with status code 5XX (server-side error). |
| AWS/ApiGateway | Latency | Time between incoming request and response on API Gateway. |
| AWS/Lambda | Errors | Number of failed function invocations (e.g. timeout, exception, …). |
| AWS/Lambda | Throttles | Number of throttled function invocations. |
| AWS/DynamoDB | ReadThrottleEvents | Number of throttled read requests. |
| AWS/DynamoDB | WriteThrottleEvents | Number of throttled write requests. |
| AWS/DynamoDB | SystemErrors | Number of server-side errors. |
| AWS/SES | Reputation.BounceRate | Share of bounced messages, reported as a fraction (multiply by 100 to get the percentage). |
| AWS/SES | Reputation.ComplaintRate | Share of messages reported as spam, reported as a fraction (multiply by 100 to get the percentage). |

Start with the following configuration for your CloudWatch alarms. Don’t forget to refine the thresholds after a few days. To get notified about server-side errors, which typically result in error messages for your users, create the following alarm:

Metric namespace: AWS/ApiGateway

Metric name: 5XXError

Metric dimension: ApiName and, optionally, Stage

Metric period: 60 seconds

Number of periods: 5 or 1 out of 5

Statistic: Sum

Alarm condition: > 1
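The settings above map onto the parameters of CloudWatch's PutMetricAlarm API. As a minimal sketch, the following Python function builds the keyword arguments for boto3's `put_metric_alarm`. The function name, the alarm naming scheme, the example API name `my-api`, and the `TreatMissingData` value are assumptions made for illustration. Reading "5 or 1 out of 5" as "evaluate 5 periods, alarm when 1 of them breaches" is an interpretation as well.

```python
from typing import Any, Dict, Optional


def api_5xx_alarm_params(api_name: str, stage: Optional[str] = None) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for the 5XXError alarm (illustrative helper)."""
    dimensions = [{"Name": "ApiName", "Value": api_name}]
    if stage is not None:
        # The Stage dimension is optional, as in the settings above.
        dimensions.append({"Name": "Stage", "Value": stage})
    return {
        "AlarmName": f"{api_name}-5xx-errors",  # hypothetical naming scheme
        "Namespace": "AWS/ApiGateway",
        "MetricName": "5XXError",
        "Dimensions": dimensions,
        "Period": 60,                 # seconds
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 1,       # assumed reading of "1 out of 5"
        "Statistic": "Sum",
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumption: no traffic is not an error
    }


if __name__ == "__main__":
    params = api_5xx_alarm_params("my-api", stage="prod")
    print(params)
    # To create the alarm (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review and adjust the threshold before anything is sent to AWS.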

Additionally, you should get notified when your users experience long waiting times caused by high latency in your Serverless Application.

Metric namespace: AWS/ApiGateway

Metric name: Latency

Metric dimension: ApiName and, optionally, Stage

Metric period: 60 seconds

Number of periods: 5 or 1 out of 5

Statistic: p90, p95, or p99 (depending on number of requests)

Alarm condition: > 500 ms
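A percentile-based alarm differs from the error alarm in one detail: CloudWatch expects percentiles in the `ExtendedStatistic` parameter instead of `Statistic`. The sketch below builds the arguments for boto3's `put_metric_alarm` under that assumption. The helper name, the alarm naming scheme, the default of `p95`, and the set of accepted percentiles are illustrative choices, not part of the original configuration.

```python
from typing import Any, Dict, Optional

# Percentiles this illustrative helper accepts.
ALLOWED_PERCENTILES = ("p90", "p95", "p99")


def api_latency_alarm_params(
    api_name: str, percentile: str = "p95", stage: Optional[str] = None
) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for the Latency alarm (illustrative helper)."""
    if percentile not in ALLOWED_PERCENTILES:
        raise ValueError(f"percentile must be one of {ALLOWED_PERCENTILES}")
    dimensions = [{"Name": "ApiName", "Value": api_name}]
    if stage is not None:
        dimensions.append({"Name": "Stage", "Value": stage})
    return {
        "AlarmName": f"{api_name}-latency-{percentile}",  # hypothetical naming scheme
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",
        "Dimensions": dimensions,
        "Period": 60,                      # seconds
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 1,            # assumed reading of "1 out of 5"
        "ExtendedStatistic": percentile,   # percentiles go here, not in Statistic
        "Threshold": 500.0,                # the Latency metric is in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    print(api_latency_alarm_params("my-api", percentile="p99"))
```

With only a few requests per minute, a high percentile is dominated by single slow requests, which is one reason to pick the percentile based on request volume.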

You don’t need to create CloudWatch alarms for the metrics of Lambda, S3, and DynamoDB, because problems with any of these components show up as 5XX errors or high latency at the API Gateway. Instead of creating CloudWatch alarms for these metrics, put them on a CloudWatch dashboard to simplify investigating issues.
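A dashboard like this can be defined as a JSON document and passed to CloudWatch's PutDashboard API. The following Python sketch generates such a document for a subset of the metrics (Lambda errors and throttles, DynamoDB throttle events). The helper names, widget titles, layout, and the example resource names are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def metric_widget(title: str, metrics: List[List[str]], region: str, x: int, y: int) -> Dict[str, Any]:
    """Return one metric widget; the dashboard grid is 24 units wide."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,  # each entry: [namespace, metric, dimension name, dimension value]
            "stat": "Sum",
            "period": 60,
            "region": region,
        },
    }


def dashboard_body(function_name: str, table_name: str, region: str) -> str:
    """Build the JSON body for a two-widget dashboard (illustrative layout)."""
    widgets = [
        metric_widget(
            "Lambda errors and throttles",
            [
                ["AWS/Lambda", "Errors", "FunctionName", function_name],
                ["AWS/Lambda", "Throttles", "FunctionName", function_name],
            ],
            region, x=0, y=0,
        ),
        metric_widget(
            "DynamoDB throttle events",
            [
                ["AWS/DynamoDB", "ReadThrottleEvents", "TableName", table_name],
                ["AWS/DynamoDB", "WriteThrottleEvents", "TableName", table_name],
            ],
            region, x=12, y=0,
        ),
    ]
    return json.dumps({"widgets": widgets})


if __name__ == "__main__":
    body = dashboard_body("my-function", "my-table", "eu-west-1")
    print(body)
    # To create the dashboard (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("cloudwatch").put_dashboard(
    #       DashboardName="serverless-app", DashboardBody=body)
```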

When sending emails with SES, you should create the following CloudWatch alarms to make sure you get notified when there is a problem with your reputation as a sender. Create the following alarm to get notified when the bounce rate is too high.
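As a minimal sketch of such an alarm, the Python function below builds arguments for boto3's `put_metric_alarm` on the SES reputation metrics. The thresholds (5% for bounces, 0.1% for complaints), the one-hour period, the `Average` statistic, and all names are assumptions chosen for illustration, so check them against the limits that apply to your SES account. Because the metrics are reported as fractions, the helper converts a percentage into the fraction CloudWatch compares against.

```python
from typing import Any, Dict


def ses_reputation_alarm_params(metric_name: str, threshold_percent: float) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for an SES reputation metric (illustrative helper)."""
    if metric_name not in ("Reputation.BounceRate", "Reputation.ComplaintRate"):
        raise ValueError("unsupported SES reputation metric")
    return {
        "AlarmName": f"ses-{metric_name.lower()}",  # hypothetical naming scheme
        "Namespace": "AWS/SES",
        "MetricName": metric_name,
        "Period": 3600,                # assumption: one-hour periods
        "EvaluationPeriods": 1,
        "Statistic": "Average",        # assumption
        # The metric is a fraction, so 5 percent becomes 0.05.
        "Threshold": threshold_percent / 100,
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    # Assumed starting thresholds; refine them for your own sending patterns.
    print(ses_reputation_alarm_params("Reputation.BounceRate", 5.0))
    print(ses_reputation_alarm_params("Reputation.ComplaintRate", 0.1))
```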
