Introducing LambdaClock

Currently, CloudWatch Event rules can be scheduled at a minimum interval of 5 minutes. At Trek10, we saw the need for a smaller interval, namely 1 minute. This need arose specifically while handling Auto Scaling for ECS. Scaling out ECS can require two steps (increase the service's desired count, then increase the number of cluster instances), and the resulting 10-minute delay (5 minutes for each action) was just too long.

We’ve found one-minute scheduling to be extremely valuable for customers opting for smaller and faster release cycles. Faster deployments mean less scaling jitter, and controlled scaling reduces costs.

LambdaClock Overview

There are five CloudWatch Event rules, each of which uses a cron expression for scheduling:

0/5 * ? * MON-SUN * // LambdaClock_0

1/5 * ? * MON-SUN * // LambdaClock_1

2/5 * ? * MON-SUN * // LambdaClock_2

...

...
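To see why five staggered rules are enough to produce a tick every minute, here is a minimal sketch. It assumes the two elided rules follow the same pattern as the three shown (offsets 3 and 4). The helper names cronForOffset and minutesForOffset are hypothetical and not part of LambdaClock itself.

```javascript
// Hypothetical helper: builds a cron expression for a given minute offset,
// following the pattern of the rules listed above.
function cronForOffset(offset) {
  return `${offset}/5 * ? * MON-SUN *`;
}

// Minutes of the hour at which a rule with this offset fires:
// start at `offset`, then every 5 minutes.
function minutesForOffset(offset) {
  const minutes = [];
  for (let m = offset; m < 60; m += 5) {
    minutes.push(m);
  }
  return minutes;
}

// Assumption: the elided rules use offsets 3 and 4.
const offsets = [0, 1, 2, 3, 4];
const covered = new Set(offsets.flatMap(minutesForOffset));

console.log(cronForOffset(1)); // 1/5 * ? * MON-SUN *
console.log(covered.size);     // 60
```

Each rule on its own still respects the 5-minute minimum, but together the rules cover all 60 minutes of the hour.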

Each one of these rules has a target of the LambdaClock function. This leads us to what we feel is an obvious question, “If an event triggers each minute, why not have the event publish to an SNS topic and be done?”

This is How LambdaClock Began

However, we saw inconsistent delays between the CloudWatch Event trigger time and publishing to an SNS topic or invoking a Lambda function. The problem might have ended there if the delays had been consistent, but they were not: across more than 4,700 CloudWatch Events, we recorded an average delay of 25 seconds, with some rules consistently invoking Lambda functions within 10 seconds and others consistently taking over 40 seconds. Thankfully, all rules tended to stay below 60 seconds.

The Solution is Simple

A Lambda function (LambdaClock) is the target of each CloudWatch Event rule. LambdaClock calculates the time it must wait until the next minute by comparing the current time and the time the event was triggered. LambdaClock then uses setTimeout to wait until the 0th second of the next minute, at which point it publishes to an SNS topic, which is also named LambdaClock. A second Lambda function is used to analyze the time it took for this SNS topic to invoke a subscribed Lambda function. Some preliminary results are shown below:
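The wait calculation described above can be sketched roughly as follows. This is not the actual LambdaClock source. It assumes the trigger time is read from the `time` field of the scheduled event as an ISO 8601 string, and the function names and the `publish` callback (standing in for the SNS publish call) are hypothetical.

```javascript
// Milliseconds to wait from `nowMs` until the 0th second of the minute
// after the one in which the event was triggered. Returns 0 if that
// boundary has already passed.
function msUntilNextMinute(eventTimeIso, nowMs) {
  const MINUTE_MS = 60 * 1000;
  const eventMs = Date.parse(eventTimeIso);
  const nextMinuteMs = (Math.floor(eventMs / MINUTE_MS) + 1) * MINUTE_MS;
  return Math.max(0, nextMinuteMs - nowMs);
}

// Hypothetical handler shape: waits with setTimeout, then calls `publish`,
// which is a placeholder for publishing to the LambdaClock SNS topic.
function tick(event, publish, nowMs = Date.now()) {
  const waitMs = msUntilNextMinute(event.time, nowMs);
  return new Promise((resolve) => {
    setTimeout(() => resolve(publish(event)), waitMs);
  });
}

// An event triggered at 12:05:00 but delivered 25 seconds late
// still has 35 seconds to wait before the next minute boundary.
const nowMs = Date.parse('2016-03-01T12:05:25Z');
console.log(msUntilNextMinute('2016-03-01T12:05:00Z', nowMs)); // 35000
```

Because every delivery delay observed so far stayed under 60 seconds, the wait is always positive and the publish lands on the minute boundary regardless of how late the event arrived.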

Average: 480.76ms

Standard Deviation: 294.57ms

Count: 5013 total invocations (~3.5 days)

Failures: 0

Within One SD: 4047

Between One and Two SD: 942

Between Two and Three SD: 10

Beyond Three SD: 14
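The four counts above sum to the 5013 total, which suggests each invocation is counted in exactly one band. A minimal sketch of that kind of summary is shown below. It assumes a population standard deviation and non-overlapping bands, neither of which is confirmed here, and the function name summarize and the sample latencies are made up for illustration.

```javascript
// Summarizes latencies (in ms): mean, population standard deviation, and
// counts per band: [<= 1 SD, 1 to 2 SD, 2 to 3 SD, > 3 SD] from the mean.
function summarize(latenciesMs) {
  const n = latenciesMs.length;
  const mean = latenciesMs.reduce((sum, x) => sum + x, 0) / n;
  const variance =
    latenciesMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);

  const bands = [0, 0, 0, 0];
  for (const x of latenciesMs) {
    // Distance from the mean, measured in standard deviations.
    const z = sd === 0 ? 0 : Math.abs(x - mean) / sd;
    // Map the distance to a band index, clamped to the range 0..3.
    const index = Math.min(3, Math.max(0, Math.ceil(z) - 1));
    bands[index] += 1;
  }
  return { mean, sd, bands };
}

// Illustrative values only, not measured data.
const result = summarize([100, 200, 300, 400, 500]);
console.log(result.mean);  // 300
console.log(result.bands); // [ 3, 2, 0, 0 ]
```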

The ultimate goal was to achieve a scheduling period of one minute, and thus far the LambdaClock system seems to be holding up nicely.

Getting Started with LambdaClock

Visit our GitHub repo to download the LambdaClock.json file, use it to create a stack through CloudFormation, and you’ll have your very own LambdaClock SNS topic. All we ask is that you treat it well, and don’t feed it after midnight … well, it’ll probably work fine after midnight.

Check out our other blog posts and learn more about our work at trek10.com. And don’t be afraid to tweet at us @trek10inc. Let us know what you think and we’ll be back with more soon.