Cloud Deep Dive: Part 1 — Serverless Pizza Oven

Organisations need to accelerate innovation and reduce operational costs. Cloud adoption has reached 88% in the UK and efforts like the Cloud Native Computing Foundation are having wide-reaching success. The latest frontier in cloud software development is serverless computing.

Is it just about breaking up functionality into smaller chunks, focusing on business logic? Is it about not paying for idle resources, making spiky usage more affordable? Or is it about empowering developers with a wide selection of powerful managed services to provision along with their code?

To find out, let’s take a deep dive building out the YLD Cloud Pizza Place — a virtual restaurant in the cloud. We’ll start by creating the functionality for baking our pizzas…

AWS SDK and CLI

The AWS SDK provides a simple, well-documented interface to the AWS APIs, available as a library for most popular high-level languages: Java, .NET, Node.js, Python, Go, and more. Pretty much everything you can do in AWS has an API, which means everything is programmable.

Identity and Access Management (IAM)

We can reduce our security risk by following the principle of least privilege, giving any execution environment access to only what is essential for it to function. AWS IAM policies allow for setting granular API access permissions, controlling who can do what, using which AWS resources, under which conditions.

CloudFormation

CloudFormation automates provisioning of AWS resources using text-based configuration files (JSON or YAML). By defining parameters, our CloudFormation configuration files can become templates for deployments in different environments, AWS accounts or regions.

These text-based files can be checked into source control systems along with source code, allowing for the implementation of security auditing and quality controls.

The Bake Function

Lambda function execution is limited by duration (up to 5 minutes at the time of writing; AWS has since raised the maximum to 15 minutes) and by reserved memory allocation (3GB at the time of writing, since raised to 10GB). Lambda functions are charged per invocation and for execution duration, in proportion to the memory reserved. Execution of the entire program is stopped after completion of the invoked function or when reaching the configurable duration limit. Idle time is not charged for.

AWS Lambda invokes a configured handler function within a deployed module. The module needs to be packaged with its runtime dependencies and uploaded to an AWS S3 (Simple Storage Service) bucket before it can be deployed as part of a CloudFormation template. The AWS CLI can do this automatically: we just specify the path to the code relative to our own template file.
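A minimal sketch of such a resource, written as a Python dict and serialised to JSON (which CloudFormation accepts alongside YAML). The logical IDs `BakeFunction` and `BakeFunctionRole`, the `./bake` folder, the handler name, the runtime and the timeout are all assumptions made for illustration, not values from the original template.

```python
import json


def bake_function_resource(code_path="./bake"):
    """Build an AWS::Lambda::Function resource whose Code is a local path.

    The path is relative to the template file; the packaging step is
    expected to replace it with the S3 location of the uploaded bundle.
    """
    return {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "Code": code_path,              # hypothetical local folder
            "Handler": "index.handler",     # hypothetical module.function
            "Runtime": "python3.12",        # assumed runtime
            "Timeout": 30,
            "MemorySize": 128,
            "Role": {"Fn::GetAtt": ["BakeFunctionRole", "Arn"]},
        },
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {"BakeFunction": bake_function_resource()},
}

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
```

Running `aws cloudformation package` with `--template-file`, `--s3-bucket` and `--output-template-file` against the printed template should upload the folder and write out a copy of the template that points at the uploaded object.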

Policy permissions relating to resources normally reference the corresponding Amazon Resource Names (ARNs), possibly including wildcards (‘*’). In this case the logging resources are created upon deployment of our function, so we have to rely on naming conventions to predict what the ARN is going to be. We can use template parameters, pseudo parameters and the CloudFormation Sub function to generate it.
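As a sketch of that convention, the policy below predicts the log group ARN from the function name. `FunctionName` is a hypothetical template parameter and `bake-function-logging` a made-up policy name; `preview_sub` is only a local stand-in for `Fn::Sub`, included to show what the generated ARN would look like.

```python
# Lambda writes logs to a log group named /aws/lambda/<function name>.
LOG_GROUP_ARN = (
    "arn:aws:logs:${AWS::Region}:${AWS::AccountId}"
    ":log-group:/aws/lambda/${FunctionName}:*"
)


def logging_policy():
    """Policy allowing the function to write to its own log group only."""
    return {
        "PolicyName": "bake-function-logging",  # hypothetical name
        "PolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                    # CloudFormation resolves the ${...} placeholders.
                    "Resource": {"Fn::Sub": LOG_GROUP_ARN},
                }
            ],
        },
    }


def preview_sub(template_string, values):
    """Local illustration of Fn::Sub: replace each ${name} with its value."""
    for name, value in values.items():
        template_string = template_string.replace("${" + name + "}", value)
    return template_string


if __name__ == "__main__":
    print(preview_sub(LOG_GROUP_ARN, {
        "AWS::Region": "eu-west-1",
        "AWS::AccountId": "123456789012",
        "FunctionName": "bake",
    }))
```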

The New Items Queue

We’ll use a Simple Queue Service (SQS) message queue for new items. That means we do not need to know where the items will be coming from, and the source does not need to know how we are processing them.

We can configure an ‘event source’ that would let AWS Lambda poll the SQS queue for available items, invoking our function for batches of up to 10 items at a time. Incoming messages to our function will look something like this:
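The sketch below shows the general shape of an SQS event as Lambda delivers it, together with a handler that unpacks it. The item fields (`id`, `type`, `bakingTime`) and all identifiers in the sample are assumptions for illustration; only the record envelope follows the SQS event format.

```python
import json

SAMPLE_EVENT = {
    "Records": [
        {
            "messageId": "11111111-2222-3333-4444-555555555555",
            "receiptHandle": "example-receipt-handle",
            # The body is always a string; here we assume it carries JSON.
            "body": json.dumps(
                {"id": "pizza-1", "type": "margherita", "bakingTime": 600}
            ),
            "attributes": {"ApproximateReceiveCount": "1"},
            "messageAttributes": {},
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:eu-west-1:123456789012:new-items",
            "awsRegion": "eu-west-1",
        }
    ]
}


def parse_items(event):
    """Decode the JSON body of every record in the batch."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def handler(event, context):
    """Hypothetical Bake handler: log each item and report the batch size."""
    items = parse_items(event)
    for item in items:
        print(f"received {item['id']} to bake for {item['bakingTime']}s")
    return len(items)
```

Returning normally tells the event source that the whole batch succeeded; raising an exception leaves the messages on the queue to be retried.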

Our Lambda function’s execution role will need to contain policies with permissions that allow the event source to receive new messages from our SQS queue and to delete the messages that resulted in a successful execution.
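A sketch of the two pieces involved, again as Python dicts that serialise to template JSON: the policy statement for the execution role and the event source mapping itself. The logical IDs passed in (`NewItemsQueue`, `BakeFunction`) are hypothetical.

```python
def queue_consumer_statement(queue_logical_id):
    """Statement letting the event source poll one queue and clean up after itself."""
    return {
        "Effect": "Allow",
        "Action": [
            "sqs:ReceiveMessage",
            "sqs:DeleteMessage",
            "sqs:GetQueueAttributes",
        ],
        # Scoped to a single queue, in keeping with least privilege.
        "Resource": {"Fn::GetAtt": [queue_logical_id, "Arn"]},
    }


def event_source_mapping(queue_logical_id, function_logical_id, batch_size=10):
    """Resource that makes Lambda poll the queue and invoke the function."""
    return {
        "Type": "AWS::Lambda::EventSourceMapping",
        "Properties": {
            "EventSourceArn": {"Fn::GetAtt": [queue_logical_id, "Arn"]},
            "FunctionName": {"Ref": function_logical_id},
            "BatchSize": batch_size,
        },
    }


if __name__ == "__main__":
    print(queue_consumer_statement("NewItemsQueue"))
    print(event_source_mapping("NewItemsQueue", "BakeFunction"))
```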

We don’t want to lose items when our function gets throttled due to concurrency limits (by default, 1,000 concurrent Lambda executions per account per region), or when some other temporary error occurs. The event source will keep retrying delivery for the lifetime of the message, which the SQS configuration bounds by age (the retention period) and by number of reads (receive attempts). We’ll set up a Dead-Letter Queue to which undeliverable items will be sent.
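A sketch of how the two queues could be declared. The logical IDs and the default of five receive attempts are assumptions; the retention value is simply four days expressed in seconds.

```python
def queues_with_dead_letter(max_receive_count=5, retention_seconds=4 * 24 * 3600):
    """Declare a queue plus the dead-letter queue that catches its failures."""
    return {
        "NewItemsDeadLetterQueue": {"Type": "AWS::SQS::Queue"},
        "NewItemsQueue": {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                # Age limit: messages older than this are dropped by SQS.
                "MessageRetentionPeriod": retention_seconds,
                "RedrivePolicy": {
                    "deadLetterTargetArn": {
                        "Fn::GetAtt": ["NewItemsDeadLetterQueue", "Arn"]
                    },
                    # Attempt limit: after this many receives, the message
                    # is moved to the dead-letter queue.
                    "maxReceiveCount": max_receive_count,
                },
            },
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(queues_with_dead_letter(), indent=2))
```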

So far so good

Our CloudFormation template would now reflect all of this:

Baking Time

Different items may need a different amount of time to bake in our oven…

A State Machine

As Lambda functions have a limited execution time (too short to bake a pizza), we need to use something else to manage the duration of the pizza in the oven.

Each execution of a state machine takes a message as its initial state and proceeds through the other states defined in the Amazon States Language JSON definition. Reaching a particular state can trigger different sorts of behaviour: conditionally selecting the next state to advance to, or executing a Lambda function, for example. A Wait State can read the number of seconds to wait from its input before proceeding to the next state. (Note: state transitions are relatively expensive, so you may want to use them judiciously.)

We will need to give our Bake Function’s execution role the states:StartExecution permission for our State Machine resource. Our states definition should reflect this very simple flow:
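A sketch of what that flow could look like in Amazon States Language, built as a Python dict, plus a helper the Bake Function might use to start an execution. The state name `Baking`, the `bakingTime` field and both function names are assumptions; `stepfunctions` is expected to be a boto3 Step Functions client (or any object with a compatible `start_execution` method), passed in so the helper can be exercised without AWS access.

```python
import json


def oven_definition(item_removed_function_arn):
    """Two-state machine: wait for the baking time, then remove the item."""
    return {
        "Comment": "Keep an item in the oven for its baking time",
        "StartAt": "Baking",
        "States": {
            "Baking": {
                "Type": "Wait",
                # Read the wait duration from the execution input.
                "SecondsPath": "$.bakingTime",
                "Next": "Removed",
            },
            "Removed": {
                "Type": "Task",
                "Resource": item_removed_function_arn,
                "End": True,
            },
        },
    }


def insert_into_oven(stepfunctions, state_machine_arn, item):
    """Start one execution per item; the item becomes the initial state."""
    return stepfunctions.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(item),
    )
```

Keeping the definition to two states also keeps the number of billed state transitions per item low.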

The Baked Items Queue

As with the New Items queue, we don’t yet know where exactly our baked items will end up. We can leave them in an SQS queue to be picked up from there.

The Item Removed Function

The State Machine cannot directly output to SQS, so we’ll create a Lambda function to be triggered by the Removed state transition. The state machine will need an IAM role with permission to invoke this function. The function will need an execution role with the sqs:SendMessage permission for our Baked Items Queue.
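A minimal sketch of such a function. It assumes the state machine passes the item through unchanged as the function’s input, and that `sqs` is a boto3 SQS client (or a compatible stand-in) supplied by the caller; the factory name and queue URL are hypothetical.

```python
import json


def make_item_removed_handler(sqs, baked_queue_url):
    """Create a handler that forwards each removed item to the baked queue."""

    def handler(event, context):
        # For a Task state, the event is the state's input: our item.
        sqs.send_message(
            QueueUrl=baked_queue_url,
            MessageBody=json.dumps(event),
        )
        # The return value becomes the output of the Removed state.
        return event

    return handler
```

In a deployed module the handler would typically be created once at import time, for example with a real client and a queue URL read from an environment variable.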

Progress

Our CloudFormation template would now additionally reflect all of this:

Limited Capacity

With great scalability comes great concurrency. A State Machine can have up to 1 million concurrent executions, rate limited at 200 new executions per second. For our oven to make use of that, it would either have to be really big, or the pizzas really tiny. No, let’s give our oven a realistic capacity limit.

DynamoDB Counter

In order to ensure we never insert items when the oven is full, we need to keep count of how many items we have in the oven at any time. We can use DynamoDB to persist our count.

DynamoDB is a distributed database, with data automatically replicated across multiple availability zones within a region. This means that reads are by default considered eventually consistent, meaning the data you read immediately following a write could be stale — it may take a short while to reflect recent writes. This would not work in our situation, where we will be receiving multiple items concurrently.

We need our read and write operations to be strongly consistent. The best way to achieve this is to do both the read and the write in a single atomic operation. DynamoDB provides conditional writes, which let us check whether we have available oven capacity, and update expressions, which allow incrementing a value without knowing the existing value, all in one request. As with database transactions in general, this comes with the trade-off of taking longer, but a bottleneck is exactly what we are trying to achieve.

Should the update fail, we’ll know that there’s no space in our oven; otherwise we’re okay to insert the item.
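The conditional increment could be sketched as follows. The table layout (a single item keyed by `id` = `oven` with a numeric `itemCount` attribute) and the function name are assumptions; `dynamodb` is expected to be a boto3 low-level DynamoDB client, or any object exposing the same `update_item` method and `exceptions` attribute.

```python
OVEN_KEY = {"id": {"S": "oven"}}  # hypothetical key of the counter item


def try_reserve_slot(dynamodb, table_name, capacity):
    """Atomically increment the counter if the oven is below capacity.

    Returns True when a slot was reserved and False when the oven is full.
    """
    try:
        dynamodb.update_item(
            TableName=table_name,
            Key=OVEN_KEY,
            # ADD increments without reading the current value first.
            UpdateExpression="ADD #n :one",
            # The write only happens if this holds at the moment of writing.
            ConditionExpression="attribute_not_exists(#n) OR #n < :capacity",
            ExpressionAttributeNames={"#n": "itemCount"},
            ExpressionAttributeValues={
                ":one": {"N": "1"},
                ":capacity": {"N": str(capacity)},
            },
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
```

Because the check and the increment happen in one request, two concurrent invocations cannot both claim the last free slot.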

Waiting Queue

So what do we do with new messages when our oven is fully occupied? We can send them to another SQS queue, of course (assuming our pizzas will keep).

We’d want to get the order right. It’s only fair to bake the items in the order we receive them. That means we should always bake items from this queue before new items. It also means we should ensure that waiting items are baked in the order they were received. A standard SQS queue would not preserve message order, but a FIFO (First-In-First-Out) Queue would. We can configure the Waiting Queue to be FIFO.

Our Bake function can be updated to pull items from this queue, only deleting them from the queue if we manage to allocate capacity and successfully insert them into the oven.
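One way this could look, as a sketch rather than the original implementation. `sqs` is assumed to be a boto3 SQS client or compatible stand-in, while `try_reserve` and `insert` are hypothetical callables for reserving oven capacity and starting the bake.

```python
import json


def bake_waiting_items(sqs, waiting_queue_url, try_reserve, insert):
    """Move waiting items into the oven, oldest first, until it is full.

    Returns the number of items inserted.
    """
    baked = 0
    while True:
        response = sqs.receive_message(
            QueueUrl=waiting_queue_url,
            MaxNumberOfMessages=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            return baked  # nothing left waiting
        message = messages[0]
        if not try_reserve():
            # Oven is full: make the message visible again straight away
            # so that it stays at the front of the queue.
            sqs.change_message_visibility(
                QueueUrl=waiting_queue_url,
                ReceiptHandle=message["ReceiptHandle"],
                VisibilityTimeout=0,
            )
            return baked
        insert(json.loads(message["Body"]))
        # Only delete once the item is safely in the oven.
        sqs.delete_message(
            QueueUrl=waiting_queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        baked += 1
```

Note that if `insert` raises after a slot was reserved, this sketch does not release the slot; a fuller version would need to handle that case.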

Closing the Loop

When items are removed from the oven, we have to reflect the available capacity in our counter. We can do this by decrementing the DynamoDB counter from our Item Removed function (without the need for a conditional write this time).
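A sketch of the decrement, assuming a counter item keyed by `id` = `oven` with a numeric `itemCount` attribute (both hypothetical names) and a boto3 low-level DynamoDB client or compatible stand-in passed in as `dynamodb`.

```python
def release_slot(dynamodb, table_name):
    """Decrement the oven counter by one; no condition is needed here."""
    dynamodb.update_item(
        TableName=table_name,
        Key={"id": {"S": "oven"}},
        # ADD with a negative number subtracts atomically.
        UpdateExpression="ADD #n :minus_one",
        ExpressionAttributeNames={"#n": "itemCount"},
        ExpressionAttributeValues={":minus_one": {"N": "-1"}},
    )
```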

Freed capacity means that waiting items can be pulled from the queue. We already know how to do this: by triggering our Bake function. Although it is possible to invoke this function directly using the AWS SDK, going through the New Items queue saves us from having to think about concurrency and from building our own retry handling. We’d have to introduce a special item that is not meant to be baked and only serves to trigger a new SQS event (which in turn triggers our function). We’ll insert this item into the New Items queue. Our Bake function will not bake these items and will simply discard them.
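A sketch of how the trigger item might be sent and recognised. The marker value `capacity-freed` and the `type` field are assumptions chosen for illustration; `sqs` is a boto3 SQS client or compatible stand-in.

```python
import json

TRIGGER_TYPE = "capacity-freed"  # hypothetical marker for trigger items


def send_trigger(sqs, new_items_queue_url):
    """Queue a special item whose only job is to wake the Bake function."""
    sqs.send_message(
        QueueUrl=new_items_queue_url,
        MessageBody=json.dumps({"type": TRIGGER_TYPE}),
    )


def split_records(event):
    """Separate real items from trigger items in an SQS event.

    Returns (items_to_bake, trigger_seen).
    """
    items = [json.loads(record["body"]) for record in event.get("Records", [])]
    to_bake = [item for item in items if item.get("type") != TRIGGER_TYPE]
    return to_bake, len(to_bake) != len(items)
```

The Bake function would then process the waiting queue first, followed by `items_to_bake`, whether or not a trigger was present in the batch.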

Our CloudFormation template, and function code, now also reflect this:

All Together Now

We can write a script to see how items move through our solution. We’ll load 100 items, specifying random baking times, into our New Items queue, and check the number of items at every stage of our flow.
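Such a script could be sketched along these lines. The item shape, the baking time range and the function names are assumptions; `sqs` is a boto3 SQS client or compatible stand-in. Items are sent in batches of 10, the maximum SQS accepts per batch request.

```python
import json
import random


def make_items(count, min_seconds=60, max_seconds=600, seed=None):
    """Generate items with random baking times (range is an assumption)."""
    rng = random.Random(seed)
    return [
        {"id": f"item-{n}", "bakingTime": rng.randint(min_seconds, max_seconds)}
        for n in range(count)
    ]


def load_items(sqs, queue_url, items, batch_size=10):
    """Send the items to the queue in batches.

    This sketch does not inspect the response for failed entries.
    """
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        sqs.send_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": str(start + offset), "MessageBody": json.dumps(item)}
                for offset, item in enumerate(batch)
            ],
        )


def queue_depth(sqs, queue_url):
    """Approximate number of messages currently available in a queue."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(response["Attributes"]["ApproximateNumberOfMessages"])
```

Calling `queue_depth` for each queue in a loop, alongside a read of the DynamoDB counter, would give a rough picture of how many items sit at each stage over time.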