AWS Lambda inside insight (part 3)

AWS Lambda part 3: Call Lambda via SDK

Lambda Invocation without API Gateway

In the previous blog post in this series (https://www.trivento.io/aws-lambda-inside-insight-part-2/) we saw strange response-time spikes during the invocation of our Lambda function. Our conclusion was that API Gateway caused those spikes. This week we’re invoking our Lambda function without AWS API Gateway to see whether it makes a difference. Like before, we’re running our code on an AWS EC2 instance (m4.xlarge). The goal of this series is to get a better understanding of Amazon’s Lambda implementation of the serverless architecture. By testing the functions with Gatling, we try to investigate the internals of AWS Lambda to get an idea of what is happening under the hood.

Direct call

We wanted to remove API Gateway from the equation to look at Lambda invocation in its barest possible form. The AWS SDK has the ability to invoke Lambda functions directly (see the AWSLambdaClient API documentation). We don’t know the infrastructure that is used under the hood, but this is as close as we can get to invoking Lambda functions directly from Gatling. We noticed that the status codes we get after calling a Lambda function are HTTP status codes. So the SDK is using HTTP under the hood, but the results (see below) indicate that a different mechanism than API Gateway is used.
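
To make the idea of a direct call concrete, here is a minimal sketch of a timed synchronous invocation. It is not the code we ran: our tests used the Java SDK from Gatling, while this sketch assumes Python with boto3 (the AWS SDK for Python). The function name "my-test-function" and the payload are hypothetical, and the helper accepts any client object that behaves like a boto3 Lambda client.

```python
import json
import time


def timed_invoke(client, function_name, payload):
    """Invoke a Lambda function synchronously and time the round trip.

    `client` is assumed to behave like a boto3 Lambda client: it exposes
    invoke(FunctionName=..., InvocationType=..., Payload=...) and returns
    a dict that contains a "StatusCode" entry.
    """
    body = json.dumps(payload).encode("utf-8")
    start = time.perf_counter()
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: wait for the result
        Payload=body,
    )
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        # HTTP status code of the Invoke API call
        "status_code": response["StatusCode"],
        # only present when the function itself reported an error
        "function_error": response.get("FunctionError"),
        "elapsed_ms": elapsed_ms,
    }


def main():
    # boto3 is a third-party package and is assumed to be installed and
    # configured with credentials; it is imported here so that the helper
    # above stays usable without it.
    import boto3

    client = boto3.client("lambda")
    # "my-test-function" is a hypothetical function name
    print(timed_invoke(client, "my-test-function", {"ping": True}))
```

Calling main() runs one invocation against a real function. The measured time includes everything between the caller and the function, so it says nothing on its own about which part of the path is slow.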

Gatling

There is next to no documentation on the internet about writing your own protocol in Gatling. After looking at the API and the implementation for JMS and HTTP in Gatling itself, we figured out how we could create code to call the Lambda function from Gatling and measure performance. This is worthy of a blog post by itself, so we will not go into details here.

All containers hot

To get the best comparison with last week we wanted to perform the test with all containers hot. After we ran the test, the first thing we noticed when we looked at the graph is that there were still some spikes. Spikes can be expected due to network latency, scheduling and the number of users (50 in this case). This time the spikes in the graph were mostly well below 250ms. But this visual conclusion isn’t very scientific.

When we look at the statistics we can see that we got great results. Almost all requests (99%) were handled within 31ms. The standard deviation is quite low, which means that the service performance is predictable.
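
For readers who want to reproduce this kind of summary from raw response times, the sketch below computes a mean, a 99th percentile and a standard deviation. It is an illustration only: Gatling produces these numbers itself, and its exact percentile method may differ from the nearest-rank method assumed here. The sample data is made up.

```python
import math
import statistics


def summarize(latencies_ms, percentile=99):
    """Summarize response times given in milliseconds."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = max(1, math.ceil(len(ordered) * percentile / 100))
    return {
        "mean": statistics.fmean(ordered),
        "percentile": ordered[rank - 1],
        # population standard deviation over all recorded samples
        "stdev": statistics.pstdev(ordered),
        "max": ordered[-1],
    }


# Made-up data: 99 fast responses and a single slow one.
samples = [10] * 99 + [510]
print(summarize(samples))
```

With this made-up data the 99th percentile stays at the fast value, while the single slow response pulls the mean up and pushes the standard deviation well above the mean. That is why we look at percentiles and the standard deviation together rather than at one number.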

Detailed comparison

To better understand the difference between invoking an AWS Lambda function through the API Gateway and a direct invocation, we set up Gatling to simultaneously fire 50 users at both scenarios.
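
The shape of such a side-by-side run can be sketched in plain Python. This is a stand-in under stated assumptions, not our Gatling simulation: it uses one thread per simulated user, whereas Gatling schedules virtual users differently, and the names run_users and run_side_by_side are hypothetical. Each scenario is any zero-argument callable, for example one that performs an HTTP request to API Gateway and one that invokes the function through the SDK.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_users(call, users, requests_per_user):
    """Run `call` from `users` concurrent threads; return all latencies in ms."""

    def one_user(_user_index):
        samples = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            call()
            samples.append((time.perf_counter() - start) * 1000.0)
        return samples

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    # flatten the per-user lists into one list of samples
    return [ms for samples in per_user for ms in samples]


def run_side_by_side(scenarios, users=50, requests_per_user=10):
    """Start all scenarios together so they cover the same time window."""
    with ThreadPoolExecutor(max_workers=len(scenarios)) as pool:
        futures = {
            name: pool.submit(run_users, call, users, requests_per_user)
            for name, call in scenarios.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Running both scenarios in the same time window matters for the comparison: a disturbance that hits both paths at the same moment shows up in both result sets, which helps to tell a shared cause apart from overhead that only one of the paths adds.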

API Gateway

The above graph shows the part of the run that communicates via API Gateway. The second graph is the same as the first, but slightly zoomed in to get a better view of the spikes. At one point we got a spike of over 13 seconds! This large spike changes the scale so much that spikes of 1 to 2 seconds seem small. In the zoomed-in view we see a lot of spikes that are well over 1 second! This would be really noticeable to clients.

Direct call

In this section we look at the results that we got by calling AWS Lambda functions directly at the same time as the API Gateway calls. We changed the scale of the graphs to match the graphs of the API Gateway run. In the first graph we see a 5 second spike, which occurred at the same time as the 13 second spike in the API Gateway run. At this time we don’t know what happened there. We need more tests to see if this is incidental. In the more zoomed-in lower part we see some spikes, but they stay well below 1 second, which looks quite promising. We can probably use this methodology to get a better understanding of the internals, even though we don’t know exactly what sits between the call from Gatling and the Lambda function execution on AWS.

Conclusion

When we take a look at the statistics and especially the standard deviation, we see that AWS Lambda through the SDK is fairly stable. With a mean of 11ms, the 99th percentile at 40ms and a standard deviation of 49ms, we can conclude that the performance is predictably fast. From the same statistics, we can also conclude that API Gateway generates more overhead than we’d like to see. This may adversely affect our results during the blog series. To get the best understanding of how AWS Lambda works, we’ll invoke our functions directly with the SDK in our future blog posts.

Next time

Next time we would like to focus more on the internal workings of AWS Lambda, especially cold-start times and when hot containers are stopped. Stay tuned for the next blog post or feel free to contact us via mail: Martijn van de Grift (martijn.van.de.grift@trivento.nl) or Jeroen Gordijn (jeroen.gordijn@trivento.nl)
This blog is a co-creation of Martijn van de Grift and Jeroen Gordijn.

Several years ago I started making websites for family and friends. Subsequently, my interest in software continued to grow and I started making applications for myself. This is one of the reasons I chose to study Technical Computing. I am constantly improving my skills in order to achieve the best result for the client. My goal is to make software that reaches millions of people. I am especially interested in building applications using cutting-edge technology.

My goal is to build systems that last! A system that will put a smile on the face of the people using it and the people paying for it. In my mind the Typesafe Reactive Platform (Scala, Akka, Play, etc.) will be a big game changer in the coming years. It is giving a boost to Reactive programming. Together with DDD (Domain Driven Design) and CQRS (Command Query Responsibility Segregation) we will be able to implement all business requirements in a way that fits the rapidly changing world. Helping the business to realize their needs is what it is all about. I have great interest in software languages and technologies and how to use them to help the business in new and better ways.