Optimizing Azure Functions to avoid the 300 connection limit

December 2, 2018, by api2pdf

Intro

Api2Pdf is a web service for generating PDFs from HTML, URLs, and Office files. To scale to millions of requests, our backend runs on a serverless architecture. Our infrastructure consists of three main components:

Web App, located at portal.api2pdf.com and built on .NET Core. The portal allows you to pay for credits and manage your API keys. It runs on a normal Microsoft Azure Web App.

API. This is what you use to actually call our service. It’s the “front gate” to generating PDFs, and it is hosted on Azure Functions, the serverless architecture provided by Microsoft Azure.

PDF Engine. The engine that actually powers the PDF generation runs on AWS Lambda. When you call our Azure Function endpoints, we process the request and verify that you have a valid API key and credits remaining on your account. If the request is authorized, we forward it to AWS Lambda for PDF generation.

The focus of this blog post is how we optimized the connection between our API on Azure Functions and the PDF engine on AWS Lambda.

Azure Functions, 300 Connection Limit

When we first went live with Api2Pdf, we had no issues with our API at all. However, as more and more customers came on board, we noticed seemingly random spikes in HTTP errors. Uptime and availability are the #1 job for us here at Api2Pdf, and even a blip of errors is cause for concern. After all, the whole point of being on serverless is that we can scale infinitely.

After we put in a support ticket, Microsoft informed us that we had hit the limit of 300 connections on our Azure Functions app.

Azure has a very good article about managing connections here, which they directed us to.

Investigation

How were we hitting 300 connections? The problem was actually simple: we were opening a new connection every single time we called our AWS Lambda function to generate a PDF. It was only a matter of time before we hit the 300-connection limit.
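As a rough illustration (not our actual code; the endpoint URL is made up), the anti-pattern looked something like this: a fresh HttpClient per invocation, each one opening its own socket.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class GeneratePdf
{
    // Anti-pattern: constructing a new HttpClient on every invocation.
    // Each instance opens its own connection, and disposed sockets
    // linger in TIME_WAIT, so connections pile up under load.
    public static async Task<string> Run(string requestJson)
    {
        using (var client = new HttpClient())
        {
            // Hypothetical endpoint, for illustration only.
            var response = await client.PostAsync(
                "https://example.com/lambda/generate-pdf",
                new StringContent(requestJson));
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```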

Resolution Part I: Use a static HttpClient

Our first refactoring was to use a static HttpClient and a static AWS client rather than creating new objects every single time. This provided immediate relief and brought the number of connections back down to a manageable 30 to 40.
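A minimal sketch of the fix, under the same assumptions as above (the endpoint is hypothetical): hold the HttpClient in a static field so every invocation on the same instance shares one connection pool.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class GeneratePdf
{
    // One HttpClient per app domain: connections are pooled and
    // reused across invocations instead of being opened per request.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> Run(string requestJson)
    {
        // Hypothetical endpoint, for illustration only.
        var response = await Client.PostAsync(
            "https://example.com/lambda/generate-pdf",
            new StringContent(requestJson));
        return await response.Content.ReadAsStringAsync();
    }
}
```

The same principle applies to the AWS SDK client: construct it once as a static field and reuse it for every call.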

Several months went by, and we noticed the number of connections creeping up again. We re-read the article and noted that the connection count goes up with every app domain added to your Azure Functions pool: as demand on your Functions app grows, it auto-scales for you, and every time it scales out, each new app domain adds its own set of connections. We figured out our app uses about six connections per app domain at any given time.

We started to get concerned when the app reached about 100 connections. We always want headroom for a spike, so it was time for another optimization.

Resolution Part II: Use async

Do not underestimate the power of async in .NET. The key issue for Api2Pdf was that our call to generate a PDF on AWS Lambda was a "blocking" call. AWS Lambda can take a couple of seconds to generate a PDF, and because we were not using async, the entire thread was tied up for the whole time we waited for AWS Lambda to respond.

The more requests you receive, the more the Azure Functions app needs to scale out to spin up more threads to handle the load. Every new app domain increased the number of connections.

By refactoring the code to use async methods, we now return control to IIS while waiting for AWS Lambda to generate the PDFs, so the thread can be reused to process other PDF requests in flight. This is substantially more efficient.
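The before-and-after can be sketched like this (again with a hypothetical endpoint, not our production code):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class GeneratePdf
{
    private static readonly HttpClient Client = new HttpClient();

    // Before: calling .Result blocks the thread for the entire
    // round trip to AWS Lambda, forcing the platform to scale out
    // more app domains (and connections) under load.
    public static string RunBlocking(string requestJson)
    {
        var response = Client.PostAsync(
            "https://example.com/lambda/generate-pdf",
            new StringContent(requestJson)).Result;
        return response.Content.ReadAsStringAsync().Result;
    }

    // After: await yields the thread back to the pool while the
    // Lambda call is in flight, so it can serve other requests
    // in the meantime.
    public static async Task<string> RunAsync(string requestJson)
    {
        var response = await Client.PostAsync(
            "https://example.com/lambda/generate-pdf",
            new StringContent(requestJson));
        return await response.Content.ReadAsStringAsync();
    }
}
```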

We saw an immediate drop from 90 connections to 12 connections and our thread pool dropped by 50%.

Our Azure Functions app is doing great and will be able to sustain a massive number of PDF requests for a very, very long time.