How to Load Test on Heroku

Max Brosnahan, November 16, 2018

Over the past year, my team built and deployed a revenue-critical product purchasing app for one of our largest clients. Deployed to Heroku, the new version needed to handle 10x the order volume of the prior app, covering both customers’ initial purchases and all the subsequent background processing. Add potentially large spikes in demand during peak traffic periods, and we needed to take load testing seriously.

Identify Your Rationale for Load Testing

There are many reasons why you might perform load testing (besides just wanting your app to work). Here are some other common scenarios that warrant this work:

You’re building a new app and want to ensure top performance

You’re modifying an existing app with problem areas to isolate and fix

You’re supporting a growing network of users or partners

You’re adding features and functionality and need to maintain performance as you scale

You want to avoid Heroku overload problems*

You want to tune your dyno configuration to run as efficiently as possible

*If a request takes longer than 30 seconds, Heroku’s router will time it out, which means that the user will have to try again. This may put even more load on the system.

Set Load Testing Goals

Now that you have a general idea of why you’re load testing, it’s time to get more specific by mapping out your app’s workflow and asking questions. This process will prepare you to best simulate those high load scenarios. You’ll also discover the key metric to track and benchmark. Your load tests will inform you whether this metric’s goal is attainable.

As an illustration, let’s assume we’re building a frontend application that integrates with a backend application, which in turn is sourcing some data from other systems.

1. Map the workflow

Mapping out your application’s workflow is important from the start. This process will help you create a realistic simulation that will drive your load tests. You’ll determine which endpoints within your application will experience higher levels of load in production and where you might want to isolate your load tests later (for example, to avoid impacting payment gateways or downstream integrations). Finally, this process will help you identify key metrics to benchmark, and which information you’ll need to collect as you build and deploy the program.

Here is an example of a portion of a workflow:

2. Ask questions

Along the way, you’ll need to ask and answer a set of important questions to help you craft the simulation. For example, suppose you are building an app to support product orders from 40 thousand customers. You might ask questions like these:

What does 40 thousand customers mean in reality?

Will 20% of them be very active, with 80% using the system infrequently?

Of those 20%, how many system requests will they each make?

How fast do we want to service the load?

Do we have a performance target? (For example, should certain services respond in under 500 milliseconds?)

What are the priorities?

What are the critical numbers to test?

3. Choose your key metric

Based on your goals and workflow, you’ll want to identify a key metric that you’ll use to benchmark performance. In our case, we chose the number of customer orders per minute. We knew that we needed our system to process a high volume of customer orders, or else our system would crash and the company would lose revenue.

Choose the Parameters to Test

Now that you have created your workflow and chosen your key metric, you’re ready to set up a simulation that is as realistic as possible.

You might have to guess a bit, but start by basing your numbers on historical information while building in a healthy buffer. In our case, we knew how many orders per minute we had received in past years, and estimated that we’d have to achieve 10x that volume. We also needed to estimate the number of products people would buy per order.
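As a rough sketch of that estimate, the target rate can be derived from a historical peak, a growth multiple, and a buffer. Only the 10x multiple comes from our project; the historical peak of 120 orders per minute and the 25% buffer below are made-up values for illustration.

```python
import math


def target_orders_per_minute(historical_peak, growth_multiple, buffer_fraction):
    """Return the per-minute order rate the load test should sustain."""
    return math.ceil(historical_peak * growth_multiple * (1 + buffer_fraction))


# Hypothetical inputs: 120 orders/min historical peak, 10x growth, 25% buffer.
target = target_orders_per_minute(120, 10, 0.25)
print(target)
```

Rounding up keeps the target on the conservative side when the inputs do not multiply out to a whole number.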

As you develop your parameters, consider abandonment rates or other nuances of your workflow to create a realistic user funnel. In our scenario, we were dealing with ecommerce and had to account for attrition. Based on Google Analytics, we had to determine how many people would drop out of the process at each step. As a result, we set up our simulation to mirror this reality by dropping the volume of requests, so that the final endpoint in the workflow received only 20% of the volume of requests of the first endpoint.
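One way to picture the funnel is as a retention rate applied at each step. In this minimal sketch, the step names and the individual retention rates are hypothetical. They are chosen only so that the last step ends up with 20% of the first step's volume, as described above.

```python
def funnel_volumes(start_requests, steps):
    """Return the expected request count at each step of the funnel.

    `steps` is a list of (name, retention) pairs, where retention is the
    fraction of users from the previous step who reach this one.
    """
    volumes = []
    current = float(start_requests)
    for name, retention in steps:
        current *= retention
        volumes.append((name, round(current)))
    return volumes


# Hypothetical step names and retention rates; only the overall 20% is ours.
STEPS = [
    ("browse_products", 1.0),
    ("add_to_cart", 0.5),
    ("enter_payment", 0.5),
    ("confirm_order", 0.8),
]

volumes = funnel_volumes(1000, STEPS)
for name, count in volumes:
    print(f"{name}: {count} requests")
```

The per-step counts can then be used to weight how many simulated users continue to each endpoint.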

Load Test Tooling

After much experimentation, we recommend Gatling, an open source load testing tool that has worked well for us. (Note that you have to use Scala to write the simulations.) In addition to Gatling, we also built and deployed a small app to Heroku that works with Gatling and helps customize the simulation. Whereas Gatling runs the simulation and displays the results, the app we wrote takes a series of steps to ensure that the test will run smoothly, such as:

Ensuring no other load test was already running

Disabling integrations in order to isolate the test to the right systems. Many integrations have a testing environment that is different from their production environment in terms of performance. If we didn’t isolate these integrations, then we would see false positive errors and poor performance.

Scaling the dynos to the necessary levels for the test

Starting Gatling and collecting the metrics

Stopping the test run, building the report and re-enabling the integrations
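The steps above can be sketched as a small wrapper that prepares the environment and always restores it, even if the run fails. This is not the app we wrote. The `env` dictionary is a hypothetical stand-in for real calls to Heroku and to the integrations, the dyno counts are made up, and scaling back down afterwards is an assumption.

```python
from contextlib import contextmanager


class LoadTestAlreadyRunning(Exception):
    """Raised when a second run is requested while one is in progress."""


@contextmanager
def load_test_run(env):
    """Prepare the environment for a test run and restore it afterwards."""
    if env.get("running"):
        raise LoadTestAlreadyRunning("another load test is in progress")
    env["running"] = True
    env["integrations_enabled"] = False  # isolate downstream systems
    previous_dynos = env["dynos"]
    env["dynos"] = env["test_dynos"]     # scale up for the test
    try:
        yield env
    finally:
        env["dynos"] = previous_dynos    # assumption: scale back after the run
        env["integrations_enabled"] = True
        env["running"] = False


# Hypothetical state; a real runner would start Gatling inside the block.
env = {"running": False, "integrations_enabled": True, "dynos": 2, "test_dynos": 8}
with load_test_run(env) as active:
    during = dict(active)  # snapshot of the state while the test runs
print(during["dynos"], env["dynos"])
```

Putting the cleanup in a `finally` block means a crashed test run cannot leave the integrations disabled.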

We ran our load tests on 15 dynos, and combined each of those results into one set to get the full picture. Our tests ran in a particular region of the US, but if you wanted a wider geographic test, you could set up load testing on different Heroku apps in different regions, depending on what you need.
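When combining results, keep in mind that averaging each dyno's percentiles does not generally give the overall percentile. A minimal sketch of one alternative is to pool the raw response times and compute percentiles over the pooled set. It assumes each dyno exports a plain list of response times in milliseconds, which is a hypothetical format rather than Gatling's own log format, and the sample values are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def combine_runs(per_dyno_times):
    """Pool raw response times (ms) from every load-generating dyno."""
    pooled = [t for times in per_dyno_times for t in times]
    return {
        "count": len(pooled),
        "p95": percentile(pooled, 95),
        "p99": percentile(pooled, 99),
    }


# Made-up samples from three hypothetical dynos.
runs = [[120, 130, 140, 150], [110, 115, 900, 125], [100, 105, 108, 112]]
summary = combine_runs(runs)
print(summary)
```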

Perform the Load Tests

Your most accurate, full-scale load tests won’t be possible until your app is fully functional from end to end, but you can, and should, isolate parts of your system to start testing certain components earlier.

Isolate systems that aren’t ready for testing

In your simulations, you may encounter parts of your workflow that impact other systems that are needed in production, or that are still in development. In those circumstances, you can write code to bypass specific parts of the workflow that might throw errors or create issues unnecessarily.

For example, our application interacts with a payment gateway in production, and we didn’t want to include that interaction in our load tests. To do this, we first established the time taken for a full round trip to the payment gateway, ideally by measuring rather than guessing. Then we wrote code that bypassed this part of our application and simulated the presence of a payment gateway without it actually being there.
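A bypass like this might look roughly like the stub below, which waits for the measured round-trip time and then returns an approval. The class name, the `charge` signature, and the 0.8 second round trip are all hypothetical, not our actual gateway client.

```python
import time


class StubPaymentGateway:
    """Stand-in that waits for a measured round-trip time, then approves."""

    def __init__(self, round_trip_seconds, sleep=time.sleep):
        self.round_trip_seconds = round_trip_seconds
        self._sleep = sleep  # injectable so the stub can be checked quickly

    def charge(self, order_id, amount_cents):
        self._sleep(self.round_trip_seconds)  # simulate the gateway latency
        return {
            "order_id": order_id,
            "amount_cents": amount_cents,
            "status": "approved",
        }


# Record the wait instead of sleeping; a load test would keep time.sleep.
waits = []
gateway = StubPaymentGateway(0.8, sleep=waits.append)
result = gateway.charge("order-1", 2500)
print(result["status"], waits)
```

Keeping the delay in the stub matters because the time spent waiting on the gateway still ties up the app under test, just as it would in production.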

Avoid load testing during deployments

Ensure that the application under test is not changing in ways that will alter the results, for example through new code deployments or changes to the application’s configuration. These activities can cause the app to become unavailable, resulting in a spike in response times during test runs.

Interpret the Results

In this example CSV file, you can see the key metric results. Each line represents a one-minute period.

The data below shows sample output from the load testing tool. The Total column shows the number of requests for each phase between the start and the end of the workflow. The Response Time columns show performance times of the 95th and 99th percentiles in milliseconds. If your performance budget is <500 ms for 95% of the requests, and the load test shows that you’re over that threshold, then this data highlights an endpoint that needs attention.
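The budget check described above can be sketched in a few lines. The rows below are made-up values shaped like the columns just described (Total, 95th and 99th percentile). They are not output from our tests, and the endpoint paths are hypothetical.

```python
def endpoints_over_budget(results, budget_ms, column="p95_ms"):
    """Return the endpoints whose chosen percentile exceeds the budget."""
    return [row["endpoint"] for row in results if row[column] > budget_ms]


# Illustrative rows only; endpoint paths and timings are invented.
SAMPLE = [
    {"endpoint": "/products", "total": 1000, "p95_ms": 180, "p99_ms": 320},
    {"endpoint": "/cart", "total": 500, "p95_ms": 640, "p99_ms": 1200},
    {"endpoint": "/orders", "total": 200, "p95_ms": 450, "p99_ms": 900},
]

slow = endpoints_over_budget(SAMPLE, budget_ms=500)
print(slow)
```

Passing `column="p99_ms"` applies the same budget to the 99th percentile instead.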

Learn and Tune

As illustrated above, properly executed load tests will help you note where requests are being made, how long they are taking, and which endpoints are taking way too long and may offer opportunities for performance improvements. As you test and iterate, you’ll continue uncovering areas to tune performance and fix errors.

Isolate problem areas

Sometimes you’ll want to perform a highly focused load test by adding a new endpoint and slamming it with usage. Other times, you’ll recognize that a particular kind of data undergoes changes very infrequently, which means that this data is a great candidate for caching rather than hitting the dynos. We achieved many performance benefits this way.
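To illustrate the caching idea, here is a minimal sketch of a time-based cache. It is only one possible approach: caching in front of the dynos, such as HTTP or CDN caching, is another, and the class, key, and loader names here are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the expensive lookup
        value = loader(key)
        self._entries[key] = (now, value)
        return value


calls = []


def load_catalog(key):
    calls.append(key)  # stands in for a slow database query
    return {"key": key, "items": ["a", "b"]}


cache = TTLCache(ttl_seconds=300)
first = cache.get_or_load("catalog", load_catalog)
second = cache.get_or_load("catalog", load_catalog)
print(len(calls))
```

The second lookup is served from the cache, so the loader runs only once until the entry expires.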

Prioritize job queues

It’s also wise to break up job queues so that high-priority jobs (such as password resets) are handled most urgently, with others next in the queue, such as order fulfillment or data syncing between various systems. You want to be responsive to the user while still being timely with everything else.
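The ordering can be sketched with a single priority queue. Real background job libraries usually offer separate named queues with their own workers, so treat this as an illustration of the idea rather than our setup. The job kinds and priority numbers are hypothetical.

```python
import heapq
import itertools

# Lower number means more urgent. Hypothetical job kinds.
PRIORITY = {"password_reset": 0, "order_fulfillment": 1, "data_sync": 2}


class JobQueue:
    """Hands out the most urgent job first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def enqueue(self, kind, payload):
        heapq.heappush(
            self._heap, (PRIORITY[kind], next(self._counter), kind, payload)
        )

    def dequeue(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload


queue = JobQueue()
queue.enqueue("data_sync", "nightly")
queue.enqueue("order_fulfillment", "order-1")
queue.enqueue("password_reset", "user-7")
order = [queue.dequeue()[0] for _ in range(3)]
print(order)
```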

Dyno Considerations

As I stated earlier, we used 15 standard dynos for our simulation. As you run simulations, your dyno counts and configurations are important. This is an opportunity to test out the number and type of dynos and ask questions like: Does switching to fewer, larger-capacity dynos have a positive impact on performance? As you test, make sure you have room to play with your storage and processing to see what works and what doesn’t work to achieve the buffer you need as you continue to grow.

Find Your Ceiling

As you test different simulation scenarios, it’s really easy to experiment and see the impacts of those tests. Remember that you’re looking for not only the expected system load, but also the right amount of buffer. Where is the ceiling? If you double your expected load, then do you start to see a jump in error rates?
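One way to look for the ceiling is to step the load upward until the error rate crosses a threshold. In this sketch, `run_at_load` is a hypothetical callback that would run one simulation and return its error rate. The fake system, the load steps, and the 1% threshold are made up for illustration.

```python
def find_ceiling(run_at_load, loads, max_error_rate):
    """Return the highest load whose error rate stays within the threshold.

    Loads are tried in ascending order, stopping at the first one that
    breaches the threshold. Returns None if even the lowest load fails.
    """
    ceiling = None
    for load in sorted(loads):
        if run_at_load(load) > max_error_rate:
            break
        ceiling = load
    return ceiling


# Fake system for illustration: errors appear above 1800 orders per minute.
def fake_run(load):
    return 0.0 if load <= 1800 else 0.15


ceiling = find_ceiling(fake_run, [500, 1000, 1500, 2000, 3000], max_error_rate=0.01)
print(ceiling)
```

Narrowing the steps between the last passing load and the first failing one gives a more precise estimate.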

As our Heroku colleagues recommend, it’s also important to test what happens if you experience sustained load: a steady amount of heavy traffic over a couple of hours. What starts breaking? Check your add-ons, your database behavior and your background jobs.

Load Testing Monitoring Tools

Throughout this process of testing and iterating, it’s important to have good monitoring tools. Those metrics inform your simulations and tell you where to focus in order to improve.

New Relic for Performance Monitoring

We use New Relic for application performance monitoring. New Relic shows you how much time it takes to run various components of the app. You can observe which endpoints are behaving slowly, and view how much time requests are taking throughout your workflow.

Librato for Exposing Custom Metrics

We use Librato to display a dashboard of standard and custom metrics that we proactively gather and use to monitor the apps. For example:

How long are specific steps taking, in different percentiles?

How many requests are being made?

What impact is the load having on dyno memory? On dyno load? On the Postgres database?

Are we within the recommended boundaries of each component so we have room to breathe?

How long does it take the payment gateway to respond?

What is the queue time for confirmation emails?

Proactively Collect Metrics

As you develop expertise in load testing, start writing your code to automatically collect data as you build your apps. Think about what will be beneficial to collect as you go. For example, we wanted to know queue times and run times for all the background jobs. So we set our systems to automatically collect this data in a generic way, and then we started seeing this information exposed in a tool like Librato.
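A generic collector for background jobs might look something like this sketch. The `record` callback is a hypothetical hook that a real setup would point at its metrics client, and the metric names and fake clock readings are made up.

```python
import time


def run_with_metrics(job_name, enqueued_at, job, record, clock=time.time):
    """Run `job` and report how long it waited in the queue and how long it ran."""
    started_at = clock()
    try:
        return job()
    finally:
        finished_at = clock()
        record(f"jobs.{job_name}.queue_time", started_at - enqueued_at)
        record(f"jobs.{job_name}.run_time", finished_at - started_at)


metrics = {}
ticks = iter([105.0, 107.5])  # fake clock readings: job start, job finish
result = run_with_metrics(
    "confirmation_email",
    enqueued_at=100.0,
    job=lambda: "sent",
    record=metrics.__setitem__,
    clock=lambda: next(ticks),
)
print(result, metrics)
```

Because the timings are recorded in a `finally` block, jobs that raise an error are measured too.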

Continuously Improve

As you continue to simulate, test, iterate, and scale, load testing is a great way to ensure your performance will hold up under pressure in production. It’s enlightening to look back at big load events and see how real-world usage compared to the load simulations you ran in preparation for going live. As long as you continue to collect data and tune as you go, and keep the most business-critical metrics in mind, you’ll be in good shape.