Thursday, 6 July 2017

Introduction

I’ve recently been attempting to write some blog posts about my experiences with performance testing, but each attempt ends up long-winded and feeling like a mouthful to read. So this is an attempt to provide some quick, summarised points, mistakes, lessons and general tips that I’ve learned or relearned over the past 2 months.

Where to start?

Why performance test? - If you’ve been asked to perform some performance testing, find out why. If you’re thinking you might need to, think about why that is. You need this context in order to make sure the performance testing is useful to the other members of your team.

What do you mean by “performance test”? - The phrase “performance testing” encompasses a lot of different kinds of tests and information that you could find out. Do you want to try load tests, stress tests, spike tests, soak tests? Are you looking to test one component, an integration or a whole system? Be aware that people may not understand or mean these words and phrases in the same way. Someone might ask you to perform “some load tests”, but they don’t mean only load tests; they may really mean “can you explore the performance of the product”. They may not ask for stress tests, they may not be thinking of planning capacity for the future, but that doesn’t mean you shouldn’t raise it as an area to explore. People may be concerned about one specific component but actually they hadn’t thought of load testing an integration point too.

What numbers are we using? - Are there NFRs (non-functional requirements) or functional requirements? Is the application already running in live, if so, what does the current load and performance look like? If it’s a new application, what do we expect the load and performance to be? What would we expect it to be next year? In 2 years? What would be “too much” load? What would a spike realistically look like? Are there peaks and troughs in the load profiles?
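One way to answer the peaks-and-troughs question is to sketch an expected daily load profile before you design the tests. This is a minimal, hypothetical sketch - the base and peak numbers, and the shape of the curve, are assumptions you would replace with whatever your live traffic or requirements actually show:

```python
import math

def target_rps(hour, base=50, peak=200):
    """Hypothetical daily load profile: a trough overnight, a peak mid-afternoon.

    Returns a target requests-per-second figure for a given hour (0-23).
    The base/peak numbers and the sine shape are illustrative assumptions,
    not real measurements.
    """
    # Sine wave positioned so the peak lands around 15:00 and the trough around 03:00
    wave = (math.sin((hour - 9) * math.pi / 12) + 1) / 2  # scales to 0.0 .. 1.0
    return round(base + (peak - base) * wave)

# target_rps(3) gives the overnight trough (50), target_rps(15) the afternoon peak (200)
```

Even a rough curve like this makes the conversation concrete: it forces people to say what the peak actually is, and gives you something to aim a load test at.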

You might not know everything right now, so start with the basics - Start with basic functionality smoke tests, move on to small load tests that check the acceptance criteria, then start exploring around that as you learn more about the system.

You can’t performance test something if you don’t understand how it works - The application might look very fast if you’re sending bad data. How do you know you’re sending the correct data? What happens when you send bad data? How do you know what good or bad looks like?

Isolated, stable, “like-live” environment - The tests should be run against something that you control, anything could affect performance and you want to control as many variables as possible. You want the environment to be as close to production hardware and configuration as possible so you can rule out issues like the hardware not being powerful enough.

Understand the infrastructure and architecture of your tools and environment - Consider where you are going to run the tests from: what is going to generate the load? Think about where the environment is relative to that. Try to make sure the load generator is on the same network and isn’t accidentally throttled or blocked by proxies or load balancers (unless you’re testing them). Make sure your tests aren’t affected by the performance of the server generating requests or the bandwidth of the connection.

It’s ok to start with an environment that’s not like-live, such as a local environment, to help design your tests - This ties in with understanding how it works: you can design the tests against a smaller environment while you wait for a larger environment to be built. This is useful when you’re trying to get API requests working, figure out what to check for in the responses, or tweak the timings of particular scenarios where you only need to run 1 or 2 tests.

Stuff you might need that might take time to get sorted (so get the ball rolling!):

Access to a server to run the load generator from.

Access to monitoring of the servers and application logs.

Access to any databases.

An ability to restart servers and reset databases between test runs.

Access to an environment you can start exploring right now.

Documentation of how the system works.

Mistakes & Lessons

Completely random test data may not be very useful - If the test data is completely random, it means you are running a different test on every run. You can use weighted distributions instead - this is where you give a probability that a particular result will occur. For example, 90% of the time, it will pick one value, but randomly it will pick another value 10% of the time. Why is this useful? It gives you control over the randomness and lets you explore different patterns that might affect performance.
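A weighted distribution like the 90/10 split above can be sketched in a few lines. This is a hypothetical example - the "existing vs new account" scenario and the 90/10 weights are made-up stand-ins for whatever pattern your real traffic follows:

```python
import random

# Hypothetical traffic pattern: 90% of requests use an existing account,
# 10% create or use a brand-new one.
ACCOUNT_TYPES = ["existing", "new"]
WEIGHTS = [90, 10]

def pick_account_type(rng=random):
    """Pick an account type according to the weighted distribution."""
    return rng.choices(ACCOUNT_TYPES, weights=WEIGHTS, k=1)[0]

# Over many picks the split converges on roughly 90/10
sample = [pick_account_type() for _ in range(10_000)]
print(sample.count("existing") / len(sample))  # prints roughly 0.9
```

The key point is that the weights are explicit and under your control, so you can change them deliberately between runs rather than getting a different test by accident every time.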

If you’re just designing the tests and want to test them out with small load on a dev environment, don’t guess the numbers to try a small load test - I did this and accidentally brought down an environment being used for UAT (user acceptance testing). I had picked a number off the top of my head and assumed it was a safe number; well, it turned out it wasn’t. Always discuss with other people what numbers to try, and warn people before you run any test. Even if you think you’re not going to stress the environment, don’t just rely on guesswork.

Not all of the data needs to be automatically generated - Be pragmatic and try to understand which parts of the data matter for performance. There may be some parts of the data that have no effect on performance. It’s not always possible to know which parts, but start with pieces of data you would expect to have an effect and gradually include other parts later. Initially I started writing some very complicated automation for generating a variety of data before I realised that most of it could be identical, as it wasn’t expected to affect performance.
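In practice that often means a fixed template with only the genuinely significant fields varied. This is a hypothetical sketch - the user-account fields here are invented for illustration, and the assumption is that only the unique identifier matters for performance:

```python
import uuid

# Hypothetical user-account template: fields assumed NOT to affect
# performance stay identical across every generated record.
TEMPLATE = {
    "display_name": "Load Test User",
    "country": "GB",
    "marketing_opt_in": False,
}

def make_account():
    """Build one account record: shared template plus a unique username."""
    account = dict(TEMPLATE)  # shallow copy of the identical fields
    account["username"] = f"perf-{uuid.uuid4().hex[:12]}"  # must be unique per account
    return account
```

If later runs suggest one of the "identical" fields does matter (say, record size), you promote it to a varied field then, rather than automating everything up front.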

In tandem with the above point, consider how to discover information quickly - You may spend a long time writing a very complicated performance test that covers all kinds of data and scenarios, only to find that the application or the environment hasn’t been configured correctly. Simpler, quicker tests can be run earlier to discover whether you are ready to performance test, or to surface very obvious issues. Simply sending API requests rapidly by hand through Postman may be enough to stress the server, and that can be done in a few minutes!
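A quick-and-dirty burst of concurrent requests doesn’t need a full tool either. This is a self-contained sketch: `send_request` is a stub standing in for a real HTTP call (e.g. a `requests.get` against your API), so the timing numbers here are simulated, not real:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request():
    """Placeholder for a real HTTP call against the system under test.
    Stubbed with a short sleep so this sketch runs anywhere."""
    time.sleep(0.01)  # simulate network latency
    return 200

def quick_blast(n_requests=100, workers=20):
    """Fire a small burst of concurrent requests and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(lambda _: send_request(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return statuses, elapsed

statuses, elapsed = quick_blast()
print(f"{len(statuses)} requests in {elapsed:.2f}s, "
      f"all OK: {all(s == 200 for s in statuses)}")
```

Ten minutes with something this crude can tell you whether the environment is even configured correctly, before you invest days in a proper test suite.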

Consider as many user stories as possible - Anna Baik shared this one in the testing community slack that I would never have thought of - Health Check endpoints. One of the users of the system is your internal monitoring which may regularly hit a health check endpoint. This can affect performance! What other user stories are there that you may not have considered?

General tips

Find a way to monitor your performance tests live as they run! - If you’re using a tool such as Gatling, you can configure real-time monitoring. This is extremely useful as you can quickly tell how the test is going and stop it early if it’s already killed the application. You can also do this through monitoring the application through tools such as AppDynamics or using any tools provided by cloud service providers such as AWS CloudWatch. The more information you can have to observe how the application and its hardware behaves, the better.

Treat performance tests as exploratory tests - Expect to run lots of tests and to keep changing and tweaking the tests. Be prepared to explore different questions and curiosities. Treat your first runs of your tests as opportunities to check your tools and tests actually work how you expect. Try to avoid people investing too much in the result of the first load test - you will learn a lot from it, but it won’t tell you “good to ship” first time.

No seriously, it will be more than just “one test” - Imagine if someone asked you to verify some functionality with just one test. Do you really believe you will not make any mistakes and the product will perform as expected first time? If you have that much faith, why run the performance test? If you’ve decided there is value in performance testing, then surely you’ve accepted that you will take the time to run as many tests as it takes to have better confidence and reliable information.

Errors might be problems with your tests, not just problems with the application - Just as with automated tests, expect there to be mistakes and errors with your tests. Don’t jump too quickly to conclusions about why errors might be occurring.

Separate generating test data from your test execution - Consider what you are performance testing, does the performance test need to create data before it does something else with it? Or is it unrealistic for it to generate load that creates data and does data need to pre-exist? In my case I needed to create 1000s of user accounts, but the application wasn’t intended to handle 1000s of user accounts being created all at once. So I created a separate set of automation to handle building the data prior to the performance test run.
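The pre-test seeding step can be as simple as a throttled loop that runs well before the test itself. This is a hedged sketch: `create_account` is a placeholder for whatever API or database call seeds one account in your system, and the rate is an assumption you’d tune to what the environment can comfortably absorb:

```python
import time

def seed_accounts(create_account, total=5000, rate_per_second=10):
    """Create test data ahead of the run at a gentle, controlled rate,
    rather than hammering the sign-up endpoint at test-time load.

    `create_account` is a placeholder callable for whatever API/DB call
    seeds one account; it receives the account's index.
    """
    interval = 1.0 / rate_per_second
    for i in range(total):
        create_account(i)
        time.sleep(interval)  # throttle so seeding itself isn't a stress test
```

Keeping this as its own script also means you can re-seed a reset database between runs without touching the performance test itself.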

Gradually introduce variables such as different users or different loads - For example, if you have two different types of user - an admin and a customer - try the customer load test on its own and the admin load test on its own before running them together. If there is a significant problem with one or the other, you can more easily identify it. In other words, try to limit how many tests you run at once and how many variables you play with at once.

When you run a stress test, measure throughput - This lets you measure how much data you are sending and helps you figure out whether your stress test is reaching the limits of your machine, the network or the application you’re testing.

Test ideas

What happens when the load spikes? Does the application ever recover after the spike? How long does it take to recover?

What happens if we restart the servers in the middle of a load test?

How efficiently does the application use its hardware? If it’s in a cloud service, would it be expensive to scale?

What happens when we run a soak test (a load test that runs for a long time with sustained load, e.g. 12 hours or 2 days)?

What happens when we run with a tiny amount of load?

What happens when we send bad requests?

What do we believe to be the riskiest areas and how can we assess them?