We run performance tests in QA. They follow a familiar pattern. A multi-threaded client generates traffic against a server. The traffic pattern is configurable: it might hold a constant request rate for the entire test, or it might ramp up by a constant amount per interval. We measure the actual number of requests per minute.

The test's output is a graph with two curves. The x axis is time and the y axis is requests per minute. The two curves are expected requests per minute (as configured in the test) and actual requests per minute. Of course, if the test attempts more traffic than the server can handle, the actual requests per minute will be lower than the expected amount. By looking at the graph, you can get a sense of the QA server's capacity.

I would like to use a comparison of the two curves to score the server's capacity. The score would be a single number that reflects how well the expected curve matches the actual curve. I am looking for a reasonable formula for that score.

Here are two formulas I've considered. I don't know how to enter equations here, so I've linked to the corresponding Wikipedia pages:

L2-norm, aka Euclidean distance. For each minute, take the difference between the expected and actual values. Square that difference, sum the squares, then take the square root.

Root mean square error. Similar to the L2-norm, except that you divide the sum of squares by the number of samples (in my case, the number of minutes) before taking the square root.

For both of these, the score is always non-negative, with zero being the optimum.
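The two formulas above can be sketched in a few lines of Python (the sample data is hypothetical, just to show the units):

```python
import math

def l2_norm(expected, actual):
    """Euclidean distance between the expected and actual curves."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(expected, actual)))

def rms_error(expected, actual):
    """Root mean square error: the sum of squared differences is
    divided by the sample count before taking the square root."""
    n = len(expected)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(expected, actual)) / n)

# Hypothetical data: one sample per minute, requests/minute.
expected = [100, 200, 300, 400, 500]
actual   = [100, 200, 290, 350, 400]

print(l2_norm(expected, actual))    # 0 only when the curves match exactly
print(rms_error(expected, actual))
```

Both scores are in the same units as the curves (requests per minute), which makes them easy to interpret.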

Both of those formulas are probably reasonable for tests of the same duration (e.g. two tests of 10 minutes each). Since RMS divides by the number of samples, it should also remain comparable across tests of different durations, whereas the L2-norm grows with the length of the test.

What are some other reasonable ways to quantify how well the actual responses/minute matches the expected responses/minute?

Note: for the purposes of this question, I am intentionally ignoring other interesting metrics (e.g. average response time) that would factor into a performance test. Edit: One reason for this: I am testing a backend server (rather than a web server), where there is only an indirect relationship between response time and user experience.

1 Answer
1

We use APDEX to score our performance tests and produce an index. Simply put, it counts how many requests are served under some thresholds. It is quite nice because it yields an index between 0 and 1 (0: all requests were bad; 1: all requests were good) that everybody, including end users, can understand.
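A minimal sketch of the Apdex calculation: requests at or under the target threshold T count as satisfied, requests up to 4T count half as tolerating, and slower requests count as frustrated. The threshold and sample times below are hypothetical.

```python
def apdex(response_times, t):
    """Apdex index: (satisfied + tolerating / 2) / total samples.
    Satisfied: response <= t; tolerating: t < response <= 4 * t."""
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)

# Hypothetical sample: response times in seconds, target T = 0.5 s.
times = [0.2, 0.4, 0.6, 1.5, 3.0]
print(apdex(times, 0.5))  # 2 satisfied, 2 tolerating, 1 frustrated -> 0.6
```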

I see APDEX depends on response time. It is easy (and intuitive) to score by response time with a web server. For a backend server, you need to make some assumptions to relate response time to user experience.
–
user246Apr 24 '14 at 17:13

Yes, but it also applies if you only want to measure the time that requests spend running in the backend server: because of the formula's simplicity, you just compare expected times (thresholds) with actual ones and compute an index.
–
cadidApr 24 '14 at 17:24