In a project where non-functional requirements specify the maximum execution time for a specific action, QA must check the performance of this action on a dedicated machine, using the precise hardware and load specified in the requirements.

On the other hand, some erroneous changes to the source code may severely impact performance. Noticing this negative impact early, before the source code reaches source control and is verified by the QA department, would save the time the QA department would otherwise spend reporting the issue, and the time the developer would spend fixing it several commits later.

To do this, is it a good idea to use unit tests to get an idea of the time spent executing the same action² n times?

Conceptually, unit tests are not really meant for that: they are expected to test a small part of the code and nothing more; they are neither a check of a functional requirement, nor an integration test, nor a performance test.

Does the unit test timeout in Visual Studio really measure what is expected to be measured, taking into account that initialization and cleanup are nonexistent for those tests, or too short to affect the results?
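For illustration only, here is a minimal sketch of what such a test could look like with MSTest. The OrderProcessor class, the repetition count and the thresholds are hypothetical placeholders rather than anything from the project described above; the [Timeout] attribute acts only as a coarse safety net around the whole test, while the Stopwatch measures the repeated action itself:

```csharp
using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ActionPerformanceTests
{
    // Coarse safety net: abort the whole test if it hangs far beyond the budget.
    [TestMethod, Timeout(5000)]
    public void Action_Repeated1000Times_StaysUnderBudget()
    {
        var processor = new OrderProcessor();   // hypothetical class under test
        processor.Process();                    // warm-up run (JIT, caches)

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < 1000; i++)
        {
            processor.Process();                // the few-milliseconds "action"
        }
        stopwatch.Stop();

        // Arbitrary budget for 1000 repetitions; would need tuning per machine class.
        Assert.IsTrue(stopwatch.ElapsedMilliseconds < 3000,
            $"Expected under 3000 ms, took {stopwatch.ElapsedMilliseconds} ms.");
    }
}
```

A timeout alone only tells you that the test stayed under a hard cap; the explicit measurement and assertion are what give the rough "general idea" of performance described below.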

Measuring performance this way is ugly. Running a benchmark on an arbitrary machine¹, independently of the hardware, load, etc., is like running a benchmark that shows that one database product is always faster than another. On the other hand, I don't expect those unit tests to produce a definitive result, nor something which is used by the QA department. Those unit tests will be used just to give a general idea of the expected performance, and essentially to alert the developer that his last modification broke something and severely affected performance.

Too many performance tests will affect the time required to run the tests, so this approach is limited to short actions only.

Taking into account those problems, I still find it interesting to use such unit tests if they are combined with the real performance metrics gathered by the QA department.

Am I wrong? Are there other problems which make it totally unacceptable to use unit tests for this?

If I'm wrong, what is the correct way to alert the developer that a change in the source code has severely affected performance, before the source code reaches source control and is verified by the QA department?

¹ Actually, the unit tests are expected to run only on developer PCs with comparable hardware performance, which reduces the gap between the fastest machines, which will never be able to fail the performance test, and the slowest machines, which will never succeed in passing it.

² By action, I mean a rather short piece of code which takes a few milliseconds to run.

4 Answers

We are using this approach as well, i.e. we have tests that measure runtime under some defined load scenario on a given machine. It may be important to point out that we do not include these in the normal unit tests. Unit tests are basically executed by each developer on a developer machine before committing the changes. See below for why this doesn't make any sense for performance tests (at least in our case). Instead, we run performance tests as part of the integration tests.
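As a sketch of how such tests can be kept out of the normal unit test run (one possible setup, not necessarily the one we use; MSTest syntax is shown only because the question mentions Visual Studio, and the test class name is made up), the performance tests can be tagged with a category that only the build server selects:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ImportPerformanceTests              // hypothetical test class
{
    // Excluded from the default developer run, selected explicitly on the CI server.
    [TestMethod]
    [TestCategory("Performance")]
    public void Import_UnderDefinedLoad_CompletesInTime()
    {
        // ... defined load scenario and timing assertions go here ...
    }
}

// On the build server, for example:
//   dotnet test --filter "TestCategory=Performance"
// and in the normal developer run:
//   dotnet test --filter "TestCategory!=Performance"
```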

You correctly pointed out that this should not rule out verification. We do not treat our test as a test of the non-functional requirement. Instead, we consider it a mere potential-problem indicator.

I am not sure about your product, but in our case, if performance is insufficient, it means a lot of work is required to "fix" it. So the turn-around time, when we leave this entirely to QA, is horrible. Additionally, the performance fixes will have severe impacts on a large part of the code base, which renders previous QA work void. All in all, a very inefficient and unsatisfying workflow.

That being said, here are some points regarding your respective issues:

Conceptually: it is true that this is not what unit tests are about. But as long as everyone is aware that the test is not supposed to verify anything that QA should do, it's fine.

Visual Studio: can't say anything about that, as we do not use the unit test framework from VS.

Machine: Depends on the product. If your product is something developed for end users with custom individual desktop machines, then it is in fact more realistic to execute the tests on different developers' machines. In our case, we deliver the product for a machine with a given spec, and we execute these performance tests only on such a machine. Indeed, there is not much point in measuring performance on your dual-core developer machine when the client will ultimately run it on 16 cores or more.

TDD: While initial failure is typical, it's not a must. In fact, writing these tests early makes them serve more as regression tests than as traditional unit tests. That the test succeeds early on is no problem. But you do get the advantage that whenever a developer adds functionality that slows things down, because s/he was not aware of the non-functional performance requirement, this TDD test will spot it. That happens a lot, and it is awesome feedback. Imagine that in your daily work: you write code, you commit it, you go to lunch, and when you're back the build system tells you that this code, when executed in a heavy-load environment, is too slow. That's nice enough for me to accept that the TDD test does not initially fail.

Run-time: As mentioned, we do not run these tests on developer machines, but rather as part of the build system in a kind of integration test.

I am mostly in line with your thinking; I'm just putting up my reasoning as an independent flow.

1. Make it work before making it better/faster
Before the code provides any performance measure (let alone a guarantee), it should first be made correct, i.e. functionally working. Optimizing code which is functionally wrong is not only a waste of time, but also puts impediments in the way of development.

2. Performance of a system makes sense only on the full system
Typically, any meaningful performance figure depends on a given infrastructure and should only be looked at on the full system. For example, if during a mocked test the module receives its answers from a local text file, but in the production environment it fetches them from a database, your earlier measurements say little about the real performance.

3. Performance scaling should be done by objective
Once you have a functional system, you need to analyse its performance and find the bottlenecks to understand where you need to scale up performance. Blindly trying to optimize every method, even before you know the performance of the full system, may incur a useless amount of work (optimizing methods which don't matter) and may make your code unnecessarily bloated.

I am not quite aware of Visual Studio's functionality here, but generally you need a broader profiling tool.

I had a similar task some time ago, and the final solution was somewhere in the middle between unit testing and full-blown automated performance testing.

Some considerations in no particular order, which may be useful:

Performance testing by QA was labor-intensive and had its own schedule (say, once per iteration), so hitting source control was not a problem.

Our system was large and modular, so unit tests were too granular for our needs, and we created special "fat" unit tests carefully crafted to trigger performance problems in the specific areas of interest (they were also categorized, but this is an implementation detail).

The usual constraints for unit tests still apply: they should be small, fast and to the point.

To exclude the test framework's influence, they were run by a special wrapper, so we knew exactly how much time the given operation took (see the sketch after this list).

It is possible to write them before the actual implementation is complete (the results may be irrelevant or useful, depending on the process; maybe the developers are still experimenting with the implementation and would like to see how it's going overall).

They were run by the CI server after each build, so the total run time should be kept relatively short (otherwise it becomes considerably harder to pinpoint the exact change that triggered the problem).

The CI server was powerful and its hardware was fixed, so we counted it as a dedicated machine (it is possible to use a truly dedicated server by using a remote build agent).

The test wrapper collected all relevant information (hardware specs, test names/categories, system load, elapsed time, etc.) and exported it as reports or to a database.

We had a JIRA gadget pulling those reports and drawing nice charts by name/category/build number, with some controls (overlay the previous release onto the current one, etc.), so the developers could quickly see their impact and managers could get an overview (some red, all green; you know, it's important for them).

It was possible to analyze how the project was evolving over time by using the collected statistics.
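To make the wrapper idea above more concrete, here is a rough sketch of what such a measuring wrapper could look like; the names (PerfRunner, PerfResult), the warm-up policy and the collected fields are invented for the example:

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper: times an operation independently of the test framework
// and collects the data that later goes into the reports/database.
public static class PerfRunner
{
    public static PerfResult Measure(string testName, Action operation, int warmups = 3)
    {
        for (var i = 0; i < warmups; i++)
            operation();                         // warm-up runs, not measured

        var stopwatch = Stopwatch.StartNew();
        operation();                             // only the operation itself is timed
        stopwatch.Stop();

        return new PerfResult
        {
            TestName       = testName,
            MachineName    = Environment.MachineName,
            ProcessorCount = Environment.ProcessorCount,
            TimestampUtc   = DateTime.UtcNow,
            ElapsedMs      = stopwatch.ElapsedMilliseconds
        };
    }
}

// One row of the report that gets exported per test run.
public class PerfResult
{
    public string TestName { get; set; }
    public string MachineName { get; set; }
    public int ProcessorCount { get; set; }
    public DateTime TimestampUtc { get; set; }
    public long ElapsedMs { get; set; }
}
```

The calling test then only asserts against PerfResult.ElapsedMs, while the exporter ships the remaining fields to the reports or the database.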

So, in the end, we had a scalable, flexible and predictable system which we could quickly tune to our special requirements. But it required some effort to implement.

Returning to the questions: conceptually, unit tests are not meant for that, but you can leverage the features of your testing framework. I've never regarded test timeouts as a means of measurement; they're just a safety net against hangs and the like. But if your current approach works for you, then continue to use it and be practical. You can always go fancier later if the need arises.

I think you're doing fine. This is exactly the point of having unit test timeouts: to check if something is taking way, way longer than it should. There are limitations to this approach, but you seem to be aware of them already, so as long as you keep those limitations in mind, I don't see a problem.