Thursday, 7 October 2010

Automation benefit measured by EMTE - good or bad?

Being able to run tests that we would not have had time to run manually is one of the benefits of automated testing; we should be able to measure this benefit in some way.

What is EMTE?

EMTE stands for "Equivalent Manual Test Effort" and is a way of measuring the benefit of running automated tests.

If an automated test (A) would have taken 4 hours to run manually, then its EMTE is 4 hours; another test (B) that would have taken 7.5 hours to run manually has an EMTE of 7.5 hrs.

In a test cycle, if Test A is run five times, and Test B is run twice, then the EMTE for that cycle is 5*4 hrs + 2*7.5 hrs = 20 + 15 = 35 hours EMTE.
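As a sketch of the arithmetic (test names and durations are the illustrative figures from the example above):

```python
# EMTE sketch: each automated test is credited with the manual effort it replaces.
# Test names and durations are illustrative, taken from the example above.
manual_effort_hours = {"A": 4.0, "B": 7.5}

# Runs executed in this cycle: Test A five times, Test B twice.
runs_this_cycle = {"A": 5, "B": 2}

emte = sum(manual_effort_hours[t] * n for t, n in runs_this_cycle.items())
print(f"Cycle EMTE: {emte} hours")  # 5*4 + 2*7.5 = 35.0 hours
```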

What is EMTE used for?

EMTE can be used as a way to measure a benefit of test automation (automated running of functional tests).

When tests are automated, they can be run in much less time than they could be run manually. Our tests A and B may be able to run in 5 and 10 minutes respectively, for example. So we can achieve "4 hours' worth" of manual testing in 5 minutes of automated testing. Whenever we run Test A, we can "clock up" 4 hours of EMTE.

Is EMTE a good thing?

Yes, because it is a way to show the benefit of automation.

Costs (of automation as well as other things) tend to become visible by themselves - managers see that people are spending time on the automation. But what is the benefit of this automation? If you don't make the benefits visible to managers, there is a risk that they will not see them, and may eventually conclude that there are none. EMTE is one way to make an automation benefit visible.

So how could it be a bad thing?

I have had discussions with a couple of people recently (thanks Julian and Wade) about abusing EMTE, and yes, it can be abused (as any metric can). Here is how it could be mis-used:

"Test A takes 5 minutes, so let's run it 12 times every hour for 2 hours. This gives 24*4 hours of EMTE = 96 hours. This will make us look really great!"

The problem is that after the first run, the other 23 runs are being done just for the sake of looking good, not for a valuable benefit in terms of running that test. This is an abuse of EMTE, and is a bad thing.

What to do about it?

Use EMTE (and other measures of the benefit of test automation) sensibly.

Perhaps only "count" EMTE once a day, however many times a test is run? (e.g. in continuous integration testing)
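As a rough sketch of that once-a-day cap (test names, run counts, and hours are all hypothetical):

```python
# Sketch of the "count EMTE once a day" idea: CI may run a test many times,
# but each test earns its EMTE credit at most once per day.
manual_hours = {"A": 4.0}

# (test, day) pairs: Test A ran 24 times on day 1 and once on day 2.
runs = [("A", 1)] * 24 + [("A", 2)]

credited = set()
emte = 0.0
for test, day in runs:
    if (test, day) not in credited:  # one EMTE credit per test per day
        credited.add((test, day))
        emte += manual_hours[test]

print(emte)  # 8.0 - not 25 * 4 = 100.0
```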

In what other ways can the benefit of automation be shown? (e.g. more coverage, freeing testers to find more bugs, number of times tests are run, more variety of data used?)

Have you encountered the abuse of this (or other automation) measures? How have you solved the problem? (Please comment, thanks!)

As I commented on your previous post about whether or not automation should find bugs, I think the same thing applies here. Testing (be it automated, manual, done by developer, tester, or management) needs to have a goal. There has to be a point to testing, otherwise why do it? (http://www.wadewachs.com/2010/10/why-do-we-test/ - go there for a bit more discussion on that) Once testing has a goal, all of the tests that are run need to support that goal.

In the article mentioned above, I state that the current company I work for tests our software to prevent defect leakage that ultimately costs the company money. Our goal in testing is to save the company as much money as possible through identifying problems in the software that will adversely affect our customers, ultimately driving them to another company. All of our testing supports this goal.

Tracking how much money we save with every bug we find is impossible, so we use some of the data we have of bugs we have missed in the past to help identify what we could have lost had we let similar bugs into production. These metrics we use tell us about our goals of testing. We track cost in money, because that is the goal of what we are trying to do.

When I see a metric like EMTE, it tells me something about the goals of the tests it measures. Tracking the amount of time it would take to run the same test repeatedly tells me that the goal of testing is to run the exact same test as many times as possible. I think that is a very flawed goal of testing. Running a test for the simple pleasure of seeing your screen turn green when it passes is a waste of time. The point of the test should be to build value in the software in some way, however value is determined in your specific situation.

If that test is automated, and what used to take you 4 hours now takes only 15 minutes, you can bank the 3 hours and 45 minutes as time saved by automation - but only if you actually would have run that test in the first place. If you had no intention of repeating today's test on tomorrow's build, running it anyway and saying you saved another 3.75 hours is not accurate, because you never would have actually spent that time testing.

I think EMTE should be replaced by a metric called AMTE, or "Actual Manual Testing Effort". Count how often the manual test was actually run in the past, then talk about the time saved from not having to manually run that test in the future. If the test historically was only run once a week, then AMTE only measures one test run a week of actual saved time. Sure, we can count the other 10 times it ran that week, and as long as those tests actually help to meet the over-arching goal of the test department, then there is value in those runs, but that doesn't translate into actual hours of time saved.
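A rough sketch of the difference, with hypothetical figures (a 4-hour manual test, historically run once a week, automated and now run 11 times a week):

```python
# EMTE vs AMTE sketch - all figures are illustrative.
manual_hours_per_run = 4.0
automated_runs_per_week = 11
historical_manual_runs_per_week = 1  # how often it was actually run manually

# EMTE credits every automated run with the full manual effort.
emte = manual_hours_per_run * automated_runs_per_week

# AMTE credits only the runs that would really have happened manually.
amte = manual_hours_per_run * historical_manual_runs_per_week

print(emte, amte)  # 44.0 4.0
```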

Like I said at the beginning of this comment, all of our testing should have some goal in terms of creating a better product for our company or our customers. If the metrics we use to measure our testing are pointing in a different direction from our true goal, then we need better metrics. The only direction I have been able to see EMTE pointing is towards feeding inflated numbers to management to attempt to somehow fool them to think automation is more valuable than it is.

Automation is obviously a valuable tool in testing software, but only as far as it helps us meet our goals as a testing department.

You're right, any metric can be abused, so EMTE is no different in its susceptibility to misuse. For example, metrics for manual testing can be equally abused if someone chooses to run the same manual test 10 times rather than 10 different manual tests ("because it is easier / more fun / whatever") and yet report "Number of test runs = 10".

Perhaps the more likely abuse is that of not measuring automation benefit at all. It is this 'not measuring' abuse that we should be wary of.

EMTE isn't perfect but it is one of many ways that can be useful in the right context.

It seems my previous comment on this topic was deleted. Was that intentional?

The topic of my previous comment was that metrics drive action. You have to be careful what action you are incentivizing with your metrics. EMTE puts the incentive on running tests repeatedly with (potentially) little to no value. If it only takes a few minutes for the new automated test to run, it would be very easy to skew this metric by running the test more often, because it is measuring the number of times the test is run.

To me the question is not how careful you are in measuring this, the question is what is measuring this telling you about your process.

No time to go into the rest of my thought process, perhaps Dorothy can find the comment I submitted earlier this month about the topic...

Aditya, thanks for putting my post on your blog - you have some interesting articles there.

Mark, a good point about abusing manual tests as well, and I agree with you about not measuring also being an abuse. And thanks for coming up with the idea for EMTE in the first place!

Software test consultant - best wishes in starting automation. There's a very good book on the subject.... ;-)

Wade, my apologies - I seem to have accidentally deleted your earlier comment when trying to rationalise a duplicate entry on this topic.

I fully agree that metrics drive action - this is how they get corrupted, as in the famous Dilbert cartoon where the Pointy-Haired Manager will pay for every bug found, and the comment is "I'm going to write me a minivan".

I found an email with your earlier post so will try to summarise what you said.

"Testing should have a goal - http://www.wadewachs.com/2010/10/why-do-we-test/ describes saving money as the goal of testing in my current organisation."

"The flaw with EMTE is that it implies the goal of testing is to run the same tests as often as possible."

"If a test takes 4 hours manually and 15 minutes automated, the 3.75 hours is not saved if you wouldn't have run that test anyway."

"We should only count time saved that would actually have been used to run tests (Actual Manual Test Effort) - if a test was normally run once a week, we should only count that, not the other 10 times it was run in the week."

"The only goal I see EMTE pointing at is towards feeding inflated numbers to management to attempt to fool them into thinking automation is more valuable than it is."

"Automation is a valuable tool, but only as far as it helps us meet our goals as a testing department."

I hope I have reasonably summarised your points, and I agree that EMTE can be mis-used by running tests for no good reason.

Your blog article asks "Why do we test?" - a good question. So I ask: Why do we automate?

The main reason is to be able to execute tests in a more efficient way, to be able to run tests that people don't have time to run.

The benefit of automation is not just in being able to run our test once a week, as we could manually. Running it twice a day means that we can find bugs (or gain confidence that there are no unexpected problems) within half a day instead of within 7 days. Surely this is beneficial? And having run the 4-hour test 10 times, this is the equivalent of 40 hours of testing - provided that the tests were not run just for the sake of running them, but as part of continuous integration, for example.

I don't think this is an "inflated number", I think it is a reasonably accurate number which does represent one aspect of the value of automation. If the goal for the testing department is to run tests efficiently and more often, then EMTE reflects that and is a useful measure if used sensibly.

Apologies also to Kai, who also left a comment on the other version of this entry. His comment reads:

"Hi Dot, in Germany and Austria we have a saying which I will try to translate: 'I only believe in statistics I have faked myself.' People think this quote is from Winston Churchill, but there is no proof. I think this is the problem with all statistics: you can use them for many purposes, and a number never tells the whole truth. Why is it bad to arrive at an EMTE of 96 hours? The point is that you have to ask what the purpose is. Some years ago I introduced data-driven tests, using a CSV file for my data. I had 2500 test cases in this CSV file. My boss told me he needed at least 5000 test cases for marketing reasons. I couldn't do this in a short time, so as a joke I told him I could copy and paste all my data and reach 5000 test cases in no time - but we wouldn't test anything new. He said this was a good idea, and that he knew that, but for marketing reasons 5000 test cases would be cool. My tester soul cried, but who am I to evaluate marketing strategies? My point is: if you use metrics or statistics, you need to ask yourself what the purpose is. And you need to be aware that every metric can be misused or faked."

Thanks Kai - very good points. And very sad to hear that your manager is encouraging the blatant mis-use of a metric!

About Me

I have been in software testing for over 40 years. Now an independent consultant in testing and related areas, I am available for short-term consultancy and to speak at seminars and conferences.
I was Programme Chair for the EuroSTAR Conference for 2009.
The new book Software Test Automation Experiences, with co-author Mark Fewster, is now available (on InformIT or amazon).
The 3rd edition of Foundations of Software Testing is also now out, with Rex Black and Erik van Veenendaal.