Thursday, June 30, 2016

It's no secret that I've a huge interest in history. It's one of the reasons I was very proud when my son chose to study the subject at University this year.

This week sees a significant anniversary of a very poignant event to me - the start of the infamous Battle Of The Somme.

Two years into World War One, and things were not going well for the allies of France and Britain on the Western Front. In February, Germany had started a major offensive against the fort of Verdun, which was controlled by the French. The sole purpose of this offensive was to bring French forces into a meat-grinder where they'd take casualties in such numbers it would impact their ability to continue the war, meaning they'd have no choice but to sue for peace.

Desperate to relieve the forces at Verdun with a counter-attack which Germany would be forced to react to, Britain planned an imaginative and daring attack against the German line. In the week before the attack, German lines would be bombarded with over a million shells, and a gigantic mine would be detonated immediately prior to the attack.

British commanders believed this barrage would completely annihilate the opposition - no one would be left alive in the enemy trenches. What would remain for the troops would be to cross no man's land and occupy German positions, preparing them for an inevitable German counterattack.

The first day's offensive was planned in minute detail, with a timetable of follow-up waves who would come forward to support the first waves, together with pre-set later artillery support aimed at where the Germans were anticipated to be.

On July 1st 1916, the massive mine was detonated, and officers blew their whistles to "go over the top" into no-man's land. Within minutes the problems in the plan were being exposed - unknown to British generals, the Germans had deeper trenches, with better fortifications than their British counterparts. As the British army marched in formation into no man's land, the Germans on the other side were already in position, able to cut down whole swathes of men with their machine gun positions.

But the real tragedy of the Somme's first day was the inflexibility of the plan. Wave after wave of later formations were forced to stick to the plan, and follow a previous wave which had been annihilated just an hour before. It's what we now would describe as "doing the same thing over and over again, but expecting different results".

The grave of Horace Iles, who died aged only 16 on the first day of the Somme.

It is still remembered as the worst day in British military history - with 60,000 casualties, including 20,000 dead. The inflexibility of the plan, which saw so many futile losses even when the shortcomings had become obvious, would tarnish the Generals involved as uncaring leaders "far from the trenches", more wrapped up in dreams of glory than in any care for their men.

Such a view is perhaps somewhat unfair - the disaster of the first day of the Somme was a somber wake up call to what was still a relatively new kind of warfare compared with the wars of the 19th Century. Tactics did change (although arguably perhaps not fast enough) and led to ideas such as,

Development of the tank armoured vehicle to attempt to break the stalemate and cross no-man's land

Use of telephone and radio communications to signal back the situation to generals behind the trenches

Integration of artillery to support the movement of infantry, particularly "creeping barrages"

Use of staggered formations to approach machine gun positions instead of approaching in formation

Development of grenades and portable mortars to support troops against dug-in positions

The Battle Of The Somme (which lasted until November 1916) did succeed in taking pressure off the French at Verdun, and prevented the collapse of the Western Front, but at a terrible price. Sadly, even on its best day, trench warfare remained a bloody and costly business.

To me, the Somme is a ghastly reminder about planning. We often think that the more detail a plan is scoped out in, the better. However there are always "things that go wrong". Hence probably the greatest weapon developed was greater flexibility for junior officers to appraise and make their own decisions, over following a plan. This is the hallmark of all modern armies, but one which was forged in blood.

Reading a lot of American testimonies from World War Two, there is a lot of criticism of the British army for being "overly cautious". It's impossible to pin that comment to a single root cause, but after the costly disasters of World War One, caution seems to have worked into the DNA of army thinking.

Resources

Dan Snow's History Hit is doing a series of podcasts on the Somme this week with some of the world's experts. The series has taught me a lot of things I never knew. Start with part one here.

Paul Reed's been sharing a lot of poignant material on Twitter, and his blog is well worth checking out. He talks to Dan Snow here.

YouTube channel The Great War is well worth checking out. Here Indiana Neidell looks at General Haig, the man behind the British offensive in the Somme. He also looks at Arthur Currie here who suggested some changes in tactics such as flexibility of units.

Wikipedia has an interesting page looking at the development of tactics for trench warfare here.

Although just a dramatisation, this Blackadder sketch is well worth a watch,

Wednesday, June 29, 2016

In this next section, we're going to start to explore the use of automated GUI checking. I'm going to focus this down onto browser based GUIs for ease. Indeed most applications today are web-based (at least the ones I seem to test), so this isn't a major assumption.

To help us get there, today I'm going to introduce you to the developer tools on your browser.

For every browser the developer tools are slightly different. For Firefox and Chrome currently all it takes is pressing F12.

These tools allow you to peek at the code that makes up a web page. One of the most useful is the inspection icon which allows you to select an item from the page, and it will show you the code and the name for it. The icon looks like this,

When I select the Google search box on my browser, it shows me this,

And here I am selecting my gender as female for Facebook,

The starting part of thinking about driving GUI tests is understanding some of the page elements of your system. It's something we probably do intuitively as testers, however automation requires we specifically state actions (as we've talked about previously).

Here are some common page elements - if any seem unfamiliar, spend some time looking them up.

Labels and text - any textual information displayed to the user

Fields - this is where you as a user can enter text

Button - an item on the page which causes an action when selected

Checkbox - an item which can clearly be toggled between being selected and deselected

Radio buttons - a cluster of buttons which when selected will deselect any others in the group which have been selected. [Bonus - tell me in the comments below why they are called radio buttons]

Drop-down list - the user is presented a list of items to select

Next time we'll take the next step, and use this understanding of web elements to perform some basic steps.

Extension material

Look up the web elements I've talked about, and think about how they can be manipulated. Drop me a line below on how radio buttons got their name!

Find out what the developer tools are like on other browsers.

Use the developer tools to select different page elements on a range of systems - especially anything you test at work. Have a go at reading the information about the elements, and don't be afraid to try and demystify what you read by asking around or Googling.

So far we've looked in depth at unit and API testing - sometimes it's got a bit technical, and I know some of you will have thrived on this. Others of you will have found it hard going, and today we're going to champion your place within automation, looking through another experience report, and giving you some guidance on what you should have access to on your project as a tester - even if you don't program the scripts!

Revisiting Automation As A Service

Let's start by looking at the automation as a service model we've previously looked at. One of the reasons we evolved this model was because of the terrible friction we've seen on the question of "who should automate". Friction which just hasn't seemed to go away. Fundamentally the arguments go that,

Developers write maintainable code

Testers know the best scenarios to check

Some developers have good testing skills

Some testers have good development skills

Automation then seems to live in a no-man's land between testing and development.

Going back to my original automation experience report, where testers were in charge, my team could create superb ideas for checking. But the resulting automation scripts were not really very maintainable due to the poor level of coding skills.

On the other hand, on a more recent project a few years ago, developers were solely in charge of our checking automation. They would write their own automation scripts, but we testers didn't really know what those scripts covered, or even whether on any day they'd passed or failed. Developers likewise occasionally just ignored failures, because "it always does that", or they felt they had other more pressing priorities.

As I've said, the problem is, I'm still seeing this kind of squabbling on Twitter about "who should automate". However, it was the coverage in Lisa Crispin and Janet Gregory's book Agile Testing which was the first time I saw a model where the automation responsibilities were shared between both testers and developers to get the best outcome possible for the project.

Their work was really behind the model of "automation as a service" which we evolved for an established project where we had to build the automation checking from the ground up to cover an already mature project. We took Lisa and Janet's work, and refined it through consultation and retrospection (after some prototyping) with different roles within the team to find out what they needed/expected.

What we found was testers and developers needed different things from automation. A model that only served one didn't serve the other. Most importantly, a distinct role of automator emerged, who could be a developer, or maybe even a tester, depending on their skills. Their role was creating and maintaining the automation framework, but the key was that they were helping to serve the needs of developer and tester alike.

With this key role defined as outside of tester and developer, this freed us to focus on how testers, especially the non-technical tester, could contribute to automation. [We assumed if the tester was sufficiently technical, they could take on the automator role as well]

Defining scripts

On our project, the developers were clearly initially the ones suitable to take on the role of automator (newsflash: this too has changed over time). So they were given a trial period to "get automating", creating automated checks as they saw fit. About a fortnight into this, we all met up again, and started with the basics ...

Did you cover login with automation?

Yup!

Excellent ... what kind of scenarios?

Erm ... you can login of course.

And?

That's it, why? What else is there to check?

How about if I give the wrong password it doesn't log me in. If I log in incorrectly 3 times it locks my account.

Oh. We'd not thought of that - anything else?

This was a really important conversation, because I saw in the eyes of developers a light bulb go off, and they could see how testers had some important knowledge to drive what kind of checks we should be automating.

In Agile Testing, Janet and Lisa talk a lot about using Cucumber to define tests. I think one of the few criticisms I have of the book is that it's too locked into that tool - something Lisa was keen to avoid when writing More Agile Testing, because technology changes just too rapidly. [For instance, cloud storage and even Skype weren't really much of a thing back when the original Agile Testing book was being authored]

For my team, they didn't use Cucumber, but instead built up a Wiki where testers specified tests and developers would make a note in the Wiki when they were built. Pretty simple, but it worked, and fulfilled that need we'd defined for testers to,

As a tester, even if you're not technical, and some of the ideas of this series have gone over your head, you need an area where you can lay out your ideas for automated checks. It doesn't have to be complex or coded, but as we've talked about, each check needs to boil down to a simple yes or no.

The developers left alone would come up with a shallow example, and maybe that's okay for some features, but for features like login it helps to check a bit more. You need to be able to specify scenarios such as,

Correct details logs user in

Enter a username and correct password.

You are logged in

Incorrect details do not allow user in

Enter a username and incorrect password.

You are not logged in.

Entering incorrect details consecutively three times locks an account

Enter a username and incorrect password three times.

Enter a username and correct password.

The system warns you your account is locked.

Only three consecutive failed login attempts lock an account

Enter a username and incorrect password twice.

Enter a username and correct password.

User is logged in.

Log out.

Enter a username and incorrect password.

Enter a username and correct password.

User is logged in.
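Here's a minimal sketch of how an automator might turn the four scenarios above into checks. It's only an illustration: `LoginService`, the user "alice" and the outcome strings are hypothetical stand-ins I've made up, not anything from our project, and the "log out" step is implicit because this stand-in keeps no session.

```python
class LoginService:
    """Hypothetical in-memory stand-in for the system under test."""

    MAX_CONSECUTIVE_FAILURES = 3

    def __init__(self, users):
        self._users = dict(users)   # username -> password
        self._failures = {}         # username -> consecutive failed attempts

    def login(self, username, password):
        """Return "LOGGED_IN", "REJECTED" or "LOCKED"."""
        if self._failures.get(username, 0) >= self.MAX_CONSECUTIVE_FAILURES:
            return "LOCKED"
        if username in self._users and self._users[username] == password:
            self._failures[username] = 0   # a success resets the count
            return "LOGGED_IN"
        self._failures[username] = self._failures.get(username, 0) + 1
        return "REJECTED"


def new_service():
    return LoginService({"alice": "s3cret"})


def check_correct_details_log_user_in():
    assert new_service().login("alice", "s3cret") == "LOGGED_IN"


def check_incorrect_details_do_not_allow_user_in():
    assert new_service().login("alice", "wrong") == "REJECTED"


def check_three_consecutive_failures_lock_account():
    service = new_service()
    for _ in range(3):
        service.login("alice", "wrong")
    assert service.login("alice", "s3cret") == "LOCKED"


def check_only_consecutive_failures_lock_account():
    service = new_service()
    for _ in range(2):
        service.login("alice", "wrong")
    assert service.login("alice", "s3cret") == "LOGGED_IN"
    service.login("alice", "wrong")   # one more failure after a success
    assert service.login("alice", "s3cret") == "LOGGED_IN"
```

Notice each function name is just the scenario title, so a failure points straight back to what the tester wrote down.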

Exploring passed and failed checks

In automation we tend to be very obsessed with "looking at the green", i.e. the automated tests which pass. If we see a sea of green, everything's good. And we tend to focus on building a system, especially in trials, which focuses on showing us that everything's working.

Trish Khoo has a fun talk on YouTube here about intermittent issues she explored in an automation suite. It's well worth watching it all, but if you're busy find time for just the first 3 minutes, where she talks about a company she worked at where every build passed its checks. Then one day they noticed how fast the test suite ran (30 seconds) ... turned out that the automation suite used only skeleton scripts, which were only coded to return passes (not to actually, y'know, check anything).

This talk was quite influential in our thinking, reminding us that we needed to look into any automation run and actually work out what was happening. Not quite as easy as it seemed! For instance, we started using a numerical structure of "script 1 / 2 / 3" ... which meant we needed to cross reference the Wiki to translate "script 5 failed which is the one where ...".

So we started to use meaningful titles which linked back to the Wiki such as "Only three consecutive failed login attempts lock an account". Each log had screenshots, which helped, but the logs were a bit too technical, so it was hard sometimes to work out exactly what was going on, and why a check had failed. So we aimed to make the logs more readable, especially to the non-technical, and to include the information used.

As all this started to fall into place, the testers also (much like in Trish's talk) started to notice that some of the scripts which were always passing weren't quite right. Sometimes they'd miss a vital final assertion step (the thing that was actually supposed to be checked), or weren't doing the final commit from a vital back-office review. Some of these things were found quite late, with testers thinking "oh, we have a check for that scenario, we're covered" - but sadly there was a defect lurking hidden in that last, missing assertion (we'll come to this again later).
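To make the "missing final assertion" problem concrete, here's a small sketch. The `login` function is a hypothetical system under test that I've given a deliberate defect (it never looks at the password), and the check names are made up for illustration.

```python
def login(username, password):
    """Hypothetical system under test with a deliberate defect:
    it never looks at the password."""
    return "LOGGED_IN"


def check_wrong_password_no_assertion():
    """Runs the steps but never compares the outcome with an expectation."""
    login("alice", "wrong-password")
    return "PASS"   # reached no matter what login() returned


def check_wrong_password_with_assertion():
    """Same steps, plus the final assertion the first check is missing."""
    outcome = login("alice", "wrong-password")
    return "PASS" if outcome == "REJECTED" else "FAIL"
```

Both checks cover "the wrong password scenario" on paper, but only the second one can ever go red, which is why it pays for testers to read what a script actually does and not just its title.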

Rerun scripts

Finally and simply, if a tester noticed something simple like "notice these checks failed due to a network problem, it's back up now", they needed to just be able to rerun a suite (or part of it), not have to wait for a developer to do it for them.

All these experiences became the centre-piece for our defined tester role,

If you're a tester on a project, particularly an agile project, and your automation process does not allow you

To define checks you want to be scripted up

To view whether checks have passed or failed, including the detail of what the automated check does

To be able to rerun the automation when it fails

Then you need to have conversations with your team about this. It's really important that testers are not shut out of the process, even if they're not in the automator role. Because being able to input to this process, and see what's going on is pivotal to allowing manual testers to make decisions on where to focus their attention when they do manually check.

An automation process where manual testers perform the same checks as the automation has failed to bring any efficiency to the testing process. Manual testers need to work with and beyond the automation - something we'll look into much later in the series.

Wednesday, June 22, 2016

I was initially somewhat amused to find my Facebook account suspended on Tuesday. Basically it was telling me "someone has complained ... you must use your REAL name".

So I dutifully entered my real name, Michael Talks. Each time it rejected it, reminding me to use the name I was born with. Erm, yeah - people have been joking about it since school! In fact it's so bad that even with a simple word like Talks, I have to spell it to everyone I give my name to. Because they just don't believe it. This brings out my inner Samuel L. Jackson, and I often look a little like ...

Finally on the third attempt at using my real name it kind of went "okay - prove that's really your name". So I had to upload a copy of my driving license, smudging critical numbers, and Facebook promises me if they validate me they'll delete all copies of that document.

Until I'm authenticated, I can't get in and all I can see is ...

That just left me with the question of who complained. Most of my Facebook friends are from school, university or work - they know it's me. Most of my posts of late are cute dog pictures to my wife especially, who I'm trying to convince to let us have a dog.

As you can imagine it's been a week of people trying to re-evaluate the facts about gun violence, especially in America. A lot of people want to have a discussion about what should be allowed in terms of guns. No-one's talking about banning them entirely, just about "why do you need that assault rifle?".

So on one such post like this on Betty Bowers,

Talk about bringing out the crazies! One person posted this rant about "I need my assault rifle to protect my family from tyranny". A kind of abusive, somewhat white supremacist stance.

My first reaction with my ironic British humour was, "you don't think Red Dawn was real do you?" (sometimes I'm not sure). My actual response, given their temperament, was much worse: "Martin Luther King took on tyranny without needing a weapon".

I just painted a target on myself with that one. Maybe calling the Civil Rights movement one against tyranny is a bit much - but you know those on the side of Civil Rights movement often had riot police and National Guard called to stop them. They were intimidated, and frequently put in jail. Their supporters were occasionally murdered and subject to terrorist bombings - many of which were not initially investigated.

Anyhow, before I knew it, there was a whole load of abuse being put at me for that comment. That's pretty much where I made my second mistake, I ignored it.

It looks like I managed to infuriate someone really badly, because the next thing I knew on Tuesday, my account was suspended and Facebook was investigating it. Because I ignored their other intimidation, they had to find a way to "get me" and remove my inflammatory comments. Not only did I remind him of racial equality, I said that "guns don't stop tyranny". Especially in an age where there are drones which can eliminate you by remote control from the other side of the world if that "tyrant" (I think what you and I would call Obama) in the White House wants. You also live in a country where your nation spends more on defence and armament than most of the world combined. But you're going to keep them at bay with that assault rifle you got on special at the Ammo Hut?

In the scheme of internet hostility and troll-dom it's not a big deal. Even if Facebook never lets me back on, I'll survive.

What it is though is a nasty reminder of human nature. We've talked a lot this year about fallacies and facts. It's easy to get into fallacious thinking - in the light of the latest shooting, some of the same facts have been circulated again.

But we sometimes make the mistake of thinking people want to think rationally. Having an assault rifle makes me feel powerful and safe, so don't you dare use facts on me. When you try and force any kind of conversation about this on a rational level, the lack of facts on the fallacious side erupts into outright hostility and hatred. I was saddened last week to be reminded this isn't a uniquely American problem, as MP Jo Cox was gunned down in the UK "as a traitor" for trying to persuade people to vote to remain in the EU in a current British referendum on the subject.

It's sad, but very human, that we cling to our fallacies so violently. Today, I have no easy answers, but to say it's a thing.

Previously we've been looking at ways we can use APIs to aid our scripted automated testing. There are two additional ways APIs can be helpful which I want to cover.

The first, using them to perform load tests, I'll cover much later in this series.

The second, using them to aid manual testing, will be our focus today. It's really a little off-road from the theme of automated checking scripts, but worth covering for completeness.

Thus in our series on APIs we've been talking about the architecture at our example institution, SimpleBank,

With the testing we're doing, we've been taking a top-down approach. Replacing a module at the top with an API call, and using it to drive the behaviour of our system, and compare it against expectations. Pretty much like the ultimate remote control of the bank systems.

It's a pretty important approach. Unlike unit checking, more of your system is being checked - your system under test resembles your final system more closely than with unit testing. And yet you're avoiding a lot of the overhead of sending complete web pages of information back and forward across a connection. Sending requests is pretty light work, as we've seen from our example API calls.

However we can also choose a bottom up approach. Let's say for example that we're radically reworking the internet banking function, and it's going to take a while to redesign the pages. This modification will also change the API calls on which the pages are dependent - so we can't test until both are done?

API tools allow you to run a mock service,

This allows you to fool your web project that it's connected to a banking system. When your internet banking application makes a call, your API tool will return a predefined response, "a mock reply". You set up in your API tool a series of mock calls, which will typically always return the same response for a function call (although you can customise them a little).

Hence for instance, you can do a balance request, commit a payment transaction, then check your balance again (which is static and unchanged). Your balance always returns the same number you've defined when you set up the mock service.

The focus here isn't on the change of numbers, but on the fact that when you enter your data and press buttons, the requests are fulfilled as expected, and nothing falls over on the browser side. This should make for fewer problems once you do link your new internet banking pages with the modified back end, because you'll have removed the obvious problems early.
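If it helps to picture it, the mock service described above boils down to something like this sketch. `MockBankService` is a hypothetical name of mine, the balance and account number are invented, and I'm assuming a FailCode of 0 on a successful transaction. Real API tools do this through their own configuration, not through code like this.

```python
class MockBankService:
    """Hypothetical mock of the banking back end: every call gets a canned reply."""

    def __init__(self, canned_replies):
        self._canned_replies = canned_replies
        self.calls = []   # record of every request received, handy for debugging

    def call(self, function_name, **request):
        self.calls.append((function_name, request))
        # The request contents are ignored: the reply is always the predefined one.
        return dict(self._canned_replies[function_name])


mock = MockBankService({
    "BankBalance": {"TotalBalance": 250, "FailCode": 0},
    "BankTransaction": {"TransactionSuccess": 1, "FailCode": 0},
})

before = mock.call("BankBalance", BankAccount="12345")
payment = mock.call("BankTransaction", BankAccount="12345",
                    CashAmount=50, TransactionType="DEBIT")
after = mock.call("BankBalance", BankAccount="12345")
```

The payment "succeeds", yet `before` and `after` hold the same balance, which is exactly the static behaviour described above.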

Tuesday, June 21, 2016

Last time we took an initial look at APIs and how to do simple checks with them.

Today we're going to continue to look at our sample application, SimpleBank, and come up with some more checks we can do which will explore more what you can do.

In most API checking tools, you can daisy chain a number of requests together, and perform an assertion on it. We were doing this in a way before, when we'd do an initial load of credit onto an account, and we'd expect to see the TransactionSuccess field return as 1 or True.

Ideally with any automated check you want the system to have an expected return, and to check the system's actual response against that pre-programmed expectation. SoapUI, in line with JUnit, calls any check which results in either a pass or a fail an assertion.

Use of assertions allows you to run a huge volume of checks, but you only have to bother a human being about the ones which come out different from what was expected.

This is in essence the basis of our First Iron Law of automation,

If you rely on a human interpreting everything, you've got yourself a nice aid for testing, but you're not really embracing the power of automation. You've got a superfast machine, that relies on human interpretation. Worse still, you're tying up time of a human operator who could be better used doing manual exploratory testing.
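As a rough sketch of that idea, the function below compares each response with its expectation and hands back only the mismatches. The check names and response values are hypothetical and hard-coded here; a real run would get the actual responses from the API calls.

```python
def failed_checks(checks):
    """Compare each actual response with its expected one; return only the mismatches."""
    failures = []
    for name, actual, expected in checks:
        if actual != expected:   # this comparison is the assertion
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


# Hypothetical (name, actual response, expected response) triples.
checks = [
    ("credit is applied", {"TransactionSuccess": 1}, {"TransactionSuccess": 1}),
    ("debit is applied", {"TransactionSuccess": 1}, {"TransactionSuccess": 1}),
    ("balance is found", {"FailCode": 888}, {"FailCode": 0}),
]

for line in failed_checks(checks):
    print(line)
```

Three checks ran, but only the one that differed from its expectation is put in front of a human.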

Add-in another API tool layer

Last time we'd replaced our inter-bank service with an API tool layer we could drive our checking from.

Today we're going to simplify it a bit more, because it's only really the service layer and below we're interested in checking...

This time we've added another API tool substitution which replaces the internet banking service. Here we're again interested in only one function today, BankBalance, which looks like,

Where,

BankAccount is the account number we want the balance of

Where,

TotalBalance is the balance of the account requested (negative if overdrawn).

FailCode returns 0 if account number was found, else 888

We now have the potential to run some more interesting checks on our system combining last time's BankTransaction call with today's BankBalance call.

Most importantly, we can do what SoapUI calls "property transfer", that is, using the response from one function elsewhere.

I'm going to specify two such scripts, which use a combination of these API calls, together with transferring values.

Check 1: Apply a credit

This is what you can put together as a scenario to be automatically checked,

If any of these assertions is not true, the script will fail, and alert the user.
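Here's one plausible way a credit check with a transferred value could look in code. It's a sketch under assumptions: `FakeSimpleBank` is a hypothetical in-memory stand-in, the account numbers and amounts are invented, and I'm assuming a FailCode of 0 on a successful transaction. In SoapUI itself you'd wire the same steps up in the tool.

```python
class FakeSimpleBank:
    """Hypothetical in-memory stand-in for SimpleBank's service layer."""

    def __init__(self, balances):
        self._balances = dict(balances)   # account number -> balance in dollars

    def bank_balance(self, bank_account):
        if bank_account not in self._balances:
            return {"TotalBalance": None, "FailCode": 888}
        return {"TotalBalance": self._balances[bank_account], "FailCode": 0}

    def bank_transaction(self, bank_account, cash_amount, transaction_type):
        # Sketch only: assumes the account exists and ignores withdrawal limits.
        sign = 1 if transaction_type == "CREDIT" else -1
        self._balances[bank_account] += sign * cash_amount
        return {"TransactionSuccess": 1, "FailCode": 0}


def check_apply_credit(bank, bank_account, cash_amount):
    first = bank.bank_balance(bank_account)
    assert first["FailCode"] == 0
    starting_balance = first["TotalBalance"]   # the "property transfer"

    reply = bank.bank_transaction(bank_account, cash_amount, "CREDIT")
    assert reply["TransactionSuccess"] == 1

    second = bank.bank_balance(bank_account)
    assert second["FailCode"] == 0
    # The expectation is built from the value carried over from the first call.
    assert second["TotalBalance"] == starting_balance + cash_amount


check_apply_credit(FakeSimpleBank({"12345": 200}), "12345", 100)
```

The point of carrying `starting_balance` forward is that the check doesn't need to know the balance in advance, only that it went up by the amount credited.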

Again, you don't have to feel absolutely comfortable writing all that up in SoapUI - look back to our series on automation as a service. The person who writes these kinds of scripts we defined as the automator ...

But don't forget we separately defined the role of tester, and their role was important too,

An important part of the tester role in automation was being able to specify a good script that could be automated. We essentially did that both today and yesterday. However today we did it without using any pseudo-code, and that's okay as long as our intended sequence is totally clear.

Monday, June 20, 2016

Previously we were looking at unit checking using frameworks such as JUnit - they allowed us to perform very rapid testing on a small component. Today we're widening our scope, looking at API automation tools.

Now I really don't want to seem to be old, but back when I started programming as a kid, most "systems" we programmed were a single application/program.

These different layers communicate to become "one system" through API or application programming interface protocols. Before you know it, you've created a monster ...

Let's consider a simplified bank, SimpleBank ...

Let's start from the bottom and work up, as with most multi-layered architectures. Typically in such structures, your bottom is your most secure layer, which contains your data, and your top layer is how everyone accesses it.

Data access layer - this is the database, which keeps record of customer details, individual transactions, balance.

Business layer - this one works out things such as interest, works out if you have enough in your balance to pay a payment request and then debits your balance, applies a credit to your account etc. It's essentially where all the bank's business rules are applied, linking requests from the service layer to records from the data access layer

Service layer - here the requests from all the different interfaces to be processed are managed.

Payment gateway - this is where payment requests (EFTPOS in NZ) are received from

ATM - where balance and withdraw cash requests come from

Internet banking - where balance, transaction details and payment requests can be sent from

Inter-bank connection - where once an hour credits are received from other banks, and payments sent out to other banks.

I've worked on similar systems, and believe me, they can get scary. How do you get such a complex system into a test environment to test? For instance, I've only done one project where the "inter-bank connection" was actually connected to other banks - because think about it, that means you need another bank's complete application linked/part of your own test environment.

API tools allow us to select a point in a system, send mock messages to the incomplete environment, allowing us to mimic a completed environment. We're able to run our checks against a much more completed system than in our unit tests previously, where the checks were run only against each code component.

All that might sound confusing, especially if you're new. Which is why we're going to explore it with an example.

So take our simplified bank - we want to test the bank system really from the service layer down. As we said, we don't have inter-bank connections to replicate other banks in our system, so we'll need to mock that as a service.

Now between the inter-bank and service layer there is only really one key message we're going to worry about in this series, BankTransaction, which allows flows of money to and from SimpleBank. This inward message (to SimpleBank) looks like this in our tool,

Where,

BankAccount is the bank account number of the relevant SimpleBank customer for whom the transaction applies.

CashAmount is the amount of money involved

TransactionType is either CREDIT or DEBIT.

Once received and processed, there is a response sent back,

Where,

TransactionSuccess indicates if it was applied alright, 1 for success, 0 for a fail

FailCode if the transaction fails, it provides a code to give more information
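If it helps to see the shape of the message in code, here's a sketch of the request and response as plain data structures. The field names come from the descriptions above; the Python class names, the types, and the use of `None` when there's no fail code are my assumptions, not how the tool represents them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BankTransactionRequest:
    bank_account: str        # BankAccount: the SimpleBank customer's account number
    cash_amount: int         # CashAmount: the amount of money involved
    transaction_type: str    # TransactionType: "CREDIT" or "DEBIT"

    def __post_init__(self):
        if self.transaction_type not in ("CREDIT", "DEBIT"):
            raise ValueError("TransactionType must be CREDIT or DEBIT")


@dataclass(frozen=True)
class BankTransactionResponse:
    transaction_success: int          # TransactionSuccess: 1 for success, 0 for a fail
    fail_code: Optional[int] = None   # FailCode: extra information when it fails


request = BankTransactionRequest("12345", 1010, "CREDIT")
response = BankTransactionResponse(transaction_success=1)
```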

Typically within most API automation tools, you can configure a suite of API call checks, and determine the set responses you want.

Let's test some simple business rules at SimpleBank

Here are some basics we'll check today. At SimpleBank, all bank credits and debits are applied immediately, with no clearing time at all!

However customers are not able to withdraw more than $1000 a day (that's NZD - not that it really matters, but I'm a Kiwi). If you break this limit, you get an 888 fail code.

Scenario 1: Maximum withdrawal

Pretty simple and obvious, but we'll first apply a credit of $1010, then withdraw $1000. This should go through (only just).

Scenario 2: Withdrawing too much

Don't we love testing a boundary - so this time is the same as the above scenario, but this time we try and take out $1001, which should fail.

Scenario 3 - lots of transactions

A favourite check of mine. Give an account $1010 as before, then try two $500 withdrawals, followed by a $1 one. The first two should clear, the third should cause an error code of 888.

Scenario 4 - I can spend more money tomorrow

Not everything is suitable to automate. Are you thinking of a check which sees if you can spend more tomorrow if you've already spent $1000?

This is actually a bad choice of check to use the API for. We can't control the time, so it would be a 25 hour test. It would be better as a unit test, if we can mimic a change in time, or alternatively as a manual test. But a test which takes so long to run, especially when it's part of a suite of tests, isn't great.
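To make the four scenarios concrete, here is a minimal in-memory sketch of the daily limit rule, written so that the clock can be moved by hand. It is not SimpleBank's real code (this series never shows that): the class names, the OK value of 0 and the AdjustableClock helper are all made up for illustration, and overdrafts are deliberately not modelled. The point is that scenario 4 becomes instant once time is something the check controls.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class SimpleBankSketch {
    static final int DAILY_LIMIT = 1000;    // dollars per day, from the rule above
    static final int LIMIT_FAIL_CODE = 888; // fail code from the rule above
    static final int OK = 0;                // hypothetical "no fail code" value

    /** Hypothetical test clock that only moves when the check tells it to. */
    static class AdjustableClock extends Clock {
        private Instant now;
        AdjustableClock(Instant start) { this.now = start; }
        void advance(Duration amount) { now = now.plus(amount); }
        @Override public Instant instant() { return now; }
        @Override public ZoneId getZone() { return ZoneOffset.UTC; }
        @Override public Clock withZone(ZoneId zone) { return this; }
    }

    private final Clock clock;
    private LocalDate currentDay;
    private int balance;
    private int withdrawnToday;

    SimpleBankSketch(Clock clock) {
        this.clock = clock;
        this.currentDay = LocalDate.now(clock);
    }

    void credit(int amount) { balance += amount; }

    /** Applies a debit and returns OK, or LIMIT_FAIL_CODE if the daily limit would be broken. */
    int debit(int amount) {
        LocalDate today = LocalDate.now(clock);
        if (!today.equals(currentDay)) { // a new day resets the running total
            currentDay = today;
            withdrawnToday = 0;
        }
        if (withdrawnToday + amount > DAILY_LIMIT) {
            return LIMIT_FAIL_CODE;
        }
        balance -= amount;
        withdrawnToday += amount;
        return OK;
    }

    public static void main(String[] args) {
        AdjustableClock clock = new AdjustableClock(Instant.parse("2016-06-30T09:00:00Z"));

        SimpleBankSketch s1 = new SimpleBankSketch(clock);
        s1.credit(1010);
        System.out.println("Scenario 1: " + s1.debit(1000)); // exactly on the limit

        SimpleBankSketch s2 = new SimpleBankSketch(clock);
        s2.credit(1010);
        System.out.println("Scenario 2: " + s2.debit(1001)); // one dollar over

        SimpleBankSketch s3 = new SimpleBankSketch(clock);
        s3.credit(1010);
        System.out.println("Scenario 3: " + s3.debit(500) + ", " + s3.debit(500) + ", " + s3.debit(1));

        // Scenario 4: s1 has already used today's limit, so move the clock on a day.
        System.out.println("Scenario 4 today: " + s1.debit(1));
        clock.advance(Duration.ofDays(1));
        System.out.println("Scenario 4 tomorrow: " + s1.debit(1));
    }
}
```

Scenarios 1 to 3 are the same checks you'd configure in an API tool; only scenario 4 needs the adjustable clock, which is why it sits more comfortably at the unit level.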

Next time

We'll be looking at more in depth things we can do with such tools in our SimpleBank example. We will then round out looking at API's by asking that question "I'm not technical, how much do I need to know?".

If this is worrying you, look at our scenarios 1-4, and realise at their core, they are just sensible scenarios we'd want to do manually. It's just we're using a mechanism. We don't always have to understand the mechanism as testers, but we do need to know what makes for a good check.

But that's in the future, and ...

Extension material

I've been reacquainting myself with SoapUI for this series, and noticed after 3 years, some of it has changed. So I will be avoiding slaving myself to the how-to's of just that tool. The aim of this series is an introduction to the concepts and thinking about what makes good checks.

You can download SoapUI here and play around with the sample API service to get the feel. There is a great series of instructions here, including installing, setting up your first project and some of the useful features. SoapUI also have an introduction to testing API's here.

As Lisa Crispin mentioned, you might want to try out other tools - so Google around.

If you want to delve more into the theory of API web services, Katrina Clokie has a useful article here which collects resources she's found useful.

Thursday, June 16, 2016

We've previously been looking at performing some checks on a simple dice rolling function.

I love using random numbers - they're very simple to understand, but hide very complex behaviour.

The examples we used flushed out some huge problems - automation works best when there's an indisputable condition to check. If X happens, it's a pass, if Y then it's a fail.

This is how many see testing ... but in truth this is really a check.

When test automation tools started appearing at the end of the last century, people knew they would be useful, but that they also had limitations. It was really Michael Bolton and James Bach's writing on testing vs checking in 2009 which was the first time I'd seen this put into a model.

In this work, they looked into the core of what we do when we perform testing. And the truth is manual testers typically go way beyond "just checking".

For instance any good tester when doing their job never really says "fail", it's almost always more "well that's odd ... THAT shouldn't happen". Rather than just fail a step and continue to proceed, there is almost always some investigation to understand and clarify the behaviour seen. Likewise, even though an action is as written and expected, there might be something outside of the script or original scope of testing which the tester notices is odd.

Testing is an exploratory and investigative act. This is how testers dig deep to find and investigate oddities they encounter - the process looks much more like this ...

[By the way - thanks to Michael Bolton and John Stevenson for feedback on this diagram]

Okay, the process of checking is at the core of this action. However, automation does not replicate the whole detailed process in red - it's not able to go beyond the parameters it's scripted with.

We saw that with the random dice test previously. Take a moment to think about the ways we suggested to test it. In reality as a human tester you'd probably keep rolling a dice until you'd seen all the numbers 1-6. How many dice rolls would be too many for that? As we said, you wouldn't have a hard rule, you'd just raise an issue if you think you'd rolled too many. An automation script has to be given a limit.

As a human tester you follow the #testing process above, you start not knowing what number is too many, but as you go along, you might flag an issue if you think it's too many "I've rolled the dice 20 times, and not seen a 6 yet".

Likewise, if I asked you to roll the dice and tabulate the result for 60 rolls, we'd expect to see 10 rolls of each number. The reality is that's very unlikely - what will you do if it's only 9 ... or 8? Deciding whether to flag an issue, or try another test is an application of human judgement.

In my initial attempt to do random distribution for my Java function notice how I tested it - I ran 6 million dice rolls, expecting a clean million each. I didn't get that, ergo FAIL. So I tried increasing to 600 million. Then I tried re-running the test - some numbers came above the magic 100 million figure, some below. What I did over multiple runs was check to see the same numbers weren't consistently coming out as either above or below 100 million.

The problem was though - this wasn't an automated check, I was using the automation to run a mass check, and using manual judgement. Whilst it's useful, that's not the purpose of a unit test - it needs to reduce to pass or fail.

In the end it comes down to alarm thresholds - I roll six million dice, I expect each number to come up a million times, give or take a fraction. If I choose this threshold too low, it will always fail, and hence people will ignore the test after a while. If I set it too high, it's really unlikely to fail, and hence it's not really checking for anything useful, so is a waste of a check (worse still, it might give me false confidence).

Designing a good automation test then is an exploratory act, as you find out if it's suitable. It even requires a certain level of testing skills to refine and fine-tune. But in the end,

But it will never find problems beyond what it's programmed to check for.

Some people see that as a threat to testing as a career and livelihood. I don't - it's actually an incredibly liberating thing, it means my time as a tester is best used exploratively and imaginatively as previously discussed, rather than running through dull checklists of items I seem to check off every two weeks, some of which I almost tick off on autopilot.

It allows us to push for testing to be a challenging, stimulating, intelligent career. Anyone can check. Automation can check. I am a tester, and I'm so much more.

Wednesday, June 15, 2016

Previously we were looking at testing random number distribution, which was a little complex. I have one more test to include before we look at summing up unit testing.

Anything else worth testing?

So far we've checked that each number 1-6 occurs when we define a 6 sided dice. We've also checked over large data sets that each number comes up about 1/6 of the time within a tolerance.

Today's test is less obvious, simple, but really useful. Do only the numbers 1-6 come up? We want to check there's no occasional 0, 7 or anything else.

The way to do that is a modification of our test from yesterday - we'll run 600,000 times, and confirm all the values for 1, 2, 3, 4, 5 and 6 add up to 600,000.

JUnit is really happy with that - which is probably the simplest test we've done so far!
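As a rough sketch of that check (not the original test, which isn't reproduced here), the idea is to tally only the faces 1-6 and confirm the tallies account for every roll. The roll method below is a stand-in built on java.util.Random, and the class and method names are assumptions rather than the Dice class from this series.

```java
import java.util.Random;

public class DiceRangeCheck {

    /** Stand-in dice roll; the real Dice class from this series may differ. */
    static int roll(Random rng, int sides) {
        return rng.nextInt(sides) + 1;
    }

    /** Rolls totalRolls times and returns how many results landed on faces 1..sides. */
    static int countInRange(Random rng, int sides, int totalRolls) {
        int[] counts = new int[sides + 1]; // index 0 is unused
        for (int i = 0; i < totalRolls; i++) {
            int result = roll(rng, sides);
            if (result >= 1 && result <= sides) {
                counts[result]++; // a stray 0 or 7 is simply never tallied
            }
        }
        int sum = 0;
        for (int face = 1; face <= sides; face++) {
            sum += counts[face];
        }
        return sum;
    }

    public static void main(String[] args) {
        int totalRolls = 600_000;
        int inRange = countInRange(new Random(), 6, totalRolls);
        // If any roll fell outside 1-6, the tallies would not add up to the total.
        System.out.println(inRange == totalRolls
                ? "PASS"
                : "FAIL: " + (totalRolls - inRange) + " rolls outside 1-6");
    }
}
```

In a JUnit test the final comparison would become a single equality assertion between the sum and the total number of rolls.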

When is unit checking appropriate?

Yesterday we were running a check that used 600 million dice rolls. Let me Biblically put that into perspective ...

In the Bible, it's said that soldiers at the feet of Jesus's crucifixion threw dice to divide up his possessions. Let's suppose for a moment that Jesus cursed them (although it would throw out all Jesus's teachings if he was that kind of a guy) to stay there until they'd rolled their dice 600 million times.

Well, assuming a roll every 2 seconds (I'm a wargamer, I know how long dice take to roll), they'd be there for nearly 40 years. With no sleep or toilet breaks.

But all this shows how in 21 seconds of unit testing we've achieved more than many years of real life experience. Okay - now let's assume we wanted to do the same test with a GUI check system. I know we've not really covered GUI checks yet. But as I've said, a limiting factor vs unit checks is screen refresh.

Let's say we're able to reduce the screen refresh problem down to even 0.01 seconds. We're still looking at about 70 days to run the same test which took 21 seconds on my machine. And believe me, my machine isn't state of the art, but you're welcome to donate me a Macbook Pro if you want.
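The arithmetic behind those two figures is easy to check for yourself. This small sketch just multiplies out the numbers quoted above (600 million rolls, 2 seconds per hand roll, 0.01 seconds per GUI refresh); the class and method names are only for illustration.

```java
public class RollTimeEstimate {
    static final long TOTAL_ROLLS = 600_000_000L;

    /** Total seconds needed if each roll takes secondsPerRoll. */
    static double secondsFor(double secondsPerRoll) {
        return TOTAL_ROLLS * secondsPerRoll;
    }

    /** Rolling by hand at one roll every 2 seconds, expressed in years. */
    static double yearsByHand() {
        return secondsFor(2.0) / (365.25 * 24 * 3600);
    }

    /** A GUI check limited to one roll per 0.01 second refresh, expressed in days. */
    static double daysByGui() {
        return secondsFor(0.01) / (24 * 3600);
    }

    public static void main(String[] args) {
        System.out.printf("By hand: about %.0f years%n", yearsByHand());
        System.out.printf("Through a GUI: about %.0f days%n", daysByGui());
        // 21 seconds is the unit check run time quoted above.
        System.out.printf("GUI vs unit check: about %.0f times slower%n", secondsFor(0.01) / 21);
    }
}
```

It comes out at roughly 38 years by hand and roughly 69 days through the GUI, which is where the rounded figures above come from.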

What this does show though is how much faster unit checks can be than GUI checks. If they're appropriate you can do checks which are orders of magnitude faster than what you can achieve with GUI checks.

This leads back again to the third iron rule - we always want to make a check we want to run as simple as possible. If we want to run a lot of checks in the volume we did previously, it makes sense to run them where they are simplest and fastest - in the unit check area.

But there's a drawback

How many of you testers out there can see what's in your developer's unit tests? They're kind of squirreled away - you assume if it built, it must have passed, right?

We don't tend to make these things visible, but we should. Because as we've shown they can be useful at checking an aspect of the system in depth, and to a volume that's ordinarily unfeasible.

Extension material

Ask around your current project about the unit tests used. Can someone help you to see them? Can you suggest some?

If you're following with Eclipse, try writing some similar tests for 8 or 20 sided dice.

If you want a real challenge, create a parent class which rolls multiple dice, and create some JUnit tests for that.

We're looking at unit checking a Java dice rolling class - see last time for more information.

Previously by necessity we occasionally had to show some technical details, and raw Java code - I'm hoping those of you less technical are still with me.

Today will have less of that - I'll specify parts of code, so those of you experimenting with Eclipse yourselves can keep up and experiment, but it will be less important.

Distribution of numbers

Okay - I've got a test which assures me all the numbers 1-6 come up. But that doesn't address my major concern, which is I'm worried that the number 1 will come up significantly less than numbers 2, 3, 4, 5 or 6.

How would I manually test this? Well as before, I'd keep pressing "roll again" and keep a table of the results. I'd expect them to look evenly spread, 1/6 each. Ish.

Again - great test, but there's a clause there like above "I'd keep doing until I'm satisfied / dissatisfied".

Automation needs a hard and fast rule. Welcome to the world of computers - things have to either be a 1 or a 0, a pass or a fail. And "until I'm satisfied" is not hard or fast.

Now ... there's something which is often quoted about random numbers. If you sample over a large enough sample set, then you should see on your dice rolls a perfect split of 1/6 for each number using a 6 sided dice.

When I've manually tested this before (believe me I did statistics at University and I did this one night with my best friend David Farish), I noticed that the results do smooth out a bit. However I'd usually get bored at about 200 dice rolls.

However in my automated unit check, I have the power to sample over 6 million dice rolls. Over such a huge number, everything should even off to a cool million each. So I'm going to rename and reuse that code I had yesterday.

Oh - that failed. I thought over such a large number, it'd all even out. When I ran this, I genuinely thought it might be within 1 or 2 of the target number.
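With hindsight, some quick statistics would have warned me. If the dice is fair, the count for any one face follows a binomial distribution, so its typical wobble (standard deviation) is the square root of rolls x 1/6 x 5/6. The sketch below just evaluates that standard formula; the class and method names are made up for illustration.

```java
public class ExpectedSpread {

    /** Standard deviation of the count of one face over totalRolls rolls of a fair dice. */
    static double standardDeviation(long totalRolls, int sides) {
        double p = 1.0 / sides; // chance of any single face on a fair dice
        return Math.sqrt(totalRolls * p * (1 - p));
    }

    public static void main(String[] args) {
        long[] rollCounts = { 6_000_000L, 600_000_000L };
        for (long rolls : rollCounts) {
            System.out.printf("%,d rolls: expect %,d per face, typically off by about %.0f%n",
                    rolls, rolls / 6, standardDeviation(rolls, 6));
        }
    }
}
```

For 6 million rolls the typical miss is around 900 either side of the million, so landing within 1 or 2 of the target would itself have been suspicious. Rolling more makes the miss smaller as a proportion, but bigger as a raw number.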

Maybe I need to roll more. Let's set our total rolls for 600 million, and see what happens?

It takes about 21 seconds to run, but ...

Maybe if I try again?

Well, obviously I've not run enough dice yet, let's set it to 60 billion!!! And that dear readers is where I break my software completely because

Basically I've pushed the number of rolls so high that Java can't handle it unless I change a lot of the data types I'm using.
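The likely culprit is Java's int type, which tops out a little above 2 billion. That's an assumption on my part about which data types were involved, but it fits: 600 million fits in an int comfortably, 60 billion does not, and you'd need to move the counters over to long. A minimal sketch of the limit, with made-up names:

```java
public class RollCountOverflow {

    /** True if the value can be stored in a Java int without overflowing. */
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long sixHundredMillion = 600_000_000L;
        long sixtyBillion = 60_000_000_000L; // needs the L suffix; a plain int literal this big won't compile

        System.out.println("Largest int: " + Integer.MAX_VALUE);
        System.out.println("600 million fits in an int: " + fitsInInt(sixHundredMillion));
        System.out.println("60 billion fits in an int: " + fitsInInt(sixtyBillion));

        try {
            // Math.toIntExact refuses to silently wrap around.
            int rolls = Math.toIntExact(sixtyBillion);
            System.out.println("Converted to " + rolls);
        } catch (ArithmeticException e) {
            System.out.println("Cannot convert 60 billion to int: " + e.getMessage());
        }
    }
}
```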

I could ... but I'm reminded of the third iron law,

And I'm starting to go down the path of making this check way too complex. Just adding more dice rolls doesn't seem to help.

Here's the frustrating thing - look at the results from these three runs

Run 1

Run 2

Run 3

In Run 1, the dice rolls for numbers 1, 4 and 6 fall just under the 100 million "perfect" number. In Run 2, the dice rolls for numbers 1 and 6 are just above 100 million. Whilst on Run 3, the dice rolls for the number 4 are above 100 million.

Intuitively, using manual judgement, that says to me that I've seen numbers fall above and below the average, but only a little bit. So that looks okay to me.

I've tried this test several times - sometimes it takes 2 goes, sometimes 3, sometimes 4. Fundamentally though the challenge is "how do I create a hard and fast rule for this to test?".

At which point, I remembered that I've not done "hard maths" for a while, so maybe Iain had a point.

What I've decided to do is to modify the test. Obviously running 600 million rolls doesn't superbly define the problem for me, and at 21 seconds, takes a bit too long.

What I've decided to do is to test a threshold of plus or minus 1%, which should be good enough. And I'll do it over 600,000 dice rolls, which will be much faster (couple of seconds including compilation), but still nicely divisible by 6.
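A minimal sketch of what that threshold check could look like is below. It is not the original JUnit test: the roll logic is a stand-in using java.util.Random, and the names tally and withinTolerance are invented for this example.

```java
import java.util.Random;

public class DiceDistributionCheck {

    /** Rolls a stand-in dice totalRolls times; counts[face] holds the tally for each face. */
    static int[] tally(Random rng, int sides, int totalRolls) {
        int[] counts = new int[sides + 1]; // index 0 is unused
        for (int i = 0; i < totalRolls; i++) {
            counts[rng.nextInt(sides) + 1]++;
        }
        return counts;
    }

    /** True if every face is within the given fraction (0.01 = 1%) of the even split. */
    static boolean withinTolerance(int[] counts, int totalRolls, double tolerance) {
        int sides = counts.length - 1;
        double expected = (double) totalRolls / sides;
        for (int face = 1; face <= sides; face++) {
            if (Math.abs(counts[face] - expected) > expected * tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int totalRolls = 600_000;
        int[] counts = tally(new Random(), 6, totalRolls);
        for (int face = 1; face <= 6; face++) {
            System.out.println(face + ": " + counts[face]);
        }
        System.out.println(withinTolerance(counts, totalRolls, 0.01) ? "PASS" : "FAIL");
    }
}
```

For a fair dice over 600,000 rolls, each count typically wobbles by about 289 around 100,000, so a band of plus or minus 1,000 is roughly three and a half standard deviations. That makes a false alarm rare, but not impossible: expect a perfectly good dice to fail this check once in a few hundred runs.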

Okay - that's a lot of maths and thinking. We still have a little more to play with before we sum up, for now, we'll wrap up.

Recap

So we managed to create an important automated check - but it was a little harder than initially expected, because as the first iron law said, we needed to summarise what we wanted to check into a hard pass/fail criteria. And that turned out to be harder than we intuitively first assumed.

Extension material

Have a think about what other tests you think might be worth running. If you're using Eclipse, have a go with it.
