Up to this point the testing of MWiF has been somewhat random ("Click here and see what happens") and guided mainly by the previous WiF experience of the test team. This model was considered adequate for the initial "alpha" stages because more sophisticated testing was not possible until the basic MWiF code infrastructure was in place.

We will soon need to be testing against a more sophisticated test plan. This will require the combined experience of WiF rules lawyers and experienced software testers who are familiar with the discipline of writing a test plan against a given specification (in this case, the World In Flames rulebook). I would like to call for two or three volunteers who are happy to share this load.

I am interested in hearing from people who:
* do software testing for a living.
* are willing to accept the shared responsibility of writing a test plan for MWiF.
* are willing to spend a few hours a week in helping us develop and manage this test plan.

These volunteers will not be downloading the software. Like myself, their primary involvement will be to ensure that the other testers are helped to use their time most effectively. If you've ever been a test team leader then you know what I mean.

Remember that this role does not involve actually testing the software (although I am sure your dedication will be noticed ).

I do not know the rules well enough to be a good candidate for this myself, but I suspect you'd get a better response if you nominated a small group of testers to do this as well. I understand the desire not to have too many active testers bickering and contradicting one another. But at the same time, it is important to understand the motivational drivers of a volunteer workforce. I don't want to discount the desire of everyone here to help ensure this is a quality product, but even with access to the software you tend to see a degree of turnover among volunteers. Without it, I suspect turnover might be even higher. And if the intent of these positions is to plan and monitor the testing, then I would expect this might be the area where turnover would have an even more pronounced impact.

Furthermore, I would suggest that communication between the planners and the testers would be paramount. To be effective, the planners will certainly need access to the development forums, and differentiating them as non-testers just builds artificial divisions among the community. Finally, allowing the planners to have access to the software would give them additional perspective that would greatly enhance their ability to understand and communicate with the testers.

For a lot of reasons, I would suggest that these planners should have access to the software. Since you have openings now, I would encourage you to fill at least some of those slots with people who would be able and willing to commit some additional time and serve as both tester and planner (and I am saying that as someone who admits he isn't qualified to be a planner and therefore could be dislodged as a tester by my own suggestion).

I agree with jchastain. Why not let some testers have a dual role, being regular testers and also writing test plans etc.? Taking part in the former shouldn't exclude you from the latter and vice versa.

I think it would be hard to make test plans for a program you haven't even seen. A person dedicated to test planning, and maybe even to writing test cases for the others to run, would have to work closely with Steve and also have access to the program so he can better specify what they should test. Steve can provide documentation of the program structures etc.

At my work we always assign some people to be responsible for organizing the testing when we develop new versions of our programs, especially when those developments are project related. These test people make test plans, gather all the bug reports from the testers and send the relevant ones to the programmers to fix. They even organize the test runs of value chains etc., writing test cases for the testers to perform. The people responsible for all the testing don't have to be experts on the program or regular users, but they need access to the program to verify reported bugs etc. and to become familiar with how the program works so they know what needs to be tested.

MWiF is an extremely complex game and unlike other complex games (think of Grigsby's War in the Pacific) it is being developed against a specification. Grigsby and his team had the luxury of being able to define the rules as they went along, and if a feature didn't work then they could always delete that feature and the end user would be none the wiser. The game was defined by the software and that was the end of the story.

MWiF isn't like that. The computer game is defined by the boardgame and apart from documented exceptions (such as changing the scale on the Pacific Map) everybody expects the computer game will adhere to the boardgame rules. In any complex software environment the scope of the application passes beyond the ability of any one individual to comprehend the entire model. Even the WiF lawyers (that doesn't include me) can make mistakes in their interpretation of the rules.

So...

1. How can we make sure that everything is tested unless there is someone with a list of tests asking for a volunteer to (for example) "Run test 1047. Execute an amphibious landing against an empty hex. Confirm that the notional defender contributed to the defense. Run the same landing against an occupied hex. Confirm that the notional defender did not contribute to the defense"? While there will be plenty of invasions executed during the testing cycle, it may well be that no-one stops to check that the notional defender only appears when he is supposed to appear... unless we are working from a Test Plan.

2. Given point 1, how can we create a list of tests unless someone volunteers to do this? Writing a test plan takes time and concentration. Any management role in any profession (medicine, engineering, the armed forces, you name it) involves passing the pointy stick to the crew on the ready line and devoting yourself to doing your management job properly. Like everyone else who jumps into the software I am tempted to 'play' with it and just report bugs. Unfortunately I can very easily chew up all of my MWiF time doing just that, and we wind up no further along in being able to say MWiF has been properly tested.

Consider the mindset of the testers (and this includes me when I am in Tester mode - this is an observation, not a criticism):
* Do they play with every optional rule or do they avoid optional rules they dislike?
* Do they play every country in every scenario or are they focussed on one or two favorites?
* Do they try every different strategy or have they settled on one or two optimum approaches to victory and pursue those at all costs?

Even testers who make a conscious effort to pursue a balanced approach with their testing will find it difficult to claim they have completely tested the software unless they are working from a documented plan.

I don't see the planner position as being a short cut into testing, or the planner might find themselves doing too much testing and not enough planning. It is frustrating watching others 'playing' the game while my time is spent dividing the rule set up into discrete tests and considering how to sweet-talk the testers into running these tests... but someone's got to do it.

I'm not an expert at game / software testing, but I have a feeling about what Greyshaft just wrote:

quote:

"Run test 1047. Execute an amphibious landing against an empty hex . Confirm that the notional defender contributed to the defense. Run the same landing against an occupied hex. Confirm that the notional defender did not contribute to the defense"?

The problem with a game like MWiF is that to be able to "run test 1047", you need to start the game from one of the scenario starts, set up the units, and go through all the steps and phases for HOURS until a situation where you can "run test 1047" happens.

What I mean is that you can't jump to a specific situation and test it. You have to actually play the game.

I think that we will be more efficient in testing MWiF by maintaining a common bug list that each of us can access. That way each of us knows whether a bug has already been reported, can append new details to the list, and can check whether the bug is solved when the situation happens to him. But I do not believe very much in a test plan; at least, I do not believe in such a detailed test plan. Now, if the test plan is broader and asks players to focus on the naval aspect of the game, or on resource transportation, etc., fine; but focusing specifically on one step is, I think, not possible.

Moreover, I'm quite uneasy about discussing this on the public list. Also, I think that if ever a person did this, it MUST be someone from the playtest group, as I think talking about this test business here goes against the non-disclosure agreement. Could we continue this on the testing forum?

Patrice, We are talking about the philosophy and practicalities of software testing. None of this is exclusive to MWiF or even Matrix and does not refer to any information provided by Matrix. There is no NDA issue here. The discussion is in the public forum so we can seek assistance from the wider MWiF audience.

If anything I think readers would be impressed by knowing that the MWiF test team is using formal software testing techniques.

I am not qualified to be a test program designer, but I would love to see it done. Right now I just randomly start a game whenever I feel like it and select any old scenario, some group of options and whatever other choices exist. I then set up without much of a plan and wait for something to happen.

I tend to wander aimlessly through the process until some error message pops up and then I report in and start over.

I would feel like I was contributing more if I were given orders to test specific scenarios, options combinations, etc. and told to take particular actions.

It also would help me be more disciplined if I had a particular number of assignments to complete within a certain period of time.

So, kudos to the test design officer corps. As a beta test soldier , I really look forward to a more structured mission.

Keep wandering aimlessly for the moment but rest assured that orders... er... um... I mean 'requests for assistance' will soon be forthcoming. For example you might be asked... 'Excuse me Tester pak, would you be terribly kind and have a look at what happens when Germany tries to DOW Britain in Nov/Dec 1939 after Britain has already done a DOW on Germany in Sep/Oct 1939? Thanks awfully.'


Wow pak, you nailed this one on the head for me. I do my testing in exactly the same way that you described. Although this can work and does have its uses, I find it to be really ineffective. My brain just wants clear goals, objectives and timelines (20 years in the military will do that to a guy). I feel that by the time I sit down to try something out, another tester has already tried it. I would really like to see a comprehensive test plan. That way the testers can look for very specific things, and we can test with confidence that everything is being tested.

I agree that we will all benefit from a test plan. But Patrice has a point that you can't just jump to a specific point and test what's written in a test plan. You can have the test plan in front of you and check some issues when you have advanced far enough in your game to try it. E.g. if you want to test something specific about Sea Lion then you need to play some turns so you have enough German units and have placed your units in position for Sea Lion. So it takes some hours to test those specific things.

The state of the beta game will also decide whether you can test via a test plan or not. If the game is in early beta it means the game will stop with error messages even before you get to the point where you can test, e.g., an amphibious landing on an empty hex. So the beta game needs to be pretty coherent and stable so you can expect to play several turns in succession without seeing crashes.

But you can have a test plan for the early beta stage as well. That test plan would be to try the most basic aspects of the game. E. g. DoW, movement of naval units or land combat. Then you can report the errors you find when you try to do specific actions.

It's very important to have a database, Excel file or whatever where you report new bugs and can check if some of the reported bugs have been fixed and retested. This would also prevent people from reporting the same bugs several times. At work we use the software Test Director or Bugzilla. That helps a lot.

Having a test coordinator / supervisor is a good idea. This person is responsible for the test plan, but also for collecting all the bugs and sorting out which ones go to the programmers to be fixed. He will also be told by the programmers which bugs have been fixed, so the coordinator can inform the testers to retest them. The coordinator (in addition to the programmers) will be the one with a good view of which areas of the program are buggy and which areas are stable, so he can focus more effort on those areas with problems.

The bottleneck in debugging software will often be the programmers. They are the most crucial people since only they can actually fix bugs. It's therefore vital that the programmers spend their time fixing new bugs instead of having to spend lots of time testing whether a bug report is real or not. This is where a testing coordinator can be valuable. He can try to recreate all the reported bugs, and when he confirms a reported bug is a new one he can inform the programmers about it. That means the programmers spend their valuable time on what's most productive, i.e. fixing actual bugs. There's nothing more frustrating to a programmer than trying to recreate a bug reported by a tester, failing, and only discovering later that the bug is not a bug at all: the tester simply misunderstood the rules and believed it to be a bug. So the testing coordinator should know every little detail of the rules so he can tell whether a reported behaviour actually violates them.

If the programming project is large you can actually have several coordinators: one for writing test plans etc. and one for coordinating the bugs reported by the testers.

ORIGINAL: Borger Borgersen ...if you want to test something specific about Sea Lion then you need to play some turns so you have enough German units and have placed your units in position for Sea Lion.

There's no reason that Germany can't try an amphibious invasion against the Polish coastal hex in September 39. Even in Barbarossa the Germans can try an invasion on the first turn (then again, so can the Russians). I'm interested in testing the mechanics of the game. Whether or not the move makes strategic sense is irrelevant.

quote:

The state of the beta game will also decide whether you can test via a test plan or not. If the game is in early beta it means the game will stop with error messages even before you get to the point you can test e. g. an amphibious on an empty hex.

Quite correct. Steve and I discussed that point quite early in the piece and agreed that we'd leave a test plan until the game was stable enough to support it. We're getting there now (stability-wise) so it's time to start discussing how we will implement it.

(BTW: Testers who want to just poke and prod without a test plan are free to do so. Working within the Test Plan is entirely voluntary.)

quote:

At work we use the software Test Director or Bugzilla. That helps a lot.

Steve is in negotiation with Matrix to implement bug tracking software.

quote:

The coordinator (in addition to the programmers) will be the one with a good view about which areas of the program is buggy and which areas are stable. So he can focus more effort into those areas with problems.

Correct again... sounds like you've played 'Bug Hunt' before

quote:

If the programming project is large you can actually have several coordinator persons. One for writing test plans etc. and one for coordinating the bugs being reported by the testers.

I used to take bug reports from support, reproduce them, and then document the exact steps to reproduce the issue for the developers. However, unless the GRL is failing (is that right?), it's likely reproduction may already be documented. I can say I've had good success reproducing the nastiest types of errors; but this capability depends on experience and familiarity with the product.

I'd be happy to help here by either contributing with Bugzilla or developing a format that helps Steve the most in prioritizing bugs the testers come across. I'd like to say I'd volunteer to triage all the reports, but I'd like to know how big a scope that is before I jump into it.


I would be willing to participate, but as I am in the throes of finishing a very large (300,000+ lines) project myself, I am not sure I can commit time every week. After November or December, I plan to be able to become more involved.

I would highly recommend the following: there may be portions of the testing that can be automated (and repeated ad infinitum after every change to the code base).

Many moons ago I started building a Test plan for MWiF in Excel. Bear in mind that a Test Plan is NOT the same as bug tracking software - the documents are complementary but not identical.

The Test Plan tests for application adherence to user requirements by doing the following:
* define the user requirement - (in this case, the WiF:FE ruleset)
* break the WiF ruleset into Application Modules - (logical groupings of functions such as 'naval combat' vs. 'land combat')
* define the expected Behaviours for each Application Module - (what do I expect to happen in 'naval combat'?)
* list the Test Cases for each expected Behaviour - (how do I check if my expectations are fulfilled?)
* provide an overview of the Test Script for each Test Case - (what are the specific mouse-click or menu-driven instructions for performing these tests?)
* record the PASS/FAIL results for each Test Case - (did it work as expected?)
* summarise the PASS/FAIL result for each Application Module - (which parts didn't work?)
* summarise the application adherence to user requirements - (how bad is the problem?)
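To make the hierarchy concrete, here is a minimal sketch of that structure in Python. All the class and field names are illustrative only; this is not anything from the actual MWiF test tooling, just one way the Module / Behaviour / Test Case layers could be recorded:

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    """One check of one expected behaviour, with its script overview and result."""
    case_id: int
    description: str
    script: str                 # overview of the mouse-click / menu-driven steps
    result: str = "NOT RUN"     # later set to "PASS" or "FAIL"

@dataclass
class Behaviour:
    """An expected outcome within a module, e.g. 'notional defender appears only in empty hexes'."""
    description: str
    cases: list[TestCase] = field(default_factory=list)

@dataclass
class ApplicationModule:
    """A logical grouping of functions, e.g. 'naval combat' vs. 'land combat'."""
    name: str
    behaviours: list[Behaviour] = field(default_factory=list)

    def summary(self) -> tuple[int, int]:
        """Summarise the module as (total cases, failed cases)."""
        cases = [c for b in self.behaviours for c in b.cases]
        return len(cases), sum(1 for c in cases if c.result == "FAIL")
```

The point of the nesting is that the per-module summary and the overall adherence figure fall straight out of the recorded PASS/FAIL results, with no extra bookkeeping.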

Bug Tracking software performs the following tasks:
* record errors that occurred when executing a Test Script previously defined within the Test Plan.
* record general errors which fall outside the Test Plan - (e.g. MWiF won't load a saved game).

(... then, regardless of which type of error occurred...)

* record sufficient information to permit the developer to duplicate the error.
* be categorised in a manner that allows the developer to see all similar errors.
* record additional developer comments.
* be flagged as 'open', 'fixed - not retested', 'fixed & retested' (or similar categories).

Experienced project managers may quibble about the definition of some of these items (e.g. whether testing for the ability to load a saved game should be part of the Test Plan) but please remember that this is "Big Picture" stuff to assist the team to understand the difference between the two documents.
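A single bug record covering the points above might look like the following Python sketch. The field and status names are assumptions for illustration, not the schema of any tracker actually in use here:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class BugStatus(Enum):
    """The flag values listed above (or similar categories)."""
    OPEN = "open"
    FIXED_NOT_RETESTED = "fixed - not retested"
    FIXED_RETESTED = "fixed & retested"

@dataclass
class BugReport:
    bug_id: int
    category: str                       # lets a developer pull up all similar errors
    steps_to_reproduce: str             # enough information to duplicate the error
    test_case_id: Optional[int] = None  # None for general errors outside the Test Plan
    status: BugStatus = BugStatus.OPEN
    developer_comments: List[str] = field(default_factory=list)
```

Keeping `test_case_id` optional captures the distinction drawn above: a report either traces back to a Test Script in the Test Plan or stands alone as a general error.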

A section of the Test Plan spreadsheet is included below. Note that the blue text is actually a quote from the WIF:FE rules.

I think it would be nice when people report actual bugs mentioned in the test plan to add columns with the following information:
1. Tester signature or name
2. Date the test was performed
3. Severity of error

At work we label the severity of each error with one of the following letters:

A - Critical bug. Causes the program to crash or hang.

B - Severe bug. Does not cause the program to crash, but the intended function doesn't work as intended and results in wrong data being stored etc. E.g. if you make a land attack and get a breakthrough result and you're not offered the option to advance after combat; maybe even the shattered units don't appear on the production spiral.

C - Moderate bug. The function doesn't behave as intended, but it doesn't alter data in a wrong way. E.g. if you make an air strike and notice that the tank buster aircraft didn't get the intended bonus for attacking enemy armor; the air strike went on like it was a normal tac bomber.

D - Minor bug. These bugs are only cosmetic. The function behaves as intended, but it may not be shown the way it's intended. E.g. spelling errors, dialogue boxes with combat results missing some data, wrong status symbol shown etc.

We have a rule at work that we're not allowed to release new software versions with known unfixed A and B errors. The quality of the release is afterwards checked against A and B errors reported by the users after the release.
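The severity ladder and the release rule together reduce to a very small amount of logic. As a sketch (the enum labels paraphrase the A-D definitions above; nothing here is from an actual MWiF tool):

```python
from enum import Enum

class Severity(Enum):
    A = "critical: program crashes or hangs"
    B = "severe: wrong behaviour, wrong data stored"
    C = "moderate: wrong behaviour, no data corrupted"
    D = "minor: cosmetic only"

def release_allowed(open_bug_severities):
    """The house rule described above: never release with known unfixed A or B errors."""
    return not any(s in (Severity.A, Severity.B) for s in open_bug_severities)
```

A gate like this is cheap to run against the open-bug list before every candidate build, which is what makes the post-release quality check (counting A and B reports from users) meaningful.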

I think such a distinction makes it easier for the programmers and bug test coordinators to understand the severity of a bug and prioritize which bugs to fix.

It also makes it possible to report internally about the status of the testing. E.g. the report at release date can look like this:
Detected and fixed errors: A: 5, B: 15, C: 42, D: 56
Known errors at release: A: 0, B: 0, C: 3, D: 4

We also use index numbers with decimal digits to show the progress in bugfixing. IDs should be grouped so you can immediately see from the ID numbers what kind of bug it is. E.g. 216 = the first test, 216.1 is the first retest, 216.2 is the second retest etc. This way you will see all the former attempts to fix the bug and why they failed.


Yes to all of this. I am using almost the identical system - except for the release stuff.

The codes I have are: Fatal, Critical, Bad, Minor, Cosmetic, and Suggestion. I date each attempt to correct a bug with information about what was tried and what effect it had (if any).

All the talk of bug reporting, spreadsheets, google, etc. here and on the test forum is blowing my mind! Would it be possible for Steve or his delegate to provide THE definitive format/system for the testers to follow? I'm so confused, I don't even know if this has already been done! I know there was a really long format proposed recently so a spreadsheet can be built. Should I use that? But, then there is this google thing.

The grunts just want the bosses to give us a form to fill in and not have to think about it too much...


Pvt. Pete

For now, the 20 item list would be best to use (some of those fields are filled in by people other than the beta tester). That way the information can later be stored so it matches other reports about the beta (not everything is a bug, so I am using a more general term here).

It is quite possible that the format will evolve as we go along, and I feel no pressure to lock ourselves into anything. This is merely a tool, not a goal in and of itself.


For whatever it is worth, I’ll share a few of my opinions on a controlled testing process. First, let’s talk about the development lifecycle for a moment. In discussing professional software development frameworks, it is easy to get bogged down in the details of RUP and use cases, but I don’t believe that level of complexity is necessary for this discussion so I will instead use the less cumbersome V-Model.

As is shown in the diagram below, we might think of the software development lifecycle in terms of the letter V, where we start at the top left and then just trace the letter V (going down the left slant and then back up the right). The left side represents the construction activities. As we progress down the left side, we get to lower and lower levels of detail. So we might start with business-level requirements, then progress to specifications and detailed design, and finally get to the coding and development of individual modules. The right side represents the testing and again, as we move up the slant, we progress from lower layers of detail up to greater abstraction. The key to the model is that work products flow not only directly along the path, but also across to the testing functions that verify similar levels of abstraction. The business requirements drive not only the detailed specs, but also the acceptance testing. Nothing Earth-shattering, I know, but this is just for level setting.

So, the final testing activity should be focused on confirming that the software meets the objectives set forth in the business requirements. Integration testing, where development modules come together, should focus on ensuring that the software performs according to the specifications and design, while unit testing confirms that each individual module was developed properly.

With that as the basic framework for the discussion, let me first say that the conversation thus far has focused on the integration testing and confirming that the software performs according to the predefined rule book or specifications. That is not unexpected as the developers are likely performing some level of unit testing themselves before publishing builds out to the beta test group and therefore integration testing is the next required function. Some have discussed the potential for automated testing. In my experience, test automation isn’t effective for this type of integration testing. It is really more appropriate for regression testing as part of the acceptance process for software that is expected to have multiple releases. I do not see this project being near the point where automation is possible.

As others have indicated, it is necessary to “stage” certain conditions in order to permit the required testing. That is one of the key reasons why test scripts are developed. A test script is simply a series of actions that are taken in order to test a progression of functions. So, for example, we might test taking a navy to sea. Then we might test moving it while unloaded. Then we might test loading a land unit onto a Navy unit. Then we might test moving the loaded navy. Then we might test naval bombardment of an enemy unit on shore. Somewhere else, we better test bombarding a neutral unit just to ensure that doesn’t work. Then we might test unloading the naval unit as part of an amphibious assault. Then we might test allowing that unit to not receive resupply to ensure supply values are dropping appropriately. Then we might test to see if the AI properly attacks the invading unit or at least makes some type of defensive adjustment. Then we might test to see if the unit is properly removed from play once destroyed.
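The naval sequence above can be written down as an ordered script, where each step both tests a function and stages the state the later steps depend on. This is only an illustration of the idea; the step wording and the helper below are mine, not from any actual MWiF test plan:

```python
# Each step is a test in its own right AND the setup for the steps after it,
# so order matters and a failure blocks everything downstream.
NAVAL_INVASION_SCRIPT = [
    "Move a naval unit to sea",
    "Move the naval unit while unloaded",
    "Load a land unit onto the naval unit",
    "Move the loaded naval unit",
    "Bombard an enemy unit on shore",
    "Attempt to bombard a neutral unit (must be refused)",
    "Unload the land unit as an amphibious assault",
    "Withhold resupply and confirm supply values drop",
    "Confirm the AI reacts to the invading unit",
    "Confirm the destroyed unit is removed from play",
]

def remaining_steps(script: list[str], failed_index: int) -> list[str]:
    """Everything from the failed step onward stays untested until the script is rerun."""
    return script[failed_index:]
```

The `remaining_steps` helper makes the cost of a mid-script failure visible: report the failing step, and the tail of the list is exactly what must be revisited after the fix.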

Not to belabor the point, but a test script is a series of events that can be tested together, often allowing you to test events at the back of the script that require setup through events being tested at the start of the script. Here's the key: to be successful, every spec has to be mapped to at least one script. You literally must pore over the specs and ensure that everything that is supposed to happen is scripted into one of the test scripts. The real work then is defining the sequence of events that tests the whole set of specs in as few steps as possible. It is a fairly laborious process, but it ensures full coverage.

In order to ensure that everything is tested, you then report bugs with an indication of the script that was run and the step of the script that failed. In that way, hopefully devs will have more success in duplicating bugs. But equally importantly, the planners can then note that the script has not been successfully executed and can ensure it is rerun once that current issue is addressed (which is necessary to ensure all the other items being tested later in that script are then tested). Once all the scripts have been entirely completed at least once, then you run through them all a second time just to confirm that none of the fixes in the later parts of the tests broke anything that was tested earlier.
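The "every spec maps to at least one script" rule is itself checkable. A small sketch of the coverage audit (the spec identifiers in the test are made up for illustration; a real audit would use the rulebook's own section numbers):

```python
def uncovered_specs(all_specs, scripts):
    """Return the spec items no script exercises; each one is a hole in the Test Plan.

    `scripts` maps a script name to the set of spec items that script tests.
    """
    covered = set()
    for items in scripts.values():
        covered |= set(items)
    return set(all_specs) - covered
```

Running this after each revision of the plan turns "did we forget anything?" from a judgment call into a mechanical check, which is exactly the laborious mapping work described above.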

That is a fairly lengthy description of something that many if not most here already know. So now, let’s get to the real point I am trying to make. The rulebook is only one set of the specs. It forms one set of the functional requirements for the game (though admittedly the most important one). But there are entire algorithms in the AI that determine the reactions to certain events. How are those to be tested? Is it also to be a systematic test, to ensure the computer is evaluating situations properly, or is the AI just tested in a haphazard way while only the game rules are tested systematically? If the AI is to be tested systematically, are there specifications for it, or how else would the test planners be able to script the various possibilities? Then there are sound, game options, and a whole host of other functions that also need to be tested.

Beyond the functional requirements, there are also many non-functional requirements that really should be tested. How are errors to be handled? What about the UI and human factors engineering? Are there performance-related requirements, such as how long the AI should take to complete a turn or how long the opening screen should take to load? Which files are supposed to be editable by the player base, and does editing them allow the software to still function properly? I absolutely understand that these are of lesser importance than just getting the game working, but test planning is more effective if you understand the scope of your testing before constructing the plan.

Of the non-functional items above, UI and human factors testing will likely have the most significant impact. There really should be some focus on this area, and planning these types of tests is an art unto itself. One of the most effective things you can do is develop scripts especially targeted at newbies that simply instruct them to perform various basic actions (move a unit, determine who is at war, change sound preferences, etc.), then ask each of your new testers to attempt each of those items and report back on whether or not they were successful and how difficult it was for them. Ideally they should do so without reading any manual or quick-start guide, to help determine whether or not the UI is intuitive. High-end studios actually bring people into labs and record HOW they attempt each action. That is a good deal more useful than self-reported results, so it is worth doing if you have the ability to use friends and family members as guinea pigs.

Anyway, I continue to believe you would have greater participation, higher quality scripts, and better communication were you to give the test planners access to the software, but that is merely my own opinion. Hopefully some will find these musings helpful. Good Luck!

All of what you say is valid, but there are other issues which need to be considered.

1. I agree there is a need to ensure that the testing of MWiF is comprehensive and appropriate to a project of this scale. There is also a need to provide direction for our motivated volunteer force. The posts from pak and Griffitz62 are evidence of the importance of this second goal. By the testers' own admission, their experience with formal testing varies considerably (that's an observation, not a criticism), so we have an immediate need to provide direction in this area. Experience in software testing tells me that the testers most in need of using a test plan are those least familiar with how it works.

2. Our budget for the test plan is precisely $0. It is all volunteer work and there are absolutely no resources to create stand-alone usability labs or to send the testers on 'Intro to Software Testing 101' type courses. We need to work with the material at hand, and if we aim to create a 600-page test plan and process flow then we will lose the hearts and minds of our testers real quick. Since I also have a wife, a child and a full-time job, I am currently averaging only about 10 hours a week on my involvement with MWiF. In summary, there is no point in planning to build a Saturn V if you only have the skilled technicians and budget for a V2, or perhaps only a Goddard-type 'Nell'. To paraphrase Robert Watson-Watt, 'Let's have second-best tomorrow rather than perfection next year'. (BTW I'd be interested in learning of any other wargame development team which used a formal test plan...)

3. Test plans should be created after the specification is written but before the coders start work. In this case the specification for the test plan is the WIF:FE ruleset (RAW), which is available for download from www.a-d-g-.com.au, so I don't see why the Plan creators need access to the MWiF software. Sure, it's great to have a copy, and I'm sure Steve would look kindly on a request to issue the Test Planners with the code, but I'm looking for people who want to work on the organisational side of things. If we have spare time then let's refine the Test Plan some more rather than double-clicking on MWiF.exe. That's what the Testers are there to do.

4. I can agree (or at least strongly sympathise) with all of your comments in the second part of your post. At this point I am more concerned with providing on-going direction for the testers than with whether our Test Plan complies with IEEE 829. If we get enough resources into the test planning side of things then we might be able to achieve both objectives.

Thanks again for your post. I'm gratified that this topic has received such strong input.

As for letting testers look at code - the Delphi development system pretty much requires using the Delphi environment (IDE) to examine the code. That's because the layouts for the forms (103 at last count) have almost all their data stored in binary, presented to the developer to review and modify via the IDE. The Pascal source listings omit 95% of the form data settings - instead those are stored internally (in binary).

Even so, the Pascal listings are over 250,000 source lines and climbing daily. Virtually all of that would be pure gibberish to the testers who are not programmers. Even for testers with extensive programming experience, many accompanying diagrams and other supporting documents would be needed to understand the code listings. To say nothing of Matrix's security concerns.

Unit testing as it is usually defined is not possible either. There are simply too many variables that have to be set before doing virtually anything in MWIF. On a positive note, I think of most of the forms as comparable to unit tests, because most forms are used for a specific task. The performance of the task can be tested by testing the form. There is still a ton of other stuff, but it is nice to think that 70 or so forms can be 'unit' tested.

Another source of 'specifications' is the optional rule writeups that I did. I started with RAW, but from there many changes/clarifications were made in response to feedback from forum members. The final text (a 60-page PDF file) goes through each of the 81 optional rules and says precisely what each is supposed to do. These writeups are part of the program and accessible during Start New Game. I will someday also make them accessible during game play. Once that is done, the testers will be able to refer to the 'specifications' for the optional rules while they are testing them.

I think we are essentially in agreement, but let me touch on a few of your points.

First, I agree that you don't want to try to boil the ocean. My intended point was not that you have to meticulously script everything, but rather that you will be more effective if you define your scope of testing prior to beginning your planning. Perhaps the rules are the only thing you want to script out. That's fine - that's a valid decision. But I was suggesting that you consider all of the testing needs and ensure you are making that a conscious decision instead of just ending up there through failure to consider the broader picture. The other items will still need some type of testing, and you'll want some type of feedback mechanism to encourage people to be looking at those things at some point in the cycle.

With regard to a "Testers 101" course, part of the reason I chose to detail the basics of testing in such a pronounced manner was to allow less experienced testers to follow the discussion, as I hoped it might provide them with some additional perspective. As for access to the software, I believe you and I were both referring to the executables rather than the raw code. I agree that it would be crazy to make the code available to volunteer testers (except for snippets posted in response to specific questions). I continue to believe that the test planners should experience the software, its mechanics, and its UI. But as I have indicated, that is just part of the rationale behind the recommendation.

Let's face it, volunteers want to be on the "inside" and WANT to experience the game. That's part of the "payoff" for many if not most. In the real world, test planners make more money, and being a test planner represents career progression. But in game testing, you are asking people to do something that is more work without offering them the "reward" afforded the "lesser" position of tester. If you said this was a pathway to becoming a tester, I could see the deferred compensation model working. But to say this is something different that requires more work, does not provide the perceived benefit, and never leads to that benefit just doesn't seem to provide the proper motivational incentives in my mind. I understand why you don't want people "promoted" to being testers, as you don't want turnover in these positions, but I really believe that without any payoff you'll end up with more turnover rather than less. And I understand why you do not want to make the planners testers as well, for fear they will ignore the planning and be drawn more and more into the "fun" of testing. But I continue to believe that managing that dynamic is easier than managing retention in a role that never gets to the "fun" part.
Again, that's just my opinion on the matter and I'll drop it now (as continuing this discussion in public does not aid your cause).

Of interest, you also asked what other games have attempted such a structured testing approach. Over the past 15 years I have been a volunteer tester for more than a dozen projects, and I have never seen any of them even attempt this type of professional, organized approach. As you suggest, you cannot turn it into "work" to the point that the testers feel as if they are operating within a straitjacket and not having any fun. But I would expect you to get some real benefit out of this approach if you can get it properly organized and then execute it in a suitably relaxed manner. When you think about it, it is not entirely dissimilar to the write-up activities, where people are assigned units of work and then perform them at their leisure. And that is why I continue to believe that Mziln had pretty solid insight in his recommendation to Graf regarding eventual beta participation for those who "prove" themselves through the write-ups effort.

Finally, I am afraid you may have misunderstood my points with regard to UI testing. Though I continue to believe that UI testing is important, I was not suggesting that you have a full-blown UI lab. Instead, I believe your first opportunity is your new testers. Whether new to WIF or not, they will be new to the UI, and their initial impressions should be captured. I was suggesting you give them a list of 10 or 20 basic functions to try to perform without reading any manual or notes. Tell them to take no more than 60 seconds attempting each action and then report back that doing so was:

1 - Very Easy. I was able to perform this action on my first attempt with little forethought.

2 - Relatively Easy. I was able to perform this action fairly quickly but had to think about it a bit.

3 - Somewhat Difficult. I was able to perform this action, but it required several tries and/or UI exploration.

4 - I was not able to complete this action within 60 seconds.

While valuable, self-reported statistics such as these have two drawbacks. First, we humans have egos and often want to ensure we aren't seen as the dumb one who couldn't figure something out. This will sometimes give you a rosier picture than the reality. Second, you don't get to see what people actually try. If everyone puts down Relatively Easy for a given function, you might think you are in pretty good shape. But by being able to watch them, you might find they all tried the same thing on their first attempt, in which case you may want to change the UI to make the software work the way people intuitively attempt to use it. You only get that information when you can watch their progress against the tasks. Obviously you do not have the ability to rent a full-blown UI lab. That is why I suggested friends and family. If you know people locally who play games but are not yet accustomed to this title (be they friends, family, neighbors, unaffiliated Matrix people at an event, forum members who happen to live nearby, whatever), then you may want to have them come by and just watch over their shoulder as they attempt the actions. It isn't a lab, but it is free and might provide some useful insights.
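For what it's worth, tallying the self-reported 1-4 scores is trivial once they are collected. The sketch below is purely illustrative; the task names and responses are made up, not real test data.

```python
# Illustrative tally of self-reported usability scores.
# Scores: 1 = Very Easy ... 4 = Not completed within 60 seconds.
from collections import Counter

# responses[task] = list of scores, one per tester (all invented)
responses = {
    "move a unit": [1, 2, 1, 2, 3],
    "change sound preferences": [3, 4, 3, 2, 4],
}

for task, scores in responses.items():
    avg = sum(scores) / len(scores)
    dist = dict(sorted(Counter(scores).items()))
    print(f"{task}: average {avg:.1f}, distribution {dist}")
```

An average near 1 suggests the action is intuitive; a cluster of 3s and 4s flags a UI area worth watching testers attempt in person, for exactly the "what did they actually try" reason given above.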

Great discussion! Until now I was more a consumer of the forum, but with some experience in testing and creating test plans I would like to bring in some additional points.

1. Test Setup: Patrice mentioned the problem of jumping into a specific situation of the game (I call it Test Setup). This will be one of the most time-consuming tasks for the testers, with little or no output for the programmers!

Possible solutions could be:

* some kind of Test-Setup Editor (yes, I know this means effort), but it will pay for itself with some 10-20 testers
* a pool of saved games within the test group, to provide plenty of starting points for different tasks (e.g. Naval Landings, ...)

2. Bug-Reporting: My experience is that once I find a bug, writing the description for the programmer takes quite some time. A Configuration-Export function (unit setup etc.) would help programmers understand the problem and reduce the non-testing time for testers.

3. Bug-tracing: A tracing tool is essential! Even if it's just an Excel sheet (better would be a small database).

4. Configurations: What about the different hardware and software configurations among the testers (RAM, hard disk, operating system, ...)? Are these already being tested, or is that part of the coming integration tests?

5. Test-Tracing: A database for all test cases and their execution status is as important as the one for bug-tracing (again, Excel could be sufficient, though multi-user access is a problem).
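The "small database" mentioned in points 3 and 5 could be as simple as SQLite, which ships with Python and needs no server. This is a hedged sketch: the schema, column names, and sample bug are all invented for illustration, not taken from any actual MWiF tooling.

```python
# Sketch of a minimal bug-tracing database using Python's built-in
# sqlite3 module. Schema and data are illustrative assumptions.
import sqlite3

con = sqlite3.connect(":memory:")   # use a file path for a shared tracker

con.execute("""CREATE TABLE bugs (
    id INTEGER PRIMARY KEY,
    script TEXT,
    step INTEGER,
    description TEXT,
    status TEXT DEFAULT 'open')""")

# A bug report ties back to the test script and the failing step.
con.execute(
    "INSERT INTO bugs (script, step, description) VALUES (?, ?, ?)",
    ("naval_ops", 4, "loaded transport cannot move"),
)

open_bugs = con.execute(
    "SELECT script, step FROM bugs WHERE status = 'open'").fetchall()
print(open_bugs)  # → [('naval_ops', 4)]
```

A file-backed SQLite database avoids the multi-user problem of a shared Excel sheet for small teams, though concurrent writes from many testers would eventually call for a real client-server tracker.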

quote:

But I was suggesting that you consider all of the testing needs and ensure you are making that a conscious decision instead of just ending up there through failure to consider the broader picture. The other items will still need some type of testing, and you'll want some type of feedback mechanism to encourage people to be looking at those things at some point in the cycle.

Hence my call for assistance. If I have to do it all myself then I will focus on the WIF rules, since:

* I would rather complete one out of five tasks than pretend to do all five tasks but only get them partially done;
* in a very real sense, the rules are the 'specification' of MWiF;
* the gaming community will forgive a problem with loading the game (c'mon the patch!) but if MWiF makes a mistake with the rules then the game will be seen as a failure (IMHO);
* testing specific rules is the most effective way to give direction to the testers. It also helps the testers educate themselves about the rules.

quote:

As for access to the software, I believe you and I were both referring to the executables rather than the raw code.

Agreed. Steve lives in a world of binary, so I understand his alternative interpretation.

quote:

Let's face it, volunteers want to be on the "inside" and WANT to experience the game.

I completely agree with you. It's a very human reaction. What I want to avoid is the scenario of a person with testing experience signing up as a test planner with the deliberate intention of becoming a Tester further down the track. I'll be on the Test Plan side of things till the game ships and I'm looking for people who are willing to join me here. If no-one is available since everyone wants to be a Tester rather than a Test Planner then that's OK - I completely sympathise with their motivation, but they should then apply to be a Tester rather than apply to be a Test Planner. At least this way we know where we all stand.

quote:

I understand why you do not want to make the planners testers as well for fear they will ignore the planning and be drawn more and more into the "fun" of testing. ... that's just my opinion on the matter and I'll drop it now (as continuing this discussion in public does not aid your cause).

Au contraire! I believe that by discussing this issue publicly we leave people under no illusions as to what is involved in being a Test Planner and that is a good thing.

We'll discuss the UI test lab in another post down the track.

Nebert, thanks for your input. I have to go and feed two-year-old Greyshaft Junior, but we'll talk more later.