I find in my NES development that I often make changes that break some feature that used to be working. Particularly edge cases that are hard to notice in normal quick game play. This got me thinking about automated testing strategies. I'm wondering if any of y'all have solutions for automated testing?

I'd like to be able to do "make test" and my rom will be run through a suite of tests, and let me know if they all succeed.

1. load up this test level (maybe by entering a cheat code, or manually poking a certain address that triggers test mode)2. hold right on the controller for 30 frames3. press A4. wait 10 frames5. press B6. hold left on the controller for 30 frames7. Check the memory address that corresponds to a particular label in my game (since addresses of variables might change during development), ensure that it had the value I'd expect.

Most of this could probably be done using lua in fceux or Mesen, but I'm wondering if (or how) any of you have rigged it up. Particularly, launching the emulator and test script for each test (bonus points if it's run headless so I can automate it on a build server), capturing the test result, having enough lua harness code that makes it easy to write the tests and check known variables, etc.

That's what I normally do. But it can often be helpful to discover breaking changes early instead of later, when it maybe harder to diagnose what broke it. (that said, I'm not 100% convinced that it's going to be worth the effort to do this.)

It's hidden in the release builds, but Mesen does have a way of recording a movie + a hash for every frame in the recorded movie. You can then play it back & have Mesen validate that every frame is identical to what it was when the movie was recorded (this can also be done via command line without any video/audio output)

This works well to validate test roms, but for on-going development, it's probably not really ideal (since you'd probably end up having to re-record test movies over and over). I think it also allows starting a test from a save state, but, again, if you're changing/recompiling the original ROM, probably not very reliable.

The desire here is understandable, but it's not really practical. My sysadmin nose *sniff sniff* is getting a strong hint of someone being pressured (either through work or general online/developer/social peer pressure) to "focus on test-driven development". TDD is more for things that have strict specifications that must be complied with -- an example could be a 6502 CPU emulator, where you know definitively how registers and addressing modes work, plus test for quirks like the indirect jump page-wrap bug. A more proper example would be a protocol that has strict specifications (ex. ATA, SCSI, or even something like an IC).

That said, let's cover the non-TDD testing type:

What you're wanting is a kind of "analysis suite" to help with what sounds like "general game testing" -- normally something a person in a QA would be doing physically. The problem has to do with that real-world testing of gameplay/mechanics cannot be done in most emulators in an automated way. The only way it could would be through some incredibly ridiculous (read: complicated, highly convoluted) "test harness" methodology. Rephrased: your game would literally need a way to interact with the emulator (heavy reliance on debugger features), which then would in turn requires some kind of test model framework. (I'm using terms a bit vaguely here, sorry) Forcing an emulator to try and do this... well... the developers of the emulator could spend literally years on this. I'm not joking around. It's painful. Very painful.

The situation is different if you were focused on testing the internals of, say, a NES game engine you were writing from scratch (say, in native 6502). It is certainly possible that you could write tests for those subroutines, through use of a 6502 emulator (read: not a NES emulator), to ensure that said subroutines always behave in a consistent manner. For basic subroutines that work natively with numbers (e.g. no reliance on NES behaviour -- a native 6502 emulator doesn't emulate the NES), this would be fine and could actually be useful. You'd effectively need a 6502 emulator that could "return values" somehow to a piece of software that GNU make would call. It can't handle all the situations though, especially if you do something that's very NES-oriented (ex. read from MMIO registers). In those situations, you have to use what's called a "mock" or "mocking" to "emulate" what, say, a NES MMIO register could return. A good example would be if you were spinning waiting for bit 7 of $2002 from the PPU -- you could "mock" that to return $80 in one test case, and in another $00. You get the idea.

Another approach that's possible is to use some kind of intermediary PL (programming language) that generates your 6502 for you, and then see if that PL (or software that uses said PL) offers some kind of test suite. I think things like LISP offer this, but I'm not sure. Again this doesn't help for the NES system as a whole, but mainly for just the code side of things. I know on the C side (speaking generally, not about the NES), the most simple "test" there is at run-time (it's a macro) is assert(), which does a simple truth test (the test operation should always prove true; if it doesn't, an assert is generated, which can be treated however you wish (warning, error, etc.). The problem with this on the NES anyway is that -- you guessed it -- it wastes precious CPU cycles.

One thing I want to make clear here, because for whatever reason people with hard-ons for tests always overlook (or ignore) this: excessive focus on tests often ends up wasting TONS of time time that could be better spent actually working on the actual thing (game). Say for every 50 tests you write (and each take 2 hours), maybe 1 will end up "catching something" a few months down the road that saves you pain (i.e. making the tests worth it).

In summary, and a strong IMO: dougeff's response is absolutely the correct way to do this. It's the most time-effective, KISS adherent, and realistic approach.

I'd absolutely love a response from rainwarrior on this one, since he's one of a few guys here who has professionally worked on video games (other than NES), and might be able to share some general tidbits of what some workplaces do with regards to testing. In general though, as I understand it (from having roommates in the VG industry), QA teams of human beings are usually the main source of testing. assert-based tests I'm sure are used throughout engines or game code where applicable too, but I can't imagine developers spending thousands of hours on tests in that industry. It just doesn't seem worth it. For a game engine (read: 3D engine), it may be more worth it simply because that may be an engine the company uses for many years to come -- but that's also for a console or system that's not the NES. :-)

It's hidden in the release builds, but Mesen does have a way of recording a movie + a hash for every frame in the recorded movie. You can then play it back & have Mesen validate that every frame is identical to what it was when the movie was recorded (this can also be done via command line without any video/audio output)

This works well to validate test roms, but for on-going development, it's probably not really ideal (since you'd probably end up having to re-record test movies over and over). I think it also allows starting a test from a save state, but, again, if you're changing/recompiling the original ROM, probably not very reliable.

That's pretty neat, but more hassle than what I'd want (having to record test movies). Using a lua script would probably be a lot closer to what I'm thinking about. (using command line flags to load a rom and a lua script, then have the lua script terminate the emulator when finished and write text result to the console)

koitsu wrote:

The desire here is understandable, but it's not really practical.

I imagine you're probably right, which is why I've never bothered in the past.

Quote:

My sysadmin nose *sniff sniff* is getting a strong hint of someone being pressured (either through work or general online/developer/social peer pressure) to "focus on test-driven development".

Really it was more based on the fact twice this week discovered that I've recreated bugs that I'd already solved before. It got me thinking about automated tools to help me discover as soon as I re-cause them.

Quote:

The problem has to do with that real-world testing of gameplay/mechanics cannot be done in most emulators in an automated way. The only way it could would be through some incredibly ridiculous (read: complicated, highly convoluted) "test harness" methodology. Rephrased: your game would literally need a way to interact with the emulator (heavy reliance on debugger features), which then would in turn requires some kind of test model framework. (I'm using terms a bit vaguely here, sorry) Forcing an emulator to try and do this... well... the developers of the emulator could spend literally years on this. I'm not joking around. It's painful. Very painful.

Actually, the test case I described isn't all THAT complicated. I could easily write a lua script to make mesen do most of what I described, but it wouldn't be as streamlined as I'd like. (still, I agree that it's questionable whether it would be worth the time in the long run)

Quote:

One thing I want to make clear here, because for whatever reason people with hard-ons for tests always overlook (or ignore) this: excessive focus on tests often ends up wasting TONS of time time that could be better spent actually working on the actual thing (game). Say for every 50 tests you write (and each take 2 hours), maybe 1 will end up "catching something" a few months down the road that saves you pain (i.e. making the tests worth it).

I've always felt this way, and generally been fairly dismissive of the TEST-ALL-THE-THINGS approach. I think you're spending a lot of time trying to convince me that I'm wrong.

Automated testing and human / QA testing are not mutually exclusive things. You can do both. The questions are how much, and how, and those are very subjective questions that don't have easy answers.

Ron Gilbert wrote a pretty good article about it a little while ago that aligns pretty well with how I feel about it: Unit Testing Games

Basically, I think some amount of automated testing is worth doing, but games are a fairly unique area of software development where most of the meaningful testing work cannot be automated, so there is not nearly as much value in unit tests etc. in games as it is for most other types of software. (I wrote an an article about testing Lizard, but I didn't say a whole lot about automation for that reason.)

The other thing is that automated tests can help a lot more when more people are working on the code. You can decide how stable the part of the code you're responsible for is, and document it, etc. but most things aren't isolated, and other people will be changing things you don't expect. Unit tests are often a good way to set up a canary for yourself to tell you if something that was out of your hands changed unexpectedly. When it's just me working on the code, though... I usually know what I'm changing. I mean, sometimes I don't know, sometimes I forget stuff, but it's a world apart from working on a big team. Unit tests simply have more value in collaborative programming, and this should be a factor in deciding the "how much" question.

As far as what I did for automated testing in Lizard: not a lot. I had a lot of compile/assemble time asserts. I also had runtime asserts on various things. On the NES version I actually used BRK to implement runtime asserts. These things aren't as automated as something like unit testing, but they're still doing a significant amount of validation and testing more or less automatically while I'm already doing manual testing or other compiling. The runtime asserts were very liberal on the PC version, and very sparing on the NES version though. The PC ones get automatically removed in the release build, but the NES ones mostly stayed put. (Most problems affected both versions, so verification in one still benefited the other platform.)

Many times I recorded a TAS movie in FCEUX, and on the PC I made it automatically log input each time I played. When something tricky to execute would come up in testing, this made it much easier to reproduce and debug the problem. I could go back, set up breakpoints etc. and then just replay the input movie. This was a semi-automated testing that was incredibly useful on a few specific occasions, but not something I was doing regularly.

If I manage to make another large NES game I'll probably do some sort of automated unit test, actually. Not sure what form it will take. You can run FCEUX with a TAS and a Lua script from the command line, and have the lua script output results to a file. (Hmm, does FCEUX needs a "terminate via Lua" function?) Most likely I'd have a separate unit test mode, not running TAS on the regular game. Ideally something I could remove easily if I needed the ROM space. You might remember that arcade machines tend to have a series of tests on power on, and often have debug test modes.

In my professional game work, every major project I was on had unit tests in some form. It was in no way a substitute for QA / manual testing, but it certainly had enough value to justify doing. You'll write 100 unit tests to catch 1 bug. Most of them are never going to fail, but you won't know which is going to fail. That's why you write them. If you can set up a framework that's easy to add tests to, the reward to added work ratio can easily pay off, here.

Actually, similar to the "rubber duck debugging" idea, often the act of writing a unit test for some new code will immediately find bugs in it. I almost want to say that this collateral benefit is even more valuable than the rare cases where a unit test fails due to later changes. Learning to write good unit tests takes experience, both in terms of knowing what is most useful to test, but also just how to write them quickly and effectively, so it also pays to get in the habit of it just to gain practice here.

As an example of what a unit test might look like: testing the inventory system for an RPG might involve generating a random list of items, adding them to the inventory, making sure they're all there and in the expected quantities, maybe another test for removing items, etc.

Unit tests tend to be only good at finding very isolated bugs. They tend to be very bad at finding interactions between systems. Like if changing a few frames of a character's swing animation makes a boss on level 4 impossible to beat, there's probably never going to be a practical automated test for that kind of effect, but human testing will find it. Remember also: to catch this 1 break you would have written 99 other useless tests. It's really very difficult to capture all the constraints needed for any complex task in a game.

Similarly there's all kinds of bugs where, e.g. maybe treasure chests and torches share code in an unexpected way, so that opening a treasure chest accidentally lights a torch elsewhere or vice versa. This isn't an intentional change, but it is the kind of bug you see very often in games. A unit test would just test the torch, and just test the treasure chest, and find no problem. If they have to be used together to verify, it tends to be out of scope for automated testing, but QA on the other hand can be fairly good at spotting this kind of thing. In general unit tests are much better at warning about deliberate functional changes than accidental ones.

So... anyhow I think it's potentially worth doing, but really depends on the scope of your project, and your level of experience, etc. It's not high enough in value that it can't be ignored, really. Lizard turned out fine (I think?) without unit testing, but I actually wish I'd implemented some. For smaller projects I probably wouldn't bother, but maybe it's good practice-- possibly the most important thing is just learning how to write unit tests quickly and effectively. If they're too big of a work drain, they don't pay off.

If I manage to make another large NES game I'll probably do some sort of automated unit test, actually. Not sure what form it will take. You can run FCEUX with a TAS and a Lua script from the command line, and have the lua script output results to a file. (Hmm, does FCEUX needs a "terminate via Lua" function?) Most likely I'd have a separate unit test mode, not running TAS on the regular game.

Yeah, I the existing lua support plus the "terminate via lua" option would be all the features I'd really need, although I think I'd want to build up a few lua routines to make it all easier (ie assertEquals(addressLabel, value)). But a headless run mode (which could run at 1000% speed) would be awesome also, for running a suite of tests quickly.

Unit Tests and TDD are bad, usually fail to make anything better and miss the whole point of testing. They have their place, for example if one is testing Manchester encoding and decoding then sure. But in a game you especially want to test the behavior. To which I think it would solve Ron Gilbert's case, I mean of all the game types in the world, a Text adventure game we be one of the best cases and easiest cases to test. I mean you could write a solver in prolog and just let it test the each keyword on each object it has in the inventory in each screen it can get to and make sure it gets to the end screen. However more to his direct points BDD on a door for the elevator would be.

Code:

Given I'm in Room YAnd the Player is standing in front of "switch"And The Action button is pressedThen I expect the "Lift door is open" flag to be True

You don't want to unit test per se, but test the behavior, although this can also "do unit" testing. I also find it the best way to dev a complex system and having a tight test case and its enhanced logs make it easy to track down where it went wrong. I've replaced some of the illegal opcodes ( which it doesn't support) to allow you to test if A,X,Y remain the same between two points. Which is for making sure some function you assume doesn't change X or Y truly doesn't change it.

For example from Squid Jump, to test that the player collides with a solid platform as I would expect I have

Code:

Feature: Test how Squid interacts with Platforms

Check squid against a bunch of platform types and make sure the right things happen

Scenario: Simple Solid Given I have a simple overclocked 6502 system And That does fail on BRK And I load prg "squid.prg_test" And I load labels "squid.acme" And I load bin "testLevels\simpleCollision.bin" at LevelData # a simple level that just has a solid platform in the middle And Joystick 2 is NONE When I execute the procedure at $810 for no more than 800000 instructions until PC = GFSM_Game #let the game set up the level and sprites etc And I write memory at playerY with 158 #move the player to a known height above the platform And I continue executing until $403 = 150 #run the game for 150 frame to be sure sure, using the debug frame limiter Then I expect to see playerY less than 200 #make sure the player's fall was stopped Then I expect to see playerY greater than 158

Now that you can collide make sure you can't move while you are on the platform

Code:

Scenario: Can't move when on a platform L Given I have a simple overclocked 6502 system And That does fail on BRK And I load prg "squid.prg_test" And I load labels "squid.acme" And I load bin "testLevels\simpleCollision.bin" at LevelData And Joystick 2 is NONE When I execute the procedure at $810 for no more than 800000 instructions until PC = GFSM_Game And I write memory at playerY with 158 And I write memory at playerX with 152 And I continue executing until $403 = 150 And I write memory at $403 with 0 And Joystick 2 is L And I continue executing until $403 = 25 And I expect to see playerX equal 152

I've just added a Print IO device which lets you map a device into the memory map that lets you print out values of strings from with your program to help test and debug. Very handy for testing a Sorted Double Linked List system I was making recently. As Martin and I are C64 devs primarily it is has basic C64 support, if you feel there is something need to make NES life easier, let me know and I can add it.

Yeah, I the existing lua support plus the "terminate via lua" option would be all the features I'd really need, although I think I'd want to build up a few lua routines to make it all easier (ie assertEquals(addressLabel, value)). But a headless run mode (which could run at 1000% speed) would be awesome also, for running a suite of tests quickly.

Tinkering a bit with Mesen's code, I'm able to get it to run a rom+lua script, and have the script call emu.stop([exitcode]) (this doesn't exist in the Lua API at the moment) and get that exit code as the executable's exit code in the command line (which should be enough to run it via make?)

At the moment it's part of the C# executable (no window ever gets shown, though), which might not be ideal for startup performance, but would allow me to ship this as a feature pretty easily. I guess performance would really depend on whether or not you would be planning on running dozens vs hundreds/thousands of tests (where each test would be a different Lua script). And of course, Mesen being Mesen, it will probably only run games at like 150-250 FPS with a Lua script loaded.

For readers familiar with the TAS scene, the half-hour GDC video can be summed up in three letters: fm2. Record an input log on one platform, with a checksum of the game state every few frames, and play it back in fast-forward at low graphical settings on other platforms using a batch file. Then you can use the checksum to tell exactly when the replay desynced and track down the cause. This also lets you take more detailed "heatmaps," or analytics on play testers' input logs, to find poorly balanced content.

One thing it didn't cover is how to keep replays working even across changes to content.

One thing it didn't cover is how to keep replays working even across changes to content.

Yes, I feel like that was a bit of a glaring omission, but I suspect a few things in regard to this:

1. After finishing the game he ported it to like 15 different platforms. At that point content wasn't changing anyway.

2. I think something might have been done to make the game more replay-able in segments, e.g. sending a seed to PRNG after room loads, or something like that.

3. This probably wasn't nearly as useful at earlier stages of development as it was for later ones when less systemic things were changing.

I would say that it is still very useful whether or not you can use it for long replay tests like that. Testing very short sessions makes for better "unit test" kind of things anyway, and it's fantastic for reproducing bugs at all stages of development.

What he failed to mention was how he got his game running 100% the same every single time from just input data..(across multiple machines even) because that is a real pain in the neck to get right. Either you don't do anything random at all, i.e make sure you random is 100% fixed order with a 100% guaranteed result. Make sure you sync every timer, every value is initialized correctly at the same time, and there are no variables like loading times off a disk, or OS stealing CPU time causing some event to just slip to the next frame or if you are using C# or some other managed language making sure it just doesn't decide to garbage collect. This of cause gets worse when you do Debug -> Development -> Release -> Gold Master -> Retail builds as that will all change the timings and memory layout. If you can get it to work with small amount of pain, "its the way to do it", but having seen games that the demo mode works fine on Gold Master and then watch it fail horribly in the in game demo from retail, and having spent weeks on my Hunter's Moon Remaster trying to get demo mode perfect. 1 frame delay is the difference between player zips around level looking awesome, and player comically crashes straight into enemy. I see it as "chances are you will spend more time debugging the playback engine - and testing playback scripts to make sure they didn't need the bug you just fixed to playback correctly"

Who is online

Users browsing this forum: No registered users and 10 guests

You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum