Design 101: Playtesting

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

Welcome!

Hi, welcome back to Design 101. Last time we talked about How to Balance Your Games. Today we're going to be talking about the foundation of every great game design process: Playtesting. Many new designers assume that playtesting is simple. Don't you just play the game and see what happens? There’s a lot to get into, so let’s get started.

The Power of Playtesting

Games are complicated. Really complicated. You take intricate systems of interlocking parts and then add the human factor. Theory and thought experiments simply aren’t sufficient for predicting systems this complex. Many designers support moving to an initial playtest as fast as possible. I’m one of them.

Theory can give you a strong foundation, especially when it’s using solid principles based on demonstrated psychological tests (which are basically science’s version of playtesting). But actually seeing the game in action is going to be a better picture than all your theoretical models. It’s the real thing.

So how do you run a productive playtest? Like everything, it depends on what your goal is. In my experience, I’ve found there are five different stages that each have specific goals.

Stage 1 - Concept Testing

In the earliest stages of a game’s design, you want to quickly test if your core concept is fun. This is often easy to do even in an incomplete form.

I’ve previously mentioned an experimental project where I used an auction system to avoid having to find balanced costs for every card in the game. Instead, players would bid on cards as they were drawn - setting the prices they felt were worth paying. This would free up my team to design anything we liked, without worrying about how hard it would be to balance. After all, the difficulty of figuring out the right cost of the card would be the core element of the gameplay.

But before we started making cards, we needed to test if this gameplay would actually be fun. As a proof of concept, I modded the rules of Magic: The Gathering. By replacing drawing cards from your deck with a mechanic where players bid on cards presented at auction, we were able to start testing the core concept on the same day it had been conceived. Obviously, Magic wasn't designed with this concept in mind. Even so, players would go through the core experience of figuring out the right price to bid for each card.

The result? It ended up being so much fun that I’d go on to develop the modded game into a unique MTG format. By finding issues early in the Concept Testing stage, you can shape your core mechanics to avoid the problems in the first place.
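To make the bidding idea concrete, here is a minimal Python sketch of resolving a single card auction. The exact rules of the mod aren't spelled out here, so everything in it is an assumption for illustration: the function name `resolve_auction`, that the highest bidder wins and pays their own bid, that a bid of zero counts as a pass, and that ties go to whoever bid first.

```python
def resolve_auction(bids):
    """Resolve one card auction (hypothetical rules, for illustration only).

    `bids` maps player name -> amount bid, in bidding order.
    A bid of 0 is treated as a pass (assumption).
    Returns (winner, price), or None if everybody passed.
    """
    live = {player: amount for player, amount in bids.items() if amount > 0}
    if not live:
        return None
    # max() keeps the first maximum it sees, so the earliest bidder wins ties
    # (dicts preserve insertion order).
    winner = max(live, key=live.get)
    return winner, live[winner]


if __name__ == "__main__":
    print(resolve_auction({"Ann": 2, "Bo": 3}))
```

Even a toy like this forces the questions a Concept Test is meant to raise quickly: what happens on a tie, and what happens when nobody wants the card?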

Stage 2 - Scattershot Testing

How would you solve child malnutrition? Better figure it out fast, because you’ve been dropped into Vietnam with that mission. You have a minimal staff and even smaller budget. Oh, and you’ve only got six months to make a difference.

This sounds like a desperate pitch for a reality TV show, but it’s the situation that Jerry Sternin found himself in when he was working for Save the Children back in 1990. The list of the problems communities faced was staggering. Poor water sanitation, government roadblocks, rampant poverty, lack of education… The list went on and on.

Here’s what he did. Instead of trying to solve these impossible problems, Sternin sought out the bright spots in this dark situation. He looked to see if any kids in these horrible circumstances somehow weren’t suffering from malnutrition. Was it possible that some kids, despite all these problems, were actually doing alright?

The answer turned out to be, “Yes”. Some poor families had well-nourished children. So... What were they doing?

The community soon identified the common practices that led to these bright spots. It involved different feeding practices, as well as a different diet not previously thought appropriate for children. You can read all about the story in the excellent book Switch. The Heath brothers really know what they’re doing.

So do we. Instead of hunting for the answers to your game’s giant problems, it’s far better to focus on those small moments when the game is going great. Don’t try to solve problems. Replicate successes. Clone the bright spots.

This obviously applies to all playtesting. However, you can accelerate the process with the Scattershot technique. A Scattershot Test is helpful early in an iteration cycle. It also runs directly counter to the instincts of many designers.

It makes sense to try and make your first build as close to the final experience as possible. If you know that a certain faction is only going to have a single unique mechanic, to give it a strong identity, your impulse is going to be to try and design the best mechanic possible and test that one. You might come up with tons of possibilities during the planning phase, but you’ll be tempted to only implement the best one during the playtest.

This is a mistake.

It’s pretty hard to look for bright spots when you can only examine one type of family in the village. A Scattershot Test involves putting a whole bunch of different mechanics into that faction all at once. While the final mechanic might be intended to show up on 20 different cards, try five mechanics that each show up on four different cards. Then you can figure out which of the mechanics were the brightest spots in gameplay. This is still possible even if your genre doesn't allow cramming this much diverse content into a single test (such as puzzle games). You can rapidly prototype the mechanics through a procession of bare-bones tests before trying to refine any one of them.
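As a rough illustration of the split described above, here is a minimal Python sketch. The function names, the numeric rating scale and the choice to rank each mechanic by its single best moment (rather than its average) are all assumptions made for this example. Ranking by the peak is simply one way to stay in the spirit of hunting for bright spots.

```python
def split_slots(mechanics, total_cards):
    """Divide card slots as evenly as possible across candidate mechanics."""
    if not mechanics:
        raise ValueError("need at least one mechanic")
    base, extra = divmod(total_cards, len(mechanics))
    # The first `extra` mechanics absorb any leftover slots.
    return {m: base + (1 if i < extra else 0) for i, m in enumerate(mechanics)}


def rank_bright_spots(ratings):
    """Rank mechanics by their best playtest moment, brightest first.

    `ratings` is a list of (mechanic, score) pairs collected during the test.
    Using the peak score instead of the mean is an assumption: one great
    moment is treated as more informative than several dull ones.
    """
    best = {}
    for mechanic, score in ratings:
        best[mechanic] = max(score, best.get(mechanic, score))
    return sorted(best, key=best.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical mechanic names, purely for the example.
    names = ["drain", "swarm", "echo", "ward", "rally"]
    print(split_slots(names, 20))
```

With five mechanics and 20 cards, each mechanic gets four slots, matching the split in the text.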

In the next test, you’ll narrow the mechanics. Once you have your final contender, you’ll end up doing the same thing with the ways you implement the mechanic. If you're designing 10 different enemies that will use the mechanic, focus on designing them so they explore the mechanic in different ways. If 8 out of those 10 enemies are terrible, but the last 2 are great, you might have an awesome mechanic on your hands. You just need to figure out if you can copy the successful ideas across the other 8.

Scattershot Tests do have a drawback: they usually involve a massive amount of complexity. It’s often hard for players to learn a single mechanic, much less five different ones. Scattershot Tests require heavily enfranchised players that already understand the rest of the game on a deep level.

If your game has yet to be released, this normally means that you’ll be doing all your Scattershot Tests internally. This is usually a good idea in any case, as playtesters often hate seeing content they like get cut. This is especially true if it gets cut for a reason that doesn’t affect that playtester's fun, such as the mechanic being too hard for players to learn.

Stage 3 - Experience Testing

Experience Tests are the next stage. Once you’ve settled on your game’s bright spots you can design something closer to the real game experience. The game your playtesters play now, mechanically at least (art assets and other peripheries may be non-existent), will be close to what they’d be playing in the final version.

Now you’re looking at your game more holistically. Scattershot Tests can be successful even if none of the games played in the test were fun. That’s because your goal there is often to find the best versions of different pieces of the game. You’re looking for the most fun abilities, the most interesting characters, the most exciting mechanics. Experience Testing is figuring out how the player experience feels when all these different pieces fit together.

It probably won’t go well. Think of your game like a machine. Each system, each ability, each item, each enemy: these are all parts that make up the gameplay experience. If your parts are out of alignment, working against each other or out of balance - then the machine is going to provide a poor user experience.

Naturally, things will likely need to be changed. However, your goal here isn’t to gather constructive feedback or suggestions. Your goal is to figure out what your players are feeling as they play.

Game design is like chemistry. You’re trying to produce a specific chemical reaction in your players’ brains. Naturally you need to know if the experience being produced is in line with your design goal. Experience testing is designed solely to tell you this, without the benefit of brain scanning equipment.

There are several tools you can use here. The best feedback tends to come in real time, allowing you to figure out which parts of your game are causing which reactions. If you’re working on a digital title, getting your playtesters to record Let’s Play videos at this stage is a great tool. Ask them to talk through their thought process while playing. Ask them to include a window showing their webcam feed so that you can watch their reactions. Setting this up is really easy and often free. OBS is easy to use, while YouTube offers a way to set videos as unlisted so only those with the links can find them.

Unfortunately, talking through your internal thoughts isn’t something that comes naturally. Even people that are good at it often obscure their real reactions. If your game allows it, and a surprising number do, you should have players test in pairs. I don’t mean two players on different computers, I mean two players in front of the same computer. Yes, even if the game is single player. This technique was used by the developers of Myst to great effect. You can hear all about it, and other gems of insight, from the GDC Postmortem.

Try sitting two players in front of Dark Souls and you’ll see players discussing where to go and what to do. They’ll ask questions of each other with regard to the interface and react much more audibly when things happen. A strategy game is even better, as players try to figure out what the best move is.

While there have been many solo Let’s Plays of Five Nights at Freddy’s, the Rooster Teeth video is the one I keep coming back to. Watch the pair's transformation as they start by letting their minds wander and mock the narration. Then things start to shift.

Soon you reach a moment where one of the players says, “Why are you only checking these bottom rooms?” Naturally, the other player answers. If they were both playing on their own, the reason behind their room-checking patterns might seem entirely clear to them. They probably wouldn’t have bothered explaining it. Together, they’re motivated to talk.

If you can’t get live reactions, whether through video or in person, give each player a sheet of paper and ask them to separate it into three columns. Have them label one column “Good”, another column “Bad” and the third column “Meh”. In the Good column, playtesters should write any moment in the game that results in an extremely good feeling for them. Just for them. If they lose the game to a glorious combo that feels awesome even to lose to, that moment should still go in their Good column. Those rare games where you lose on the first turn in Pandemic are so shocking that they usually make players laugh.

On the other hand, if something happens that feels seriously negative, players should put that in the Bad column. It might even be good for the game overall, by creating way more fun for the player using it than bad feelings for the player getting beaten by it. That’s completely fine. If it feels bad for you, it goes in your Bad column. The other player can put it in their Good column. Designers can figure out whether it was a net win or loss later.

Finally, the “Meh” column. A lot of times playtesters get into situations that feel dull, repetitive, boring or underwhelming. This sounds like it should be taken care of by the Bad column, but the feeling isn’t what players tend to interpret as bad. It’s the type of feeling you get when you save up for a dragon, summon it, and it barely impacts the board in a meaningful way. Where you have a huge build-up to a boss fight and then kill it in just a few pathetic seconds. That anti-climax might be bad for the game, but players often don’t feel comfortable labelling that feeling as “bad”. The Meh column is the net that snares these moments.
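Once the sheets come back, the three columns can be tallied per moment so you can see where players disagree. The following is a minimal Python sketch of that bookkeeping. The data layout (one dict per playtester, mapping a column name to a list of moments) and the function name `tally_sheets` are assumptions for illustration, not a prescribed format.

```python
from collections import Counter

COLUMNS = ("Good", "Bad", "Meh")


def tally_sheets(sheets):
    """Count how often each moment landed in each column.

    `sheets` is a list with one dict per playtester, e.g.
    {"Good": ["combo finish"], "Meh": ["dragon summon"]}.
    Returns a dict mapping moment -> Counter of column names.
    """
    tally = {}
    for sheet in sheets:
        for column in COLUMNS:
            for moment in sheet.get(column, []):
                tally.setdefault(moment, Counter())[column] += 1
    return tally


if __name__ == "__main__":
    # Two hypothetical sheets from the same game.
    sheets = [
        {"Good": ["combo finish"], "Meh": ["dragon summon"]},
        {"Bad": ["combo finish"], "Meh": ["dragon summon"]},
    ]
    for moment, counts in tally_sheets(sheets).items():
        print(moment, dict(counts))
```

A moment that shows up in one player's Good column and another's Bad column is exactly the kind of net win-or-loss call left to the designer. A moment that piles up in Meh is a candidate anti-climax.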

Stage 4 - Gameplay Stress Testing

The principle here is similar to seeing what happens with thousands of players on your server. In that case, it’s asking if your game can handle the stress of massive activity. In Gameplay Stress Tests, you want to know what’s going to happen when your beautiful game is subjected to the mad genius of the world’s gamers. Your goal here should be to recruit players that see your game as a puzzle to be solved, a riddle to be cracked. The speed-running community tends to be a goldmine for this sort of mind-set.

Players will tend to pursue the optimal strategy, even if it results in a less fun gameplay experience. You need to ensure that some crazy strategy doesn’t develop that encourages players to do things that undermine your design goals. Just make sure you think before fixing something. On occasion, your stress testers discover something you never intended that adds a lot to the experience.

When doing Gameplay Stress Tests, you don’t actually care if the testers are having fun. Neither do they. Their fun usually comes from the puzzle itself, and the satisfaction of breaking it. Make sure you stress this aspect when recruiting. While your Experience Playtesting can be pitched as a fun event, Stress Testing should mainly attract the hardcore optimizers. If you portray Stress Testing as hard, repetitive, detailed work in attempting to break your gameplay… The only people that will volunteer are the people that look at that and go, “Awesome”. These are exactly the kind of people you’re looking for.

However, emotions get tangled up in anything. A player might, through sheer randomness of matchups, end up losing to the same card far more than any other. This can create a sense of frustration and make the tester believe the card is overpowered. Stress Testers also often identify the most obvious card in their reports, which sometimes is different from the real problem.

In a famous MTG example, a card called Hypnotic Specter was a terror of the early tournament environment. Many complained that it was overpowered. In later years the card would continue to show up in sets, but was rarely played at all.

This is because Hypnotic Specter wasn’t the problem. The problem was a card called Dark Ritual. Dark Ritual allowed you to generate resources so quickly that you could play a powerful creature before your opponent was ready to deal with it. The creature itself didn’t actually matter too much. Once Dark Ritual was removed from the environment, Hypnotic Specter all but faded into obscurity.

This is another reason why having your testers record videos of their gameplay is so valuable. You can watch what actually happened in a game, rolling back to see which cards actually caused the problem (or if there even is one). You may even spot a crucial mistake or a missed opportunity which could have turned the tide for the losing player.

When you can’t get video evidence, ask your Stress Testers to focus on providing feedback on exactly what happened in the game - not just their conclusions about it. Often the circumstances will directly support their conclusions, but sometimes you can waste time chasing a Hypnotic Specter when the real culprit was the Dark Ritual that summoned it.

Stage 5 - Accessibility Testing

You’ve got your game now. It feels great to play. It’s weathered the fires of gameplay stress testing. It’s ready to ship, right?

Almost.

Even now, there can be completely unpredictable stumbling blocks that ruin the player experience. My favorite example comes from my own experience with the king of great gameplay wrapped in impenetrable layers. Let’s talk about the Firelink Shrine in Dark Souls.

It’s the first real area of the game. The player, having just completed the utterly brilliant tutorial section (as discussed in Design 101: The Structure of Fun), is dropped into this area with nearly no direction. Talking to a nearby figure repeatedly can yield some sparse hints, but players are mostly on their own. If the player takes a slightly obscure path up to the right, they’re treated to well-balanced enemies and a satisfying rate of progress.

If the player explores the main ruins placed directly in front of the players’ view, filled with treasure and several characters, they can easily wind up in the path down to the left. This path leads to the graveyard. The skeletons there are designed to be too powerful for a new, inexperienced player to overcome. Killing them takes a very, very long time.

I wound up with the skeletons. Having heard that Dark Souls was insanely hard and frustrating, both from other players and the game’s own title (I was playing the “Prepare to Die” edition), I assumed this was the standard encounter I’d be dealing with. The difficulty didn’t bother me at all; I actively seek out games with difficult reputations. The problem with the skeletons was that they felt like they took too long to kill. The game seemed to be too slow, too boring. I imagined hours of this, carefully navigating tiny encounters with every single foe in the game. If it took this long to fight a skeleton, how long would it take to fight a boss?

I stopped exploring and put the game down. It was only while watching a Let’s Play nearly a year later that I discovered this wasn’t the path I was supposed to go on. The moment I found out there was another path with much better pacing, I dove back in and enjoyed myself immensely.

There was nothing wrong with the gameplay here. I feel like Dark Souls was made just for me. The atmosphere, environmental storytelling, difficulty and core gameplay all felt excellent. What drove me away from the game was a tiny misunderstanding. I wasn’t stuck; I didn’t even know I was doing something wrong. If I’d simply known I was going in the wrong direction I’d have quickly explored and found the other path. At worst I could have looked it up. But I didn’t. Because I didn’t know.

These tests are important to run without any designer input. In the earlier tests you can explain rules and offer suggestions. In Accessibility Testing, this defeats the entire purpose (unless you’re planning on packaging yourself in the game box). It’s only once you’re sure you need to fix something that you can give players new info, to help get them to the next part of the game.

Here you should intentionally be recruiting players that are as inexperienced as possible with your game, preferably with your whole genre. If your game works for these players, it should definitely work for your target audience.

An excellent example of Accessibility Testing can be found in Dave Grossman’s article, A Journey Across the Mainstream. In this piece he tests an adventure game on his mother-in-law. The results are both funny and informative.

Wrapping Up

Playtesting is one of the most important tools in a designer’s toolkit. Improving your playtesting process can provide invaluable information. There’s a lot more I couldn’t cram into this article, and I highly recommend you seek out further information on the topic.

I should be clear that the testing stages rarely roll out this smoothly. Often you’ll get deep into Experience Testing only to realize that you’ve got a fundamental problem with a core mechanic. Fixing it might involve moving back through rapid Scattershot Tests to look for a solution. You’ll also run into accessibility problems long before you’re at that final stage. Some genres even require approaching the stages in a different order. An atmospheric horror game often relies on the final art assets being in place to evoke the perfect feeling. In that case, it pays to start Gameplay Stress Tests before moving to Experience Testing.

You might have noticed that there’s been little discussion of one of the most important aspects of testing: How to Evaluate Feedback. Guess which topic we’ll be tackling next.