User Stories Make For Better Consensus – Game Planning With Science! Part 6

There’s a saying in data science: Garbage In, Garbage Out (or GIGO, if you prefer). The most advanced formulas and models won’t provide outputs worth a dead cat if you don’t have high-quality inputs. When it comes to something as difficult and uncertain as feature planning and estimation, that’s quadruply so. In this post I’m going to walk you through the system I’ve used successfully, how it works, and why it works. And it’s all based on the counterpart to the story points from Part 5: user stories.

The article image for “User Stories Make For Better Consensus – Game Planning With Science! Part 6” is from GraphicStock. Used under license.

By Reading This Post, You Will Learn:

Why investing a little time now is better than wasting a lot of time later

The value of discipline in feature design

The definition of Omitted Variable Bias, and how to use it to your advantage

How to use Planning Poker to estimate scope

The perils of Social Proof when it comes to estimating scope

Why consistency is critical to effective scope estimation

Preparing To Estimate: User Stories

There are multiple approaches to estimating with story points, and the best system depends on your needs and MO. But this is the system that has worked best in my experience, and I think it represents a good starting point for experimentation.

Due Diligence: Take The Time To Detail Your Feature Requests in the Form of a User Story

Before you have anybody work on a feature, or even estimate scope, make sure you’ve thought it through. In my experience, the best feature specs include three key elements:

A user story: who wants this feature, what do they want, and why do they want it? A user story is a super-quick way to explain why you need to develop a feature. This is another scrum construct that happens to work well. It follows an easy template: “As _________, I _________, so that __________”. For example:

“As a player, I jump, so that I can traverse the environment”

“As a technical director, I have continuous integration, so that I can streamline the build process”

“As a combat designer, I blend combat animations, so that I can refine the combat experience”

Technical requirements: what does this feature need to do, what systems does it need to touch or communicate with, what limitations does it need to observe, etc.

Acceptance criteria: what does the person reviewing this feature need to see to consider it complete? In other words, if your creative director is the person who will sign off on a feature, what does he need to see before he considers the feature complete?
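If your team keeps specs in a script or tool, the three elements above map naturally onto a small record. Here is a minimal Python sketch, assuming a hypothetical FeatureSpec type (the class, field names, and sample tech requirement are made up for illustration, not part of any tracker's API). It fills in the user story template and reports which elements are still missing before a story is ready to estimate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureSpec:
    """Hypothetical record for a groomed feature request (not a standard schema)."""
    role: str       # who wants it, e.g. "a player"
    action: str     # what they want, e.g. "jump"
    benefit: str    # why, e.g. "I can traverse the environment"
    tech_requirements: List[str] = field(default_factory=list)
    acceptance_criteria: List[str] = field(default_factory=list)

    def story(self) -> str:
        """Fill in the "As ___, I ___, so that ___" template."""
        return f"As {self.role}, I {self.action}, so that {self.benefit}"

    def missing_parts(self) -> List[str]:
        """List the spec elements that still need writing before estimation."""
        missing = []
        if not (self.role and self.action and self.benefit):
            missing.append("user story")
        if not self.tech_requirements:
            missing.append("technical requirements")
        if not self.acceptance_criteria:
            missing.append("acceptance criteria")
        return missing


jump = FeatureSpec(
    role="a player",
    action="jump",
    benefit="I can traverse the environment",
    tech_requirements=["Hook into the existing character controller"],  # illustrative only
)
print(jump.story())          # As a player, I jump, so that I can traverse the environment
print(jump.missing_parts())  # ['acceptance criteria']
```

A spec like this one is not ready for planning poker yet: nobody has said what the reviewer needs to see.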

The term “user story” is a synecdoche in scrum: it refers to both the user story part of the feature spec and the entire spec, interchangeably.

Management Sticker Shock

Yes, this takes time and protracts the turnaround for seeing your ideas in the build. But so does a feature that a developer mis-executed because of poorly explained expectations. So do fire-drill feature requests that hose the build because nobody thought through the technical impact. You know you’ve been there. I have too.

One of the most important lessons I took from business school is that you can’t just look at the cost. You also have to look at the payoff or, in this case, the savings. An ounce of prevention is worth a pound of cure. By taking the time to think through and define user stories up front, you are reducing the likelihood of a disconnect between what the director or designer wants to see and what the developer actually makes.

On the topic of discipline

This grooming process serves as a gut check against feature creep and impulse-driven design decisions. Tim Moss once said that he implemented a “three times rule” to control feature creep when he was Lead Programmer on the original God of War. The programmers would wait until Game Director David Jaffe requested a feature three times before taking the request seriously.

Disciplined grooming of user stories in a consistent manner is a more productive version of this kind of buffer. If you aren’t willing to invest the time to explain who needs a feature and why, think through the technical impact and caveats, and specify what you expect to see, then it’s worth asking whether the feature is worth the time it takes to implement.

Some of you are indubitably screaming at the screen: “I live in the real world, you poncy academic! I need to get shit done!”. Hey, I’ve been there. In the heat of crunch, when Beta is bearing down on you, the publisher is screaming, and marketing is convinced that the game will fail unless you slam in some random idea from a recent focus group, you need to make tough decisions. I get it. But I’d like to suggest a different way to look at it.

Scylla and Charybdis: Picking Your Poison

In The Odyssey, Odysseus and his men must sail through a narrow, dangerous channel on their journey home from the Trojan War. On one side of the channel is Scylla, a six-headed sea monster that will grab and devour a sailor with each of her mouths. On the other side is Charybdis, a whirlpool that could swallow the whole ship and kill everyone aboard. They cannot pass safely between them, so Odysseus’s choice is simple: lose six men or risk losing them all. He chooses Scylla.

Project management is not about “good/bad” or “right/wrong”. Management is a series of trade-offs and priorities. My argument isn’t that grooming is “good” and rushed feature requests are “bad”. There is a trade-off between planning and speed. But clear direction should be a higher priority than expediency. That doesn’t mean you shouldn’t value expediency, just that it shouldn’t come at the cost of clear instructions. Taking the time to establish clear direction ahead of time will result in a net-positive use of man-hours. You can avoid botched executions, new bugs, broken builds, and other issues that can consume far more man-hours than it takes to write a well-groomed user story.

The greatest pressure to rush features often occurs when you can least afford wasted time: the end of the project. So think very hard about the best net use of your time. Do you want the Scylla of definitely spending a finite amount of time to groom a feature request or the Charybdis of possibly losing a lot of time across multiple people and borking your build? Which is the greater risk? That’s not a rhetorical question. There may very well be times when Charybdis’ side is worth it. But make sure that path is a decision and not a reaction.
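One way to make that question concrete is a back-of-the-envelope expected-cost comparison. The sketch below is a simplification under made-up numbers: it assumes you can guess the chance that an ungroomed request goes wrong, and roughly how many people and hours the cleanup would eat. The function names are hypothetical.

```python
def expected_cost_of_skipping(p_problem: float, people: int, hours_each: float) -> float:
    """Expected hours lost if you skip grooming: chance of a problem times its cost."""
    return p_problem * people * hours_each


def should_groom(groom_hours: float, p_problem: float, people: int, hours_each: float) -> bool:
    """Groom when its certain cost (Scylla) is below the expected cost of skipping (Charybdis)."""
    return groom_hours < expected_cost_of_skipping(p_problem, people, hours_each)


# Hypothetical numbers: 2 hours to groom, versus a 1-in-4 chance that a
# botched feature costs 5 people 6 hours each to clean up.
print(expected_cost_of_skipping(0.25, 5, 6))  # 7.5
print(should_groom(2, 0.25, 5, 6))            # True
```

Expected value is only a first cut. Late in a project you may also care about the worst case, since a broken build the week before Beta can hurt more than its averaged-out hours suggest.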

What To Estimate: Only Apply Story Points To User Stories

You can point whatever you like, but I recommend only pointing user stories. Remember: estimates are time-consuming, non-value-adding activities. Rather than investing lots of time and energy in estimating the scope of every piece of work that your team needs to process, just worry about estimating the features.

But what about the time needed to fix bugs or deal with other tasks? Don’t waste your time. Your feature development velocity will account for that just fine on its own. How? Omitted Variable Bias.
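Here is a minimal sketch of what “only point user stories” looks like when you compute velocity, assuming a hypothetical WorkItem record with a kind and a point value. Bugs and tasks get closed but never pointed, so the time spent on them simply shows up as fewer story points closed that week.

```python
from typing import List, NamedTuple


class WorkItem(NamedTuple):
    """Hypothetical closed to-do item from your tracker."""
    kind: str        # "story", "bug", "task", ...
    points: int = 0  # only user stories get pointed


def weekly_velocity(closed: List[WorkItem]) -> int:
    """Story points closed in one week. Bugs and tasks are skipped on purpose:
    the time they took shows up as fewer story points closed."""
    return sum(item.points for item in closed if item.kind == "story")


week = [
    WorkItem("story", 5),
    WorkItem("story", 3),
    WorkItem("bug"),
    WorkItem("bug"),
    WorkItem("task"),
]
print(weekly_velocity(week))  # 8
```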

What Am I Missing?: Understanding Omitted Variable Bias

Imagine you are trying to estimate the impact that striking-out has on a professional baseball player’s salary. You grab some data and examine the relationship and learn, to your astonishment, that there is a positive correlation between striking-out and salaries. The data would seem to imply that the players who strike out the most, on average, also get paid the most, on average.

What’s going on? Are all of the owners crazy or just stupid?

Neither.

Let’s add a third data point: home-runs. If you create a new model that accounts for the impacts of both strike-outs AND home-runs on salary, you’ll see a new relationship emerge. Home-runs positively correlate with salary while strike-outs now negatively correlate. This revelation lends itself to an entirely different interpretation than the one above: players who hit the most home-runs get the biggest checks, but, because home-runs require heavier swings, they also strike out more often.

I Was For Striking-Out Before I Was Against It: Correlation Flipping

Let’s unpack that a bit. If your model only includes one variable (strike-outs), the results imply a positive correlation with salary. But, if you also include the impact of home-runs – what, in statistics speak, is known as controlling for home-runs, or holding home-runs constant – the correlation between strike-outs and salary flips from positive to negative.

In simple terms, if you hold home-runs constant – if you assume that every baseball player got the same number of home-runs – then strike-outs are negatively correlated with salary. But if you don’t hold home-runs constant, the highest-paid/most-home-run-hittingest/more-striking-out-than-average players bias the salary curve in such a way as to make it look like a crappy batting average is a good financial move.

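This phenomenon is called omitted variable bias (OVB). Simply put, OVB is the distortion that the variables you ignore introduce into the apparent effects of the variables you track. It shows up when a left-out variable (home-runs) both influences the outcome (salary) and is correlated with a variable you did include (strike-outs).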
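To see the sign flip for yourself, here is a minimal sketch on a made-up, noise-free dataset (the numbers are invented purely for illustration, not real baseball salaries). By construction, salary rises with home-runs and falls with strike-outs, and the heavy hitters strike out about three times as often as they homer. Ordinary least squares is written out by hand, so it needs nothing beyond Python's standard library; the helper names are hypothetical.

```python
from statistics import fmean


def slope(x, y):
    """Simple OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_var_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 together (intercept included), solved from
    the centered 2x2 normal equations with Cramer's rule."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [b - my for b in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Made-up data: strike-outs run about 3x home-runs, and salary ($M) is
# built to rise with home-runs and FALL with strike-outs.
home_runs = [10, 10, 20, 20, 30, 30, 40, 40]
strikeouts = [40, 20, 70, 50, 100, 80, 130, 110]
salary = [1.0 + 0.1 * hr - 0.02 * so for hr, so in zip(home_runs, strikeouts)]

# Omit home-runs: strike-outs look like they pay.
print(f"{slope(strikeouts, salary):.4f}")  # 0.0106

# Control for home-runs: the strike-out effect flips negative.
b_hr, b_so = two_var_slopes(home_runs, strikeouts, salary)
print(f"{b_hr:.2f}, {b_so:.2f}")  # 0.10, -0.02
```

Nothing about the players changed between the two print statements. Only the model did: leaving home-runs out lets their effect leak into the strike-out coefficient.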

The Baby and The Bath Water: Omitted Variable Bias Can Be Your Friend

So, OVB is bad, right? Not exactly. For one thing, it’s omnipresent. As I pointed out in Part 3, you can’t possibly assess the impact of every variable at play in the models you build. You can’t control for the impact of thermally driven changes in air density when it comes to home-runs and strike-outs. You can’t control for which baseball players suddenly had bad gas when they stepped up to the plate.

And in this regard, OVB is actually your friend: instead of accounting for all of that minutiae, all those individual elements you either can’t control or can’t track (or both), just let it “float” in the OVB ether. In other words, instead of stressing about accounting for every little variable that might be at play, just accept that the impact of those variables will show up in the variance of the variables you do track.

This aspect of statistics – accounting for OVB – is where the science stumbles into art territory. You need to decide which variables to track, and which variables to let float. Which forms of OVB are acceptable and which are detrimental. And part of that decision boils down to having a model that you can actually describe to people. A model of ball player salaries that controls for the caloric content of their breakfasts, their stool consistency, and whether they are getting divorced is not going to lend itself to a meaningful interpretation.*

A More Relevant Example

You are tracking your team’s development velocity. You can meticulously track sick days, days devs had to leave early to take their kids to the doctor, delays on commuter trains, and major pile-ups on the highway as a way to model future development speed. Oooooooor you can simply track the story points closed per week, with the understanding that the collective impact of all those other variables is floating in your velocity’s variance.
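In practice, “letting it float” can be as simple as summarizing weekly story-point totals and their spread. The numbers below are invented for illustration. The sketch uses Python's standard statistics module, and the sample standard deviation stands in for all of the untracked variables (sick kids, train delays, pile-ups) at once.

```python
from statistics import mean, stdev

# Hypothetical story points closed per week. Sick days, train delays, and
# bug fixing are NOT tracked; their combined effect lands in the spread.
weekly_points = [20, 18, 25, 14, 22, 21]

velocity = mean(weekly_points)   # average points closed per week
spread = stdev(weekly_points)    # sample standard deviation across weeks
print(f"about {velocity:.0f} points/week, give or take {spread:.1f}")
# about 20 points/week, give or take 3.7
```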

A good statistical model strives for parsimony: it uses only as many variables as are necessary to provide a reasonable understanding of the real world, and no more. A good rule of thumb for deciding what to include is whether or not you can control the variable in question. You can control how many meetings you have or how much time your team members spend on activities other than feature development. You can’t control when their kids get sick or when there’s a wreck on I-94.

The Art of the Science

At the same time, that’s not a hard and fast rule. Perhaps public transportation breaks down so often that you need to control for its impact. Again, statistical models are a highly scientific art. Or a very artful science. In the end, your choice boils down to the story about the real world that you’re trying to tell with your model and what variables matter to that story.

OVB will also ensure that the fluctuations you see in your velocity account for the time taken away from feature development by other forms of work. Unless you specifically need to understand the impact that non-feature work has on your velocity, don’t worry about tracking it. You have enough going on.

How To Estimate: “Planning Poker”

Once you’ve scoped out and specified your features, you need to review them across disciplines. How many people you pull into these sessions is up to you. I recommend pulling in one representative from each discipline, preferably more senior folks. But if you have a small studio or dev team, it might make sense to pull everyone in. Or maybe your studio is divided up into small strike-teams, in which case those groups are the logical contenders.

Once you have the necessary people in a room together, give everyone a “deck” consisting of 7 cards, each containing a number in the Fibonacci sequence from 1 to 21. You can make these yourself or order them online. You can also download apps like Scrum Planning Poker or one of the multiple planning poker plug-ins for Jira.
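If you’d rather generate the deck than buy one, the cards are just the distinct Fibonacci numbers from 1 to 21. A minimal sketch (the function name is hypothetical):

```python
def fibonacci_deck(max_value: int = 21) -> list:
    """Distinct Fibonacci numbers from 1 up to max_value, one card each.

    Starting the pair at (1, 2) skips the duplicate 1 at the start of the
    Fibonacci sequence, so every card is unique.
    """
    deck = []
    a, b = 1, 2
    while a <= max_value:
        deck.append(a)
        a, b = b, a + b
    return deck


print(fibonacci_deck())  # [1, 2, 3, 5, 8, 13, 21]
```

That is the seven-card deck. The widening gaps between cards are deliberate: the bigger the story, the less precise your estimate can honestly be.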

How to Play

Move through the groomed user stories one at a time, starting with the highest priority. Have the author of each user story describe it, the tech requirements, and the acceptance criteria. Let the folks in the room discuss it until everyone agrees on what’s required. This may mean adjusting the requirements or criteria, or re-prioritizing other features to account for dependencies.

Once everyone understands the user story, each person picks a card from his/her deck (without revealing it to the room) corresponding to his/her estimate of the scope of the story. When everyone has an estimate, all reveal their cards at the same time. If team members have different scope estimates, talk through them until you have a consensus on what the actual estimate should be, and then move on to the next story.
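The reveal-then-discuss step can be sketched in a few lines. This is an illustrative sketch, not a tool the process requires: the reveal function and the participant names are hypothetical, and having the lowest and highest estimators explain first is one common way to run the discussion, assumed here rather than taken from the steps above.

```python
from typing import Dict

DECK = (1, 2, 3, 5, 8, 13, 21)


def reveal(hidden_picks: Dict[str, int]) -> Dict[str, object]:
    """Reveal everyone's card at once and say who should explain their number.

    hidden_picks maps each participant to the card they chose in secret.
    """
    for name, card in hidden_picks.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    low, high = min(hidden_picks.values()), max(hidden_picks.values())
    if low == high:
        return {"consensus": low, "discuss": []}
    # No consensus yet: the lowest and highest estimators talk it through first.
    discuss = sorted(n for n, c in hidden_picks.items() if c in (low, high))
    return {"consensus": None, "discuss": discuss}


round_one = {"design": 5, "engineering": 13, "art": 5, "qa": 3}
print(reveal(round_one))  # {'consensus': None, 'discuss': ['engineering', 'qa']}
```

After the discussion, everyone picks again in secret and you repeat until the cards match.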

In my experience, these sessions are worth the time because they facilitate cross-disciplinary discussion, coordination, and understanding. They surface a lot of problems before anyone starts coding, which is when those problems are typically the cheapest to solve.

Why can’t we just say our estimates out loud? Do I really need cards?

YES!

YES YES YES YES YES YES YES YES YES YES YES!!

There are two vital reasons, if you actually want estimation sessions to be useful. First, think back to anchoring from Part 5: the first person to rattle off a number will anchor everyone else around that number.

The second reason not to just say the estimates is plain and simple peer pressure.

I must obey…I must obey…: The Inner Borg of Social Proof

When I was waiting tables, I learned quickly that I didn’t need to get everyone to buy drinks to up my tab. I just needed to get the first person I spoke to to buy a drink. Everyone else at the table would, almost invariably, follow his or her lead. And if that person said “Just water”, I was screwed, because (again, almost without fail), everyone else would say “Just water.” This outcome was so consistent that I dubbed it “the waterfall”. This was the first time I became aware of a phenomenon called social proof.

The Asch experiment

In 1951, a psychologist named Solomon Asch performed a now-famous experiment at Swarthmore College. He brought a small group of people into a room and had them look at cards showing a reference line and three comparison lines marked A, B, and C. Over multiple rounds, the group was supposed to say out loud which labeled line was the same length as the reference line. The experimenters designed the cards so that it was blatantly obvious which line matched.

The twist in the experiment is that all but one of the group of people were working with the experimenters. For the first few rounds, these confederates would pick the correct answer. Then they would deliberately pick one of the wrong lines. If the correct line was B, they would all say A. The purpose of the experiment was to see what the one person who wasn’t in on the gag would say. This person was seated in such a way that the confederates would go first and thus influence his answers.

The Impact of Social Proof

The result: sure enough, the odd man out often succumbed to group dynamics and gave the wrong answer. In roughly 37% of the critical trials (the rounds where the confederates gave the wrong answer), the test subjects picked the answer that was obviously wrong. Sometimes it was because they didn’t want to stand out. Sometimes it was because they thought they must be wrong if everyone else had a different answer.

Psychologists call this dynamic “social proof”. And if it can push people into an obviously wrong answer more than a third of the time in an objective situation, think how much it can bias people’s responses in subjective situations. Like estimating scope, for instance.

Picking Numbers Ahead of Time Drastically Mitigates Bias

When you’re playing planning poker, you want to hear from the outliers. If someone has a drastically different viewpoint on the complexity or scope of a user story, that is exactly the perspective you want to hear.

If you just let people verbally rattle off story point estimates one at a time, everyone will almost invariably parrot whatever the first person says. In which case, you may as well just have one person doing all of your estimates, because you’re not getting any value from a cross-disciplinary review.
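To see why the simultaneous reveal matters, here is a toy model of estimating out loud. It assumes each speaker blends their private estimate with the average of what has already been said, using a made-up conformity weight (0.8 below is an arbitrary illustration, not a figure from the Asch experiment), and then names the nearest card. The function names are hypothetical.

```python
from typing import List

DECK = (1, 2, 3, 5, 8, 13, 21)


def nearest_card(x: float) -> int:
    """The card closest to a (possibly fractional) estimate."""
    return min(DECK, key=lambda card: abs(card - x))


def say_out_loud(private: List[int], conformity: float) -> List[int]:
    """Toy model of verbal estimates given in order: each speaker blends their
    private estimate with the average of what has already been said, weighted
    by `conformity` (0 = fully independent, 1 = parrot the room)."""
    spoken: List[int] = []
    for estimate in private:
        if spoken:
            room = sum(spoken) / len(spoken)
            blended = conformity * room + (1 - conformity) * estimate
        else:
            blended = estimate  # the first speaker sets the anchor
        spoken.append(nearest_card(blended))
    return spoken


private = [3, 5, 5, 13]           # the last person sees hidden complexity
print(private)                    # simultaneous reveal: [3, 5, 5, 13]
print(say_out_loud(private, 0.8))  # said out loud in order: [3, 3, 3, 5]
```

With everyone revealing at once, the 13 is on the table and the room has to talk about it. Said out loud in order, the same private estimates collapse toward the first speaker’s 3, and the outlier’s concern shrinks to a 5.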

Consistency is the Hallmark of a Champion

Consistency is the key if you want to get the most out of the system.

Consistently Play Planning Poker

Even if you don’t have a lot of user stories to go through, make a point to hold a session at a regular cadence: once a week or every other week.

Point All of Your User Stories in a Consistent Manner

It doesn’t matter if this is when you first specify a user story, when you tag it to a milestone, or when a developer actually pulls the user story to start working on it. But pick one stage and do it consistently, so that you are always comparing apples to apples. I think the earlier the better, but pick one recurring point in your dev cycle, and stick with it.

Decide What Types of To-Do Items You’re Going to Point and Which Ones You Aren’t, and Stay Consistent

If you want to point everything (bugs, tasks, user stories, character models, etc.), that’s fine. Or, if you just want to point user stories, that’s fine too (and preferred from my perspective). But, again, the key is to stay consistent if you want meaningful forecasts.

Never, Ever Change the Point Value of a User Story After the Task is Complete

If you estimate a task at 13 points and then complete it only to realize it was actually closer to a 2-point task, it’s tempting to downgrade the story. Don’t. Again, for this system to work, you need to compare apples to apples. You estimate scope before development, when you know the least about the story. If you change the point value of a task after the fact, you are now comparing minimum-knowledge apples to full-knowledge oranges. Just leave the story at 13 points and move on. Even if you severely underestimate a few stories, you will likely overestimate others just as badly, and across enough stories those misses tend to wash out. If you are chronically over- or underestimating, then the issue is how you assess scope.

Next Up: Forecasting!

In my next post, I’ll show you how to use story points and user stories to forecast development over time. So click here to read on!

Key Takeaways

User stories are a useful method for clarifying the intention of a feature

By grooming user stories, you invest some time up front to avoid potentially squandering a lot of time down the line

You can estimate whatever you want, but I recommend sticking with the user stories

Omitted variable bias will account for everything else

Cross-disciplinary games of “planning poker” will bring a holistic perspective to user stories and can catch a lot of potential problems before they happen

*It also tends to jack up the confidence intervals of the individual variables in your model, especially when those variables are correlated with one another. I’m not going to get into that math here, since these kinds of models (called regressions) are usually built with dedicated statistical software. But the short version is that the more variables you include/control for, the wider each of their confidence intervals tends to get.