Organizing An Agile Program, Part 5: Measurements That Might Mean Something to a Program

This is the last post in this series on organizing an agile program. It won’t be my last post on agile program management; it’s just the last in the organizing series. If I’d realized how many words I was going to write in each post, I’m not sure I would have started the series. Yes, these posts are the essence of my book, Agile and Lean Program Management: Collaborating Across the Organization. At the end of this post, I’ll list all the posts so you can see how they fit together.

How Complex Is Your Program?

Before we start the measurements themselves, let’s review the Cynefin model, because the size of your program might well change what and how you measure.

(Updated to use the real Cynefin framework.)

An agile project, one co-located team, is a complicated project. In most circumstances, we understand how to do these projects. You still need to sense what is going on, which is why we are going to discuss measurements. You still need to analyze what is going on, and you still need to respond.

However, once you move past one team, you move past complicated, into either complex or chaotic. This is why I asked you how many teams you had, way back at the beginning of this series.

Small programs of two or three teams may be able to use the same measurements as one agile team and extrapolate from there. No, you cannot add the velocities of several teams (more on that later), but with a product backlog burnup chart, you can get pretty close to all the data you need. With only two or three teams, you should be able to get to continuous integration. Sure, your program is still complicated, but you should be okay.

Once you hit the four-team mark, and especially once you hit the nine-team mark, you are in the complex and chaotic parts of the Cynefin model. Even if you are able to integrate all the time, and especially if you have geographically distributed teams, you may need different measurements. You need to act differently.

What I am proposing here is not team-based measurements, either for the feature teams or the core teams. The feature teams are still each responsible for their own measurements. They have to be. The core team is still responsible for making sure they deliver their deliverables, and I have not addressed that at all yet. (It’s on my backlog, never fear.)

But imagine you have 25 or 30 or 40 feature teams, maybe all over the world. Imagine they actually are delivering features into the code base, chunk by chunk on a regular basis, and they are very close to or are using continuous integration. Maybe they don’t always finish everything they commit to, because they are human, but 9 iterations out of 10, they do. I’m talking about a real program here. What should these people measure? Let’s discuss that.

What Measurements Will Mean Something to Your Program?

Measurements come in two flavors: predictive indicators and lagging indicators. Of course, with an effort this large, you want predictive indicators.

What do you have for a one-team agile project? You have velocity, burnup charts, and cumulative flow. Can you use these measures at the program level?

You cannot use velocity at the program level. It is not a predictive measure. Here’s why.

Velocity is Personal to a Team

One of the mistakes I see a lot is managers who want to take each team’s velocity and somehow add them together. Don’t do that. Velocity is personal to one team. It’s predictive only for that team, and only for estimation purposes. It’s not predictive for any other team.

Velocity is Not Additive

You can’t add velocities together, either. Why? Because as the product owners get better at making the stories smaller (and you want them to get better at this), the number of stories per iteration may well increase. Or, remember back in Organizing an Agile Program Part 4, Start From the Teams and Work Up, when I mentioned that a team might move from one feature set to another in the middle of a program? And that was okay? That move will change the team’s velocity.
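Here is a small Python sketch of the slicing effect, using made-up numbers and a hypothetical `Story` type (nothing here comes from a real program): one team delivers the same two voicemail features in one iteration, first as coarse stories and then as finer slices. The story count and the point total both move; the number of features delivered does not.

```python
from dataclasses import dataclass


@dataclass
class Story:
    feature: str  # the feature this story is a slice of
    points: int   # the team's own estimate, on the team's own scale


def velocity(stories):
    """Story points completed in one iteration: meaningful only to this team."""
    return sum(s.points for s in stories)


def features_delivered(stories):
    """Distinct features among the completed stories.
    Assumes every listed feature was finished in this iteration."""
    return len({s.feature for s in stories})


# Hypothetical iteration: two features, sliced coarsely.
coarse = [Story("Forward Message", 8), Story("Delete Message", 5)]

# The same two features, sliced smaller and re-estimated by the team.
fine = [
    Story("Forward Message", 3), Story("Forward Message", 3),
    Story("Forward Message", 2), Story("Delete Message", 2),
    Story("Delete Message", 2), Story("Delete Message", 2),
]

for label, stories in (("coarse", coarse), ("fine", fine)):
    print(label, len(stories), velocity(stories), features_delivered(stories))
# coarse 2 13 2
# fine 6 14 2
```

Add a second team whose points run on a different scale and the sum of the two velocities tells you even less. The feature count is the number that still means the same thing after the product owners re-slice.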

So, we have team measurements that don’t add up. What can we use? Feature-based measurements.

Measure by Features, Not by Teams

Programs are all about features. That means you want to measure by feature. Sometimes, multiple teams work off the same shared backlog. Does it matter if you measure what a team does? It matters for the team. It does not matter for the program.

If you buy the self-organizing argument, that the program manager manages by results, then the program manager does not care about the team’s measurements. The program manager assumes the team is smart enough to care about the team’s velocity and to look at it and know if it’s a hockey stick and fix it.

But the program manager really cares about the program’s progress on features across the teams. If the entire program has a hockey stick, that’s a problem. It means that there are too many features in progress, that there is no continuous integration, and that the work in progress in the teams is too high.

So, how do you measure this? You could measure “Number of features completed per iteration.” Why not story points? Because customers don’t buy story points. They don’t buy architecture frameworks. Customers buy features. They only buy running tested features. That’s it.

This is where the program bumps up against the project portfolio.

Now, it happens that each feature is some number of story points. And, the way to make more features is to break them down into smaller slices. Yes, this is gaming the system. All measurements can be gamed. All of them. However, especially in a large effort such as a program, you want small features. You will see progress faster with minimum marketable features. So, it’s okay if your product owners start making smaller features.

Measure Features Complete

Now, I don’t just measure the number of features done per iteration, because you and I both know that someone is going to add features during the program. The roadmap will change. That means the teams’ backlogs will change. You need to make that visible to everyone. Otherwise, everyone wants to know, “Why can’t you predict this program’s cost?” So this chart will look a little like my velocity chart for teams. You’ve got to allow for change.

With this kind of chart, you can discuss the inevitable changes. And when people ask you, “How much will this program cost?” you can ask them, “When would you like us to re-estimate the product backlog?”

If your programs are anything like my programs, they fall into a couple of categories. The first is the kind we have to do because it is critical to the organization’s success; the only question is when we will release the darn thing, so we will play with the roadmap and the backlog. The second is the kind we want to release as quickly as possible because we need the money, darn it, so we have to play with the roadmap and the backlog to get the most value out of what we’re doing. (Do you have a third?)

Did you notice the common theme here: we will have to play with the roadmap and the backlog to get the most value out of the program? Notice I did not say ROI (return on investment). You cannot calculate ROI until after you release, so there’s no point in trying to measure that right now. I also don’t believe in earned value for software. Not when you can change it.

Earned value is a surrogate for features complete. Why not measure features complete? Oh, and I mean complete. As in running and tested and integrated and demo’d and anything else you need to do to release. Done. Done-done-done. However many dones you need to be complete. Okay? Complete.
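A features-complete chart needs two series: cumulative features complete and the total number of features currently on the roadmap. This Python sketch computes both from hypothetical per-iteration numbers; the `burnup` function and the data are illustrations, not a tool from this post. Plot the two series as lines and the gap between them is the remaining work, which moves whenever the roadmap does.

```python
def burnup(done_per_iteration, scope_per_iteration):
    """Return (cumulative features complete, total features in scope) per iteration.

    done_per_iteration:  features that became complete (done-done-done) each iteration
    scope_per_iteration: total features on the roadmap at the end of each iteration
    """
    if len(done_per_iteration) != len(scope_per_iteration):
        raise ValueError("need one done count and one scope count per iteration")
    rows = []
    complete = 0
    for done, scope in zip(done_per_iteration, scope_per_iteration):
        complete += done
        rows.append((complete, scope))
    return rows


# Hypothetical program: the roadmap grew twice along the way.
done = [3, 4, 4, 5, 6]
scope = [40, 40, 46, 46, 52]

for i, (complete, total) in enumerate(burnup(done, scope), start=1):
    print(f"iteration {i}: {complete} of {total} complete, {total - complete} remaining")
```

In this made-up data, the remaining count rises from 33 to 35 in iteration 3 and stays at 30 in iteration 5 even though the teams finished more features than ever. That is the conversation the chart lets you have when someone asks why the cost estimate changed.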

Measure the Product Backlog Burnup

The next chart is the product backlog burnup chart. This is where you take all the feature sets, plot them next to each other on one chart, and show when each one might be done relative to the others.

This product backlog burnup chart is what we might have had for a product I worked on long ago, a voicemail system.

Each part of the system had a different team working on it. The Alarms team was different from the CO Admin team. Okay, so there was only one CO Admin team in the program when I worked on it, but if we had separated it into two teams, or two feature sets, this is what it would have looked like. And maybe Forward Message and Delete Message would have been part of one Message team. But this is the idea. Believe me, I was racking my brain for an example 🙂 Darned NDAs.
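To read a product backlog burnup like this one, you can project when each feature set might finish. The Python sketch below uses a deliberately naive assumption, that each feature set keeps its average rate so far; the feature-set names come from the voicemail example, but the counts are invented and `projected_finish` is a hypothetical helper.

```python
import math


def projected_finish(done, total, elapsed):
    """Iteration in which a feature set might finish, assuming its average
    rate so far (done / elapsed) continues. Returns None if there is no
    progress yet, because a zero rate gives no projection."""
    if elapsed <= 0:
        raise ValueError("elapsed must be at least one iteration")
    if done >= total:
        return elapsed  # already complete; report the current iteration
    if done == 0:
        return None
    rate = done / elapsed
    return elapsed + math.ceil((total - done) / rate)


# Hypothetical counts after four iterations: (features complete, features in the set)
feature_sets = {
    "Alarms": (6, 10),
    "CO Admin": (4, 12),
    "Forward Message": (5, 5),
    "Delete Message": (0, 4),
}

for name, (done, total) in feature_sets.items():
    print(name, projected_finish(done, total, elapsed=4))
```

Seeing Alarms land around iteration 7 while CO Admin lands around iteration 12 is the kind of side-by-side view the chart is for; a feature set with no progress yet, like Delete Message here, is its own signal.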

Measure the Time to Your Releasable Deliverable

Since what you care about is a releasable product at the program level, measure the time it takes to achieve it. Measure the time between releasable deliverables. Does the product go from releasable to not releasable?

This is a predictive indicator, believe it or not. Because you are accustomed to releasing on a rhythm, your past time-to-release predicts your future time-to-release. Unless you know how to shorten that time, it won’t become shorter without significant effort from you. You can’t get feedback any faster than this interval. That’s why it matters to the program.

If you have quarterly release trains, it’s about 90 days between releasable products. That means it’s 90 days from feedback cycle to feedback cycle. For me, that’s too long to be agile. Maybe you are on your way to being agile and that’s as good as you can get for now. Okay. But measure it.
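If you record the dates on which the whole program’s product was releasable, the gaps between them are easy to compute. A minimal Python sketch with invented dates; `days_between_releasable` is a hypothetical helper, and “releasable” means whatever your program’s definition of done says it means.

```python
from datetime import date


def days_between_releasable(dates):
    """Gaps, in days, between consecutive releasable builds of the whole product."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical dates on which the integrated product was releasable.
releasable = [date(2024, 1, 5), date(2024, 1, 19), date(2024, 2, 16), date(2024, 3, 1)]

gaps = days_between_releasable(releasable)
print(gaps)       # [14, 28, 14]
print(max(gaps))  # 28: the longest stretch without a releasable product
```

Watching the longest gap as well as the typical one tells you whether the product sometimes stays unreleasable for much longer than your usual rhythm.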

Other Potential Measurements

You might decide to measure run rate, the amount of salary and overhead the program uses every iteration or every month. That might keep the accountants happy. It also gives you an excuse to ask everybody, “You’re not working on any other project or program, are you? Because I’m taking your salary hit on this program. So if you’re working on something else, let me know!” Multitasking is a disaster on a program. It’s insidious and slows a program to a crawl faster than anything I know.
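Run rate is simple arithmetic: each person’s loaded cost (salary plus overhead) for the period, times the share of their time charged to the program. This Python sketch uses hypothetical names and figures; listing anyone whose allocation is below 100% is one way to make the multitasking question visible.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    loaded_cost: float  # salary plus overhead per iteration (hypothetical figures)
    allocation: float   # fraction of their time charged to this program, 0.0 to 1.0


def run_rate(people):
    """Program cost per iteration."""
    return sum(p.loaded_cost * p.allocation for p in people)


def split_across_work(people):
    """Names of people who are not fully on this program: likely multitaskers."""
    return [p.name for p in people if p.allocation < 1.0]


team = [Person("Ana", 10_000, 1.0), Person("Ben", 9_000, 1.0), Person("Chris", 12_000, 0.5)]
print(run_rate(team))           # 25000.0
print(split_across_work(team))  # ['Chris']
```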

Depending on the number of teams, you might be able to look at cumulative flow or work in progress. The more teams, the more this can get out of hand quickly.
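With many teams, a single program-level work-in-progress count can be easier to read than a cumulative flow diagram per team. A Python sketch, assuming you record the iteration in which each feature started and the iteration in which it became complete (`None` if it is still open); `wip_by_iteration` and the data are made up for illustration.

```python
def wip_by_iteration(features, iterations):
    """Features in progress at the end of each iteration, 1..iterations.

    features: list of (started, finished) iteration numbers;
              finished is None if the feature is not complete yet.
    """
    counts = []
    for i in range(1, iterations + 1):
        in_progress = sum(
            1
            for started, finished in features
            if started <= i and (finished is None or finished > i)
        )
        counts.append(in_progress)
    return counts


# Hypothetical features across the whole program.
features = [(1, 3), (1, 4), (2, 3), (2, None), (3, None), (4, None)]
print(wip_by_iteration(features, 4))  # [2, 4, 3, 3]
```

A count that keeps climbing is the program-level hockey-stick warning: many features started, few finished.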

If your program feels stuck, you might want the teams to measure their Fault Feedback Ratios. I suspect the Compound Annual Growth Rate (CAGR) of the code is small for an agile team (want to help collect data?). If it is, and your feature teams don’t have small growth, that’s a problem. Again, these are team data points. However, you can measure CAGR for the entire code base. You would want to see where the code growth is larger than expected so you can do some problem-solving, not blaming! People grow code fast when they are under pressure. You want to learn why they are under pressure, not blame them for being under pressure.
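Both numbers in this paragraph are simple ratios. A Python sketch, assuming the common definition of the Fault Feedback Ratio (fixes that needed to be fixed again, divided by all fixes) and measuring code size in lines of code; the function names and figures are hypothetical.

```python
def fault_feedback_ratio(bad_fixes, total_fixes):
    """Share of fixes that themselves needed fixing (0.0 if there were no fixes)."""
    if total_fixes == 0:
        return 0.0
    return bad_fixes / total_fixes


def cagr(start_size, end_size, years):
    """Compound annual growth rate of code size between two measurements."""
    if start_size <= 0 or years <= 0:
        raise ValueError("start_size and years must be positive")
    return (end_size / start_size) ** (1 / years) - 1


print(f"{fault_feedback_ratio(3, 30):.0%}")      # 10%
print(f"{cagr(100_000, 121_000, years=2):.1%}")  # 10.0%
```

Run `cagr` on each feature team’s area of the code and on the whole code base; a part that grows much faster than the rest is where to start asking why people are under pressure.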

You might want to track defects per feature, although if you have many, I suggest doing something about them sooner rather than later. Your technical debt will get out of control very fast and reduce your momentum.
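Defects per feature is another ratio worth tracking per iteration so you can see the trend. A Python sketch with made-up counts; `defects_per_feature` is a hypothetical helper, and an iteration with no completed features gets no ratio rather than a misleading zero.

```python
def defects_per_feature(defects, features_completed):
    """Defects found per completed feature for each iteration (None if no features)."""
    if len(defects) != len(features_completed):
        raise ValueError("need one defect count per iteration")
    return [d / f if f else None for d, f in zip(defects, features_completed)]


# Hypothetical counts per iteration.
defects = [2, 3, 6, 10]
completed = [4, 5, 6, 5]
print(defects_per_feature(defects, completed))  # [0.5, 0.6, 1.0, 2.0]
```

A ratio that keeps rising, as it does in this invented data, is the sign to deal with the defects before the technical debt slows the program down.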

How to Answer the “When Will You Be Done/How Much Will Your Program Cost” Question

You can see that with the approaches I’ve suggested, you always have something to show your sponsors. You always have a walking skeleton. You always either had a demo within the last two weeks, or you have one coming up. You have a roadmap with the releases planned. You have a program product owner who continues to work on the roadmap.

Armed with the roadmap, historical data about what the teams have delivered, and the product backlog burnup chart, you can answer the people who ask, “When will your program be done?” with this question: “What is of most value to you? Once we know what is of most value, we can work on that first. If the backlog is already ranked in value order, we can either take the time to estimate it, or I can provide an estimate with a confidence level in a couple of days.”
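This post doesn’t say how to produce an estimate with a confidence level. One common technique, swapped in here as an illustration, is a Monte Carlo forecast: resample the program’s historical features-complete-per-iteration counts many times and see how many iterations the remaining features take. This Python sketch assumes the backlog is already counted in features and that the future looks like the past; the function, the data, and the 85% default are all hypothetical.

```python
import math
import random


def forecast_iterations(history, remaining, confidence=0.85, trials=10_000, seed=1):
    """Iterations within which `confidence` of simulated runs finish `remaining` features.

    history: features completed per iteration in the past (program-wide).
    Each simulated iteration draws one value from history at random.
    """
    if remaining <= 0:
        return 0
    if not history or max(history) <= 0:
        raise ValueError("history needs at least one iteration with completed features")
    rng = random.Random(seed)  # fixed seed so the forecast is repeatable
    results = []
    for _ in range(trials):
        done, iterations = 0, 0
        while done < remaining:
            done += rng.choice(history)
            iterations += 1
        results.append(iterations)
    results.sort()
    # Nearest-rank percentile: the smallest result covering `confidence` of the runs.
    index = min(len(results) - 1, max(0, math.ceil(confidence * len(results)) - 1))
    return results[index]


history = [4, 5, 6, 4, 5]  # hypothetical features complete per iteration
print(forecast_iterations(history, remaining=30))
```

The result reads as “within N iterations, 85% of the time,” which is the shape of answer described above; rerun it every iteration as the roadmap and the history change.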

It’s the same kind of answer with the cost question. The only problem is if these people ask before you’ve started anything at all. That’s really a project portfolio question.

When I’ve been a program manager, I’ve asked this question: “Why do you want to know the answer to this question before we start the program? What information do you need?” I’ve often answered this with “5” followed by “many zeros” for cost and “Christmas, but don’t ask me which year.” This might be why I am a consultant: I am the Queen of the Career-Limiting Conversation.

The problem is that the very beginning of the program is when you know the least. I don’t have a crystal ball. I bet you don’t either. I don’t even gamble in Las Vegas, never mind win. I bet your managers don’t either. Why do they want you to stake your good reputation on a number or a date that can’t possibly be true?

Now, if they want you to spend time estimating instead of delivering, you can do that. I have explained that we can spend two weeks estimating, but we will learn nothing from that. We can spend two weeks delivering, and we will learn more, and we will have a walking skeleton that we can demo. Now, the teams have to deliver a walking skeleton.

This is a measurement. It’s a qualitative measurement of your influence on the teams. It’s a measure of how much your management trusts you. Better you should know now, when you can do something about it.

Summary of the Principles of the Measurements for an Agile Program

So, what does this all mean?

You want to measure at the program level, not at the project level. Do not even think about taking project team measurements and adding them together. That’s like taking my husband’s weight and mine, adding them together, dividing by two, and determining our average jean size. Yeah, I just made your eyes bug out, right? Adding team velocities together to get an “average” velocity makes just as much sense.

Measure what you want to see. You want completed features. You want releasable code. What else do you want? Measure that. Do not measure surrogates. The more you measure surrogates, the less you will get what you want.

The feature teams are also responsible for their measurements. They need to measure what they do to know if they are going off the rails and if they have risks in their projects. See my guidelines in Manage It! Your Guide to Modern, Pragmatic Project Management for those of you who are agile, and those of you on your way.

Ask for demos. Remember in the manifesto, on the principles page, there is a line that says,

“Working software is the primary measure of progress.”

Believe it. The more your product works, the fewer measurements you need. It’s not that you don’t need measurements. It’s that you don’t need too many at the program level.

10 Replies to “Organizing An Agile Program, Part 5: Measurements That Might Mean Something to a Program”

Johanna,
I realize that you included the disclaimer that this is *your* version of the Cynefin model, and not Dave Snowden’s original, but I think there are insights in adhering to the original. The most relevant here is the Rule of 5, 15, and 150.
First off, Chaos is not a state that any human endeavor can sustain for more than a short period. It’s that moment after the bomb blast before someone yells “duck”. The moment people start to react, we are moving toward Simple. When you see a large team “in Chaos”, what you’re really seeing is Disorder: you can’t recognize what domain you’re in because their teams are split, not being led, and all over the map.
In true Chaos, emerging from crisis situations, we use teams of 5. Five is the number of thoughts you can hold in your head at one time. Five is the size of a fire department crisis leadership team (or a Marine Corps fire team). You can predict what the five on your crisis team are going to do, because you have an intuitive connection to them. (The leadership team is 5; they may have many people working for them.) A team of more than five probably won’t succeed in creative, out-of-the-box thinking.
In Complexity, we use teams of 15–the number most cultures have in their immediate family. This is the number of people you can intimately trust–in a given context (your project, for example).
In Complicated, we can have teams of up to 150–Dunbar’s number, the number of acquaintances you can have in a given context. If your team is about 150, you can know everybody’s name. Once you go over that approximate limit, bureaucracy rises, rules replace relationships.
In Simple, there’s no limit to how many can be on the team.
In your conversation about measurements, I would say that Agile is a very good method for managing complex projects. Your team of 5-15 can understand that emergent system with those typical measurements, probably because of the level of trust on a team that small. A larger team (or team of teams) needs measurements suited for communicating across 150 people. Features work because they are more objective and less vulnerable to the dynamics of the small-team relationship.

Andy, I was attempting to duplicate the model and not violate copyrights for images 🙂

Using your numbers, a small project would live between Chaos (5-7 people) and Complexity (up to 15).

A small program of up to three teams, say 15-18 people would be at the edge of Complexity. I think we are in agreement here.

A medium program is four to nine teams, 20-45 people, and at the outside, 28-63 or so people. This is still Complexity in your terms?

The problem is when you have a large program, where you start at nine or ten teams, at say, 50 people and up. Maybe you say, it has to be 150 people. Okay, I can live with that. Maybe my ten teams is just what I’ve seen. Maybe when you throw in geographically distributed teams, it starts lower, because it’s so hard to build trust. Certainly once you have 150 people, you’ve got to think about Chaotic. I suspect it occurs earlier.

When I’ve worked with my clients on agile programs, it starts earlier. Maybe it doesn’t start at 50 people, but that’s because they always have larger than five people on their teams. They have 7 or 8 people on their teams. So the ten-team programs have 70 or 80 people on them. They don’t hit Dunbar’s number, but these programs are never all on one campus (is it just me??), and they have little way to build trust. BTW, my experience is that it takes about 90 people to start to lose people’s names, not 150. I realize Dunbar’s Number is famous, but I see something smaller in practice. Just saying.

I don’t claim to have all the answers yet. I’m still working with these clients. But I do have some of the answers. And one of the answers is that using some of our old thinking doesn’t get us to where we need to be. I would be happy to be wrong.

I’m not a theorist. You know me. I’m a practitioner. Let’s hear from some other practitioners.

I would argue against measuring the number of features. That is a misleading indicator. Not every feature is equal. If team A delivers 10 features and team B delivers 5 and you measure # of features then team A must be better, right? Well, what if those 10 features were worth $500K but team B’s 5 features were worth $800K. Who is better now?

But even that is a little misleading. Well, maybe not; maybe it points out an issue, and then you have to look into why the issue exists. Such as: who decided which team would work on the features they worked on? Is the overall effort the same? Etc.

In the end, you have to be careful what you measure. As they say “tell me what you measure and I’ll tell you how I’ll behave.”
Great post!
/Julia

Maybe you missed my post where I said you can’t compare teams. I totally agree with you that measuring features to compare teams is a bad idea. Bad, bad, bad.

However, looking at the total number of features complete, especially as a burnup chart for the entire program, that is still useful. Remember, this is the entire program. That’s what that first chart is.

The second chart is by project inside the program. Maybe that’s what you object to? That’s why you don’t measure it as a burndown. That’s why you don’t measure it as velocity. That’s why you measure it as an aggregate.

I still say working software is the best measurement of all. I agree with you: Tell me what you measure, and I’ll tell you how I’ll behave. If we measure working software, as in running tested features, don’t I have the most incentive to produce running tested features?