Effective software development and delivery

40% to 99% of your team’s effort is wasted (give or take a bit)

Recently I noticed a post on Twitter that referred to this article by Eric Barker. Barker, in turn, shares information he learned in a conversation with Po Bronson, an author (with Ashley Merryman) of Top Dog: The Science of Winning and Losing. Now, notwithstanding the word “science” in the title, this is a “pop science” book, not a science book. It’s based in part on the authors’ “research” (reading statistical studies and so forth) and in part on popular assumptions – what we might call “leprechauns” or “urban legends.”

The article rubbed me the wrong way, so I’m going to indulge myself with a bit of a rant.

40% of your team’s effort is wasted

It was the title of Barker’s article that caught my eye. The assertion that “40% of your team’s effort is wasted” falls so far out of line with my own professional experience that it drew me in. I followed the link to the article.

No doubt Barker intended the title to be an attention-grabber, but I suspect it grabbed mine for a different reason than he might have expected. I think the title was meant to evoke a response like, “Oh, my God! We’re wasting 40% of our effort? That’s horrible!” In my experience, when I ask people how efficient they think their software development process is, they guess it’s probably in the range of 80% – 90%. Sometimes, perhaps in an attempt to appear modest, they’ll say 70%. So, yeah, they’d be a bit shocked to hear it’s “only” 60%.

I had precisely the opposite reaction. Maybe it’s because I’ve been getting into Lean Thinking the past few years. Lean Thinking offers an interesting way to look at time. When we map the value stream of an organization or team, and we start to track the proportion of value-add time to total lead time (a metric called process cycle efficiency, or PCE), we find that most teams actually waste around 98% – 99% of their time. The article’s title caught my eye because 60% efficiency seemed shockingly optimistic.
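To make the metric concrete, here is a minimal sketch of the PCE ratio as described above: value-add time divided by total lead time. The function name and the sample figures are hypothetical, chosen only to illustrate the arithmetic. They are not measurements from any real team.

```python
def process_cycle_efficiency(value_add_hours: float, lead_time_hours: float) -> float:
    """Return PCE as a fraction: value-add time divided by total lead time."""
    if lead_time_hours <= 0:
        raise ValueError("lead time must be positive")
    if not 0 <= value_add_hours <= lead_time_hours:
        raise ValueError("value-add time must be between 0 and the lead time")
    return value_add_hours / lead_time_hours


# Hypothetical work item: 6 hours of hands-on work, delivered 60 working days
# (480 working hours) after it was requested.
pce = process_cycle_efficiency(value_add_hours=6, lead_time_hours=480)
print(f"PCE = {pce:.2%}")  # prints: PCE = 1.25%
```

With numbers like these, the work item spends almost all of its lead time waiting, which is the pattern a value stream map tends to expose.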

And it gets worse. If you can achieve PCE of 5%, you’re doing really, really well; four or five times better than a typical IT organization. You can forget about 60%. A creative product-development endeavor like software doesn’t lend itself to standard, predictable “production.” It’s more of an exploratory process. It’s still useful to avoid rework, defects, waiting, and the rest of the Lean wastes, to the extent feasible. It’s also a good idea to stay out of creative workers’ way once you set them on a path toward a goal, and not pull them into meetings all day. But if you think you can be 90% (or even 60%) “efficient,” then please send me a text message when you get there. I won’t hold my breath.

Projects are late and over budget because teams are dysfunctional

The article goes on to connect two orthogonal, unsupported assumptions as if, together, they explained something. To be fair, I should remember that it’s possible to arrive at a true conclusion based on two false premises. That’s just basic logic, after all. Anyway, the assertion I find questionable is this: “…unfortunately, so many teams are dysfunctional: 49 percent of software projects are delivered late, 60 percent are over-budget.” Oh, yeah, that’s the teams’ fault, for sure. Everything else in IT organizations is just fine.

I’m often engaged by companies to work with their development teams to help them become more effective at delivering quality software on time. More often than not, I discover that the teams’ problems are not due to their own “dysfunction,” but rather to organizational constraints they don’t control. There’s a school of thought called Systems Thinking that explains how organizational forces conspire to prevent competent, motivated professionals from doing much of anything. So much for the first premise.

The second premise reflects the “leprechaun” that was reported by the first Chaos Report from the Standish Group. The report has since been criticized for its research methods, as well as for basing conclusions on a pre-selected set of projects that were already known to have been troubled. The percentage of “challenged projects” (Standish’s term) is substantially lower than they reported, and typically the reasons have nothing to do with team dysfunction.

Projects exceed their fixed budget and timeline because they have a fixed budget and timeline. It’s the habit of planning creative, exploratory work as if it were routine, assembly-line work that’s to blame for missed expectations, not the “dysfunctional” teams that did their utmost to meet unrealistic goals. But that’s Management Science for ya…still shiny after all these years.

Teams are over-rated…just work from home

The idea that it’s organizations and not teams that are dysfunctional may also explain another phenomenon Barker mentions in his article. Citing Daniel Pink, he mentions an “IBM telecommuting study which showed that, telecommuters were actually more productive, not less.” Well, duh.

It’s the difference between the maker’s schedule and the manager’s schedule, as described by Paul Graham. When you’re working from home, you’re not continually interrupted with questions, phone calls, and meetings. You actually have a fighting chance to maintain a train of thought for more than nine seconds at a time. How could you not do better work under those conditions?

And IBM? It’s one of those large organizations that runs everything on manager’s time. Of course their top software engineers aren’t as productive at the office! How could they be? And would it occur to them to question their own organizational structure or management practices? No, the only variable of interest is whether the person is working at home or in the office. After all, nothing else is going to change.

Functional silos over cross-disciplinary collaboration

“The science of teams in a business context says that pretty much the number one thing you can do to improve team performance, is to clarify roles.” There’s the S word again. Like any other word, I guess “science” means, as Humpty-Dumpty said, “just what I choose it to mean – neither more nor less.”

I’ll grant that if your goal is to maximize resource utilization, and you think humans are resources, then functional silos are a great way to keep everyone busy and to limit their contributions to a narrowly-defined job description that never taps into more than 2% of their intelligence and creativity. It’s also a great way to ensure PCE can’t climb above 1% – 2%. But that’s okay, because you can just blame your teams. They’re dysfunctional, you know. That’s science.

The omega wolf

Barker observes that “a really successful team needs at least one person who is not a team player. Someone who’s willing to stand up to authority, to rock the boat. To not make everybody happy. To not pat everybody on the back.” It’s pretty common these days for people to pursue “continuous improvement” or “continual improvement” (as you prefer). That means all team members question the status quo and offer their ideas for alternative methods that might yield better outcomes. To do so is to be a team player, not the opposite.

There is another situation in many organizations, and I wonder if that’s what Barker is getting at here. Sometimes, organizational constraints place so much pressure on teams that there has to be someone on each team who questions the rules, procedures, and assumptions that prevent effective work. I like to call this phenomenon the Omega Wolf Anti-Pattern, based on an analogous facet of wolf behavior.

Every wolf pack has an omega who bears the brunt of pack members’ frustrations. This individual functions as a sort of social glue for the pack, defusing conflict and aggression before it harms the group’s cohesion. When the omega dies, another member of the pack moves into that role. It’s normal for wolves.

When organizational forces create a need for this sort of role on technical teams, someone on the team will speak out, often in a politically-risky fashion. This helps defuse the frustration created by the dysfunctional organization, and acts as a sort of social glue for the team. It’s not normal for humans. It’s an indicator of organizational dysfunction.

When managers apply the traditional “cure” and fire or drive away the troublesome individual, but do nothing to correct the organizational dysfunction that creates the need for the omega role, another team member will step into that role. To the extent this helps teams cope with institutionalized organizational dysfunction, it can be seen as a hallmark of a “successful” team. Teams that have no such member often behave as victims. I don’t have to tell you which teams are which in your organization. You already know. I have to say it’s a rather sad definition of “success,” though.

Your resume doesn’t matter – you gotta play

One of the more puzzling assertions in the article is this: “For orchestras, the better they sounded during performance, the more chaos there was behind the scenes.” Granted, I don’t know where Barker plays. Personally, I’ve never seen this phenomenon. In any orchestra on a level from advanced amateur to professional, every member can more-or-less sight-read new music of arbitrary difficulty. When it comes to standard repertoire, players know most of their parts before they even join the orchestra; they had to play several of those parts to pass their audition. There’s no chaos behind the scenes. They always play. Sometimes there’s an audience.

Software development teams always write code. Sometimes there’s a release. They can’t get away with being incompetent developers for three months and then suddenly put on a good performance at release time. It just doesn’t work that way. The quality of their code directly reflects their competence. Where the musical analogy breaks down is that in the software field a lot of people have an impressive resume but aren’t actually very good at software development; in music, your resume doesn’t matter – you gotta play.

Whither goest thou, O Pilgrim?

I could be wrong, but Barker and those he cites seem to want to set aside lessons learned in IT over the past 20-odd years and drive us back into the nasty, horrible 1980s, that Dark Time when middle managers ate the souls of software developers for breakfast and danced upon their graves at moonrise. If so, Barker will look down one day and see only one set of tracks in the sand…his own.

I have a question on Process Cycle Efficiency. When measuring PCE, do you account only for potentially available working time or total time?

In other words, if cycle time for a task was 4 days and touch time was 8 hours, would PCE be 25% (assuming an 8h workday) or 8.3%?

I tend to do the former, but I find the latter is pretty common when someone brings up the argument of (lack of) efficiency, especially when it comes to reporting very low PCEs like the ones you mention in the article (1-2% PCE).
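The two readings of the example in this question can be compared with a short sketch. The 4-day cycle time, 8 hours of touch time, and 8-hour workday come from the question. The function and constant names are made up for illustration.

```python
WORKDAY_HOURS = 8    # working hours per day, as assumed in the question
CALENDAR_HOURS = 24  # clock hours per day


def pce(touch_hours: float, cycle_days: float, hours_per_day: float) -> float:
    """Touch time divided by cycle time, with cycle time converted to hours."""
    return touch_hours / (cycle_days * hours_per_day)


working_basis = pce(touch_hours=8, cycle_days=4, hours_per_day=WORKDAY_HOURS)
calendar_basis = pce(touch_hours=8, cycle_days=4, hours_per_day=CALENDAR_HOURS)

print(f"working-hours basis:  {working_basis:.1%}")   # 25.0%
print(f"calendar-hours basis: {calendar_basis:.1%}")  # 8.3%
```

The same task yields a threefold difference depending only on the baseline, so it helps to state the baseline whenever a PCE figure is quoted.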

I guess you mean the task took 4 days from start to finish and people were working 8-hour days. When you say “touch time,” do you mean the task was deemed “in progress” for 8 hours out of 32? If so, then that would yield a crude measure of PCE of 25%, just as you say. But that’s not the true PCE, because it doesn’t count the times when the task was “in progress” but in a wait state for one reason or another. It’s really hard to collect the raw data at that level of detail, so I don’t usually recommend doing it except for a limited time as a way to expose time sinks in the process.

Let’s say the person working on the task had 5 tasks “in progress” at the same time. They could directly add value to only 1 task at a time. The other 4 were waiting at any given moment during the 4 days. In addition, there were times when the person was in meetings, answering emails, dealing with colleagues’ questions, waiting for information, waiting for resources (such as test environments, etc.), and performing administrative tasks (like entering their time in a time reporting system). Then you have context-switching overhead when the person turns their attention from Task 1 to Task 2 (or whatever). It takes 10-20 minutes to ramp up to a state of “flow” and be fully focused on a task. We can say on average we lose 50% effectiveness for 15 minutes (just to round everything) with every context switch event. So, juggling 5 tasks would incur some context-switching cost we need to consider. When you add all that up, you see that you aren’t really adding value to the task for the full 25% of “touch time.” The low percentages you see reported are not merely a result of using 24 hour days as the baseline.
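As a rough sketch of the context-switching adjustment described above: the 15-minute ramp-up and the 50% loss of effectiveness come from the paragraph, while the number of switches (12) is a purely hypothetical figure chosen for illustration. Meetings, email, and waiting time are not modeled here, so the real value-add time would be lower still.

```python
def context_switch_loss_hours(switches: int,
                              ramp_up_minutes: float = 15.0,
                              effectiveness_loss: float = 0.5) -> float:
    """Hours of value-add lost to ramping up after each context switch."""
    return switches * ramp_up_minutes * effectiveness_loss / 60.0


def effective_pce(nominal_touch_hours: float,
                  lead_time_hours: float,
                  switches: int) -> float:
    """PCE after subtracting context-switch losses from nominal touch time."""
    lost = context_switch_loss_hours(switches)
    value_add = max(nominal_touch_hours - lost, 0.0)
    return value_add / lead_time_hours


# 8 nominal touch hours over a 32-working-hour cycle, with an assumed
# 12 switches back into this task during that period.
print(f"nominal PCE:   {8 / 32:.1%}")                     # 25.0%
print(f"effective PCE: {effective_pce(8, 32, 12):.1%}")   # 20.3%
```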

However, there’s yet another angle on this, IMO. The customer is waiting 24 hours a day for the product. They don’t care why it takes 4 days to complete a task. If the team can reduce mean cycle time to less than 1 day, then the 24 hour vs. 8 hour question doesn’t arise at all. They might want to think about ways to achieve that.

Touch time, the way I use it, means “time actively spent on building a task / adding value.” The reason I use touch time to discuss efficiency is exactly the argument you’ve raised. If I have that number, I don’t have to worry about how much WIP there was in the process, because touch time already takes that into account.

Simply counting time spent in “active” stages of the process typically doesn’t work because of lots of work in progress and heavy multitasking–exactly the reasons you point out.

It occurs to me there’s another angle on the question of whether to use 24 hour or 8 hour days as a unit of measure. If you’re tracking cycle time in hours, then count “work” hours. If you’re tracking it in days, it doesn’t matter how many hours there are in a day.

That’s right. Unless you’re counting one thing in days and the other in hours, which I’ve seen several times. In such a case you have to understand how to convert both to the same unit, hence the whole 24 vs. 8 hours discussion.

I think of PCE as a measure of touch time over the entire lead time of a given work item. My main purpose for making that ratio visible is to show the cost of long queues and handoffs. Ultimately, the greatest gains in a system usually come from managing the flow of work: reducing lead time and minimizing excess handoffs and long queues.

A current agile fallacy seems to be that creating hyperproductive teams will cure all ills, or even that teams have the capability to unilaterally and significantly impact the organization’s success. Some folks even seem to think their 2-week iterations reflect total delivery time. When I’ve mapped value streams, even for agile teams, we often find lead times in the 4-6+ month range. This is not from a statistically meaningful sample size, but I am betting this would be the case if we mapped out lead time for organizations using agile methods. Of course, these are much shorter lead times than in larger-batch “waterfall” organizations, but still far from optimal.

The problem I have with that approach to PCE is that the same 2h task, started at 10 am and completed 4 working hours later, will have a PCE of 50%, while if it was started at 4 pm it will have a PCE of 10% (as we have 16h of non-working time included in lead time). The only difference is what time of day the task was started, so why such a huge difference in PCE?

The example is a bit extreme but only a bit. For longer tasks the PCE difference would be threefold (8h long working day versus 24h day).
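The start-time effect in this example can be reproduced with a small sketch. It assumes a 09:00 to 17:00 working day and ignores weekends; those hours, the dates, and the helper names are assumptions made for illustration, not part of the original example.

```python
from datetime import datetime, timedelta

WORK_START = 9   # assumed start of the working day (09:00)
WORK_END = 17    # assumed end of the working day (17:00)


def finish_time(start: datetime, working_hours: float) -> datetime:
    """Clock time at which `working_hours` of working time have elapsed.

    Only time between WORK_START and WORK_END counts; weekends are ignored.
    """
    current = start
    remaining = timedelta(hours=working_hours)
    while True:
        day_start = current.replace(hour=WORK_START, minute=0, second=0, microsecond=0)
        day_end = current.replace(hour=WORK_END, minute=0, second=0, microsecond=0)
        current = max(current, day_start)
        available = max(day_end - current, timedelta(0))
        if remaining <= available:
            return current + remaining
        remaining -= available
        # Move to the start of the next working day.
        current = day_start + timedelta(days=1)


def clock_pce(start: datetime, touch_hours: float, working_lead_hours: float) -> float:
    """PCE measured against elapsed clock time rather than working time."""
    end = finish_time(start, working_lead_hours)
    elapsed_hours = (end - start).total_seconds() / 3600
    return touch_hours / elapsed_hours


# Same 2h task, completed 4 working hours after it starts.
morning = clock_pce(datetime(2024, 1, 8, 10, 0), touch_hours=2, working_lead_hours=4)
afternoon = clock_pce(datetime(2024, 1, 8, 16, 0), touch_hours=2, working_lead_hours=4)
print(f"started 10:00 -> {morning:.0%}")    # 50%
print(f"started 16:00 -> {afternoon:.0%}")  # 10%
```

Measured against working hours, both cases come out at 2/4 = 50%. The gap appears only when overnight clock time is folded into the lead time.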

Also, we optimistically assume here that everyone is a full-timer, which in many cases simply isn’t true.

Yeah, I see your point, Pawel. I think it’s maybe a bit more granular than it needs to be. I agree that if we ensure we’re using the same units of measurement we’d reduce inaccuracy, but even in your extreme example PCE is 10%, while most people fallaciously believe it’s 60, 70, or even 80%. To me, the value is in showing, even roughly, the relative improvement potential of focusing on team-level efficiency/productivity vs. managing the entire flow of value and reducing lead times by managing the system.