As features and products grow larger, more than one team needs to collaborate to deliver them. This is often a tough problem to plan, co-ordinate and deliver effectively. We are sharing our tool that helps you understand how team load is balanced during the planning process, with three goals in mind –

To visually document what work is assigned to which teams or skillsets, given the currently planned features

To highlight when load exceeds a team's or skillset's capacity (including the impact of unplanned work)

To help co-ordinate more optimal start dates for pieces of work, to avoid saturating any one team or skillset

The tool is a “simple” spreadsheet that helps build a cohesive portfolio plan, plus documentation about how we use it. It's pretty general; its main intent is to show the load on specific teams or skillsets over a time interval (day, week, sprint, etc.). You enter specific pieces of work for specific teams and the effort involved, and you define what units “effort” is measured in. With a low and high guess about how long each work item will take (again in days, weeks, or months), it visually sums up utilization in those periods for each team. A simple heatmap shows the load on each team, and when you reach a limit you define, it's clear to see.
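The core calculation behind the heatmap can be sketched in a few lines of Python. The team names, capacities and work items below are invented for illustration, not taken from the spreadsheet:

```python
# Sketch of the spreadsheet's core calculation: sum each team's effort per
# period, then flag any period where load exceeds that team's capacity.
# All team names, capacities and work items below are illustrative.

work_items = [
    # (team, start_period, low_weeks, high_weeks, effort_per_week)
    ("UI",      1, 2, 4, 1.0),
    ("Backend", 1, 3, 5, 2.0),
    ("UI",      3, 1, 2, 1.5),
]
capacity = {"UI": 2.0, "Backend": 2.0}
num_periods = 6

def load_table(items, use_high=True):
    """Return {team: [load per period]} using the low or high duration guess."""
    table = {team: [0.0] * num_periods for team in capacity}
    for team, start, low, high, effort in items:
        duration = high if use_high else low
        for p in range(start - 1, min(start - 1 + duration, num_periods)):
            table[team][p] += effort
    return table

loads = load_table(work_items)
for team, row in loads.items():
    flags = ["OVER" if load > capacity[team] else "ok" for load in row]
    print(team, row, flags)
```

Running the same table with the low guesses shows the best case; the gap between the two views is where the planning conversation happens.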

Here is how we use it for monthly or quarterly planning –

1. We enter a list of teams or special skillsets

2. We ask for specific important dates: vacations, shows, releases, etc.

3. We have the assembled group identify the MOST IMPORTANT feature (optimally based on cost of delay; they use our COD spreadsheet for that)

4. We break out what work is involved and which teams will do that work, with broad range estimates of time (normally weeks)

5. We set the start dates to align dependencies

6. We look at the heat-map. Is there more capacity? If yes, repeat from step 3

We focus the group on one feature at a time. When we reach capacity, we look for ways to resolve the constraint –

We shift start dates to spread the load on the constraining teams or skillsets

We add more capacity to the constraining teams or skillsets

We avoid a plan that is over capacity at any point. If a team still has reserve, we might look for ways it can assist the constraining teams; we rarely suggest adding lower priority features just to keep people busy.

I just released the Traffic Light Simulator. This spreadsheet shows the eventual travel time for cars traveling a commute to work that has five traffic signals. By varying the probability of hitting each light, and the delay incurred if that happens, the simulator shows a histogram of how many cars achieve the allowable travel time. It also shows a detailed map of all of the possible situations cars may encounter: the lucky ones get all green lights, the unlucky get all red.

Set the probability of up to 5 delays and see the impact on the cycle-time (travel time) distribution

Understand how many cars are impacted by red and green traffic signals, and how this plays out across the different probabilities

Exercises to learn how different process decisions about delays impact cycle time

Many real world processes, like car travel, follow this general pattern: a general amount of expected time if things go well, plus a number of possible delays. Possible delays are expressed as probabilities, from 0% (never) to 100% (always). Software development work is one such process. Work we undertake has a hands-on time, plus a number of possible circumstances that slow that work down. By understanding how delays cascade into eventual cycle time, we can make smarter decisions about which improvement ideas are more likely to work than others.
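The pattern described above – a base time plus probabilistic delays – can be simulated directly. This minimal Monte Carlo sketch uses invented numbers (five lights, each 50% likely to cost 3 minutes), not the spreadsheet's defaults:

```python
import random

# Monte Carlo sketch of the traffic-light pattern: a base hands-on time plus
# up to five independent possible delays, each with a probability and a cost.
# All numbers here are illustrative.
random.seed(42)

BASE_TIME = 20.0                # minutes if every light is green
LIGHTS = [(0.5, 3.0)] * 5       # (probability of a red light, delay in minutes)

def trip_time():
    return BASE_TIME + sum(delay for p, delay in LIGHTS if random.random() < p)

times = [trip_time() for _ in range(10_000)]
on_time = sum(t <= 26.0 for t in times) / len(times)  # within 26 minutes
print(f"fastest={min(times)}, slowest={max(times)}, on-time={on_time:.0%}")
```

Plotting `times` as a histogram reproduces the spreadsheet's view: the distribution skews right because a few unlucky cars stack up several delays.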

This is an active area of exploration for me in 2017. My hypothesis is that given just the evident cycle time distribution currently exhibited by teams, the process can be identified. This spreadsheet has five other hypotheses, and I’m interested in hearing reasons why they are wrong.

For now, I’m just starting to fill the spreadsheet with interesting exercises. There are two at the moment. One gets you to find the delay probabilities that cause an Exponential distribution, common to service and operations teams. The second gets you to define delay probabilities that cause a moderate Weibull distribution, common to software development teams.

Exponential style distribution – common for operations and support teams

Weibull style distribution – common for software development teams

Learning why cycle time distributions match either Exponential or a skewed Weibull gives solid evidence about which improvement factors might work. For example, if the distribution looks skewed Weibull, it’s likely that estimates of effort WILL NOT correlate to cycle time. This is because the process is dominated by delays, and the amount of time spent actually hands-on the work is minor in comparison to idle time; solving the reasons for the delays is the best avenue for improvement. If the current cycle time distribution is Exponential, then work time dominates the cycle time, and automation and more people are the better ways to decrease cycle time and increase throughput.

There is significant debate about whether estimates are waste, and too little debate as to whether (more correctly, when) they are misleading.

When asked the questions, “Are estimates waste? Are they harmful?”, my answers are “Sometimes, and sometimes.” Situations of never or always are dangerous. What determines a definitive yes or no are the pre-conditions required to sway the balance one way or the other. This post is about what pre-conditions make estimates useful and beneficial, and conversely, what pre-conditions make estimates not just wasteful but misleading. This is all very new material, and likely not correct! I want the conversation to start.

NOTE: Nothing in this article says you should stop or start estimating or forecasting. This article looks at the reasons why you should trust an answer given by ANY forecasting technique or tool. If it’s working, keep doing it until you find something cheaper that works just as well.

Why are size estimates used?

When Story Point estimates are used for forecasting a future delivery date or timeframe, a sum of Story Points is converted into calendar time, most often by dividing the sum of unfinished work by an average velocity (the sum of completed points over a period of time, a sprint for example).

The same transformation occurs for people using Story Counts (no size estimate of each item is attempted other than splitting; just a count of items). In this technique, the count of unfinished items is divided by the average count of items finished in some period of time (a week for example).

There really isn’t a massive difference. Each technique is a pace-based model for converting an amount of remaining work into calendar time by simple division by some measure of pace. If you have used a burn-down chart, burn-up chart or cumulative flow chart to extrapolate how much longer, then you have seen how ongoing progress is used to convert a unit of unfinished work into how long in calendar time that work would take to complete.
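In code, the two techniques really are the same division; the numbers here are hypothetical:

```python
# Both forecasting styles reduce to: remaining work / pace = periods left.
# The numbers below are hypothetical.

# Story Point version: points remaining divided by average velocity per sprint
remaining_points, velocity = 120, 30
sprints_left = remaining_points / velocity            # 4.0 sprints

# Story Count version: items remaining divided by average items per week
remaining_items, weekly_throughput = 40, 8
weeks_left = remaining_items / weekly_throughput      # 5.0 weeks

print(sprints_left, weeks_left)
```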

Given that background, this article will assume that the goal of software estimates is to convert “size” into “calendar time” – this is true whether using Story Points or Story Counts. Sure, there are other uses for estimates, but the purpose of this post is to discuss whether estimates can cause poor decisions and why.

The six requirements for estimates to be useful/reliable time forecasters

I commonly see six main reasons that cause estimates to degrade as useful proxy measures for converting size into time. The six are –

Estimable items: The items under investigation are understood and can be accurately sized in effort by the team (which has the knowledge to estimate this work)

Known or estimable pace: The delivery pace can be accurately estimated or measured for the duration of the work being delivered

Stable Estimate and Time Relationship: There is a consistent relationship between effort estimate and time

Stable size distribution: The items' size distribution doesn't change and is consistent over time

Dependent delays are stable: Delays likely in the work itself, which could possibly be known in advance, don't change

Independent delays are stable: Delays not due to the item itself but to other factors, like waiting for specialist staff, don't change

It’s unlikely any software development system of complexity fully satisfies all six assumptions. Small deviations from these assumptions may not matter.

How small is small enough to not matter? This is an area where too little research has taken place. We know deviation occurs: some teams report managing to hit estimates, others report failing. A way to know in advance whether the odds are stacked against estimates being a reliable predictor is needed.

Note that five out of the six reasons have nothing to do with the estimated items themselves; they have to do with the delivery system and environment.

This is an important point – even if the estimates themselves are PERFECT, they still may not be good predictors of calendar time.

For some contexts common in larger Government Aerospace and Defense projects, most of these assumptions are covered through rigorous analysis, which is why estimates are seen to be of benefit. In other contexts, teams are asked to give estimates when all six assumptions are violated. These teams are right to assume estimates are waste.

I want teams to be able to say the estimates aren’t just waste but are misleading, and have the evidence to prove it.

Toward this ambition, I’m working on simple diagnostic worksheets to determine how likely it is that your estimates are impacted by these factors. The goal is to show which system areas would give the biggest bang for the buck if you wanted to use some unit of size estimate for future calendar time forecasts. If we need to use calendar time in decision making (not saying we always need to, but sometimes we do), then let’s understand how exposed we are to giving a misleading answer even given due rigor.

Please vigorously attack these ideas. Here is what I want –

I want to move the conversation away from waste and toward usefulness.

I want people to understand that similar poor assumptions apply to story count forecasting techniques, and to know when.

I want people to go one level deeper on the Never Works / Always Works arguments into the contexts that cause this to happen.

I want to learn!

Troy

Disclaimer: I strongly AVOID story point estimates for forecasting in ISOLATION. I use throughput (delivered story counts over time) primarily, BUT USE story points and velocity as a double check at least once every three months. So, I work fluently in both worlds and think you should never throw away a double check on your mathematics until it’s too costly for the benefit it provides. I also think the team can get better at the part they are responsible for – estimation is a skill worth learning.

It seems common knowledge that measuring teams and presenting data is an Agile dysfunction. I disagree, but I can see, and have participated in, abusive metric relationships in the past. I think we need to discuss better ways of achieving an evidence-based Agile approach without those participating feeling (or being) abused.

Here are my top five traits that make metric dashboards useful –

Measure competing things – it’s relatively easy to game a single metric, so it’s important to measure the impact of moving one metric by showing the others. Help teams target moving one metric and observe any negative impacts on the others.

Make informed and smart trades – trading something the team is better at than other teams in similar circumstances for something they desire to improve. Help teams identify what metric category they could trade (be less good at) to raise another metric (become better).

Trends, not numbers, are important – observe unintended drifting of metric averages over time. It’s about understanding that something has changed, not how good or bad it is. Help teams react earlier to the often slow-moving regression in a metric or two; the earlier it is detected, the less effort correction takes.

Look for global or local trends – comparing trends across teams is key to spotting system-level opportunities (every team is impacted) versus single-team opportunities. Help teams target improving things they can change without fighting system-level factors they are unlikely to solve.

No team will be good at everything – if a team is adversely trending on one metric, point out they are above average on another. Pick competing metrics so that no team will be great or terrible at all of them. There will always be a mix.
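As a concrete illustration of the “trends not numbers” trait above, here is a minimal drift check; the cycle-time data and the 25% threshold are made up:

```python
# Sketch of the "trends not numbers" trait: flag slow drift by comparing a
# recent window's average cycle time against the longer-run baseline.
# The data and the alert threshold are made up for illustration.

cycle_times = [5, 6, 5, 7, 6, 5, 6, 8, 9, 8, 10, 9]  # days, oldest first

baseline = sum(cycle_times[:-4]) / len(cycle_times[:-4])
recent = sum(cycle_times[-4:]) / 4

drift = (recent - baseline) / baseline
print(f"baseline={baseline:.1f}d, recent={recent:.1f}d, drift={drift:+.0%}")
if drift > 0.25:  # the 25% threshold is a judgment call, not a rule
    print("Cycle time is drifting up - investigate before it becomes the norm.")
```

The absolute numbers matter less than the direction: the point is to notice the change early, while the correction is still cheap.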

This list borrows heavily from the work of Larry Maccherone, who correctly observed that a balanced blend of metric types gives the most information for identifying trends and improvement opportunities. His advice is to measure at least one thing from each of four broad areas –

How much

How well

How responsive

How repeatable or reliable

An implementation of this was recently made available in spreadsheet form. Driven from work item Start date, Completed date and Type, the spreadsheet builds a dashboard page in Excel. The choice of the four metrics was somewhat from experience, and there are plenty of alternatives that might fit your context better. The advice stands though: pick a metric from each of the four areas.
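As a rough sketch of what such a dashboard computes, here is one possible metric per area, derived from just the start date, completed date and type of each work item. The items and the specific metric choices are illustrative, not the spreadsheet's exact formulas:

```python
from datetime import date

# Sketch of one metric per area, computed from the same minimal data the
# spreadsheet asks for: start date, completed date and type per work item.
# The items and the specific metric choices here are illustrative.

items = [
    {"start": date(2017, 1, 2),  "done": date(2017, 1, 9),  "type": "Story"},
    {"start": date(2017, 1, 3),  "done": date(2017, 1, 20), "type": "Story"},
    {"start": date(2017, 1, 5),  "done": date(2017, 1, 12), "type": "Defect"},
    {"start": date(2017, 1, 10), "done": date(2017, 1, 14), "type": "Story"},
]

cycle_times = sorted((i["done"] - i["start"]).days for i in items)

how_much = len(items)                                   # throughput
how_well = sum(i["type"] == "Defect" for i in items) / len(items)  # defect mix
how_responsive = cycle_times[len(cycle_times) // 2]     # median cycle time
spread = cycle_times[-1] - cycle_times[0]               # repeatability proxy

print(how_much, how_well, how_responsive, spread)
```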

To help roll out the dashboard, we have created a single-page cheat-sheet to educate the teams on what each metric means and what to expect if that metric is overdriven. The goal is to be stable in all four, not excessively good at any one.

There are many engagements where I work alongside very smart people, from leading coaches and trainers in the Agile world to smart teams committed to delivering quality products that solve customer problems. In the trenches there is a constant feel of improvement and a curiosity for doing better tomorrow what ailed us today, and a constant enquiring mind as to “why?”

I don’t see the same vigor in Agile conferences. I see a narrowing of the ideas presented. I see risk-averse programs that cater to a simplistic mass message.

This is a predicament technical journals in other fields have faced for years. A huge bias exists for publishing experiment results where the outcome was positive and expected; studies that failed to show expected results are rarely published (ironically, there is often more, or at least equal, learning in failures). The pressure to cater to readers’ desire for articles from known luminaries, versus taking a risk on the currently unknown, polarizes the work toward the “old way.” Not to mention the commercial pressure of advertising and sponsorship: such concerns shouldn’t influence editorial, but the ability to offer anything at all depends on making sure sponsors get value for money and continue their support.

The dumbing-down process starts silently. Commercial frameworks stifle innovation and polarize messages. Add certification, and you accelerate that stifling of freely emerging new ideas. This is a sure-fire way to extinction.

To avoid this plague, here are a few suggestions for balancing a conference program –

Blind submission process – hard to do in reality, but most academic programs are built absent the author’s name or affiliation. The topic is discussed at length on its merits, not dismissed with “we can’t knock them back.”

Conferences should publish in advance the allocation of subjects they want covered. This is crudely done at some conferences by having tracks, but even within a track they should state the percentage of topic allocation they want, e.g. 20% ideas for managing dependencies, 20% ideas for creating safety in teams.

The abstract should be brief during the submission process, and then upon acceptance shaped into what the track chair and program desire, in collaboration with the speaker (just as editors at journals and book publishers commonly do)

For abstracts that are important but come from a first-time speaker, pair the subject expert with a luminary speaker and have them co-present or work together. TED talks have shown that, given coaching, ANYONE PASSIONATE about a topic can give a compelling talk.

At the start of the conference, each track chair should present for 10 minutes on the program they have assembled, and help the attendees understand why they should attend each talk. Often the abstracts are too abstract for people to bother reading, and important sessions go unattended because people don’t know what they are about. A good topic title wins out over good content every time.

These are just a few ideas. I want to keep the Agile community vibrant and on a quest for learning. I think Agile conferences are a leading indicator of how new ideas might be lost, and I want to avoid that. Not every conference is bad, but some are.

Here is a list of the top 10 tips I find myself giving out. It’s not in any particular order of importance, just the order they come to my head. It’s a long weekend, so writing things down helps me relax. I would love to hear yours, so please add them to the comments.

1. If two measures correlate, stop measuring the one that takes more effort. E.g. if story counts correlate to story point forecasts, stop estimating story points and just count.

3. Measure the work, not the worker. Flow of value over how busy people appear. It’s also less advantageous to game, giving a more reliable result in the long run. Measuring (and embarrassing) people causes poor data.

4. Look for exceptions; don’t just explain the normal. Find ways to detect exceptions in measures earlier. Trends are more insightful than individual measures for seeing exceptions.

5. Capture, at a minimum: 1 – the date work was started, 2 – the date it was delivered, and 3 – the type of work (so we can see whether an item is normal within the same type of work).

6. Scope Risks play a big role in forecasts. Scope Risks are things that might have to be done, but we aren’t sure yet. Track items that might fail and need reworking, for example server performance criteria or memory usage. Look for ways to detect these earlier and remove them. Removing isn’t the goal – knowing whether they will definitely occur adds more certainty to the forecast.

7. Don’t exclude “outliers” without good reason. Have a rule, for example 10 times the most common value. Often these are really multiple other items that haven’t been broken down yet, so they can’t be ignored.

8. Work often gets split into smaller pieces before delivery. Don’t use the completion rate as the forecast rate for the “un-split” backlog items; adjust the backlog by this split rate. 1 to 3 is the most common split rate for software backlogs (but measure your own and adjust).

9. If work sits idle for long periods waiting, don’t expect effort estimates for an item to match calendar delivery time. In these cases, forecast system throughput rather than item sizes (story points).

10. Probabilistic forecasting is easier than most people expect. If averages are used to forecast (as in traditional burndown charts), then the chance of hitting the date they give is 50% – a coin toss. Capture historical data, or estimate in ranges, and use that instead.
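As a minimal illustration of that last tip, here is a probabilistic forecast built by resampling hypothetical historical throughput rather than dividing by its average:

```python
import random

# Probabilistic forecast sketch: instead of dividing by an average, resample
# historical weekly throughput many times and read dates off percentiles.
# The backlog size and throughput history are hypothetical.
random.seed(7)

backlog = 60
weekly_history = [4, 9, 6, 3, 8, 5, 7, 6]  # items finished in each past week

def weeks_to_finish():
    done = weeks = 0
    while done < backlog:
        done += random.choice(weekly_history)  # resample a historical week
        weeks += 1
    return weeks

trials = sorted(weeks_to_finish() for _ in range(5_000))
p50 = trials[len(trials) // 2]
p85 = trials[int(len(trials) * 0.85)]
print(f"50% chance of finishing by week {p50}, 85% by week {p85}")
```

Quoting the 85th percentile instead of the average turns the coin toss into a date you can reasonably commit to.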