I just release the Traffic Light Simulator. This spreadsheet shows the eventual travel time for cars traveling along a commute to work that has five traffic signals. Playing and varying the probability of hitting each light, and the delay time incurred if that happens, the simulator shows a histogram of how many cars achieve the allowable travel times. It also shows a detailed map of all of the possible situations cars may encounter. The lucky ones get all green lights, the unlucky get all red.

Set the probability of up to 5 delays and see the impact on the cycle-time (travel time) distribution

Understand how many cars are impacted by red and green traffic signals, and how this plays out into different probabilities

Exercises to learn how different process decisions about delays impacts cycle time

Many real world processes, like car travel, follow this general pattern. A general amount of expected time if things go well, plus a number of possible delays. Possible delays are expressed as probabilities, 0% never, to 100% always. Software development work is one such process. Work we undertake has a hands-on time, plus a number of possible circumstances that slow that work down. By understanding how delays cascade into eventual cycle time, we can make smarter decisions about what improvement ideas are more likely to work than others.

This is an active area of exploration for me in 2017. My hypothesis is that given just the evident cycle time distribution currently exhibited by teams, the process can be identified. This spreadsheet has five other hypotheses, and I’m interested in hearing reasons why they are wrong.

For now, I’m just starting to fill the spreadsheet with interesting exercises. There are two at the moment. One gets you to find the delay probabilities that cause an Exponential distribution common to service and operations teams. The second exercise gets you to define delay probabilities that causes a moderate Weibull distribution common to software development teams.

Exponential style distribution – common for operations and support teams

Weibull style distribution – common for software development teams

Learning why cycle time distributions match either Exponential or a skewed Weibull gives solid evidence what improvement factors might work. For example, If the distribution looks skewed Weibull, it’s likely that the estimates of effort WILL NOT correlate to the cycle time. This is because the process is dominated by delays, and the amount of time spent actually hands-on the work in minor in comparison to idle time. Solving the reasons for the delays is the best avenue for improvement. If the current cycle time distribution is Exponential, then work time dominates the cycle time. Automation and more people are the better ways to decrease cycle time and throughput in these cases.

It seems common knowledge that measuring teams and presenting data is an Agile dysfunction. I disagree. But can see and have participated in abusive metric relationships in the past. I think we need to discuss better ways of achieving an evidence based Agile approach; without those participating feeling (or being) abused.

Here are my top five list of traits that make metric dashboards useful –

Measure competing things – its relatively easy to game a single metric, so its important to measure the impact of moving one metric by showing the others. Help teams target moving one metric and observe any negative impacts on others.

Make informed and smart trades – trading something the team is better than other teams in similar circumstance for something they desire to improve. Help teams identify what metric category they could trade (be less good) to raise another metric (become better).

Trends not numbers are important – observing unintended drifting over time of metric averages. Its about understanding something has changed, not how good or bad. Help teams react earlier to often slow moving regression in a metric or two. Less effort in correction the earlier it is detected.

Look for global or local trends – Comparing trends across teams is key to spotting system level opportunities (every team is impacted) versus single team opportunities. Help teams target improving things they can do without fighting a system level factors they are unlikely to solve.

No team will be good at everything – If a team is adversely trending on one metric, point out they are above average on another. Pick competing metrics so that no team will be great or terrible at all of them. There will always be a mix.

This list borrows heavily from the work of Larry Maccherone who correctly observed that a balanced blend of metric types gives the most information for identifying trends and improvement opportunities. His advice is to measure at least one thing from four broad areas –

How much

How well

How responsive

How repeatable or reliably

An implementation of this was recently made available in spreadsheet form. Driven from work item Start date, Completed date and Type, the spreadsheet builds a dashboard page in Excel. The choice of the four metrics was somewhat from experience, and there are plenty of alternative that might fit your context better. The advice stands though, pick a metric from the four areas.

To help roll out the dashboard, we have created a single page cheat-sheet to educate the teams on what each metric means and what to expect if that metric is overdriven. The goal is to stable in all four, not excessively good at any one.

We put a lot of effort into our free spreadsheet tools. We want our spreadsheets to be usable by anyone who needs an answer to their questions. Some good practices have become common in our tools and we want to share them with you as well.

No Macros or Add-Ins

A lot of what we do would be easier if we used Excel macros or add-ins. We resist the urge to do this. By including macros or add-ins we are shipping code and Excel gives all manner of warnings to the user that the spreadsheet may be a security risk. It also inhibits the spreadsheets being compatible on Google Sheets. We haven’t needed to use macros, and compete feature for feature against many paid add-ins. We are extremely proud to be performing Monte Carlo simulation and distribution fitting algorithms using plain formulas that YOU can see. Nothing is hidden. But gee, a lot of sleepless nights in this goal…

Title and Read me worksheet

Always have a title page worksheet that describes the spreadsheets intention, and who and how to contact us with ideas for improving them. Document the main steps in getting a result. Some of our spreadsheets are four or five worksheets deep and we want to avoid people becoming immediately lost in the appropriate workflow.

Data entry is by offset reference

Initial versions of our spreadsheets had a column for user data to be populated. What we found we needed was the ability to copy, paste, drag, delete, manipulate this data in the spreadsheet directly. However, whenever a column or cell was moved, our formulas referencing those cells broke. We solved this problem by using an indirect reference to “duplicate” the user data on another worksheet using Excel’s INDEX function, for example:

INDEX(OriginalCompletedDateValues,ROW(A2))

This formula references another cell by row number, and the row number doesn’t change with any user action you can do with the clipboard, dragging or manipulation. We write all formulas against the indirect copy of the original data in a worksheet commonly called “Calculations”. We copy down the formula as shown, where the A2 will be a series, A3, A4, A5 …

We standardized on a worksheet called “Your Data” which no formula every directly references. We find ourselves dragging columns of different data into this sheet and everything keeps just working.

Column References and Capacity Growth

No matter how big we make the spreadsheets support input data, someone always emails wanting more. We structure our formulas now to limit the places where an absolute row count is needed. Here are the ways we tackle this –

Use column references A:A rather than A2:A5000 in formulas wherever possible

Use names ranges, Formulas-Name Manager to define names for ranges. We use the names in formulas where possible

When we need to handle a date range, we do 250 weeks or 5,000 day dates individually

We always set the background color of a formula cell to the calculation cell type, so we can visually see where we stopped copying a formula row down. We also try and put a comment at that point saying how the user can expand it.

Top Left to Bottom Right Flow

We design our worksheets to be read and populated from the top left to bottom right. We also try and number each input sell title, and add a tip nearby. When we get a question we try and add more documentation to these tips. We would love feedback on where people get stuck. This is pretty standard user design guidelines.

Auto Chart Axis Growth & Shrink

A very hidden feature we utilize is how the charts automatically grow and shrink the axis values to match your specific data. By default, you specify a pre-determined range for chart data, but we don’t know in advance how many days or weeks your data contains. To auto-grow/shrink, we use a named cell range that starts at the first row of data for the column being chosen, and stops at the last row with valid data (not blank or zero is the normal rule). We bind this named range to the chart axis, and Excel takes care of the rest. For example, for cycle time a range formula CycleTimeRange is defined as –

=OFFSET(Calculations!$L$2,0,0,COUNTIF(Calculations!$A:$A,">0"))

This decodes to a range Calculations!L2:Lx where x is the last row before any zero date (excel counts empty dates as a zero integer value, we just know this by experimentation). In any chart data series, you reference this range like this (it has to be fully qualified) –

This technique allows us to handle any amount of user data and have the charts auto-resize. Its a little cumbersome to get working, but works great one you get it right.

Documenting Complex Formulas

On our more complex calculation sheets we try and add a not about each formula and how it works. Nothing is hidden in the formula bar, but some aren’t even clear to us after we have written and debugged them the first time.

Notes document formulas in calculation sheets.

We continually learn new things, and will post more tips over time. Please give us feedback on what we can do to make these spreadsheets easier to use.

Forecast what features will make a release deadline and experiment with scope and start order. Do this early in the planning process to make better decisions about feature scope and start order by showing what will make a target delivery date with a simple tick (yes) or cross (no) for a set of desired features. Often referred to as the cut-line, this spreadsheet shows what will make the cut and what won’t. Often seeing what won’t make a release helps everyone look harder at whether the order matches business need.

Although we have kept this spreadsheet simple on the input side, it performs a Monte Carlo simulation for each feature (1,000 hypothetical deliveries) just like the Single Feature Forecaster spreadsheet. This spreadsheet uses the completion date for each feature as the start date for the next. It then shows which features will be delivered on or before a target date.

The Forecast Inputs

1. Start date: The calendar date where work will actually begin for the first feature. This shouldn’t be the “planned” start date, it should be the date when you are sure work will start. Look at other project that might be delayed or other reasons key staff might be unavailable to start when deciding what this date should be. We find the start date is often overlooked as an easy estimate to make, its not. Its one of the few that involve thinking about company wide factors rather than just the project or feature work and the people in the room

2. Target Date – This is the calendar date that decides whether to show a tick or cross. If the likelihood of a feature delivery date (set by 3. Likelihood) is on or before this date, that feature is said to make the deliverable and adorned with a green tick-mark. Otherwise it will be given a red-cross mark. This should be the date that code is tested, packaged and ready to enter any final deployment pipeline. Often this date is before the product is released to market, its a hand-off date to operations.

4. Split Rate – We find that a big cause of forecast accuracy is using a delivery rate (throughput or velocity) based on work that has been split into multiple parts when started. It makes the team appear to be delivering faster than it is. Our most commonly seen settings (and suggested if no actual data exists) is a low estimate of 1 (not split) and a high estimate of 3 (a backlog item is split into three parts for delivery). Check if the throughput and velocity estimate or data includes defects. If it does, bump these estimates up to model the most common defect rate rate range so each item embeds its allocation of defects. For example, if every piece of work get one defect, start with a low estimate of 2 (1 for the work + 1 for a defect).

5. Throughput Estimate or Data – The throughput estimate is where you express the delivery pace of the team. This is the rate that work is delivered in the same units estimated for each feature. Velocity in points or throughput (count of items completed over time) can be used, the only requirement is that the same units be used for item estimates. You also have the choice here of whether to use a low and high bound guess, or to use historical data (where samples are placed in the Throughput worksheet. Our advice is to start with a wide estimate (spread out low and high value). What is important is the actual values of throughput and velocity found out through delivery fall within this estimate, not to have a narrow estimate. When real data starts to flow in from the team, switch over to using it instead of the guess. Normally about 11 samples are needed before i’m comfortable changing to use data. Try both, and see when the forecasts begin to agree, then use the data samples.

7. The Features and Estimates – This is the part of the spreadsheet where you describe the feature (by name) and get a low and high guess of the amount of work for each feature. Similar guidance here to entering the throughput estimate, start wide and narrow as real data and lessons allows. Remember to consider the split rate in input 4. Don’t double count splitting or defect inputs. Choose either to account for scope creep and defects in these estimates or through the splitting estimates – be consistent and explicit! The first column, start order is for convenience. Sometimes when some features are going to miss a release date, experimenting by changing the start order gets the “agreed” features the best chance of making the cut. Its provided here so that immediate results are shown during discussions and meetings (we used to cut and paste rows, but it was slow and error prone, so we added the ability to control order by column A).

8. Month Adjustments – We found some major holiday events, seasons, or company conferences made significant differences to throughput rate. These inputs allow the throughput or velocity estimates to be adjusted depending on which month work falls. The spreadsheet multiplies its random pick for delivery pace by the multiplier you enter for each month if work falls in that month. Its fine tuning mostly. In Europe we find summer pace slows. In the US we find less fluctuation. One key insight we see if a companies vacation policy matters. For companies that have a “use it or lose it” policy for vacation time, the last month (often December) is a fraction of other months for throughput. Ask and look for reasons one month might fluctuate. Re-orgs, team hire and ramp up, conventions and conferences are some of our common reasons for using these adjustments.

The Forecast Results

The Forecast results are shown to the right of each feature input. Its designed to be simple enough your manager can understand it. We show the start date because its important that any date we show in the table assumes this is the start date; miss that date, then the forecast is incorrect.

Given the start order in column A for each feature, a green tick means on or before the Target date (given the likelihood percentage confidence entered as input 3, 85% in this case). A red cross means it will miss by at least a week (or 2 weeks depending on the time interval entered in the throughput input as part of input 5.). An orange exclamation mark means within one throughput period (on the fence). We encourage our stakeholders to look at what is going to miss and offer to trade something that is going to be delivered. This experimentation is easy by changing the start order in the column A adjacent to each feature. The start orders must be non-duplicated, ascending order from 1 to 10 (errors are shown if you get this wrong).

Conclusion

This tools aims to help early and vigorous discussion about scope and start order of multiple features competing for the same delivery team. Use it to help others see likely outcomes and make hard decisions earlier. Although we have more complex forecasting tools, this is often the right tool for early exploratory discussions about what resources are needed and what set of features will make the cut given teams historical pace of delivery.

This spreadsheet helps forecast how long a single feature or single project will take to deliver using agile (scrum or kanban) using Monte Carlo simulation. We have to balance keeping the spreadsheet light to use and easy to understand whilst capturing the most likely factors that cause feature or project blowout.

We have made our Single Feature Forecaster spreadsheet available for free for over six months, and get a lot of feedback and questions about how it works and how to use it. This post tries to answer how to use it. Future post will cover how it works in detail.

(uses no macros – all formulas, requires Microsoft Excel or Google sheets (but it is slow))

Single Feature Agile Forecasting Inputs

Forecast a single feature spreadsheet inputs

1. Start date – seems relatively straightforward, but don’t be fooled. Poor choice of start date is a major factor in poor forecast accuracy. Often the planned start date is missed, staff aren’t moved from prior projects, or a final decision to begin doesn’t arrive in time. Be pragmatic. Too optimistic or pessimistic is will cause errors. Consider other projects that may run long, these might impact on the start date for this feature or project.

2. How Many Stories – this is the base amount of work to be performed for this feature or project. Its is a range estimate, and our advice is to keep it wide. Monte Carlo works best when it is more certain that the actual final range is within the estimate, so making this range narrow increases the risk of a miss. Include any item count that is included in your throughput samples of estimates. If your throughput includes defects, then estimate defects. Our advice is to NOT include defects, to just include work contributing to the features. Also, don’t include risks, these are best managed separately (in the Risk tab).

3. Story Split Rate – this range estimate wasn’t in our first release. What we encountered is the mistake (that all of our competitors make) when historical throughput data is used to forecast. Items in the backlog are often split before being delivered. If the throughput count is post-split (items are split into multiple parts or grow in scope), the forecast will be out by factors of up the three times (if stories split on average into three parts, which is common in our experience). This is a significant miss, so we added it to the spreadsheet. Our advice is to split low estimate 1, and high estimate 3 until you have more accurate data.

4. Throughput – The rate that work will be delivered. The only requirement is that it is in the same units as the story estimate, whether that be story points or count. There are two ways to give this estimate. Use a low and high range guess if no historical data is available, or historical data if it is available. If a range estimate is used, keep it wide. You are estimating the rate that the story estimate given in input 2 and 3. If that includes defects, then this estimate should as well. We find it easier to NOT include defects in either the throughput or story estimate. Start wide pay particular attention to the low estimate and be realistic and use the lowest you have seen (don’t be afraid of 0 or 1 until you get more information). Don’t be to pessimistic on the high bound. Go one higher than you expect or have seen. Remember to set whether your estimate is per week or two weeks. We like to keep data per-week. If you choose to use historical data, enter your samples in the Throughput Samples tab and choose “Data” from the drop-down list.

The Results Table

Forecast Single Feature Results Table

Forecast results are calculated continuously. We have tried to suggest that 85th percentile and above is safe territory. What we have found traditional estimation methods relay on averages. Although not exact, the 50th percentile often matches what is expected and although (much) more realistic, the 85th percentile is a shock. Be ready to defend the process used. To help, we have added some charts that pictorially help explain how it works.

Forecast Result Charts

The lightning burndown chart shows the first fifty simulated project burn-downs. When explaining the process used to others, explain that this spreadsheet hypothetically complete the project 1,000 times given the estimates you provide. 85th percentile will be the date that 850 of the simulations completed on or before. Go onto explain that traditional methods would give a date closer to where only 500 of these simulated runs would have completed.

Forecast Single Feature Result Histogram

The other chart displays the histogram of results, up to 20 segments maximum. This is similar to the results table, but we find people just want a date and find this histogram too technical to interpret. When multiple risks are entered into the risk tab, this histogram can get very complex. We will go into much deeper detail in a future post on how to capture project risk factors.

Conclusion

We use this same spreadsheet when doing complex analysis. For us its a great first step to getting to a reliable and defensible forecast outcome and allows us to play with the scope and risk factors to find a close solution. We have more complex tools, but sometimes this answers the main question asked – is it possible.

Errors in formulas in complex spreadsheets can be hard to detect. After a few compound formulas (one formula depends on the output of another), the permutations can be large to enumerate. Traditional software development uses unit testing to help initially write code by initially failing, then passing as code is written, and to confirm code continues to work after re-factoring or seemingly unrelated changes are made. Unit testing spreadsheets seems to be just as valuable.

Our Multiple Feature Cut Line Forecaster spreadsheet is complex It doesn’t use any macros (VB code), it is all formula based. Its complexity is in the detailed formulas and the cascading of one result as an input to another formula. We needed to test we got this right.

We decided to do the lightest test possible and use Excel’s built-in Scenario Manager. The Scenario Manager allows values to be changed in tandem on a worksheet. We pre-built sets of inputs and tested the calculation results against a “human tested” result.

How it works:

1. We defined scenarios that set input fields to know states
2. We defined test rows that compare the current calculated results against an expected result we determined by hand
3. We show a pass or fail for each test row only when the correct scenario is shown

We were happy with the result. Our tests found errors we had missed, and we occasionally make changes that impact the current tests. Its not ideal, but the effort was light for the confidence we have in our tools. Worth the effort.

Excel Challenges:

1. Scenarios are hard-coded to a limit of 32 input changes per scenario. We often had more. To solve this some of the tests need multiple scenarios to be applied sequentially to change all the values we needed. A pain, but not a show-stopper.

2. The scenarios can’t make changes across sheets. This means that our unit tests are on the bottom of the Forecast tab. We would like it elsewhere, but this limitation means it has to just be out of sight (rows 50 onwards, at least off the first screen unless you scroll down)

3. No test runner. We have an aversion to macro’s. This means we need scenarios to be shown using the Scenario Manager by hand. Changing the scenario will cause some tests to fail and others to pass. This looked messy. Our fix was to target the pass/fail indicator against a scenario. To achieve this each scenario sets a field value (1, 2, 3, 10, etc). Each test row indicated which scenarios that row targets. This way, we avoid showing an alarming fail when testing another scenario.