“If a metric does not offer predictive power, capturing that metric is waste” (Troy Magennis)

1 – Definitions

After a long search on the web, these are the simple, descriptive and coherent definitions I found for cycle time, WIP, throughput and flow efficiency.

FLOW: Flow is the movement and delivery of customer value through a process – (as we build for the customer) therefore our whole process should be oriented around optimizing flow.

ACTIONABLE METRICS: The set of metrics that will suggest specific interventions that will result in the outcomes you are expecting

ARRIVAL POINT: A specific point where a unit of work transforms from being just some arbitrary idea into being a legitimate work item that is to immediately be acted on and completed.

DEPARTURE POINT: Defined as delivery to an actual end user or delivery to some other downstream team or process

WORK: Any discrete unit of work that delivers customer value, directly or indirectly, is a candidate. Each such unit is called a work item (story, epic, feature, requirement, use case, enhancement,…)

WORK IN PROGRESS – WIP: The total count of items currently being worked on; the number of items that we are working on at any given time; all discrete units of customer value that have entered a given process but have not exited.

predictor of overall system performance

all work items between arrival point and departure point

can be segmented across different types

CYCLE TIME: The amount of elapsed time that a work item spends as Work in Progress.

How long it takes each of those items to get through our process?

How long to complete?

When will it be done?

Can also be used as a predictor of cost

The amount of time it takes to get customer feedback

Unpredictability lies in the time an item spends waiting to be worked on – that’s why it’s the elapsed time that is important.

THROUGHPUT: The number of work items completed per unit of time.

How many of those items complete per unit of time?

How many features am I going to get in the next release

Understanding throughput at each step will help to identify the constraints in the workflow – find spots for process improvements
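The two definitions above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the work items and dates are made up, and counting both the first and the last day (the "+ 1") is an assumption borrowed from the CFD formula later in these notes.

```python
from datetime import date

# Hypothetical work items: (id, arrival date, departure date).
items = [
    ("A", date(2024, 3, 4), date(2024, 3, 8)),
    ("B", date(2024, 3, 5), date(2024, 3, 6)),
    ("C", date(2024, 3, 6), date(2024, 3, 15)),
    ("D", date(2024, 3, 11), date(2024, 3, 13)),
]


def cycle_time_days(arrived: date, departed: date) -> int:
    """Elapsed calendar days spent as WIP, counting first and last day."""
    return (departed - arrived).days + 1


def throughput(items, start: date, end: date) -> float:
    """Items that departed per day within the inclusive window [start, end]."""
    finished = sum(1 for _, _, departed in items if start <= departed <= end)
    window_days = (end - start).days + 1
    return finished / window_days


cycle_times = {name: cycle_time_days(a, d) for name, a, d in items}
per_day = throughput(items, date(2024, 3, 4), date(2024, 3, 15))

print(cycle_times)  # {'A': 5, 'B': 2, 'C': 10, 'D': 3}
print(per_day)      # 4 items in 12 days
```

Note that the cycle time is pure elapsed time: weekends and waiting periods are included on purpose.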

FLOW EFFICIENCY: Ratio of the total time an item was actively worked on to the total elapsed time it took the item to complete.

not actively worked = waiting to be pulled, waiting for feedback

teams often start out at around 15% flow efficiency
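As a small worked example of the ratio, assuming a made-up item that was actively worked on for 3 days out of 20 elapsed days (the function name is just for illustration):

```python
def flow_efficiency(active_days: float, total_elapsed_days: float) -> float:
    """Share of the total elapsed time in which the item was actively worked on."""
    if total_elapsed_days <= 0:
        raise ValueError("total_elapsed_days must be positive")
    if not 0 <= active_days <= total_elapsed_days:
        raise ValueError("active_days must lie between 0 and total_elapsed_days")
    return active_days / total_elapsed_days


# Hypothetical item: 3 days of active work, 17 days of waiting.
efficiency = flow_efficiency(active_days=3, total_elapsed_days=20)
print(f"{efficiency:.0%}")  # 15%
```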

LITTLE’S LAW:

Average cycle time (CT) = Average WIP (WIP)/Average Throughput (TH)

TH = WIP/CT

WIP = CT*TH

implies that, for a given throughput, increasing WIP leads to a higher CT and vice versa – so reduce WIP to reduce CT … in order to get stuff done faster, you need to work on less (on average)

Average items in queue = Average Arrival Rate * Average Wait Time

L = lambda * W

L = average number of items in the queuing system

lambda = average number of items arriving per unit of time

W = average wait time in the system for an item
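A quick numeric check of the formula, with made-up numbers. It is only a sketch: the relationship holds between long-run averages and only under the assumptions listed in the CFD section of these notes, so it says nothing about any single item.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: average CT = average WIP / average throughput."""
    if avg_throughput <= 0:
        raise ValueError("avg_throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team finishing 2 items per day on average.
print(average_cycle_time(avg_wip=20, avg_throughput=2))  # 10.0 days
print(average_cycle_time(avg_wip=10, avg_throughput=2))  # 5.0 days
```

Halving the average WIP at the same throughput halves the average cycle time, which is the "work on less to finish faster" point made above.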

Thanks Daniel S. Vacanti for your explanation!

2 – Cumulative Flow Diagrams (CFD)

offer a concise, coherent visualization of the three metrics of flow (Avg. Cycle Time, WIP, Throughput)

provides qualitative and quantitative insight into problems with flow

shows cumulative process arrivals and departures over time

not a tool for projection (but introspection)

Backlog should not be part of the CFD

backlog items are not committed to (but in the diagram it looks like they are)

including them distorts the cycle time calculation

Try to show active and done states (as it shows areas of delays)

Avoid trap of drawing conclusions just by looking at the CFD! It’s a tool to ask the right questions

Important Properties

Top line = cumulative arrivals

Bottom line = cumulative departures

No line can ever decrease! (it’s a cumulative chart) … If it happens the chart is wrong (very likely work items disappeared in the process)

The vertical distance between any two lines is the total amount of work that is in progress between the two workflow steps represented by the two chosen lines

The horizontal distance between any two lines represents the approximate average cycle time for items that finished between the two workflow steps represented by the chosen two lines (average cycle time = bottom line date – top line date + 1)

The data displayed depicts only what has happened (no projections allowed)

The slope of the top line between any two reporting intervals represents the average arrival rate

The slope of the bottom line between any two reporting intervals represents the average departure rate (throughput)

Average throughput = rise of the bottom line (number of departures) / run (number of days)
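The properties above can be checked on a tiny data set. The sketch below assumes made-up work items with one arrival and one optional departure date each, and a daily reporting interval; it only derives the top and bottom line of a CFD, not the bands in between.

```python
from datetime import date, timedelta

# Hypothetical items: (arrival, departure); None means still in progress.
items = [
    (date(2024, 3, 4), date(2024, 3, 6)),
    (date(2024, 3, 4), date(2024, 3, 8)),
    (date(2024, 3, 5), date(2024, 3, 7)),
    (date(2024, 3, 6), None),
    (date(2024, 3, 7), date(2024, 3, 8)),
]


def cfd_lines(items, start: date, end: date):
    """Cumulative arrivals (top line) and departures (bottom line) per day."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    arrivals = [sum(1 for a, _ in items if a <= day) for day in days]
    departures = [
        sum(1 for _, d in items if d is not None and d <= day) for day in days
    ]
    return days, arrivals, departures


def never_decreases(line) -> bool:
    """A cumulative line must never go down."""
    return all(later >= earlier for earlier, later in zip(line, line[1:]))


days, arrivals, departures = cfd_lines(items, date(2024, 3, 4), date(2024, 3, 8))

# Vertical distance between the two lines = WIP on that day.
wip = [a - d for a, d in zip(arrivals, departures)]

# Slope of the bottom line = rise (departures) / run (days).
avg_throughput = (departures[-1] - departures[0]) / (len(days) - 1)

print(arrivals)        # [2, 3, 4, 5, 5]
print(departures)      # [0, 0, 1, 2, 4]
print(wip)             # [2, 3, 3, 3, 1]
print(avg_throughput)  # 1.0
```

If `never_decreases` returns False for either line, the underlying data is suspect, for example because items were deleted or moved backwards.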

Necessary assumptions

The average Arrival Rate (Lambda) should equal the average Departure Rate (TH) = we will only start new work at about the same rate that we finish old work

This calls for a later-binding approach, i.e. committing to work as late as possible.

Monitor policies around the order in which we pull items through the system – so that work items do not sit and age unnecessarily

all work started will be completed and exit the system

WIP should be roughly the same in the chosen interval

average WIP is neither increasing nor decreasing

CT,WIP,TH are measured using consistent units

Some patterns

Flat lines

check for mismatches of arrival and departure rates (items arrive faster than departure … means increasing WIP over time … leading to an increasing cycle time)

check for flat lines … meaning no departures for a longer time (could be that nothing gets done, but also shows release phases or several public holidays)

Why isn’t anything getting done?

What can we do to get things flowing again?

Stair steps

e.g. with sprints and fixed arrival and departure dates

Can batch periods be reduced? Eliminated? Check impact on cycle time…

Bulging bands

explosion of WIP in a particular workflow step

maybe work is progressing slowly due to poor requirements or poor design?

the cause need not lie in the workflow step where the bulge appears! It could also be due to a push from a previous step or a blockage in one or more downstream steps … try to separate Active and Done states (queuing) … Done bands should be as thin as possible

Disappearing bands

Reporting interval could be too big (e.g. one week, but work flows rather quickly e.g. one-two days)

upstream variability causes downstream steps to be starved

team decides to skip one step in the process frequently (you can remove this step from the workflow very likely)

3 – Cycle time scatterplots

work item cycle time data is not normally distributed! That’s why applying the standard deviation and arithmetic mean is not appropriate (as is done e.g. with control charts)

percentile lines are preferable because they are robust in the face of outliers

“If Bill Gates walks into a bar, then on average everyone in the bar is a millionaire”
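The Bill Gates effect is easy to reproduce with cycle times. The sketch below uses made-up data with a single outlier and a simple nearest-rank percentile; other percentile definitions exist and give slightly different values.

```python
import math
from statistics import mean


def percentile(values, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical cycle times in days; the 60-day item is the outlier.
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 60]

print(mean(cycle_times))            # 10.3, larger than 9 of the 10 items
print(percentile(cycle_times, 50))  # 5
print(percentile(cycle_times, 85))  # 8
```

One outlier drags the mean above almost every real cycle time, while the 50th and 85th percentile lines barely move.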

4 – Cycle time histograms

a condensed, spatial view based on the frequency of occurrence of Cycle Times

y-axis is the number of items

x-axis is the cycle time

vertical percentile lines (like in the scatterplot)

complementing the scatterplot, the histogram shows the shape of the data. You can better detect patterns in the Cycle Times of a given time span … it’s a more advanced cycle time analysis
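A text-only sketch of such a histogram, assuming made-up cycle times in whole days and one bucket per day:

```python
from collections import Counter

# Hypothetical cycle times in days.
cycle_times = [2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 12]


def histogram(cycle_times):
    """Sorted (cycle time, number of items) pairs."""
    return sorted(Counter(cycle_times).items())


buckets = histogram(cycle_times)
for days, count in buckets:
    # x-axis value on the left, one '#' per item with that cycle time.
    print(f"{days:>3} d | {'#' * count}")
```

The long, thin tail to the right that typically shows up here is the shape that a scatterplot alone does not make obvious.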

5 – SLAs

Cycle Time Target; Service Level Expectation

it is expressed using a probability to meet a cycle time range

e.g. with a 50% probability a work item finishes within 10 days (according to the 50th percentile of all previously finished work items in our system)

can be used as a substitute for many upfront planning and estimation activities

the choice of a team’s SLA should be made in close collaboration with their customers

get predictable at an overall system level first … very likely good enough … only optimize for subtypes if really necessary

use the SLA for right sizing items too – SLA as the litmus test for right size of an item for flow through the process

the older a work item gets, the greater chance it has of aging still more

the true definition of Agile is to respond quickly to new information

One of the most common things we do that hinders our predictability is not pay attention to the order in which items are pulled through our process.
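The percentile-based SLA and the aging point above can be combined into a simple check. This is a sketch under assumptions: the history, the item names and the choice of the 85th percentile are made up, and the percentile uses the nearest-rank method.

```python
import math


def service_level_expectation(finished_cycle_times, pct: int) -> int:
    """Cycle time (in days) that pct% of previously finished items met."""
    ordered = sorted(finished_cycle_times)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical cycle times (days) of previously finished items.
finished = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10, 12, 15]
sle_85 = service_level_expectation(finished, 85)

# Hypothetical items still in progress, with their current age in days.
in_progress_age = {"X-1": 3, "X-2": 9, "X-3": 14}
at_risk = [name for name, age in in_progress_age.items() if age > sle_85]

print(f"85% of items finished within {sle_85} days")  # 12 days
print(at_risk)                                        # ['X-3']
```

The same number can serve as the right-sizing test: if the team does not believe an item can finish within `sle_85` days, it is probably too big for the process as it stands.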

Mistakes

setting an SLA without analyzing cycle time data

set by an external manager

set without close customer collaboration

Classes of Services (CoS)

For all practical purposes, introducing CoS is one of the worst things you can do to predictability

CoS = any policy you put in place around the order in which you pull items

every such policy will introduce variability and unpredictability into the process (e.g. will produce flow debt)

Only introduce if you have operated your process for a while and are confident that CoS is necessary

In his book, Daniel S. Vacanti shows via a simulation the terrible effect on predictability of pulling randomly from queues in combination with an expedite lane: cycle time increases from 50 days to 100 days, i.e. 100% more time.

FIFS (FIFO) – is the clear winner for cycle time predictability … the further you stray from FIFO, the less predictable you are

Slack

Pretty much the only way to PREDICTABLY deliver in the face of variability introduced by different CoS is to build slack into the system.

6 – Forecasting

a proper forecast includes a date range and a probability

for forecasting a single item, use SLAs

do not use Little’s Law and averages (as the data is not normally distributed)

straight line projections are problematic – they do not communicate a probability
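One way to get a forecast that carries a probability is to resample past throughput many times (a Monte Carlo simulation). The notes above do not describe this technique, so take the following as an illustrative sketch only: the throughput history, the backlog size, the number of runs and the seed are all made-up assumptions.

```python
import math
import random


def days_to_finish(history, backlog_size, rng) -> int:
    """One simulated future: replay randomly chosen past days until the backlog is done."""
    done = 0
    days = 0
    while done < backlog_size:
        done += rng.choice(history)
        days += 1
    return days


def forecast(history, backlog_size, runs=2000, seed=1):
    """Days needed at the 50th, 85th and 95th percentile of all simulated futures."""
    if min(history) < 0 or max(history) <= 0:
        raise ValueError("history needs non-negative values and at least one productive day")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = sorted(days_to_finish(history, backlog_size, rng) for _ in range(runs))

    def percentile(pct):
        rank = max(1, math.ceil(pct * runs / 100))
        return outcomes[rank - 1]

    return {pct: percentile(pct) for pct in (50, 85, 95)}


# Hypothetical number of items finished on each of the last 10 days.
history = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]
result = forecast(history, backlog_size=20)
print(result)  # maps probability (in %) to the number of days needed
```

The result reads as "with 85% probability the 20 items are done within `result[85]` days", which is exactly the kind of statement a straight-line projection cannot make.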