Weighted Shortest Job First

Weighted Shortest Job First (WSJF) is a prioritization model used to sequence jobs (e.g., Features, Capabilities, and Epics) to produce maximum economic benefit. In SAFe, WSJF is estimated as the Cost of Delay (CoD) divided by job size.

Agile Release Trains (ARTs) provide an ongoing, continuous flow of work that makes up the Enterprise’s incremental development effort. This flow avoids the overhead and delays caused by the start-stop-start nature of traditional projects, where authorizations and phase gates control the program and its economics.

While this continuous flow model speeds the delivery of value and keeps the system Lean, priorities must be updated continuously to provide the best economic outcomes. In a flow-based system, it is job sequencing, rather than the theoretical return on investment of each job in isolation, that produces the best result. To that end, WSJF is used to prioritize backlogs by calculating the relative CoD and job size (a proxy for duration). Applying WSJF at Program Increment boundaries keeps backlog priorities up to date based on user and business value, time factors, risk, opportunity enablement, and effort. WSJF also conveniently and automatically ignores sunk costs, a fundamental principle of Lean economics.

Details

Reinertsen describes a comprehensive model, called weighted shortest job first, for prioritizing jobs based on the economics of product development flow [2]. WSJF is calculated by dividing the Cost of Delay by the duration. Jobs that can deliver the most value (or CoD) in the shortest duration are selected first for implementation. When applied in SAFe, the model supports some additional principles of product development flow, including:

Taking an economic view

Ignoring sunk costs

Making financial choices continuously

Using decision rules to decentralize decision-making and control

If you only quantify one thing, quantify the Cost of Delay

Figure 1 shows the impact of correctly applying Reinertsen’s WSJF. The areas shaded in blue illustrate the total CoD in each case. Doing the weighted shortest job first delivers the best economics.

Note: In his original work [2], as shown in Figure 1, Reinertsen uses the actual values for Cost of Delay and duration whereas SAFe applies relative estimation using a modified Fibonacci sequence, described later in this article.

Calculating the Cost of Delay

In SAFe, CoD is the sum of three primary components:

User-business value – What is the relative value to the customer or business? Do our users prefer this over that? What is the revenue impact on our business? Is there a potential penalty or other negative impact if we delay?

Time criticality – How does the user/business value decay over time? Is there a fixed deadline? Will they wait for us or move to another solution? Are there Milestones on the critical path impacted by this? What is the current effect on customer satisfaction?

Risk reduction-opportunity enablement value – What else does this do for our business? Does it reduce the risk of this or a future delivery? Is there value in the information we will receive? Will this feature enable new business opportunities?

Since we are in a continuous flow and should have a large enough backlog to choose from, we needn’t worry about the absolute numbers. We can just compare backlog items relative to each other using the modified Fibonacci numbers we use in ‘estimating poker.’ Then the relative CoD is calculated as follows:

Figure 2. Calculating relative CoD
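The Figure 2 calculation can be sketched in a few lines of code. This is an illustrative sketch, not part of the SAFe text; the function name and the estimate values are hypothetical, drawn from a modified Fibonacci scale (1, 2, 3, 5, 8, 13, 20, ...):

```python
# Relative Cost of Delay = user-business value + time criticality
#                          + risk reduction | opportunity enablement.
# All three inputs are relative (modified-Fibonacci) estimates.
def cost_of_delay(user_business_value, time_criticality, rr_oe_value):
    return user_business_value + time_criticality + rr_oe_value

# Example: a feature rated 8 for value, 5 for time criticality,
# and 3 for risk reduction / opportunity enablement.
print(cost_of_delay(8, 5, 3))  # → 16
```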

Calculating Job Duration

Next, we need to understand the job duration. That can be pretty difficult to determine, especially early on when we might not know who is going to do the work or the capacity allocation for the teams. Fortunately, we have a proxy available: job size. In systems with fixed resources, job size is a good proxy for the duration. (If I’m the only one mowing my lawn, and the front yard is three times bigger than the backyard, it’s going to take three times longer.) Taking job size, we have a reasonably straightforward calculation for comparing jobs via WSJF, as Figure 3 illustrates.

Figure 3. A formula for WSJF
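To make the Figure 3 formula concrete, here is a minimal sketch comparing two hypothetical jobs; the CoD and size numbers are invented for illustration:

```python
# WSJF = Cost of Delay / job size (job size as a proxy for duration).
def wsjf(cod, size):
    return cod / size

job_a = wsjf(cod=10, size=2)   # small job, modest CoD -> 5.0
job_b = wsjf(cod=20, size=10)  # big job, higher CoD   -> 2.0

# The smaller job wins: it delivers more value per unit of duration.
print(job_a > job_b)  # → True
```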

Then we can create a simple table to compare jobs (three features, in this case), as shown in Figure 4.

Figure 4. A table for calculating WSJF

As with story estimating, we apply a modified Fibonacci sequence to our estimates, as it better reflects the range of uncertainty as size increases. To use the table in Figure 4, the team estimates each feature relative to the others for each of the three components of CoD, and then again for job size. (Note: With relative estimating, you look at one column at a time, set the smallest item to a “one,” and then rate the others relative to it.) Then sum the three components to calculate the CoD and divide it by the job size. The job with the highest WSJF is the next most important job to do.
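The whole table exercise can be sketched in code. The feature names and all the estimates below are hypothetical, not taken from Figure 4; the point is only to show the mechanics of ranking a backlog by WSJF:

```python
# Three hypothetical features, each rated with relative
# (modified-Fibonacci) estimates, column by column.
features = {
    # name: (user-business value, time criticality, RR|OE value, job size)
    "Feature A": (8, 5, 3, 8),
    "Feature B": (3, 1, 1, 1),   # smallest item in each column is a "one"
    "Feature C": (5, 8, 8, 13),
}

def wsjf(value, time_crit, rr_oe, size):
    # CoD is the sum of the three components; WSJF divides it by job size.
    return (value + time_crit + rr_oe) / size

# Highest WSJF first: the next most important job to do.
ranked = sorted(features.items(), key=lambda kv: wsjf(*kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: WSJF = {wsjf(*scores):.2f}")
```

With these invented numbers, the small Feature B (CoD 5, size 1) ranks first despite having the lowest CoD, which is exactly the behavior the model is designed to produce.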

This model encourages splitting large items into multiple smaller ones so that they can compete against other smaller items. Otherwise, big important jobs might never get done. But that’s just Agile at work. Since the implementation is incremental, whenever a continuing job doesn’t rank well against its peers, then you have likely satisfied that particular requirement sufficiently that you can move on to the next one.

As we have described, another advantage of the model is that it is not necessary to determine the absolute value of any of these numbers. Instead, you only need to rate the components of each item against the other items in the same backlog. Finally, because backlog estimates should include only the remaining job size, frequent reprioritization means that the system automatically ignores sunk costs.
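The sunk-cost point can be illustrated with a small sketch. The numbers here are hypothetical; the only input that changes between the two calls is the remaining job size, so effort already invested plays no part in the priority:

```python
# WSJF uses only the REMAINING job size, so money and effort
# already spent (sunk cost) never enter the calculation.
def wsjf(cod, remaining_size):
    return cod / remaining_size

# A feature with CoD 10, originally sized 13.
before = wsjf(10, 13)  # low priority at first (about 0.77)
# After an increment, most of it is done; 3 points of size remain.
after = wsjf(10, 3)    # now about 3.33 -- it ranks higher, regardless
                       # of how much work was already invested
print(after > before)  # → True
```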

A Note on Job Size as a Proxy for Duration

Some caution must be exercised, however, as job size is not always a good proxy for duration in the WSJF calculation. For example:

If the availability of specialty skills means that a larger, higher-value job can be delivered more quickly than its size would suggest, then the larger job should be picked, because it delivers more value in a shorter period. (If three people are available to mow my front lawn while I do the back, then these items have about the same duration, but not the same value.)

A small job may be starved of resources, or have dependencies on other jobs, that cause it to take longer than a bigger job.

But we rarely need to worry about these edge cases. If there is a small error in selection, the next most important job will make its way to the top of the backlog soon enough.