An ongoing theme of this blog is that development processes differ from other business processes in that there is a wide range of uncertainty inherent in the efforts. It follows that tracking and steering development efforts entails ongoing predicting, from the evolving project information, when a project is likely to meet its goals.

Late last year, Nate Silver author of the Fivethrityeight blog and well know predictor of elections published The Signal and the Noise, a text for the intelligent layperson on how prediction works. I was impressed by the book as it explained the principles behind the sort of Bayesian analytics we need for development analytics without any explicit math. However, I felt for the folks in our field would greatly benefit by having the mathematical blanks filled in. So I decided to write a series of papers introducing the topics to folks who had some statistics and maybe some calculus in college, but not a solid background in prediction principles.

To now, this blog has been a series of essays on the theoretical considerations underlying the analytics of development. With this entry, I want to start changing the emphasis to the practicalities of building analytic tools. Going from theory to practice raises all kinds of issues: data content and formats, robustness of algorithms, reinforcing agile practices, .... To start that discussion, lets start with an epic on how an analytic tool for agile teams might work:

A lead of an agile team, call her Shirley. has been asked to deliver a mobile application, with a specified set of features, in time for the next world games, which is one year away. Understanding that the future is uncertain, Shirley treats the time to complete as the random variable. Before committing to the project, she needs an initial distribution of the time to complete the project. With such a distribution, she has a view of the probability of achieving the goal. It is the area under the distribution curve that lies to the left of the target date in Figure 1.

Fortunately she has tool called 'ARaVar' to help her build and maintain this distribution. This tool is federated with her OSLC agile project environment, Agilista (a fictional product). To use ARaVar, the team estimates the level of effort required for each feature using planning poker. In particular, for each feature’s level of effort the leadership team agrees on three values to enter in Agilista:

The low (best case) – Assumes all the stars align and the feature comes together easily to meet requirements.

The high (worse case) – Assumes Mr. Murphy S. Law and Ms. May Hem unexpectedly join the team and inject unexpected challenges and obstacles.

The nominal (most likely) – Assumes level of effort has the expected mix of good fortune and bad luck.

Behind the scenes, the ARaVar finds these inputs to in Agilista and uses them to define triangular probability distributions. In particular, AraVar interprets these effort inputs as saying

There is zero probability that the level of effort will be less than the best case.

There is zero probability that the level of effort will be greater than the worst case.

The greatest probability of the level of effort will be at the expected case.

So ARaVar sets the distributions to be zero below the low value and above the high value, with a peak at the expected case. Figure 2 show the resulting triangular distribution, setting the high and low to zero and setting the peak (expected case) so that the total area of the distribution in one.

Figure 2: Typical triangular distribution for each feature.

In the parlance of Bayesian reasoning, this technique provides the subject matter experts a means of arriving at an honest prior, based on current information and informed belief. If the difference between the low and high of the distribution of a feature is large, then the team is expressing its uncertainty of the effort required to deliver the feature. This gives Shirley’s team the opportunity to focus the team on resolving the uncertainties early, progressively de-risking the project.

With this prior estimate in place, Shirley has an idea of how likely it is she can make the commitment and she negotiates the content. What-if analysis in ARaVa provides her with capability to compute the impact of adding, changing or dropping one or more features from the program. Luckily, she does find that one of the relatively uncertain features is more of a nice-to-have than a must-have and adds considerably more risk than value. So she negotiates that feature out of scope for a firmer commitment to an earlier delivery in 11 months as illustrated in Figure 3.

Figure 3: The negotiated delivery commitment: earlier and more predictable.

So Shirley now is in a good place. She has agreement on the scope of the project between her team and her stakeholders. She feels her team has a good chance of delivering on time.

In the Agile fashion, work proceeds by establishing work items to deliver the features. These work items are scheduled for iterations/sprints, on an ongoing basis. As the team completes work items, they not only have less work to complete, but also have a track record of the actual time it takes the team to complete work (called team velocity). From a Bayesian perspective, these constitute important evidence of how well the project is actually executing. ARaVa queries Agilista for the completion status of the features, the work item burndown history, and updated effort-to-complete estimates for the remaining features. ARaVa uses modern predictive algorithms to update the time to complete distribution.

With these ongoing predictions, Shirley can discuss with her team, and external stakeholders, whether the odds of meeting the commitment are improving (as they should) or degrading. If the later is the case, she can use ARaVa to predict the impact of managing content (decommitting features) or adjusting resources. For example, the tool revealed that one feature was very much at risk. In discussion with the stakeholders, it was decided that this feature was necessary and so it was decided that for the next sprint there should be more resources focused on the this feature. Some staff were assigned to the team for just that sprint. With ARaVa, all stakeholders can have a more honest and trustworthy discussion on how best to proceed.

ARaVa does not yet exist, but it is not a dream. IBM Rational and Research are now in the process of developing such a tool for a possible delivery next year. We are calling the project AnDes (for Analytics of Development). AnDes uses state of the art learning algorithms. We do have working versions federated with Rational Team Concert (We did show a preview at last year’s Rational Innovate). In addition to consideration of automating the data collection, we are exploring how it can be applied across a wide range of projects:

I am not a natural blogger, but do like to share thoughts with a broad community of folks with shared interests.

My longterm passion is how to enable organizations develop and deliver software and systems more effectively. I am thoroughly convinced that well organized, governed organizations not only deliver value to the businesses, but also enhance the lives of the staff. In such organizations, people get to work on cool things, innovate, work well with colleagues, and build a legacy by being a part of making things of value.I also am a mathematician by training. I am especially energized by the opportunity to apply mathematical reasoning to the improvement of software and system organizations. My current assignment is to lead the Business Analytics and Optimization (BAO) strategy for Rational.

So, with this blog, from time to time, I will share my thoughts on BAO for software and system organizations. I hope this blog will be a catalyst for building a community. I especially look forward to comments and conversations.