To now, this blog has been a series of essays on the theoretical considerations underlying the analytics of development. With this entry, I want to start changing the emphasis to the practicalities of building analytic tools. Going from theory to practice raises all kinds of issues: data content and formats, robustness of algorithms, reinforcing agile practices, .... To start that discussion, lets start with an epic on how an analytic tool for agile teams might work:

A lead of an agile team, call her Shirley. has been asked to deliver a mobile application, with a specified set of features, in time for the next world games, which is one year away. Understanding that the future is uncertain, Shirley treats the time to complete as the random variable. Before committing to the project, she needs an initial distribution of the time to complete the project. With such a distribution, she has a view of the probability of achieving the goal. It is the area under the distribution curve that lies to the left of the target date in Figure 1.

Fortunately she has tool called 'ARaVar' to help her build and maintain this distribution. This tool is federated with her OSLC agile project environment, Agilista (a fictional product). To use ARaVar, the team estimates the level of effort required for each feature using planning poker. In particular, for each feature’s level of effort the leadership team agrees on three values to enter in Agilista:

The low (best case) – Assumes all the stars align and the feature comes together easily to meet requirements.

The high (worse case) – Assumes Mr. Murphy S. Law and Ms. May Hem unexpectedly join the team and inject unexpected challenges and obstacles.

The nominal (most likely) – Assumes level of effort has the expected mix of good fortune and bad luck.

Behind the scenes, the ARaVar finds these inputs to in Agilista and uses them to define triangular probability distributions. In particular, AraVar interprets these effort inputs as saying

There is zero probability that the level of effort will be less than the best case.

There is zero probability that the level of effort will be greater than the worst case.

The greatest probability of the level of effort will be at the expected case.

So ARaVar sets the distributions to be zero below the low value and above the high value, with a peak at the expected case. Figure 2 show the resulting triangular distribution, setting the high and low to zero and setting the peak (expected case) so that the total area of the distribution in one.

Figure 2: Typical triangular distribution for each feature.

In the parlance of Bayesian reasoning, this technique provides the subject matter experts a means of arriving at an honest prior, based on current information and informed belief. If the difference between the low and high of the distribution of a feature is large, then the team is expressing its uncertainty of the effort required to deliver the feature. This gives Shirley’s team the opportunity to focus the team on resolving the uncertainties early, progressively de-risking the project.

With this prior estimate in place, Shirley has an idea of how likely it is she can make the commitment and she negotiates the content. What-if analysis in ARaVa provides her with capability to compute the impact of adding, changing or dropping one or more features from the program. Luckily, she does find that one of the relatively uncertain features is more of a nice-to-have than a must-have and adds considerably more risk than value. So she negotiates that feature out of scope for a firmer commitment to an earlier delivery in 11 months as illustrated in Figure 3.

Figure 3: The negotiated delivery commitment: earlier and more predictable.

So Shirley now is in a good place. She has agreement on the scope of the project between her team and her stakeholders. She feels her team has a good chance of delivering on time.

In the Agile fashion, work proceeds by establishing work items to deliver the features. These work items are scheduled for iterations/sprints, on an ongoing basis. As the team completes work items, they not only have less work to complete, but also have a track record of the actual time it takes the team to complete work (called team velocity). From a Bayesian perspective, these constitute important evidence of how well the project is actually executing. ARaVa queries Agilista for the completion status of the features, the work item burndown history, and updated effort-to-complete estimates for the remaining features. ARaVa uses modern predictive algorithms to update the time to complete distribution.

With these ongoing predictions, Shirley can discuss with her team, and external stakeholders, whether the odds of meeting the commitment are improving (as they should) or degrading. If the later is the case, she can use ARaVa to predict the impact of managing content (decommitting features) or adjusting resources. For example, the tool revealed that one feature was very much at risk. In discussion with the stakeholders, it was decided that this feature was necessary and so it was decided that for the next sprint there should be more resources focused on the this feature. Some staff were assigned to the team for just that sprint. With ARaVa, all stakeholders can have a more honest and trustworthy discussion on how best to proceed.

ARaVa does not yet exist, but it is not a dream. IBM Rational and Research are now in the process of developing such a tool for a possible delivery next year. We are calling the project AnDes (for Analytics of Development). AnDes uses state of the art learning algorithms. We do have working versions federated with Rational Team Concert (We did show a preview at last year’s Rational Innovate). In addition to consideration of automating the data collection, we are exploring how it can be applied across a wide range of projects:

In my last blog, I laid out a vision of how a project lead and her stakeholders might use the predictive analytics to drive to better project outcomes. As I mentioned in the entry, IBM Rational is work on such a tool. A demonstration of this tool is found here: Agile Development Analytics Demo. The video was created by Peri Tarr, the lead architect on the project.

Some of you might notice that the terminology and development process described in the demo is at odds with your understanding of Agile. We do understand that and currently we are working on making a robust tool that accommodates a wide range of processes for what might some might call 'pure Agile' to various hybrids we are discovering in the market place.

I am not a natural blogger, but do like to share thoughts with a broad community of folks with shared interests.

My longterm passion is how to enable organizations develop and deliver software and systems more effectively. I am thoroughly convinced that well organized, governed organizations not only deliver value to the businesses, but also enhance the lives of the staff. In such organizations, people get to work on cool things, innovate, work well with colleagues, and build a legacy by being a part of making things of value.I also am a mathematician by training. I am especially energized by the opportunity to apply mathematical reasoning to the improvement of software and system organizations. My current assignment is to lead the Business Analytics and Optimization (BAO) strategy for Rational.

So, with this blog, from time to time, I will share my thoughts on BAO for software and system organizations. I hope this blog will be a catalyst for building a community. I especially look forward to comments and conversations.

An ongoing theme of this blog is that development processes differ from other business processes in that there is a wide range of uncertainty inherent in the efforts. It follows that tracking and steering development efforts entails ongoing predicting, from the evolving project information, when a project is likely to meet its goals.

Late last year, Nate Silver author of the Fivethrityeight blog and well know predictor of elections published The Signal and the Noise, a text for the intelligent layperson on how prediction works. I was impressed by the book as it explained the principles behind the sort of Bayesian analytics we need for development analytics without any explicit math. However, I felt for the folks in our field would greatly benefit by having the mathematical blanks filled in. So I decided to write a series of papers introducing the topics to folks who had some statistics and maybe some calculus in college, but not a solid background in prediction principles.

In the previous entry, I introduced a probabilistic view of a commitment. The main idea is that when you commit to deliver something in a future, you are making a kind of bet. The odds of winning the bet is the fraction of the distribution of the time=to-deliver before the target date. For example, in the following example, the project manager has a 47% likelihood of winning the bet.

Figure 1

The raises a couple of questions. First, how is the distribution of time-to-complete determined? There are variety of methods to estimate time to complete of an effort. I am not taking a position on what method to adopt. The important point is that the estimation method should not return a number but a distribution! The major estimation vendor have this capability even if it not always surfaced. I will expand on this point in the next blog entry. For now, the key point is that you should be working not with point estimations, but with the distributions.

Second is how the project manager affects the shape and position of the distribution and therefore affects the odds. Some of the techniques are intuitive, some not so much, There are two things one might do: move the distribution relative to the target date, and change the shape if distribution typically narrowing it so that more of it .lies within the target date.

In the first, one can either move the target date out, so that the picture looks like this

Figure 2

This is, of course, intuitive - moving out the date lowers the risk. Another intuitive thing a project manager might do is the descope the project - commit to deliver less functionality. This may have two effects on the distribution: It will move it to the left as there will be less work to do. Depending on the difficulty of the descoped feature, the descoping may also narrow the distribution. By removing a difficult to implement feature. one is more certain of delivery, narrowing the distribution, removing risk resulting in this diagram:

Figure 3

Now comes the unintuitive part. Suppose the target date and content are not negotiable. What is a project manager to do then? The idea is to take actions that will narrow the distribution in Figure 1 so that it looks like

Figure 4

How is this done? Many project managers, in the name of making progress choose the easiest functions to implement first, "the low hanging fruit". However, by doing this the shape of the curve in figure in minimally affected, The less intuitive approach, Following the principle of the Ration Unified Process, is to work on the most difficult, riskiest requirements first! These are the requirements of which

the team has the least information and so should tackle first in order to have time to gain the information needed to succeed. Putting off the riskier requirements and doing the easy stuff first gives the appearance of progress, but by putting off the riskier requirements, one will run out time to do the riskier requirements and fail to meet the commitment.

All this has to be while ensuring their is sufficient time to fulfill all the requirements, risky or not. So in the end, one must account for both the time to complete tasks and their uncertainty to meet commitments. Some techniques for doing that will be discussed in the next blog entry.

One of the things that characterizes software or systems development is that the project manager routinely commits to deliver certain functionality on a given date at an agreed-upon level of quality for a given budget. It is the role of the project manager to make good on the commitment, The software and systems organization leadership may count on the commitments being met in order to meet their business commitments or there may be an explicit contract to deliver on time for a fixed budget. The measure of a good project manager is the ability to make and meet commitments.

iIn this blog entry, I will discuss the nature of that commitment and how it relates to project analytics. First of all, lets define 'commitment' in this context. Of course, I do not mean the confinement to a mental institution, I mean, as suggested above, the promise to deliver certain content with acceptable on or before a certain date.

The first thing to notice is that the future is never certain, and so we are in the realm of probability and random variables, i.e. a quantity described by probability distribution. Going forward, I will assume the reader is familiar with the concepts of random variables and their associated distributions . Soon, I will devote a blog entry just that topic.

Meanwhile, the best way to describe the likelihood of meeting a commitment is the use of a random variable. Consider the distribution of the time it will take to meet the commitment. It might look something like this:

A similar distribution would apply to cost to complete.

Recall, the probability then of the commitment being met is the area under the curve that falls before the target date:

The manager, in making the commitment, is essentially betting (perhaps his or her career) that he or she will meet the commitment. According to this measurement, the odds are about 50-50. The key measurement then is the amount area of the random variable that lies prior to the target date, which in turn relies on the the ability to calculate the probability distribution. I also will discuss some techniques to do that in a later entry.

Now consider for example, "project health". What I believe what is meant is the likelihood of meet the commitment to deliver the project on time.
If it highly probable the project will ship at the target date, the project is 'green' otherwise it is 'yellow' or 'red' like in the following figure.

There are three reposes to a yellow or red project. One can move the target date, move the distribution, or change the shape of the distriburion, again a topic for a later bog.