Marco Tedone's blog

Project Management

Tuesday, 11 June 2013

So far I've described the components that form the Revenues part of the PILS formula. In this and subsequent posts I'll describe the components that form the MTB Costs side, starting with the Cost of Production bugs (CPB).

Measuring the costs of fixing production bugs is important because CPB reduces the overall profitability of live IT systems and is an indicator of the quality of IT deliverables.

For these reasons, our goal as IT leaders should be to reduce CPB. How could we go about it? In order to answer this question we first need to identify the causes that lead to a high number of production bugs.

Production bugs are a direct function of the quality of production deliveries. So we need to understand why IT generates poor quality products. Some of the most common reasons are:

Focus on the wrong targets. Too many teams focus only on delivering to production without paying too much attention to the quality of what's delivered. More specifically, speed over quality is often the path chosen by many IT managers and, subsequently, developers.

Too much work in progress. The more functionality is delivered into production at any single time, the higher the chances of introducing bugs. Kanban practitioners widely advocate the same point: one of the key actions in Kanban is to reduce Work In Progress (WIP).

Lack of (automated) testing. The lack of testing is not only a symptom of the lack of a safety net when applying changes to our systems, but also of a lack of good design. If we don't secure our IT systems with an automated test suite, we increase the chances of introducing new bugs every time we add or change a feature.

Inadequate IT methodology for requirements gathering. Partially related to too much WIP: if we try to gather all requirements up-front because, say, we're using a pre-emptive methodology such as Waterfall, the risk of misunderstanding the business requirements, and therefore of introducing production bugs as functionality gaps, increases. This is also known as the problem of Early Commitment: because the project needs a sign-off to move from one SDLC phase to the next, in pre-emptive methodologies stakeholders at various stages of the SDLC are asked to commit early.

Lack of development best practices. A wake-up call mainly introduced by XP, best development practices aim at delivering quality products. Amongst them we find Test Driven Development (TDD), Clean Code, Continuous Delivery, Continuous Integration environments, the use of Source Code Management (SCM) tools and DevOps. Sometimes developers are simply not aware of best development practices and although this lack of knowledge doesn't automatically introduce production bugs, the use of best development practices is widely recognised as one of the main tools to increase the quality of production deliveries.

Before identifying how we, as IT leaders, can help increase the quality of production deliveries and therefore reduce production bugs, I'd like to take a brief detour and state what for many will be obvious. Why are production bugs costly?

The following graph might illustrate why:

Numerous studies have shown that the cost of defects increases exponentially as we move along the timeline in the Software Development Life Cycle (SDLC).

A defect found early in the development lifecycle is significantly cheaper to deal with than a bug found in production. The knowledge surrounding the issue is fresh in the mind of those who developed the functionality and, if found before hitting production, the fix doesn't need to go through a production release, which usually involves considerable overhead and related costs.

If a defect is found during development, there aren't any additional infrastructural costs (project ceremony costs) involved in fixing it, other than the time required to write a failing test and the subsequent fix.

If a defect is found after a product has been deployed to production, the knowledge of that product is no longer fresh in the developer's mind. Depending on the code quality, finding the root cause might be quick or might take a significant amount of time. However, even if it is found quickly, the ceremony associated with setting up the environment for a fix takes significant time and therefore costs money. Typically, when fixing a production bug, the development environment needs to be set up, the fix needs to be developed, then deployed to QA for QA sign-off and to UAT for business sign-off and, finally, deployed to production (with all the bureaucracy that this requires).

Production bugs also carry hidden costs: in those organisations without a team dedicated to production bug fixes, someone has to stop working on business valuable deliverables to fix malfunctions. Where a dedicated team is available (what in my book on <ALT+F> I describe as the Maintain The Business - MTB team), an IT organisation is paying development and staff costs to fix what should have worked in the first place.

Because production bugs are costly, organisations often adopt workarounds where possible.

Workarounds don't remove the MTB costs associated with production bugs; they defer them indefinitely, therefore contributing to a continuous cash outflow to implement them.

Let's think for a moment what happens with workarounds. When an issue occurs in production, a user typically flags it with production support. Depending on the maturity and size of the organisation, this triggers a whole series of activities. In small organisations, it may be a phone call to the IT manager's blackberry; in enterprise organisations, an incident might be raised through an electronic system and flash messages sent to emails and blackberries of various interested stakeholders. The people in first line support, who are probably the first port of call, will either rely on memory to remember whether this is a recurring issue, or, in the best of cases, will scan their knowledge base to check whether this problem has occurred before.

In the best case scenario, they'll have a procedure to follow for implementing the workaround; in my experience, this typically consists of raising an incident, running some SQL scripts in the UAT environment to simulate the incident, checking whether the fix worked, and finally applying it to production, only being 1,000 times more careful than in UAT as this is production after all. Eventually the issue is fixed, the user notified, the incident closed and business is back to normal... until the next, identical production issue occurs.

In the worst case scenario, nobody remembers seeing this problem before and there's no knowledge base, therefore 2nd or 3rd line support (typically development) will need to jump out of bed, connect to the office and investigate the problem. Depending on the type of organisation, the pressure can be as low as "Don't worry, we can fix this tomorrow morning" or as high as "Fix the damn thing, we're losing money!". One might argue that in the latter case the business would probably have opted for a permanent fix. True, but that's not always the case. Once the unfortunate 3rd line support developer finds the problem after a few hours of debugging, they notice a line of code, buried deep in some nested function, with a small comment on it: "//This is a known bug - the business chose a workaround. Run the SQL script documented at http://ourbizwiki.com/workarounds/wknds-171.htm", at which point they cry both tears of joy because there's a solution to the problem, and tears of rage as they could have slept a few hours more.

Let's analyse for a moment what happens in both the best and worst case scenarios: for a recurring issue, a few people had to use their time to fix it, maybe some people had to jump out of bed, maybe the company lost some money. The obvious question is: wasn't this avoidable? The obvious answer is, yes, it was, it just needed a permanent fix.

OK, so far we've ascertained that production bugs are costly and workarounds are ineffective from a cost perspective. They also represent a cost in terms of social capital. Both sides of the fence, i.e. the business and developers, won't be happy in an organisation experiencing a high number of production bugs. People will eventually get tired and leave.

For all the reasons above, one of our priorities as IT leaders should be that of increasing the quality of what our IT teams deliver, by choosing an IT methodology that enhances the ability to delay commitment and by placing the business at the centre of the process. So, how do we go about it?

In my book I suggest a possible approach that consists of triggering a transformation strategy within the organisation. Such transformation operates on two levels: at the operational level, i.e. IT, we can adopt ScruXBan, an IT methodology which combines Scrum, XP and Kanban. I describe this methodology in detail in two of my posts (Google ScruXBan).

At the organisational level, we need to lead a cultural shift to educate the business and IT to work in Agile and Lean environments.

If we apply the strategy right, we'll then solve all the problems described above:

Focus on the wrong target. ScruXBan promotes the focus on quality as a pre-requisite for speed.

Too much work in progress. The Kanban side of ScruXBan leads to a reduction of WIP.

Lack of (automated) testing. The XP side of ScruXBan introduces development best practices such as TDD and the importance of an automated test suite as both a good API design tool and a safety net for refactoring and exploring activities.

Inadequate IT methodology for requirements gathering. ScruXBan is the combination of Agile (Scrum, XP) and Lean (Kanban) methodologies, which are known for adopting an Iterative and Incremental Delivery (IID) approach. This ultimately boils down to gathering in detail and delivering only the highest priority requirements in a continuous delivery cycle. ScruXBan eliminates gates, thus freeing stakeholders at all levels of the SDLC from the problem of Early Commitment.

The <ALT+F> framework suggests a simple template (available here) to record CPB.

The template is just a guideline. Ideally, you would have access to some automated tool to extract the data automatically. The key ideas when recording CPB are to keep track of the hours, location and category of each production bug.

Hours and locations (i.e. on/off shore) allow us to keep track of costs. Categories allow us to identify the legacy systems currently in the worst shape and how much we're spending on them, eventually allowing us to provide the business with a business case for a long term solution.
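As a minimal sketch of the record-keeping described above, the following Python snippet totals CPB per category from the hours and location of each bug. The field names, the hourly rates per location and the sample bugs are all assumptions made for illustration; none of them comes from the <ALT+F> template.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical hourly rates per location; real figures would come from finance.
HOURLY_RATE = {"onshore": 80.0, "offshore": 35.0}


@dataclass
class ProductionBug:
    category: str   # e.g. the legacy system the bug belongs to
    location: str   # "onshore" or "offshore"
    hours: float    # time spent on the fix


def cpb_by_category(bugs):
    """Return the cost of production bugs summed per category."""
    totals = defaultdict(float)
    for bug in bugs:
        totals[bug.category] += bug.hours * HOURLY_RATE[bug.location]
    return dict(totals)


# Made-up sample records, purely for illustration.
bugs = [
    ProductionBug("trading", "onshore", 10),
    ProductionBug("trading", "offshore", 20),
    ProductionBug("hr", "offshore", 4),
]
costs = cpb_by_category(bugs)
```

Sorting the resulting totals in descending order would show which systems absorb most of the spend, which is the kind of evidence a business case for a long term solution needs.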

Thursday, 06 June 2013

In my previous article I started describing the various PILS formula components, by describing PBV (Perceived Business Value).

In this article I will describe the other PILS formula component that represents the Revenues side of the formula: The Business Value of a CTB Project (BVCP).

In order to understand BVCP one needs to understand what a CTB project is.

In my book on <ALT+F> I divide projects into two categories when it comes to BVCP:

Change The Business (CTB)

Maintain The Business (MTB)

CTB projects are new IT projects that the business sees as revenue generators and/or game changers. Examples might be a new trading system, a new game, a new social network platform, etc. By new I mean that CTB projects either haven't yet been deployed to production or have recently been rolled out to production but are still under the warranty period.

MTB projects are all other project types. Amongst them we find projects related to all IT systems that have been live for a period long enough to pass their warranty period (legacy systems), evergreening projects and production bug fixes. MTB components will be the subject of subsequent articles.

BVCP is part of the Revenues side of the PILS formula. The business has a revenue generating or game changing idea and therefore estimates a business value associated with the delivery of such idea into production. IT has to implement this idea and in doing so it sustains costs (typically development and infrastructure). Similarly to what I described in my article on PBV, there is a guessing dimension in defining BVCP. Since we've said that CTB projects are not officially live yet, the business value they deliver can only be estimated. However, because of their nature, this should be a positive value (i.e. generate revenue), otherwise why bother to work on it in the first place?

CTB projects are the riskiest kind: although the business value they'll contribute to an organisation once deployed into production has only been guessed, the costs sustained to deliver them are real.

There is something else: once a project passes its warranty period in production, it becomes a legacy system, therefore the revenue it generates becomes part of PBV, whereas the costs for its maintenance (be they production bug fixes, small enhancements or evergreening) move to the MTB section of the PILS formula.

In subsequent articles I'll describe development costs (DECOPD) and infrastructure costs (Keep The Lights On - KTLO) in detail, but for now we should take a brief detour on KTLO.

In the PILS formula, we consider KTLO costs as part of the MTB costs, so why are they also considered in calculating BVCP?

The answer is that we can look at our IT systems' profitability from two points of view:

Management accounts. This is a view on numbers that makes sense for directors

Statutory accounts. This is the view on numbers given to the public and auditing companies

Similarly, we can look at BVCP from a Management accounts or Statutory Accounts perspective, depending on our goal.

From a Management accounts point of view, we want to take KTLO costs into consideration, as these generally represent a high percentage of the costs sustained in delivering a CTB project into production. From a Statutory accounts point of view, we shouldn't deduct KTLO costs from BVCP as all infrastructure costs are calculated as part of the MTB costs.

BVCP therefore represents the revenue an IT system will generate for the period it has been in production under its warranty period. After such period, the generated revenue will be calculated as part of PBV.

Despite the uncertainty concerning the real revenue that a CTB project will generate once deployed into production, BVCP is fundamental in driving an IT organisation's strategy. For example, no PBV could be generated if no CTB project had been implemented in the first place.

Think of an IT organisation and ask yourself why developers are there. Is it just to maintain legacy systems? Obviously not. In any IT organisation with some budget, developers are mainly tasked with the delivery of new business ideas through new IT systems. Once deployed to production, these systems will first go through a warranty period (the duration of which varies from organisation to organisation) and then they'll transform into legacy systems.

So, if BVCP is so important for an organisation, how do we go about optimising it? The answer appears somewhat obvious:

Increase the estimated business value delivered by the CTB project.

Reduce development costs (DECOPD)

Reduce infrastructure costs (KTLO)

The first option is also the most difficult, because it concerns the realm of ideas. We know that ideas can't be manufactured as if on a production line; in the end this is why so few people have great ideas and why they are paid so handsomely.

Something, however, can be done in the other two areas: development and infrastructure costs and this will be the topic of subsequent articles. Stay tuned!

Monday, 03 June 2013

In this article, I'll describe the first of the PILS formula components: PBV (Perceived Business Value).

As a reminder, this is the PILS formula:

PBV appears on the revenues side of the formula. It represents the business value the business perceives IT systems currently live are contributing to an organisation's revenues.

Why perceived?

When we think of IT systems, we can identify two categories:

Those that directly generate revenue (e.g. trading systems)

Those that provide a service (e.g. HR applications)

When it comes to the first category, it's easy to identify the business value they deliver: thinking of a trading system, for instance, it's quite straightforward to identify how much revenue it's generating.

For IT systems that provide a service, however, the situation is more complicated. How would one quantify the business value delivered by a family of admin screens necessary to manage live applications? That's why I talk of perceived business value. For service-like systems, the business can only estimate their contribution to an organisation's profitability.

One might ask why we need to consider revenues for service-like systems. The answer is that regardless of whether their business value can be empirically measured, these are fully fledged IT systems, therefore subject to the same costs sustained for those IT systems whose business value can be measured precisely.

If we want to measure IT OPS performance, we cannot disregard service-like systems, since an organisation spent money to build them. So the question is how we can attribute a business value to service-like systems.

First of all, such value must be provided by the business. The approach I suggest is to ask the business to express as a percentage how important an IT service system is for an organisation.

Let's take as an example a website that provides a family of admin screens which provide users with the ability to change the behaviour of a trading system at runtime. We know that the website can be identified as a service system because it doesn't directly produce revenue. So the key question is: how important is this system for the organisation?

The answer might vary depending on whom you're asking. For the sake of our example, let's say that the traders told us that if this system didn't exist, they could still operate, but it would slow down their activity by 30%, because every time they needed a change, IT would have to raise a production incident, write some SQL to change the trading system behaviour, test it in UAT, apply it to production, restart the servers, etc.

If such trading system generated £100M in revenues each day, the administrative website PBV would then be £30M. This is an example of an IT system providing a service to another IT system. There are cases when a service system provides a service to an entire organisation.

Let's consider a centralised build system, for example. Such a system doesn't produce direct revenue; however, interviewing staff from the COO team, it turns out that it decreases operational risk by 40%, which is why the system was requested in the first place. IT staff instead tell us that thanks to the centralised build system they can reduce their Time To Market (TTM) by 30%.

We should present the business with these figures and ask them in what measure they think this system is important for the organisation as a whole. The answer won't always be black or white, but it's important to get one.

If, say, the business estimated that the centralised build system contributes 5% to the organisation's overall profitability and the overall profitability was £200M, then the perceived business value for such a system would be £10M or, in other words, how much money the organisation would lose if the system wasn't in place. Thinking of operational risk, the business might attribute a value to the company's reputation with its clients. Thinking of the time saved by IT, the business might translate the use of such a system into direct cost savings.
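The arithmetic behind both examples is the same: a base amount multiplied by the importance percentage supplied by the business. A minimal sketch, assuming a hypothetical helper named perceived_business_value (the name and signature are mine, not part of the framework):

```python
def perceived_business_value(base_amount, importance_pct):
    """PBV of a service system: the amount it supports times the
    importance percentage the business attributes to it."""
    return base_amount * importance_pct / 100


# Admin website: traders would be 30% slower without it, on £100M of revenue.
admin_website_pbv = perceived_business_value(100_000_000, 30)

# Centralised build system: 5% of an overall profitability of £200M.
build_system_pbv = perceived_business_value(200_000_000, 5)
```

With the figures from the text this gives £30M for the admin website and £10M for the build system. The formula is trivial on purpose: the hard part is getting the business to commit to the percentage.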

Tuesday, 14 May 2013

In my opening article I introduced the <ALT+F> framework as a tool to help IT leaders streamline IT Operations. I discussed how the framework moves within the MAPE (Measure, Adapt, Plan and Execute) lifecycle.

In this article I'll describe ALTPD (Average Lead Time of a Production Delivery). This is one of the most important measurements in the Measure phase of the MAPE lifecycle.

In a nutshell, ALTPD defines the average number of days that it takes a requirement to be delivered to production from the day it was requested by the business. Why is it important?

In order to answer this question we need to understand the importance of delivering fast.

When the business asks for a requirement, it probably means that what they are asking for is meant to generate some kind of business value. In the majority of cases, we know that the value of a business requirement is higher the quicker it reaches the market (Time To Market). Therefore I assume that in an ideal world the number of days for a business requirement to reach production should be zero to maximise revenue.

This assumption is backed by the modern approach to software development: Agile and Lean methodologies are so successful nowadays because, amongst others, they bring the benefit of shortening the rollout of IT systems to production. Continuous Delivery is adopted by more and more companies for the same reasons. You will probably be familiar with the frustration of business stakeholders when requirements are not delivered to production fast enough. This was also one of the main reasons why we moved from pre-emptive methodologies such as Waterfall to less pre-emptive ones, such as Agile and Lean.

If the ideal number of days to deliver a business requirement to production is zero, any additional number of days can be considered as waste. From Lean, we know that we should aim at eliminating waste, therefore our role as IT leaders is to foster an environment where business requirements can be delivered to production as fast as possible. In doing so, we should focus our attention on quality as well, another cornerstone of Agile and Lean methodologies. However, whereas speed of delivery can be linked to revenues, quality can be linked to cost savings. Both, when applied to the delivery of an IT system, maximise profit.

The <ALT+F> framework provides a series of templates to measure IT Operations performance and amongst them we find a template to measure ALTPD.

To measure ALTPD we record some simple information:

The feature description

The Class Of Service (COS) for this feature

An id linking the feature to some sort of requirements planning tool, e.g. JIRA. (This entry is optional)

The date the feature was asked by the business

The date the feature was delivered to production

The number of lead days between the two dates above
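As a rough sketch of how these fields combine into ALTPD, the Python snippet below derives the lead days from the two dates and averages them. The class and field names, the sample features and the choice of counting calendar days (rather than working days) are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Delivery:
    description: str
    class_of_service: str           # illustrative labels, e.g. "standard"
    requested: date                 # date the business asked for the feature
    delivered: date                 # date the feature reached production
    task_id: Optional[str] = None   # optional link to a tracking tool

    @property
    def lead_days(self):
        # Calendar days between request and production delivery.
        return (self.delivered - self.requested).days


def altpd(deliveries):
    """Average Lead Time of a Production Delivery, in days."""
    if not deliveries:
        raise ValueError("at least one delivery is needed")
    return sum(d.lead_days for d in deliveries) / len(deliveries)


# Made-up sample data.
deliveries = [
    Delivery("Export trades to CSV", "standard",
             date(2013, 1, 7), date(2013, 2, 6), "PRJ-1"),
    Delivery("Fix login timeout", "expedite",
             date(2013, 3, 1), date(2013, 3, 11)),
]
average = altpd(deliveries)
```

The two sample features took 30 and 10 days, so the average is 20 days. Grouping the records by class_of_service before averaging would give one ALTPD figure per COS.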

There are a few things to notice about ALTPD:

The COS can be defined at different levels of detail. ALTPD can be measured from outside or inside the team. In the former case, the COS will be coarse grained and will typically be one of the main drivers for a transformation strategy which aims at optimising Time To Market. In the latter case, they will be used from within the team, typically by Scrum Masters or, as I call them, project leaders, to help them provide better estimates for similar tasks in the future.

The task id is optional, and can be linked to an electronic tracking system, e.g. JIRA

The starting date is the date the business required a feature, not the date the requirement entered the backlog. There is a difference between the two: once the business asks for a requirement, the clock in their mind starts ticking. As we've seen, any day that passes without their requirement being delivered causes waste for the business and therefore our goal should be to shorten this delay as much as possible. This is why I talk of Lead Time as opposed to Project Length. A business requirement might have been requested a year ago, but it might take just a month to deliver.

In my book, I explain how ALTPD is the driver for a number of IT performance optimisations. Apart from the arguments that we've already discussed, i.e. Time To Market, part of ALTPD (i.e. from the moment the requirement actually becomes a fully fledged project) is directly linked to development costs. This is obvious: the longer it takes to deliver a project to production, the higher the development costs.

So, if ALTPD is so important, how do we go about optimising it?

From outside the team, one possible way is to apply Lean principles to business requirements by introducing queues. One of the fundamental performance optimisation principles in Lean is to reduce the Work In Progress (WIP). This can be achieved by introducing queues and by setting limits on them. In our case, IT leaders can create a business requirements queue and set a limit on it. This is not as simple as it seems: such an approach needs to be agreed and sponsored by the business. From a practical point of view, the agreement should be that the business can request only enough requirements to fill the queue up, but no more. The project leaders, depending on the methodology followed, would then pick one (Lean) or more (in case of Scrum) items from the backlog and move them to the WIP queue. As soon as items are moved from the backlog to the (Scrum/Kanban) board, the business could request more requirements by again filling the backlog queue.
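The mechanics of such a limited queue can be sketched in a few lines of Python. The class name, its methods and the first-in-first-out pull order are assumptions for illustration; in practice the project leaders would choose which items to pull by priority.

```python
from collections import deque


class BoundedBacklog:
    """A business requirements queue that refuses new items once full."""

    def __init__(self, limit):
        self.limit = limit
        self._items = deque()

    def request(self, requirement):
        """Add a requirement; return False if the queue is already full."""
        if len(self._items) >= self.limit:
            return False
        self._items.append(requirement)
        return True

    def pull(self):
        """Move the oldest requirement to the board, freeing a slot."""
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


backlog = BoundedBacklog(limit=2)
accepted = [backlog.request(r) for r in ("A", "B", "C")]  # "C" is refused
started = backlog.pull()                    # "A" moves to the board
accepted_after_pull = backlog.request("C")  # now there is room for "C"
```

The refusal is the whole point: the business can only add a requirement once the team has pulled one onto the board, which keeps the amount of requested-but-unstarted work bounded.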

This approach offers tremendous benefits: by reducing WIP the time that usually goes in project ceremonies can instead be dedicated to the projects currently in progress. This reduces the margin for errors, delays commitment (if a requirement didn't enter the backlog and therefore nobody started working on it and in the meantime the market conditions changed, there is no waste of resources) and provides the business with a fast response.

Additionally, by adopting Agile and Lean methodologies it's possible to shorten the feedback cycle and derisk the project, with the added benefit that IT gives the business a tool to change priorities at short notice (shorter in Lean than in Agile).

I have witnessed situations where items sat on an unbounded backlog for years, not because they took a long time to implement, but because they never made the project pipeline. Nevertheless, the business formed a poor opinion of IT, as in their mind the requirements hadn't been delivered.

From within the team, optimising ALTPD is something else altogether. Here, the responsibility of the Scrum Master and the team members is to focus primarily on quality, although even here reducing WIP brings its benefits. There are various tools to achieve this goal: Test Driven Development (TDD), pair programming, automated test suites for unit, integration, performance and system tests, SCM and Continuous Integration environments, Continuous Delivery, DevOps and automated builds and deployments.

Optimising ALTPD has the following benefits:

Minimise Time To Market, thus maximising revenue

Minimise short, medium and long term maintenance costs by ensuring that what's delivered to production won't come back as rework in the form of production bugs

Reduce development costs

Derisk the project

Increase social capital, as both the team and the business feel happier and more motivated by the results

Monday, 13 May 2013

In my previous article I described the most common problems affecting IT performance. In this article I'll describe ScruXBan, a methodology that I defined while writing my book on <ALT+F>.

I'll start by looking at the most common Agile methodologies in use today, identifying pros and cons. I'll then describe ScruXBan as a possible alternative to those methodologies; an alternative that can help us improve the value we deliver to the business.

When faced with the problems I described in my previous article, the IT community moved from pre-emptive methodologies such as Waterfall to Agile first and, most recently, Lean.

Agile and Lean certainly helped solve the majority of the problems faced by IT teams, e.g. delayed Time To Market, early commitment, lack of communication between the business and IT and lack of quality. They did so by placing the business at the centre of the software development lifecycle, by applying iterative and incremental deliveries (what Craig Larman in his book Agile & Iterative Development defines as IID), by shortening the feedback lifecycle, thus allowing IT decision makers to delay their commitment, and finally by increasing attention to quality. The Toyota Production System, or TPS, brought to the table an even more aggressive approach to business value delivery, by defining Lean manufacturing and the concept of waste. In Lean, waste is any activity for which customers wouldn't pay and, as such, it could (and should) be eliminated without affecting the value delivered to the business. Lean manufacturing was eventually exported not only to the US, revolutionising the car manufacturing industry (especially Ford), but also to other disciplines, amongst which software engineering.

Lean methodologies applied to software engineering go under the name Kanban, whereas the two most popular Agile methodologies currently in use are Scrum and XP. Taken in isolation, each of these methodologies has pros and cons; however, it's the cons that I'm concerned with. If you talk to a Scrum Master, they'll tell you that in order for Scrum to work, one should apply Scrum by the book. However, having used Scrum in a number of projects, I came to realise that quite a few aspects of Scrum get in the way of maximising business value delivery, whereas others are useful.

Let's start with Scrum. I think that Scrum is great at providing project management cadence. It defines the team structure (with the PO, the Scrum Master and the team), Sprint and Release Planning meetings and metrics (with velocity, burnups and burndowns). You want to do Agile? Well, if you go with Scrum you can certainly perform all classic project management activities while moving within an Agile methodology. However, in my view, Scrum presents some internal contradictions, especially when it comes to early commitment. It also implies some activities considered wasteful, e.g. the Sprint Planning meeting. In fact I believe that the only thing that really doesn't work in Scrum is the Sprint. Let's see why.

One of the cornerstones of Agile methodologies is that as Agile practitioners, we don't have a crystal ball when it comes to estimates. For this reason, in Scrum we're not asked to provide the business with an exact delivery date (or rather we're asked, but we push back). Instead we're asked to provide estimates of when, given what we know today, we think we'll be able to deliver what the business wants. In Agile we refine the estimates as the project progresses and as our knowledge of the domain, the risks and our team mates increases. Additionally, as the teams work together and get to know each other, estimates become more and more accurate.

One of the project ceremonies in Scrum is to have a Sprint Planning meeting at the beginning of each iteration. The ceremony dictates that the team members commit to deliver all stories picked up from the backlog by the end of the upcoming Sprint. There are techniques aimed at reducing the risks of such an approach, such as choosing a certain number of must have (high priority) stories, a smaller number of should have ones and finally a very small number of could have ones. The team is asked to commit to the delivery of all must have stories. I don't know if you've ever worked with Scrum, but my personal experience is that, no matter from which angle one looks at it, it's impossible to commit to the delivery of a certain number of stories by a certain date, simply because the future is unknown.

This is not, however, the worrying bit. What is really worrying is that by asking the team to commit effectively to a Fixed Date, Fixed Scope project, Scrum goes against its very founding principles, i.e. that it's not possible to commit to a delivery date as uncertainty rules our lives.

This is, in my modest opinion, the biggest pitfall and the biggest contradiction of Scrum. When applying Lean concepts to Scrum, the pitfalls become even more evident. The ceremonies surrounding Scrum are something a client wouldn't pay for; they can therefore be considered waste and should, whenever possible, be eliminated.

OK, so from Scrum we don't like early commitment and process ceremonies.

What about XP? XP is an Agile methodology that represents a set of Principles and Practices. There is nothing wrong with that; in fact, I like XP's Principles and Practices. All XP lacks, for my requirements, is project management cadence: XP tells us how to apply best development practices to our projects and how to promote a culture of trust and grassroots leadership, but it doesn't give us a project management structure, and I believe that this is a requirement for any project, however pre-emptive the methodology being used.

Similarly, Kanban lacks project management structure, although in my opinion it's more structured than XP. This methodology suggests starting by visualising the current flow, in order to expose and therefore eliminate waste, and continuing with a smooth flow of activities aimed at maximising business value by reducing Work In Progress (WIP). Kanban also attempts to help us be more precise when it comes to estimates, by introducing concepts such as Classes Of Service (COS). However, I can't see the elements of project cadence in Kanban.

I like Scrum's concepts such as team structure, velocity, burnups and burndowns, Release Planning Meetings and the Backlog.

So what to do? It would appear that if one were to choose any one of the above methodologies in isolation, one would gain something and lose something.

I decided to use the most useful parts of all three methodologies by defining a new one called ScruXBan. By now, I believe you'll have guessed where the name comes from.

Scru(m) - X(P) - (Kan)Ban

From Scrum, I'm adopting the following concepts:

Team structure and roles (PO, SM and the team)

Backlog

Release Planning meeting

Burnups and burndowns

Velocity (although combined with COS just as a metric tool)

Stories

From XP, I'm adopting concepts around best development practices, such as a Trust Culture, a grassroots approach, collective ownership and responsibility, TDD, Pair Programming and refactoring.

From Kanban, I'm adopting process management, i.e. the art of streamlining the workflow by visualising the process, identifying and removing waste, empowering and coaching the teams, seeing the big picture and doing the right thing right from the start.

If you want to know more about ScruXBan, I'd suggest reading my book on <ALT+F>, where I introduce how to use this new methodology in an actual project.

Sunday, 12 May 2013

In my previous post I introduced the <ALT+F> framework. I mentioned how the framework moves within the MAPE (Measure, Adapt, Plan and Execute) lifecycle.

If, during the Measure phase, the <ALT+F> templates show poor IT Operations performance, the next steps in the Adapt and Plan phases are to identify optimal targets to achieve operational excellence and best-in-class status and then to set the stage for an Agile and Lean transformation strategy that should be executed on two levels:

On the organisation level, the transformation strategy should educate people on both sides of the fence (i.e. business and IT) on what it means to work in an Agile and Lean environment. This will be the topic of another post.

On the operational level, the transformation strategy should make use of Agile and Lean methodologies to streamline IT Operations, eliminate waste and maximise business value delivery.

In my book on <ALT+F> I introduce a new IT methodology called ScruXBan, which I believe helps IT organisations maximise the business value they deliver. Before entering into the details of this methodology, we first need to ask ourselves what the common problems faced by IT departments are. This analysis is the content of this first article. In the next article of this series, I will describe the ScruXBan methodology in detail.

From my experience, the most common problems that cause poor IT performance boil down to the following aspects:

Too much time goes into unproductive activities, what Lean defines as waste, i.e. any activity for which customers wouldn't pay.

Gated processes. A typical example is represented by the Waterfall methodology which demands that before moving to the next SDLC phase, a formal sign off should be obtained for the phase currently in progress. So, for example, before moving to the Design phase, requirements should be signed off.

Early commitment. A superset of the previous point, early commitment is a lot more widespread than one might think. By early commitment, I mean the requirement to take unmodifiable decisions early on in the process. Pre-emptive IT methodologies, such as Waterfall, are a case in point. By asking various stakeholders to sign off a certain phase in order to move to the next, we're asking for early commitment. However, early commitment does not only apply to gates. We witness examples of early commitment in all phases that accompany an IT system from inception to production: the request for a precise delivery date, the decision to implement a system in a particular way, the need for up-front budgets, the constraints imposed by contract negotiation, requirements specifications, the specialisation of certain people in certain roles (key-man dependency), etc.

Top-down management instead of grassroots leadership.

Considering offshoring simply as a way to save money, rather than an opportunity to introduce diverse skills in a team. How many of us could honestly say that offshore resources are treated as first-class IT citizens? And, conversely, how many of us could honestly say that we're not delegating to offshore teams repetitive and not-so-challenging tasks such as Quality assurance (QA) or maintenance?

The lack of access to business resources. Too often, developers are organised as if they should only think within the box, do what they're told, just deliver the task. What happened to seeing the big picture, experimenting and looking at development as a learning exercise? When stuck in such a restricted way of looking at development activities, there can't be a favourable environment that promotes a dialogue between the business and IT, and then two things normally happen: IT starts taking decisions for the business and the business starts losing faith in IT, which brings us to the next point.

Lack of trust and communication barriers between the business and IT. When we, as leaders, don't foster open communication between the business and IT, don't place the business at the centre of the process and don't see IT departments as a service to the business, we're actually promoting an environment that eventually leads to a lack of trust and communication between IT and the business. We're all supposed to work in the same team, but time and again we tend to do anything in our power to accentuate the split between who is supposed to drive the requirements (i.e. the business) and who is supposed to deliver them (i.e. developers).

Lack of attention to the quality of software deliverables. Too often, developers are only focused on getting things done, and little thought goes into the quality of what's being delivered. If not properly coached, this attitude might result in false positives, i.e. the perception that things are getting done until the time they reach production. This is when the lack of quality really shows, not only in the form of production bugs but, most importantly, as an avoidable cost. It's our responsibility, as IT leaders, to make sure our teams see the big picture and understand that the cost of production bugs, what in <ALT+F> I describe as CPB, increases the further we move away from the initial SDLC phases. It's our responsibility to educate our teams to think of quality long before thinking of speed. Sometimes the obsession with speed is fostered by the business, which wants to see results. It's our responsibility to educate the business about the benefits of delivering high-quality products, because if we fall into the trap of just giving the business what it wants, our organisations will pay the price later on.

In my next article, I will describe how these common problems are being addressed and then I will introduce the ScruXBan methodology as an alternative to the current ones being used in the majority of cases.

Friday, 10 May 2013

After nearly 11 months, I finally published my book on <ALT+F>. The book kept me away from blogging for quite some time, but now that it's finished I've got a bit more time to blog.

I'd like to restart my blogging by introducing <ALT+F>. This is a framework that provides Agile and Lean leaders with a tool to help the organisations they work for streamline their IT Operations. How does it do that?

The framework follows the MAPE lifecycle, i.e. a cycle composed of the following phases:

Measure

Adapt

Plan

Execute

In the Measure phase, the framework implementors use the <ALT+F> templates to measure IT Operations performance. This is finally represented by the PILS (Productivity of IT Live Systems) formula.

There are quite a few acronyms in the framework. They make it easier to express complex concepts with symbols, very much like math symbols are used in formulae.

Let's analyse each one of the PILS formula components:

PBV. This represents the Perceived Business Value, i.e. the revenue that the business perceives IT live systems generate for an organisation. For some IT systems, it's easy to determine the revenue they generate (e.g. a trading application). For others, typically service-like systems, it's more difficult. I call the first category direct revenue-generating systems and the second service-like systems. To attribute an estimated revenue generated by service-like systems, it's necessary to identify which revenue-generating systems they support, then calculate the importance of service-like systems for the existence of direct revenue-generating systems and calculate the revenue as the resulting percentage.

BVCP. This is the Business Value of a CTB project. In my book I divide IT projects into two main categories:

CTB (Change The Business). These are projects thought to be game changers and revenue generators.

MTB (Maintain The Business). These are maintenance projects, e.g. activities on Legacy systems, Evergreening Projects, Production bugs or small enhancements.

CPB. Cost of Production bugs.

CEP. Cost of Evergreening projects.

COLS. Cost of Legacy Systems.

KTLO. Keep The Lights On (or infrastructure costs)
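To make the PBV attribution for service-like systems described above more concrete, here is a minimal sketch. It assumes that "importance" is expressed as a percentage of each supported system's revenue and that the shares are simply summed; the system names and figures are hypothetical and are not taken from the book.

```python
# Hypothetical yearly revenue of direct revenue-generating systems.
direct_revenue = {"trading_app": 1_000_000, "pricing_engine": 400_000}

# ASSUMPTION: importance of one service-like system to each system it
# supports, expressed as a whole-number percentage.
importance_pct = {"trading_app": 10, "pricing_engine": 25}


def attributed_revenue(direct_revenue, importance_pct):
    """Estimate the revenue of a service-like system as the sum of its
    importance share of each direct revenue-generating system it supports."""
    return sum(
        direct_revenue[name] * pct / 100
        for name, pct in importance_pct.items()
    )


pbv_service = attributed_revenue(direct_revenue, importance_pct)
print(pbv_service)
```

With these made-up numbers the service-like system is credited with 10% of the trading application's revenue plus 25% of the pricing engine's revenue.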

The PILS formula provides a single figure that represents the profitability of IT live systems. Broadly speaking, this is given by the revenue generated by IT systems minus legacy systems costs (by Legacy System I mean any IT system that has been in production long enough to pass its warranty period).
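The exact formula is not reproduced in this post, so the following is only a sketch of how the components could be combined. It assumes that PILS is the sum of the revenue components (PBV and BVCP) minus the sum of the MTB cost components (CPB, CEP, COLS and KTLO); the class name, function name and figures are hypothetical, and the formula in the book may differ.

```python
from dataclasses import dataclass


@dataclass
class PilsInputs:
    pbv: float   # Perceived Business Value
    bvcp: float  # Business Value of CTB projects
    cpb: float   # Cost of Production bugs
    cep: float   # Cost of Evergreening projects
    cols: float  # Cost of Legacy Systems
    ktlo: float  # Keep The Lights On (infrastructure costs)


def pils(x: PilsInputs) -> float:
    """ASSUMPTION: revenues minus MTB costs, as a plain subtraction."""
    revenues = x.pbv + x.bvcp
    mtb_costs = x.cpb + x.cep + x.cols + x.ktlo
    return revenues - mtb_costs


# Hypothetical figures, for illustration only.
example = PilsInputs(pbv=500_000, bvcp=200_000,
                     cpb=50_000, cep=30_000, cols=80_000, ktlo=40_000)
print(pils(example))
```

Keeping each component as a separate field makes it easy to look at the individual figures as well as the overall value, which is what the Measure phase asks for.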

Once the PILS is calculated, the framework implementors analyse each component's figure and the PILS value itself and ask the following question: is there a performance problem with IT Operations?

If there isn't, then they can stop, as this means that the organisation is already working at its best. However, <ALT+F> becomes useful once performance issues have been identified.

In this case, we enter the Adapt and Plan phases of the MAPE lifecycle, by identifying how to resolve the performance issues, setting optimal operational targets and planning a transformation strategy that uses Agile and Lean methodologies to streamline IT operations.

The transformation strategy should touch every level of the organisation, from business management to IT development, and the adoption of Agile and Lean practices helps achieve that on two levels:

Agile helps a cultural transformation at all levels of the organisation, leading people to think differently about the way they do business.

Lean methodologies help the execution side of IT organisations, by providing IT leaders and developers with a tool that streamlines the implementation of business requirements and that eliminates waste.

The framework then enters the Execute phase of the MAPE lifecycle, where the transformation strategy is actually executed. After the transformation strategy has been executed for some time, <ALT+F> implementors should calculate PILS again and answer the following questions:

Has the organisation improved its IT Operations performance?

Has the organisation achieved the optimal targets set during the Adapt and Plan phases?

If not, what are the gaps to reach operational excellence and best-in-class status?

Especially in the third case, the gap between actual and optimal IT Operations performance should provide the foundations for a Continuous Improvement culture, where the MAPE lifecycle and the <ALT+F> measurement templates are used again and again until the organisation reaches operational excellence.

In subsequent articles in this series, I'll cover in more detail the various PILS formula components, CTB vs MTB projects and Agile contracting and budgeting.

In the book, the <ALT+F> framework is presented through a novel, where a fictitious company is experiencing performance issues and the Head of IT, Neth, calls a friend of his, Sharon, to help him resolve them.

I hope you'll enjoy the book and if you do get a copy, I'd love to hear your feedback.

Sunday, 15 January 2012

I've been working in Agile environments for almost 6 years now and I still manage to learn every day! Recently, for instance, I came across one reason why story points are a good measure for planning. We had a list of stories whose complexity the team had estimated in story points; however, because we were at the beginning of our release, there was a high number of unknowns (which we addressed by creating spikes). We usually divide a story into tasks and estimate the tasks in Ideal Engineering Hours (IEH); however, due to the number of unknowns, we didn't know all the tasks that would make up each story.

We had, on the one hand, a list of stories for which the team had a "feeling" about their complexity and which it had estimated on that basis, and on the other hand a great number of spikes to identify tasks.

The fact that the team had estimated the complexity of each story, even though it couldn't estimate all the individual tasks, allowed us to select the stories to address in the coming Sprint. Had we not estimated the stories in Story Points, we would not have been able to get a "feeling" for how many stories we could commit to, because the number of tasks actually known was very low.

Moral: I learned that one of the true values of story points, as a measure of story complexity, is that they allow the team to plan an iteration even if the tasks needed to complete it are unknown, so they are a great planning tool.

Of course, the outcome of the different spikes could very well be that a story is far more or far less complex than originally estimated; that could, however, happen even if all tasks were known when planning the iteration (for instance because a high number of unknowns emerged during the Sprint, or the team realised that not all the tasks originally planned were actually required). However, having story points allowed us to provide an iteration estimate even if we didn't know in detail everything that was needed to complete that iteration.
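The planning step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the backlog is already in priority order, the team has a rough velocity figure in story points per Sprint, and stories are picked greedily. The `Story` class, the `plan_iteration` function, the story names and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    points: int  # complexity estimate in story points, not task hours


def plan_iteration(backlog, velocity):
    """Walk the backlog in priority order and select each story that still
    fits in the remaining capacity, skipping any story that does not fit."""
    selected = []
    remaining = velocity
    for story in backlog:
        if story.points <= remaining:
            selected.append(story)
            remaining -= story.points
    return selected


# Hypothetical backlog: no task breakdown is needed, only story points.
backlog = [Story("login", 5), Story("search", 8),
           Story("export", 3), Story("audit", 5)]
sprint = plan_iteration(backlog, velocity=13)
print([s.name for s in sprint])
```

Note that nothing in the sketch refers to tasks or Ideal Engineering Hours: the selection works purely on the complexity "feeling" captured by the story points, which is the point made above.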