How to make ITIL Problem Management agile

First of all, what is Problem Management? Problem Management is a process in the ITIL Service Operation stage. Its goal is to find the root cause of recurring incidents and to fix them permanently by removing that root cause. The value of problem management is that it increases customer satisfaction, business value and your reliability as an IT provider.

Strengths of ITIL problem management

The ITIL Problem Management process has a number of strengths:

The focus is not only on restoring IT services as soon as possible (the domain of Incident Management), but on fixing root causes, thereby preventing incidents from happening again and structurally improving the quality of your IT services.

ITIL assigns a problem manager to this process, making sure that there is always someone responsible for following up on a problem. Of course, ownership of analysing and fixing a problem is preferably delegated to a team. However, when finding a root cause requires several disciplines, it helps to have a problem manager take ownership.

Problem management draws attention to problems by registering them and reporting on them periodically. This gives the organization valuable information on its Quality of Service and on opportunities to improve it.

Weaknesses of ITIL problem management

Though there is clearly value in problem management, in my experience it is usually not the best implemented or most effective ITIL process. I regularly come across three issues:

First, problems are usually not very urgent, so little or no capacity is allocated to problem management, as there are always incidents and/or requests that are (more) urgent. Of course, this is a vicious circle, as not fixing problems means that incidents keep recurring, but I have seen this happen in many organizations.

Second, there is often no clear process for making the customer impact of a problem visible. The reason is that one central problem manager manages the problems but lacks the knowledge to assess their customer impact. As few resources are available for problem management, they are usually allocated to analysing and fixing problems instead of assessing their impact. However, without knowing the impact or business value, it is hard not only to prioritize problems but also to claim resources for fixing them. Again, a vicious circle.

Third, problem managers often lack a clear decision-making process or sufficient mandate to decide which problems are worth fixing and which are not. As a result, every problem that is identified goes on the problem list, and the organization ends up with a list of hundreds of problems. Such a list is far too long to manage effectively, and the problem manager usually gets demotivated, which means even less energy goes into this process.

Though there is definitely value in Problem Management, in reality its benefits are often not realized. Luckily, best practices from agile and scrum can overcome a number of these issues.

Agile problem management

I see value in combining the agile mindset and the practices of the scrum framework with the ITIL problem management process, to overcome these issues while keeping the benefits:

Decentralize the problem management process to the development teams by putting problems on the teams' backlogs and allowing development teams to add problems as user stories to their backlog when they see incidents recurring.

Treat problems like any product backlog item and have the product owner assign a business value/impact to the problem and prioritize it on the backlog accordingly.

When the business value or impact is too low, the problem is not put on the backlog, the same way the product owner handles requests for new features and incidents that don’t have sufficient business value.

When the priority of a problem is sufficiently high, it reaches the top of the backlog and can be pulled in a sprint by the team.
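To make this flow concrete, here is a minimal Python sketch of a product backlog in which problems are treated like any other item. It assumes the product owner scores every item on a single numeric business-value scale and applies an arbitrary acceptance threshold; the names BacklogItem, MIN_BUSINESS_VALUE, add_to_backlog and pull_into_sprint are hypothetical and not taken from any scrum or ITIL tool.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    kind: str            # hypothetical labels: "feature", "incident" or "problem"
    business_value: int  # value/impact assigned by the product owner (hypothetical scale)


# Hypothetical threshold: items scored below this are not put on the backlog.
MIN_BUSINESS_VALUE = 20


def add_to_backlog(backlog: list[BacklogItem], item: BacklogItem) -> bool:
    """Accept the item only if its business value is high enough,
    then keep the backlog ordered by descending business value."""
    if item.business_value < MIN_BUSINESS_VALUE:
        return False
    backlog.append(item)
    backlog.sort(key=lambda i: i.business_value, reverse=True)
    return True


def pull_into_sprint(backlog: list[BacklogItem], capacity: int) -> list[BacklogItem]:
    """Pull the top `capacity` items into the sprint, whatever their kind."""
    sprint = backlog[:capacity]
    del backlog[:capacity]
    return sprint


backlog: list[BacklogItem] = []
add_to_backlog(backlog, BacklogItem("New export feature", "feature", 40))
add_to_backlog(backlog, BacklogItem("Login timeouts keep recurring", "problem", 60))
add_to_backlog(backlog, BacklogItem("Cosmetic log warning", "problem", 5))  # rejected: too low
sprint = pull_into_sprint(backlog, capacity=1)
print([i.title for i in sprint])   # ['Login timeouts keep recurring']
print([i.title for i in backlog])  # ['New export feature']
```

Keeping a kind on each item means problems stay identifiable on the backlog, even though they are prioritized exactly like features.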

So far, that sounds simple and straightforward. But there are some issues to manage:

How will you keep the benefit of problem management, namely increased quality, when a problem can be assigned a low priority by the product owner? Attention to quality is already an integral part of the scrum framework and the agile way of working. The Definition of Done needs to ensure that user stories always meet certain quality criteria, for example by having coding standards and always doing a peer review of the code. In addition, it is strongly recommended to implement tooling that measures compliance with coding standards in real time and to make sure deviations from these standards are fixed before code gets into Production.

As scrum works with dedicated teams, it is easier to spot recurring incidents. The development team also acts as a stakeholder for the product owner, explaining the value of a problem the team wants to put on the backlog.

Finally, if incidents are registered in a ticketing tool, this allows trend analysis to be done, either by the team or by a central department.
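As an illustration of such trend analysis, the sketch below counts incidents per affected service over a recent window and flags the services whose incidents recur often enough to be candidate problems. The incident records, service names, the 30-day window and the threshold of three are all hypothetical assumptions; a real ticketing tool would supply its own export format.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical export from a ticketing tool: (date registered, affected service)
incidents = [
    (date(2024, 3, 1), "payment-service"),
    (date(2024, 3, 4), "payment-service"),
    (date(2024, 3, 9), "login"),
    (date(2024, 3, 12), "payment-service"),
    (date(2024, 2, 1), "login"),  # falls outside the window below
]


def recurring_services(incidents, today, window_days=30, threshold=3):
    """Return services with at least `threshold` incidents in the last `window_days`."""
    since = today - timedelta(days=window_days)
    counts = Counter(service for when, service in incidents if when >= since)
    return {service: n for service, n in counts.items() if n >= threshold}


print(recurring_services(incidents, today=date(2024, 3, 15)))
# {'payment-service': 3}
```

Whether a flagged service actually ends up on the backlog as a problem is still the product owner's decision, based on its business value or impact.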

How can you deal with the fact that it can be difficult to estimate the effort of analysing a problem?

The team can refine a problem into a number of hypotheses about its possible root cause. The team then decides on a fixed timebox for validating one or more of those hypotheses, so the problem can be timeboxed within a sprint. If the root cause is still unknown at the end of the timebox, assess whether there is still value in analysing the problem further. If so, refine it again to come up with new hypotheses that can be validated, and set a new timebox in an upcoming sprint. This might seem to take a long time, but as problems are usually not urgent, it is often a workable approach.

Who will be responsible for the coordination of a problem as there is no central problem manager?

If a problem is worth investigating, it has business value or impact. Therefore the product owner of the team that owns this business value or impact should own the problem and put it on his or her backlog. If no product owner is willing to have the problem on his or her backlog, then apparently there is not sufficient business value or impact, and there is no point in analysing the problem.

How do you deal with the fact that a problem analysis often requires effort of other teams to find the root cause?

In scrum, teams try to minimize dependencies as much as possible, as coordinating across teams is always more difficult and slower. However, backlog items can still have dependencies on other teams. The same process used for handling backlog items with dependencies can be used for problems that require a multidisciplinary approach. It makes sense to discuss these problems in regular meetings between product owners, to make sure all product owners involved understand the business value/impact of the problem and the need to cooperate. That way the product owners can coordinate a timebox for analysing the problem, during which members of multiple teams are available.

How can you get multiple teams to work together on coming up with and validating hypotheses about the root cause?

If people from different teams have allocated a timebox in their sprint for working on a problem, it makes sense to co-locate them for the duration of that timebox. This makes it easier to come up with hypotheses together and to share the results of validating them. It also removes the need for status meetings, making the process more efficient. Though scrum is quite keen on keeping team members in their own development team rather than making them part of other teams, this approach seems valid, especially as it usually covers a short timebox of perhaps 1-2 days.

In conclusion, agile problem management means that problem management is fully decentralized to the scrum teams, and there is no longer a central problem manager responsible for problems. However, problem management as a concept remains valid: it still makes sense to label a product backlog item as a problem, so that you have an overall insight into the number of problems in a team or organization, as this gives an indication of its Quality of Service.

If you have experience combining ITIL Problem Management with agile and scrum, I would love to hear about your experiences and the best practices you have come across.