Change Risk Expert

Overview

“Never touch a running system” — unfortunately, running IT systems are not static systems. Applications need to be adapted, preventive changes carried out, bugs fixed, faulty configurations corrected, and updates applied. Changes bring with them the risk of failure. Seemingly innocent modifications can trigger cascading avalanches of service disruptions, in the worst case bringing entire companies to a standstill.

Changes are responsible for some 80% of all incidents that result in client outages. The more complex an IT system, the more difficult it becomes to estimate the effect of a change.

Risk management is an essential aspect of an effective change management process. It aims to assess and mitigate the impact of changes in order to reduce the chance of failure. Today, IT service providers typically assess the risk of a change by risk categorization, performed either manually or through a questionnaire. The manual approach is very subjective and, in the worst case, biased. The questionnaire approach suffers from applying the same set of questions regardless of the type of change request submitted. There is thus a need for a more accurate risk assessment method that takes into consideration the unique context of each change request.

Another important issue is that the larger an IT organization, the more difficult it becomes for individual change requesters to stay abreast of the reasons for success and failure encountered by their colleagues.

To address both issues, Change Risk Expert (CRE) employs a classification method that is finer-grained than current change classification systems. The fine-grained classes define a unique context for each change, which serves two purposes. First, it allows more accurate risk management. Second, it lets best practices and lessons learned be shared in a targeted fashion, by showing change requesters only the information that is relevant to them at a given time. Furthermore, CRE’s real-time risk management capabilities support proper assessment and mitigation of change risks, thereby reducing the chance of failure.

For change ticket classification, we chose to implement regularized logistic regression, also known as a maximum entropy classifier, as it has been shown to provide outstanding predictive performance across a range of text classification tasks and corpora. Although this classifier yields very high classification accuracy, it must be trained on labeled tickets, and creating them is costly. The cost is compounded by the fact that classifiers trained for support groups in one location cannot readily be transferred to other support groups performing the same task, because the language used in the tickets varies from group to group.
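To make the classification step concrete, the following is a minimal sketch of an L2-regularized multinomial logistic regression (maximum entropy) classifier over bag-of-words features, written in plain Python. It is not CRE's implementation: the example tickets, the class names "patching" and "storage", the whitespace tokenizer, and the hyperparameters l2, lr, and epochs are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

BIAS = "__bias__"  # constant feature present in every ticket

def featurize(text):
    """Binary bag-of-words features for a short ticket description."""
    return set(text.lower().split()) | {BIAS}

def predict_proba(weights, feats):
    """Softmax over the per-class linear scores."""
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def train(tickets, labels, l2=0.01, lr=0.2, epochs=300):
    """Fit the model by batch gradient descent on the L2-penalized log loss."""
    classes = sorted(set(labels))
    docs = [featurize(t) for t in tickets]
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        grads = {c: defaultdict(float) for c in classes}
        for feats, y in zip(docs, labels):
            probs = predict_proba(weights, feats)
            for c in classes:
                err = probs[c] - (1.0 if c == y else 0.0)
                for f in feats:
                    grads[c][f] += err / len(docs)
        for c in classes:
            for f, g in grads[c].items():
                # The l2 term shrinks every weight (bias included) toward zero.
                weights[c][f] -= lr * (g + l2 * weights[c][f])
    return weights

def classify(weights, text):
    """Return the most probable class for a ticket description."""
    probs = predict_proba(weights, featurize(text))
    return max(probs, key=probs.get)

# Hypothetical labeled short descriptions, invented for this sketch.
TICKETS = [
    "apply security patch to db server",
    "install os patch on web server",
    "patch kernel on app server",
    "increase disk space on file server",
    "extend filesystem storage volume",
    "add disk storage to db cluster",
]
LABELS = ["patching", "patching", "patching", "storage", "storage", "storage"]

weights = train(TICKETS, LABELS)
print(classify(weights, "apply kernel patch"))
print(classify(weights, "extend disk volume"))
```

The penalty strength l2 controls how strongly weights are pulled toward zero, which matters when descriptions are short and many words occur only a handful of times. A production system would more likely rely on an established library than on a hand-written optimizer.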

To reduce labeling costs, we added active learning and are experimenting with transfer learning as well as a generalized expectation criteria classifier. Change tickets are classified using their short description, a human-written text of around 100 characters.
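One common way to realize active learning is uncertainty sampling: the tickets for which the current classifier is least certain are sent to a human for labeling first. The text does not say which query strategy CRE uses, so the entropy-based selection below is an assumption, and toy_predict_proba is a hypothetical stand-in for a trained classifier.

```python
import math

def entropy(probs):
    """Shannon entropy of a class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def select_for_labeling(unlabeled, predict_proba, batch_size=2):
    """Pick the batch_size tickets with the most uncertain predictions."""
    ranked = sorted(unlabeled, key=lambda t: entropy(predict_proba(t)), reverse=True)
    return ranked[:batch_size]

def toy_predict_proba(ticket):
    """Hypothetical stand-in for a trained classifier's class probabilities."""
    text = ticket.lower()
    if "patch" in text and "disk" in text:
        return {"patching": 0.5, "storage": 0.5}
    if "patch" in text:
        return {"patching": 0.95, "storage": 0.05}
    if "disk" in text:
        return {"patching": 0.1, "storage": 0.9}
    return {"patching": 0.6, "storage": 0.4}

# Hypothetical pool of unlabeled short descriptions.
pool = [
    "apply os patch on web server",
    "extend disk on file server",
    "patch firmware on disk array",
    "restart application service",
]

queries = select_for_labeling(pool, toy_predict_proba, batch_size=2)
print(queries)
```

In a full loop, the selected tickets would be labeled by a person, added to the training set, and the classifier retrained before the next round of selection. That way the labeling effort is spent on the tickets the model finds hardest to classify.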