The state of an alarm is controlled by the Threshold Engine unless it is explicitly set via the Monasca API.

Alarm state transition event

An event that is created by the Threshold Engine when the alarm transitions state.

Assignment/Owner

The user that the incident is assigned to.

Comment

A comment on an incident.

Actions

Similar to alarm definition actions in Monasca, incidents can have actions that are invoked when an incident is modified.

Incident Lifecycle

This section describes the lifecycle of an incident.

Alarm states

Alarm state transition events are created by the Threshold Engine and are processed as follows:

To ALARM

Open a new incident for the supplied alarm, or add an alarm state transition event to an existing incident.

If an incident doesn't exist for the alarm, or the status of the incident is RESOLVED, a new incident is created with the incident status set to OPEN.

If an incident with a status of OPEN or ACKNOWLEDGED exists for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.

To OK

Add an alarm state transition event to an existing incident.

If an incident doesn't exist for the alarm, or the status of the incident is RESOLVED, nothing is done.

If an incident with a status of OPEN or ACKNOWLEDGED exists for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.

To UNDETERMINED

Open a new incident for the supplied alarm, or add an alarm state transition event to an existing incident.

If an incident doesn't exist for the alarm, or the status of the incident is RESOLVED, a new incident is created with the incident status set to OPEN.

If an incident with a status of OPEN or ACKNOWLEDGED exists for the alarm, the alarm state transition event is added to the existing incident, and the status is not modified.
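The three cases above share one decision: whether an OPEN or ACKNOWLEDGED incident already exists for the alarm. The following is a minimal sketch of those rules, not the Incident Manager's actual implementation. The `Incident` class, the `process_transition` function, and the in-memory dictionary keyed by alarm ID are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ACTIVE_STATUSES = {"OPEN", "ACKNOWLEDGED"}


@dataclass
class Incident:
    """Hypothetical, simplified incident record."""
    alarm_id: str
    status: str = "OPEN"
    alarm_state_transitions: List[str] = field(default_factory=list)


def process_transition(
    incidents: Dict[str, Incident], alarm_id: str, new_state: str
) -> Optional[Incident]:
    """Apply the rules above to one alarm state transition event.

    `incidents` is an assumed in-memory map from alarm ID to the most
    recent incident for that alarm; a real service would use MySQL.
    """
    incident = incidents.get(alarm_id)
    if incident is not None and incident.status in ACTIVE_STATUSES:
        # OPEN or ACKNOWLEDGED incident exists: add the event, keep the status.
        incident.alarm_state_transitions.append(new_state)
        return incident
    if new_state in ("ALARM", "UNDETERMINED"):
        # No incident, or the incident is RESOLVED: open a new one.
        incident = Incident(alarm_id=alarm_id, alarm_state_transitions=[new_state])
        incidents[alarm_id] = incident
        return incident
    # Transition to OK without an active incident: nothing is done.
    return None
```

Under these assumptions, a transition to OK for an unknown alarm returns `None`, while a transition to ALARM after a RESOLVED incident replaces the stored entry with a new OPEN incident.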

Incident status

The incident status is modified via the Incident Manager API and processed as follows:

To ACKNOWLEDGED

Modify the incident to ACKNOWLEDGED.

Publish an incident status event to Kafka, which is processed by the Notification Engine.

If an incident is acknowledged, it won't generate any additional notifications, even if it receives new alarm state transition events.

To RESOLVED

Modify the incident to RESOLVED.

Publish an incident status event to Kafka, which is processed by the Notification Engine.

If an incident is resolved, it won't generate any additional notifications.
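The status handling above can be sketched as follows. This is an illustration under stated assumptions rather than the actual design: the `publish` callable stands in for a Kafka producer, the event payload fields are hypothetical, and the rule that only OPEN incidents notify on new alarm state transition events is inferred from the two statements above rather than stated directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Incident:
    """Hypothetical, simplified incident record."""
    id: str
    status: str = "OPEN"


def update_status(
    incident: Incident,
    new_status: str,
    publish: Callable[[Dict[str, Any]], None],
) -> None:
    """Modify the incident status and publish an incident status event."""
    if new_status not in ("ACKNOWLEDGED", "RESOLVED"):
        raise ValueError(f"unsupported status: {new_status}")
    incident.status = new_status
    # Assumed event shape; the real event format is not defined here.
    publish({"incident_id": incident.id, "status": new_status})


def notifies_on_transition(incident: Incident) -> bool:
    """Return True if a new alarm state transition event should notify.

    Acknowledged and resolved incidents generate no additional notifications.
    """
    return incident.status == "OPEN"
```

Passing `list.append` as `publish` is a convenient way to capture the events in a test without a running Kafka broker.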

Assign or reassign incident

Requests to assign or reassign an incident are processed as follows:

When an incident is created, it is initially unassigned. It can be assigned or reassigned later.

Incidents

GET /v2.0/incidents/

Query parameters

status

state

assigned_to

acknowledged_by

create_start_time

status_update_start_time

GET /v2.0/incidents/{incident-id}

PATCH /v2.0/incidents/{incident-id}: Update an incident, such as modifying the status to ACKNOWLEDGED or RESOLVED.

GET /v2.0/incidents/history: Get the history of all incidents filtering on the supplied query parameters.

Query parameters

status (string, optional)

state (string, optional)

created_timestamp (string, optional)

GET /v2.0/incidents/{incident-id}/history/: Get the history of a specific incident.
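As an illustration of how a client might call the incident endpoints above, the sketch below builds (but does not send) a list request and a status update request using the Python standard library. The base URL, the helper names, and the PATCH body shape `{"status": ...}` are assumptions; authentication is omitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical Incident Manager endpoint


def list_incidents_request(**filters: str) -> Request:
    """Build GET /v2.0/incidents/ with optional query parameters."""
    query = urlencode(filters)
    url = f"{BASE_URL}/v2.0/incidents/" + (f"?{query}" if query else "")
    return Request(url, method="GET")


def update_status_request(incident_id: str, status: str) -> Request:
    """Build PATCH /v2.0/incidents/{incident-id} to change the status."""
    body = json.dumps({"status": status}).encode("utf-8")  # assumed body shape
    return Request(
        f"{BASE_URL}/v2.0/incidents/{incident_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

A call such as `list_incidents_request(status="OPEN", assigned_to="user-1")` yields a request whose URL carries both filters; the request could then be sent with `urllib.request.urlopen`.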

Incident Response Object

id: The ID of the incident.

name: The name of the incident.

description: The description of the incident.

alarm: {alarm}

alarm_state_transitions: [{alarm_state_transition}]

status: OPEN, ACKNOWLEDGED, RESOLVED

created_timestamp: The timestamp when the incident was created.

status_updated_timestamp: The timestamp when the status of the incident was last updated.

comments: [comment-id]: An array of comments for the incident.

assignments: [{Assignment}]: The user ID of each user the incident was assigned to, and the timestamp of the assignment.

acknowledgments: [{Acknowledgment}]: The user ID of each user who acknowledged the incident, and the timestamp of the acknowledgment.

actions: [{notification-method}]: Array of notification method IDs that are invoked when the incident is modified in any way.
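The response object above could be modelled on the client side roughly as follows. This is a sketch only: the field names come from the list above, but the Python types, the default values, and the `parse_incident` helper are assumptions, since the nested alarm, assignment, and acknowledgment structures are not specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

VALID_STATUSES = ("OPEN", "ACKNOWLEDGED", "RESOLVED")


@dataclass
class IncidentResponse:
    """Hypothetical client-side view of the incident response object."""
    id: str
    name: str
    description: str
    alarm: Dict[str, Any]
    status: str
    created_timestamp: str
    status_updated_timestamp: str
    alarm_state_transitions: List[Dict[str, Any]] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)  # comment IDs
    assignments: List[Dict[str, Any]] = field(default_factory=list)
    acknowledgments: List[Dict[str, Any]] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)  # notification method IDs

    def __post_init__(self) -> None:
        # Reject statuses outside the three listed above.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown incident status: {self.status}")


def parse_incident(payload: Dict[str, Any]) -> IncidentResponse:
    """Build an IncidentResponse from a decoded JSON object."""
    return IncidentResponse(**payload)
```

Because `parse_incident` passes the payload straight to the constructor, an unexpected key raises `TypeError`, which makes drift between the API and the client visible early.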

Comments

GET /v2.0/comments

Query parameters

incident_id (string, optional)

GET /v2.0/comments/{comment-id}

POST /v2.0/comments

Comment Object

id

incident_id

created_timestamp

comment

user-id (string, required)
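A request body for POST /v2.0/comments might be assembled as in the sketch below. Which fields the client supplies is an assumption: here the client sends incident_id, user-id, and comment, while id and created_timestamp are assumed to be assigned by the server. The `build_comment_body` helper is a hypothetical name.

```python
import json


def build_comment_body(incident_id: str, user_id: str, comment: str) -> str:
    """Serialize the assumed request body for POST /v2.0/comments."""
    if not user_id:
        # user-id is marked as required in the comment object above.
        raise ValueError("user-id is required")
    body = {
        "incident_id": incident_id,
        "user-id": user_id,
        "comment": comment,
    }
    return json.dumps(body)
```

The key is written as `user-id` to match the field name used in the comment object, even though the other fields use underscores.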

Architecture

Monasca Incident Manager

Provides an API that enables the following:

Query and update incidents, such as updating the status of incidents.

Create and query comments.

Consumes alarm state transition events from the Kafka alarm state transition events topic.

Creates incidents in the MySQL database based on the rules listed above.

Publishes incident transition events to the incident transition events topic in Kafka. These events are consumed by the Notification Engine and result in notifications being sent.

MySQL

Schemas

Incidents

Comments

Issues

How to assign actions when a new incident is created?

Should alarm IDs map to incidents directly, or should there be a level of indirection between an incident ID and an alarm ID? In PagerDuty you create an incident and get a response that contains the incident ID, which the client should store. On subsequent events, the same incident ID can be provided for the same alarm. If the incident has been resolved, a new incident is created and a new incident ID is returned. If the incident has not been resolved, the event is added to the incident. In PagerDuty the responsibility is on the client to manage the incident IDs associated with an alarm, so that on subsequent alarm events the incident ID can be provided. What is described here instead is that the Incident Manager itself creates a new incident when an alarm event occurs and the incident tracking the alarm has already been resolved.