Alignak can monitor hosts and services in two ways: actively and passively. Both active and passive checks are described in this chapter.

Active checks are the most common method for monitoring hosts and services. Active checks are initiated by the Alignak framework to “poll” a device on a regularly scheduled basis. In most cases you’ll use Alignak to monitor your hosts and services with this checking strategy.

Alignak also supports a way to monitor hosts and services passively instead of actively. Passive checks are initiated and performed by external applications/processes and then submitted to Alignak for its processing.

The major difference between active and passive checks is that active checks are initiated and performed by Alignak, while passive checks are performed by external applications.

Active checks are initiated by the check logic in the Alignak daemon. When Alignak needs to check the status of a host or service, it executes a plugin and passes it information about what needs to be checked. The plugin then checks the operational state of the host or service and reports the results back to the Alignak daemon. Alignak processes the results of the host or service check and takes appropriate action as necessary (e.g. send notifications, run event handlers, etc.).
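As an illustration of the plugin contract described above, here is a minimal sketch of running a plugin and interpreting its result. It is not Alignak's actual implementation: the `run_plugin` function, the timeout handling, and the stand-in plugin are assumptions made for this example. The return codes are the ones listed for service checks later in this chapter.

```python
import subprocess
import sys

# Return codes as listed for service checks in this chapter.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(command, timeout=10):
    """Run a plugin command line and return (state, output, perf_data).

    Hypothetical helper: name, signature and timeout policy are assumptions.
    """
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "UNKNOWN", "plugin timed out", ""
    # Assumption of this sketch: any unexpected return code maps to UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    stdout = proc.stdout.strip()
    first_line = stdout.splitlines()[0] if stdout else ""
    # Performance data, if any, follows a "|" after the plugin output.
    output, _, perf_data = first_line.partition("|")
    return state, output.strip(), perf_data.strip()


# Stand-in "plugin": a Python one-liner that prints a status line and exits with 1.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('DISK WARNING - 85% used | used=85%'); sys.exit(1)",
]
result = run_plugin(fake_plugin)
```

With the stand-in plugin, `result` holds the WARNING state, the text output, and the performance data as three separate values.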

Active checks are executed:

At regular intervals, as defined by the check_interval and retry_interval options in your host and service definitions

On-demand, as needed

Regularly scheduled checks occur at intervals equaling either the check_interval or the retry_interval in your host or service definitions, depending on which type of state the host or service is in. If a host or service is in a HARD state, it will be actively checked at intervals equal to the check_interval option. If it is in a SOFT state, it will be checked at intervals equal to the retry_interval option.
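The interval selection rule above can be sketched in a few lines. The function name `next_check_delay` and the string representation of the state type are assumptions made for illustration; only the HARD/SOFT rule itself comes from the text.

```python
def next_check_delay(state_type, check_interval, retry_interval):
    """Return the interval to wait before the next regularly scheduled check.

    Hypothetical helper: HARD states use check_interval,
    SOFT states use retry_interval.
    """
    if state_type == "HARD":
        return check_interval
    if state_type == "SOFT":
        return retry_interval
    raise ValueError(f"unknown state type: {state_type!r}")


hard_delay = next_check_delay("HARD", check_interval=5, retry_interval=1)
soft_delay = next_check_delay("SOFT", check_interval=5, retry_interval=1)
```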

On-demand checks are performed whenever Alignak needs to obtain the latest status information about a particular host or service. For example, when Alignak is determining the reachability of a host, it will often perform on-demand checks for parent and child hosts to accurately determine the status of a particular network segment.

Passive checks work as follows:

The external application submits the result of the check to Alignak with an external command.

Alignak receives the external command and places the results of all passive checks into a queue for processing by the Alignak framework.

Every second, Alignak polls and scans the check result queue.

Each service check result is processed in the same manner - regardless of whether the check was active or passive. Alignak may send out notifications, log alerts, etc. depending on the check result information.

Passive checks are also governed by another parameter: the freshness of the check. What if an external application has not submitted any check result for a long time? What if a passively checked host has not reported anything for several hours? Alignak lets you define a freshness threshold that determines what is to be done in this situation.

When the freshness threshold is reached, Alignak sets the host or service to its defined freshness state and runs the appropriate actions for this new state (e.g. notifications, event handlers, ...).
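The freshness rule can be sketched as follows. This is an illustration under stated assumptions, not Alignak's code: the item is modelled as a plain dictionary, and the key names `last_check`, `freshness_threshold`, and `state` as well as both function names are hypothetical. The sketch only sets the freshness state; it does not run any check command.

```python
import time


def is_stale(last_check, freshness_threshold, now=None):
    """Return True when the last result is older than the freshness threshold."""
    if now is None:
        now = time.time()
    return (now - last_check) >= freshness_threshold


def apply_freshness(item, now=None):
    """Move a stale item to its freshness state; return True if it changed.

    `item` is a hypothetical dict, not an Alignak object.
    """
    if is_stale(item["last_check"], item["freshness_threshold"], now):
        item["state"] = item["freshness_state"]
        return True
    return False


# A passively checked host last heard from 2 hours ago, threshold of 1 hour.
host = {
    "state": "UP",
    "last_check": 1_000_000,
    "freshness_threshold": 3600,
    "freshness_state": "UNREACHABLE",
}
changed = apply_freshness(host, now=1_000_000 + 7200)
```

After the call, the host is in its freshness state; notifications and event handlers would then follow from the new state as for any other check result.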

The processing of active and passive check results is essentially identical. This allows for seamless integration of status information from external applications with Alignak.

Note

When the freshness threshold is reached, Nagios runs the check_command. Alignak does not implement this behavior! It simply sets the host/service to its defined freshness_state and runs the corresponding actions, if any.

External applications can submit passive service check results to Alignak by sending a PROCESS_SERVICE_CHECK_RESULT external command.

The format of the command is as follows:

[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>

where:

timestamp is the time in time_t format (seconds since the UNIX epoch) that the service check was performed (or submitted).

host_name is the short name of the host associated with the service in the service definition

svc_description is the description of the service as specified in the service definition

return_code is the return code of the check (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)

plugin_output is the text output of the service check (i.e. the plugin output)

Note

The plugin_output can also contain some performance data. To include performance data you simply need to include a | and the perf_data string after the plugin_output.

A service must be defined in Alignak before Alignak will accept passive check results for it! Alignak will ignore all check results for undefined services unless you set the accept_passive_unknown_check_results option in the monitoring configuration file.

Once data has been received by Alignak, the check results will be forwarded to the appropriate Scheduler which will apply the check logic.
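To make the command format concrete, here is a minimal sketch that builds a PROCESS_SERVICE_CHECK_RESULT line. The function name and its parameters are assumptions made for this example, and the sketch only builds the string: how the command is delivered to Alignak is out of its scope. It assumes a single space after the right bracket, as noted in the host command description.

```python
import time


def service_check_result(host_name, svc_description, return_code,
                         plugin_output, perf_data="", timestamp=None):
    """Build a passive service check result command line (hypothetical helper)."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the UNIX epoch
    output = plugin_output
    if perf_data:
        # Performance data follows a "|" after the plugin output.
        output = f"{plugin_output}|{perf_data}"
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host_name};{svc_description};{return_code};{output}"
    )


command = service_check_result(
    "web01", "HTTP", 0, "HTTP OK", perf_data="time=0.02s", timestamp=1500000000
)
```

Here `web01` and `HTTP` are placeholder names; the host and service must already be defined in Alignak for the result to be accepted.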

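External applications can submit passive host check results to Alignak by sending a PROCESS_HOST_CHECK_RESULT external command.

The format of the command is as follows:

[<timestamp>] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>

where: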

timestamp is the time in time_t format (seconds since the UNIX epoch) that the host check was performed (or submitted). Please note the single space after the right bracket.

host_name is the short name of the host (as defined in the host definition)

host_status is the status of the host (0=UP, 1=DOWN, 2=UNREACHABLE)

plugin_output is the text output of the host check

Note

The plugin_output can also contain some performance data. To include performance data you simply need to include a | and the perf_data string after the plugin_output.

A host must be defined in Alignak before you can submit passive check results for it! Alignak will ignore all passive check results for undefined hosts unless you set the accept_passive_unknown_check_results option in the monitoring configuration file.

Once data has been received by Alignak, the check results will be forwarded to the appropriate Scheduler which will apply the check logic.
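Seen from the receiving side, a host check result line can be split back into its fields. The parser below is a sketch for illustration only and does not reflect how Alignak parses external commands internally; the function name, the regular expression, and the returned dictionary layout are all assumptions.

```python
import re

# Host status codes as listed above.
HOST_STATES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE"}

# Timestamp in brackets, a single space, then semicolon-separated fields.
_PATTERN = re.compile(
    r"^\[(\d+)\] PROCESS_HOST_CHECK_RESULT;([^;]+);(\d+);(.*)$"
)


def parse_host_check_result(line):
    """Parse a PROCESS_HOST_CHECK_RESULT line into a dict (hypothetical helper)."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a host check result command: {line!r}")
    timestamp, host_name, status, rest = match.groups()
    status = int(status)
    if status not in HOST_STATES:
        raise ValueError(f"invalid host status: {status}")
    # Optional performance data follows a "|" after the plugin output.
    output, _, perf_data = rest.partition("|")
    return {
        "timestamp": int(timestamp),
        "host_name": host_name,
        "state": HOST_STATES[status],
        "output": output.strip(),
        "perf_data": perf_data.strip(),
    }


parsed = parse_host_check_result(
    "[1500000000] PROCESS_HOST_CHECK_RESULT;router1;1;PING CRITICAL - no reply|rta=0ms"
)
```

The plugin output is matched last and greedily, so semicolons inside the output text do not break the parsing.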

Unlike with active host checks, Alignak does not attempt to determine whether a host is DOWN or UNREACHABLE with passive checks. Rather, Alignak takes the passive check result to be the actual state the host is in and doesn’t try to determine the host’s actual state using the reachability logic.

You can tell Alignak to translate DOWN/UNREACHABLE passive check result states to their “proper” state by using the translate_passive_host_checks variable.

This chapter may seem quite esoteric to some readers, but it uses an algorithm-like style to describe what Alignak does when it gets a check result. This may help in understanding the framework's behavior ;)

What does Alignak do when it gets a check result? Here are the steps of the check result processing:

if the check status is not 0 and some dependencies exist, wait for the results of the dependent checks

get the check data: execution time, output, …

modulate the check status if some check modulation is defined

set real item state according to plugin check status and impacts management

manage the check status: if all dependencies are down, set the item as unreachable

manage the new state:

    to UP/OK from UP/OK/PENDING:

        unacknowledge former problem

        if state type SOFT and not last state PENDING

            if max attempts and SOFT state

                HARD state

            else

                SOFT state

        else

            state type HARD, attempt 1

    to UP/OK from WARNING/CRITICAL/UNKNOWN/UNREACHABLE/DOWN (other states)