All the variables described in this chapter may be used in the Alignak environment configuration file as defined in the following chapter.

Note

Some variables that exist in the Alignak environment configuration file are not described in this chapter, either because they are not really useful for configuration or because they are too specific. They are nevertheless commented and explained in the default shipped configuration file ;)

Note

When creating and/or editing configuration files, keep the following in mind:

Lines that start with a # or ; character are comments that are not processed
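As a quick illustration of this comment rule, Python's standard configparser module also ignores lines that start with # or ; by default. This is only a sketch: the section name and the value below are made up for the example, and it does not claim to show how Alignak itself reads its file.

```python
import configparser

# Hypothetical fragment: the section name and the value are illustrative only.
SAMPLE = """
[alignak-configuration]
# This line is a comment
; This line is a comment too
alignak_name = My Alignak
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Both comment lines are skipped; only the real setting is kept.
settings = dict(parser["alignak-configuration"])
print(settings)
```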

This variable defines the name of the Alignak instance. This is useful, for instance, when you share an Alignak backend between several Alignak instances to identify the source of the stored information.

The arbiter daemon can report the overall Alignak status to an external application that exposes the same services as implemented by the Alignak Web service module.
The Arbiter reports the Alignak status as a passive host check. The Alignak daemons are considered as services of a host named after the instance alignak_name.

Every daemons_check_period seconds, the arbiter checks the satellites that it launched. If daemons_failure_kill is set and a missing process is detected, the arbiter stops all the other self-launched daemons and then stops itself.

These parameters configure how Alignak exports its inner performance metrics to a StatsD/Graphite server.

When graphite_enabled is set, the Alignak internal metrics are sent to a Graphite/carbon port (statsd_host:statsd_port) instead of a StatsD instance (if statsd_enabled is set). Contrary to StatsD, Graphite/carbon uses a TCP connection, which allows metrics to be sent in bulk. This is more reliable than the StatsD interface, which is based upon UDP.
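To make the difference concrete, here is a minimal sketch of what the two wire formats look like: one StatsD datagram per metric, versus several carbon plaintext lines grouped in a single payload. The helper names and metric names are hypothetical, nothing is sent over the network, and this does not reproduce the Alignak implementation.

```python
def statsd_packet(name, value, metric_type="g"):
    """Build one StatsD datagram: '<name>:<value>|<type>' (g = gauge)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def carbon_payload(metrics, timestamp):
    """Build a carbon plaintext payload: one '<path> <value> <timestamp>' line per metric."""
    lines = [f"{name} {value} {int(timestamp)}\n" for name, value in metrics]
    return "".join(lines).encode("ascii")


# Hypothetical metric names, for illustration only.
single = statsd_packet("alignak.scheduler.loop", 12)
bulk = carbon_payload(
    [("alignak.scheduler.loop", 12), ("alignak.scheduler.checks", 340)],
    timestamp=1700000000,
)
print(single)
print(bulk.decode("ascii"), end="")
```

The bulk payload can be written to a TCP socket in one call, whereas each StatsD datagram travels on its own over UDP.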

Some environment variables exist to log the metrics to a file in append mode:

‘ALIGNAK_STATS_FILE’

the file name

‘ALIGNAK_STATS_FILE_LINE_FMT’

defaults to ‘[#date#] #counter# #value# #uom#\n’

‘ALIGNAK_STATS_FILE_DATE_FMT’

defaults to ‘%Y-%m-%d %H:%M:%S’
date is UTC
if configured as an empty string, the date will be output as a UTC timestamp

If a file is enough for you, set statsd_host to ‘None’ and the metrics will not be sent to StatsD/Graphite.
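The following sketch shows how a line of the metrics file could be built from the two format variables. It assumes that the placeholders are replaced literally and that the default line format ends with a newline; the helper name and the sample metric are hypothetical, and the real Alignak code may differ.

```python
from datetime import datetime, timezone

# Assumed defaults, as described above.
LINE_FMT = "[#date#] #counter# #value# #uom#\n"
DATE_FMT = "%Y-%m-%d %H:%M:%S"


def format_stats_line(counter, value, uom, when, line_fmt=LINE_FMT, date_fmt=DATE_FMT):
    """Fill the placeholders of one metrics line (hypothetical helper)."""
    if date_fmt:
        date = when.astimezone(timezone.utc).strftime(date_fmt)
    else:
        # Empty date format: output a UTC timestamp instead
        date = str(int(when.timestamp()))
    return (
        line_fmt.replace("#date#", date)
        .replace("#counter#", counter)
        .replace("#value#", str(value))
        .replace("#uom#", uom)
    )


when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
line = format_stats_line("scheduler.loop", 12, "ms", when)
print(line, end="")  # [2024-01-02 03:04:05] scheduler.loop 12 ms
```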

This setting determines how often (in minutes) the Alignak scheduler automatically saves retention data during normal operation.
If you set this value to 0, it will not save retention data at regular intervals, but it will still save retention data before shutting down or restarting.

This option determines the maximum number of minutes, counted from when Alignak starts, within which all hosts/services that are scheduled to be regularly checked receive their first check. It ensures that the initial checks of all hosts/services occur within the timeframe you specify. Default value is 30 (minutes).

This is the maximum number of seconds that Alignak will allow service/host checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each check normally finishes executing within this time limit. If a check runs longer than this limit, Alignak will kill it off, thinking it is a runaway process.
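The behaviour described above can be sketched with Python's standard subprocess module: the command gets a time budget, and if it exceeds that budget it is killed and reported as CRITICAL. The function name and the message text are hypothetical; this is not the Alignak poller code.

```python
import subprocess
import sys

CRITICAL = 2  # conventional plugin exit code for a CRITICAL state


def run_check(command, timeout):
    """Run a check command and return (exit_code, output)."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        return done.returncode, done.stdout.strip()
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process when the timeout expires
        return CRITICAL, f"check timed out after {timeout} seconds"


# A deliberately slow "plugin" that sleeps longer than the allowed time.
slow_plugin = [sys.executable, "-c", "import time; time.sleep(5)"]
code, output = run_check(slow_plugin, timeout=0.5)
print(code, output)
```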

This option sets the size of the state history kept by the scheduler for the flapping calculation. By default, 20 states are kept.

The memory used by the scheduler daemon is: 4 bytes * flap_history * (number of hosts + number of services). For a big environment, this costs 4 * 20 * (1000 + 10000) = 880,000 bytes, roughly 900 KB, so you can raise it to a higher value if you want. More information about flapping can be found here.
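The memory estimate is easy to reproduce for your own environment. The sketch below only applies the formula given above; the function name is hypothetical.

```python
def flap_history_memory(flap_history, nb_hosts, nb_services, bytes_per_state=4):
    """Estimated bytes used by the scheduler to keep the flapping history."""
    return bytes_per_state * flap_history * (nb_hosts + nb_services)


# Default history of 20 states, 1000 hosts and 10000 services
size = flap_history_memory(20, 1000, 10000)
print(size)  # 880000
```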

This option determines whether or not a state change is applied when a host or service is impacted by a root problem (like the service’s host going down, or a host’s parent being down too). The state is changed to UNKNOWN for a service and UNREACHABLE for a host until their next scheduled check. This state change does not count as an attempt; it is only there for the console, so that users know that these objects have problems and that their previous states are not reliable.

This option allows you to override the default timezone that this instance of Alignak runs in. Useful if you have multiple instances of Alignak that need to run from the same server, but have different local times associated with them. If not specified, Alignak will use the system configured timezone.

This option determines whether or not the Alignak daemon will make all standard macros available as environment variables to your check, notification, event handler, etc. commands. In large installations this can be problematic because it takes additional CPU to compute the values of all macros and make them available to the environment. It also increases the network communication between schedulers and pollers.

This variable determines whether or not Alignak will force all initial host and service states to be logged, even if they result in an OK state. Initial service and host states are normally only logged when there is a problem on the first check. Enabling this option is useful if you are using an application that scans the log file to determine long-term state statistics for services and hosts.

This variable determines whether or not notification messages are logged. If you have a lot of contacts or regular service failures, your log file will grow (say a few MB per day for a huge configuration, so logging them is fine for nearly everyone). Use this option to keep contact notifications from being logged.

This variable determines whether or not service/host check retries are logged. Service check retries occur when a service check results in a non-OK state, but you have configured Alignak to retry the service more than once before responding to the error. Services in this situation are considered to be in “soft” states. Logging service check retries is mostly useful when attempting to debug Alignak or test out service/host event handlers.

This variable determines whether or not service and host event handlers are logged. Event handlers are optional commands that can be run whenever a service or host changes state. Logging event handlers is most useful when debugging Alignak or first trying out your event handler scripts.

This is the maximum number of seconds that Alignak will allow a host performance data processor command or service performance data processor command to run. If a command exceeds this time limit it will be killed and a warning will be logged.

This option allows you to specify a command to be run after every host/service check to process host/service performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the Performance Data Processing option is enabled globally and if the process_perf_data directive in the host definition is enabled.

This option determines whether or not Alignak will treat passive host checks as HARD states or SOFT states. As a default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.

This option determines whether or not Alignak will execute predictive checks of hosts/services that are being depended upon (as defined in host/services dependencies) for a particular host/service when it changes state. Predictive checks help ensure that the dependency logic is as accurate as possible.

This option allows you to enable or disable checks for orphaned service/host checks. Orphaned checks are checks which have been launched to pollers but have not had any results reported in a long time.

Since no results have come back for such a check, it is not rescheduled in the event queue. This can cause checks to stop being executed. Normally it is very rare for this to happen; it might happen if an external user or process killed off the process that was being used to execute a check.

If this option is enabled and Alignak finds that results for a particular check have not come back, it will log an error message and reschedule the check. If you start seeing checks that never seem to get rescheduled, enable this option and see if you notice any log messages about orphaned services.

This option determines whether or not Alignak will use soft state information when checking host and service dependencies. Normally it will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether it is a soft or hard state type), enable this option.

This option determines the maximum amount of time (in seconds) that the state of a previous host check is considered current. Cached host states (from host/service checks that were performed more recently than the time specified by this value) can improve host check performance immensely. Too high of a value for this option may result in (temporarily) inaccurate host/service states, while a low value may result in a performance hit for host/service checks. Use a value of 0 if you want to disable host/service check caching. More information on cached checks can be found here.
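The caching decision can be sketched as a simple age comparison. The function and parameter names below are hypothetical, and the sketch only illustrates the rule stated above: a horizon of 0 disables caching, otherwise a previous result is reused when it is more recent than the horizon.

```python
def can_use_cached_state(last_check_time, now, cached_check_horizon):
    """Return True if the previous check result is recent enough to be reused."""
    if cached_check_horizon <= 0:
        # A value of 0 disables check caching
        return False
    return (now - last_check_time) < cached_check_horizon


now = 1_000_000
print(can_use_cached_state(now - 10, now, 15))  # True: the result is 10 seconds old
print(can_use_cached_state(now - 20, now, 15))  # False: the result is too old
print(can_use_cached_state(now - 10, now, 0))   # False: caching is disabled
```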

Tip

The Nagios default is 15s, but it is a tweak that makes checks less accurate, so Alignak uses 0s as a default. If you have performance problems and you cannot add a new scheduler or poller, increase this value and start thinking about buying a new server, because this won’t be magical ;).

This option determines whether or not the Alignak daemon will take shortcuts to improve performance. These shortcuts result in the loss of a few features, but larger installations will likely see a lot of benefit from doing so. If you can’t add new satellites to manage the load (like new pollers), you can activate it.

This option determines whether or not Alignak will try and detect hosts and services that are “flapping”. Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When Alignak detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping.

More information on how flap detection and handling works can be found here.
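As a rough illustration of the idea, the sketch below counts how often the state changed across the recorded history and compares the resulting percentage with a threshold. It is a deliberately simplified, unweighted calculation: the function names and the 50% threshold are assumptions made for the example, not Alignak's actual formula or default values.

```python
def percent_state_change(states):
    """Percentage of transitions between consecutive recorded states."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, high_threshold=50.0):
    """Hypothetical rule: flapping when the change rate reaches the threshold."""
    return percent_state_change(states) >= high_threshold


stable = ["OK"] * 6
unstable = ["OK", "CRITICAL"] * 3
print(percent_state_change(stable), is_flapping(stable))
print(percent_state_change(unstable), is_flapping(unstable))
```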

This is the maximum number of seconds that Alignak will allow event handlers and notifications to run. If a command exceeds this time limit, it will be killed and a warning will be logged.

There is often widespread confusion as to what this option really does. It is meant to be used as a last-ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more for notifications), so that each event handler command normally finishes executing within this time limit. If an event handler runs longer than this limit, Alignak will kill it off, thinking it is a runaway process.

This option determines whether or not Alignak will periodically check the “freshness” of host/service checks. Enabling this option is useful for helping to ensure that passive service checks are received in a timely manner.

This setting determines how often (in seconds) Alignak will periodically check the “freshness” of host/service check results. If you have disabled host/service freshness checking (with the check_service_freshness option), this option has no effect.

This option allows you to specify a host event handler command that is to be run for every host state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each host definition. The command argument is the short name of a command that you define in your commands definition. The maximum amount of time that this command can run is controlled by the Event Handler Timeout option. More information on event handlers can be found here.

Such commands are of little use with the new Alignak distributed architecture. If you use one, check whether you can avoid it, because such commands will kill your performance!

This is the number of seconds per “unit interval” used for timing in the scheduling queue, re-notifications, etc. “Unit intervals” are used in the object configuration file to determine how often to run a service check, how often to re-notify a contact, etc.

The default value for this is set to 60, which means that a “unit value” of 1 in the object configuration file will mean 60 seconds (1 minute).
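The conversion is a simple multiplication, sketched below. The function and parameter names are hypothetical; only the default of 60 seconds per unit comes from the text above.

```python
def units_to_seconds(units, interval_length=60):
    """Convert a number of "unit intervals" into seconds."""
    return units * interval_length


# With the default of 60 seconds per unit, a value of 5 units means 5 minutes.
print(units_to_seconds(5))      # 300
print(units_to_seconds(1))      # 60
print(units_to_seconds(5, 10))  # 50
```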

Tip

Changing this option is not a good thing with Alignak. It’s not designed to be a hard real time monitoring system…

This option allows you to specify illegal characters that cannot be used in host names, service descriptions, or names of other object types. Alignak will allow you to use most characters in object definitions, but we recommend not using the characters shown in the example above because it may give you problems in the web interface, notification commands, etc.

This option allows you to specify illegal characters that should be stripped from macros before being used in notifications, event handlers, and other commands. This DOES NOT affect macros used in service or host check commands. You can choose to not strip out the characters shown in the example above, but we recommend you do not do this. Some of these characters are interpreted by the shell (i.e. the backtick) and can lead to security problems. The following macros are stripped of the characters you specify:

This option allows you to specify the prefix that is prepended to the Alignak macros when they are propagated to the shell environment of the executed plugins. The default prefix is ALIGNAK_; use this variable to specify an alternate prefix. Indeed, some existing scripts may rely on the Nagios / Shinken NAGIOS_ prefix, so feel free to declare this legacy prefix here ;)
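The effect of the prefix can be pictured as follows. This is a minimal sketch, assuming macros are exposed as plain NAME=value environment variables; the helper name and the sample macro values are illustrative and not taken from the Alignak code.

```python
def macros_to_env(macros, prefix="ALIGNAK_"):
    """Build the environment variables exposed to a plugin (hypothetical helper)."""
    return {prefix + name: str(value) for name, value in macros.items()}


# Sample macro values, for illustration only.
macros = {"HOSTNAME": "web01", "HOSTADDRESS": "192.0.2.10"}

default_env = macros_to_env(macros)
legacy_env = macros_to_env(macros, prefix="NAGIOS_")
print(default_env)
print(legacy_env)
```

A legacy script that reads NAGIOS_HOSTNAME would only find its variable with the second setting.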

This is the pager number (or pager email gateway) for the administrator of the local machine (i.e. the one that Alignak is running on). The pager number/address can be used in notification commands by using the $ADMINPAGER$ macro.

These parameters configure the execution period of the scheduler actions.
Each parameter corresponds to a recurrent scheduler action. On each scheduling loop turn, the scheduler checks whether it is time to execute the corresponding work.

Each parameter defines on which loop turn count the action is to be executed. Considering that a loop turn lasts 1 second, a parameter value set to 10 causes the corresponding action to be executed every 10 seconds.
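The loop turn logic can be sketched with a modulo test on the loop counter. This is a simplified illustration and not the scheduler code: the function name is hypothetical, and a value of 0 is treated here as "never run".

```python
# A few tick values, using parameter names from this chapter.
TICKS = {
    "tick_check_freshness": 10,
    "tick_update_retention": 3600,
    "tick_clean_queues": 0,  # 0 is treated as "never" in this sketch
}


def due_actions(loop_count, ticks):
    """Return the names of the actions to execute on this loop turn."""
    return sorted(
        name for name, tick in ticks.items()
        if tick > 0 and loop_count % tick == 0
    )


print(due_actions(7, TICKS))     # []
print(due_actions(10, TICKS))    # ['tick_check_freshness']
print(due_actions(3600, TICKS))  # ['tick_check_freshness', 'tick_update_retention']
```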

Note

Changing some of these parameters may have unexpected effects! Do not change them unless you know what you are doing ;)

Tip

Some tips:
- tick_check_freshness changes the freshness check period
- tick_update_retention changes the retention save period

tick_update_downtimes_and_comments=1
tick_schedule=1
### Check host/service freshness every 10 seconds
tick_check_freshness=10
tick_consume_results=1
tick_get_new_actions=1
tick_scatter_master_notifications=1
tick_get_new_broks=1
tick_delete_zombie_checks=1
tick_delete_zombie_actions=1
tick_clean_caches=1
### Retention save every hour
tick_update_retention=3600
tick_check_orphaned=60
### Notify about scheduler status every 10 seconds
tick_update_program_status=10
tick_check_for_system_time_change=1
### Internal checks are computed every loop turn
tick_manage_internal_checks=1
tick_clean_queues=1
### Note that if it set to 0, the scheduler will never try to clean its queues for oversizing
tick_clean_queues=10
tick_update_business_values=60
tick_reset_topology_change_flags=1
tick_check_for_expire_acknowledge=1
tick_send_broks_to_modules=1
tick_get_objects_from_from_queues=1
tick_get_latency_average_percentile=10