The main role of this feature is to aggregate several, more complex monitored elements into a
single “indicator”. This indicator can provide a dedicated view for users in different roles.

Typical roles:

Service delivery Management

Business Management

Engineering

IT support

Let’s take a simple example of a service delivery role for an ERP application. It mainly consists of the following IT components:

2 databases, in high availability, so with one database active, the service is considered up

2 web servers, in load sharing, so with one web server active, the service is considered up

2 load balancers, again in high availability

These IT components (Hosts in this example) will be the basis for the ERP service.
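
To make the aggregation concrete, here is a minimal Python sketch of the availability logic described above: each redundant pair counts as up when at least one of its members is up, and the ERP service needs all three pairs. The host names (db1, web1, lb1 and so on) and the erp_is_up function are hypothetical and chosen only for illustration; this is not how Alignak evaluates rules internally.

```python
# Hypothetical host names for the three redundant pairs of the ERP example.
PAIRS = [("db1", "db2"), ("web1", "web2"), ("lb1", "lb2")]


def erp_is_up(host_up):
    """Return True when at least one host of every pair is up.

    host_up maps a host name to True (up) or False (down).
    """
    return all(any(host_up[name] for name in pair) for pair in PAIRS)


all_up = {name: True for pair in PAIRS for name in pair}
one_db_down = dict(all_up, db1=False)
both_dbs_down = dict(one_db_down, db2=False)

print(erp_is_up(all_up))         # True
print(erp_is_up(one_db_down))    # True: db2 still serves requests
print(erp_is_up(both_dbs_down))  # False: the database tier is gone
```

Losing one member of a pair leaves the aggregated state up; losing both members of any pair brings it down.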

With business rules, you can have an “indicator” representing the “aggregated service” state for
the ERP service! Alignak already checks all of the IT components one by one including processing
for root cause analysis from a host and service perspective.

A business rule is a simple service (or a host) with a “special” check_command named bp_rule.

This makes it compatible with all your current habits and UIs. Because the aggregated state is
treated like any other host or service state, you can get notifications, actions and
escalations. This means you can have contacts that receive only the notifications
relevant to their role.

Warning

You must NOT define the bp_rule command: it is purely internal and Alignak already defines it for you.

Here is a configuration example for an ERP service, attached to a dummy host named “servicedelivery”.

A complete service delivery view should include an aggregated view of the end user
availability perspective states, end user performance perspective states, IT component states,
application error states, application performance states. This aggregated state can then be used
as a metric for Service Management (basis for defining an SLA).

The Xof: expression may have different values depending on the needs.
The supported expressions are described below:

a positive integer, which means “at least X host/services should be UP/OK”

a positive percentage, which means “at least X percent of hosts/services should be UP/OK”.

This percentage expression may be combined with grouping expression expansion to build expressions
such as “95 percent of the web front ends should be up”. This way, adding hosts to the web
frontend hostgroup is sufficient, and the QoS remains the same.

a negative integer, which means “at most X host/services may be down”

a negative percentage, which means “at most X percent of hosts/services may be down”.

This percentage expression may be combined with grouping expression expansion to build expressions
such as “5 percent of the web front ends may be down”. This way, adding hosts to the web
frontend hostgroup is sufficient, and the QoS remains the same.
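
The four threshold forms listed above can be modelled in a few lines of Python. This is only a sketch of the semantics as described in this section, assuming each element is reduced to a simple up/down flag; the function name xof_is_ok is hypothetical and the code is not taken from Alignak.

```python
def xof_is_ok(threshold, up_flags):
    """Tell whether an "X of:" threshold is met.

    threshold: "3", "95%", "-2" or "-5%" (the part written before "of:").
    up_flags:  one boolean per element, True when it is UP/OK.
    """
    total = len(up_flags)
    up = sum(1 for flag in up_flags if flag)
    down = total - up

    text = threshold.strip()
    is_percent = text.endswith("%")
    value = int(text.rstrip("%"))

    if is_percent:
        if value >= 0:
            # At least X percent of the elements must be up.
            return up * 100 >= value * total
        # At most |X| percent of the elements may be down.
        return down * 100 <= -value * total
    if value >= 0:
        # At least X elements must be up.
        return up >= value
    # At most |X| elements may be down.
    return down <= -value


# A group of 20 web front ends with a single failure.
web = [True] * 19 + [False]
print(xof_is_ok("95%", web))                     # True: 19 of 20 is exactly 95%
print(xof_is_ok("-5%", web))                     # True: 1 of 20 is exactly 5%
print(xof_is_ok("2", [True, False, True]))       # True
print(xof_is_ok("-1", [True, False, False]))     # False: two elements are down
```

Because the percentage forms are computed against the current size of the group, adding elements to the group changes the absolute tolerance without any change to the expression.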

With the Xof: form, the only case where you get a “warning” (= “degraded but not dead”)
is when all your elements are in WARNING state. But you may want to be in WARNING if 1 of your
3 HTTP servers is CRITICAL: the service is still running, but in a degraded state.

For this you can use the extended operator X,Y,Z of:

X: minimum number of OK states to get an overall OK state

Y: minimum number of WARNING states to get an overall WARNING state

Z: minimum number of CRITICAL states to get an overall CRITICAL state

State processing is done in the following order:

Is OK possible?

Is CRITICAL possible?

Is WARNING possible?

If none is possible, set OK.

Here is an example of a business rule over 5 services A, B, C, D and E: 5,1,1 of:A|B|C|D|E
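
The processing order above can be traced with a short Python sketch. It is a literal reading of the steps listed in this section, assuming each of the three counters only counts elements in exactly that state; the function name xyz_of_state is hypothetical and the actual Alignak implementation may count states differently.

```python
from collections import Counter

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def xyz_of_state(x, y, z, states):
    """Evaluate an "X,Y,Z of:" rule following the documented order."""
    counts = Counter(states)
    if counts[OK] >= x:          # 1. is OK possible?
        return OK
    if counts[CRITICAL] >= z:    # 2. is CRITICAL possible?
        return CRITICAL
    if counts[WARNING] >= y:     # 3. is WARNING possible?
        return WARNING
    return OK                    # 4. none is possible: set OK


# 5,1,1 of: five services A, B, C, D and E
print(xyz_of_state(5, 1, 1, [OK, OK, OK, OK, OK]))        # OK
print(xyz_of_state(5, 1, 1, [WARNING, OK, OK, OK, OK]))   # WARNING
print(xyz_of_state(5, 1, 1, [CRITICAL, OK, OK, OK, OK]))  # CRITICAL
# 4,1,1 of: one WARNING is tolerated because four services are still OK
print(xyz_of_state(4, 1, 1, [WARNING, OK, OK, OK, OK]))   # OK
```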

Sometimes, you do not want to explicitly list the hosts/services contained in a business rule,
but prefer to use a grouping expression such as hosts from the hostgroup xxx,
services holding label yyy or hosts whose name matches regex zzz.

To do so, it is possible to use a grouping expression which is expanded into hosts or services.
The supported expressions use the following syntax:

flag:expression

The flag is a single character qualifying the expansion type. The supported types (and associated flags) are described in the table below.

Labels are arbitrary names which may be set on any host or service using the label property.

Tags are the template names inherited by hosts or services, usually coming from packs.

It is possible to combine host and service expansion expressions to build complex business rules.

Note

A business rule expression must always be made of a host expression (a selector)
AND a service expression (also a selector) separated by a comma when looking at service status.
Otherwise, there is no way to distinguish a host status from a service status in the expression.
In the servicegroup flag case, as you do not want to apply any filter on the host
(you want ALL services which are members of the XXX service group, whichever host they are bound to),
you may use the * host selector expression. The correct expression syntax should be:
bp_rule!*,g:my-servicegroup
The same rule applies to other service selectors (l, r, t, and so on).

As with any host or service check, a business rule in a non-OK state may send
notifications depending on its notification_options directive. But what if the underlying
problems are known and have been acknowledged? The default behaviour is to continue sending notifications.

This may be what you need, but what if you want the business rule to stop sending notifications?

Imagine your business rule is composed of all your site’s web front ends. If a host fails, you
want to know it, but once someone starts to fix the issue, you don’t want to be notified anymore.
A possible solution is to acknowledge the business rule itself. But if you do so, any other
failing host will not trigger a notification. Another solution is to enable smart notifications on the business rule check.

Smart notifications disable notifications on a business rule once all its problems are
acknowledged. If a new problem occurs, notifications are enabled again for as long as it has not been acknowledged.

To enable smart notifications, simply set business_rule_smart_notifications to 1.

Downtimes are a bit trickier to handle. While acknowledgements are necessarily set by humans,
downtimes may be set automatically (for instance, by maintenance periods). You may still want
to be notified during downtime periods. As a consequence, downtimes are not taken into account by
smart notification processing, unless explicitly told to do so.

To enable downtimes in smart notifications processing, simply set business_rule_downtime_as_ack to 1.
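
The interaction between the two options can be summarised with a small Python model. It is a sketch of the behaviour described above, not Alignak code: the Problem class and the should_notify function are hypothetical, and the two keyword arguments merely mirror the business_rule_smart_notifications and business_rule_downtime_as_ack directives.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """One failing host or service under a business rule (hypothetical model)."""
    name: str
    acknowledged: bool = False
    in_downtime: bool = False


def should_notify(problems, smart_notifications=False, downtime_as_ack=False):
    """Decide whether the business rule may still send notifications."""
    if not problems:
        return False  # nothing is failing, the rule is OK
    if not smart_notifications:
        return True   # default behaviour: keep notifying

    def handled(problem):
        # A downtime only counts as an acknowledgement when asked to.
        return problem.acknowledged or (downtime_as_ack and problem.in_downtime)

    # Stay silent only when every underlying problem is handled.
    return not all(handled(problem) for problem in problems)


acked = Problem("web1", acknowledged=True)
fresh = Problem("web2")
maint = Problem("web3", in_downtime=True)

print(should_notify([acked]))                                    # True
print(should_notify([acked], smart_notifications=True))          # False
print(should_notify([acked, fresh], smart_notifications=True))   # True
print(should_notify([acked, maint], smart_notifications=True))   # True
print(should_notify([acked, maint], smart_notifications=True,
                    downtime_as_ack=True))                       # False
```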

Another useful application of business rules is consolidated services. Imagine you have a large web
cluster, composed of hundreds of nodes. If a small portion of the nodes fail, you may receive a
large number of notifications, which is not convenient. To prevent this, you may use a business
rule (looking like bp_rule!g:web,...). If you disable notifications by setting
notification_options to n on the underlying hosts or services, you would receive a single
notification listing all the failing nodes at once, which may be clearer.

To avoid having to manually set notification_options on each node, you may use two convenient
directives on the business rule side: business_rule_host_notification_options which enforces
notification options of underlying hosts, and business_rule_service_notification_options which
does the same for services.

This feature, combined with the convenience of packs and smart notifications, makes it very easy to build large consolidated services.

In the previous example, HTTP/HTTPS services come from the http pack. If one or more http
servers fail, a single notification would be sent, rather than one per failing service.

Warning

It would be very tempting in this situation to acknowledge the consolidated service
if a notification is sent. Never do so, as any new failure would not be
reported. You still have to acknowledge each independent failure.
Take care to explain this to people in charge of the operations.

It is possible to include macros in a business rule expression, as you would do for a normal
check command definition. You may for instance define a custom macro on the host or service
holding the business rule, and use it in the expression.

Combined with macro modulation, this allows you to define consolidated services with variable fault tolerance thresholds depending on the timeperiod.

Imagine your web frontend cluster is composed of dozens of servers serving the web site. If one
fails, this would not impact the service much. During the day, when the complete team is
at work, a single failure should be notified and fixed immediately. But during the night, you
may consider that losing, let’s say, up to 5% of the cluster has no impact on the QoS: thus waking
up the on-call guy is not useful.

You may handle that with a consolidated service using macro modulation combined with an X of: expression.

In the previous example, during the day, we’re outside the modulation period. The _XOF_WEB macro is
not defined, so the resulting business rule is g:web,g:HTTPS?. During the night, the macro is
set to a value, so the resulting business rule is -5% of: g:web,g:HTTPS?, which allows losing 5%
of the cluster silently.

By default, business rule checks have no output as there’s no real script or binary behind them.
But it is still possible to control their output using a templating system.

To do so, you may set the business_rule_output_template option on the host or service holding
the business rule. This attribute may contain any macro. Macro expansion works as follows:

All macros outside the $( and )$ sequences are expanded using attributes set on the host or service holding the business rule.

All macros between the $( and )$ sequences are expanded for each underlying problem using its attributes.

All macros defined on hosts or services composing or holding the business rule may be used in
the outer or inner part of the template respectively.

To ease writing output templates for business rules made of both hosts and services, 3 convenience
macros having the same meaning for each type may be used: STATUS, SHORTSTATUS, and
FULLNAME, which expand respectively to the host or service status, its abbreviated status
and its full name (host_name for hosts, or host_name/service_description for services).
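
The two-level expansion described above can be illustrated with a short Python sketch. It assumes macros are written $NAME$ and that every macro value is supplied in a plain dictionary; the render function, the regular expressions and the sample data are hypothetical and only mimic the documented behaviour, they are not the Alignak templating code.

```python
import re

# $NAME$ macros, and the $( ... )$ section repeated for each problem.
MACRO = re.compile(r"\$([A-Z_]+)\$")
INNER = re.compile(r"\$\((.*?)\)\$", re.DOTALL)


def expand(text, attributes):
    """Replace each known $NAME$ macro; unknown macros are left untouched."""
    return MACRO.sub(lambda m: str(attributes.get(m.group(1), m.group(0))), text)


def render(template, holder, problems):
    """Expand outer macros with the holder, inner ones once per problem."""
    # With one capture group, split() alternates outer and inner parts.
    parts = INNER.split(template)
    output = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            output.append(expand(part, holder))
        else:
            output.extend(expand(part, problem) for problem in problems)
    return "".join(output)


holder = {"STATUS": "CRITICAL"}
problems = [
    {"SHORTSTATUS": "C", "FULLNAME": "web1/HTTP"},
    {"SHORTSTATUS": "W", "FULLNAME": "web2/HTTP"},
]
template = "$STATUS$: $($SHORTSTATUS$ $FULLNAME$; )$"
print(render(template, holder, problems))
# CRITICAL: C web1/HTTP; W web2/HTTP; 
```

The outer $STATUS$ is resolved once from the element holding the rule, while the section between $( and )$ is emitted once per underlying problem.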

Example:

Imagine you want to build a consolidated service whose notifications contain links to the
underlying problems in the WebUI, so that they can be acknowledged without having to search.
You may use a template looking like: