Structure of the Health Rule Wizard

Affects: Sets the entities evaluated by the health rule. The options presented vary according to the health rule type set in the Overview panel.

Critical Condition: Sets the conditions and whether all or any of them must be true for a health rule violation to exist, as well as the evaluation scope (only for business transaction and node health policies defined at the tier level). The panel also includes an expression builder for creating complex expressions that contain multiple metrics.

You can navigate among these panels using the Back and Next buttons at the bottom of each panel or by clicking their entries in the left panel of the wizard. Configure the panels in order, because the health rule type you set in the Overview panel determines the affected entities available in the Affects panel and the metrics available in the Condition panels.

Use the Health Rules Wizard

This section describes the procedure for creating health rules of one of the standard types.

Configure Generic Health Rule Settings

You configure generic settings in the Overview panel.

Enter a name for the health rule. If the rule already has a name, you can change it.

Check Enabled to enable the rule; clear the check box to disable it.

Select a health rule type by clicking its name in the list. This setting determines which metrics are offered for configuration in later panels of the wizard, so you must select a health rule type before continuing to other panels.

If the health rule is always (24/7) enabled, check the Always check box. If the health rule is enabled only at certain times, clear the Always check box and either:

Select a predefined time interval from the During these times drop-down menu, or

Click + to create a new schedule. See Create a Health Rule Schedule below.

From the Use the last <> minutes of data drop-down menu, select a value between 1 and 360 minutes for the evaluation window. This is the amount of recent data used to determine whether a health rule violation exists, and it applies to both critical and warning conditions. See Health Rule Evaluation Window.

In the Wait Time after Violation section, enter the number of minutes to wait before evaluating the rule again for the same affected entity in which the violation occurred. See Health Rule Wait Time After Violation.

Save your configuration.
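The interplay between the evaluation window and the wait time can be pictured with a small model. The sketch below is an illustration under stated assumptions, not how the Controller is implemented: it assumes one data point per minute, a hypothetical static threshold compared against the window average, and that the wait time is tracked per affected entity, as the text describes. The names `RuleSettings`, `Evaluator`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSettings:
    window_minutes: int  # "Use the last <> minutes of data" (1-360)
    wait_minutes: int    # "Wait Time after Violation"

    def __post_init__(self):
        if not 1 <= self.window_minutes <= 360:
            raise ValueError("evaluation window must be between 1 and 360 minutes")


@dataclass
class Evaluator:
    settings: RuleSettings
    threshold: float  # hypothetical: violated if the window average exceeds this value
    last_violation: dict = field(default_factory=dict)  # entity -> minute of last violation

    def evaluate(self, entity, now, series):
        """Return True/False, or None if the entity is skipped or has no data in the window."""
        last = self.last_violation.get(entity)
        if last is not None and now - last < self.settings.wait_minutes:
            return None  # still inside the wait time for this entity
        start = now - self.settings.window_minutes
        window = [v for minute, v in series.items() if start < minute <= now]
        if not window:
            return None
        violated = sum(window) / len(window) > self.threshold
        if violated:
            self.last_violation[entity] = now  # wait time is tracked per affected entity
        return violated


settings = RuleSettings(window_minutes=5, wait_minutes=10)
ev = Evaluator(settings, threshold=100.0)
series = {m: 150.0 for m in range(1, 31)}  # one data point per minute
print(ev.evaluate("node-1", 10, series))  # True: violation
print(ev.evaluate("node-1", 15, series))  # None: skipped during the 10-minute wait time
print(ev.evaluate("node-2", 15, series))  # True: other entities are still evaluated
```

Whether the Controller treats the end of the wait time as inclusive or exclusive is not stated in the text; the strict `<` comparison here is an assumption.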

Create a Health Rule Schedule

In the Overview panel of the Health Rule Wizard, clear the Always check box if it is checked.

Click + to the right of the During these times dropdown list. Enter a name for the schedule.

Enter an optional description of the schedule.

Enter the start and end times for the schedule as cron expressions. For example, a custom schedule with a start time of "* * 13 ? * 2-6" and an end time of "* * 15 ? * 2-6" directs the health rule to be evaluated from 1 pm to 3 pm, Monday through Friday.

For additional examples, select a built-in schedule from the During these times menu and click the View Schedule icon to see the cron expression behind the built-in schedule.

The Controller evaluates cron expressions in PDT, and their format is based on Quartz Scheduler cron expressions. For more information, see the Quartz Scheduler documentation.

Save your configuration.

To delete a health rule schedule you created, select it in the During these times menu and click the minus icon to the right of the menu.
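To make the field positions in the schedule example explicit, the following sketch splits a Quartz-style cron expression into its named fields. Quartz uses the order seconds, minutes, hours, day-of-month, month, day-of-week (plus an optional year), and numbers days of the week 1-7 starting with Sunday, which is why "2-6" means Monday through Friday. This is only an illustrative parser that handles numeric lists and ranges in the day-of-week field; it is not the Controller's parser, and the function names are hypothetical.

```python
# Quartz cron field order: 6 required fields, optional 7th (year).
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")
# Quartz numbers days of the week 1-7, with 1 = Sunday.
QUARTZ_DAYS = {1: "SUN", 2: "MON", 3: "TUE", 4: "WED", 5: "THU", 6: "FRI", 7: "SAT"}


def parse_quartz(expr):
    """Split a Quartz cron expression into a dict keyed by field name."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    if len(parts) == 7:
        fields["year"] = parts[6]
    return fields


def expand_days(day_field):
    """Expand a numeric day-of-week field such as '2-6' or '2,4' into day names."""
    days = []
    for item in day_field.split(","):
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-"))
            days.extend(range(lo, hi + 1))
        else:
            days.append(int(item))
    return [QUARTZ_DAYS[d] for d in days]


start = parse_quartz("* * 13 ? * 2-6")
print(start["hours"], expand_days(start["day_of_week"]))
# 13 ['MON', 'TUE', 'WED', 'THU', 'FRI']
```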

Configure Affected Entities

The Affects panel lets you define what your health rule affects. The choices offered depend on the health rule type you chose in the Overview panel. To use the panel:

Use the drop-down menu to select the entities affected by this health rule. The affected entity and the choices presented in the menu depend on the health rule type configured in the Overview panel. See Entities Affected by a Health Rule for the types of entities that each health rule type can affect.

If you select entities based on matching criteria, specify the criteria. For nodes, you can restrict the selection using criteria such as meta-info, environment variables, and JVM system environment properties. Meta-info includes key-value pairs for the following keys:

key: supportsDevMode

key: ProcessID

key: appdynamics.ip.addresses

any key passed to the agent in the appdynamics.agent.node.metainfo system property

If you are configuring a JMX health rule, select the JMX objects that the health rule is evaluated on. See JMX Health Rules.
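Meta-info matching can be thought of as filtering nodes by key-value pairs. The sketch below is a simplified illustration that assumes exact equality on every key in the criteria; the match operators the Controller actually offers are not modeled here, and the node names and values are hypothetical.

```python
def matches(node_meta, criteria):
    """True if every criteria key is present in the node's meta-info with an equal value."""
    return all(node_meta.get(key) == value for key, value in criteria.items())


def select_nodes(nodes, criteria):
    """Return the names of nodes whose meta-info satisfies all criteria."""
    return [name for name, meta in nodes.items() if matches(meta, criteria)]


# Hypothetical nodes and meta-info values, for illustration only.
nodes = {
    "inventory-1": {"supportsDevMode": "true", "ProcessID": "4242",
                    "appdynamics.ip.addresses": "10.0.0.5"},
    "inventory-2": {"supportsDevMode": "false", "ProcessID": "5151",
                    "appdynamics.ip.addresses": "10.0.0.6"},
}
print(select_nodes(nodes, {"supportsDevMode": "true"}))  # ['inventory-1']
```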

Configure Health Rule Conditions

The high-level process for configuring conditions is:

Determine the number and kind of metrics the health rule should evaluate. For each performance metric you want to use, create a condition.

You can use a single condition component or multiple condition components for a single condition state.

You can use values based on complex mathematical expressions.

Decide whether the health rule is violated if all of the tests are true or if any single test is true.

For business transaction performance health rules and node health rule types that specify affected entities at the tier level, decide how many of the nodes must be violating the health rule to produce a violation event. See Health Rule Evaluation Scope.

To configure a critical condition, use the Critical Condition window. To configure a warning condition, use the Warning Condition window.

The configuration processes for critical and warning conditions are identical.

You can copy the settings between the Critical and Warning Condition panels and then edit only the fields you want to change. For example, if you have already defined a critical condition and want to create a similar warning condition, click Copy from Critical Condition in the Warning Condition window to populate the fields with the settings from the critical condition.

To Create a Condition:

In the Critical Condition or Warning Condition window, click + Add Condition to add a new condition component. The row defining the component opens. See To Configure a Condition Component. Continue adding components to the condition as needed.

From the drop-down menu above the components, select All if all of the components must be true for the rule to be violated. Select Any if a single true component is enough for a health rule violation to exist.

For health rules of the following types:

Business Transaction Performance

Node Health-Hardware, JVM, CLR

Node Health-Transaction Performance

you must specify the evaluation scope:

If the "Health Rule will violate if the conditions above evaluate to true" section is visible, click the appropriate radio button to set the evaluation scope.

If you select percentage of nodes, enter the percentage. If you select number of nodes, enter the absolute number of nodes.
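The two steps above, combining components with All or Any and then applying a node-count scope, can be sketched as follows. This is a simplified model, not the Controller's algorithm: it assumes each node's components have already been evaluated to True or False, and it assumes the scope is met when at least the configured percentage or number of nodes violate. The function names and node data are hypothetical.

```python
def node_violates(component_results, mode):
    """Combine one node's condition-component results using the All/Any setting."""
    if mode == "All":
        return all(component_results)
    if mode == "Any":
        return any(component_results)
    raise ValueError(f"unknown mode: {mode!r}")


def tier_violates(per_node_results, mode, percentage=None, number=None):
    """Apply an evaluation scope of "percentage of nodes" or "number of nodes".

    Assumption: the scope is met when AT LEAST the given percentage or number
    of nodes violate.
    """
    if (percentage is None) == (number is None):
        raise ValueError("set exactly one of percentage or number")
    if not per_node_results:
        return False
    violating = sum(node_violates(r, mode) for r in per_node_results.values())
    if percentage is not None:
        return violating * 100 >= percentage * len(per_node_results)
    return violating >= number


# Hypothetical component results (True = component is true) for four nodes.
results = {
    "n1": [True, True],
    "n2": [True, False],
    "n3": [False, False],
    "n4": [True, True],
}
print(tier_violates(results, "All", percentage=50))  # 2 of 4 nodes violate -> True
print(tier_violates(results, "Any", number=3))       # 3 of 4 nodes violate -> True
```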

To Configure a Condition Component:

In the first field of the condition row, enter a name for the condition. This name is used in the generated notification text and in the AppDynamics console to identify the violation.

To select the metric on which the condition is based, do one of the following:

To specify a simple metric, click the metric icon to open a small metric browser and select Specify a Metric from the Metric Tree. The browser displays metrics appropriate to the health rule type. Select the metric to monitor and click Select Metric, or

To build an expression using multiple metric values, click the gear icon at the end of the row and select Use a mathematical expression of 2 or more metric values. This opens the mathematical expression builder, where you can construct the expression to use as the metric. See To Build an Expression for details.

From the Value drop-down menu before the metric, select the qualifier to apply to the metric from the following options:

Minimum: The minimum value reported across the configured evaluation time length. Not all metrics have this type.

Maximum: The maximum value reported across the configured evaluation time length. Not all metrics have this type.

Value: The arithmetic average of all metric values reported across the configured evaluation time length. This value is based on the type of the metric.

Sum: The sum of all the metric values reported across the configured evaluation time length.

Count: The number of times the metric value has been measured across the configured evaluation time length.

Group Count: The number of nodes contributing to a metric value, generally relevant for application or tier level metrics.

Current: The value for the current minute.
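To make the qualifier definitions concrete, the sketch below computes each one over a list of raw observations. It assumes a hypothetical data model of `(minute, node, value)` tuples and approximates Current as the average of the observations in the newest minute; the Controller's own per-minute rollups are not modeled, and, as noted above, the real Value depends on the metric type.

```python
def qualifiers(observations, now, window_minutes):
    """Compute each qualifier over (minute, node, value) observations in the window."""
    in_window = [(m, n, v) for m, n, v in observations if now - window_minutes < m <= now]
    values = [v for _, _, v in in_window]
    if not values:
        raise ValueError("no data in the evaluation window")
    latest = [v for m, _, v in in_window if m == now]
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Value": sum(values) / len(values),                 # arithmetic average
        "Sum": sum(values),
        "Count": len(values),                               # number of measurements
        "Group Count": len({n for _, n, _ in in_window}),   # distinct contributing nodes
        "Current": sum(latest) / len(latest) if latest else None,  # newest minute only
    }


# Hypothetical observations from two nodes over three minutes.
obs = [(1, "n1", 10.0), (1, "n2", 30.0), (2, "n1", 20.0), (2, "n2", 40.0), (3, "n1", 50.0)]
q = qualifiers(obs, now=3, window_minutes=2)  # minutes 2 and 3 only
print(q["Minimum"], q["Maximum"], q["Sum"], q["Count"], q["Group Count"], q["Current"])
# 20.0 50.0 110.0 3 2 50.0
```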

From the drop-down menu after the metric, select the type of comparison by which the metric is evaluated.

To limit the effect of the health rule to conditions during which the metric is within a defined distance (standard deviations or percentages) from the baseline, select Within Baseline from the menu. To limit the effect of the health rule to when the metric is not within that defined distance, select Not Within Baseline. Then select the baseline to use, the numeric qualifier of the unit of evaluation and the unit of evaluation. For example:

Within Baseline of the Default Baseline by 3 Baseline Standard Deviations

To compare the metric with a static literal value, select < Specific Value or > Specific Value from the menu, then enter the specific value in the text field. For example:

Value of Errors per Minute > 100

To compare the metric with a baseline, select < Baseline or > Baseline from the drop-down menu, and then select the baseline to use, the numeric qualifier of the unit of evaluation and the unit of evaluation. For example:

Maximum of Average Response Time is > Baseline of the Daily Trend by 3 Baseline Standard Deviations

The "baseline percentage" is the percentage above or below the established baseline at which the condition is triggered. For example, if the baseline value is 850 and you define a baseline percentage of "> 1%", the condition is true if the value is greater than 850 + (850 x 0.01), or 858.5. In addition, to prevent small sample sets from triggering health rule violations, these rules are not evaluated if the load (the number of times the value has been measured) is less than 1000. So if, for example, you specify a very short evaluation window, the rule may not be violated even when the conditions are met, because the load is not large enough.
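The baseline comparisons described above reduce to simple arithmetic. The sketch below expresses the standard-deviation forms and the percentage form, including the minimum load of 1000 for percentage rules stated in the text. It is an illustration with hypothetical function names; how the Controller computes the baseline and its standard deviation is not modeled, so both are passed in as plain numbers.

```python
MIN_LOAD = 1000  # from the text: baseline-percentage rules need a load of at least 1000


def exceeds_baseline_sd(value, baseline, std_dev, k):
    """'> Baseline ... by k Baseline Standard Deviations'."""
    return value > baseline + k * std_dev


def within_baseline_sd(value, baseline, std_dev, k):
    """'Within Baseline ... by k Baseline Standard Deviations'."""
    return abs(value - baseline) <= k * std_dev


def exceeds_baseline_pct(value, baseline, pct, load):
    """'> Baseline by pct %'. Returns None (not evaluated) when the load is too small."""
    if load < MIN_LOAD:
        return None
    return value > baseline + baseline * pct / 100


print(exceeds_baseline_pct(858.6, 850, 1, load=5000))  # True: threshold is 858.5
print(exceeds_baseline_pct(900.0, 850, 1, load=999))   # None: load below 1000
print(within_baseline_sd(70, 100, 10, 3))              # True: 30 <= 3 * 10
```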

The Evaluate to true on no data option controls how the condition is evaluated when any metric on which it is based returns no data. By default, when no data is returned, the condition evaluates to unknown. If the health rule requires all conditions to be true, a condition that evaluates to unknown can prevent the health rule from being violated, and therefore from triggering an action.

If you want the condition to evaluate to true whenever a metric on which it depends returns no data, check the Evaluate to true on no data option. Note, however, that this option does not affect the evaluation to unknown when there is not yet enough data for the rule to evaluate. For example, if the health rule evaluates the last 30 minutes of data and a new node is added, the condition evaluates to unknown for the first 30 minutes even if the Evaluate to true on no data box is checked.

If you want all of the conditions to evaluate to true on no data, you can check Evaluate all as true on no data instead of setting the option for each condition separately. If you check this option and then add more conditions, the new conditions are not affected automatically. To apply the option to them, clear and then check the Evaluate all as true on no data check box again after you finish adding conditions.
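The no-data behavior described above can be modeled with three outcomes: true, false, and unknown (represented here as `None`). The sketch below is an assumption-laden illustration: it treats an empty data list as "no data", uses a hypothetical `enough_history` flag for the "not yet enough data" case, compares the window average against a hypothetical threshold, and assumes that only a definite true counts toward a violation.

```python
def evaluate_condition(data, threshold, true_on_no_data, enough_history=True):
    """Return True, False, or None (unknown) for one condition."""
    if not enough_history:
        return None  # e.g. a new node: unknown regardless of the no-data option
    if not data:
        return True if true_on_no_data else None  # metric returned no data
    return sum(data) / len(data) > threshold


def rule_violated(results, mode):
    """Assumption: unknown (None) never counts as true when combining conditions."""
    truth = [r is True for r in results]
    return all(truth) if mode == "All" else any(truth)


cpu = evaluate_condition([150, 170], threshold=100, true_on_no_data=False)  # True
mem = evaluate_condition([], threshold=100, true_on_no_data=False)          # None
print(rule_violated([cpu, mem], "All"))  # False: the unknown condition blocks the violation
mem = evaluate_condition([], threshold=100, true_on_no_data=True)           # True
print(rule_violated([cpu, mem], "All"))  # True
```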

Click Save when done.

Using Health Rule Conditions to evaluate agent availability metrics can result in false positives. For example:

Agents may be unable to connect to the Controller for a couple of minutes because of communication errors.

Data may be delayed for a couple of minutes due to latency issues.

You can avoid false positives caused by occasional 1-2 minute metric loss from network issues or late-arriving data by configuring your health rule as follows:

In the Affects panel, select Nodes. You can select Tiers instead, but we generally recommend Nodes.

Select Node Health-Hardware, JVM, CLR as the health rule type.

Use the last 5 minutes of data, with a wait time after violation of 10 minutes.

Set your condition to be the Sum of < Specific Value of 3.

This generates a violation when the agent is down for more than 2 minutes within the last 5 minutes.
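The arithmetic behind the Sum < 3 condition is shown below as a sketch. It assumes, hypothetically, that the availability metric reports 1 for each minute the agent is connected and that missing minutes contribute nothing to the Sum. With a 5-minute window, the Sum drops below 3 only when 3 or more minutes are missing, so a 1-2 minute gap does not trigger a violation.

```python
def agent_down(availability_by_minute, now, window=5, threshold=3):
    """Sum the per-minute availability over the window and compare it with the threshold.

    Assumption: the agent reports 1 per connected minute; missing minutes count as 0.
    """
    minutes = range(now - window + 1, now + 1)
    total = sum(availability_by_minute.get(m, 0) for m in minutes)
    return total < threshold


data = {m: 1 for m in range(1, 11)}  # hypothetical: agent reported every minute 1-10
print(agent_down(data, now=10))      # False: Sum is 5
for m in (8, 9):
    del data[m]                      # 2-minute gap, e.g. late-arriving data
print(agent_down(data, now=10))      # False: Sum is 3
del data[7]                          # a third missing minute
print(agent_down(data, now=10))      # True: Sum is 2
```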

To remove a condition component:

Click the delete icon for the condition component you want to remove.

To build an expression:

To access the expression builder to create a complex expression as the basis of a condition, click the gear icon at the end of the row and select Use a mathematical expression of 2 or more metric values.

In the expression builder, use the Expression pane to construct the expression. Use the Variable Declaration pane to define variables based on metrics to use in the expression.

For example, this is a metric to measure the percent of slow business transactions. See the screenshot that follows for the UI location where each step is performed.
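The slow-transaction percentage comes down to dividing one declared variable by another. The sketch below shows that calculation in Python rather than in the expression builder's own syntax, which is not reproduced here. The variable names `slow` and `calls` are hypothetical, and it assumes they are declared from metrics such as the number of slow calls and calls per minute for the same business transaction.

```python
def percent_slow(variables):
    """Hypothetical expression: slow calls as a percentage of all calls.

    'slow' and 'calls' stand in for variables declared in the Variable
    Declaration pane; the metrics they map to are assumptions.
    """
    slow = variables["slow"]
    calls = variables["calls"]
    if calls == 0:
        return None  # avoid dividing by zero when there is no traffic
    return slow * 100 / calls


print(percent_slow({"slow": 5, "calls": 200}))  # 2.5
```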

Custom Metrics in Multiple Entities

To create a health rule on a custom metric in a single business transaction, node, or overall application performance, you specify the health rule type as "custom" and when you configure the condition component, in the Select Metric window choose Specify a Metric from the Metric Tree and select the metric from the embedded metric browser.

A different use case is a rule that evaluates a custom metric that exists across several entities, for example across several nodes. You want a single health rule to cover all of them rather than a separate health rule for each node. In this case, specify the custom metric by its relative metric path instead of selecting it from the embedded metric browser.

First get the relative path to the metric and then configure the health rule using that relative path.

To get the relative metric path for a multi-entity metric:

Navigate to the Metric Browser by selecting Metric Browser in the left navigation pane.

Select the metric that you want to use for the condition.

Right-click and select Copy Full Path.

Save this value in a file from which you can copy it later.

The following example gets the metric path for the CPU %Busy metric for the Inventory Server tier. This would be appropriate to use in a health rule that affects all the nodes in that tier.

To configure a health rule that evaluates the custom metric over multiple entities:

In the Overview panel of health rule wizard choose the health rule type for the kind of entity that you are monitoring.

In the Affects panel select the affected entity.

When you create the condition component that uses the metric, in the Select Metric window choose Specify a Relative Path Metric.

Crop the relative metric path that you saved from the metric browser by doing one of the following:

For all health rule types except Node Health-Hardware, JVM, CLR and Custom, crop the path to the metric name alone, for example, Average Wait Time (ms).

For Node Health-Hardware, JVM, CLR and Custom health rule types, crop the path to everything after the entity, for example, everything after the node name. In the example below the cropped path would look like this.

Paste the cropped relative metric path in the relative metric path field of the Select Metric window.

Click Select Metric.
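The cropping rules above can be expressed as a small string operation on the pipe-separated path that Copy Full Path produces. The sketch below is illustrative only: the full path layout, tier name usage, and node name are hypothetical examples, and it assumes the entity name appears exactly once as a path segment.

```python
NODE_SCOPED_TYPES = {"Node Health-Hardware, JVM, CLR", "Custom"}


def crop_metric_path(full_path, rule_type, entity_name=None):
    """Crop a full (pipe-separated) metric path to a relative metric path."""
    segments = full_path.split("|")
    if rule_type in NODE_SCOPED_TYPES:
        # Keep everything after the entity, for example after the node name.
        if entity_name not in segments:
            raise ValueError(f"{entity_name!r} does not appear in the path")
        return "|".join(segments[segments.index(entity_name) + 1:])
    # All other health rule types: keep the metric name alone.
    return segments[-1]


# Hypothetical full path; the layout and node name are for illustration only.
cpu_path = ("Application Infrastructure Performance|Inventory Server|"
            "Individual Nodes|inventory-node-1|Hardware Resources|CPU|%Busy")
print(crop_metric_path(cpu_path, "Node Health-Hardware, JVM, CLR", "inventory-node-1"))
# Hardware Resources|CPU|%Busy
```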

Additional Considerations

Your application status is based on health rules for the current time range. If you disable old health rule policies, or enable new ones, you might see errors in red in your application status, even if there are no current critical events based on the new policies. To verify that your new or disabled health rule policies have taken effect, change the time range in your dashboard to a smaller, more recent timeframe.

When you are configuring health rules for business transactions with a very fast average response time (ART) such as 25 ms, using standard deviation as a criterion can cause the health rule to be violated too frequently. This is because a very small increase in response time can represent multiple standard deviations. In this case, consider adding a second condition that sets a minimum ART as a threshold. For example, if you don't want to be notified unless ART is over 50 ms, you could set your threshold as: ART > 2 Standard Deviations and ART > 50 ms.
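The two-condition approach for fast transactions can be checked with a few numbers. The sketch below assumes the two components are combined with All, and uses a hypothetical baseline of 25 ms with a standard deviation of 3 ms, so 2 standard deviations above the baseline is only 31 ms; the 50 ms floor keeps small fluctuations from causing a violation.

```python
def art_rule(art_ms, baseline_ms, std_dev_ms, k=2, floor_ms=50):
    """Two components combined with All: more than k standard deviations above
    the baseline AND above an absolute floor in milliseconds."""
    above_baseline = art_ms > baseline_ms + k * std_dev_ms
    above_floor = art_ms > floor_ms
    return above_baseline and above_floor


# Hypothetical fast transaction: baseline 25 ms, standard deviation 3 ms.
print(art_rule(35, 25, 3))  # False: 35 > 31, but not > 50
print(art_rule(60, 25, 3))  # True
```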

Similarly, when configuring health rules for calls-per-minute (CPM) metrics, the health rule may never be violated if the condition uses standard deviations below the baseline and the resulting threshold is below zero, because CPM can never be negative. In this case, consider adding a second condition that checks for a zero value and select Any, so that the rule is violated when either condition is true, such as: CPM < 2 Standard Deviations or CPM < 1.
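The CPM case can be sketched the same way. Because CPM can never be negative, a lower bound below zero makes the standard-deviation test false for every value, so in this sketch the two components are combined with Any. The baseline of 4 CPM and standard deviation of 3 CPM are hypothetical values chosen to push the lower bound below zero.

```python
def cpm_rule(cpm, baseline, std_dev, k=2, floor=1):
    """Two components combined with Any: below the baseline by k standard
    deviations OR below an absolute floor (to catch zero traffic)."""
    below_baseline = cpm < baseline - k * std_dev  # never true if the bound is negative
    below_floor = cpm < floor
    return below_baseline or below_floor


# Hypothetical low-traffic transaction: baseline 4 CPM, standard deviation 3 CPM.
print(4 - 2 * 3)          # -2: no CPM value can fall below this bound
print(cpm_rule(0, 4, 3))  # True, via the CPM < 1 component
print(cpm_rule(5, 4, 3))  # False
```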