What is a Health Rule?

Health rules let you specify the parameters that represent what you consider normal or expected operations for your environment. The parameters rely on metric values, for example, the average response time for a business transaction or CPU utilization for a node.

The health statuses are critical, warning, normal, and unknown. When the performance of an entity affected by the rule violates the rule's conditions, a health rule violation exists.

When the health status of an entity changes, a health rule violation event occurs. Examples of health rule violation events are a health rule violation starting, ending, upgrading from warning to critical, or downgrading from critical to warning.
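The status transitions described above can be sketched as a small lookup. This is only an illustration: the function name, the severity ordering, and the event labels are assumptions made for the example, not Controller identifiers, and the unknown status is left out for brevity.

```python
from typing import Optional

# Severity order assumed for this sketch; "unknown" is deliberately left out.
SEVERITY = {"normal": 0, "warning": 1, "critical": 2}


def violation_event(previous: str, current: str) -> Optional[str]:
    """Return an illustrative event label for a status change, or None if unchanged."""
    before, after = SEVERITY[previous], SEVERITY[current]
    if before == after:
        return None
    if before == 0:
        return f"violation started ({current})"
    if after == 0:
        return "violation ended"
    if after > before:
        return "upgraded from warning to critical"
    return "downgraded from critical to warning"


print(violation_event("normal", "warning"))
print(violation_event("warning", "critical"))
print(violation_event("critical", "normal"))
```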

The health statuses of entities and health rule violations are surfaced in the controller user interface. A health rule violation event can also be used to trigger a policy, which can initiate automatic actions, such as sending alerting emails or running remedial scripts.

You create health rules using the health rule wizard, described in Configure Health Rules. The wizard groups commonly used system entities and related metrics to simplify setting up health rules. You can also use the default health rules provided by AppDynamics, either as is or modified.

Health Rule Scopes

The health rule scope determines the set of default health rule types. You can choose the scope to get a set of default health rule types for applications, servers, or databases. For example, when you choose a mobile application as the scope, you're given health rules such as crash rates and HTTP/network error rates.

If the health rule scope is an application, the default health rules cover areas such as business transaction performance and CPU and memory utilization.

From Alert & Respond > Health Rules, you can select one of the following health rule scopes from the drop-down list:

Applications

User Experience: Browser Apps

User Experience: Mobile Apps

Databases

Servers

Analytics

You can also create new health rules to add to the default set for each scope. For example, you may want an app starts health rule for your mobile application. That health rule is not part of the default set for the mobile app scope, so you would add it as a new health rule.

Health Rule Types

The health rule wizard groups health rules into types that are categorized by the entity that the health rule covers. This allows the wizard to display appropriate configuration items during the health rule creation process.

Pages: Groups metrics like DOM building time, JavaScript errors, etc. with the performance of application pages for the end user

IFrames: Groups metrics like first-byte time, requests per minute, etc. with the performance of iframes for the end user

AJAX Requests: Groups metrics like Ajax callback execution time, errors per minute, etc. with the performance of Ajax requests for the end user

Virtual Pages: Groups metrics like End User Response Time, Digest Cycles, HTML Download Time, DOM Building Time, etc. for virtual pages created with Angular. See AngularJS Support for information on what these metrics mean in the context of virtual pages.

User Experience-Mobile Apps

Mobile Apps: Groups metrics related to mobile app crashes, starts, and server calls as well as network requests and errors

Network Requests: Groups metrics like HTTP and network errors, request time, and requests per minute with network requests

Servers: Groups metrics related to hardware resources

Databases & Remote Services: Groups metrics related to response time, load, or errors with databases and other backends

Error Rates: Groups metrics related to exceptions, return codes, and other errors with applications or tiers

Information Points: Groups metrics like response time, load, or errors with information points

Service Endpoints: Java and .NET only; groups metrics like average response time, calls per minute, and errors per minute with service endpoints

Custom: Presents all the metrics collected by the agent that could affect a single business transaction, a single node or overall application performance. Use this type to create rules that evaluate custom metrics.

When you select one of these health rule types, the wizard offers you the metrics commonly associated with that type in an embedded browser.

Health Rule Schedules

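The metrics associated with a health rule are evaluated according to a schedule that you control. You can configure when the health rule is enabled, the evaluation window over which data is collected, and the wait time after a violation.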

Time evaluation for health rule schedules is based on the time zone of the Controller, regardless of where an app agent is situated. For example, if a Controller is in San Francisco but the app agent is in Dubai, Pacific Time applies to the health rule schedule.

All SaaS Controllers use Pacific Time (PT).
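To see what this means for a schedule, the sketch below converts a wall-clock time at the agent's location into the Controller's time zone. It uses Python's standard zoneinfo module. The function name and the sample date are made up for illustration, and the IANA zone names stand in for "Dubai" and "Pacific Time".

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def controller_local_time(agent_time: datetime, controller_tz: str) -> datetime:
    """Express a timezone-aware agent timestamp in the Controller's time zone."""
    return agent_time.astimezone(ZoneInfo(controller_tz))


# 09:00 on a Monday in Dubai (UTC+4) ...
dubai_morning = datetime(2024, 1, 15, 9, 0, tzinfo=ZoneInfo("Asia/Dubai"))
# ... is still Sunday evening for a Controller on Pacific Time (UTC-8 in January).
controller_time = controller_local_time(dubai_morning, "America/Los_Angeles")
print(controller_time)  # 2024-01-14 21:00:00-08:00
```

A schedule such as Weekday mornings would therefore not yet be active at that moment, even though it is a weekday morning where the agent runs.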

Health Rule Enabled Schedule

By default, health rules are always enabled. Alternatively, you can define schedules that limit when a health rule is enabled. Built-in schedules exist for:

End of business hours

Weekday lunch

Weekday mornings

Weekdays

Weekends

You can also configure your own schedules based on UNIX cron expressions using custom values.
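As a rough illustration of how a cron-style schedule selects times, the sketch below matches a timestamp against a simplified five-field expression (minute, hour, day of month, month, day of week). The text does not say which cron dialect the Controller accepts, so the field layout, the helper names, and the sample expression are all assumptions. The matcher only understands `*`, single numbers, ranges, and comma lists, and it requires every field to match.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a number, a range 'a-b', or a comma list of those."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls inside the simplified five-field expression."""
    minute, hour, day, month, weekday = expression.split()
    # Classic cron numbers Sunday as 0; Python's weekday() numbers Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(day, moment.day)
        and _field_matches(month, moment.month)
        and _field_matches(weekday, cron_weekday)
    )


# Hypothetical custom schedule: every minute from 09:00 to 17:59, Monday to Friday.
business_hours = "* 9-17 * * 1-5"
print(cron_matches(business_hours, datetime(2024, 1, 15, 10, 30)))  # a Monday
print(cron_matches(business_hours, datetime(2024, 1, 20, 10, 30)))  # a Saturday
```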

Health Rule Evaluation Window

The health rule evaluation window is the period of time over which the data used to evaluate the health rule is collected.

Different kinds of metrics may provide better results using different sets of data. You can manage how much data AppDynamics uses when it evaluates a particular health rule by setting the data collection time period. The default value is 30 minutes.

For metrics based on an average calculation, such as average response time, AppDynamics averages the response time over the evaluation window. A five-minute window means that the last five minutes of data is used to evaluate if the health rule was violated.

For metrics based on a sum calculation, such as the number of calls, AppDynamics uses the total number of calls counted during the evaluation window.
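The difference between the two calculations can be sketched as follows, assuming one sample per minute with the newest sample last. The function names and sample values are invented for the example and do not reflect how the Controller stores metric data.

```python
from statistics import mean


def window_values(samples: list, window_minutes: int) -> list:
    """Keep only the most recent `window_minutes` per-minute samples."""
    if window_minutes < 1:
        raise ValueError("window_minutes must be at least 1")
    return samples[-window_minutes:]


def evaluate_average(samples: list, window_minutes: int) -> float:
    """Average-based metric, e.g. average response time over the window."""
    return mean(window_values(samples, window_minutes))


def evaluate_sum(samples: list, window_minutes: int) -> float:
    """Sum-based metric, e.g. total number of calls over the window."""
    return sum(window_values(samples, window_minutes))


response_times_ms = [100, 120, 300, 310, 320, 330, 340]
calls_per_minute = [10, 20, 30, 40, 50, 60]

print(evaluate_average(response_times_ms, 5))  # mean of the last five samples: 320
print(evaluate_sum(calls_per_minute, 5))       # total of the last five samples: 200
```

With a five-minute window, the two oldest response-time samples no longer influence the result, which is why a shorter window reacts faster to recent changes.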

Health Rule Wait Time After Violation

The health rule wait time setting lets you control how often an event is generated while the conditions found to violate a health rule continue. If the Controller determines that a health rule has been violated, with a status of either Critical or Warning, an Open Critical or Open Warning event is generated. That event can be used to trigger any policies that match the health rule, and thus to initiate any actions that the policies require.

Once an Open event has occurred, the Controller continues to evaluate the status of the health rule every minute. If the Controller continues to detect the same violation, the violation remains open with the same status. A corresponding Continues Critical or Continues Warning event may be generated to link to any related policies.

But a Continues event every minute might be too noisy for your situation. The health rule's Wait Time after Violation setting is used to throttle how often these Continues events are generated for continuing health rule violations. The default is every 30 minutes.

To use Continues Critical and Continues Warning events, adjust the default Wait Time after Violation value to the desired frequency. Then configure a policy matching that health rule with the Health Rule Violation Continues - Warning and/or Health Rule Violation Continues - Critical events selected in the Health Rule Violation Events section of the policy settings.
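The throttling effect of the wait time can be sketched with a small simulation. It assumes the rule is evaluated once per minute and that a Continues event is emitted each time a full wait period has elapsed since the Open event. The exact timing inside the Controller is not specified here, so treat the function and its arithmetic as an illustration only.

```python
def continues_event_minutes(violation_minutes: int, wait_time_minutes: int) -> list[int]:
    """Minutes after the Open event at which a Continues event would be emitted."""
    if wait_time_minutes < 1:
        raise ValueError("wait_time_minutes must be at least 1")
    return [
        minute
        for minute in range(1, violation_minutes + 1)
        if minute % wait_time_minutes == 0
    ]


# A violation that stays open for 90 minutes with the default 30-minute wait time.
print(continues_event_minutes(90, 30))      # [30, 60, 90]
# The same violation with a one-minute wait time produces an event every minute.
print(len(continues_event_minutes(90, 1)))  # 90
```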

Note that the violations displayed in the Health Rules Violations page, under Troubleshoot, are updated only when a health rule violation event is triggered.

If the Controller is unable to evaluate the rule (for example, if a node simply stops reporting), the Evaluation Status of the health rule is shown as a grey question mark or Unknown in the Current Evaluation Status tab in the right panel of the health rules list. The current violation event remains open until the Wait Time after Violation period has elapsed. At that point the violation event is closed and a new event is triggered, which causes the health status of the rule itself to display as Unknown.

Default Health Rules

AppDynamics provides a default set of health rules for some products, such as applications and servers. These default health rules vary depending on the entity. To see the default rules, before any health rules have been added to your AppDynamics installation:

Select the Alert & Respond tab at the top.

Click Health Rules in the left panel.

From the drop-down list in the right panel, select the entity. The default health rules are displayed.

If any of these predefined health rules are violated, the affected items are marked in the UI as yellow-orange if it is a Warning violation and red if it is a Critical violation.

In many cases, the default health rules may be the only health rules that you need. If the conditions are not configured appropriately for your application, you can edit them. You can also disable the default health rules.

Health Rule Entities

A health rule can evaluate metrics associated with an entire application or a limited set of entities. For example, you can create business transaction performance health rules that evaluate certain metrics for all business transactions in the application or node health rules that cover all the nodes in the application or all the nodes in specified tiers. The default health rules are in this category.

You can also create health rules that are narrowly applied to a limited set of entities in the application, or even a single entity such as a node or a JMX object or an error. For example, you can create a JMX health rule that evaluates the initial pool size and number of active connections for specific connection pools in nodes that share certain system properties.

The health rule wizard lets you specify precisely which entities the health rule affects, enabling the creation of very specific health rules. For example, for a business transaction, you can limit the tiers that the health rule applies to, or limit the health rule application to specific business transactions by name or by names that match certain criteria.

For node health rules, you can specify the type of the node, such as Java, .NET, PHP, and so on.

You can specify that a health rule applies only to nodes that meet certain criteria.

Note that the Type of Node pulldown menu does not allow you to specify Node.js, Python, or Web Service nodes. To restrict a health rule to these types of nodes, you can specify the affected entity as a tier and then select only Node.js, Python, or Web Service tiers as needed. To select the affected nodes more precisely, use the Nodes matching the following criteria menu item to specify node names, environment variables, or meta-info that identify the nodes you want.

Entities Affected by a Health Rule

For an Overall Application Performance health rule type, the health rule applies to the entire application, regardless of the business transaction, tier, or node.

If you configure a health rule to work with tiers, you must also configure the corresponding policy to work with tiers. If the health rule is configured for tiers but the policy is configured for nodes, no actions or notifications are triggered. The inverse is also true. The following screenshots show examples of a health rule and a policy created in the correct order.

For a Business Transaction Performance health rule type, you can apply the health rule to:

All Business Transactions in the application

All Business Transactions within tiers that you select

Individual Business Transactions that you select

Business Transactions with names that have patterns matching criteria that you specify (such as all Business Transactions with names that start with "INV")

For a Node Health—Transaction Performance or Node Health—Hardware, JVM, CLR health rule types, you can apply the health rule to:

All tiers in the application

Individual tiers that you specify

All nodes in the application

Node types, such as Java nodes, PHP nodes, and so on

Nodes within specified tiers

Individual nodes that you specify

Nodes with names, meta-data, environment variables or JVM system environment properties with matching criteria that you specify

For a Node Health—JMX health rule type, you can apply the health rule to:

All JMX instance names (MBeans) in the application

Specific MBeans

MBeans on certain nodes

Specific JMX objects

All nodes in the application

Nodes within specified tiers

Individual nodes that you specify

Nodes with names, meta-data, environment variables or JVM system environment properties with matching criteria that you specify

For a Databases & Remote Services health rule type, you can apply the health rule to:

All databases and remote services in the application

Individual databases and remote services that you specify

Databases and remote services with name matching criteria that you specify

For an Error Rates health rule type, you can apply the health rule to:

All Errors in the application

Specific error types that you select

Errors within the specified tiers

Errors with names that have patterns matching criteria that you specify

For Information Points health rule types, you can apply the health rule to:

All information points in the application

Information points that you specify

Information points with names matching criteria that you specify

For Service Endpoint health rule types, you can apply the health rule to:

All service endpoints in the application

Service endpoints that you specify

Service endpoints with names matching criteria that you specify

Service endpoints within specified tiers

For a Custom health rule type, you can apply the health rule to:

A business transaction that you specify

A node that you specify

Overall application performance

Health Rule Conditions

You define the acceptable range for a metric by establishing health rule conditions. A health rule condition sets the metric levels that constitute a Warning status and a Critical status.

A condition consists of a Boolean statement that compares the current value of a metric against one or more static or dynamic thresholds based on a selected baseline. If the condition is true, the health rule is violated. The rules for evaluating a condition using multiple thresholds depend on configuration.

Static thresholds are straightforward. For example, is a business transaction's average response time greater than 200 ms?

Dynamic thresholds are based on a percentage in relation to, or a standard deviation from, a baseline built on a rolled-up baseline trend pattern. A daily trend baseline rolls up values for a particular hour of the day during the last thirty days, whereas a weekly trend baseline rolls up values for a particular hour of the day, for a particular day of the week, for the last 90 days. For more information about baselines, see Dynamic Baselines.

You can define a threshold for a health rule based on a single metric value or on a mathematical expression built from multiple metric values.

The following are typical health rule conditions:

If the value of the Average Response Time is greater than the default baseline by 3 X the Baseline Standard Deviation . . .

If the count of the Errors Per Minute is greater than 1000 . . .

If the number of MB of Free Memory is less than 2 X the Default Baseline . . .

If the value of Errors per Minute/Calls per Minute over the last 15 days > 0.2 . . .

The last example combines two metrics in a single condition. You can use the expression builder embedded in the health rules wizard to create conditions based on a complex expression comprising multiple interdependent metrics.
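The kinds of conditions listed above can be written as simple Boolean checks. The sketch below is illustrative only: the function names, parameters, and sample numbers are assumptions, and the baseline and standard deviation are passed in as plain values rather than being derived from a baseline trend.

```python
def exceeds_static(value: float, threshold: float) -> bool:
    """Static threshold: is the metric value greater than a fixed number?"""
    return value > threshold


def exceeds_baseline_by_stddev(
    value: float, baseline: float, stddev: float, multiplier: float
) -> bool:
    """Dynamic threshold: is the value above the baseline by N standard deviations?"""
    return value > baseline + multiplier * stddev


def error_ratio_exceeds(
    errors_per_minute: float, calls_per_minute: float, threshold: float
) -> bool:
    """Expression over two metrics: errors per minute divided by calls per minute."""
    if calls_per_minute == 0:
        return False  # no load, so the ratio is treated as not violating
    return errors_per_minute / calls_per_minute > threshold


print(exceeds_static(1200, 1000))                     # True
print(exceeds_baseline_by_stddev(260, 200, 15, 3))    # True, since 260 > 200 + 45
print(error_ratio_exceeds(30, 100, 0.2))              # True, since 0.3 > 0.2
```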

Multi-Statement Conditions

Often a condition consists of multiple statements that evaluate different metrics. A health rule is violated either when any one of its statements evaluates to true or when all of its statements evaluate to true, depending on how the condition is configured.

For example, a health rule that measures response time (average response time greater than some baseline value) makes more business sense if it is correlated with the application load on the system, such as 50 concurrent users or 10,000 calls per minute. You may not want to use the response time condition alone in a policy that initiates a remedial action if the load is low, even if the response time threshold is reached. The first part of the condition would evaluate the response time performance measurement and the second part would ensure that the health rule is violated only when there is sufficient load.
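The any-versus-all choice can be sketched as follows. The function name, the thresholds, and the sample readings are hypothetical. The point is only that requiring all statements to hold suppresses a violation when response time is high but load is low.

```python
def rule_violated(statement_results: list[bool], require_all: bool) -> bool:
    """Combine per-statement results using either 'all' or 'any' semantics."""
    return all(statement_results) if require_all else any(statement_results)


# Hypothetical readings: slow responses, but very little traffic.
average_response_time_ms = 1500
calls_per_minute = 40

statements = [
    average_response_time_ms > 1000,  # response time statement
    calls_per_minute > 10_000,        # load statement
]

print(rule_violated(statements, require_all=True))   # False: load is too low
print(rule_violated(statements, require_all=False))  # True: one statement holds
```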

Health Rule Evaluation Scope

The health rule evaluation scope defines how many nodes in the affected entities must violate the condition before the health rule is considered violated.

Evaluation scope applies only to business transaction performance type health rules and node health type health rules in which the affected entities are defined at the tier level.

For example, you may have a critical condition in which the condition is unacceptable for any node, or you may want to consider the condition a violation only if the condition is true for 50% or more of the nodes in a tier.

Options for this evaluation scope are:

The tier average: Evaluation is performed on the tier average instead of the individual nodes.

Any node: If any node exceeds the thresholds, the rule is violated.

Percentage of the nodes: If x% of the nodes exceed the thresholds, the rule is violated.

Number of nodes: If x nodes exceed the thresholds, the rule is violated.
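The four options can be compared side by side with a small sketch. The scope labels, the function signature, and the choice of "at least x" for the percentage and count options are assumptions made for the example.

```python
from statistics import mean
from typing import Optional


def violated_by_scope(
    node_values: list[float], threshold: float, scope: str, x: Optional[float] = None
) -> bool:
    """Decide whether a tier-level rule is violated under a given evaluation scope."""
    breaching = [value for value in node_values if value > threshold]
    if scope == "tier_average":
        return mean(node_values) > threshold
    if scope == "any_node":
        return len(breaching) > 0
    if x is None:
        raise ValueError("x is required for the percentage and number scopes")
    if scope == "percentage":
        return 100 * len(breaching) / len(node_values) >= x
    if scope == "number":
        return len(breaching) >= x
    raise ValueError(f"unknown scope: {scope}")


# Average response time in ms for four nodes in one tier; threshold is 1000 ms.
tier = [900, 1200, 400, 1100]

print(violated_by_scope(tier, 1000, "tier_average"))   # False: the average is 900
print(violated_by_scope(tier, 1000, "any_node"))       # True: two nodes exceed 1000
print(violated_by_scope(tier, 1000, "percentage", 50)) # True: 50% of nodes exceed
print(violated_by_scope(tier, 1000, "number", 3))      # False: only two nodes exceed
```

The same readings yield different outcomes depending on the scope, which is why the scope should reflect how much node-level degradation is acceptable for the tier.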

Critical and Warning Conditions

Conditions are classified as either critical or warning conditions.

Critical conditions are evaluated before warning conditions. If you have defined a critical condition and a warning condition in the same health rule, the warning condition is evaluated only if the critical condition is not true.

The configuration procedures for critical and warning conditions are identical, but you configure these two types of conditions in separate panels. You can copy a critical condition configuration to a warning configuration and vice-versa and then adjust the metrics in the copy to differentiate them. For example, in the Critical Condition panel you can create a critical condition based on the rule:

If the Average Response Time is greater than 1000

Then from the Warning Condition panel, copy that condition and edit it to be:

If the Average Response Time is greater than 500

As performance changes, a health rule violation can be upgraded from warning to critical if performance deteriorates to the higher threshold or downgraded from critical to warning if performance improves to the warning threshold.
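The evaluation order can be sketched using the two example thresholds above. The function name is hypothetical, and the thresholds are assumed to be in milliseconds.

```python
def evaluate_status(average_response_time_ms: float) -> str:
    """Check the critical condition first, and the warning condition only if it fails."""
    if average_response_time_ms > 1000:
        return "critical"
    if average_response_time_ms > 500:
        return "warning"
    return "normal"


# Performance deteriorates, then partially recovers.
for reading in (400, 700, 1200, 800):
    print(reading, evaluate_status(reading))
```

Across these readings the status moves from normal to warning, is upgraded to critical, and is then downgraded back to warning.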

Preparing to Set Up Health Rules

AppDynamics recommends the following process to set up health rules for your application:

Identify the key metrics on the key entities that you need to monitor.

Create schedules for health rules, if needed. In some situations, a health rule is more useful if it runs at a particular time. See Health Rule Schedules.

After you set up health rules you must configure policies and actions to run when health rules are violated. See Policies and Actions.

Health Rule Management

To view current health rules, including the default health rules, and to access the health rule wizard, click Alert & Respond > Health Rules. Then choose the type of entity for which you want health rules from the pulldown menu at the top.

Current health rules are listed in the left panel. If you click one of these rules, a list appears in the right panel showing which entities this selected health rule affects and what the status of the latest evaluation is. You can also select the Evaluation Events tab to see a detailed list of evaluation events.

In the left panel, you can directly delete or duplicate a health rule. From here you can also access the health rule wizard to add a new rule or edit an existing one.

You can turn off evaluation of all health rules in the selected entity by clearing the Evaluate Health Rules check box. Select the check box again when you want health rule evaluation to resume.