HBase Health Tests

HBase Master Health

This is an HBase service-level health test that checks for an active, healthy HBase Master. The test returns "Bad" health if the service is running and an active Master cannot be found.
If an active Master is found, the test checks the health of that Master and of any configured standby Masters. The test returns "Good" health only if the active Master
is healthy and all running standby Masters are healthy. A failure of this health test may indicate stopped or unhealthy Master roles, or a communication problem between the
Cloudera Manager Service Monitor and the HBase service. When this test fails, check the status of the HBase service's Master roles and look in the Cloudera Manager Service Monitor's log files
for more information. This test can be enabled or disabled using the Active Master Health Test HBase service-wide monitoring setting. The check for healthy standby Masters
can be enabled or disabled with the Backup Masters Health Test setting. In addition, the HBase Active Master Detection Window setting adjusts the
amount of time that the Cloudera Manager Service Monitor has to detect the active HBase Master before this health test fails.
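
The decision described above can be sketched in a few lines of Python. This is only an illustration, not Cloudera Manager's implementation: the function name `master_health`, its parameters, and the rule that the overall result is the worst result among the checked Masters are all assumptions made for this sketch. The source states only when the test is "Bad" (no active Master found) and when it is "Good". The detection window is not modeled; `active=None` stands for "no active Master was detected within the window".

```python
from typing import Optional, Sequence

# Severity ordering used to pick the worst result (an assumption of this sketch).
_SEVERITY = {"Good": 0, "Concerning": 1, "Bad": 2}


def master_health(
    active: Optional[str],
    standbys: Sequence[str] = (),
    check_standbys: bool = True,
) -> str:
    """Hypothetical helper: combine Master health results into one service-level result.

    active         -- health of the active Master, or None if none was detected
    standbys       -- health of each running standby Master
    check_standbys -- mirrors the Backup Masters Health Test setting
    """
    if active is None:
        # Service is running but no active Master could be found.
        return "Bad"
    results = [active]
    if check_standbys:
        results.extend(standbys)
    # "Good" only if every checked Master is "Good"; otherwise report the worst one.
    return max(results, key=_SEVERITY.__getitem__)
```

For example, `master_health("Good", ["Concerning"])` yields "Concerning", while the same call with `check_standbys=False` ignores the standby Master and yields "Good".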

Backup Masters Health Test

When computing the overall HBase cluster health, consider the health of the backup HBase Masters.

hbase_backup_masters_health_enabled

true

no unit

HBase Active Master Detection Window

The tolerance window that will be used in HBase service tests that depend on detection of the active HBase Master.

hbase_active_master_detecton_window

3

MINUTES

HBase RegionServer Health

This is an HBase service-level health test that checks that enough of the RegionServers in the cluster are healthy. The test returns "Concerning" health if the number of healthy
RegionServers falls below a warning threshold, expressed as a percentage of the total number of RegionServers. The test returns "Bad" health if the number of healthy and "Concerning" RegionServers
falls below a critical threshold, expressed as a percentage of the total number of RegionServers. For example, if this test is configured with a warning threshold of 95% and a critical threshold of
90% for a cluster of 100 RegionServers, this test would return "Good" health if 95 or more RegionServers have "Good" health. This test would return "Concerning" health if fewer than 95 RegionServers
have "Good" health but at least 90 have either "Good" or "Concerning" health. If more than 10 RegionServers have "Bad" health, this test would return "Bad" health. A failure of this health test
indicates unhealthy RegionServers. Check the status of the individual RegionServers for more information. This test can be configured using the Healthy RegionServer Monitoring Thresholds HBase service-wide monitoring setting.
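
The threshold arithmetic above can be checked with a short Python sketch. It is a minimal illustration under stated assumptions, not Cloudera Manager's code: the function name `regionserver_health` and its parameters are hypothetical, and the default thresholds of 95% and 90% are taken from the worked example in the text rather than from a documented default.

```python
def regionserver_health(
    good: int,
    concerning: int,
    total: int,
    warning_pct: float = 95.0,   # from the example above, not a documented default
    critical_pct: float = 90.0,  # from the example above, not a documented default
) -> str:
    """Hypothetical helper: service-level result from per-RegionServer health counts."""
    if total <= 0:
        raise ValueError("total must be a positive number of RegionServers")
    if good < 0 or concerning < 0 or good + concerning > total:
        raise ValueError("counts must be non-negative and sum to at most total")

    # Compare count * 100 against pct * total to avoid dividing.
    # "Bad": "Good" plus "Concerning" RegionServers fall below the critical threshold.
    if (good + concerning) * 100 < critical_pct * total:
        return "Bad"
    # "Concerning": "Good" RegionServers alone fall below the warning threshold.
    if good * 100 < warning_pct * total:
        return "Concerning"
    return "Good"
```

With 100 RegionServers, `regionserver_health(95, 0, 100)` is "Good", `regionserver_health(80, 10, 100)` is "Concerning" because exactly 90 are "Good" or "Concerning", and `regionserver_health(80, 9, 100)` is "Bad" because 11 RegionServers have "Bad" health.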

Short Name: RegionServer Health

Property Name

Description

Template Name

Default Value

Unit

Healthy RegionServer Monitoring Thresholds

The health test thresholds for the overall RegionServer health. The check returns "Concerning" health if the percentage
of "Healthy" RegionServers falls below the warning threshold. The check returns "Bad" health if the total percentage of "Healthy" and "Concerning" RegionServers falls below the critical threshold.
