Troubleshooting

Default alerts

This alert fires continuously to verify that the entire alerting pipeline is functional. For more information, see Dead Man’s Switch in Configuring Alertmanager.

AlertmanagerConfigInconsistent

critical

The configurations of the Alertmanager cluster instances for a given service are out of sync.

AlertmanagerDownOrMissing

warning

Alertmanager is down or not discovered. An unexpected number of Alertmanagers were scraped, or Alertmanagers have disappeared from service discovery.

APIServerErrorsHigh

warning/critical

The API server is responding to a large number of requests with errors.

APIServerLatencyHigh

warning/critical

The response latency of the API server to clients is high.

DaemonSetRolloutStuck

warning

A daemon set is not fully rolled out to all desired nodes.

DeploymentGenerationMismatch

warning

The observed generation of a deployment does not match its desired generation.

DeploymentReplicasNotUpdated

warning

A deployment has not been rolled out properly. Either replicas are not being updated to the most recent version, or not all replicas are ready. The alert does not fire if the deployment was paused intentionally.

FailedReload

warning

Reloading the Alertmanager or Prometheus configuration has failed for a given namespace.

FdExhaustionClose

warning/critical (two default alerts)

File descriptors for the given job, namespace, pod, or instance will soon be exhausted.

K8SApiServerLatency

warning

Kubernetes API server latency is high. The 99th percentile latency for the given requests to the kube-apiserver is above 1 second.

K8SApiserverDown

critical

The API server is unreachable. Prometheus failed to scrape the API server(s), or all API servers have disappeared from service discovery.

K8SControllerManagerDown

critical

There is no running Kubernetes controller manager. Deployments and replication controllers are not making progress.

K8SKubeletDown

warning

Many kubelets cannot be scraped. Prometheus failed to scrape the listed percentage of kubelets, or all kubelets have disappeared from service discovery.

K8SKubeletTooManyPods

warning

The kubelet is close to its pod limit. The given kubelet instance is running the listed number of pods, which is close to the limit of 110.

K8SManyNodesNotReady

critical

The listed number of Kubernetes nodes, more than 10% of the total, are NotReady.

K8SNodeNotReady

warning

The kubelet on the listed node has not checked in with the API, or has set itself to NotReady, for more than an hour.

K8SSchedulerDown

critical

There is no running Kubernetes scheduler. New pods are not being assigned to nodes.

NodeExporterDown

warning

Prometheus could not scrape a node-exporter for more than 10 minutes, or node-exporters have disappeared from service discovery.

NodeDiskRunningFull

warning/critical

If disks keep filling up at the current rate, they will run out of free space within hours.

PodFrequentlyRestart

warning

A pod is restarting several times an hour.

PrometheusNotConnectedToAlertmanagers

warning

A monitored Prometheus instance is not connected to any Alertmanagers, so firing alerts are not delivered anywhere.

PrometheusNotificationQueueRunningFull

warning

Prometheus is generating more alerts than it can send to Alertmanagers in time.

PrometheusErrorSendingAlerts

warning/critical

Prometheus encounters errors while trying to send alerts to Alertmanagers.
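
Several of the alerts above describe a numeric condition: K8SManyNodesNotReady (more than 10% of nodes), K8SKubeletTooManyPods (close to the limit of 110), and NodeDiskRunningFull (disks filling at the current pace). The following Python sketch illustrates the kind of check each description implies. It is not the rule definition shipped with the product, which is written as Prometheus alerting rules. The function names, the `margin` parameter, and the least-squares extrapolation are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

POD_LIMIT = 110  # pod limit quoted in the K8SKubeletTooManyPods description


def many_nodes_not_ready(not_ready: int, total: int, threshold: float = 0.10) -> bool:
    """Return True when strictly more than `threshold` of all nodes are NotReady."""
    if total <= 0:
        return False
    return not_ready / total > threshold


def kubelet_close_to_pod_limit(running_pods: int, margin: int = 10) -> bool:
    """Return True when the pod count is within `margin` of POD_LIMIT.

    `margin` is a hypothetical value; the text only says "close to the limit".
    """
    return running_pods > POD_LIMIT - margin


def hours_until_full(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Estimate hours until free space reaches zero by linear extrapolation.

    `samples` holds (time_in_hours, free_bytes) pairs. A least-squares line is
    fitted through them. Returns the hours remaining after the last sample, or
    None when free space is not shrinking or there is too little data.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # disk is not filling up
    intercept = mean_f - slope * mean_t
    time_at_zero = -intercept / slope
    return time_at_zero - samples[-1][0]
```

For example, `hours_until_full([(0, 100), (1, 90), (2, 80)])` fits a line losing 10 bytes per hour and estimates 8 hours of headroom after the last sample. An alert in the spirit of NodeDiskRunningFull would compare that estimate against a chosen time window.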