Overview

Kubernetes Engine's node auto-repair feature helps you keep the nodes
in your cluster in a healthy, running state. When enabled, Kubernetes Engine
periodically checks the health of each node in your cluster. If a node
fails consecutive health checks over an extended time period
(approximately 10 minutes), Kubernetes Engine initiates a repair process
for that node.

Repair criteria

Kubernetes Engine uses the node's health status to determine if a node
needs to be repaired. A node reporting a Ready status is considered healthy.
Kubernetes Engine triggers a repair action if a node is reported unhealthy on
consecutive checks for a given time threshold (approximately 10 minutes).
An unhealthy status can mean:

A node reports a NotReady status on consecutive checks over the given time
threshold.

A node does not report any status at all over the given time threshold.

A node's boot disk is out of disk space for an extended time period.
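
The threshold rule above can be illustrated with a small sketch. This is not how Kubernetes Engine implements the check; it only models the stated criterion. The function name needs_repair and the input format (one "EPOCH_SECONDS STATUS" line per health report, oldest first) are assumptions made for this example, and a missing report can be modeled as any status other than Ready.

```shell
# needs_repair THRESHOLD_SECONDS
# Hypothetical helper: reads "EPOCH_SECONDS STATUS" lines (oldest first) on
# stdin. Prints "repair" if the most recent unbroken run of non-Ready
# reports spans at least THRESHOLD_SECONDS, otherwise prints "healthy".
needs_repair() {
  awk -v threshold="$1" '
    $2 == "Ready"   { run_start = ""; next }   # a healthy report resets the run
    run_start == "" { run_start = $1 }         # first unhealthy report of a run
                    { last = $1 }              # latest unhealthy report seen
    END {
      if (run_start != "" && last - run_start >= threshold) print "repair"
      else print "healthy"
    }
  '
}
```

With a threshold of 600 seconds, a node that has reported NotReady continuously for 10 minutes is flagged, while a node that reported Ready at any point within that window is not.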

Note: You can manually check your node's health signals at any time by running
the kubectl get nodes command.
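
As a rough sketch of how that check might be scripted, the helper below filters the table printed by kubectl get nodes and lists nodes whose status is not Ready. The function name not_ready_nodes is hypothetical, and the sketch assumes the default table layout (a header row, with NAME and STATUS as the first two columns). A cordoned node that shows Ready,SchedulingDisabled is treated as healthy here.

```shell
# not_ready_nodes
# Hypothetical helper: reads `kubectl get nodes` table output on stdin and
# prints the name of every node whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk 'NR > 1 {
    split($2, parts, ",")              # STATUS may be "Ready,SchedulingDisabled"
    if (parts[1] != "Ready") print $1  # print NAME for unhealthy nodes
  }'
}
```

Against a live cluster, this could be used as: kubectl get nodes | not_ready_nodes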

Node repair process

If Kubernetes Engine detects that a node requires repair, the node is
drained and re-created. The drain might not succeed if the node is unresponsive
or is too unhealthy to process the drain command.

If multiple nodes require repair, Kubernetes Engine repairs one node at a
time, with each repair lasting approximately 5-10 minutes. If you disable node
auto-repair while a repair is under way, the in-progress repair is not
cancelled; it still runs to completion for the node currently under repair.

Kubernetes Engine generates an entry in its operation logs for any
automated repair event. You can check the logs by using the gcloud container
operations list command.
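
To narrow that list to repair events, one option is the standard gcloud --filter flag. The sketch below assumes that automated repairs are recorded with the operation type AUTO_REPAIR_NODES and that the type is exposed under the operationType field; the wrapper name list_auto_repairs is hypothetical. If it returns nothing on your cluster, run the unfiltered command and check how repair operations are labelled.

```shell
# list_auto_repairs
# Hypothetical wrapper: lists cluster operations, keeping only those whose
# type is AUTO_REPAIR_NODES (assumed to be the type used for auto-repair).
list_auto_repairs() {
  gcloud container operations list \
    --filter="operationType=AUTO_REPAIR_NODES"
}
```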

Enabling node auto-repair

You enable node auto-repair on a per-node-pool basis. When you create a
cluster, you can enable or disable auto-repair for the cluster's default node
pool. If you create additional node pools, you can enable or disable node
auto-repair for those node pools, independent of the auto-repair setting for the
default node pool.
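
A minimal sketch of the per-node-pool setting, using the --enable-autorepair and --no-enable-autorepair flags of gcloud container node-pools. The wrapper function names are hypothetical, the pool and cluster names are passed as arguments, and the sketch assumes a default zone or region is already set in your gcloud configuration (otherwise add a location flag).

```shell
# create_pool_with_autorepair POOL CLUSTER
# Creates a new node pool with auto-repair turned on.
create_pool_with_autorepair() {
  gcloud container node-pools create "$1" --cluster "$2" --enable-autorepair
}

# enable_autorepair POOL CLUSTER
# Turns auto-repair on for an existing node pool.
enable_autorepair() {
  gcloud container node-pools update "$1" --cluster "$2" --enable-autorepair
}

# disable_autorepair POOL CLUSTER
# Turns auto-repair off for an existing node pool.
disable_autorepair() {
  gcloud container node-pools update "$1" --cluster "$2" --no-enable-autorepair
}
```

Because each call names a single pool, changing one pool leaves the setting of every other pool, including the default node pool, unchanged.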