Disaster recovery of scheduler and controller manager pods

It is possible to recover a Kubernetes cluster from the failure of certain control plane components by first replacing the failed control plane pods with manually scheduled temporary instances. These temporary instances then bring up permanent replacements for the failed components.

If the API server is still running, a failed kube-scheduler, a failed kube-controller-manager, or both can be replaced using the following process.

Recovering a scheduler

To recover a failed scheduler pod, place a temporary kube-scheduler pod into the cluster. Once a new, permanent kube-scheduler pod is running, inspect its health to confirm the recovery, and then delete the temporary recovery pod.
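To make the health check concrete, here is a minimal Python sketch. It reads the JSON output of kubectl --namespace=kube-system get pods -o json and reports whether a permanent replacement is Running with every container ready. It assumes the control plane pods live in the kube-system namespace and that replacement pod names begin with the component name (for example, a name starting with kube-scheduler). The function name and the temporary pod names are hypothetical.

```python
import json


def replacement_is_healthy(pods_json: str, component: str, temp_pod_name: str) -> bool:
    """Return True if some pod for `component`, other than the temporary
    recovery pod, is Running and every one of its containers reports ready.

    Hypothetical helper. Assumes `pods_json` is the output of
    `kubectl --namespace=kube-system get pods -o json` and that permanent
    replacement pods have names starting with `component`.
    """
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        if name == temp_pod_name or not name.startswith(component):
            continue  # skip the temporary pod and unrelated pods
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if status.get("phase") == "Running" and containers and all(
            c.get("ready", False) for c in containers
        ):
            return True
    return False
```

In this sketch, the temporary pod is deleted only after the function returns True for the component being recovered.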

If using AWS EC2, your master node name appears in the output of kubectl get nodes in the format ip-12-34-56-78.us-west-2.compute.internal. Use this value as nodeName when creating your temporary scheduler pod, as described below.

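If you script this step, a small format check can catch a mistyped node name before it ends up in nodeName. The Python sketch below assumes the ip-a-b-c-d.REGION.compute.internal form shown above. The function and pattern names are hypothetical, and a name in any other form is reported as not matching.

```python
import re
from typing import Optional, Tuple

# EC2 private DNS names such as ip-12-34-56-78.us-west-2.compute.internal.
# AWS uses ip-a-b-c-d.ec2.internal in us-east-1, which this pattern does not match.
EC2_NODE_NAME = re.compile(
    r"ip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.([a-z0-9-]+)\.compute\.internal"
)


def parse_ec2_node_name(node_name: str) -> Optional[Tuple[str, str]]:
    """Return (private_ip, region) for an EC2-style node name, or None if it does not match."""
    m = EC2_NODE_NAME.fullmatch(node_name)
    if m is None:
        return None
    octets = m.group(1, 2, 3, 4)
    if any(int(o) > 255 for o in octets):
        return None  # not a valid IPv4 address
    return ".".join(octets), m.group(5)
```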
Then, wrap the kube-scheduler pod spec in a pod header and set nodeName to the name of the master node.
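The wrapping step can be sketched in Python as follows. The sketch assumes you already have the scheduler's bare pod spec as a dict, for example the spec.template.spec field of the existing kube-scheduler Deployment's JSON, and that the pod belongs in the kube-system namespace. The function name, pod name, and image placeholder are hypothetical. Setting spec.nodeName binds the pod directly to that node, so the kubelet there starts it without any scheduler involvement. This is what allows a temporary scheduler to run while the real one is down.

```python
import json
from typing import Any, Dict


def wrap_spec_in_pod(pod_spec: Dict[str, Any], pod_name: str, node_name: str,
                     namespace: str = "kube-system") -> Dict[str, Any]:
    """Wrap a bare pod spec in a Pod header and pin it to `node_name` (hypothetical helper)."""
    spec = dict(pod_spec)         # shallow copy; the caller's dict is left unchanged
    spec["nodeName"] = node_name  # bind directly to the master node, bypassing the scheduler
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Placeholder spec: copy the real containers from your kube-scheduler Deployment.
    spec = {"containers": [{"name": "kube-scheduler", "image": "IMAGE-FROM-YOUR-DEPLOYMENT"}]}
    pod = wrap_spec_in_pod(spec, "recovery-scheduler",
                           "ip-12-34-56-78.us-west-2.compute.internal")
    print(json.dumps(pod, indent=2))
```

kubectl create -f accepts JSON as well as YAML. The printed manifest can therefore be saved to a file and passed to kubectl --namespace=kube-system create -f.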
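
Recovering a controller manager

To recover a failed controller manager pod, follow the same process: place a temporary kube-controller-manager pod into the cluster, confirm that a new, permanent kube-controller-manager pod is running and healthy, and then delete the temporary recovery pod.
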
If using AWS EC2, your master node name appears in the output of kubectl get nodes in the format ip-12-34-56-78.us-west-2.compute.internal. Use this value as nodeName when creating your temporary controller manager pod, as described below.

Then, wrap the kube-controller-manager pod spec in a pod header and set nodeName to the name of the master node.
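The controller manager recovery can be summarized end to end as a short sequence of kubectl invocations. The Python sketch below only builds the argument lists and does not run them. The manifest file name and temporary pod name are hypothetical, and the kube-system namespace is an assumption. Each list could be passed to subprocess.run(cmd, check=True) after you have checked it against your cluster.

```python
from typing import List


def recovery_commands(manifest_path: str, temp_pod_name: str,
                      namespace: str = "kube-system") -> List[List[str]]:
    """Return, in order, the kubectl argument lists for one temporary-pod recovery."""
    ns = f"--namespace={namespace}"
    return [
        # 1. Create the temporary pod, which is pinned to the master node via nodeName.
        ["kubectl", ns, "create", "-f", manifest_path],
        # 2. Rerun until a permanent replacement appears and reaches Running.
        ["kubectl", ns, "get", "pods", "-o", "wide"],
        # 3. After the replacement is healthy, delete the temporary pod.
        ["kubectl", ns, "delete", "pod", temp_pod_name],
    ]


if __name__ == "__main__":
    for cmd in recovery_commands("recovery-controller-manager.json",
                                 "recovery-controller-manager"):
        print(" ".join(cmd))
```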