Overview

Taints and tolerations allow the node to control which pods should (or should not) be scheduled on them.

Taints and Tolerations

A taint allows a node to refuse pod to be scheduled unless that pod has a matching toleration.

You apply taints to a node through the node specification (NodeSpec) and apply tolerations to a pod through the pod specification (PodSpec). A taint on a node instructs the node to repel all pods that do not tolerate the taint.

Taints and tolerations consist of a key, value, and effect. An operator allows you to leave one of these parameters empty.

Table 1. Taint and toleration components

Parameter

Description

key

The key is any string, up to 253 characters. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.

value

The value is any string, up to 63 characters. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.

effect

The effect is one of the following:

NoSchedule

New pods that do not match the taint are not scheduled onto that node.

Existing pods on the node remain.

PreferNoSchedule

New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to.

Existing pods on the node remain.

NoExecute

New pods that do not match the taint cannot be scheduled onto that node.

Existing pods on the node that do not have a matching toleration are removed.

operator

Equal

The key/value/effect parameters must match. This is the default.

Exists

The key/effect parameters must match. You must leave a blank value parameter, which matches any.

A toleration matches a taint:

If the operator parameter is set to Equal:

the key parameters are the same;

the value parameters are the same;

the effect parameters are the same.

If the operator parameter is set to Exists:

the key parameters are the same;

the effect parameters are the same.

Using Multiple Taints

You can put multiple taints on the same node and multiple tolerations on the same pod. OpenShift Container Platform processes multiple taints and tolerations as follows:

Process the taints for which the pod has a matching toleration.

The remaining unmatched taints have the indicated effects on the pod:

If there is at least one unmatched taint with effect NoSchedule, OpenShift Container Platform cannot schedule a pod onto that node.

If there is no unmatched taint with effect NoSchedule but there is at least one unmatched taint with effect PreferNoSchedule, OpenShift Container Platform tries to not schedule the pod onto the node.

If there is at least one unmatched taint with effect NoExecute, OpenShift Container Platform evicts the pod from the node (if it is already running on the node), or the pod is not scheduled onto the node (if it is not yet running on the node).

Pods that do not tolerate the taint are evicted immediately.

Pods that tolerate the taint without specifying tolerationSeconds in their toleration specification remain bound forever.

Pods that tolerate the taint with a specified tolerationSeconds remain bound for the specified amount of time.

In this case, the pod cannot be scheduled onto the node, because there is no toleration matching the third taint. The pod continues running if it is already running on the node when the taint is added, because the third taint is the only
one of the three that is not tolerated by the pod.

Using Toleration Seconds to Delay Pod Evictions

You can specify how long a pod can remain bound to a node before being evicted by specifying the tolerationSeconds parameter in the pod specification. If a taint with the NoExecute effect is added to a node, any pods that do not tolerate the taint are evicted immediately (pods that do tolerate the taint are not evicted). However, if a pod that to be evicted has the tolerationSeconds parameter, the pod is not evicted until that time period expires.

Here, if this pod is running but does not have a matching taint, the pod stays bound to the node for 3,600 seconds and then be evicted. If the taint is removed before that time, the pod is not evicted.

Setting a Default Value for Toleration Seconds

This plug-in sets the default forgiveness toleration for pods, to tolerate the node.alpha.kubernetes.io/not-ready:NoExecute and node.alpha.kubernetes.io/unreachable:NoExecute taints for five minutes.

If the pod configuration provided by the user already has either toleration, the default is not added.

To enable Default Toleration Seconds:

Modify the master configuration file (/etc/origin/master/master-config.yaml) to Add DefaultTolerationSeconds to the admissionConfig section:

Preventing Pod Eviction for Node Problems

OpenShift Container Platform can be configured to represent node unreachable and node not ready conditions as taints. This allows per-pod specification of how long to remain bound to a node that becomes unreachable or not ready, rather than using the default of five minutes.

When the Taint Based Evictions feature is enabled, the taints are automatically added by the node controller and the normal logic for evicting pods from Ready nodes is disabled.

If a node enters a not ready state, the node.alpha.kubernetes.io/not-ready:NoExecute taint is added and pods cannot be scheduled on the node. Existing pods remain for the toleration seconds period.

If a node enters a not reachable state, the node.alpha.kubernetes.io/unreachable:NoExecute taint is added and pods cannot be scheduled on the node. Existing pods remain for the toleration seconds period.

To enable Taint Based Evictions:

Modify the master configuration file (/etc/origin/master/master-config.yaml) to add the following to the kubernetesMasterConfig section:

To maintain the existing rate limiting behavior of pod evictions due to node problems, the system adds the taints in a rate-limited way. This prevents massive pod evictions in scenarios such as the master becoming partitioned from the nodes.

Daemonsets and Tolerations

DaemonSet pods are created with NoExecute tolerations for node.alpha.kubernetes.io/unreachable and node.alpha.kubernetes.io/not-ready
with no tolerationSeconds to ensure that DaemonSet pods are never evicted due to these problems, even when the Default Toleration Seconds feature is disabled.

Examples

Taints and tolerations are a flexible way to steer pods away from nodes or evict pods that should not be running on a node. A few of typical scenrios are:

The admission controller should add a node affinity to require that the pods can only schedule onto nodes labeled with the key:value label (dedicated=groupName).

Add a label similar to the taint (such as the key:value label) to the dedicated nodes.

Nodes with Special Hardware

In a cluster where a small subset of nodes have specialized hardware (for example GPUs), you can use taints and tolerations to keep pods that do not need the specialized hardware off of those nodes, leaving the nodes for pods that do need the specialized hardware. You can also require pods that need specialized hardware to use specific nodes.

To ensure pods are blocked from the specialized hardware:

Taint the nodes that have the specialized hardware using one of the following commands:

Adding a corresponding toleration to pods that use the special hardware using an admission controller.

For example, the admission controller could use some characteristic(s) of the pod to determine that the pod should be allowed to use the special nodes by adding a toleration.

To ensure pods can only use the specialized hardware, you need some additional mechanism. For example, you could label the nodes that have the special hardware and use node affinity on the pods that need the hardware.