Best practices for restarting nodes

Restarting a node directly can cause exceptions in a cluster. Based on Alibaba Cloud use cases, this document describes best practices for restarting nodes in situations such as active operation and maintenance (O&M) on Container Service.

Check the high availability configuration of your services

Before you restart Container Service nodes, we recommend that you check and, if necessary, modify the following service configurations. This way, an exception on a single node during the restart does not impair service availability.

Data persistence policy

We recommend that you persist important data, such as logs and service configurations, to external volumes. This way, when a container is re-created, deleting the old container does not cause data loss.

Best practices

First check the high availability configuration of your services as described in the preceding section. Then perform the following steps in order on each node. Do not operate on multiple nodes at the same time.
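The one-node-at-a-time rule can be sketched as a small driver loop. This is a minimal illustration only: the function name, the step names, and the actions are hypothetical placeholders, not part of Container Service.

```python
from typing import Callable, List, Sequence, Tuple

# A step is a (name, action) pair. The action receives a node name and
# returns True on success. Both are hypothetical placeholders.
Step = Tuple[str, Callable[[str], bool]]


def maintain_nodes_in_sequence(nodes: Sequence[str], steps: Sequence[Step]) -> List[str]:
    """Run every step on one node before moving on to the next node.

    Stops at the first failed step so that no further node is touched.
    Returns the nodes that completed all steps.
    """
    completed: List[str] = []
    for node in nodes:
        for name, action in steps:
            if not action(node):
                raise RuntimeError(f"step {name!r} failed on {node}; stopping")
        completed.append(node)
    return completed
```

Stopping at the first failure keeps the remaining nodes untouched, so the cluster still has healthy nodes to serve traffic while you investigate.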

Back up snapshots

We recommend that you create up-to-date snapshots of all disks attached to the node and keep them as a backup. A server that has not been restarted for a long time may fail to start again after it is shut down, which impairs service availability. Backing up snapshots guards against this.

Verify that the service containers can be restarted

For a swarm cluster, restart the corresponding service containers on the node and make sure that they start up again normally.
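One way to script this check is sketched below. It assumes that the `docker` CLI is available on the node and that you know the container name. The function names and the injectable `run` parameter are illustrative choices, not part of Container Service.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def restart_and_verify(container: str, run: Runner = run_cli) -> bool:
    """Restart one container, then report whether Docker sees it as running."""
    run(["docker", "restart", container])
    # `docker inspect` prints "true" or "false" for the .State.Running field.
    state = run(["docker", "inspect", "--format", "{{.State.Running}}", container])
    return state.strip() == "true"
```

A container that reports running right after the restart may still exit shortly afterwards, so it is worth checking the service itself as well.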

Verify that Docker Engine can be restarted

Restart the Docker daemon and make sure that Docker Engine starts again normally.
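The daemon check can be sketched as follows. It assumes a node where Docker runs as a systemd service named `docker`, and that the script has permission to restart it. Adjust the commands if your node uses a different init system. The function names are illustrative.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], Tuple[int, str]]


def run_cli(args: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, stdout) without raising."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def docker_engine_restarts_cleanly(run: Runner = run_cli) -> bool:
    """Restart the Docker service and confirm that it is active and answering."""
    code, _ = run(["systemctl", "restart", "docker"])
    if code != 0:
        return False
    # `systemctl is-active` prints "active" and exits 0 for a running unit.
    code, state = run(["systemctl", "is-active", "docker"])
    if code != 0 or state.strip() != "active":
        return False
    # `docker info` fails if the CLI cannot reach the daemon.
    code, _ = run(["docker", "info"])
    return code == 0
```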

Perform related O&M

Perform the planned O&M tasks, such as updating application code, installing system patches, and adjusting system configurations.

Restart nodes

Restart the node normally, either from the console or from within the operating system.

Check the status after the restart

After the node restarts, check the health status of the node and the running status of the service containers in the Container Service console.
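As a supplement to the console check, the following sketch lists containers on the node that are not running after the restart. It assumes shell access to the node and the `docker` CLI. The function names are illustrative and not part of Container Service.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def stopped_containers(run: Runner = run_cli) -> List[str]:
    """Return the names of containers in the exited or dead state."""
    output = run([
        "docker", "ps", "--all",
        "--filter", "status=exited",
        "--filter", "status=dead",
        "--format", "{{.Names}}",
    ])
    return [line.strip() for line in output.splitlines() if line.strip()]
```

An empty list means that no container on the node is in the exited or dead state. A non-empty list is not necessarily a fault, because containers that were stopped on purpose before the restart appear in it too.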