Migrating workloads to different machine types

This tutorial demonstrates how to migrate workloads running on a Container
Engine cluster to a new set of nodes without incurring downtime for your
application. Such a migration can be useful if you want to migrate your
workloads to nodes with a different machine type.

Background

A node pool is a subset of machines that all have the same configuration,
including machine type (CPU and memory) and authorization scopes. Node pools
represent a subset of nodes within a cluster; a container cluster can contain
one or more node pools.

When you need to change the machine profile of the Compute Engine instances
backing your cluster, you can create a new node pool and then migrate your
workloads over to the new node pool.

To migrate your workloads without incurring downtime, you need to:

Create a new node pool with the desired configuration.

Mark the existing node pool as unschedulable.

Drain the workloads running on the existing node pool.

Delete the existing node pool.

Kubernetes, which is the cluster orchestration system of Container Engine
clusters, automatically reschedules the evicted Pods to the new node pool as
it drains the existing node pool.

Note: This tutorial is only applicable if your cluster does not have Cluster
Autoscaling enabled. If the Autoscaler adds or removes nodes while you are
attempting a migration, you might not be able to mark all the nodes in the pool
as unschedulable and drain them properly.

Step 1: Create a Container Engine cluster

The first step is to create a container cluster to run a sample load-balanced
web application Deployment. The following command creates a new cluster with
five nodes of the default machine type (n1-standard-1):

gcloud container clusters create migration-tutorial --num-nodes=5

Note: If you are using an existing Container Engine cluster or if you have
created a cluster through the Google Cloud Platform Console, you need to run
the following command to retrieve cluster credentials and configure the kubectl
command-line tool with them:

gcloud container clusters get-credentials migration-tutorial

If you have already created a cluster with the gcloud container clusters create
command listed above, this step is not necessary.

Step 2: Run a replicated web server deployment

The next step is to create a web application Deployment: a six-replica
Deployment of the nginx web server listening on port 80, exposed to the
Internet through a load-balanced Service named web.
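
A minimal sketch of these commands, assuming the stock nginx image from Docker
Hub and the legacy kubectl run syntax that created a Deployment object:

kubectl run web --image=nginx --replicas=6 --port=80

kubectl expose deployment web --port=80 --type=LoadBalancer

The expose command provisions a Cloud Load Balancer with an external IP address
for the web Service; later steps use this address to verify the migration.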

Step 3: Create a node pool with a larger machine type

To introduce instances with a different configuration, such as a different
machine type or different authorization scopes, you need to create a new
node pool.

The following command creates a new node pool named larger-pool with five
high-memory instances of the n1-highmem-2 machine type (a larger machine type
than the Container Engine default n1-standard-1):
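
gcloud container node-pools create larger-pool --cluster=migration-tutorial --machine-type=n1-highmem-2 --num-nodes=5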

Step 4: Migrate the workloads

At this point, the Pods of the web Deployment are still running on the nodes
of default-pool. To migrate these Pods to the new node pool, you must perform
the following steps:

Cordon the existing node pool: This operation marks the nodes in the
existing node pool (default-pool) as unschedulable. Kubernetes stops
scheduling new Pods to these nodes once you mark them as unschedulable.

Drain the existing node pool: This operation gracefully evicts the workloads
running on the nodes of the existing node pool (default-pool).

The above steps cause the Pods running in your existing node pool to terminate
gracefully, and Kubernetes reschedules them onto other available nodes. In this
case, the only available nodes are the ones in larger-pool, created in Step 3.

To make sure Kubernetes terminates your applications gracefully, your containers
should handle the SIGTERM signal. This can be used to close active connections
to clients and to commit or abort database transactions cleanly. In your
Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify
how long Kubernetes must wait before killing the containers in the Pod. This
defaults to 30 seconds. You can read more about Pod termination in the
Kubernetes documentation.
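
For illustration, a minimal Pod manifest fragment setting this field might look
like the following (the Pod name and image here are placeholders, not objects
from this tutorial):

apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-example
spec:
  # give containers up to 60 seconds to exit after SIGTERM
  terminationGracePeriodSeconds: 60
  containers:
  - name: web
    image: nginx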

First, cordon the nodes in the default-pool. You can run the following command
to get a list of nodes in this node pool:

kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool

Then cordon each node by running a kubectl cordon <NODE> command (substituting
<NODE> with the names from the previous command). The following command
iterates over each node in the pool and marks it as unschedulable:
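
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl cordon "$node";
done

Note that kubectl get nodes -o=name returns each name prefixed with node/,
which kubectl cordon accepts directly.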

Next, drain the Pods on each node gracefully. To perform the drain, use the
kubectl drain command, which evicts the Pods running on a node.

Note: Kubernetes does not reschedule Pods that are not managed by a controller
such as a Deployment, ReplicationController, ReplicaSet, Job, DaemonSet, or
StatefulSet. Such Pods prevent the kubectl drain command from proceeding;
therefore, you should deploy your Pods using these controllers. For this
tutorial, run kubectl drain with the --force option to clean up some Container
Engine system Pods, such as kube-proxy and fluentd-cloud-logging.

You can run kubectl drain --force <NODE> by substituting <NODE> with the
same list of names passed to the kubectl cordon command.

The following command iterates over each node in default-pool and drains it:
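
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl drain --force "$node";
done

This mirrors the cordon loop above and uses only the --force flag mentioned in
this tutorial; depending on the system Pods running on your nodes, you may also
need flags such as --ignore-daemonsets for the drain to proceed.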

Visit the external IP address of the web Service to verify that the application
serves requests correctly from the new node pool.
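
If you need to look up that IP address, you can list the Service; the
EXTERNAL-IP column shows it once the load balancer is provisioned:

kubectl get service web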

Step 5: Delete the old node pool

Once Kubernetes reschedules all Pods in the web Deployment to larger-pool,
it is safe to delete the default-pool, as it is no longer necessary.
Run the following command to delete the default-pool:
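
gcloud container node-pools delete default-pool --cluster migration-tutorial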

Step 6: Cleanup

After completing this tutorial, follow these steps to remove the following
resources and prevent unwanted charges from accruing on your account:

Delete the service: This step will deallocate the Cloud Load Balancer
created for your service:

kubectl delete service web

Wait for the Load Balancer provisioned for the web service to be
deleted: The load balancer is deleted asynchronously in the background when
you run kubectl delete. Wait until the load balancer is deleted by watching
the output of the following command:

gcloud compute forwarding-rules list

Delete the container cluster: This step will delete the resources that
make up the container cluster, such as the compute instances, disks and
network resources.
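
To do so, run the following command:

gcloud container clusters delete migration-tutorial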