Persistent Storage with Red Hat OpenShift on VMware

This article is a technical summary of my experience with the Red Hat OpenShift Container Platform (OCP). Starting with this article, I also publish some stats and thoughts about the creative writing process. I got involved in a sophisticated storage problem with OpenShift: under the hood, Kubernetes tries to allocate persistent storage from the VMware infrastructure. Understanding and troubleshooting the problem was a challenge.

Stats

In the stats section, you will find information about the creative writing process and the problem domain.

Estimated reading time: 21 minutes, 45 seconds

3261 words

783 lines

Wisdom:

What I cannot build, I do not understand. (Richard Feynman)

Stats for Geeks

Stats for Geeks is a fun section to illustrate that writing can be challenging but also fun if you are into technology and geek culture.

OCP is also an abbreviation for Omni Consumer Products, the megacorporation from the movie RoboCop.

OCP is an anagram for POC (proof of concept).

I started writing this article on International Women's Day. In our company, that is nothing special: we don't distinguish between female and male engineers.

Products and Technology

In a customer project, I am supporting the infrastructure team. A dedicated production server farm uses VMware to provide services for its customers. The OpenShift Container Platform runs on virtual machines to deploy Docker containers with Kubernetes. The production platform is highly available: it has three master nodes and multiple application nodes.

For our non-technical readers, here is a glossary covering the beauty of distributed systems in a world full of containerised software.

TL;DR (too long; didn't read)

Bottom-up explanation:

Docker is the software that runs your applications in containers

Kubernetes orchestrates your containers across multiple nodes and ensures high availability with pods

The cluster state is persisted in etcd

OpenShift builds on Kubernetes and Docker

The OpenShift platform runs on virtual machines, provided by the VMware vSphere solution

Docker Terminology

Docker = computer program that performs operating-system-level virtualisation and runs software packages in containers

container = A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

Kubernetes Terminology

Kubernetes (k8s) = open-source container orchestration from Google

kubectl = command line tool for Kubernetes

etcd = etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines.

pod = A pod (as in a pod of whales or pea pod) is a group of one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers.
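A minimal pod manifest, with hypothetical names, illustrates that shared specification:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod         # hypothetical name
spec:
  containers:
    - name: web
      image: nginx          # any container image works here
      ports:
        - containerPort: 80
```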

Storage Architecture

Find below a brief overview of the storage architecture in the OpenShift Container Platform.

About Storage

Containers are stateless and ephemeral, but applications are stateful and need persistent storage. Data infrastructure software like PostgreSQL, Elasticsearch or Apache Kafka must store its data on persistent storage volumes; otherwise, we lose stored data after every restart, relocation or redistribution. Kubernetes manages storage in OpenShift. You can either invoke the OpenShift command line (oc) or use the Kubernetes command line (kubectl) to manage the underlying storage infrastructure.

About Storage High Availability

High availability of storage in the infrastructure is the responsibility of the underlying storage provider. The VMware vSphere platform provides several solutions for that. VMware adds this persistent storage support to Kubernetes through a plugin called vSphere Cloud Provider. Kubernetes provides many possibilities. See below an overview for VMware vSphere.

Storage Class

A StorageClass provides a way for administrators to describe the «classes» of storage they offer. Different classes might map to quality-of-service levels, or backup policies, or arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called «profiles» in other storage systems.
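As a sketch, two hypothetical vSphere classes could map disk formats to quality-of-service levels (names and values are assumptions, not taken from this cluster):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast                       # hypothetical class name
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: eagerzeroedthick     # best run-time performance
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow                       # hypothetical class name
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin                 # saves space, allocated on demand
```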

Persistent Volume Claim (PVC)

PVCs are requests for PVs and also act as claim checks for those resources

Claiming more storage than the PV provides results in failure
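A minimal claim, with a hypothetical name, might look like this:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim       # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi          # must not exceed what the bound PV offers
```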

VMDK

Since we are dealing with virtual disks, VMware provides several disk types:

zeroedthick (default) – Space required for the virtual disk is allocated during creation. Data remaining on the physical device is zeroed out on first write from the virtual machine.

eagerzeroedthick – Space required for the virtual disk is allocated at creation time. In contrast to zeroedthick format, the data remaining on the physical device is zeroed out during creation. It takes the longest time of all disk types.

thick – Space required for the virtual disk is allocated during creation. Data remaining on the physical device is not zeroed out.

thin – Space required for the virtual disk is not allocated during creation, but is supplied, zeroed out, on demand at a later time.
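When creating a virtual disk manually on an ESXi host, the disk type is selected with the -d flag of vmkfstools; a sketch (the datastore path is an assumption):

```shell
# Create a 10 GB disk in eagerzeroedthick format; other valid values
# for -d are zeroedthick (the default) and thin
vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/datastore1/example.vmdk
```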

Configure VMware vSphere for OpenShift

Ensure that your vCenter user has all the necessary permissions! Otherwise, you will run into permission-related error messages.

There is a minor difference between the commercial VMware Tools and the Open VM Tools extension. For server usage, open-vm-tools is sufficient.

Enable UUID

Set the disk.EnableUUID parameter to true for each Node VM. This setting ensures that the VMware vSphere’s Virtual Machine Disk (VMDK) always presents a consistent UUID to the VM, allowing the disk to be mounted properly.
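One way to set the flag from the command line is the govc CLI (the VM inventory path and a configured GOVC_URL environment are assumptions):

```shell
# Set disk.EnableUUID=TRUE as an extra config option on each node VM
govc vm.change -vm /Datacenter/vm/ocp-node-1 -e disk.enableUUID=TRUE
```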

Configuration

Installing with Ansible also creates and configures the following files for your OpenShift vSphere environment:

/etc/origin/cloudprovider/vsphere.conf

/etc/origin/master/master-config.yaml

/etc/origin/node/node-config.yaml

Verify that these files are identical on all nodes!
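For orientation, a vsphere.conf for the in-tree vSphere Cloud Provider follows this shape (all values are placeholders):

```ini
# /etc/origin/cloudprovider/vsphere.conf
[Global]
user = "administrator@vsphere.local"
password = "changeme"
server = "vcenter.example.com"
port = "443"
insecure-flag = "1"
datacenter = "Datacenter"
datastore = "datastore1"
working-dir = "/Datacenter/vm/ocp"

[Disk]
scsicontrollertype = pvscsi
```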

Provisioning Methods

To understand my difficulty, I have to cover the basics first. There are two provisioning methods for vSphere storage:

Static Provisioning

Dynamic Provisioning

Static Provisioning

This approach worked with minor adjustments to the given OpenShift examples. The process involves both the cluster administrator and the OpenShift developer.

Dynamic Provisioning

The vSphere Cloud Provider plugin for Kubernetes can perform the first two steps (creating the virtual disk and the persistent volume) itself and thus provision storage dynamically from PVCs alone. To accomplish that, we have to define a StorageClass. The administrator is not involved in this procedure.

This approach has given me headaches since there were multiple pitfalls and obstacles to overcome.

Static Provisioning

The standard procedure consists of the following steps:

Create a virtual disk.

Create a persistent volume (PV) for that disk.

Create a persistent volume claim (PVC) for the PV.

Let a pod claim the PVC.

Create a virtual disk

First, we need to create a virtual disk with 10 GB of disk space on the vSphere storage. Ensure that the volumes directory exists on the datastore. As the VMDK name, I chose jesus.vmdk, because it is always funny when the cluster admin tells me that he has found Jesus.
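On an ESXi host, this step translates to something like the following (the datastore name is an assumption):

```shell
# Ensure the volumes directory exists on the datastore
mkdir -p /vmfs/volumes/datastore1/volumes
# Create a 10 GB virtual disk
vmkfstools -c 10G /vmfs/volumes/datastore1/volumes/jesus.vmdk
```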

The claim name is vinh, and we map the claim to the persistent volume pv0001. Note that we claim only 1 GB of the 10 GB. That makes little sense, but it works. Requesting more storage than the persistent volume offers results in failure.
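A sketch of the two objects, reusing the hypothetical datastore path from the disk creation step:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  vsphereVolume:
    volumePath: "[datastore1] volumes/jesus.vmdk"   # datastore name assumed
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vinh
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi        # deliberately only 1 GB of the 10 GB disk
  volumeName: pv0001      # bind the claim explicitly to the PV
```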

Dynamic Provisioning

The OpenShift Container Platform persistent volume framework enables dynamic provisioning and allows administrators to provision a cluster with persistent storage. The framework also gives users a way to request those resources without having any knowledge of the underlying infrastructure.

StorageClass Definition

To perform dynamic provisioning, we need a default StorageClass definition. Therefore, we use the annotation storageclass.kubernetes.io/is-default-class: "true". Check whether a similar StorageClass definition already exists.
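A StorageClass carrying that annotation could look like this (the class name is an assumption):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard          # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
```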

Repeat that for every node. After the patch, dynamic provisioning still didn't work.

Check logs

Search /var/log/containers/*.log for datacenter.go, vsphere.go and pv_controller.go. You will only see those messages after increasing the log verbosity of the OpenShift platform. See below a prettified message stack of stderr.
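A grep along these lines pulls the relevant entries out of the logs:

```shell
# -h suppresses file names; extend the pattern to your needs
grep -hE 'datacenter\.go|vsphere\.go|pv_controller\.go' /var/log/containers/*.log
```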

The problem is that there is no recursive lookup mechanism: the plugin does not search sub-directories for node VMs. Our VM nodes are in the sub-directories Master Nodes and Application Nodes. The directory names contain a space, which also seems problematic if you don't use proper escaping. I followed my hunch and let the VMware administrators move the VM nodes to the right directory. For that, we had to shut down the whole OpenShift Container Platform first.

Retry claim

After the reboot, dynamic provisioning works. The persistent volume claim exists and is bound.

The virtual disk kubernetes-dynamic-pvc-548e75c0-4426-11e9-ac8a-005056ab11cb.vmdk was created automatically. Dynamic provisioning has some advantages over static provisioning. For one, as mentioned, you don't have to care about the underlying infrastructure. Depending on your point of view, another advantage is the automatic creation and deletion of the virtual disk and persistent volume through the persistent volume claim. Better make sure you no longer need the data before you delete a persistent volume claim.

Undo Provisioning

This provisioning walkthrough was an experimental proof of concept. To undo the storage provisioning, use the following commands:
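Assuming the object names from this walkthrough, the cleanup might look like this:

```shell
# Remove the static claim and its persistent volume
oc delete pvc vinh
oc delete pv pv0001
# Remove the manually created virtual disk on the ESXi host (datastore assumed)
vmkfstools -U /vmfs/volumes/datastore1/volumes/jesus.vmdk
# A dynamically provisioned claim deletes its PV and VMDK automatically
```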

Summary

The OpenShift Container Platform by Red Hat utilises Kubernetes and Docker. Running OCP on VMware vSphere works well. Static provisioning of persistent storage works. Dynamic provisioning of storage is a little more challenging, since you have to examine the interaction between Kubernetes and the vSphere Cloud Provider. Overall, it feels like an odyssey: you have to gather information on three major products and dig through the logging data. In the end, I learned a lot. Troubleshooting and analysing logs are essential; without Elasticsearch it is a cumbersome task.