In this guide, we’ll create a bare-metal Ceph RBD cluster that can provide persistent volume support for a Kubernetes environment.

Our Ceph RBD cluster will be composed of a single Ceph monitor (MON) and two Ceph Object Storage Daemon (OSD) nodes. In Ceph, MON nodes track the state of the cluster, and OSD nodes hold the data to be persisted.

Use this guide to add persistent storage to a Kubernetes environment so that it can host stateful applications. When the Pod of a stateful application such as a database fails, Kubernetes schedules a replacement Pod, often on a different Node. Any data written to the original Node’s local disk is left behind and lost… unless it was stored on a Kubernetes volume backed by a network block device. In that case, Kubernetes automatically migrates the volume mount along with the Pod to the new destination Node.

Ceph RBD may be used to create a redundant, highly available storage cluster that provides network-mountable block storage devices, similar to Rackspace Cloud Block Storage and Amazon’s EBS (minus the API). Ceph itself is a large project that also provides network-mountable POSIX filesystems (CephFS) and network object storage like S3 or Swift. However, for Kubernetes we only need to deploy a subset of Ceph’s functionality… Ceph RBD.

This guide should work on any machines, whether bare-metal, virtual, or cloud, so long as the following criteria are met:

All Ceph cluster machines are running Ubuntu 14.04 LTS

CentOS 7 may work with yum-plugin-priorities disabled.

All machines are network reachable, without restriction

i.e. open iptables, open security groups

Root access through password-less SSH is enabled

i.e. configured /root/.ssh/authorized_keys

Because not many of us have multiple bare-metal machines lying around, we’ll rent them from packet.net. If you already have machines provisioned, you may skip the provisioning section below.

Provision Packet.net Bare-Metal Machines

The ceph/ceph-ansible scripts work on Ubuntu 14.04, but fail on CentOS 7 unless yum-plugin-priorities is disabled.

If you enjoy clicking around web UIs, follow the manual instructions below. Otherwise, the only automated provisioning method supported by packet.net is Hashicorp Terraform. A CLI client does not yet exist.

Semi-Automatic Instructions (Using Terraform)

Instantiating servers via Hashicorp Terraform must happen in two steps if a “packet_project” has not yet been created.

This is because the “packet_device” (i.e., bare-metal machine) definitions require a “project_id” at execution time, but the “project_id” is not known until after the “packet_project” has been created. This could be resolved in the future by having packet_device accept a project_name instead of a project_id; then we could “terraform apply” this file in one go.

See the Appendix for curl commands that will help you discover project_ids from the API.
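If you prefer to keep everything in one cluster.tf, one way to work around the two-step problem is Terraform’s -target flag: create the project first, look up its id, then create the devices. This is only a sketch, and it assumes the file also declares a packet_project resource hypothetically named “ceph”:

# Step 1: create only the project
terraform apply -target=packet_project.ceph

# Find the generated project_id in the state output
terraform show | grep -A 2 packet_project

# Step 2: paste the id into the packet_device definitions, then create the devices
terraform apply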

Append to cluster.tf, and tweak the {variables} according to your needs. As of May 2016, only the ewr1 packet.net facility supports disk creation. Because of this, ensure all machines and disks are instantiated in facility ewr1.

# Create a device
resource "packet_device" "ceph-mon-0001" {
  hostname         = "ceph-mon-0001"
  plan             = "baremetal_0"
  facility         = "ewr1"
  operating_system = "ubuntu_14_04"
  billing_cycle    = "hourly"
  project_id       = "{REPLACE_WITH_PACKET_PROJECT_ID}"
}

resource "packet_device" "ceph-osd-0001" {
  hostname         = "ceph-osd-0001"
  plan             = "baremetal_0"
  facility         = "ewr1"
  operating_system = "ubuntu_14_04"
  billing_cycle    = "hourly"
  project_id       = "{REPLACE_WITH_PACKET_PROJECT_ID}"
}

resource "packet_device" "ceph-osd-0002" {
  hostname         = "ceph-osd-0002"
  plan             = "baremetal_0"
  facility         = "ewr1"
  operating_system = "ubuntu_14_04"
  billing_cycle    = "hourly"
  project_id       = "{REPLACE_WITH_PACKET_PROJECT_ID}"
}

Run Terraform

terraform plan -out cluster
terraform apply

Note the names and IPs of the newly created machines

cat terraform.tfstate | grep -E '(hostname|network.0.address)'

# Returns something like this
# "hostname": "ceph-mon-0001",
# "network.0.address": "147.75.199.133",
# "hostname": "ceph-osd-0001",
# "network.0.address": "147.75.192.205",
# "hostname": "ceph-osd-0002",
# "network.0.address": "147.75.197.189",

Create a ~/.ssh/config file with the hostname-to-IP mappings. This allows you to refer to the machines by hostname from your dev box, as well as in the Ansible inventory definitions.
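For example, using the sample IPs from the output above (substitute your own), an entry per machine along the following lines should work:

cat >> ~/.ssh/config <<'EOF'
Host ceph-mon-0001
    HostName 147.75.199.133
    User root
Host ceph-osd-0001
    HostName 147.75.192.205
    User root
Host ceph-osd-0002
    HostName 147.75.197.189
    User root
EOF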

Step 3: Create and Attach Disks

Using the WebUI, create a 100GB disk and name it the same as the respective OSD node (e.g. ceph-osd-0001)

Using the WebUI, attach the disk to the respective OSD node

Using SSH, connect to the OSD node and run the on-machine attach command

# Connect to the box
ssh root@ceph-osd-{id}

# This will attach the disk and create a new entry under /dev/mapper/volume-{id}
packet-block-storage-attach

Using SSH, connect to the OSD node and rename the attached volume to a consistent name. The Ansible scripts we run later expect the same volume names (e.g. /dev/mapper/data01) and the same number of volumes across all of the OSD nodes.

# Connect to the box
ssh root@ceph-osd-{id}

# Rename the newly mapped volume from /dev/mapper/volume-{id} to /dev/mapper/data01
dmsetup rename /dev/mapper/volume-{id} data01

You may add additional disks to each OSD machine, so long as the number and naming of disks are consistent across all of the OSD machines.
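For example, a second disk attached to every OSD node could be renamed to data02 on each of them. This is only a sketch, and the volume-{id} placeholder must be replaced with the actual device-mapper name reported by packet-block-storage-attach:

# Repeat on every OSD node after attaching the second disk in the WebUI
ssh root@ceph-osd-{id}
packet-block-storage-attach
dmsetup rename /dev/mapper/volume-{id} data02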

Configure and Run the ceph/ceph-ansible scripts

Fetch the ceph/ceph-ansible code

# Tested at ceph/ceph-ansible - branch master, commit 84dad9a7
git clone git@github.com:ceph/ceph-ansible.git
cd ceph-ansible

# Work in your own branch
git checkout master
git checkout -b bare-metal

# Set links from the sample configs to your own configs
# This allows you to "git diff master" later to see your config changes
ln -s site.yml.sample site.yml
pushd group_vars
ln -s all.sample all
ln -s mons.sample mons
ln -s osds.sample osds
popd

Create an Ansible inventory file named “inventory”, using the hosts and IPs recorded earlier.

# configure monitor IPs or Names
[mons]
ceph-mon-0001

# configure object storage daemon IPs or Names
[osds]
ceph-osd-0001
ceph-osd-0002

# configure clients
# Setup your local vagrant machine as a ceph client.
# If you have a Kubernetes cluster that needs access to
# Ceph volumes, add the nodes here as well.
[clients]
localhost ansible_connection=local
# kube-node-0001
# kube-node-0002

Test the Ansible connection, and accept the host keys into your ~/.ssh/known_hosts

ansible all -u root -i inventory -m ping

Configure and edit group_vars/all. You may add the following
settings, or replace/uncomment existing lines:

# This is very specific to packet.net, whose primary interface is bond0.
# Other distributions may use eth0.
monitor_interface: bond0

# Set ceph-ansible journal_size (arbitrarily chosen)
journal_size: 256

# Ceph OSDs may be setup to use the public_network for servicing
# clients (volume access, etc), and a cluster_network for OSD data
# replication. In our case on packet.net, we set these to be the
# same. Packet.net machines have a single bond0 interface (2 NICs
# ganged as one) with a public internet address. There is also a
# private IP alias created on the same interface that seems to
# simulate a "Rackspace servicenet" type of private network.
public_network: 10.0.0.0/8
cluster_network: 10.0.0.0/8

# Set the type of filesystem that Ceph should use for underlying Ceph
# RBD storage.
osd_mkfs_type: xfs
osd_mkfs_options_xfs: -f -i size=2048
osd_mount_options_xfs: noatime,largeio,inode64,swalloc
osd_objectstore: filestore

Configure and edit group_vars/osds. You may add the following
settings, or replace/uncomment existing lines:

#####################################################################
# Custom Configuration

# Set ceph-ansible to use the "data01" device across all OSD nodes.
# If you have added additional disks, you may include them below.
# Beware that we do not enable auto-discovery of disks.
# Ceph-ansible does not properly handle multipath disks.
# Thus, the two individual /dev/sd{a,b} disks would be discovered
# instead of the single proper volume under
# /dev/mapper/volume-{id}.
devices:
  - /dev/mapper/data01
# - /dev/mapper/data[XX] if you have added additional disks

# Set ceph-ansible to place the journal on the same device as the data
# disk. This is usually suboptimal for performance, but in our case
# we only have a single disk.
journal_collocation: true

Check whether any machine targets in the “inventory” file are running CentOS 7. If so, disable yum-plugin-priorities, or the ceph-ansible scripts will fail: with the plugin enabled, other repositories holding older versions of packages take precedence over the Ceph repository, and the scripts cannot locate the packages and dependencies they need.
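One way to disable the plugin on a CentOS 7 target is shown below; this is a sketch that assumes the stock yum plugin configuration path, so verify it on your hosts:

# On each CentOS 7 machine (or via ansible -m shell)
sed -i 's/^enabled *= *1/enabled = 0/' /etc/yum/pluginconf.d/priorities.conf

# Alternatively, remove the plugin entirely
yum -y remove yum-plugin-priorities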

Test the Cluster by Creating Storage Volumes

Ensure a Healthy Ceph RBD Cluster

Your local Vagrant machine should now have the Ceph CLI tools installed and configured. If for some reason the tools are not working there, try the following from the Ceph MON machine.

Query Ceph cluster status

ceph -s
#   cluster d1eab1dc-a050-46a4-9771-f2494714b96e
#    health HEALTH_WARN
#           64 pgs degraded
#           64 pgs stuck unclean
#           64 pgs undersized
#    monmap e1: 1 mons at {ceph-mon-0001=147.75.196.123:6789/0}
#           election epoch 3, quorum 0 ceph-mon-0001
#    osdmap e9: 2 osds: 2 up, 2 in
#           flags sortbitwise
#     pgmap v18: 64 pgs, 1 pools, 0 bytes data, 0 objects
#           72528 kB used, 199 GB / 199 GB avail
#                 64 active+undersized+degraded

You can see that the cluster is working, but in a WARN state. Let us debug this.

List all OSD pools; you should find only the default “rbd” pool.

# list all OSD pools
ceph osd pool ls
# rbd

List the parameters of the default “rbd” OSD pool.

ceph osd pool get rbd all
# size: 3
# min_size: 2
# crash_replay_interval: 0
# pg_num: 64
# pgp_num: 64
# crush_ruleset: 0
# hashpspool: true
# nodelete: false
# nopgchange: false
# nosizechange: false
# write_fadvise_dontneed: false
# noscrub: false
# nodeep-scrub: false
# use_gmt_hitset: 1
# auid: 0
# min_write_recency_for_promote: 0
# fast_read: 0

Notice that size == 3, which means a healthy state requires 3 replicas of every placement group within the “rbd” pool. However, we only have 2 OSDs, so the cluster is running in a degraded HEALTH_WARN state. If we decrease the number of required replicas to 2, the cluster will return to a healthy state.

# set redundancy of the default rbd pool from 3 to 2, because we only have 2 OSDs
ceph osd pool set rbd size 2

Check that Ceph cluster status has returned to HEALTH_OK. This may take a few seconds.

ceph -s
#   cluster d1eab1dc-a050-46a4-9771-f2494714b96e
#    health HEALTH_OK
#    monmap e1: 1 mons at {ceph-mon-0001=147.75.196.123:6789/0}
#           election epoch 3, quorum 0 ceph-mon-0001
#    osdmap e11: 2 osds: 2 up, 2 in
#           flags sortbitwise
#     pgmap v26: 64 pgs, 1 pools, 0 bytes data, 0 objects
#           73432 kB used, 199 GB / 199 GB avail
#                 64 active+clean

Create a Volume for Testing

We’ll test the Ceph cluster from the OSD nodes as Ceph clients. According to the earlier configuration:

public_network == cluster_network == 10.0.0.0/8

Ceph will only listen for requests on the 10.0.0.0/8 network, which is a pseudo-servicenet alias for a private network between machines within the same packet.net project. (Note: this may not work entirely as expected. The ceph-mon process appears to bind to the IP on the public network, and that public IP is what must be used in the Kubernetes Volume definitions.)
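To confirm which address the monitor actually bound to, a quick sanity check from the MON node should suffice; either command below shows the address serving port 6789:

# Show the listening socket for the monitor
ss -tlnp | grep 6789

# Or ask Ceph for the monitor map
ceph mon dump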

Copy the credentials from the Ceph MON to the Ceph OSDs, so that they can access the Ceph cluster as clients.
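The ceph-ansible client role normally distributes these files; if you need to do it by hand, something like the following from your dev box should work (a sketch, assuming the admin keyring is used; the keyring is usually the piece the client nodes are missing):

# Fetch the config and admin keyring from the MON
scp root@ceph-mon-0001:/etc/ceph/ceph.conf .
scp root@ceph-mon-0001:/etc/ceph/ceph.client.admin.keyring .

# Push them to each OSD node acting as a client
for node in ceph-osd-0001 ceph-osd-0002; do
  scp ceph.conf ceph.client.admin.keyring root@$node:/etc/ceph/
done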

From the first OSD node, create a volume, mount it, add data, and unmount the volume.

# SSH to the first Ceph OSD node where we will simulate a client
ssh root@ceph-osd-0001

# Create an RBD device and specify *only* the layering feature
# Man Page: http://docs.ceph.com/docs/master/man/8/rbd/
# Many default features based on exclusive-lock are not supported
# by kernel mounts of non-cutting-edge operating systems. Thus,
# we dumb down the feature set to only support layering. Older
# kernels are able to support mounting the default image-features
# via the userspace utility "rbd-nbd" (apt-get install -y rbd-nbd;
# rbd-nbd map foo).

# Create the volume "vol01" within the "rbd" pool
rbd create rbd/vol01 --size 100M --image-feature layering

# List available volumes
rbd ls
# vol01

# Map the volume to the system, which creates /dev/rbd0
# All named mounts are found under /dev/rbd/rbd/*
# The resulting volume may be accessed at: /dev/rbd/rbd/vol01
rbd map rbd/vol01

# Create a filesystem on the new block device
mkfs.ext4 /dev/rbd/rbd/vol01

# Mount the volume to the system
mkdir -p /mnt/tmp
mount -t ext4 /dev/rbd/rbd/vol01 /mnt/tmp

# Check that the device is actually mounted
df -h

# Create some files in the new volume
pushd /mnt/tmp
echo "Hello World" | tee welcome.txt
cat welcome.txt
dd if=/dev/urandom of=urandom.bin bs=1M count=5
popd

# Unmount the new volume
umount /mnt/tmp

# Unmap the RBD device
rbd unmap rbd/vol01

From the second OSD node, mount the volume that the first node created, read data, and unmount the volume.

# SSH to the second Ceph OSD node where we will simulate a client
ssh root@ceph-osd-0002

# Map the volume to the system, which creates /dev/rbd0
# All named mounts are found under /dev/rbd/rbd/*
# The resulting volume may be accessed at: /dev/rbd/rbd/vol01
rbd map rbd/vol01

# Mount the volume to the system
mkdir -p /mnt/tmp
mount -t ext4 /dev/rbd/rbd/vol01 /mnt/tmp

# Check the contents of the volume
ls -la /mnt/tmp

# Unmount the new volume
umount /mnt/tmp

# Unmap the RBD device
rbd unmap rbd/vol01

Configure a Kubernetes Container to Mount a Ceph RBD Volume

The rest of this guide assumes that you have a working Kubernetes environment created by following this (Setup Guide).

Ensure that your Kubernetes environment is working.

export KUBECONFIG=~/.kube/config.pkt
kubectl config use-context pkt-east
kubectl config current-context
kubectl cluster-info
kubectl get svc,rc,po

First, base64-encode the Ceph client key.

ssh ceph-mon-0001 cat /etc/ceph/ceph.client.admin.keyring | \
  grep key | awk '{print $3}' | base64
# VhyV0E9PQoVhyV0E9PQoVhyV0E9PQoVhyV0E9PQoVhyV0E9PQoVhyV0=

Create a file called “ceph-test.yaml”, which will contain definitions of the Secret, PersistentVolume, PersistentVolumeClaim, and ReplicationController. We will mount the “rbd/vol01” test volume created in the prior step into a test Container (in our case, nginx).

---
# Replace the following configuration items
# - {REPLACE_WITH_BASE64_ENCODED_CEPH_CLIENT_KEY}
# - {REPLACE_WITH_PUBLIC_IP_OF_CEPH_MON}
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: {REPLACE_WITH_BASE64_ENCODED_CEPH_CLIENT_KEY}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-vol01
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  rbd:
    monitors:
      - "{REPLACE_WITH_PUBLIC_IP_OF_CEPH_MON}:6789"
      #- LIST OTHER CEPH_MON'S HERE FOR REDUNDANCY
    pool: rbd
    image: vol01
    user: admin
    keyring: "/etc/ceph/ceph.client.admin.keyring"
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-vol01
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: ceph-test
  namespace: default
spec:
  replicas: 1
  selector:
    instance: ceph-test
  template:
    metadata:
      labels:
        instance: ceph-test
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - mountPath: /data
              name: ceph-vol01
      volumes:
        - name: ceph-vol01
          persistentVolumeClaim:
            claimName: ceph-vol01

Create the resources defined within “ceph-test.yaml”

cat ceph-test.yaml | kubectl create -f -

Check that the nginx Pod/Container is running, and note its name for use in the next command.

kubectl get po
# NAME              READY     STATUS    RESTARTS   AGE
# ceph-test-61cks   1/1       Running   0          1s

Verify that the files created earlier are readable within the “rbd/vol01” mount inside the Pod/Container.

kubectl exec -it ceph-test-61cks -- find /data
# /data
# /data/lost+found
# /data/welcome.txt
# /data/urandom.bin

We now have a functioning Ceph cluster that is mountable by a Kubernetes environment.

If the Pod is destroyed and migrated to a different Kubernetes node, the volume mount will also follow.
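To see this in action, delete the Pod and let the ReplicationController reschedule it; the replacement Pod (whose generated name will differ) re-mounts the same RBD volume wherever it lands. The Pod name below is the one from the sample output above:

kubectl delete po ceph-test-61cks
kubectl get po
# A new ceph-test-* Pod appears, possibly on a different node

# Substitute the new Pod's name to confirm the data is still there
kubectl exec -it <new-pod-name> -- ls /data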

Installing Kubernetes on bare-metal machines is dead simple, and a million times easier than installing OpenStack. The bulk of the instructions below involve setting up the bare-metal machines on packet.net.

What You’ll Get with These Instructions

One may use these instructions to create a basic Kubernetes cluster. In order to create a cluster environment equivalent to a hosted solution (GKE) or turn-key solutions (Kubernetes on AWS or GCE), you’ll need persistent volume and load-balancer support. A future post will cover how to setup persistent volume storage as a Ceph RBD cluster, and how to work around the need for external load-balancer integration by deploying a Kubernetes Ingress DaemonSet with DNS.

In the following guide, we’ll build a Kubernetes cluster with 1 master and 2 nodes.

Bare-Metal or Cloud VMs, All the Same

These instructions should also work on cloud VMs, so long as the following criteria are met:

All machines are network reachable, without restriction

i.e. open iptables, open security groups

Root access through password-less SSH is enabled

i.e. configured /root/.ssh/authorized_keys

Because not many of us have multiple bare-metal machines lying around, we’ll rent them from packet.net.

Provision Bare-Metal Machines (on packet.net)

The contrib/ansible scripts are targeted at official Red Hat distributions, including RHEL and Fedora. However, packet.net does not currently offer these operating systems, so we deploy CentOS 7 servers, which is the next-closest thing and happens to work.

If you enjoy clicking around web UIs, follow the manual instructions below. Otherwise, the only automated provisioning method supported by packet.net is Hashicorp Terraform. A CLI client does not yet exist.

Manual WebUI Instructions

Automated Instructions (using Terraform)

Instantiating servers via Hashicorp Terraform must happen in two steps if a “packet_project” has not yet been created.

This is because the “packet_device” (i.e., bare-metal machine) definitions require a “project_id” at execution time, but the “project_id” is not known until after the “packet_project” has been created. This could be resolved in the future by having packet_device accept a project_name instead of a project_id; then we could “terraform apply” this file in one go.

See the Appendix for curl commands that will help you discover project_ids from the API.

Description

I’ve created a secure routed VPN network between all of my family’s home networks. Here’s what it looks like, followed by how I did it.

Here’s an overview of the components:

Home Network

OpenVPN concentrator

Netgear WNR3500L (480 MHz CPU, 8MB Flash, 64MB RAM)

DD-WRT Mega/Big (Includes OpenVPN), with jffs enabled

Local Network: 192.168.1.0/24

Dynamic DNS: xxxx.dyndns.org

Remote Client Networks 1 and 2

OpenVPN client

Linksys WRT54G (266 MHz CPU, 4MB Flash, 16MB RAM)

DD-WRT VPN (Includes OpenVPN), with jffs enabled

Client Network 1

Local Network: 192.168.2.0/24

Dynamic DNS: yyyy.dyndns.org

Client Network 2

Local Network: 192.168.3.0/24

Dynamic DNS: zzzz.dyndns.org
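For a routed (tun) setup like this, the concentrator must know which client certificate owns which remote subnet, and every site needs routes to the others. The sketch below shows the general shape of the relevant server-side pieces; the tunnel network 10.8.0.0/24 and the /jffs/ccd directory are hypothetical examples, and the real configuration lives in the /jffs/openvpn_server.txt file described later:

# Server-side directives (fragment of the OpenVPN server config)
# server 10.8.0.0 255.255.255.0
# client-config-dir /jffs/ccd
# route 192.168.2.0 255.255.255.0          # Client Network 1, behind client1
# route 192.168.3.0 255.255.255.0          # Client Network 2, behind client2
# push "route 192.168.1.0 255.255.255.0"   # tell clients how to reach the home LAN

# Per-client subnet ownership, in files named after each client certificate's CN
mkdir -p /jffs/ccd
echo 'iroute 192.168.2.0 255.255.255.0' > /jffs/ccd/client1
echo 'iroute 192.168.3.0 255.255.255.0' > /jffs/ccd/client2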

Generate the Certificates

First, install OpenVPN on your development machine

sudo apt-get install openvpn

Second, run this little script to create a server certificate and several client certificates. These certificates will be directly used in the OpenVPN concentrator and client configuration files. You will need a single unique client certificate for every OpenVPN client that will connect to the OpenVPN server. When running this script, accept defaults for all of the prompts.

#!/bin/bash

mydir=`pwd`
mydate=`date +%Y%m%d%H%M`
mykeygendir="$mydir/${mydate}_keygen"
echo $mykeygendir

cp -r /usr/share/doc/openvpn/examples/easy-rsa/2.0 $mykeygendir
pushd .
cd $mykeygendir

source ./vars
export KEY_COUNTRY="US"
export KEY_PROVINCE="CA"
export KEY_CITY="San Francisco"
export KEY_ORG="yourdomain.com"
export KEY_EMAIL="you@yourdomain.com"
export KEY_CN="yourdomain.com"
export KEY_OU="yourdomain.com"
export KEY_NAME="yourdomain.com"

yes "" | ./clean-all
yes "" | ./build-ca
./build-key client1
./build-key client2
./build-key client3
./build-key client4
./build-key client5
./build-key client6
./build-key client7
./build-key client8
./build-key client9
./build-key-server server
./build-dh

Home Network OpenVPN Concentrator

Add the following to DD-WRT -> Administration -> Commands -> Startup

sleep 15
cat /jffs/openvpn_server.txt | sh

Customize and copy the file contents under the DD-WRT flash path: /jffs/openvpn_server.txt.

My Brother MFC-9840cdw multi-functional printer/scanner/coper/fax can scan to email. However gmail requires SSL, which the printer is not able to support. Using xinetd and openssl, my Linux machine is able to proxy local pop3 requests to gmail’s SSL pop3s service, and local smtp requests to gmail’s SSL smtps service.