How Does a K8s Cluster Work?

A k8s cluster is made up of master and worker nodes; on AWS, a node corresponds to an EC2 instance. Masters control the cluster and workers do the work (WARNING: very feudal, don’t let that affect your world view). A worker node can run multiple pods, and each pod comprises one or more containers.

A pod is a vital k8s concept: it is a set of one or more containerised applications that always deploy and run together.
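As a minimal sketch of that idea, here’s what a two-container pod looks like in a Kubernetes manifest (the names and images are illustrative, not from our deployment):

```yaml
# A hypothetical pod with two containers that deploy and run together.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: nginx
    - name: sidecar
      image: busybox
      command: ["sleep", "infinity"]
```

Both containers share the pod’s network namespace, which matters for the IP discussion below.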

Who gets their own IP address?

In the normal world (e.g. for Docker or AWS) it’s usually physical machines or VMs that get IP addresses, like 192.0.2.1. Every contactable process on the machine gets a port relative to that IP address, e.g. 192.0.2.1:123. To talk to a process directly you need to find out the machine’s IP address and the process’s port.

Kubernetes DOES NOT work like this. For excellent reasons that I agree with, its designers decided on a different model. K8s decided that in an orchestrated world, every pod should have a unique (currently IPv4) IP address, like 192.0.2.1, and all the containers running “inside” the pod should have port numbers relative to the pod’s IP address, like 192.0.2.1:123. They recommend allocating a block of IPs to each node in the cluster, from which individual IPs are assigned to each pod on that node.

You can get an address range from which to allocate all your node IP blocks via an AWS VPC.
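The carve-up is just subnet arithmetic. Here’s a hedged sketch using Python’s standard `ipaddress` module; the 10.244.0.0/16 range and the /24 per-node blocks are illustrative values, not anything prescribed by k8s or AWS:

```python
import ipaddress

# Hypothetical cluster-wide pod address range (illustrative, e.g. obtained
# from a VPC or chosen to avoid clashing with the VPC's own range).
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")

# Carve the range into one /24 block per node; each node can then hand out
# up to 254 pod IPs from its own block.
node_blocks = list(cluster_cidr.subnets(new_prefix=24))

node_1_block = node_blocks[0]               # first node's block
first_pod_ip = next(node_1_block.hosts())   # first assignable pod IP on it

print(node_1_block, first_pod_ip)
```

A /16 split into /24s gives you 256 node blocks of 254 pod IPs each, which is the kind of trade-off you tune when sizing a cluster.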

This is subtle but revolutionary. Each pod has its own IP address, just like a VM, and yet multiple pods run on a single AWS instance. Some fancy footwork will be required, because this is not AWS’s normal view of the world.

What does k8s do for you?

The first thing to realise is that k8s doesn’t just do all this for you. It assumes each pod can be given an IP that can be used for container-to-container communication, and it assumes each node has an IP subnet from which IPs can be allocated to individual pods. K8s doesn’t provide this function itself; it relies on the network to provide it.

Note that for a master you can just assign a boring old static IP address, unless you are doing something exciting like running pods on masters (personally, I avoid that kind of excitement).

AWS throws up its networked hands — Use an Overlay

k8s’ pod networking assumptions mean that to run k8s on AWS you really want to ALSO use an overlay-style network like Calico IPIP, Flannel or Weave, which are designed to meet k8s’ expectations. Out of the box, AWS’s network doesn’t. (Note that some of the PaaS k8s distributions, like OpenShift, have built-in or pluggable overlays.)

What the heck’s an overlay?

An overlay network hides AWS’s underlying network architecture from pods by using traffic encapsulation (i.e. by wrapping all the cluster’s packets, at the node level, in additional IP or UDP headers). This lets you, for example, host multiple containerised applications with different cluster-addressable IPs on a single AWS instance — i.e. just what k8s needs.
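To make encapsulation concrete, here’s a deliberately toy Python sketch of the IPIP-style idea: the inner packet keeps its pod IPs, while the outer wrapper uses the nodes’ own IPs so AWS’s network can route it. All the addresses are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: str      # source IP
    dst: str      # destination IP
    payload: object

# A pod-to-pod packet, addressed with pod IPs that AWS knows nothing about.
inner = Packet(src="10.244.0.5", dst="10.244.1.9", payload=b"hello")

# The sending node wraps the whole inner packet in an outer packet addressed
# node-to-node; AWS routes on the outer header only.
outer = Packet(src="172.31.10.11", dst="172.31.10.12", payload=inner)

# The receiving node unwraps and delivers to the right pod.
unwrapped = outer.payload
print(unwrapped.dst)
```

Real encapsulation happens in the kernel, of course, but the structure is the same: a packet inside a packet.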

The overlay network uses etcd (or an equivalent key/value store) to hold the mappings between the virtual IP addresses it’s using for the pods and the nodes’ own IP addresses. A daemon runs on each node and is responsible for checking the information in etcd and routing packets to the correct pods (and ultimately containers). You must give the overlay network an address range to work with, and you can choose how many IP addresses are available on each node for pods (for example, you can set this high if you want to run loads of pods on each node and you have plenty of IP addresses).
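The lookup that per-node daemon does can be sketched as a dictionary — a toy stand-in for the etcd mapping, with made-up addresses:

```python
import ipaddress

# Toy version of the overlay's etcd data: each node's pod block mapped to
# that node's own (underlay) IP address. All values are illustrative.
pod_block_to_node = {
    "10.244.0.0/24": "172.31.10.11",
    "10.244.1.0/24": "172.31.10.12",
}

def node_for_pod(pod_ip: str) -> str:
    """Find which node hosts the given pod IP — roughly what the daemon
    does before encapsulating a packet and sending it to that node."""
    addr = ipaddress.ip_address(pod_ip)
    for block, node_ip in pod_block_to_node.items():
        if addr in ipaddress.ip_network(block):
            return node_ip
    raise LookupError(f"no node owns {pod_ip}")

print(node_for_pod("10.244.1.37"))
```

The real daemon also watches for changes, so that when pods come and go the mapping stays current — that dynamism is the whole point.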

What’s the alternative?

If you don’t want an overlay, you could configure the underlying AWS network fabric (routers, etc.) via the route tables to be aware of pod IP addresses, but that will be painful at scale and doesn’t work multi-AZ. Also, k8s is ultimately about a dynamic world in which pods and their associated IP addresses get created and destroyed a lot, so in the long run you need to be able to handle this automatically to get the full benefit of k8s.
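For a flavour of what that manual route-table approach looks like, here’s a hedged sketch using the AWS CLI — the route-table ID, CIDR block and instance ID are all placeholders, and you’d need one such route per node, kept up to date yourself:

```shell
# Tell the VPC route table to send a node's pod block to that instance.
# (rtb-..., 10.244.1.0/24 and i-... are hypothetical placeholder values.)
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.244.1.0/24 \
    --instance-id i-0123456789abcdef0

# AWS normally drops traffic an instance sends/receives for IPs that
# aren't its own, so the source/dest check must be disabled per node.
aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --no-source-dest-check
```

Multiply that by every node, and keep it in sync as nodes churn, and you can see why letting an overlay handle it is attractive.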

Another alternative is to use a non-overlay solution like Calico in non-IPIP mode. That’s fast and lightweight compared to an overlay approach, BUT it only works within a single VPC subnet, so it cannot handle a cluster that stretches over multiple AZs. To handle multi-AZ, you need to enable the overlay, at least for packets that pass between those AZs (“micro-overlay” is apparently the word for this in Calico).

What do we use?

We have a small, single-AZ k8s cluster on AWS for MicroBadger, and we used the default overlay that came with kops. Using an overlay has worked well: it’s been completely transparent to us so far, which is how we like it, and what we’re doing is not that speed-sensitive.

Summary

Right now, we’re using an overlay for our MicroBadger deployment and it has been fine. We started out quite scared of networking k8s on AWS, but it hasn’t been that bad!

Extras

I totally love Cisco’s Peter Packet song.

Please hit the Recommend button below if you found this article interesting or helpful, so that others might be more likely to find it.

Check out MicroBadger to explore image metadata, and follow Microscaling Systems on Twitter.