You’ve heard of Kubernetes, but what is it, really? Can you explain it to your boss? Or your coworkers? Or your dog?

Kubernetes is an open source container orchestration tool originally developed by Google (source code on GitHub), and it builds on some 15 years of Google's experience running containerized workloads in production. But what does that mean? And why should you care?

Let me start by outlining the problems with running applications in container clusters. Then I’ll show you what Kubernetes is NOT. And finally, I’ll show you how Kubernetes solves the aforementioned problems.

When you’re finished you should be able to explain Kubernetes so well they’ll all be eating out of your hand.

The Problems

In this section, we’ll look at four of the problems you’ll face when running container-based applications in a clustered environment. Any solution needs to address all of these (spoiler alert: Kubernetes does!).

Scheduling

You’ve got this great container-based application? Awesome! Now you need to make sure it runs when and where it’s supposed to. It’s important for your application to be running on the right machines in your cluster, but not all machines in the cluster are necessarily alike.

Load balancing

Your application is up and running. Great! Now you need to make sure that the client load is spread evenly among the nodes in your cluster. It is important that your application is making optimal use of the resources on each host to handle the client load. You don’t want some containers working at full throttle while others sit idle.

Application scaling

You have your containers running and the client load is balancing nicely between them. Super! Now you need to be able to bring containers online to handle the load (and spikes in demand), and kill them off when they are no longer needed. It is important to be able to handle spikes in client requests.

Cluster management and monitoring

Now that you have your application running efficiently on that giant cluster, you have to manage it. You need to define, launch, scale, load balance, and monitor the health of the containers that are running. Not an easy task.

What Kubernetes is NOT

Platform as a Service (PaaS)

Kubernetes does a number of things that a PaaS offering would, such as storage management and cluster logging and monitoring. But Kubernetes is not really a PaaS offering, because it does not provide components like the operating system, or supporting tools like Java and Docker. Kubernetes does, however, integrate nicely with PaaS offerings like Bluemix and OpenShift.

A data processing framework

Kubernetes is definitely suitable for running Big Data applications, but it does not perform (or provide services that perform) the same function as data processing frameworks like Apache Spark and Hadoop MapReduce. However, Kubernetes integrates nicely with both Spark and Hadoop (just to name two).

Continuous integration

Kubernetes doesn’t build your application’s containers like Jenkins and other CI tools, but (surprise!) does work well with CI to help manage updates to your application as it evolves through its lifecycle.

The Solutions

Kubernetes addresses each of the problems listed above (you’re not shocked, are you?). In the sections below I talk through how, and I’ll introduce Kubernetes terminology along the way.

Scheduling

A Kubernetes Pod is a group of containers that work together to perform an application function (or set of functions), and is the unit of scheduling in Kubernetes.

When a pod is created, the scheduler finds the most suitable Node (host machine in the cluster) on which it should run. This is handled by the kube-scheduler component, which selects candidate nodes in the cluster and makes sure that the resources provided by the chosen node match those required by the containers in the pod.
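To make the idea concrete, here is a toy sketch in Python of "filter out the nodes that can't fit the pod, then pick the best of what's left". This is not how kube-scheduler is actually implemented. The Node and Pod classes, the field names, and the scoring rule (most CPU left over) are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int   # unallocated CPU, in millicores (hypothetical field)
    free_memory_mib: int   # unallocated memory, in MiB (hypothetical field)


@dataclass
class Pod:
    name: str
    cpu_millis: int        # CPU requested by all containers in the pod
    memory_mib: int        # memory requested by all containers in the pod


def fits(node: Node, pod: Pod) -> bool:
    """A node is a candidate only if it can satisfy the pod's requests."""
    return (node.free_cpu_millis >= pod.cpu_millis
            and node.free_memory_mib >= pod.memory_mib)


def pick_node(nodes: List[Node], pod: Pod) -> Optional[Node]:
    """Drop nodes that cannot fit the pod, then prefer the candidate with
    the most CPU left over after placement (a toy scoring rule)."""
    candidates = [n for n in nodes if fits(n, pod)]
    if not candidates:
        return None  # nothing fits, so the pod stays unscheduled
    return max(candidates, key=lambda n: n.free_cpu_millis - pod.cpu_millis)


nodes = [
    Node("node-a", free_cpu_millis=500, free_memory_mib=4096),
    Node("node-b", free_cpu_millis=2000, free_memory_mib=1024),
    Node("node-c", free_cpu_millis=1500, free_memory_mib=8192),
]
web = Pod("web", cpu_millis=1000, memory_mib=2048)

chosen = pick_node(nodes, web)
print(chosen.name if chosen else "unschedulable")  # prints "node-c"
```

Here node-a is short on CPU and node-b is short on memory, so node-c is the only candidate left. The point is simply that not all machines in the cluster are alike, and the scheduler has to take that into account.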

Load balancing

A Kubernetes Service is a logical group of pods (called Replicas) that all provide the same functionality, and serves to decouple the pod replicas from their clients.

In Kubernetes, load balancing by default is handled by services. For each service you can provide a label selector, used to identify the pod’s replicas. Since the physical location of the replicas is immaterial, the clients that require their functionality neither know nor care where they actually run. The service uses the label selector to find the replicas that can handle a request, and makes sure that the client load is balanced among them.
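Here is a small Python sketch of the label selector idea: a selector is just a set of key/value pairs, and a pod matches if its labels contain all of them. The pod names, the labels, and the Service class below are hypothetical, and the round-robin rotation is an assumption for illustration only. How a real cluster distributes traffic depends on how it is configured.

```python
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "labels": {"app": "db", "tier": "backend"}},
]


def select(pods, selector):
    """Return the pods whose labels contain every key/value in the selector."""
    return [p for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]


class Service:
    """Toy stand-in for a service: clients call route() and never need to
    know which replica answers or where it runs."""

    def __init__(self, name, selector, pods):
        self.name = name
        self.endpoints = select(pods, selector)
        self._next = 0

    def route(self):
        if not self.endpoints:
            raise RuntimeError("no pods match the selector for " + self.name)
        # Rotate through the matching replicas (round-robin is an assumption).
        pod = self.endpoints[self._next % len(self.endpoints)]
        self._next += 1
        return pod["name"]


web_service = Service("web", {"app": "web"}, pods)
print([web_service.route() for _ in range(4)])
# prints ['web-1', 'web-2', 'web-1', 'web-2']
```

Notice that db-1 never receives a request from this service, because its labels don't match the selector.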

Application scaling

A Kubernetes Replication Controller makes sure that the specified number of pod replicas are running in the cluster at all times.

The Replication Controller handles scaling the app by ensuring that the number of replicas you asked for is in fact always running. If there are too few (maybe one or more died for some reason), the replication controller starts more until the target is reached. If too many are running (in the case of auto-scaling), it kills some off.
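The "compare what is running to what you asked for, then fix the difference" behavior can be sketched in a few lines of Python. The ReplicationControllerSketch class and the pod naming scheme are made up for illustration, and this sketch only tracks names in a list rather than starting real containers.

```python
class ReplicationControllerSketch:
    """Toy model: keeps the list of running pods equal in size to `desired`."""

    def __init__(self, desired):
        self.desired = desired   # target number of replicas
        self.pods = []           # names of the pods currently "running"
        self._next_id = 1

    def reconcile(self):
        """Start or stop pods until the running count matches the target.
        Returns (started, stopped) so the caller can see what changed."""
        started, stopped = [], []
        while len(self.pods) < self.desired:
            name = "web-%d" % self._next_id
            self._next_id += 1
            self.pods.append(name)
            started.append(name)
        while len(self.pods) > self.desired:
            stopped.append(self.pods.pop())
        return started, stopped


rc = ReplicationControllerSketch(desired=3)
print(rc.reconcile())      # prints (['web-1', 'web-2', 'web-3'], [])

rc.pods.remove("web-2")    # simulate a replica dying
print(rc.reconcile())      # prints (['web-4'], [])

rc.desired = 2             # the target is lowered
print(rc.reconcile())      # prints ([], ['web-4'])
```

The same loop covers both directions: too few replicas means starting more, too many means killing some off.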

Cluster management and monitoring

The Kubernetes Dashboard is a web-based UI for monitoring that includes screens to manage the pods that are running, and view metrics like CPU and memory usage. It is not deployed by default, but with the kubectl command, you can deploy the dashboard and begin using it.

Conclusion

You should have a better idea of the problems Kubernetes solves, and how.

Now march into the next staff meeting and wow your boss and coworkers with your (high-level) understanding of Kubernetes. If they’re not eating out of your hand when you’re done, maybe you need a new job. But at least your dog will still love you.

References and other Kubernetes resources

I’ve sprinkled links throughout this document to help you learn more about Kubernetes, but I thought I would include a few here that are more overview-level. Enjoy!

The source: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

Steve Perry has been a professional software developer since 1991 using a variety of languages, but his favorite is Java. He is the author of Java Management Extensions (O'Reilly), Log4j Shortcut (O'Reilly) and has moderated the developerWorks Java Enterprise Open Source Application Architecture community. He has been a developerWorks contributor since 2009.
Steve is Principal Consultant for Makoto Consulting Group, Inc. in Little Rock, AR, USA.
When he isn't playing hockey or practicing yoga, Steve likes to make educational videos and post them on YouTube for Makoto TV. Check it out here and please subscribe!

I don’t get the argument that Kubernetes is not PaaS, especially if you are going to argue that OpenShift is. OpenShift is a Kubernetes distro! Kubernetes does provide a container runtime like Docker or rkt, and Bluemix doesn’t expose an underlying OS either so that’s not a fair comparison.

PaaS means that underlying details of the environment are abstracted and you just bring your application, in the case of K8s that is a container image. Kubernetes handles scheduling, networking, storage, and many other aspects of deployment such that all users have to do is author the resource definitions and run a kubectl command. That’s PaaS.

First off, to reduce OpenShift to saying it “…is a Kubernetes distro” is unfair to OpenShift. Second, as you (correctly) point out, OpenShift is a PaaS offering, so it provides more than the capabilities of Kubernetes. Third, I think comparing Bluemix to OpenShift is completely fair, as they are both PaaS offerings.

I only refer to OpenShift as PaaS (which you do as well) to argue that Kubernetes is PaaS. If OpenShift is PaaS, K8s must also be because core OpenShift is just K8s. What features of OpenShift do you think make it PaaS that base Kubernetes does not offer?

Lastly, Bluemix versus OpenShift is not a comparison I made. I was comparing Bluemix to Kubernetes. You mention “Kubernetes is not really a PaaS offering because it does not provide components like the operating system, or supporting tools like Java and Docker.” My point was that Bluemix doesn’t expose an underlying OS either (yet we both agree it is PaaS), and if you are using Kubernetes then there is an underlying container runtime as well like Docker or rkt.

I hope this helps you understand where I am coming from. I strongly disagree with the statement that Kubernetes is not PaaS.