
PKS is a comprehensive platform for the provisioning and management of Kubernetes clusters, which can be further enhanced by leveraging its extensibility options. In this post, we will modify a plan to deploy a YAML manifest that provisions Prometheus, Grafana and Alertmanager, backed by NSX-T load balancers.

Why Prometheus, Grafana and Alertmanager?

The Cloud Native Computing Foundation accepted Prometheus as its second incubated project, the first being Kubernetes. Originally developed at SoundCloud, it has quickly become a popular platform for monitoring Kubernetes environments. Built upon a powerful analytics engine, extensive and highly flexible data modelling can be accomplished with relative ease.

Grafana is, amongst other things, a visualisation tool that enables users to graph, chart, and generally visually represent data from a wide range of sources, Prometheus being one of them.

Alertmanager handles alerts sent by applications such as Prometheus and performs a number of operations on them, such as deduplication, grouping and routing.

What I wanted to do, as someone unfamiliar with these tools, was devise a way to deploy these components in an automated fashion so that I can destroy and recreate them with ease. The topology of the solution is depicted below:

Prometheus – Quick Tour

Browsing to http://prometheus-lb-vip/targets displays the scrape targets for Prometheus, which have been configured via the respective ConfigMap and include the following (a sketch of the corresponding scrape configuration follows the list):

API server (Master)

Nodes (Workers)

cAdvisor (Pods)

Prometheus (self)
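For illustration, the scrape configuration inside such a ConfigMap typically looks something like the sketch below; the job names and discovery roles are representative rather than lifted from the exact manifest used here, and the TLS/authentication and relabelling details a real cluster needs are omitted for brevity:

scrape_configs:
  # API server endpoints, discovered via the Kubernetes API
  - job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
  # kubelet metrics from each node
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
  # cAdvisor (per-pod/container) metrics exposed by the kubelet
  - job_name: kubernetes-cadvisor
    kubernetes_sd_configs:
      - role: node
    metrics_path: /metrics/cadvisor
  # Prometheus scraping itself
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']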

These can then be graphed, modelled and queried:

Grafana – Quick Tour

Out of the box, Grafana has very little configuration applied. I struggled a little with constructing a ConfigMap that would automatically add Prometheus as a data source, so a small amount of manual configuration is required (for now). Accessing http://grafana-lb-vip will prompt for a logon; the default credentials are admin/admin.

Add a data source:

Hint: use "prometheus.monitoring.svc.cluster.local" as the source URL
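If you'd rather avoid the manual step, Grafana 5 and later can provision data sources from files, so mounting a ConfigMap into /etc/grafana/provisioning/datasources with content along these lines should work. This is a sketch I haven't validated in this particular setup, and the port assumes Prometheus' default:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # 9090 assumes the default Prometheus service port
    url: http://prometheus.monitoring.svc.cluster.local:9090
    isDefault: true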

After creating a dashboard (or importing one), we can validate that Grafana is extracting information from Prometheus:

Alertmanager – Quick Tour

Alertmanager can be accessed via http://alertmanager-lb-vip. From the YAML manifest it has a vanilla config, but Prometheus is configured to use it as an alert target via the ConfigMap:
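The stanza in question looks something like the following sketch; the service name mirrors the naming convention used for the Prometheus service above (an assumption on my part), and 9093 is Alertmanager's default port:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # service name assumed to follow the same pattern as the Prometheus service
            - alertmanager.monitoring.svc.cluster.local:9093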

Introduction

Over the long bank holiday weekend, I sat and passed the Certified Kubernetes Administrator (CKA) exam. This blog post goes over my experience (with respect to the NDA), together with a lab guide I've made and uploaded in the hope it might help others.

Format

The online exam consists of a set of performance-based items (problems) to be solved on the command line. For the CKA there are 24 questions of varying difficulty. At the time of writing, the only option for sitting this exam is remote proctoring.

Experience

I'm a huge fan of practical exams, and I'm so glad the powers that be decided to go down this route; I absolutely loathe multiple-choice exams, for many reasons. The remote proctoring was a new experience for me, and I wasn't completely comfortable with it. Given the choice, I would have preferred to go to a test centre. I hope The Linux Foundation adds this option in the future.

I sat the exam first time around feeling relatively confident, but knew I had some weaker areas. After a painstaking wait I received the following email:

I brushed myself off, crammed the areas I was weaker on, took the exam again and waited….

…and waited

From my experience as a techie, I get a much greater sense of accomplishment from passing practical exams such as this one or the VCAPs. Either way, to say I was happy would be a gross understatement.

Takeaways

A lot of people say this exam is "hard". I get really discouraged reading other people's exam experiences that describe exams as "hard"; a more accurate adjective for this exam would be "fair". Know the curriculum, practice your craft, and you'll get there.

Lean on the documentation as much as you need to. You have access to kubernetes.io/docs during the exam.

It’s a practical exam, so practice, practice, and practice some more.

You get a free retake, so don’t worry if you don’t pass first time.

kubectl run somedeployment --image=nginx --replicas=5 --dry-run -o yaml. Output existing or new objects to a YAML file if you need to make finer adjustments or create objects from scratch.

For the following to work, your Kubernetes infrastructure needs to leverage a CNI that's able to provision load balancers. For this example I'm leveraging PKS, which has native integration with NSX-T.

The default way to access the Kubernetes dashboard is to leverage the kubectl proxy command. However, this is somewhat limiting for a production environment. An alternative way is to expose the dashboard through a load balancer.

Modify the dashboard service by executing kubectl -n kube-system edit service kubernetes-dashboard and changing the "type" field from "ClusterIP" to "LoadBalancer".
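If you'd rather not use an interactive editor, the same change can be applied with a one-liner; this is standard kubectl behaviour, not PKS-specific:

kubectl -n kube-system patch service kubernetes-dashboard -p '{"spec":{"type":"LoadBalancer"}}'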

Afterwards, the service will be reconfigured to be presented by a load balancer external VIP.

What are container registries and why do we need them?

A lot of the time, particularly when individuals and organisations are evaluating, testing and experimenting with containers, they will use public container registries such as Docker Hub. These public registries provide an easy-to-use, simple way to access images. As developers, application owners, system admins and others gain familiarity and experience, additional operational considerations need to be explored, such as:

Organisation – How can we organise container images in a meaningful way? Such as by environment state (Prod/Dev/Test) and application type?

RBAC – How can we implement role-based access control to a container registry?

Vulnerability Scanning – How can we scan container images for known vulnerabilities?

Efficiency – How can we centrally manage all our container images and deploy an application from them?

Security – Some images need to be kept under lock and key rather than hosted on an external service like Docker Hub.

Introducing VMware Harbor Registry

VMware Harbor Registry has been designed to address these considerations as an enterprise-class container registry solution with integration into PKS. In this post, we'll have a quick primer on getting up and running with Harbor in PKS and explore some of its features. To begin, we need to download PKS Harbor from the Pivotal site and import it into Ops Manager.

After which the tile will be added. When doing this for the first time it will have an orange bar at the bottom; click the tile to configure it.

The following need to be defined with applicable parameters to suit your environment.

Availability Zone and Networks – This is where the Harbor VM will reside, and the respective configuration will be dependent on your setup.

VMware Harbor Registry – Vulnerability Scanning

The ability to identify, evaluate and remediate vulnerabilities is a standard operation in modern software development and deployment. Thankfully, Harbor addresses this through integration with Clair, an open source project for the identification, categorisation and analysis of vulnerabilities within containers. As a demonstration, we first need to push an image to Harbor:
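As a sketch, pushing involves logging in, tagging the image with the registry's address, then pushing. Here harbor.example.com is a placeholder for your registry's FQDN, webapp:0.1 is an illustrative local image, and "library" is Harbor's default project:

docker login harbor.example.com
docker tag webapp:0.1 harbor.example.com/library/webapp:0.1
docker push harbor.example.com/library/webapp:0.1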

After initiating a scan, Harbor can inform us of the vulnerabilities that exist within the container image:

We can then explore more details about these vulnerabilities, including when they were fixed:

Conclusion

Harbor provides us with an enterprise-level container registry solution. This blog post has only scratched the surface, and with constant development being invested in the project, expect more features and improvements.

Preamble

A lot of organisations are looking towards containerising their applications and embracing the world of microservices. There are a number of ways to reach this goal through a variety of tools and methodologies; this blog post goes through one way of approaching the task.

In a previous blog post I went through the process of taking a web application and putting it into a container, and whilst it got the job done, there wasn’t a lot of attention given to the image used for the container. So let’s address that.

Container images come in all shapes and sizes and choosing the right base for your application can be a difficult decision.

VM vs Ubuntu Container vs NGINX Container

In the aforementioned blog post, I took a simple web application and placed it into an Ubuntu-based container. It worked fine, but is there any way we can further optimise it?

I've got a generic Ubuntu Server 18.04 VM I spun up, on which I installed Apache2 and added my web application. According to the hypervisor (ESXi in this case), the VM consumed 5.84GB of storage.

Next, I'll do the same in an Ubuntu container, using the Ubuntu image as my base:

But what about different base images?

NGINX is an open source reverse proxy server for HTTP, HTTPS, SMTP, POP3 and IMAP, as well as a load balancer, HTTP cache and web server. As such, it's more specialised than the standard Ubuntu image. So let's create another image based on NGINX.

Apache also has an image, so let's throw that in for good measure.
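Because both official images serve static content out of the box, the Dockerfiles shrink to little more than a COPY. The document roots below are the defaults for the respective official images:

# NGINX variant
FROM nginx
COPY index.html /usr/share/nginx/html/

# Apache (httpd) variant
FROM httpd
COPY index.html /usr/local/apache2/htdocs/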

Comparing all three images yields different results with regard to image size:

Unsurprisingly, the Ubuntu base image comes in as the largest, followed by the Apache image, and NGINX being the smallest.

Is this such a big deal, though? If we consider:

The potential number of pods

The time taken for new nodes to download the image

The lifecycle of a pod

Then shaving off even 100MB or so from an image can have a significant impact on operations.

Thought process

From this exercise, one approach to choosing the right base image emerges from the following:

Introduction

In this blog post, we'll take a look at the integration between PKS and vRealize Log Insight and how it benefits the enterprise. As a bit of a recap:

PKS – A purpose-built, enterprise-grade container solution leveraging the capabilities of Kubernetes, BOSH, VMware NSX-T, Harbor and more to deliver a highly available, highly flexible container runtime that operates on a number of cloud platforms, both private and public, including vSphere, AWS, Azure and GCP.

VMware also released VMware Cloud PKS, a fully managed service that combines the technical capabilities of AWS, PKS and Kubernetes, and which can be consumed in a similar fashion to other cloud services.

vRealize Log Insight – A log management system designed to operate within heterogeneous environments; however, it's much more than a simple aggregator of logging information. vRealize Log Insight has analytical and trend-identification capabilities which allow operators to gain invaluable insight into the state, health and events of the environment. It works across physical, virtual and cloud environments.

Containers and Coexistence with VMs

VMs have existed for a long time now. Consequently, there are very mature, battle-hardened tools which can be used to monitor a plethora of operating systems, software, components and more. Containers, on the other hand, are relatively new in the enterprise. Although there is overlap, there are significant differences in the way we monitor and collect logs from VMs and containers. How can this be addressed?

There are a number of ways to monitor a container-based environment (Prometheus and Wavefront come to mind), but for environments that already leverage vRealize Log Insight, we can integrate PKS with it to facilitate a single pane of glass for logging information from VMs and containers, as well as their underlying infrastructure.

What can we expect PKS to send to Log Insight?

At a high level, the Integration between PKS and vRLI will facilitate the propagation of the following logs:

BOSH jobs

Core Kubernetes processes & nodes

Core BOSH processes

Kubernetes event logs

Individual Pod stdout and stderr

I've highlighted the last one as I can see real value in it. Imagine centralising all stdout and stderr from pods, combined with the analytics and trend-identification capabilities of vRLI. Pretty interesting. Of course, we're not that interested in what individual pods are logging, but if some new code has been pushed out and tens, hundreds or thousands of pods start logging errors, we can identify, categorise and analyse them pretty easily with vRLI.

PKS and vRealize Log Insight in action

Talk is cheap, so let’s crack on.

Log into Ops Manager and select the PKS tile

Select “Logging” from the left and select “yes” under vRLI integration:

Enter the host and SSL settings where applicable in your environment:

Apply the changes:

If you keep an eye on the logs, references to the vRLI configuration will be shown:

After which, the following "hosts" can be observed, which are, in essence, a reflection of the services within our Kubernetes cluster:

I also created an individual pod, named nginx-sleep. Below are the logs that were ingested for this event:

To validate the stdout capturing, create a pod that writes to stdout:
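A minimal example along these lines does the job; it's essentially the counter pod from the Kubernetes logging documentation:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    # writes an incrementing counter with a timestamp to stdout every second
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']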

And check the logs from the pod:

And also from Log Insight:

Conclusion

vRealize Log Insight provides a compelling platform for log ingestion, and its flexibility to ingest, analyse and interpret logs from physical, virtual and container-based solutions makes it an extremely versatile tool in any admin's repertoire.

Preamble (pun intended)

I've been eyeing up the VCAP6-NV exam for quite some time now, but due to work and personal projects I've not been able to focus on it. Having some time between jobs, I decided to start revising and push myself into taking the exam. At the time of writing there is still no NV Design exam; therefore, anyone who sits and passes the VCAP6-NV Deploy exam is automatically granted the VCIX6-NV certification.

I'm not a full-on networking person. Back in the day (~10 years ago) I wanted to be, but the opportunities didn't exist for me, and I consequently went down a generic sysadmin path until ending up where I am today, primarily focusing on SDDC and cloud technologies. A lot of people who dive into NSX do come from traditional networking backgrounds, and I found this quite intimidating. The point, however, is that you do not have to be some kind of Cisco god to appreciate NSX, or to pass this exam.

Preparation

Read any blog post, forum thread or Reddit post about any VMware-based exam and it won't take long until someone recommends working through the blueprint end to end in a lab. That's exactly what I did, in three passes:

Go over each section of the blueprint using vZealand's guide in my own lab, following the instructions on the site.

Go over each section of the blueprint without any initial external assistance, checking my accuracy against the guide after each objective.

Go over each section of the blueprint until I was confident I could complete each objective from practice alone.

Also, bear in mind the exam is based on NSX 6.2, so it's a good idea to have a lab running that version, as there have been a number of significant changes since then.

After I accomplished all three, I felt confident enough to sit the exam.

The exam itself

You're looking at 205 minutes in total across 23 questions spanning the blueprint in its entirety. Without breaking the NDA, my observations are:

Time management – This is the third VCAP exam I've taken, and with each one the time has flown by. It really doesn't feel like a long time when you've finished. I personally found the NSX VCAP exam much more demanding on time than the VCAP-DCV Deploy exam; I didn't complete all the questions in the NV exam, whereas I had about 30 to 40 minutes left when I took the DCV exam.

Content – The exam felt like a good reflection of the blueprint, which was fairly well represented.

HOL interface – The exam simulation feels very similar to the VMware Hands-on Labs, including (unfortunately) the latency. The performance for my exam wasn't great, but it wasn't terrible either.

Skip questions you're not sure about – Time is an expensive commodity in this exam; if you're struggling with a question, skip it and move on. You may have time to come back to it later. I skipped a couple of questions.

Result

I passed, but not by much. A pass is a pass, though, and I was pretty chuffed. It's definitely the hardest VCAP exam I've taken to date.

For the uninitiated, VMware NSX comes in two "flavours": NSX-V, which is heavily integrated with vSphere, and NSX-T, which is more IaaS-agnostic. NSX-T also has more emphasis on facilitating container-based applications, providing a number of features to our container ecosystem. In this blog post, we discuss the microsegmentation capabilities provided by NSX-T in combination with container technology.

What is Microsegmentation?

Prior to software-defined networking, firewall functions were largely centralised, typically manifested as edge devices which were, and still are, good for controlling traffic to and from the datacenter, otherwise known as north-south traffic:

The problem with this model, however, is the lack of control for resources that reside within the datacenter, aka east-west traffic. Thankfully, VMware NSX (-V or -T) can facilitate this, manifested by the distributed firewall.

Because of the distributed firewall, we have complete control over lateral movement within our datacenter. In the example above we can define firewall rules between logical tiers of our application which enforce permitted traffic.

But what about containers?

Containers are fundamentally different from virtual machines in both how they're instantiated and how they're managed. Containers run on hosts that are usually VMs themselves, so how can we achieve the same level of lateral network security we have with virtual machines, but with containers?

Introducing the NSX-T Container Plugin

The NSX-T container plugin exposes container pods as NSX-T logical switch ports, and because of this we can implement microsegmentation rules, as well as expose pods to the wider NSX ecosystem, using the same approach we have with virtual machines.

Additionally, we can leverage other NSX-T constructs with our YAML files. For example, we can request load balancers from NSX-T to facilitate our application, which I will demonstrate further on. For this example, I’ve leveraged PKS to facilitate the Kubernetes infrastructure.

Microsegmentation in action

Talk is cheap, so here’s a demonstration of the concepts previously discussed. First, we need a multitier app. For my example, I’m simply using a bunch of nginx images, but with some imagination you can think of more relevant use cases:

Declaring Load balancers

To begin with, I declare two load balancers, one for each tier of my application. Inclusion in these load balancers is determined by tags.
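With the NSX-T container plugin in place, requesting one is just a matter of declaring a Service of type LoadBalancer whose selector matches the tier's labels. The names and labels below are illustrative rather than the exact ones from my manifest:

apiVersion: v1
kind: Service
metadata:
  name: web-frontend-lb
spec:
  # NCP provisions an NSX-T load balancer VIP for LoadBalancer services
  type: LoadBalancer
  selector:
    app: myapp
    tier: web-frontend
  ports:
  - port: 80
    targetPort: 80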

Testing Microsegmentation

At this stage, we’re not leveraging the Microsegmentation capabilities of NSX-T. To validate this we can simply do a traceflow between two web-frontend containers over port 80:

As expected, traffic between these two containers is permitted. So, let's change that. In the NSX-T web interface, go to Inventory -> Groups and click "Add". Give it a meaningful name.

For the membership criteria, we can select the tags we've previously defined, namely tier and app.

Click "Add", after which we can validate:

We can then create a firewall rule to block TCP 80 between members of this group:

Consequently, if we run the same traceflow exercise:

Conclusion

NSX-T provides an extremely comprehensive framework for containerised applications. Given the nature of containers in general, I think containers plus microsegmentation are a winning combination for securing these workloads. Dynamic inclusion adheres to the automated mentality of containers, and with very little effort we can implement microsegmentation using a framework that is agnostic: the principles are the same for VMs and containers.

Preamble

Having spent a number of months familiarising myself with container technology, I inevitably got "stuck in" with Kubernetes. Containers are brilliant, but I personally don't see the value of managing individual containers; it's still the pets vs cattle mentality. Orchestrating containers with the likes of Kubernetes, however, makes a ton of sense and reinforces the microservices approach to building and deploying applications.

To test myself, I decided to document, end to end, the entire journey of taking a web server residing on a standalone virtual machine, containerising it, and deploying it via Kubernetes.

Disclaimer – I'm not a developer, so the application example I'll be using is relatively simple, but the fundamentals would be similar for other applications of increasing complexity.

Current and Intended State

I currently have a simple HTTP web server running on an Ubuntu VM on an ESXi host. For many reasons, this is a suboptimal design. The web server is facilitated by Apache2. As configurations go it's almost as basic as you can get, yet surprisingly (shockingly) this setup is widespread, even for front-facing, live websites.

At the end of this exercise, we will have redeployed this application in the following fashion:

So, to quote Khan from Star Trek Into Darkness: "Shall we begin?"

Install Docker

You can install Docker on a number of operating systems; however, I had a spare Ubuntu Server box idling, so I used it as a kind of "staging" box where I could tinker with creating the Docker components prior to installing Kubernetes. You could do this on a Kubernetes worker node if desired.

To create a Docker image (and consequently a container) we must create what's known as a Dockerfile. In short, a Dockerfile is a human-readable document that acts as a guide on how to create your image. Think of it like the instruction manual you get with flat-pack furniture: it provides the steps required to get to the final, constructed model. Hopefully without any spare screws.

So, with your text editor of choice, create “Dockerfile” within the application directory containing your code. Below is one for my application, which we will break down:
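In text form it looks roughly like this (reconstructed from the breakdown that follows; the maintainer value is a placeholder):

# base image pulled from Docker Hub
FROM ubuntu
# placeholder maintainer metadata
LABEL maintainer="you@example.com"
# update/upgrade the base OS and install Apache2 at build time
RUN apt-get update && apt-get upgrade -y && apt-get install -y apache2
# copy the application into Apache2's document root
COPY index.html /var/www/html/
WORKDIR /var/www/html
# run Apache2 in the foreground when a container starts
CMD ["apachectl", "-D", "FOREGROUND"]
EXPOSE 80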

FROM – All Dockerfiles must begin with a FROM statement. This defines the base image for our application, which is pulled from Docker Hub (https://hub.docker.com/explore/). Official releases are available from a number of companies, including Ubuntu, MySQL, Microsoft and NGINX. These differ from your bog-standard OS install: they are much more lightweight, hardened and specifically engineered to cater for containerised workloads.

LABEL – This is metadata denoting the maintainer for this image.

RUN – Commands executed while the image is being built. As you can see, for this image I update and upgrade the base OS and install Apache2.

COPY – This command copies over my application data (in this case, index.html) into /var/www/html, which is the root directory for the Apache2 service.

WORKDIR – Sets the working directory.

CMD – The command run inside the container when it starts. In this example, I'm specifying the Apache2 service to run in the foreground.

EXPOSE – Defines which port you want to open on containers from this image. As this is a web server, I want TCP 80 open (TCP is the default). You can also add TCP 443, or whichever port your application requires.

The next step is to build our image using the Dockerfile. To do this, we can issue the following command, where the "." dictates that the Dockerfile in the current directory will be used.

docker build -t webapp:0.1 .

This command names the constructed image "webapp". The value after the colon determines the version (tag) of this image; in this case, I'm tagging it as version 0.1. Should I make a change, I can rebuild the image and increment the version number.

After issuing this command, the terminal will output the build process, similar to below. This includes Docker pulling down the base image and making modifications as per our Dockerfile:

Install Kubernetes

As shown in the diagram at the beginning of this post, a production Kubernetes deployment is composed of master and worker nodes. For my own learning and development I wanted to recreate this, although there are ways to deploy single-server solutions. For my test environment I created the following:

3x VMs, each with:

Ubuntu 18.04

2vCPU

2GB RAM

20GB Local Disk

Single IP – Attached to the management network

In an ideal world, you would flesh out the networking and storage requirements, but for internal testing, this was sufficient for me.

Once the VMs are installed, create the master node by installing Docker and the Kubernetes packages:
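The commands themselves are roughly as follows, per the kubeadm installation documentation for Ubuntu at the time; repository details may differ for your environment, so check the current docs:

# Docker and prerequisites
sudo apt-get update && sudo apt-get install -y docker.io apt-transport-https curl
# add the Kubernetes apt repository and its signing key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# install the Kubernetes components
sudo apt-get update && sudo apt-get install -y kubelet kubeadm kubectl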

Next, we can initialise our master. Before we do so, though, consideration needs to be given to the networking model we'll be using. For my example I used flannel, whose documentation states that we need to define the CIDR address range for our pods during the initialisation process.

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

Which results in the following:

Follow the instructions to run the mkdir, cp and chown commands as a non-root user. At the bottom is a command to add worker nodes – keep this safe.
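For reference, the non-root user setup commands are:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config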

The process for creating worker nodes is similar: install Docker (previously covered) and the Kubernetes packages, but instead of running the kubeadm init command, use the kubeadm join command supplied earlier. Nodes can then be validated on the master node.
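The join command takes the form below; the token and hash come from the kubeadm init output, and the address is your master's (the values here are placeholders). Running kubectl get nodes on the master should then show each worker as Ready once the network plugin is up:

sudo kubeadm join 192.168.0.10:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

kubectl get nodes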

Deploy an application to the Kubernetes cluster

At this stage we have a functioning, albeit empty, K8s cluster, but it's ready to start hosting applications. For my application, I took a two-step approach:

Configure the replication controller (how many containers should run for this app)

Configure the service object (how to access this cluster from the outside)

A replication controller in Kubernetes ensures that a specified number of container (pod) replicas are running at any one time. In this example, I have created an account on Docker Hub and uploaded my image to it so my worker nodes can pull it. We define a replication controller in a YAML file:
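A sketch of that file; the replica count and Docker Hub image path are illustrative:

apiVersion: v1
kind: ReplicationController
metadata:
  name: webapp
spec:
  # how many pod replicas Kubernetes should keep running
  replicas: 3
  selector:
    app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        # placeholder Docker Hub account name
        image: mydockerhubuser/webapp:0.1
        ports:
        - containerPort: 80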

Spec: How this application should be deployed, including the image to be used.

Replicas: How many containers for this app should be running at any given time. Kubernetes constantly monitors the environment and if there’s a deviation between how many replicas should be running, and how many are currently running, it will reconcile automatically.

Label: Labels are very important. We label the containers in this replication controller so we can later tie them into a service object. This means that as containers are created and destroyed, they are automatically included in the service object based on their labels; after all, we don't care much about containers as individual entities.

Type: The type of service object. In a cloud environment, for example, we can change this to "LoadBalancer" to leverage platform-specific load balancers from the likes of GCP and AWS. For this example I don't have an external load balancer, so it's not applicable.
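A sketch of the corresponding service object; omitting the nodePort field lets Kubernetes assign one from the default 30000-32767 range, which is how 30813 was allocated below:

apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  # NodePort exposes the service on a port of every worker node
  type: NodePort
  selector:
    app: webapp
  ports:
  - port: 80
    targetPort: 80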

What we’re accomplishing here are two fundamental operational aspects of our application:

We declare a minimum number of containers (pods) to be available at all times to facilitate our workload.

We're establishing a relationship between containers (pods) and a service object via the use of labels. Therefore, any new containers created with the same labels will automatically be included in this service object. Think of the service object as a central point of access for the application; we do not access the application by sending HTTP requests directly to containers.

We have labels defining the service and which containers to include, and we also have the current list of endpoints. Think of endpoints as load balancer members. Because of how NodePort works, we can hit any of our K8s worker nodes on port 30813 and reach our service, which will load balance across all endpoints.

I tested this on my two worker nodes (and I also added a bit of code to my index.html to return the hostname of the container servicing the HTTP request):
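From any machine with access to the node network, that test is just the following (hostnames hypothetical); repeated requests should return different container hostnames as the service load balances across endpoints:

curl http://k8s-worker-1:30813
curl http://k8s-worker-2:30813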

Conclusion

I had a lot of fun doing this, and the more I learn about containers and orchestration, the more I believe this is the next fundamental change in the way we manage applications, as significant as the shift from physical to virtual machines.

On the 26th of June 2018, VMware publicly announced VKE – VMware Kubernetes Engine – in beta (with GA planned for later this year). For me, the development of this solution flew under the radar, and its subsequent release came as quite a surprise, albeit a good one. So, where exactly does this solution fit among the other Kubernetes-based solutions that currently exist?

VKE Overview

VKE sits within VMware's portfolio of cloud-native solutions and is pitched as a fully managed, Kubernetes-as-a-service offering. We therefore have multiple ways to consume Kubernetes resources from the VMware ecosystem, depicted in the diagram below.

Which prompts some customers to ask: why should I pick VKE over PKS, or vice versa? At a high level, some of the differences are summarised in the table below:

                            PKS                                    VKE
Management Responsibility   Customer/Enterprise                    Fully Managed
Consumption Model           Install, Configure, Manage, Consume    Consume
Residence                   Public and Private Cloud               Public Cloud only

What we can ascertain here is that VKE is designed to abstract away all the infrastructure components that are required for an operational Kubernetes deployment. As a reminder, PKS is composed of:

PKS Control Plane

Kubernetes core

BOSH

Harbor

GCP Service Broker

NSX-T

Which is quite a lot to manage and maintain. VKE, however, takes away the requirement for us, the customers, to manage such entities, and simply provides a Kubernetes endpoint for us to consume. Networking, storage and other aspects are abstracted. Of course, there are use cases for both VKE and PKS; VKE is not intended as a replacement for PKS.

How does it work?

Under the hood, VKE is deployed on top of AWS (which recently announced EKS, Amazon's own managed Kubernetes-as-a-service platform), but in keeping with VMware's ethos of "any app, any cloud", this is likely to extend to other cloud platforms, notably Azure. In addition to leveraging the AWS backend, VKE adds a few new features:

VMware Smart Cluster – Essentially, this is a layer of resource management, designed to automate the allocation of compute resources for maximum efficiency and cost saving, as well as automatic remediation of nodes.

Full end-to-end encryption – Designed so that all data, be it in transit or at rest is encrypted by default.

Why shouldn’t I just use EKS if I wanted an AWS-backed Kubernetes instance?

To be honest, this is a good question. If you aren't using any VMware services currently, it makes sense to go with EKS. However, existing VMware customers can potentially gain a consistent operational experience across both on-premises and cloud-based resources using familiar tools. Plus, when VKE opens up to other cloud providers, this will add tremendous agility to the placement of Kubernetes workloads, facilitating a true multi-cloud experience.

This service is obviously very new and will no doubt change a little before GA, but it's definitely worth keeping an eye on, considering the growing adoption of Kubernetes in general.