thoughts on cloud native stuff

Fully automated creation of an AAD-integrated Kubernetes cluster with Terraform

Introduction

Running your Kubernetes cluster in Azure with Azure Active Directory as the identity provider is a best practice in terms of security and compliance. You can grant (and revoke, when people leave your organisation) fine-grained permissions to your team members for exactly the resources and/or namespaces they need. Sounds good? Well, creating such a cluster takes a lot of manual steps. If you don’t believe me, follow the official documentation 🙂 https://docs.microsoft.com/en-us/azure/aks/azure-ad-integration.

So, we developers are known to be lazy folks… how can this be automated, e.g. with Terraform (one of the most popular tools for automating the creation and management of cloud resources)? It took me a while to figure out, but here’s a working example of how to create an AAD-integrated AKS cluster with “near-zero” manual work.

The rest of this blog post will guide you through the complete Terraform script which can be found on my GitHub account.

Create the cluster

When working with Terraform (TF), it is best practice not to store the Terraform state on your workstation, because other team members need the same state information to work on the same environment. So, first… let’s create a storage account in your Azure subscription to store the TF state.

Basic setup

With the commands below, we create a resource group in Azure, a basic storage account and a corresponding container that will hold the TF state.
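Once the resource group, storage account and container exist, Terraform can be pointed at them with an azurerm backend block. Here is a minimal sketch; all four names are hypothetical placeholders, so replace them with the ones you actually created.

```hcl
# Remote state in an Azure storage container.
# All names below are hypothetical placeholders.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateaksdemo"
    container_name       = "tfstate"
    key                  = "aks-aad.terraform.tfstate" # blob name of the state file
  }
}
```

The backend still needs credentials for the storage account, for example an Azure CLI login or an access key supplied through the ARM_ACCESS_KEY environment variable.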

AAD Applications for K8s server / client components

To be able to integrate AKS with Azure Active Directory, we need to register two applications in the directory. The first AAD application is the server component (Kubernetes API) that provides user authentication. The second application is the client component (e.g. kubectl) that’s used when you’re prompted by the CLI for authentication.

We will assign certain permissions to these two applications that require “admin consent”. Therefore, the Terraform script needs to be executed by someone who is able to grant that consent for the whole AAD tenant.
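To give an idea of what the two registrations might look like, here is a minimal sketch. It assumes an azuread provider release from the 0.x/1.x line (version 2.x renamed several of these arguments, e.g. name became display_name). The resource labels, application names and reply URLs are hypothetical, and the Microsoft Graph permission IDs should be double-checked against your tenant before you rely on them.

```hcl
# Server application: represents the Kubernetes API server in AAD.
resource "azuread_application" "aks_server" {
  name                    = "aks-aad-server" # hypothetical name
  reply_urls              = ["http://aks-aad-server"]
  type                    = "webapp/api"
  group_membership_claims = "All"

  required_resource_access {
    resource_app_id = "00000003-0000-0000-c000-000000000000" # Microsoft Graph

    # Assumed IDs, please verify: Directory.Read.All (application permission)
    resource_access {
      id   = "7ab1d382-f21e-4acd-a863-ba3e13f7da61"
      type = "Role"
    }
    # Directory.Read.All (delegated permission)
    resource_access {
      id   = "06da0dbc-49e2-44d2-8312-53f166ab848a"
      type = "Scope"
    }
    # User.Read (delegated permission)
    resource_access {
      id   = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
      type = "Scope"
    }
  }
}

# Client application: used by kubectl when it prompts for a login.
resource "azuread_application" "aks_client" {
  name       = "aks-aad-client" # hypothetical name
  reply_urls = ["http://aks-aad-client"]
  type       = "native"

  # The client may call the server application on behalf of the user.
  required_resource_access {
    resource_app_id = azuread_application.aks_server.application_id

    resource_access {
      id   = tolist(azuread_application.aks_server.oauth2_permissions)[0].id
      type = "Scope"
    }
  }
}
```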

Service Principal for AKS Cluster

Last but not least, before we can finally create the Kubernetes cluster, a service principal is required. That’s basically the technical user Kubernetes uses to interact with Azure (e.g. acquire a public IP at the Azure load balancer). We will assign the role “Contributor” (for the whole subscription – please adjust to your needs!) to that service principal.
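A sketch of how such a service principal could be declared, again assuming the azuread 0.x/1.x schema (in 2.x the password value is generated for you and can no longer be set) plus the hashicorp/random provider. All resource labels and the application name are hypothetical.

```hcl
data "azurerm_subscription" "current" {}

resource "azuread_application" "aks_sp" {
  name = "aks-cluster-sp" # hypothetical name
}

resource "azuread_service_principal" "aks_sp" {
  application_id = azuread_application.aks_sp.application_id
}

# Generated secret for the service principal.
resource "random_password" "aks_sp" {
  length  = 32
  special = true
}

resource "azuread_service_principal_password" "aks_sp" {
  service_principal_id = azuread_service_principal.aks_sp.id
  value                = random_password.aks_sp.result
  end_date_relative    = "8760h" # valid for one year
}

# "Contributor" on the whole subscription; narrow the scope to your needs.
resource "azurerm_role_assignment" "aks_sp_contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.aks_sp.id
}
```

Scoping the assignment to a single resource group instead of the subscription is usually the first adjustment worth making.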

Create the AKS cluster

Everything is now ready for the provisioning of the cluster. But hey, we created the AAD applications and haven’t granted admin consent yet?! We can do this via our Terraform script as well, and that’s what we will do before finally creating the cluster.
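One way to grant the consent from within Terraform is a local-exec provisioner that calls the Azure CLI. This is only a sketch: it assumes the az CLI is installed and logged in with sufficient rights on the machine running Terraform, that the hashicorp/null provider is available, and that var.server_app_id (a hypothetical variable) holds the application ID of the server application.

```hcl
variable "server_app_id" { type = string } # hypothetical input

resource "null_resource" "grant_admin_consent" {
  # Run again whenever the application ID changes.
  triggers = {
    app_id = var.server_app_id
  }

  provisioner "local-exec" {
    command = "az ad app permission admin-consent --id ${var.server_app_id}"
  }
}
```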

Azure is sometimes a bit too fast in sending a 200 and signalling that a resource is ready. In the background, not all services already have access to e.g. newly created applications. So it can happen that things fail although they shouldn’t 🙂 Therefore, we simply wait a few seconds and give AAD time to distribute the application information before kicking off the cluster creation.
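The wait and the cluster itself could be sketched as follows. This assumes the azurerm 2.x schema (newer releases restructured the AAD settings), a Unix-like shell for the sleep command, and the hashicorp/null provider. All variables, names and sizes are hypothetical inputs, not the values of the original script.

```hcl
# Hypothetical inputs, e.g. declared in variables.tf.
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "tenant_id" { type = string }
variable "server_app_id" { type = string }
variable "server_app_secret" { type = string }
variable "client_app_id" { type = string }
variable "sp_client_id" { type = string }
variable "sp_client_secret" { type = string }

# Give AAD some time to propagate the new applications.
resource "null_resource" "wait_for_aad" {
  provisioner "local-exec" {
    command = "sleep 30"
  }
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-aad-demo"
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "aks-aad-demo"

  default_node_pool {
    name       = "default"
    node_count = 2
    vm_size    = "Standard_DS2_v2"
  }

  service_principal {
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }

  role_based_access_control {
    enabled = true

    azure_active_directory {
      tenant_id         = var.tenant_id
      server_app_id     = var.server_app_id
      server_app_secret = var.server_app_secret
      client_app_id     = var.client_app_id
    }
  }

  # Do not start before the delay has elapsed.
  depends_on = [null_resource.wait_for_aad]
}
```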

Assign the AAD admin group to be cluster-admin

When the cluster is finally created, we need to assign the Kubernetes cluster role cluster-admin to our AAD cluster admin group. We get access to the Kubernetes cluster by adding the Kubernetes Terraform provider. Because the AAD integration is already active and no AAD user has been granted any permissions yet, we need to use the admin credentials of our cluster! But that will be the last time we ever need them.

To be able to use the admin credentials, we point the Kubernetes provider to use kube_admin_config which is automatically provided for us.
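As a sketch, the provider configuration could look like this. The resource address azurerm_kubernetes_cluster.aks is a hypothetical label for the cluster resource; the certificate fields of kube_admin_config are base64-encoded and therefore decoded first.

```hcl
provider "kubernetes" {
  # Admin credentials exported by the (hypothetically named) cluster resource.
  host                   = azurerm_kubernetes_cluster.aks.kube_admin_config[0].host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config[0].client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config[0].client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_admin_config[0].cluster_ca_certificate)
}
```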

In the last step, we bind the cluster role to the aforementioned AAD cluster admin group ID.
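Such a binding might look like the following sketch. The variable aad_admin_group_object_id is a hypothetical input holding the object ID of the AAD admin group, and the binding name is made up as well.

```hcl
variable "aad_admin_group_object_id" { type = string } # hypothetical input

resource "kubernetes_cluster_role_binding" "aad_cluster_admins" {
  metadata {
    name = "aad-cluster-admins"
  }

  # The built-in Kubernetes cluster role.
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  # AAD groups are referenced by their object ID.
  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Group"
    name      = var.aad_admin_group_object_id
  }
}
```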

Run the Terraform script

We have now discussed all the relevant parts of the script, so it’s time to let the Terraform magic happen 🙂 Run the script via…

$ terraform init
# ...and then...
$ terraform apply

Access the Cluster

When the script has finished, it’s time to access the cluster and try to log on. First, let’s do the “negative check” and try to access it without having been added as cluster admin (AAD group member).

After downloading the user credentials and querying the cluster nodes, the OAuth 2.0 Device Authorization Grant flow kicks in and we need to authenticate against our Azure directory (as you might know it from logging in with Azure CLI).

$ az aks get-credentials --resource-group <RESOURCE_GROUP> -n <CLUSTER_NAME>
$ kubectl get nodes
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DP9JA76WS to authenticate.
Error from server (Forbidden): nodes is forbidden: User "593736cb-1f95-4f23-bfbd-75891886b05f" cannot list resource "nodes" in API group "" at the cluster scope

Great, we get the expected authorization error!

Now add a user from the Azure Active Directory to the AAD admin group in the portal. Navigate to “Azure Active Directory” –> “Groups” and select your cluster-admin group. In the left navigation, select “Members” and add e.g. your own Azure user.

Now go back to the command line and try again. One last time, download the user credentials with az aks get-credentials to make sure we get the latest information from AAD (it will simply overwrite the former entry in your kubeconfig file).

Wrap Up

So, that’s all we wanted to achieve! We have created an AKS cluster with fully automated Azure Active Directory integration, added a default AAD group for our Kubernetes admins and bound it to the “cluster-admin” role of Kubernetes. All of this is done by a Terraform script which can now be integrated with your CI/CD pipeline to create compliant and AAD-secured AKS clusters (as many as you want ;)).

Well, we also could have added a user to the admin group, but that’s the only manual step in our scenario…but hey, you would have needed to do it anyway 🙂

You can find the complete script including the variables.tf file on my GitHub account. Feel free to use it in your own projects.