Papo's log

Saturday, March 09, 2019

I decided to up my command line game, so this is a 'remember how to do stuff' post for me. I do acknowledge that there are tons of different ways of doing things, especially when you have to deal with the command line. So don't shoot the pianist.

Step 0: I do use brew
I use brew to manage a lot of command line tools + GUI apps. You can find it here. I also occasionally use CakeBrew (just to check on deps).

Friday, December 07, 2018

I have been working in a team where we use kubernetes in production (not the nginx example - the real shit) for 2 years now. I have configured and used Kubernetes clusters from version 1.4.x with tools like kube-aws, to 1.6-1.7 configured with kops. Amazon's EKS is the third breed of kubernetes provisioning solutions that I have had the chance to try, and this post is about my recent experience of a week spent trying to bring a production-level EKS to life and checking if it would cut it for our production needs.

This post would not have been possible without the contribution and hard work of my colleague JV - thanks!!!

EKS basics

For those who seek an executive summary of EKS: it's an AWS managed service (like, for example, Amazon ElastiCache). Amazon provisions, updates and patches the brains of your cluster, aka the control plane + etcd. There is a kind of flat rate (price) for the masters + standard EC2 billing for your worker fleet. AWS also provides a custom networking layer, eliminating the need for any additional overlay network solution like you would use if you created the cluster on your own. You are responsible for provisioning and attaching the worker nodes. AWS provides templates (CloudFormation) with pre-configured workers. You are responsible for installing on top of the cluster all the other services or applications that are needed by your platform, e.g. how to collect logs, how to scrape metrics, other specific daemons etc. Also note that once the cluster is up, there is nothing AWS specific; you get a vanilla experience (the exception being the networking plugin).

How do I start?

There are a couple of options for spinning up an EKS cluster:

The infamous click-click on the dashboard (it's good if you want to play, but not production ready, meaning not if you want to re-provision and test)

We started the PoC with the Terraform option. So we used the official terraform guide (thank you HashiCorp), and then the worker provisioning was terraformed as well, so we did not keep the standard CloudFormation template from AWS. As you can understand, the tool of choice is sometimes dictated by the available level of skills and experience within the team. In general we love terraform (especially us developers).

Other things to consider before I start?

So, as we discovered, and of course it was very well documented, an EKS cluster, due to the networking features that it brings (more on this later), really shines when it occupies its own VPC! It's not that you cannot spin up an EKS cluster on your existing VPCs, but make sure you have enough free IPs and ranges available, since by default the cluster - and specifically the workers - will start eating your IPs. No, this is not a bug, it's a feature, and it actually makes a lot of sense. It is one of the things that I really loved about EKS.

First milestone - spin the masters and attach workers

The first and most important step is to spin up your masters and then provision your workers. Once the workers are accepted and join the cluster, you more or less have the core ready. Spinning up just the masters (like many articles out there feature) is like 50% of the work. Once you can create an auto-scaling group where your workers will be created and then added to the cluster, you are very close to the real thing.

Coming back to the Pod Networking feature

If you have ever provisioned a kubernetes cluster on AWS using tools like kops or kube-aws, then you have most probably already installed or even configured the overlay network plugin that provides pod networking in your clusters. As you know, pods have IPs; overlay networks on a kubernetes cluster provide this abstraction (see Calico, Flannel etc). On an EKS cluster, by default you don't get this overlay layer. Amazon has actually managed to bridge the pod networking world (kubernetes networking) with its native AWS networking. In plain words, your pods (apps) within a cluster get a real VPC IP. When I heard about this almost a year ago, I have to admit I was not sure about it at all; after some challenges and failures, I started to appreciate simplicity on the networking layer of any kubernetes cluster on top of AWS. In other words, if you can remove one layer of abstraction, since your cloud can natively take over this, why keep an extra layer of networking and hops when you can have the real thing?

But the workers pre-allocate so many IPs

In order to optimize pod placement on the worker, EKS uses the underlying EC2 worker capabilities to reserve IPs on its ENIs. So when you spin up a worker, even if there are no pods or daemons allocated to it, you can see on the dashboard that it will already have pre-allocated a pool of IPs (around 10, or more depending on the instance size). If you happen to operate your cluster on a VPC with other 'residents', your EKS cluster can be considered a threat! One way to keep the benefits of AWS CNI networking but make some room on VPCs that are running out of free IPs is to configure - after bringing up the masters - the 'aws-node' daemon set. This is an AWS specific daemon, part of the EKS magic that makes all this happen. See here for a similar issue. So just

kubectl edit daemonset aws-node -n kube-system

and set the `WARM_IP_TARGET` environment variable to something smaller.
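For reference, the relevant part of the daemon set spec ends up looking roughly like this (the value of 2 is just an illustration, tune it to your VPC):

```yaml
# Fragment of the aws-node DaemonSet spec after the edit.
# WARM_IP_TARGET controls how many free IPs the CNI keeps warm per node.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: WARM_IP_TARGET
              value: "2"   # illustrative value; default behavior pre-warms many more
```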

Note that, as we discovered, setting `WARM_IP_TARGET` to something smaller does not limit the capacity of your worker to host more pods. If your worker does not have warm IPs to offer to newly created and allocated pods, it will request a new one from the networking pool.

In case even this workaround is not enough, there is always the option to switch on Calico on top of the cluster. See here. Personally, after seeing CNI in action, I would prefer to stick with it. After 2 years with cases of networking errors, I think I trust AWS networking more. There is also the maintenance and troubleshooting side of things. Overlay networking is not rocket science, but at the same time it is not something that you want to be spending time and energy troubleshooting, especially if you are not full of people with these skills! Also, the more complex your AWS networking setup is, the harder it becomes to find issues when packets jump from the kubernetes world to your AWS layer and vice versa. It is always up to the team and the people making the decisions to choose the support model that they think fits their team, or to assess the capacity of the team to provide real support on challenging occasions.

What else did you like? - the aws-iam-authenticator
Apart from appreciating the simplicity of CNI, I found the integration of EKS with the existing IAM infrastructure very straightforward. You can use your corporate (even SAML-based) roles / users of your AWS account to give or restrict access to your EKS cluster(s). This is a BIG pain point for many companies out there, especially if you are an AWS shop. EKS, as just another AWS managed service, follows the same principles and provides a bridge between IAM and kubernetes RBAC! People doing kubernetes on AWS already know that in the early days, access to the cluster and distribution of kube configs was - and still is - a very manual and tricky job, since the AWS users and roles mean nothing to the kubernetes master(s). Heptio has done a very good job with this.

What actually happens is that you install the aws-iam-authenticator and attach it to your kubectl through ~/.kube/config. Every time you issue a command with kubectl, it is proxied by the aws-iam-authenticator, which reads your AWS credentials (~/.aws/credentials) and maps them to kubernetes RBAC rules. So you can map AWS IAM roles or users to kubernetes RBAC roles, or create your own RBAC rules and map them. It was the first time I used this tool and it actually works extremely well! Of course, if you run an old kubernetes cluster with no RBAC it won't be useful, but in the EKS case RBAC is enabled by default! In your ~/.kube/config the entry will look like this.
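A rough sketch of such a kubeconfig user entry (the cluster name is a placeholder) looks like this:

```yaml
# ~/.kube/config fragment: the exec plugin delegates token generation
# to aws-iam-authenticator, which reads your AWS credentials.
users:
  - name: my-eks-cluster          # placeholder name
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1alpha1
        command: aws-iam-authenticator
        args:
          - token
          - -i
          - my-eks-cluster        # placeholder: your EKS cluster name
```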

What about all the other things that you have to install?
Once the cluster is ready, so you have masters and workers running, the next steps are the following, and can be done by any admin user with the appropriate `kubectl` rights.
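A typical example of such an admin step is applying the aws-auth ConfigMap, which maps the workers' IAM role to kubernetes groups so the nodes are allowed to join (the role ARN below is a placeholder):

```yaml
# kubectl apply -f aws-auth.yaml
# Without this mapping, worker nodes cannot register with the EKS masters.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/eks-worker-role   # placeholder ARN
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```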

And of course..all the things that you need or have as dependencies on your platform.

Should I use EKS?

If you are an AWS user and you have no plans on moving away, I think it is the way to go!

If you are a company / team that wants to focus on business delivery and not spend a lot of energy keeping different kubernetes clusters alive, then YES, by all means. EKS reduces your maintenance nightmares and challenges by 60-70%, based on my experience.

If you want to get patches and upgrades (on your masters) for free and transparently - see the latest kubernetes security exploit and ask your friends around how many of them were pushed to ditch old clusters and start over this week (it was fun in the early days, but it is not fun any more). So I am dreaming of easily patched clusters and auto upgrades as a user, and not cases like 'let's evacuate the cluster, we will build a new one!'

Is it locking you into a specific flavour? No, the end result is vanilla kubernetes, and even though you might be leveraging the custom networking, this is more or less the case when you use the similar, more advanced offering from Google (which is a more complete, ready-made offering).

If you have second thoughts about region availability, then you should wait until Amazon offers EKS on a broad range of regions, I think this is the only limiting factor now for many potential users.

If you already have a big organization tightly coupled with AWS and the IAM system, EKS is a perfect fit in terms of securing your clusters and making them available to the development teams!

Overall it was a very challenging and at the same time interesting week. Trying to bring up an EKS cluster kind of pushed me to read and investigate things on the AWS ecosystem that I was ignoring in the past.

Sunday, December 02, 2018

In the past 2.5 years while working for Ticketmaster in London, I have had the chance to use extensively (still do) Fastly, one of the best CDN and edge compute services out there. Fastly is not just a web cache but a holistic edge service offering. It enables integrators (users) to actually bring their application (web service/site) closer to their end users, and at the same time offload some of the logic and decision making that every web application has to do from the main origins (your servers) and push it to the edge (a simplified explanation of edge computing).

Intro

This post is mostly a follow-up (I know it took some time) to my short talk at Fastly's EU Altitude day here in London. You can find the video here. One of the problems we were facing at Ticketmaster is the extremely big number of services (domains) owned by different teams and parts of the company. Sometimes a website might span 3-5 different individual Fastly configs, e.g. you have the dev site, the prod site, the staging site, or a slightly experimental version of the site; on all of them you maintain a different set of origins and you want to `proxy` them through Fastly in order to test the integration end to end, etc.

To cut a long story short, the number of domains (services) was getting very big. In the early adoption days, most of the teams and individuals would log in to the web console of Fastly and start updating configs (like you would do with AWS years ago). Some other teams would use things like Ansible and some custom integrations. I was always overwhelmed by both approaches. I could see that having 200 different configs and people manually editing them was not going to scale; at the same time, bespoke integrations with different tools felt very complex, with no out-of-the-box support. After some time I discovered the Fastly Terraform provider, and I think this was a milestone: we could finally treat our Fastly configs as code, push them to repos, introduce CI/CD practices and have a very good grip on who is doing what at any given time with our services.

The example

You can find the example here. My main motivation is not to show you how to do Terraform, but to provide a simple project structure (it is dead easy) and show how the terraform provider can be applied. Also don't worry about the specifics; this example is about having a personal domain (www.javapapo.eu) which I own, and then serving some static content from an S3 bucket. You don't have to do the same: the domain could be your company's team domain, and the backends do not have to be simple S3 buckets.

The main points of this example are:

Provide a sample structure to start with

Illustrate the basics of the Fastly terraform provider

Illustrate how you can add and upload custom VCL to your terraformed Fastly Service
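To give you a taste of the provider, a stripped-down service definition looks roughly like this (the S3 website endpoint is a placeholder; the real example has more settings):

```hcl
# Minimal sketch: a Fastly service pointing my domain at an S3 website origin.
resource "fastly_service_v1" "javapapo" {
  name = "www.javapapo.eu"

  domain {
    name = "www.javapapo.eu"
  }

  backend {
    address = "javapapo-site.s3-website.eu-west-1.amazonaws.com" # placeholder bucket endpoint
    name    = "s3-origin"
    port    = 80
  }

  force_destroy = true
}
```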

Pre-Req

You will need a Fastly account. Once you activate your account, you will need to create a new API token, see here. This token will be used by the Terraform provider to talk to Fastly.

You will need to export or provide to terraform this key!

export FASTLY_API_KEY=XXXXXXXXXXXXXXXXXXXX

Also, you might notice that in my example I use a private S3 bucket to store my terraform state. You don't have to do the same, especially if you plan on experimenting first. So you can remove the section where I configure an S3 backend for the terraform state.

In case you want to use an S3 bucket, you will need to provide terraform with the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in your environment.

In case you want to replay the example 100%, you need your own domain, a Fastly service attached to the domain, and an S3 bucket. So I had to:

Register a domain on AWS, for example: www.javapapo.eu

Add a CNAME to my domain through the AWS console to point to Fastly (non-SSL), see here

Create an S3 bucket (with the appropriate policies, so I can serve some static HTML)

About the example

As you can see, the structure of the project is quite flat. You have a couple of terraform files and then a subfolder with the custom VCL that we want to upload to our service. The Fastly terraform provider is actually a mirror of the properties and settings you can configure manually through the dashboard. If you are a newcomer, I would suggest you open one browser tab on Fastly's dashboard and another on the documentation of the provider (see here), so that you can see how each section in terraform correlates with the real thing. It is actually OK to start experimenting with Fastly using the console (dashboard). Once you gain some experience, you can move to automating most of the configs that you had applied manually in the past.

Till now, the most frequent question I get is related to combining Terraform and VCL together. In very advanced scenarios you will want to take advantage of Fastly's capability to be a platform where you can take smart decisions based on the incoming requests, and actually offload some processing time from your origins. The more logic you add at the edge, the more you offload (but watch out not to overcomplicate things: your origins will still be the source of your business logic). Fastly can help you execute some of this web-related logic, like making redirects, smart caching, or load balancing, through the use of a somewhat primitive C-like language called VCL. For those that have extensive experience with Varnish, well, we are talking about Varnish 2.x VCL code.

You actually attach pieces of your VCL code to your service. Fastly will take all this code and combine it with the standard code being executed for you per request. Here is an example from the provided main.vcl of my service where I do a redirect to my personal blog (here) when you try to hit the URL [http://www.javapapo.eu/blog].
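The redirect follows the common Fastly pattern of raising a custom error status in vcl_recv and turning it into a 301 in vcl_error; a sketch of it (the destination URL is a placeholder for my blog) looks like this:

```vcl
sub vcl_recv {
#FASTLY recv
  # Intercept /blog before it ever reaches the origin
  if (req.url ~ "^/blog") {
    error 701;
  }
}

sub vcl_error {
#FASTLY error
  # Turn our custom status into a synthetic 301 redirect
  if (obj.status == 701) {
    set obj.status = 301;
    set obj.http.Location = "https://javapapo.blogspot.com/"; # placeholder blog URL
    return(deliver);
  }
}
```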

This is a typical example where I leveraged Fastly to make a smart decision upon an incoming request and perform a redirect without letting the request land on my origins or web servers. You can find the reference documentation of Fastly VCL here. So have a look at the example and use it as a starting point... Happy Fastly-ing!

Tuesday, October 23, 2018

I have been working with Kubernetes in production for 2 years now. Over this time we have migrated clusters 3 or 4 times, changed techniques and tools of deployment, and changed habits or pushed back on some of the hype and trends coming up every day on the kubernetes scene. I guess nothing new here; this is what almost everybody does.

My personal take on this ever-changing scene is to keep things as simple as possible, especially while your underlying infrastructure is changing from version to version. Kubernetes 1.4 (the first version I used in production) has the same basic constructs as versions 1.7 and 1.10, currently in production, but many other things come and go or change. When you are adapting or trying to adopt kubernetes in your org, you need to rely on these simple and basic constructs that will make your transition from one version to the other painless, though with time you master more advanced and `cryptic` features. Anyway, the point is: keep it simple.

In the early adoption days, we followed what felt natural based on our previous experiences. We were (still are) creating services, spreading them around, accommodating the different business needs and features. Kubernetes is our golden cage, and creating new services / apps never felt so easy. Going back to our previous experiences: while creating a new app / service you feel the urge to `expose` it somehow, to your private network, to some namespace, to some context. Using the service type `LoadBalancer` while on AWS, spinning up ELBs, felt like total magic! At first you have a couple of services talking to each other, exposing some public API, and it was good: you were provisioning your ELBs, you could relate to the old concept of a DNS entry for a specific service. Old school I know, but it felt good.
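For anyone who has not seen the trick: it is just a service definition like this, and on AWS kubernetes provisions an ELB for you (the service name here is made up):

```yaml
# A hypothetical service; type LoadBalancer makes kubernetes
# provision a cloud load balancer (an ELB on AWS) in front of it.
apiVersion: v1
kind: Service
metadata:
  name: orders-api        # hypothetical service name
spec:
  type: LoadBalancer
  selector:
    app: orders-api
  ports:
    - port: 80
      targetPort: 8080
```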

Time goes by and the 2 services are now 15; they talk to each other, they integrate. Creating so many ELBs or services becomes a messy and error-prone business! Even within the kubernetes namespace context, dealing with so many different service entries is challenging, especially when you follow point-to-point integration. It was obvious that we needed some kind of proxy.

A step further

Personally, I always thought that the proposed Ingress API / solution from kubernetes, despite the `good` intentions, was too much: overcomplicated for my over-simplifying mind in its never-ending battle to keep things simple. Also, I never felt good with the idea of having everything under the same ingress hose. So despite having examples of things like Traefik or Nginx as the reference Ingress controllers, personally I thought that I was increasing my platform's configuration entropy and complexity. Maybe I am a bit spoiled by all the nice things abstracted by Kubernetes after all this time, but adding some serious config for doing something so simple never made it into my book. So yes to ingresses and proxies, but most probably multiple and independent ingresses within the same cluster, and not one.

On the other hand, internal proxies within the kubernetes network were always something OK to do, but kubernetes with its DNS service was always easing the pain; at the end of the day everything is just a label in a `service.yml` definition. Internal service names change or move easily.

One night I ended up reading this article by R. Li (CEO of Datawire) and I was like `yes!! I totally agree with you`. This is how I decided to give Ambassador a go. What I really liked was the fact that Ambassador, which is a kubernetes wrapper around the Envoy proxy (a bit of an oversimplified definition, apologies), takes away all the config that you would have to do using the Kubernetes Ingress resources, glues almost instantly onto your existing deployment, being aware of your existing services, and then leaves you with only the task of configuring its proxying rules and behavior.

So in one place you get to spin up a horizontally scaled L7 proxy - no news here, the same way you scale your pods you would scale Ambassador - and either you can feed it your proxying config, or you can enhance your existing service definitions with metadata that Ambassador will detect (since it talks to and understands the Kubernetes API). Currently I prefer the first: adding all the configs directly to the Ambassador deployment instead of adding labels and metadata to our existing services. In a way, our services and the pods behind them do not have a hint that they are being proxied, and this gives us great freedom to play around and change configs in one place without risking errors or mistakes to existing pieces of the puzzle!
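A mapping in Ambassador is a small blob of YAML; when you keep the config central, it can live as an annotation on the Ambassador service itself rather than on each proxied service. A sketch (the mapping and upstream names are hypothetical):

```yaml
# Ambassador's own service carrying a centralized Mapping,
# so the proxied services stay untouched.
apiVersion: v1
kind: Service
metadata:
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: orders_mapping        # hypothetical mapping name
      prefix: /orders/
      service: orders-api:80      # hypothetical upstream service
spec:
  type: ClusterIP
  selector:
    service: ambassador
  ports:
    - port: 80
      targetPort: 80
```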

Sunday, March 11, 2018

Disclaimer - don't shoot the pianist

It's been some time since I wanted to write this post. For those that are going to read it, I need to say that a) I hate the term DevOps as a way to 'label' engineers, and b) I'm a software developer, meaning that I am the type of guy who really enjoys high-level languages like Java / Kotlin etc and has been following DevOps principles per occasion or workplace; but being honest, I did not have a solid Ops/Admin background. In other words, not all shops out there do or encourage DevOps as an engineering culture, so it really depends whether you're going to experience it or not.

It is evident though, that through time in many companies the typical developer stereotype is changing: a developer is asked to engage more often with the overall release life cycle of software, rather than limiting himself/herself to only writing the code and then delegating to others to configure, ship, deploy and maintain the software in production.

IMHO there is no better person to bring the code alive and take care of it than the person who wrote it and knows its strengths and weaknesses. So please don't shoot the pianist in case you don't agree with my views or my perception of things.

My experience with Terraform

I started seriously working with terraform 2 years ago. At first, it was like 'oh ok, I don't know what I am doing, I don't understand what is going on'. Then I progressed to copy-pasting a template, running it and seeing what happens; then I was like 'OK, I am going to copy paste someone else's thing, modify it a bit and support it'. Now I am able to follow through all the templates and modules and, in general, kind of find my way around most of the production deployments.

So definitely there is a learning curve; if you happen to work in a place where Terraform is being adopted, the chances are high that you are going to step up your game once you seriously invest some time. It's not rocket science after all; from a user's perspective, it's some simple notation where you express what you want your cloud infrastructure to look like. Example: terraform, please create for me a VM, some networking, a load balancer, some access rights on the VM, and please initialize them. Boom!

Having already lots of day-to-day experience with kubernetes, and being very confident with Kubernetes deployment descriptors and tools like Helm, Terraform feels familiar at its core. I would argue that terraform is very easy to use; the only things you need are a couple of good resources, some reading and your personal cloud account (e.g. AWS). It is not a corporate-only tool, nor does it require some magic heavy infrastructure.

Just a command line utility and a couple of API keys to access your cloud.

I would highly suggest you start with Terraform Up and Running and then move to 'The Terraform Book'. I kind of did it the opposite way, which I think was wrong, so I highly recommend you go through the above resources in that order.

So what is Terraform? (for newbies)

It's a utility that you can download on your machine (I used brew) or use as a docker container, which will enable you to change and shape your cloud infrastructure. This utility is fed a set of files (deployment descriptors) where you describe what things you want to run and instantiate on your cloud (e.g. AWS or Google Cloud): spin up VMs, databases, S3 buckets, etc.

Terraform will read these files and translate them to actual API calls to the cloud provider. It will also do the dirty job of instantiating the resources for you in the correct order and 'save' the state, meaning it saves somewhere what is already deployed, so that the next time you run it, it can check what is already deployed and perform a more fine-grained update (like a diff).
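To make this concrete, a minimal sketch of such a descriptor for AWS (the AMI id is a placeholder) could be:

```hcl
# main.tf - describe the desired infrastructure; terraform reconciles reality with it.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI id
  instance_type = "t2.micro"

  tags = {
    Name = "terraform-example"
  }
}
```

Running `terraform init`, then `terraform plan` and `terraform apply`, performs exactly the diff-and-update cycle described above.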

Some might argue: why would you want to do that, since you can log in to the console of your cloud provider and eventually perform all these actions? Well, for sure you can do that, but:

Manual changes to your cloud infrastructure, especially as it gets bigger and more complex, are error-prone.

Manual means manual labor, which means that you need to do the same things all over again, where you could just automate and script them.

Every time you make a change you need to make sure that you don't break or alter something else, and we all know how complex the settings are in many cloud providers.

You don't have continuous deployment semantics: repeatable builds and configuration.

Supporting more than one user that tries to 'mutate' the state (e.g. development teams or organizations) becomes hard.

Should I invest time learning terraform as a skill?

Definitely yes, if you ask me. As I said, my background is not Ops, so mastering the details of different cloud providers proves to be a full-time job for many people. In my mind, whatever tool or technology helps to reduce this learning curve is a good thing. Terraform abstracts, in a way, your interaction with the cloud provider. It does not abstract the cloud provider itself, meaning that if I write some terraform for AWS and I want to do more or less the same things for Google Cloud, I will end up writing something a bit different, since cloud providers are different. But at least you will have a common way of interacting with the different clouds, and you will develop skills in quickly leveraging and re-using all your terraform resources. Instead of becoming a master of the AWS console and the Google Cloud console, you will be a master of writing template files that instruct terraform to perform stuff for you.

Terraform is not only about AWS

Since AWS holds a large portion of the market (50-60%), meaning it is the no. 1 cloud of choice for individuals and companies, many people assume that Terraform is somewhat AWS specific, which is of course not true. Terraform has this thing called providers, which is actually an abstraction for many different clouds or services. So yes, the most commonly used is AWS, but you will find an extensive list of different services that can be 'terraformed'. Lately I am spending lots of time with the Fastly provider, Fastly being a CDN!

If you know terraform you don't become a cloud expert! A small trap.

Well, it is not a trap of Terraform itself, but even if you master the way terraform works and quickly find yourself 'terraforming all the things', using its powerful template engine and all the nice utilities and resources, you must still really know what you are doing.

In plain words, being good at terraform does not mean that you suddenly became an AWS or Google Cloud expert! Becoming an expert on the different clouds is a full-time job and requires time and effort. That's why we have AWS experts, Google Cloud experts etc. Maybe more modern clouds, e.g. Google Cloud, abstract more things compared to traditional (older) clouds like AWS, but still, at some point you are going to end up searching or having to master the details.

I have to admit my journey of mastering the AWS cloud is still ongoing and I feel I have a looong way to go, but this is part of the process, I guess!

Sunday, October 29, 2017

It seems that over the years, the services and applications I tend to use for chatting and keeping in touch with friends and family change. Just some minutes ago, I removed from my macOS Dock my beloved 'Adium', which served me well for far too many years, being my main IM app for services like MSN, Gtalk and Yahoo. Of course it is not about Adium; Adium is just the medium, it is all about the services.

In the past few years, I can see that my somewhat vibrant contact list on MSN and Gtalk has faded out; most of my friends (and even I) are not using them a lot, and we have moved to either Facebook or Slack.

The latter seems to be the killer of all the above for me. For regular chat with close friends, we have been creating private channels. I have to admit this non-corp use of Slack is kind of handy. I still hate Facebook chat, but since almost everybody is in there, you end up using their service.