I ran some experiments on Istio, taking down some control plane components and observing what happens to the applications and the service mesh. Below you’ll find my notes.

Pilot

Pilot is responsible for the traffic management feature of Istio, and it is also responsible for updating all sidecars with the latest mesh configuration.

When Pilot starts it listens on port 15010 (gRPC) and 8080 (legacy HTTP).

When the application sidecar (Envoy, Istio-Proxy) starts, it connects to pilot.istio-system:15010, gets its initial config and stays connected.
Whenever Pilot detects a change in the mesh (it monitors Kubernetes resources), it pushes the new configuration to the sidecars via this gRPC connection.

– If pilot goes down, this gRPC connection between pilot and sidecar is lost, and sidecars try to reconnect to pilot indefinitely.
– Traffic is not affected if pilot is down, because all configuration pushed to the sidecar lives in sidecar memory.
– Changes in the mesh (such as new pods, rules, services, etc) won’t reach sidecars, because pilot is not there to listen for changes and forward those to sidecars.
– Once pilot is back, sidecars connect (because they are always trying to reconnect) to it and grab the latest mesh config.
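
The behaviour in the list above can be sketched as a toy Python model. This is only an illustration of the observations, not how Pilot or Envoy are implemented: the Pilot and Sidecar classes, the try_connect method and the versioned mesh dict are all hypothetical names made up for this sketch.

```python
class Pilot:
    """Toy stand-in for Pilot: serves the current mesh config while it is up."""

    def __init__(self, mesh):
        self.mesh = mesh      # shared dict standing in for Kubernetes resources
        self.up = True


class Sidecar:
    """Toy stand-in for a sidecar that caches the last config it received."""

    def __init__(self):
        self.config_version = None
        self.connected = False

    def try_connect(self, pilot):
        """One (re)connection attempt; a real sidecar keeps retrying."""
        if not pilot.up:
            self.connected = False   # stream lost, cached config is kept
            return False
        self.connected = True
        self.config_version = pilot.mesh["version"]   # latest config on connect
        return True

    def can_route(self):
        # Routing only needs the in-memory config, not a live connection.
        return self.config_version is not None


mesh = {"version": 1}
pilot, sidecar = Pilot(mesh), Sidecar()
sidecar.try_connect(pilot)    # initial config: version 1

pilot.up = False
mesh["version"] = 2           # the mesh changes while Pilot is down
sidecar.try_connect(pilot)    # fails; the sidecar still routes with version 1

pilot.up = True
sidecar.try_connect(pilot)    # reconnects and picks up version 2
```

The point of the model is that can_route() depends only on the cached config, which is why traffic keeps flowing while Pilot is away.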

Mixer Policy

Policy enforces network policy.

Mixer reads its configuration on startup and also monitors Kubernetes for changes. Once new config is detected, Mixer loads it into memory.

Sidecars check (call) the mixer policy pod for every request targeting the service application.

If the mixer policy pod is down, then all requests to the service will fail with a “503 UNAVAILABLE:no healthy upstream” error, because the sidecar cannot connect to the policy pod.

In Istio 1.1 there’s a new [global] setting (policyCheckFailOpen) that allows a “Fail Open” policy, i.e., if mixer policy pod is not reachable, all requests will succeed instead of failing with a 503 error. By default this config is set to false, i.e., “Fail Close”.
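
To make the difference between the two modes concrete, here is a minimal Python sketch of the decision a sidecar faces on each request. The function name, its parameters and the returned labels are hypothetical and only model the behaviour described above; the fail_open argument plays the role of policyCheckFailOpen.

```python
def check_request(policy_reachable, policy_allows=True, fail_open=False):
    """Model the outcome of one request that needs a policy check.

    Returns a label describing what the caller would observe.
    """
    if not policy_reachable:
        # Fail open: forward the request anyway.
        # Fail close (the default): answer with a 503 error.
        return "forwarded" if fail_open else "503"
    return "forwarded" if policy_allows else "denied by policy"


# Policy pod down, default configuration: the request fails.
default_outcome = check_request(policy_reachable=False)
# Policy pod down, fail-open enabled: the request goes through.
fail_open_outcome = check_request(policy_reachable=False, fail_open=True)
```

Note that failing open trades safety for availability: while the policy pod is unreachable, requests that the policy would have denied also go through.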

While mixer is down, nothing we do in the mesh (like adding rules or changing any config) will take effect in apps until mixer is up again.

Mixer Telemetry

Telemetry provides telemetry information to addons.

Sidecars call the Telemetry pod after each request is completed, providing telemetry information to adapters (Prometheus, etc). They do that in batches of 100 requests or 1 second (in the default configuration), whichever comes first, in order to avoid excessive calls to the Telemetry pod.
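
The “100 requests or 1 second, whichever comes first” rule is a classic size-or-age batching scheme. Below is a small Python sketch of it; ReportBatcher and its parameters are hypothetical names, and for simplicity the age check only runs when a new record arrives (a real implementation would presumably also flush from a timer).

```python
import time


class ReportBatcher:
    """Buffer records and flush when the batch is full or old enough."""

    def __init__(self, send, max_records=100, max_age_seconds=1.0,
                 clock=time.monotonic):
        self._send = send                  # callable that receives a list of records
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock                # injectable so the logic can be tested
        self._buffer = []
        self._first_at = None

    def add(self, record):
        now = self._clock()
        if not self._buffer:
            self._first_at = now           # age is measured from the first record
        self._buffer.append(record)
        too_many = len(self._buffer) >= self._max_records
        too_old = now - self._first_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


# Usage: collect batches in a list instead of calling a telemetry backend.
batches = []
batcher = ReportBatcher(batches.append)
for request_id in range(250):
    batcher.add(request_id)
batcher.flush()   # send whatever is left over
```

With 250 records added in a tight loop, this normally yields batches of 100, 100 and 50 records, so the backend sees three calls instead of 250.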

If the Telemetry pod is down, sidecars log an error (in the pod’s stderr) and discard the telemetry information. Requests are not affected by that, unlike when the Policy pod is down. Once the Telemetry pod is up again, it starts receiving telemetry information from sidecars.

Other notes

It’s worth noting that Istio allows a custom installation of its control plane components. For example, if you do not need Policy, you can entirely disable mixer policy. This modularity is getting better in Istio 1.1. For more info on that, check out the docs.

Also, pilot, mixer policy and mixer telemetry work fine in an HA setup, with multiple replicas running at the same time. In fact, the default configuration comes with a HorizontalPodAutoscaler that ranges from 1 to 5 replicas for those pods.

Istio in Kubernetes works using a sidecar deployment model, where a helper container (sidecar) gets attached to your main container (service) within a single Pod. By doing that, your service and the sidecar container share the same network, and can be seen as two processes on a single host. Thus Istio can intercept all network calls to and from your main container and do its magic to improve service-to-service communication.

This sidecar container, named istio-proxy, can be injected into your service Pod in two ways: manually and automatically. Even the manual technique is not 100% done by hand. Let’s see.

Manual injection

Istio ships with a tool called istioctl. Yeah, it seems it’s inspired by some other loved tool :). One of its features is the ability to inject the istio-proxy sidecar into your service Pod. Let’s use it, using a simple busybox pod as an example:

As you can see above, that command generated another yaml file, similar to the input (busybox pod) but with the sidecar (istio-proxy) added into the pod. So, in some way this is not a 100% manual job, right? It saves us a bunch of typing. You are ready to apply this modified yaml into your kubernetes cluster:

One natural question that might come up is: Where does that data come from? How does it know that the sidecar image is docker.io/istio/proxyv2:1.0.2? The answer is simple: All that data comes from a ConfigMap that lives in the Istio control plane, in the istio-system namespace:

You can edit this ConfigMap with the values you want to inject into your pods. As you can see, that is mostly a template that will get appended to your pod definition. If you want to use another image for the istio-proxy container, use another tag, or tweak anything else that gets injected, this is what you need to edit. Remember that this ConfigMap is used for injection in all your pods in the service mesh. Be careful 🙂

Because istioctl reads a ConfigMap in order to know what to inject, that means you need to have access to a working Kubernetes cluster with Istio properly installed. If for some reason you don’t have such access, you can still use istioctl, by supplying a local configuration file:

# run this previously, with proper access to the k8s cluster
$ kubectl -n istio-system get configmap istio-sidecar-injector -o=jsonpath='{.data.config}' > inject-config.yaml
# feel free to modify that file, and you can run at any point later:
$ istioctl kube-inject --injectConfigFile inject-config.yaml ...

Automatic injection

The other way of having istio-proxy injected into your pods is by telling Istio to do that automatically for you. In fact, this is enabled by default for all namespaces with the label istio-injection=enabled. This means that, if a namespace has such a label, all pods within it will get the istio-proxy sidecar automatically. You don’t need to run istioctl or do anything with your yaml files!

The way it works is quite simple: It makes use of a Kubernetes feature called MutatingWebhook, which consists of Kubernetes notifying Istio whenever a new pod is about to be created, and giving Istio the chance to modify the pod spec on the fly, just before that pod is actually created. Thus, Istio injects the istio-proxy sidecar using the template found in that ConfigMap we saw above.

Sounds nice, right?

You might be asking yourself: Hey, this is too intrusive! Yep, it might be, all depends on your needs. This automatic injection is very flexible though:

– In the istio-sidecar-injector ConfigMap there is a policy flag indicating whether this automatic injection is enabled or disabled.
– Only namespaces labelled accordingly will get automatic injection. You can selectively choose which namespaces will have automatic injection.
– You can also tweak this label, changing it or even removing that filter (meaning injection will happen automatically in all namespaces! Be careful!) by editing the MutatingWebhookConfiguration: kubectl -n istio-system edit MutatingWebhookConfiguration istio-sidecar-injector. Look for the namespaceSelector field.
– You can prevent injection from happening in selected pods. If a pod has the annotation sidecar.istio.io/inject: "false" then Istio will not inject the sidecar into it.
– You can invert the logic if you are more conservative: disable automatic injection for everyone and enable it only for selected pods. For that you just need to set the policy to disabled (kubectl -n istio-system edit configmap istio-sidecar-injector) and annotate the pods you want injected with sidecar.istio.io/inject: "true".

I want more flexibility!

Here is a use case where the flexibility above was not enough:

For those of you who are not familiar with Openshift (Red Hat’s Kubernetes distribution), it has a feature called source-to-image (s2i), which magically creates container images based on source code. You supply a git repository (works with many programming languages!) as the input and get a container image, running on Openshift, as the result.

It’s an awesome feature! I won’t get into details here, I’m just going to tell you what we need to know right now: In order to do that, Openshift creates one or more intermediate, auxiliary pods to build the source code. When the build finishes, the binary artifacts go into the resulting container image, ready to run on Openshift, and those auxiliary pods are then discarded.

If we enable automatic injection in a given namespace and use Openshift’s s2i feature, this means that all pods will get the sidecar injected. Even those builder (intermediate, auxiliary) pods! Worse, as they are not under our control (they are created by Openshift, not by us), we cannot annotate them to opt out of injection. Builds do not behave well with the sidecar injected, so we don’t want them to be automatically injected. What to do now?

Well, a possible solution is to take the conservative approach explained in the last bullet above: disable automatic injection for everyone and enable it only for selected pods. That works, but requires you to actively annotate the pods where you want automatic injection to happen.

Or… There’s a new solution for this problem:

The new solution

Starting with 1.1.0, Istio automatic injection has a way to add exceptions based on labels, meaning: Do not inject the sidecar in pods that match those labels, even if the policy is enabled and this namespace is marked to have automatic injection. You can add those exceptions in the istio-sidecar-injector ConfigMap:

You can see above a field neverInjectSelector. It’s an array of Kubernetes label selectors. They are OR’d, stopping at the first match. The above statement means: Never inject on pods that have the label openshift.io/build.name or openshift.io/deployer-pod-for.name – the values of the labels don’t matter, we are just checking if the keys exist. With this rule added, our Openshift s2i use case is covered, meaning auxiliary pods will not have sidecars injected (because s2i auxiliary pods do contain those labels).

For completeness, you can also use a field called alwaysInjectSelector, which will always inject the sidecar on pods that match that label selector, regardless of the global policy.

The label selector approach gives a lot of flexibility on how to express those exceptions. Take a look at their docs to see what you can do with them!

It’s worth noting that annotations in the pods still take precedence. If a pod is annotated with sidecar.istio.io/inject: "true/false" then it will be honored. So, the order of evaluation is: Pod Annotations → NeverInjectSelector → AlwaysInjectSelector → Default Policy.
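
That precedence (pod annotation first, then neverInjectSelector, then alwaysInjectSelector, and finally the default policy) can be sketched in Python. This is a simplified model and not Istio’s actual code: the function names and the pod dict layout are hypothetical, only the "true"/"false" annotation values mentioned in this post are handled, and the two selectors encode the Openshift label keys discussed above with the Exists operator.

```python
def selector_matches(selector, labels):
    """Evaluate one Kubernetes-style label selector against pod labels."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, operator = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if operator == "Exists":
            ok = key in labels
        elif operator == "DoesNotExist":
            ok = key not in labels
        elif operator == "In":
            ok = labels.get(key) in values
        elif operator == "NotIn":
            ok = labels.get(key) not in values
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


def should_inject(pod, policy_enabled, never_selectors=(), always_selectors=()):
    """Decide on injection: annotation, then never, then always, then policy."""
    annotation = pod.get("annotations", {}).get("sidecar.istio.io/inject")
    if annotation is not None:
        return annotation.lower() == "true"      # simplification of the real parsing
    labels = pod.get("labels", {})
    if any(selector_matches(s, labels) for s in never_selectors):
        return False
    if any(selector_matches(s, labels) for s in always_selectors):
        return True
    return policy_enabled


# Selectors for the s2i use case: only the existence of the key matters.
NEVER_INJECT = [
    {"matchExpressions": [{"key": "openshift.io/build.name", "operator": "Exists"}]},
    {"matchExpressions": [{"key": "openshift.io/deployer-pod-for.name", "operator": "Exists"}]},
]

builder_pod = {"labels": {"openshift.io/build.name": "myapp-1"}}   # made-up pod
app_pod = {"labels": {"app": "myapp"}}                             # made-up pod

builder_injected = should_inject(builder_pod, True, NEVER_INJECT)  # False
app_injected = should_inject(app_pod, True, NEVER_INJECT)          # True
```

Because the annotation is checked first, a builder pod explicitly annotated with sidecar.istio.io/inject: "true" would still be injected in this model.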

As this {Never,Always}InjectSelector is a recent addition, I still have to update the docs to mention it. For everything else, including more information and examples, check out the official documentation.

Why is my pod [not] getting injected?

This is a very common question. You have followed all the instructions (like labeling the namespace with istio-injection=enabled) and your pods are not receiving the automatic injection.

Or quite the opposite, you annotated your pod with sidecar.istio.io/inject: "false" and it is getting injected. Why?

One thing you can do in order to find out what’s going on is to look at the sidecar-injector pod logs:

Then you can create your pods and watch that log for any output. For a more verbose log output (trust me, it’s really useful) we should edit the sidecar-injector deployment and append the argument --log_output_level=default:debug to the sidecar-injector container executable:

Hint

If even with the debug output enabled you do not see anything relevant in your logs, that means that the sidecar-injector pod is not even being notified about the pod creation. It’s not being invoked to do the automatic injection. This can be due to a misconfiguration of the namespace label. Check whether the namespace is labeled according to what is in the MutatingWebhookConfiguration. By default the namespace should have the label istio-injection=enabled. Verify whether this has changed by running kubectl -n istio-system edit MutatingWebhookConfiguration istio-sidecar-injector and checking the namespaceSelector field.
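
As a rough mental model of that check, the webhook is only invoked when the namespace labels satisfy the namespaceSelector. The Python sketch below assumes the selector uses plain matchLabels with the default istio-injection=enabled pair; the function and constant names are hypothetical.

```python
DEFAULT_NAMESPACE_SELECTOR = {"istio-injection": "enabled"}


def webhook_is_invoked(namespace_labels, match_labels=DEFAULT_NAMESPACE_SELECTOR):
    """True if every matchLabels entry is present on the namespace."""
    return all(namespace_labels.get(k) == v for k, v in match_labels.items())


labelled = webhook_is_invoked({"istio-injection": "enabled"})   # True
unlabelled = webhook_is_invoked({"team": "payments"})           # False
no_filter = webhook_is_invoked({"team": "payments"}, {})        # True: empty selector
```

The last call mirrors the warning given earlier: with the filter removed, every namespace is selected.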

When you are finished with your debug session, you can edit the deployment again and remove that debug argument.

That’s it. Hope that helps. Feel free to ask or comment anything, here or on twitter! See you soon!

Hello folks. In this blog post I’ll share with you a problem that I had while trying out the Circuit Breaking tutorial in the Istio documentation. I’ll walk through all the steps I took while troubleshooting this issue, and hopefully it will be useful for someone out there. It was for me at least: I learned a bit more about Istio in the process.

The steps of the task are quite simple:
1) Install a couple of pods (one serving httpbin + one having curl to communicate with the httpbin service)
2) Create a DestinationRule object so that calls to httpbin service are limited (Circuit Breaking)
Pretty straightforward, no? So, let the fun begin.

Hmmm, error 503… Why? According to the Circuit Breaking task, it should work fine. We only added a rule that sets the maximum number of TCP connections to 1, and indeed, with the curl command above we spawned just one connection. So, what’s wrong?

First thing that came into my mind was to issue the command to verify if Circuit Breaking was in effect:

So, the value 0 for the upstream_rq_pending_overflow variable confirms that no call has been trapped by the circuit breaker.

Explanation of the above command:
Istio sidecar (Envoy container named istio-proxy) exposes (locally) the port 15000, which is accessible via HTTP and has some utilities, such as printing some statistics about the service.
So, in the command above we executed curl (curl localhost:15000/stats) within the sidecar container (-c istio-proxy) of the helper pod (sleep-5b597748b4-77kj5), filtering the output by the service we want to investigate (| grep httpbin) then filtering for the circuit breaker pending state (| grep pending).
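
The two grep filters can also be expressed as a small Python helper, which is handy if you want to check the counter from a script. This is just a sketch: it assumes the stats arrive as plain “name: value” lines, and the SAMPLE text below is made up for illustration, not real output from my cluster.

```python
def pending_stats(stats_text, service="httpbin"):
    """Keep the lines mentioning the service and 'pending', parsed as name -> int."""
    result = {}
    for line in stats_text.splitlines():
        if service in line and "pending" in line:
            name, _, value = line.rpartition(": ")
            if value.strip().isdigit():          # skip anything that is not a plain counter
                result[name] = int(value)
    return result


# Hypothetical sample in the "name: value" shape; not captured from a real sidecar.
SAMPLE = """\
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_total: 1
cluster.outbound|80||sleep.default.svc.cluster.local.upstream_rq_pending_overflow: 0
"""

stats = pending_stats(SAMPLE)
overflowed = any(
    value > 0 for name, value in stats.items()
    if name.endswith("upstream_rq_pending_overflow")
)
```

Here overflowed stays False as long as the overflow counter is 0, which is the same conclusion drawn from the command output above.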

In order to confirm that the guilty guy is that DestinationRule, let’s delete it and try again:

What? We don’t see anything in the log output. It’s like the request is not reaching the server. So, what to do now?… Hey! What if we could increase the log verbosity? Maybe the request is coming but it’s just not being outputted? Let’s see.

Remember I mentioned above that Envoy’s port 15000 is exposed locally to the service pod? We used it to grab statistics. Let’s take a look at it to find out what else it offers:

The command above set all Envoy loggers to the trace level, the finest one. For more information about this administrative interface, check out the Envoy docs. Now, let’s retry retrieving the Envoy server log; hopefully with the trace level we will get something (in fact, we get lots of logs!):

Wow, now that seems interesting! We can see that the request is indeed coming to the server, but it’s failing due to a handshake error and Envoy is closing the connection. The question now is: Why a handshake error? Why is SSL involved at all?

When speaking of SSL in the context of Istio, Mutual TLS comes to mind. So I went to the Istio docs, trying to find something relevant to my problem. Reading the Security tutorial Task opened my eyes!

Those outputs above show that mTLS is installed in the cluster. Those objects only exist when mTLS is on.

OK, looking back at my installation scripts, I realized that I really messed up and installed Istio with mTLS on. However, the question is still there: Why is the httpbin service failing? Knowing that mTLS is active in the mesh, and reading the documentation, it’s not hard to deduce that the server is expecting a TLS connection and the client is issuing a plain text one. So the question changes again: Why is the client (sleep pod) connecting to the server (httpbin pod) using plain text?

Again, looking at the documentation we find the answer. The way mTLS works in Istio is simple: There is a default DestinationRule object (called “default”, as we can see in the command above) that instructs all traffic in the mesh to go through TLS. However, when we created our own DestinationRule for the purpose of the Circuit Breaking task, we overwrote that default configuration with our own, which has no TLS at all! This is stated in the TLS documentation for Istio:

Don’t forget that destination rules are also used for non-auth reasons such as setting up canarying, but the same order of precedence applies. So if a service requires a specific destination rule for any reason – for example, for a configuration load balancer – the rule must contain a similar TLS block with ISTIO_MUTUAL mode, as otherwise it will override the mesh- or namespace-wide TLS settings and disable TLS.

So, it’s clear now what we should do: modify the DestinationRule for the Circuit Breaking task to include the TLS block with ISTIO_MUTUAL mode.
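
To illustrate the shape of that change, here is a minimal Python sketch that models the DestinationRule as a plain dict and adds the TLS block to it. The with_mutual_tls helper is hypothetical, and the rule below is a trimmed-down assumption: it carries only the “maximum of 1 TCP connection” setting mentioned earlier, not the full rule from the task.

```python
import copy


def with_mutual_tls(destination_rule):
    """Return a copy of the rule whose trafficPolicy sets tls.mode to ISTIO_MUTUAL."""
    patched = copy.deepcopy(destination_rule)
    policy = patched.setdefault("spec", {}).setdefault("trafficPolicy", {})
    policy["tls"] = {"mode": "ISTIO_MUTUAL"}
    return patched


# Trimmed-down stand-in for the Circuit Breaking rule (assumed field values).
circuit_breaker_rule = {
    "apiVersion": "networking.istio.io/v1alpha3",
    "kind": "DestinationRule",
    "metadata": {"name": "httpbin"},
    "spec": {
        "host": "httpbin",
        "trafficPolicy": {
            "connectionPool": {"tcp": {"maxConnections": 1}},
        },
    },
}

fixed_rule = with_mutual_tls(circuit_breaker_rule)
# fixed_rule keeps the connectionPool settings and gains the tls block, so it no
# longer overrides the mesh-wide TLS setting with "no TLS".
```

The same idea applies when editing the yaml by hand: the tls block goes under trafficPolicy, next to the circuit breaking settings.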

Some lessons learned:
– Confirm whether you are using mTLS or not; Enabling it opens the door to obscure errors 🙂
– DestinationRules have precedence order: More specific ones overwrite global ones
– We can sometimes make good use of the Sidecar’s administrative interface (local port 15000)
– Always read the docs 🙂