Hello folks. In this blog post I’ll share a problem I ran into while trying out the Circuit Breaking task in the Istio documentation. I’ll walk through all the steps I took while troubleshooting it, and hopefully it will be useful for someone out there. It was for me at least: I learned a bit more about Istio in the process.

The steps of the task are quite simple:
1) Install a couple of pods (one serving httpbin + one having curl to communicate with the httpbin service)
2) Create a DestinationRule object so that calls to httpbin service are limited (Circuit Breaking)
Pretty straightforward, no? So, let the fun begin.
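
For reference, the rule from step 2 might look roughly like the sketch below. It is modeled on the Istio Circuit Breaking task, not copied from my cluster. The API version (networking.istio.io/v1alpha3), the rule name httpbin, the container name sleep, the service port 8000 and the /get path are all assumptions, and circuit_breaker_rule and call_httpbin are hypothetical helper names.

```shell
# Sketch only: prints a DestinationRule that caps httpbin at one TCP connection.
# Field names follow the Istio v1alpha3 networking API (assumed version).
circuit_breaker_rule() {
  cat <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
EOF
}

# Sends a single request from the helper pod to httpbin and prints the HTTP status.
# $1 is the sleep pod name; container "sleep", port 8000 and /get are assumptions.
call_httpbin() {
  kubectl exec "$1" -c sleep -- \
    curl -s -o /dev/null -w '%{http_code}\n' http://httpbin:8000/get
}

# Usage (needs a cluster with both pods running):
#   circuit_breaker_rule | kubectl apply -f -
#   call_httpbin sleep-5b597748b4-77kj5
```

Printing the manifest from a function keeps the rule next to the command that applies it, which makes it easy to tweak a field and re-apply.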

Hmmm, error 503… Why? According to the Circuit Breaking task, it should work fine. We only added a rule that sets the maximum number of TCP connections to 1, and indeed, with the curl command above we spawned just one connection. So, what’s wrong?

The first thing that came to mind was to issue the command that verifies whether Circuit Breaking was in effect:

So, the value 0 for the upstream_rq_pending_overflow counter confirms that no call has been trapped by the circuit breaker.

Explanation of the above command:
The Istio sidecar (the Envoy container named istio-proxy) exposes port 15000 locally. It is reachable over HTTP and offers some utilities, such as printing statistics about the service.
So, in the command above we executed curl (curl localhost:15000/stats) within the sidecar container (-c istio-proxy) of the helper pod (sleep-5b597748b4-77kj5), filtering the output by the service we want to investigate (| grep httpbin) then filtering for the circuit breaker pending state (| grep pending).
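
Putting the pieces described above together, the check can be sketched as a small helper. The function name pending_stats is hypothetical, and the -s flag (silent mode) is my addition to keep curl's progress output out of the pipe.

```shell
# Reads the circuit-breaker "pending" counters for httpbin from the sidecar.
# $1 is the helper pod name, e.g. sleep-5b597748b4-77kj5.
pending_stats() {
  kubectl exec "$1" -c istio-proxy -- curl -s localhost:15000/stats \
    | grep httpbin \
    | grep pending
}

# Usage:
#   pending_stats sleep-5b597748b4-77kj5
# A line ending in "upstream_rq_pending_overflow: 0" means no request was rejected.
```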

In order to confirm that the guilty guy is that DestinationRule, let’s delete it and try again:
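
A minimal sketch of that step, assuming the rule is named httpbin as in the Istio task (remove_circuit_breaker is a hypothetical helper name):

```shell
# Deletes the circuit-breaking DestinationRule.
# $1 is the rule name; it defaults to "httpbin" (an assumption).
remove_circuit_breaker() {
  kubectl delete destinationrule "${1:-httpbin}"
}

# Usage:
#   remove_circuit_breaker
#   ...then repeat the same curl request from the sleep pod.
```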

What? We don’t see anything in the log output. It’s like the request is not reaching the server. So, what to do now?… Hey! What if we could increase the log verbosity? Maybe the request is arriving but just isn’t being logged? Let’s see.

Remember I mentioned above that Envoy’s port 15000 is exposed locally to the service pod? We used it to grab statistics. Let’s take a look at it to find out what else it offers:
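
Here is a rough sketch of how to explore that interface and raise the log level. The /help and /logging endpoints belong to the Envoy admin interface; the helper names admin_help and set_envoy_log_level are hypothetical, and I am assuming curl is available inside the istio-proxy container, as it was for the statistics call.

```shell
# Lists the endpoints offered by the Envoy admin interface of a pod's sidecar.
# $1 is the pod name.
admin_help() {
  kubectl exec "$1" -c istio-proxy -- curl -s localhost:15000/help
}

# Sets every Envoy logger in the pod's sidecar to one level.
# $1 is the pod name, $2 the level (defaults to "trace").
set_envoy_log_level() {
  kubectl exec "$1" -c istio-proxy -- \
    curl -s -X POST "localhost:15000/logging?level=${2:-trace}"
}

# Usage (the httpbin pod name is a placeholder):
#   admin_help <httpbin-pod>
#   set_envoy_log_level <httpbin-pod> trace
```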

The command above sets all Envoy loggers to the trace level, the finest one. For more information about this administrative interface, check out the Envoy docs. Now, let’s retry retrieving the Envoy server log; hopefully with the trace level we will get something (in fact, we get lots of logs!):
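
Retrieving the sidecar log is a plain kubectl logs call against the istio-proxy container. The sketch below wraps it in a hypothetical envoy_logs helper; the optional second argument maps to the --tail flag of kubectl logs, which helps because trace output is very noisy.

```shell
# Prints the sidecar (Envoy) log of a pod.
# $1 is the pod name, $2 an optional number of trailing lines to show.
envoy_logs() {
  if [ -n "$2" ]; then
    kubectl logs "$1" -c istio-proxy --tail="$2"
  else
    kubectl logs "$1" -c istio-proxy
  fi
}

# Usage (the httpbin pod name is a placeholder):
#   envoy_logs <httpbin-pod> 200
```

Once done, it is probably worth setting the loggers back to a quieter level such as info through the same admin endpoint.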

Wow, now that seems interesting! We can see that the request is indeed coming to the server, but it’s failing due to a handshake error and Envoy is closing the connection. The question now is: Why a handshake error? Why is SSL involved at all?

When speaking of SSL in the context of Istio, Mutual TLS comes to mind. So I went to the Istio docs, trying to find something relevant to my problem. Reading the Security task opened my eyes!

Those outputs above show that mTLS is installed in the cluster. Those objects only exist when mTLS is on.
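
A sketch of that kind of check follows. The resource kinds are the ones I remember from the Istio 1.0-era security docs, so treat them as an assumption: newer releases replaced MeshPolicy with PeerAuthentication. The helper name show_mtls_objects is hypothetical.

```shell
# Lists the objects that a mesh-wide mTLS install is expected to create.
# Resource kinds are assumed from Istio 1.0-era docs and may not exist
# on newer versions.
show_mtls_objects() {
  kubectl get meshpolicies.authentication.istio.io
  kubectl get destinationrules.networking.istio.io --all-namespaces
}

# Usage:
#   show_mtls_objects
```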

OK, looking back at my installation scripts, I realized that I had really messed up and installed Istio with mTLS on. However, the question is still there: why is the httpbin service failing? Knowing that mTLS is active in the mesh, and reading the documentation, it’s not hard to deduce that the server is expecting a TLS connection and the client is issuing a plain text one. So the question changes again: why is the client (sleep pod) connecting to the server (httpbin pod) using plain text?

Again, the documentation has the answer. The way mTLS works in Istio is simple: there is a default DestinationRule object (called “default”, as we can see in the command above) that instructs all traffic in the mesh to go through TLS. However, when we created our own DestinationRule for the Circuit Breaking task, we overrode that default configuration with our own, which has no TLS settings at all! This is stated in the TLS documentation for Istio:

Don’t forget that destination rules are also used for non-auth reasons such as setting up canarying, but the same order of precedence applies. So if a service requires a specific destination rule for any reason – for example, for a configuration load balancer – the rule must contain a similar TLS block with ISTIO_MUTUAL mode, as otherwise it will override the mesh- or namespace-wide TLS settings and disable TLS.

So, it’s clear now what we should do: modify the DestinationRule for the Circuit Breaking task so that its traffic policy also includes a TLS block with ISTIO_MUTUAL mode.
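
A sketch of the corrected rule, under assumptions: the API version (networking.istio.io/v1alpha3), the rule name httpbin and the connection pool values are modeled on the Istio task, not taken from my cluster, and circuit_breaker_rule_mtls is a hypothetical helper name. The part that matters is the tls block at the end of trafficPolicy.

```shell
# Sketch only: prints a circuit-breaking DestinationRule that keeps mTLS on.
# Field names follow the Istio v1alpha3 networking API (assumed version).
circuit_breaker_rule_mtls() {
  cat <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    tls:
      mode: ISTIO_MUTUAL
EOF
}

# Usage (needs a cluster):
#   circuit_breaker_rule_mtls | kubectl apply -f -
```

With the tls block in place, the more specific rule no longer drops the mesh-wide TLS setting, so the client sidecar should go back to speaking TLS to httpbin.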

Some lessons learned:
– Confirm whether you are using mTLS or not; enabling it opens the door to obscure errors 🙂
– DestinationRules have a precedence order: more specific ones override global ones
– We can sometimes make good use of the Sidecar’s administrative interface (local port 15000)
– Always read the docs 🙂