Hi.
I've set up a multiple-upstream configuration with nginx as a load balancer,
and yes, I'm getting 'no live upstreams' in the error log, in roughly 1-3% of
requests. I know how this works: nginx marks a backend in an upstream group
as dead when it receives an error from it (which errors count is configured
with proxy_next_upstream), and when all of the servers in an upstream group
are under such holddowns, you get this error. So if you're seeing these
errors, all you normally need to do is fix their root cause, such as
timeouts and various 50x responses, and 'no live upstreams' will be long
gone.
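For reference, here's a minimal sketch of the kind of setup I mean (server addresses, timings, and the retry conditions are placeholders, not my actual config):

```nginx
# Hypothetical upstream group: after max_fails failed attempts,
# a server is held down (treated as dead) for fail_timeout seconds.
upstream backends {
    server 10.0.0.1:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=10s;
}

server {
    listen 80;
    server_name foo.bar;

    location / {
        proxy_pass http://backends;
        # Which outcomes count as a failed attempt and cause nginx to
        # retry the request on the next server in the group:
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```

When every server in the group is inside its fail_timeout holddown at once, nginx has nothing left to try and logs 'no live upstreams'.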
But in my case these appear out of nowhere. I would almost be happy to see
some timeouts or 50x responses from the backends. Nope, I'm getting this:
2016/09/14 20:27:58 [error] 46898#100487: *49484 no live upstreams while
connecting to upstream, client: xx.xx.xx.xx, server: foo.bar, request: "POST
/mixed/json HTTP/1.1", upstream: "http://backends/mixed/json", host:
"foo.bar"
And this in the access log:
xx.xx.xx.xx - - [14/Sep/2016:20:27:58 +0300] foo.bar "POST /mixed/json
HTTP/1.1" 502 198 "-" "-" 0.015 backends 502 -
The funniest thing is that I'm getting these requests in bursts, while the
requests just before them are 200s. It really looks like the upstream group
suddenly switches to a dead state for no reason, and since I don't believe
in miracles, I think there must be a cause, only that nginx for some reason
doesn't log it.
So, my question is: if this isn't caused by HTTP errors (since I don't see
any errors on the backends), can it be caused by a sudden loss of L3
connectivity? For example, dropped TCP connections, intermediary packet
filters, and so on?
Thanks.
Posted at Nginx Forum: https://forum.nginx.org/read.php?2,269577,269577#msg-269577