We have observed that after a haproxy reload, the “finishing” PID never actually exits, even though lsof shows that it has no active TCP connections.

If we disable HTTP/2 completely (IPv4 and IPv6) the problem goes away, and the finishing PIDs do indeed go away when they have no more connections.
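For illustration only, and assuming HTTP/2 is negotiated via ALPN on the bind lines (the usual way in 1.8), disabling it on both address families might look roughly like the sketch below. The frontend name, ports and certificate path are hypothetical and not taken from our real config.

```haproxy
frontend fe_https                       # hypothetical frontend name
    # HTTP/2 enabled: "h2" is advertised via ALPN on IPv4 and IPv6
    # bind :443   ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    # bind :::443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1

    # HTTP/2 disabled: only HTTP/1.1 is advertised
    bind :443   ssl crt /etc/haproxy/certs/site.pem alpn http/1.1
    bind :::443 ssl crt /etc/haproxy/certs/site.pem alpn http/1.1
```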

After a lot of lsof’ing, we noticed that a child we expected to have finished had this line in its lsof output:

haproxy 26670 haproxy 429u sock 0,7 0t0 31631114 protocol: TCPv6

but no actual IPv6 connections. We see the same behavior with IPv4 for “finishing” processes that never go away. The socket has a high-numbered file descriptor, which makes me think it came from a connection that was being used to serve HTTP requests.

The same behaviour occurs whether we run in single-threaded or multi-threaded mode (changed through config, not by recompiling).

Are you using systemd in notify mode (starting haproxy with -Ws and Type=notify, as per contrib/systemd/haproxy.service.in)?

Do you use hard-stop-after? You should set it in any case to something that matches your expectations, otherwise a small client or attacker can keep your old processes hanging around forever (just to be clear, that doesn’t mean this couldn’t be a bug).
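For reference, hard-stop-after goes in the global section. A minimal sketch follows; the 30m value is only an illustrative assumption, so pick whatever matches the longest transfer you are willing to wait for.

```haproxy
global
    # After a reload, old processes are killed once this delay has elapsed,
    # even if they still hold connections. 30m is an example value only.
    hard-stop-after 30m
```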

Also there are two important bugs you will hit in a hosting environment with 1.8.3:

I can confirm that adding Type=notify and using the recommended haproxy version did not solve the problem.

Adding hard-stop-after did fix it, but we’d prefer not to use that: we’re happy to allow customers to run download sites and we wouldn’t want to ultimately impose a maximum time limit on how long a download can take. That said, we don’t believe it’s download sites causing this issue because the processes that remain don’t have any open connections to the Internet.