0000279: Optionally dismiss 57P01 as a clean disconnection when failover_on_backend_error is off and health checks are enabled

Description

Upstream session events such as TCP resets and administrative shutdowns (and, unfortunately, pg_terminate_backend(), because it returns the same SQLSTATE) are the only way PGPool can infer backend availability and trigger a failover if the periodic health checks are disabled. For shutdowns, this happens no matter what failover_on_backend_error is set to.

When health checks are enabled, this behavior seems redundant and possibly harmful: the ability to terminate sessions, or even to quickly restart a master backend before exhausted health check retries trigger a failover, is a valuable asset for safer maintenance.
Additionally, many Postgres admins are not in control of what SQL functions their users may be calling, which creates availability concerns.

Is there any other limitation to such a change that I'm overlooking, or is it just legacy behavior that could be changed relatively easily?

TIA

F

Steps To Reproduce

Do anything that causes 57P01 to be returned, and the whole PG server will be written off as dead.

Activities

In some cases pg_terminate_backend() now does not trigger a fail-over. (Muhammad Usama)

Because PostgreSQL returns exactly the same error code in the postmaster-down case and the pg_terminate_backend() case, using pg_terminate_backend() triggers a failover that the user might not want. To fix this, Pgpool-II now finds the pid of the backend targeted by pg_terminate_backend() and does not trigger a failover in that case.

This functionality is limited to the simple protocol case where the pid is given to pg_terminate_backend() as a constant. So if you call pg_terminate_backend() via the extended protocol (e.g. Java), pg_terminate_backend() still triggers a failover.

That's very clear; however, pg_terminate_backend() is just one case, and it could be called from, say, a query/function/trigger/rule, which would evade this detection. E.g., we just accidentally caused a failover by calling SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE [...].
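
To illustrate why constant-only detection misses this, here is a rough sketch in Python. It is not Pgpool-II's actual code (which is C and works on the parsed query); the regex and the function name constant_target_pid are made up for the example, and the WHERE clause is left out.

```python
import re

# Hypothetical stand-in for "the pid is given as a constant" detection.
# Pgpool-II's real check is implemented differently; this only shows the idea.
_CONST_PID_CALL = re.compile(
    r"pg_terminate_backend\s*\(\s*(\d+)\s*\)", re.IGNORECASE
)


def constant_target_pid(simple_query: str):
    """Return the literal pid passed to pg_terminate_backend(), or None."""
    match = _CONST_PID_CALL.search(simple_query)
    return int(match.group(1)) if match else None


# A literal pid is visible in the query text, so the target can be identified.
print(constant_target_pid("SELECT pg_terminate_backend(12345)"))

# The pid comes from a column of pg_stat_activity, so there is no constant
# to extract before the query runs.
print(constant_target_pid(
    "SELECT pg_terminate_backend(pid) FROM pg_stat_activity"
))
```

The same blind spot would apply to a call buried in a function, trigger or rule, since the pid never appears in the text of the query sent through the pool.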

My point is mostly about ignoring any kind of backend session error/termination/reset and ONLY initiating failover when health checks fail and run out of retries. After all, health checks are all that a lot of users count on.
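
Roughly, the policy I have in mind could be sketched like this (Python, purely illustrative). BackendState and both function names are hypothetical, and the assumption that failover happens once consecutive failures exceed health_check_max_retries is my simplification, not a statement about how Pgpool-II counts retries.

```python
from dataclasses import dataclass

ADMIN_SHUTDOWN = "57P01"  # SQLSTATE admin_shutdown


@dataclass
class BackendState:
    # Hypothetical container; field names mirror the pgpool.conf settings
    # discussed in this ticket where one exists.
    health_check_enabled: bool
    failover_on_backend_error: bool
    failed_health_checks: int = 0
    health_check_max_retries: int = 3


def session_error_triggers_failover(state: BackendState, sqlstate: str) -> bool:
    """Proposed handling of an error seen on a client-facing backend session."""
    if state.health_check_enabled and not state.failover_on_backend_error:
        # Proposed change: treat the event as a plain disconnection and let
        # the health checks decide whether the backend is really down.
        return False
    # Otherwise keep the behavior described above: 57P01 fails over
    # regardless of the flag, other errors follow the flag.
    return sqlstate == ADMIN_SHUTDOWN or state.failover_on_backend_error


def health_check_triggers_failover(state: BackendState) -> bool:
    """Assumed rule: fail over only once the retries are used up."""
    return (
        state.health_check_enabled
        and state.failed_health_checks > state.health_check_max_retries
    )
```

With health checks enabled and failover_on_backend_error off, a 57P01 on a session would then just drop that session, and only a run of failed health checks would detach the node.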