FIX: An Always On secondary replica goes into a disconnecting state

Symptoms

When you use SQL Server Always On Availability Groups, the Always On secondary replica may go into a disconnecting state. Additionally, the following error message is logged in the SQL Server error log:

A connection timeout has occurred while attempting to establish a connection to availability replica availability_replica_name with id availability_replica_id. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.
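
To check whether an exported copy of the error log contains this message, a short script can scan the log text for it. The following is a minimal sketch in Python, not a Microsoft-provided tool. The function name find_timeouts and the sample log lines are hypothetical, and the quotation marks around the replica name and the brackets around the id are assumed punctuation that the pattern treats as optional.

```python
import re

# Matches the timeout message quoted above. Quotes around the replica name and
# brackets around the id are treated as optional, because the exact punctuation
# of a real log line is an assumption here.
TIMEOUT_RE = re.compile(
    r"A connection timeout has occurred while attempting to establish a "
    r"connection to availability replica '?(?P<name>[^'\s]+)'? "
    r"with id \[?(?P<id>[^\]\s.]+)\]?"
)


def find_timeouts(lines):
    """Return (replica_name, replica_id) for each line that contains the message."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("name"), match.group("id")))
    return hits


# Hypothetical sample lines for illustration; not taken from a real log.
sample = [
    "2017-01-10 09:15:02.11 spid25s A connection timeout has occurred while "
    "attempting to establish a connection to availability replica 'SQLNODE2' "
    "with id [1A2B3C4D-0000-0000-0000-000000000001].",
    "2017-01-10 09:15:03.40 spid7s Recovery is complete.",
]

for name, replica_id in find_timeouts(sample):
    print(f"timeout connecting to replica {name} (id {replica_id})")
```

A match only shows that the message was logged. The same message can also be caused by an ordinary network, firewall, or endpoint configuration problem, so it does not by itself confirm the issue described in this article.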

When you try to reestablish the connection, you may receive the following error message:

This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED.

When this behavior occurs, the issue persists until you restart the SQL Server service on the secondary replica. In rare cases, you may also have to restart the SQL Server service on the primary replica to resume Always On data movement.

Note: This problem might occur only on very powerful computers when SQL Server is very busy. For example, in one scenario, it occurred on a very busy system that has 24 cores.

Cause

The problem occurs because of an internal race condition.

Resolution

This issue was fixed in the following cumulative updates of SQL Server.

Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. Check out the latest cumulative updates for SQL Server: