On Tuesday, 12 July 2011 11:59:52 Cyril GROSJEAN wrote:
> I randomly notice my OpenLDAP server freezes, and I can't understand why.
> I have a few LDAP clients (ldapsearch, a legacy Java app. and
> ApacheDirectoryStudio), running from different systems, either locally on
> the OpenLDAP server, or on another OpenLDAP
> server, or on a remote workstation, and none manages to get an answer from
> OpenLDAP. The connection is established but each client gets stuck waiting
> for any result.
[...]
> Jul 12 10:20:05 dev-ldap1 slapd[28525]: connection_input: conn=3377 deferring operation: binding
This is the code (at least in 2.4.26) that generates the message:
    /* Don't process requests when the conn is in the middle of a
     * Bind, or if it's closing. Also, don't let any single conn
     * use up all the available threads, and don't execute if we're
     * currently blocked on output. And don't execute if there are
     * already pending ops, let them go first. Abandon operations
     * get exceptions to some, but not all, cases.
     */
    switch( tag ){
    default:
        /* Abandon and Unbind are exempt from these checks */
        if (conn->c_conn_state == SLAP_C_CLOSING) {
            defer = "closing";
            break;
        } else if (conn->c_writewaiter) {
            defer = "awaiting write";
            break;
        } else if (conn->c_n_ops_pending) {
            defer = "pending operations";
            break;
        }
        /* FALLTHRU */
    case LDAP_REQ_ABANDON:
        /* Unbind is exempt from these checks */
        if (conn->c_n_ops_executing >= connection_pool_max/2) {
            defer = "too many executing";
            break;
        } else if (conn->c_conn_state == SLAP_C_BINDING) {
            defer = "binding";
            break;
        }
        /* FALLTHRU */
    case LDAP_REQ_UNBIND:
        break;
    }

    if( defer ) {
        int max = conn->c_dn.bv_len
            ? slap_conn_max_pending_auth
            : slap_conn_max_pending;

        Debug( LDAP_DEBUG_ANY,
            "connection_input: conn=%lu deferring operation: %s\n",
            conn->c_connid, defer, 0 );
        conn->c_n_ops_pending++;
        LDAP_STAILQ_INSERT_TAIL( &conn->c_pending_ops, op, o_next );
        rc = ( conn->c_n_ops_pending > max ) ? -1 : 0;
    } else {
        ... carry on and handle the op.
As far as I understand, the intention is (among other things) to defer
operations from connections where a BIND operation is still pending. However,
some of the comments now appear to be misplaced (e.g. the Unbind comment under
LDAP_REQ_ABANDON). Also, the code appears (to me, not being very familiar with
it, and quite rusty at C) not to be doing the right thing. The portion
generating the "deferring operation: binding" message appears to be reached
when an abandon operation is received on a connection that has a pending BIND
operation. Shouldn't an abandon be allowed for a BIND? Or am I reading it
wrong? Also, it looks as if the "too many executing" check is only applicable
to abandon?
Shouldn't the LDAP_REQ_ABANDON case be breaking without setting 'defer'?
Shouldn't the 'conn->c_conn_state == SLAP_C_BINDING' and
'conn->c_n_ops_executing >= connection_pool_max/2' conditions be handled by
the default case as well?
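
One way to check this reading is to model the switch in isolation. The sketch
below is a standalone approximation, not the slapd source: the enum values,
struct conn, defer_reason() and self_check() are hypothetical stand-ins, and
only the control flow is copied from the excerpt above. Under standard C rules
a case body that does not hit a break continues into the next label, so when
none of the three checks under default fires, execution carries on into the
LDAP_REQ_ABANDON checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real request tags and connection states;
 * the values are arbitrary and only need to be distinct. */
enum tag { REQ_SEARCH, REQ_ABANDON, REQ_UNBIND };
enum state { C_ACTIVE, C_BINDING, C_CLOSING };

struct conn {
    enum state state;
    int writewaiter;
    int n_ops_pending;
    int n_ops_executing;
};

/* Same control flow as the excerpt: returns the defer reason, or NULL if
 * the operation would be handled immediately. */
static const char *defer_reason(enum tag tag, const struct conn *c, int pool_max)
{
    const char *defer = NULL;

    switch (tag) {
    default:
        if (c->state == C_CLOSING) {
            defer = "closing";
            break;
        } else if (c->writewaiter) {
            defer = "awaiting write";
            break;
        } else if (c->n_ops_pending) {
            defer = "pending operations";
            break;
        }
        /* FALLTHRU: no break was hit, so the checks below also run */
    case REQ_ABANDON:
        if (c->n_ops_executing >= pool_max / 2) {
            defer = "too many executing";
            break;
        } else if (c->state == C_BINDING) {
            defer = "binding";
            break;
        }
        /* FALLTHRU */
    case REQ_UNBIND:
        break;
    }
    return defer;
}

/* Exercises the model with an assumed pool size of 16 threads. */
static void self_check(void)
{
    struct conn binding = { C_BINDING, 0, 0, 0 };
    struct conn closing = { C_CLOSING, 0, 0, 0 };

    /* An ordinary request on a binding connection falls through and is deferred. */
    assert(strcmp(defer_reason(REQ_SEARCH, &binding, 16), "binding") == 0);
    assert(strcmp(defer_reason(REQ_ABANDON, &binding, 16), "binding") == 0);
    assert(defer_reason(REQ_UNBIND, &binding, 16) == NULL);

    /* Abandon skips the "closing" check; an ordinary request does not. */
    assert(defer_reason(REQ_ABANDON, &closing, 16) == NULL);
    assert(strcmp(defer_reason(REQ_SEARCH, &closing, 16), "closing") == 0);
}
```

If this model is faithful to the real code, the "binding" and "too many
executing" checks apply to every operation except Unbind, and Abandon is only
exempt from the first three checks.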
We have been running into both the "deferring operation: binding" and
"deferring operation: too many executing" messages. I haven't had time to
trace what the LDAP client software was doing, but now I wonder whether it was
sending abandon requests when some operations weren't returning in time (after
more than 18000 successful operations on a connection). I think its use of
LDAP connections may be wrong, but I would prefer to be able to prove that to
the vendor without other log entries showing its correct behaviour being
handled incorrectly.
Also, the hard-coded 'a single connection may not have more operations
executing than half the number of threads' rule seems a bit arbitrary. Could
we get a knob to twiddle this?
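
To make the request concrete, here is a minimal sketch of the current rule
next to a tunable variant. Everything in it is illustrative: the function
names and the max_share_percent parameter are hypothetical and do not exist in
slapd, and the pool size of 16 assumes the default 'threads' setting.

```c
#include <assert.h>

/* Mirrors the check in the excerpt: with integer division, a connection is
 * deferred once it already has pool_max/2 operations executing. */
static int too_many_executing(int n_ops_executing, int pool_max)
{
    return n_ops_executing >= pool_max / 2;
}

/* Hypothetical tunable variant. max_share_percent is NOT an existing slapd
 * setting; it stands for the kind of knob being asked for. */
static int too_many_executing_tunable(int n_ops_executing, int pool_max,
                                      int max_share_percent)
{
    return n_ops_executing >= (pool_max * max_share_percent) / 100;
}

/* With 16 threads the hard-coded rule defers at 8 executing operations;
 * a 50 percent share reproduces it, a 75 percent share moves it to 12. */
static void check_thresholds(void)
{
    assert(!too_many_executing(7, 16));
    assert(too_many_executing(8, 16));
    assert(too_many_executing_tunable(8, 16, 50));
    assert(!too_many_executing_tunable(11, 16, 75));
    assert(too_many_executing_tunable(12, 16, 75));
}
```

A percentage is only one possible shape for such a setting; an absolute
per-connection limit would work just as well.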
Regards,
Buchan