On Wed, Jul 22, 2009 at 05:37:30AM -0700, David J. Andruczyk wrote:
> Yes, we have been measuring latency under the F5 vs. RR DNS. When we
> switched to RR DNS it DID drop quite a bit, from around 100 ms to about
> 20 ms.
FWIW and IIRC, after switching to Performance (Layer 4), the observed
latency for LDAP operations to the VIP and to the nodes themselves was
essentially the same. I can't say what the latency difference was, since I
wasn't the one who was troubleshooting the BigIPs and don't have the numbers
handy.
> We do NOT yet have the VIP set to Performance layer 4 however. It was at
> "standard". F5 has since suggested performance layer 4, but we have not
> implemented it yet, only due to the fact that the connection deferred:
> binding messages cause severe annoyances, and lots of CS calls from users
> of the system (auth failures, misc issues), that mgmt is wary of trying
> anything else until they have proof that whatever we do WILL DEFINITELY
> WORK beforehand. (yes cart before the horse, I know, but they sign the
> checks as well...)
That seems short-sighted, unless you're implying that you've moved all LDAP
traffic off your BigIPs until you have a solution in hand that you *know*
will solve the problem.
They may sign the checks, but that doesn't mean that informed argument
shouldn't carry weight.
> When behind the F5, in the LDAP server logs all connections appear to come
> from the F5's IP. So when pumping a hundred servers' connections through
> that one IP there are going to be many, many binds/unbinds going on
> constantly, all coming from the same IP (the F5). So why doesn't it throw
> "connection deferred: binding" constantly, as the connection load is
> certainly very, very high? It only throws them occasionally (every few
> seconds), but it's enough to cause a major impact in terms of failed
> queries. Are you saying the F5 is dropping part of the session after
> binding on a port and retrying to bind?
+1 on what Philip mentioned:
On Tue, 21 Jul 2009 21:54:53 -0700, Philip Guenther wrote:
> Given the reported log message, this (latency) is very likely to be the
> cause of the problem. "connection deferred: binding" means that the
> server received a request on a connection that was in the middle of
> processing a bind. This means that the client sends a bind and then
> additional request(s) without waiting for the bind result. That's a
> violation by the client of the LDAP protocol specification, RFC 4511,
> section 4.2.1, paragraph 2:
[snip]
> Understanding _why_ clients are violating the spec by sending further
> requests while a bind is outstanding may help you understand how the F5 or
> the clients should be tuned (or beaten with sticks, etc).
>
> You presumably don't notice this under normal circumstances or with RR DNS
> because the server completes the BIND before the next request is received.
> My understanding (perhaps suspect) is that the F5 will increase the
> 'bunching' of packets on individual connections (because the first packet
> after a pause will see a higher latency than the succeeding packets).
>
> So, are you measuring latency through the F5? I would *strongly* suggest
> doing so *before* tuning the F5 in any way, such as by the VIP type
> mentioned by John Morrissey, so that you can wave that in front of
> management (and under the nose of the F5 salesman when negotiating your
> next support renewal...)
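To make the condition Philip describes concrete, here is a toy model of one
server-side connection. It is only a sketch of the rule, not slapd's actual
code: the ToyConnection class, its method names, and the "start ..." log
strings are all made up for illustration. Only the "connection deferred:
binding" text comes from the real log.

```python
from collections import deque


class ToyConnection:
    """Toy model of one server-side LDAP connection (hypothetical, not slapd)."""

    def __init__(self):
        self.bind_in_progress = False
        self.deferred = deque()  # requests held back while a bind is pending
        self.log = []

    def receive(self, op):
        """Handle a request arriving from the client."""
        if self.bind_in_progress:
            # A request arrived before the bind result went out:
            # hold it and log the message seen in this thread.
            self.deferred.append(op)
            self.log.append("connection deferred: binding")
            return
        if op == "bind":
            self.bind_in_progress = True
        self.log.append("start " + op)

    def bind_done(self):
        """The server finished the bind; release anything that was held back."""
        self.bind_in_progress = False
        self.log.append("bind result sent")
        while self.deferred and not self.bind_in_progress:
            self.receive(self.deferred.popleft())


if __name__ == "__main__":
    # Well-behaved client: waits for the bind result before searching.
    good = ToyConnection()
    good.receive("bind")
    good.bind_done()
    good.receive("search")
    print(good.log)

    # Misbehaving client: sends the search while the bind is outstanding.
    bad = ToyConnection()
    bad.receive("bind")
    bad.receive("search")
    bad.bind_done()
    print(bad.log)
```

Only the second client trips the deferred message, and only because its
search reached the server before the bind finished.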
What I'm parsing from:
https://support.f5.com/kb/en-us/solutions/public/8000/000/sol8082.html
(only accessible with an F5 support contract, unfortunately), is that with
the "Standard" VIP type, the BigIP will wait for a three-way TCP handshake
before establishing a connection with the load-balanced node. The BigIP
becomes a "man in the middle" and establishes two independent connections:
one facing the client, another facing the load-balanced node.
With "Performance (Layer 4)", the BigIP forwards packets between clients and
load-balanced nodes as they're received. As Philip says, the packet
"bunching" due to the MITM nature of the Standard VIP type is probably
teaming up with your LDAP client misbehavior. Under heavy load, the
likelihood of bunching increases and you "win" this race condition.
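A back-of-the-envelope sketch of that race, assuming (as Philip suggests) that
the first packet after a pause sees more delay than the ones that follow it.
The function names and every number below are invented for illustration; none
of them are measurements from a BigIP or from slapd.

```python
def arrival_times(send_times, first_packet_delay, later_packet_delay):
    """Arrival time at the server for each packet on one connection.

    All times are in milliseconds. The first packet gets
    first_packet_delay; every later packet gets later_packet_delay.
    """
    arrivals = []
    for i, sent in enumerate(send_times):
        delay = first_packet_delay if i == 0 else later_packet_delay
        arrival = sent + delay
        if arrivals:
            # Data on one TCP connection is delivered in order.
            arrival = max(arrival, arrivals[-1])
        arrivals.append(arrival)
    return arrivals


def deferred_requests(send_times, bind_service_time,
                      first_packet_delay, later_packet_delay):
    """Count requests that reach the server while the bind is still pending.

    send_times[0] is the bind; the rest are requests the client sent
    without waiting for the bind result.
    """
    arrivals = arrival_times(send_times, first_packet_delay, later_packet_delay)
    bind_done = arrivals[0] + bind_service_time
    return sum(1 for arrival in arrivals[1:] if arrival < bind_done)


if __name__ == "__main__":
    sends = [0, 5]  # bind at t=0 ms, search at t=5 ms (made-up values)

    # Uniform 1 ms delay: the 3 ms bind finishes before the search arrives.
    print(deferred_requests(sends, 3, 1, 1))

    # First packet delayed 4 ms, later ones 1 ms: the two packets bunch up
    # and the search lands while the bind is still being processed.
    print(deferred_requests(sends, 3, 4, 1))
```

The client behaves identically in both runs; only the spacing of the packets
at the server changes, which is why the errors show up intermittently and get
worse under load.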
Out of curiosity, what LDAP client SDK is involved here?
john
--
John Morrissey _o /\ ---- __o
jwm@horde.net _-< \_ / \ ---- < \,
www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__