RE: test007-replication failure (ITS#2272)

I've duplicated this behavior, but don't have a good explanation for it. I
altered the test007 script to not kill the daemons when this problem occurs,
so that I could attach to them and see what happened. However, it never came
to that.

Your slurp.log shows that slurpd first tried to bind to the slave using IPv6,
which failed, and then tried IPv4, which also failed.

Mine shows the same result. However, since I left the daemons running, slurpd
retried a few seconds later and connected and bound successfully, and all the
updates were pushed across.
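
For illustration, the connect sequence seen in slurp.log (try one address
family, then fall back to the next) looks roughly like the Python sketch
below. This is not libldap's code; connect_any and its timeout parameter are
names made up here, and the sketch assumes the resolver returns the IPv6
address first, as it apparently did in these runs.

```python
import socket


def connect_any(host, port, timeout=2.0):
    """Try each address getaddrinfo() returns for host:port, in order.

    Returns (sock, attempts) on the first success. attempts is a list of
    (family_name, error) tuples, with error set to None for the attempt
    that succeeded. Raises OSError if every address fails.
    """
    attempts = []
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, addr in infos:
        name = getattr(family, "name", str(family))
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(addr)
        except OSError as exc:
            # Record the failure and fall through to the next address.
            attempts.append((name, exc))
            if sock is not None:
                sock.close()
            continue
        attempts.append((name, None))
        return sock, attempts
    raise OSError("all addresses failed: %r" % (attempts,))
```

Calling connect_any("localhost", 9010) against the slave port from the test
output would return one attempts entry per address tried, which makes it easy
to see whether a failure came from the IPv6 attempt, the IPv4 attempt, or
both.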

In slave.log, you see an anonymous bind on IPv6 that is from the initial
ldapsearch command that was issued to check that the slave was running. Then
you see a successful bind from slurpd. What's important to notice here is
that both binds are on IPv6, the port numbers differ only by one, and there
is no other intervening connection logged on the slave. There is no
indication of slurpd's previous failed connection attempt.

I have no explanation for why the connection attempt fails, as the slave
clearly has listeners open on both IPv6 and IPv4. It is clear to me that this
problem has nothing to do with slapd - the listeners are there. It is
possible that this is a libldap problem. Certainly the client never actually
attempted to connect to the server; if it had, the port number of the
successful connection would have been (at least) 2 greater than that of the
anonymous connection.
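
The port-number argument can be written down as a small check. This is only a
sketch: it assumes the OS hands out ephemeral source ports sequentially and
that nothing else on the machine opened a socket in between. The function
names and the port values in the usage note are hypothetical, not taken from
the logs.

```python
def intervening_ports(anon_port, slurpd_port):
    """Count the source ports skipped between two logged connections.

    anon_port is the source port of the anonymous ldapsearch connection,
    slurpd_port the source port of slurpd's successful connection.
    """
    return slurpd_port - anon_port - 1


def real_attempt_possible(anon_port, slurpd_port, failed_attempts=1):
    """True if enough ports were skipped to account for failed connects.

    Assumption: each connect() that actually reached the network stack
    consumed one ephemeral port, so a single failed attempt needs at
    least one skipped port (a difference of 2 or more).
    """
    return intervening_ports(anon_port, slurpd_port) >= failed_attempts
```

With hypothetical ports 32800 and 32801, intervening_ports() gives 0 and
real_attempt_possible() is False, which is the situation in slave.log. A
difference of 2 or more would leave room for a failed attempt.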
-- Howard Chu
Chief Architect, Symas Corp.       Director, Highland Sun
http://www.symas.com               http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support
> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> h.b.furuseth@usit.uio.no
> Sent: Sunday, January 19, 2003 12:35 PM
> To: openldap-its@OpenLDAP.org
> Subject: test007-replication failure (ITS#2272)
>
>
> Full_Name: Hallvard B. Furuseth
> Version: 2.1.12 and HEAD
> OS: Solaris 2.8 (sparc)
> URL: ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-030119.tgz
> Submission from: (NULL) (129.240.186.42)
>
>
> test007-replication sometimes fails with error code 32 (no
> such object).
> It fails about 6% of the time with BDB and 3% of the time with LDBM.
> To reproduce:
>
> make test # to make the symlinks in tests/
> ^C
> (cd tests; while scripts/test007-replication; do :; done)
>
> I've put the test-db and test-repl directories from a failed run
> (with the HEAD branch) in the URL provided with this report.
> Here is the output:
>
> running defines.sh
> Cleaning up in ./test-db...
> Cleaning up in ./test-repl...
> Starting master slapd on TCP/IP port 9009...
> Starting slave slapd on TCP/IP port 9010...
> Using ldapsearch to check that master slapd is running...
> Using ldapsearch to check that slave slapd is running...
> Starting slurpd...
> Using ldapadd to populate the master directory...
> Waiting 15 seconds for slurpd to send changes...
> Using ldapmodify to modify master directory...
> Waiting 15 seconds for slurpd to send changes...
> Using ldapsearch to read all the entries from the master...
> Using ldapsearch to read all the entries from the slave...
> ldapsearch failed (32)!
> 24813 Killed
>
>
>