Re: (ITS#5439) syncprov race condition seg. fault

rein@basefarm.no wrote:
> Full_Name: Rein Tollevik
> Version: CVS head
> OS: CentOS 4.4
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (81.93.160.250)
>
>
> We have been hit by what looks like a race condition bug in syncprov. We got
> some core dumps, all showing stack frames like the one at the end. As such nasty
> bugs tend to do, it has behaved OK since I restarted slapd with more debug
> output :-( (trace + stats + stats2 + sync).
>
> The configuration is a master server with multiple bdb backend databases, all
> subordinate to the same glue database where syncprov is used. One of the
> backends is a syncrepl consumer from another server; the server is master for
> the other backends. There are multiple consumers for the syncprov suffix, which
> I assume is what causes the race condition to happen.
>
> Note the a=0xBAD argument to attr_find(), which I expect is the result of some
> other thread freeing the attribute list it was called with while it was
> processing it. The rs->sr_entry->e_attrs argument passed to attr_find() as the
> original "a" argument by findpres_cb() looks like a perfectly valid structure,
> as are all the attributes found by following the a_next pointer. The list is
> terminated by an attribute with a NULL a_next value, none of the a_next values
> are 0xBAD.

I don't believe that's the cause. Notice that arg0 (op) in stack frame #9 is
also 0xbad, even though the same pointer is shown correctly in frames #8 and
#10. Something else is going on.

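For reference, the kind of list walk being discussed can be sketched roughly
as below. This is a simplified stand-in, not the code from attr.c: the Attr and
AttrDesc types, the a_desc field and the attr_find_sketch name are assumptions
made for illustration; only a_next is taken from the report. A loop of this
shape only dereferences the head it was given or an a_next value read from the
list, so if the list is intact and NULL-terminated, the walk by itself would
not be expected to arrive at 0xbad.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real slapd types. */
typedef struct AttrDesc {
    const char *name;
} AttrDesc;

typedef struct Attr {
    const AttrDesc *a_desc;   /* assumed field: which attribute this is */
    struct Attr *a_next;      /* next attribute, NULL at end of list */
} Attr;

/* Walk the list from 'a' and return the first node whose descriptor
 * pointer equals 'desc', or NULL if the list ends without a match. */
static Attr *attr_find_sketch(Attr *a, const AttrDesc *desc)
{
    assert(desc != NULL);
    for (; a != NULL; a = a->a_next) {
        if (a->a_desc == desc) {
            return a;
        }
    }
    return NULL;
}
```

In this sketch the only way for the loop variable to hold a bogus pointer is
for the head argument or some a_next field to hold it at the time it is read.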
> I'm currently trying to gather more information related to this bug; any
> pointers as to what I should look for are appreciated. I'm posting this bug
> report now in the hope that the stack frame will enlighten someone with better
> knowledge of the code than I have.

Check for stack overruns, compile without optimization and make sure it's not
a compiler optimization bug, etc.
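
One crude way to look for an overrun by hand is to bracket a suspect buffer
with guard words and check them after the code in question has run. The sketch
below is only an illustration of that idea: struct guarded_buf, GUARD_WORD and
the helper names are hypothetical and do not exist in slapd, and
unchecked_write() merely simulates a buggy writer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu   /* arbitrary, easy-to-spot pattern */

/* A buffer bracketed by guard words (hypothetical layout). */
struct guarded_buf {
    unsigned head;
    char buf[16];
    unsigned tail;
};

/* Set both guard words and clear the buffer. */
static void guarded_init(struct guarded_buf *g)
{
    assert(g != NULL);
    g->head = GUARD_WORD;
    memset(g->buf, 0, sizeof g->buf);
    g->tail = GUARD_WORD;
}

/* Return 1 if both guard words still hold their pattern, else 0. */
static int guarded_intact(const struct guarded_buf *g)
{
    return g->head == GUARD_WORD && g->tail == GUARD_WORD;
}

/* Simulated buggy writer: copies n bytes starting at buf with no
 * bounds check. The copy goes through a char pointer to the whole
 * struct, so n may run past buf as long as it stays inside *g. */
static void unchecked_write(struct guarded_buf *g, const void *src, size_t n)
{
    memcpy((char *)g + offsetof(struct guarded_buf, buf), src, n);
}
```

Calling guarded_intact() after each suspect call narrows down which one
wrote past the end of the buffer.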
>
> Rein Tollevik
> Basefarm AS
>
> #0 0x0807d03a in attr_find (a=0xbad, desc=0x81e8680) at attr.c:665
> #1 0xb7a656f6 in findpres_cb (op=0xaf068ba4, rs=0xaf068b68) at syncprov.c:546
> #2 0x0808416d in slap_response_play (op=0xaf068ba4, rs=0xaf068b68) at result.c:307
> #3 0x0808555b in slap_send_search_entry (op=0xaf068ba4, rs=0xaf068b68) at result.c:770
> #4 0x080f2cdc in bdb_search (op=0xaf068ba4, rs=0xaf068b68) at search.c:870
> #5 0x080db72b in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68, which=op_search, oi=0x8274218, on=0x8274318) at backover.c:653
> #6 0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search) at backover.c:705
> #7 0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at backover.c:727
> #8 0x080d9570 in glue_sub_search (op=0xaf068ba4, rs=0xaf068b68, b0=0xaf068ba4, on=0xaf068ba4) at backglue.c:340
> #9 0x080da131 in glue_op_search (op=0xbad, rs=0xaf068b68) at backglue.c:459
> #10 0x080db6d5 in overlay_op_walk (op=0xaf068ba4, rs=0xaf068b68, which=op_search, oi=0x8271860, on=0x8271a60) at backover.c:643
> #11 0x080dbcaf in over_op_func (op=0xaf068ba4, rs=0xaf068b68, which=op_search) at backover.c:705
> #12 0x080dbdef in over_op_search (op=0xaf068ba4, rs=0xaf068b68) at backover.c:727
> #13 0xb7a65ff4 in syncprov_findcsn (op=0x85c7e60, mode=FIND_PRESENT) at syncprov.c:700
> #14 0xb7a670a0 in syncprov_op_search (op=0x85c7e60, rs=0xaf06a1c0) at syncprov.c:2277
> #15 0x080db6d5 in overlay_op_walk (op=0x85c7e60, rs=0xaf06a1c0, which=op_search, oi=0x8271860, on=0x8271b60) at backover.c:643
> #16 0x080dbcaf in over_op_func (op=0x85c7e60, rs=0xaf06a1c0, which=op_search) at backover.c:705
> #17 0x080dbdef in over_op_search (op=0x85c7e60, rs=0xaf06a1c0) at backover.c:727
> #18 0x08076554 in fe_op_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:368
> #19 0x080770e4 in do_search (op=0x85c7e60, rs=0xaf06a1c0) at search.c:217
> #20 0x08073e28 in connection_operation (ctx=0xaf06a2b8, arg_v=0x85c7e60) at connection.c:1084
> #21 0x08074f14 in connection_read_thread (ctx=0xaf06a2b8, argv=0x59) at connection.c:1211
> #22 0xb7fb5546 in ldap_int_thread_pool_wrapper (xpool=0x81ee240) at tpool.c:663
> #23 0xb7c80371 in start_thread () from /lib/tls/libpthread.so.0
> #24 0xb7c17ffe in clone () from /lib/tls/libc.so.6
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/