Hi Ivan,
On 05/15/2012 10:35 AM, Ivan Maidanski wrote:
> Hi Jan Hans,
>> Jan -
>> I'd like to propose an alternative solution:
>> GC_ASSERT(hhdr -> hb_n_marks == 0);
> hhdr -> hb_n_marks = 1;
>> ->
>> # ifdef THREADS
> hhdr->hb_n_marks++; /* It might be already updated concurrently. */'
Who am I to think you are wrong? Still, I'll try :-) I think this is
wrong because another thread has done the mark between
GC_malloc_generic() released the lock and GC_malloc_uncollectable()
re-acquired it. So, hhdr->hb_n_marks = 1 when we get here. Surely,
a big block always has hhdr->hb_n_marks being either 0 or 1. You
make it 2 if I read this correctly.
My motivation to move this into GC_malloc_generic() was twofold: clearly
GC can come in between and it seems a waste to acquire the lock twice
for a single operation with only a few unlocked instructions in between.
With the proposed patch, my program has run its stress test with 8
threads overnight without problems. I think it would be enough to
simply drop the GC_ASSERT() if my assessment is correct that it is
not possible that the object gets collected by the GC doing its work
between the two locks.
Anyway, I don't care much as long as the crash is gone :-)
Regards --- Jan
> # else
> GC_ASSERT(hhdr -> hb_n_marks == 0);
> hhdr -> hb_n_marks = 1;
> # endif
>> Let me know if I'm wrong.
>> Hans -
> What's your opinion?
>> Regards.
>> Mon, 14 May 2012 15:55:23 +0200 Jan Wielemaker<J.Wielemaker at vu.nl>:
>> Hi,
>>>> Attached patch changes changes my program from crashing within a minute
>> to already running for an hour.
>>>> Patch is relative to git fetched a few hours ago. See below for details
>> on the rationale.
>>>> Regards --- Jan
>>>>>> On 05/14/2012 02:34 PM, Jan Wielemaker wrote:
>>> Hi,
>>>>>> The thing below is still bothering me, but I think I tracked it down to
>>> a wrong GC_ASSERT() in GC_malloc_uncollectable(). For large objects, we
>>> have this code:
>>>>>> op = (ptr_t)GC_generic_malloc((word)lb, UNCOLLECTABLE);
>>> if (0 == op) return(0);
>>>>>> GC_ASSERT(((word)op& (HBLKSIZE - 1)) == 0); /* large block */
>>> hhdr = HDR(op);
>>> /* We don't need the lock here, since we have an undisguised */
>>> /* pointer. We do need to hold the lock while we adjust */
>>> /* mark bits. */
>>> LOCK();
>>> GC_ASSERT(hhdr -> hb_n_marks == 0);
>>>>>> Where the last GC_ASSERT triggers for me. I consistently find that
>>> hb_last_reclaimed is one less than GC_gc_no. I added the same assertion
>>> to GC_generic_malloc() before the GC lock was released. In that case, it
>>> is still the assert above that triggers. This can be explained:
>>> GC_generic_malloc() releases the lock, a GC marks the object and the
>>> assertion in GC_malloc_uncollectable() fails. Right?
>>>>>> Now, I think that the assertion is simply wrong and there is no need
>>> to change anything else to the code, but I would be grateful if someone
>>> with in-depth knowledge can confirm this.
>>>>>> Cheers --- Jan
>>>>>> P.s. The crash happens consistently(?) while the block is allocated
>>> from the final for-loop in GC_allochblk() (i.e., allocating
>>> from a bigger free list and splitting. This might suggest
>>> there is more to it that just removing the GC_ASSERT().
>>>>>> On 05/02/2012 11:36 AM, Jan Wielemaker wrote:
>>>> Hi,
>>>>>>>> I'm experiencing assertion failures with GC_malloc_uncollectable() on a
>>>> big block (usually a few 100 Kb): GC_ASSERT(hhdr -> hb_n_marks == 0).
>>>>>>>> Now, I have to admit this is on a modified version (as discussed on this
>>>> list). The version is based on a1467f2140f9db3408f6214137c7dc31f10d0bd9
>>>> (Jan 16 git version). Using the unmodified version would require many
>>>> changes to the program :-(
>>>>>>>> This is merely a quick investigation to see whether this rings a bell
>>>> with someone. Platform is X64 Linux (Ubuntu). GC Config flags:
>>>> '--enable-threads=pthreads' '--enable-parallel-mark'
>>>> '--enable-large-config' '--enable-gc-assertions'
>>>>>>>> hhdr is this:
>>>>>>>> (gdb) p *hhdr
>>>> $1 = {hb_next = 0x0, hb_prev = 0x0, hb_block = 0x23ba000,
>>>> hb_obj_kind = 2 '\002', hb_flags = 0 '\000', hb_last_reclaimed = 7,
>>>> hb_sz = 196624, hb_descr = 196624, hb_large_block = 1 '\001',
>>>> hb_map = 0x638f60, hb_n_marks = 1, hb_n_uncollectable = 0,
>>>> _mark_byte_union = {_hb_marks = "\001", '\000'<repeats 255 times>,
>>>> "\001",
>>>> dummy = 1}}
>>>>>>>> Only one thread is doing work at the moment of the crash; several others
>>>> block waiting on the allocation lock. As I read it, hhdr suggests that
>>>> the block is marked and in use, so it shouldn't be handed out by a new
>>>> allocation call, right?
>>>>>>>> The program does lots of complicated stuff using the garbage collector,
>>>> but the crash is always in the big GC_malloc_uncollectable() called from
>>>> the same context. These allocations manage a big buffer that is resized
>>>> (doubled) using a new GC_malloc_uncollectable(), copy the data and call
>>>> GC_free() on the old data. Several threads play the same game, on their
>>>> own private datastructures. Of course, this means that an old buffer
>>>> released using GC_free() in one thread is typically a good candidate for
>>>> another thread. The crash is not very frequent: I need to make the
>>>> program go through the critical process several hundred times to get
>>>> a crash.
>>>>>>>> Thanks for any hint
>>>>>>>> --- Jan
>>>> _______________________________________________
>>>> Gc mailing list
>>>>Gc at linux.hpl.hp.com>>>>http://www.hpl.hp.com/hosted/linux/mail-archives/gc/>>>>>> _______________________________________________
>>> Gc mailing list
>>>Gc at linux.hpl.hp.com>>>http://www.hpl.hp.com/hosted/linux/mail-archives/gc/>>>>>> _______________________________________________
>> Gc mailing list
>>Gc at linux.hpl.hp.com>>http://www.hpl.hp.com/hosted/linux/mail-archives/gc/