Leif Hedstrom
added a comment - 25/Apr/12 18:11
Any chance you can test either 3.0.4 or, better, trunk? I don't know whether this has been fixed, but I know there have been a number of fixes to the freelist code since 3.0.

Brian Geffon
added a comment - 26/Apr/12 08:48 - edited
I have a possible theory for the crash, but I think jplevyak might be the only one who could say for sure. In 3.0.x, ink_freelist_new follows the standard pattern of a load followed by a CAS:
    head_p item, next;   // versioned freelist head; .data is the packed 64-bit value
    int result = 0;
    do {
      INK_QUEUE_LD64(item, f->head);
      if (TO_PTR(FREELIST_POINTER(item)) == NULL) {
        ...
      } else {
        // crash happens on the following line
        SET_FREELIST_POINTER_VERSION(next, *ADDRESS_OF_NEXT(TO_PTR(FREELIST_POINTER(item)), f->offset),
                                     FREELIST_VERSION(item) + 1);
        result = ink_atomic_cas64((int64_t *) &f->head.data, item.data, next.data);
      }
    } while (result == 0);
I believe this might be a race: if the head is freed between the load into item and the SET_FREELIST_POINTER_VERSION call, we would be dereferencing item's pointer, which is no longer valid once it has been freed as part of a dequeue operation. Does anyone who understands the freelists think this might be what's happening?
Also, to support this idea, I see the following in the crash:
(gdb) p f->head
$17 =
{data = -6044627300393456023}
(gdb) p item
$18 =
{data = -6053071549694775703}
(gdb)
This means item != f->head, i.e. the head has changed since we first read it. If it changed as the result of a dequeue, the memory might have been freed, which would explain the crash.