Comments

Eric Dumazet wrote:
> Peter Zijlstra wrote:
>> So by not using some memory barriers (would be nice to have it
>> illustrated which ones), we can race and end up on the wrong chain; in
>> case that happens we detect this by using this per-chain terminator and
>> try again.
>>
>> It would be really good to have it explained in the rculist_nulls.h
>> comments what memory barriers are missing, what races they open, and how
>> this special terminator trick closes that race.
>
> OK, maybe I should add a Documentation/RCU/rculist_nulls.txt file with
> appropriate examples and documentation.
>
> (Say, the lookup/insert algorithms with standard hlist and memory barriers,
> and with hlist_nulls without those two memory barriers.)

[PATCH 4/3] rcu: documents rculist_nulls

Adds a Documentation/RCU/rculist_nulls.txt file to describe how the 'nulls'
end-of-list marker can help in some RCU algorithms.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
Documentation/RCU/rculist_nulls.txt | 167 ++++++++++++++++++++++++++
1 files changed, 167 insertions(+)

On Thu, 2008-11-13 at 17:02 +0100, Eric Dumazet wrote:
> Eric Dumazet wrote:
> > Peter Zijlstra wrote:
> > [...]
>
> [PATCH 4/3] rcu: documents rculist_nulls
>
> Adds a Documentation/RCU/rculist_nulls.txt file to describe how the 'nulls'
> end-of-list marker can help in some RCU algorithms.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 14 Nov 2008 16:16:38 +0100
> On Thu, 2008-11-13 at 17:02 +0100, Eric Dumazet wrote:
> > Eric Dumazet wrote:
> > > Peter Zijlstra wrote:
> > > [...]
> >
> > [PATCH 4/3] rcu: documents rculist_nulls
> >
> > Adds a Documentation/RCU/rculist_nulls.txt file to describe how the 'nulls'
> > end-of-list marker can help in some RCU algorithms.
> >
> > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Since there seems to be consensus for these changes, I'm going to merge this
stuff into net-next-2.6 so that I can add in the users that Eric has written.
Thanks everyone for the review and feedback.

On Thu, Nov 13, 2008 at 05:02:48PM +0100, Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Peter Zijlstra wrote:
>>> [...]
>>
>> OK, maybe I should add a Documentation/RCU/rculist_nulls.txt file with
>> appropriate examples and documentation.
>>
>> (Say, the lookup/insert algorithms with standard hlist and memory barriers,
>> and with hlist_nulls without those two memory barriers.)
>
> [PATCH 4/3] rcu: documents rculist_nulls
Very good -- only one small suggestion below.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Adds a Documentation/RCU/rculist_nulls.txt file to describe how the 'nulls'
> end-of-list marker can help in some RCU algorithms.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
>  Documentation/RCU/rculist_nulls.txt |  167 ++++++++++++++++++++++++++
>  1 files changed, 167 insertions(+)

> diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
> new file mode 100644
> index 0000000..5db5549
> --- /dev/null
> +++ b/Documentation/RCU/rculist_nulls.txt
> @@ -0,0 +1,167 @@
> [...]
> +1) Lookup algo
> +--------------
> +rcu_read_lock()
> +begin:
> +obj = lockless_lookup(key);
> +if (obj) {
> +  if (!try_get_ref(obj)) // might fail for free objects
> +    goto begin;
> +  /*
> +   * Because a writer could delete the object, and another writer could
> +   * reuse it before the RCU grace period ends, we must check the key
> +   * after getting a reference on the object.
> +   */
> +  if (obj->key != key) { // not the object we expected
In some cases, a "generation number" will be needed. For example, there
are algorithms that must detect when an object has been removed and then
re-inserted with the same key. One increments the generation number
on each free and sometimes also on each allocation.
> +    put_ref(obj);
> +    goto begin;
> +  }
> +}
> +rcu_read_unlock();
> [...]
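
A minimal sketch of the generation-number idea suggested above, in the same
pseudo-code style as the patch. The 'generation' field and its handling are
illustrative assumptions (not part of the patch or an existing kernel API),
and memory-ordering details are elided:

struct obj {
  struct hlist_node obj_node;
  atomic_t refcnt;
  unsigned int key;
  unsigned int generation; /* hypothetical: incremented on each free */
};

rcu_read_lock();
begin:
obj = lockless_lookup(key);
if (obj) {
  unsigned int gen = obj->generation; /* sample before taking the reference */

  if (!try_get_ref(obj)) // might fail for free objects
    goto begin;
  /*
   * If the object was freed and re-inserted with the same key, its
   * generation changed: reject it even though the key matches.
   */
  if (obj->key != key || obj->generation != gen) {
    put_ref(obj);
    goto begin;
  }
}
rcu_read_unlock();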

Patch

diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
new file mode 100644
index 0000000..5db5549
--- /dev/null
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -0,0 +1,167 @@
+Using hlist_nulls to protect read-mostly linked lists and
+objects using SLAB_DESTROY_BY_RCU allocations.
+
+Please read the basics in Documentation/RCU/listRCU.txt
+
+Using special markers (called 'nulls') is a convenient way
+to solve the following problem:
+
+A typical RCU linked list managing objects which are
+allocated with a SLAB_DESTROY_BY_RCU kmem_cache can
+use the following algos:
+
+1) Lookup algo
+--------------
+rcu_read_lock()
+begin:
+obj = lockless_lookup(key);
+if (obj) {
+  if (!try_get_ref(obj)) // might fail for free objects
+    goto begin;
+  /*
+   * Because a writer could delete the object, and another writer could
+   * reuse it before the RCU grace period ends, we must check the key
+   * after getting a reference on the object.
+   */
+  if (obj->key != key) { // not the object we expected
+    put_ref(obj);
+    goto begin;
+  }
+}
+rcu_read_unlock();
+
+Beware that lockless_lookup(key) cannot use the traditional
+hlist_for_each_entry_rcu(), but needs a version with an additional
+memory barrier (smp_rmb()):
+
+lockless_lookup(key)
+{
+  struct hlist_node *node, *next;
+  for (pos = rcu_dereference((head)->first);
+       pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
+       ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+       pos = rcu_dereference(next))
+    if (obj->key == key)
+      return obj;
+  return NULL;
+}
+
+And note that the traditional hlist_for_each_entry_rcu() misses this
+smp_rmb():
+
+  struct hlist_node *node;
+  for (pos = rcu_dereference((head)->first);
+       pos && ({ prefetch(pos->next); 1; }) &&
+       ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+       pos = rcu_dereference(pos->next))
+    if (obj->key == key)
+      return obj;
+  return NULL;
+
+Quoting Corey Minyard:
+
+"If the object is moved from one list to another list in-between the
+ time the hash is calculated and the next field is accessed, and the
+ object has moved to the end of a new list, the traversal will not
+ complete properly on the list it should have, since the object will
+ be on the end of the new list and there's not a way to tell it's on a
+ new list and restart the list traversal. I think that this can be
+ solved by pre-fetching the "next" field (with proper barriers) before
+ checking the key."
+
+2) Insert algo
+--------------
+
+We need to make sure a reader cannot read the new 'obj->obj_next' value
+together with the previous value of 'obj->key'. Otherwise, an item could
+be deleted from a chain and inserted into another chain. If the new chain
+was empty before the move, the 'next' pointer is NULL, and a lockless
+reader cannot detect that it missed the following items in the original
+chain.
+
+/*
+ * Please note that new inserts are done at the head of the list,
+ * not in the middle or at the end.
+ */
+obj = kmem_cache_alloc(...);
+lock_chain(); // typically a spin_lock()
+obj->key = key;
+atomic_inc(&obj->refcnt);
+/*
+ * we need to make sure obj->key is updated before obj->next
+ */
+smp_wmb();
+hlist_add_head_rcu(&obj->obj_node, list);
+unlock_chain(); // typically a spin_unlock()
+
+
+3) Remove algo
+--------------
+Nothing special here, we can use a standard RCU hlist deletion.
+But thanks to SLAB_DESTROY_BY_RCU, beware: a deleted object can be
+reused very quickly (before the end of the RCU grace period).
+
+if (put_last_reference_on(obj)) {
+  lock_chain(); // typically a spin_lock()
+  hlist_del_init_rcu(&obj->obj_node);
+  unlock_chain(); // typically a spin_unlock()
+  kmem_cache_free(cachep, obj);
+}
+
+
+--------------------------------------------------------------------------
+With hlist_nulls we can avoid the extra smp_rmb() in lockless_lookup()
+and the extra smp_wmb() in the insert function.
+
+For example, if we choose to store the slot number as the 'nulls'
+end-of-list marker for each slot of the hash table, we can detect
+a race (some writer did a delete and/or a move of an object
+to another chain) by checking the final 'nulls' value when
+the lookup meets the end of a chain. If the final 'nulls' value
+is not the slot number, then we must restart the lookup at
+the beginning. If the object was moved to the same chain,
+then the reader doesn't care: it might eventually
+scan the list again without harm.
+
+
+1) Lookup algo
+--------------
+
+head = &table[slot];
+rcu_read_lock();
+begin:
+hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
+  if (obj->key == key) {
+    if (!try_get_ref(obj)) // might fail for free objects
+      goto begin;
+    if (obj->key != key) { // not the object we expected
+      put_ref(obj);
+      goto begin;
+    }
+    goto out;
+  }
+}
+/*
+ * if the nulls value we got at the end of this lookup is
+ * not the expected one, we must restart the lookup.
+ * We probably met an item that was moved to another chain.
+ */
+if (get_nulls_value(node) != slot)
+  goto begin;
+obj = NULL;
+
+out:
+rcu_read_unlock();
+
+2) Insert function
+------------------
+
+/*
+ * Please note that new inserts are done at the head of the list,
+ * not in the middle or at the end.
+ */
+obj = kmem_cache_alloc(cachep);
+lock_chain(); // typically a spin_lock()
+obj->key = key;
+atomic_set(&obj->refcnt, 1);
+/*
+ * insert obj in an RCU way (readers might be traversing the chain)
+ */
+hlist_nulls_add_head_rcu(&obj->obj_node, list);
+unlock_chain(); // typically a spin_unlock()
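
For reference, the 'nulls' marker is encoded in the chain's terminating
'pointer' itself: bit 0 is set to distinguish it from a real pointer, and
the remaining bits carry the value, which is what get_nulls_value()
extracts (see include/linux/list_nulls.h). Below is a brief sketch of
initializing a hash table so that each slot's chain ends with its own slot
number as the 'nulls' value; the table name and size are illustrative:

#include <linux/list_nulls.h>

#define MY_HASH_SIZE 16 /* illustrative */

static struct hlist_nulls_head my_table[MY_HASH_SIZE];

static void my_table_init(void)
{
  unsigned int slot;

  /*
   * Each empty chain is terminated by the value (slot << 1) | 1:
   * is_a_nulls() tests the low bit, get_nulls_value() recovers 'slot',
   * so the lookup can verify it reached the end of the right chain.
   */
  for (slot = 0; slot < MY_HASH_SIZE; slot++)
    INIT_HLIST_NULLS_HEAD(&my_table[slot], slot);
}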