Because the pre_destroy() handler has been moved outside cgroup_lock() to avoid a deadlock, cgroup's rmdir() now performs the following sequence.

	cgroup_lock()
	check children and tasks.
	(A)
	cgroup_unlock()
	(B)
	pre_destroy() for subsys;-----(1)
	(C)
	cgroup_lock();
	(D)
	Second check: check for -EBUSY again because we released the lock.
	(E)
	mark cgroup as removed.
	(F)
	unlink from lists.
	cgroup_unlock();
	dput()
	=> when dentry's refcnt goes down to 0
		destroy() handlers for subsys

memcg marks itself as "obsolete" when pre_destroy() is called at (1).
But rmdir() can fail after pre_destroy(), so marking "obsolete" there is a bug.
I'd like to fix the sanity of pre_destroy() in the cgroup layer.
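
To make the failure mode concrete, here is a minimal userspace sketch of the sequence above. It is not kernel code: struct toy_cgroup, toy_pre_destroy(), toy_rmdir() and the attach_in_window parameter are all hypothetical names invented for illustration, and locking is only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or functions are the kernel's. */
struct toy_cgroup {
	int  nr_tasks;        /* tasks attached to the group */
	bool removed;         /* set at step (E) */
	bool memcg_obsolete;  /* set by the subsystem's pre_destroy() */
};

/* Stand-in for memcg's pre_destroy(): marks the group obsolete at (1). */
static void toy_pre_destroy(struct toy_cgroup *cg)
{
	cg->memcg_obsolete = true;
}

/*
 * Stand-in for rmdir(). attach_in_window simulates a task being attached
 * between (B) and (C), while the lock is dropped.
 */
static int toy_rmdir(struct toy_cgroup *cg, bool attach_in_window)
{
	assert(cg != NULL);

	/* cgroup_lock(); first check */
	if (cg->nr_tasks > 0)
		return -EBUSY;
	/* (A) cgroup_unlock(); (B) */
	toy_pre_destroy(cg);           /* (1) */
	if (attach_in_window)
		cg->nr_tasks++;        /* race: the lock is not held here */
	/* (C) cgroup_lock(); (D) second check */
	if (cg->nr_tasks > 0)
		return -EBUSY;         /* rmdir() fails after pre_destroy() */
	cg->removed = true;            /* (E) */
	return 0;
}
```

In this model, calling toy_rmdir() with attach_in_window set returns -EBUSY and leaves the group alive, yet memcg_obsolete stays true. That is the inconsistency described above.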

Considering the above sequence, new tasks can be added between (B) and (C), and
a swap-in record can be charged back to a cgroup after pre_destroy(), at (C), (D) and (E).
(This means the cgroup's refcnt comes not from tasks but from other persistent objects.)

This patch adds a "cgroup_is_being_removed()" check (a better name is welcome).
After this,

	- The cgroup is marked as CGRP_PRE_REMOVAL at (A).
	- If the second check fails, the CGRP_PRE_REMOVAL flag is removed.
	- memcg's own obsolete flag is removed.
	- While CGRP_PRE_REMOVAL is set, task attach will fail with -EBUSY.
	  (Task attach via clone() will not hit this case.)
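
The rules above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the patch itself: only CGRP_PRE_REMOVAL, CGRP_REMOVED and cgroup_is_being_removed() are names from this description. The struct, the model_* functions, the bit values and the extra_refs counter (standing in for references held by persistent objects such as swap records) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values are assumptions made for this model. */
enum { CGRP_PRE_REMOVAL = 1, CGRP_REMOVED = 2 };

struct model_cgroup {
	unsigned int flags;
	int nr_tasks;    /* attached tasks */
	int extra_refs;  /* hypothetical: refs from persistent objects */
};

static bool cgroup_is_being_removed(const struct model_cgroup *cg)
{
	return (cg->flags & CGRP_PRE_REMOVAL) != 0;
}

/* Task attach is refused with -EBUSY while removal is in progress. */
static int model_attach_task(struct model_cgroup *cg)
{
	if (cgroup_is_being_removed(cg))
		return -EBUSY;
	cg->nr_tasks++;
	return 0;
}

/* First check; on success the group is marked at (A). */
static int model_rmdir_begin(struct model_cgroup *cg)
{
	if (cg->nr_tasks > 0)
		return -EBUSY;
	cg->flags |= CGRP_PRE_REMOVAL;
	return 0;
}

/* Second check at (D); on failure the mark is rolled back. */
static int model_rmdir_finish(struct model_cgroup *cg)
{
	assert(cgroup_is_being_removed(cg));
	if (cg->nr_tasks > 0 || cg->extra_refs > 0) {
		cg->flags &= ~(unsigned int)CGRP_PRE_REMOVAL;
		return -EBUSY;
	}
	cg->flags |= CGRP_REMOVED;  /* (E) */
	return 0;
}
```

In this model pre_destroy() would run between model_rmdir_begin() and model_rmdir_finish(). An attach attempted in that window gets -EBUSY, and if the second check fails the flag is cleared so that attach works again.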

With this, we can trust pre_destroy()'s result.

Note: if CGRP_REMOVED can be set and cleared, it should be used instead of CGRP_PRE_REMOVAL.