When accounting file events per memory cgroup, we need to find the memory cgroup via page_cgroup->mem_cgroup. Currently, we use lock_page_cgroup() to guarantee that pc->mem_cgroup is not overwritten while we make use of it.

But, considering the contexts in which the page_cgroup for file pages is accessed, we can use a lighter-weight form of mutual exclusion in most cases.

When handling file caches, the only race we have to take care of is account "moving", IOW, overwriting of page_cgroup->mem_cgroup. (See the comment in the patch.)

Unlike charge/uncharge, "move" does not happen very frequently. It happens only on rmdir() and on task moving (with special settings). This patch adds a race checker for file-cache-status accounting vs. account moving. A new per-cpu, per-memcg counter, MEM_CGROUP_ON_MOVE, is added. The account-move routine:

  1. increments it before starting the move,
  2. calls synchronize_rcu(),
  3. decrements it after the move finishes.

By checking this counter, the file-status-counting routine can decide whether it needs to call lock_page_cgroup(). In most cases, it doesn't need to.
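The move-side protocol above can be sketched as follows. This is a minimal user-space model, not the kernel code: C11 atomics stand in for the per-cpu counter operations, and synchronize_rcu() is modeled as a stub; all names with a _sketch suffix are illustrative.

```c
#include <stdatomic.h>

/* Illustrative stand-in for the per-cpu MEM_CGROUP_ON_MOVE counter
 * (a single atomic here; the kernel uses per-cpu counters summed up). */
static atomic_int on_move_counter;

static void synchronize_rcu_sketch(void)
{
        /* In the kernel this waits until all pre-existing RCU read-side
         * critical sections have finished; modeled as a no-op here. */
}

/* Step 1: raise the counter before starting the move. */
static void mem_cgroup_start_move_sketch(void)
{
        atomic_fetch_add(&on_move_counter, 1);
        /* Step 2: wait so that every CPU observes the raised counter
         * before any pc->mem_cgroup is overwritten. */
        synchronize_rcu_sketch();
}

/* Step 3: drop the counter once the move is complete. */
static void mem_cgroup_end_move_sketch(void)
{
        atomic_fetch_sub(&on_move_counter, 1);
}

/* Reader side: fall back to the heavy lock only while a move may run. */
static int mem_cgroup_under_move_sketch(void)
{
        return atomic_load(&on_move_counter) > 0;
}
```

The synchronize_rcu() step is what makes the lockless fast path safe: once it returns, no reader can still be running with a stale (pre-move) view of the counter.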

The following is perf data of a process which mmap()/munmap()s 32MB of file cache in a minute.

 static bool mem_cgroup_under_move(struct mem_cgroup *mem)
 {
@@ -1462,35 +1502,62 @@ bool mem_cgroup_handle_oom(struct mem_cg
 /*
  * Currently used to update mapped file statistics, but the routine can be
  * generalized to update other statistics as well.
+ *
+ * Notes: Race condition
+ *
+ * We usually use page_cgroup_lock() for accessing page_cgroup member but
+ * it tends to be costly. But considering some conditions, we doesn't need
+ * to do so _always_.
+ *
+ * Considering "charge", lock_page_cgroup() is not required because all
+ * file-stat operations happen after a page is attached to radix-tree. There
+ * are no race with "charge".
+ *
+ * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup
+ * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even
+ * if there are race with "uncharge". Statistics itself is properly handled
+ * by flags.
+ *
+ * Considering "move", this is an only case we see a race. To make the race
+ * small, we check MEM_CGROUP_ON_MOVE percpu value and detect there are
+ * possibility of race condition. If there is, we take a lock.
+ */
 void mem_cgroup_update_file_mapped(struct page *page, int val)
 {
 	struct mem_cgroup *mem;
-	struct page_cgroup *pc;
+	struct page_cgroup *pc = lookup_page_cgroup(page);
+	bool need_unlock = false;
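For context, the reader fast path this hunk sets up can be sketched roughly as below. This is a simplified user-space model under stated assumptions: the structs are cut-down stand-ins, mem_cgroup_stealed() abstracts the MEM_CGROUP_ON_MOVE percpu check, and the helper bodies are illustrative, not the kernel implementations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures carry far more state. */
struct mem_cgroup {
        int moving_account;   /* stand-in for the MEM_CGROUP_ON_MOVE percpu sum */
        int file_mapped;      /* stand-in for the FILE_MAPPED statistic */
};

struct page_cgroup {
        struct mem_cgroup *mem_cgroup;
        int locked;           /* stand-in for the PCG_LOCK bit */
};

static void lock_page_cgroup(struct page_cgroup *pc)   { pc->locked = 1; }
static void unlock_page_cgroup(struct page_cgroup *pc) { pc->locked = 0; }

/* True while some task may be moving accounts for this memcg. */
static bool mem_cgroup_stealed(struct mem_cgroup *mem)
{
        return mem->moving_account > 0;
}

/* Fast path: update the stat locklessly; fall back to lock_page_cgroup()
 * only when a concurrent move could rewrite pc->mem_cgroup. */
void update_file_mapped_sketch(struct page_cgroup *pc, int val)
{
        struct mem_cgroup *mem = pc->mem_cgroup;
        bool need_unlock = false;

        if (mem == NULL)
                return;
        if (mem_cgroup_stealed(mem)) {
                /* Slow path: pin pc->mem_cgroup and re-read it. */
                need_unlock = true;
                lock_page_cgroup(pc);
                mem = pc->mem_cgroup;
                if (mem == NULL)
                        goto out;
        }
        mem->file_mapped += val;
out:
        if (need_unlock)
                unlock_page_cgroup(pc);
}
```

The design point is that the common case (no move in flight) touches only a percpu counter read before updating the statistic, and the lock is taken only during the rare move window.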