Commit Message

jffs2_write_begin() first acquires the page lock, then f->sem. This causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first acquires f->sem, then the page lock:
jffs2_garbage_collect_live
mutex_lock(&f->sem) (A)
jffs2_garbage_collect_dnode
jffs2_gc_fetch_page
read_cache_page_async
do_read_cache_page
lock_page(page) (B)
jffs2_write_begin
grab_cache_page_write_begin
find_lock_page
lock_page(page) (B)
mutex_lock(&f->sem) (A)
We fix this by restructuring jffs2_write_begin() to take f->sem before the page lock. However, we make sure that f->sem is not held when calling jffs2_reserve_space(), as this is not permitted by the locking rules.
The deadlock above was observed multiple times on an SoC with a dual ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred when using scp to copy files from a host system to the ARM target system. The fix was heavily tested on the same target system.
If the patch is accepted, please get it also pushed to 3.4; it applies cleanly both to linux-mtd.git and the current linux-3.4 tree.
Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>

Comments

>> jffs2_write_begin() first acquires the page lock, then f->sem. This causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first acquires f->sem, then the page lock:>> jffs2_garbage_collect_live> mutex_lock(&f->sem) (A)> jffs2_garbage_collect_dnode> jffs2_gc_fetch_page> read_cache_page_async> do_read_cache_page> lock_page(page) (B)>> jffs2_write_begin> grab_cache_page_write_begin> find_lock_page> lock_page(page) (B)> mutex_lock(&f->sem) (A)>> We fix this by restructuring jffs2_write_begin() to take f->sem before the page lock. However, we make sure that f->sem is not held when calling jffs2_reserve_space(), as this is not permitted by the locking rules.>> The deadlock above was observed multiple times on an SoC with a dual ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred when using scp to copy files from a host system to the ARM target system. The fix was heavily tested on the same target system.>> If the patch is accepted, please get it also pushed to 3.4; it applies cleanly both to linux-mtd.git and the current linux-3.4 tree.>> Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>> Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
David, are you happy with this?
Jocke

On Tue, 2012-10-09 at 21:11 +0200, Thomas Betker wrote:
> jffs2_write_begin() first acquires the page lock, then f->sem. This causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first acquires f->sem, then the page lock:
This patch does not apply to l2-mtd.git, which is based on 3.7-rc1.
$:~/git/l2-mtd$ git apply --check ~/tmp/thomas.mbox
error: patch failed: fs/jffs2/file.c:138
error: fs/jffs2/file.c: patch does not apply
Could you please wrap commit message lines to 80 characters when you
re-send?
Also, would you please add Cc: stable@vger.kernel.org [vX.Y+], where
vX.Y is the starting version for which this patch is relevant, e.g.,
v3.0.
Thanks!

Hello Artem,
I have mailed theupdated patch to the list as "[PATCH v3] ...".
> This patch does not apply to l2-mtd.git, which is based on 3.7-rc1.>> $:~/git/l2-mtd$ git apply --check ~/tmp/thomas.mbox> error: patch failed: fs/jffs2/file.c:138> error: fs/jffs2/file.c: patch does not apply
Fixed, using another mailer ...
> Could you please wrap commit message lines to 80 characters when you> re-send?
Fixed -- sorry, this should not have happened.
> Also, would you please add Cc: stable@vger.kernel.org [vX.Y+], where> vX.Y is the starting version for which this patch is relevant, e.g.,> v3.0.
Fixed, using [v3.4+]. (The patch may also be relevant for earlier kernel versions, but I don't have the means to test them.)
Thanks,
Thomas