I ran some fuzz tests on a vfat filesystem on 3.17 and found afilesystem that differs from a pristine filesystem by one bit andcauses a kernel BUG. This seems to be an old bug; I can also replicateit on a 3.3.4 kernel I happened to have lying around.

The issue here is that this directory entry is faulty.Usually each entry has a "." entry which points to itself. In your imagethe ->name is different. Hence, there is no "." but a directory named". `"which points to itself.IOW a directory loop.

The vFAT file system misses to check whether there is really a "." and".." directory.These directories are the only ones which are allowed to be hard links.And it misses to detect loops which triggers the BUG_ON() in the VFS core.For the latter issue I'm not unsure where to fix this.Al?

--Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Do you know how this reproduce? testimg.vfat was empty, andtestimg.vfat.24.min was corrupted already.

We would need the way how make corrupted image like testimg.vfat.24.min,to find the cause of this problem. Base image for reproducing this bug,and way to do are very helpful.

Thanks.

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by Sami LiedesI ran some fuzz tests on a vfat filesystem on 3.17 and found afilesystem that differs from a pristine filesystem by one bit andcauses a kernel BUG. This seems to be an old bug; I can also replicateit on a 3.3.4 kernel I happened to have lying around.mount $TARGET_DEV /mnt -t vfatcd /mnttimeout 30 cp -r doc doc2 >&/dev/nulltimeout 30 find -xdev >&/dev/nulltimeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- >&/dev/nulltimeout 30 mkdir tmp >&/dev/nulltimeout 30 echo whoah >tmp/filu >&/dev/nulltimeout 30 rm -rf /mnt/* >&/dev/nullcd /umount /mntThe backtrace seems to indicate that the BUG happens at the rm phase.You can get the pristine filesystem fromhttp://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.bz2The broken filesystem is athttp://www.niksula.hut.fi/~sliedes/vfat/testimg.vfat.24.min.bz2

Hm, "." entry in directly seems to be corrupted. Maybe, possible causesare, memory error, memory corruption, fat bug, other bug?Well, this is 1 bit flip. Can you check memory by memtest or such?Do you know how this reproduce? testimg.vfat was empty, andtestimg.vfat.24.min was corrupted already.We would need the way how make corrupted image like testimg.vfat.24.min,to find the cause of this problem. Base image for reproducing this bug,and way to do are very helpful.

You misunderstood Sami's issue. He corrupted the vfat fs intentionallyto find issuesin the vfat driver.And as he reports he found an nasty issue.Any user can trigger a BUG_ON() using a crafted vfat image.Please note, if you mount exactly the same image using msdos fs the issuedoes not occur.

--Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by Richard WeinbergerYou misunderstood Sami's issue. He corrupted the vfat fs intentionallyto find issuesin the vfat driver.And as he reports he found an nasty issue.Any user can trigger a BUG_ON() using a crafted vfat image.Please note, if you mount exactly the same image using msdos fs the issuedoes not occur.

Yeah, you can think of it as either a security issue if you wish, orjust as a plain old robustness issue in the age of unreliable USBsticks etc. in that it just would be more ideal to fail gracefullyinstead of crashing.

Anyway, I'm not advocating for any measure of severity (I leave thatto others); I just find and report these in the hope that someonefinds the reports useful. I personally view these more as robustnessthan security bugs at the moment, but certainly they can be seen aseither.

And if some of these get fixed, I will rerun the tests, so I mightproduce a daunting stream of reports. I know it would be nicer toreport everything at once, but usually the issues found first areprevalent enough to mask other issues.

By the way, I find it interesting that once I implemented a tool tominimize the differences between a bad fs and a good fs, every bug Ihave found in any filesystem implementation has minimized to a singlebit flip. That suggests to me that fuzz testing is probably not veryeffective in finding bugs that require more than that.

Post by OGAWA HirofumiWe would need the way how make corrupted image like testimg.vfat.24.min,to find the cause of this problem. Base image for reproducing this bug,and way to do are very helpful.

You misunderstood Sami's issue. He corrupted the vfat fs intentionallyto find issuesin the vfat driver.And as he reports he found an nasty issue.Any user can trigger a BUG_ON() using a crafted vfat image.Please note, if you mount exactly the same image using msdos fs the issuedoes not occur.

Ah.

BTW, msdos doesn't allow ".*" as filename, so not trigger this. But rootcause of this is same as double linked dir, "." should notmatter. I.e. this issue would be able to reproduce on all FSes if madecorrupted image intentionally.

If we want to fix intentional corruption like this seriously, I guess wewould need something like online-fsck to detect like double link ofdir. If we want to avoid only Oops, it might be enough to removeBUG_ON().

I'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiWe would need the way how make corrupted image like testimg.vfat.24.min,to find the cause of this problem. Base image for reproducing this bug,and way to do are very helpful.

You misunderstood Sami's issue. He corrupted the vfat fs intentionallyto find issuesin the vfat driver.And as he reports he found an nasty issue.Any user can trigger a BUG_ON() using a crafted vfat image.Please note, if you mount exactly the same image using msdos fs the issuedoes not occur.

Ah.BTW, msdos doesn't allow ".*" as filename, so not trigger this. But rootcause of this is same as double linked dir, "." should notmatter. I.e. this issue would be able to reproduce on all FSes if madecorrupted image intentionally.If we want to fix intentional corruption like this seriously, I guess wewould need something like online-fsck to detect like double link ofdir. If we want to avoid only Oops, it might be enough to removeBUG_ON().I'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

This limitation is not true anymore. Plug in a USB stick into a recent Linux desktop,it will automatically mount it...Also think of user namespaces and FUSE.

Post by OGAWA HirofumiI'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

This limitation is not true anymore. Plug in a USB stick into a recentLinux desktop, it will automatically mount it... Also think of usernamespaces and FUSE.

Not really (well, true, some sort though). It is still controlled by rootor capability, right? I.e. still controlled by admin of system.

I read user namespaces last time, it doesn't allow to mount the blockdevice by namespace's root.

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiI'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

This limitation is not true anymore. Plug in a USB stick into a recentLinux desktop, it will automatically mount it... Also think of usernamespaces and FUSE.

Not really (well, true, some sort though). It is still controlled by rootor capability, right? I.e. still controlled by admin of system.

Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)

Post by OGAWA HirofumiI read user namespaces last time, it doesn't allow to mount the blockdevice by namespace's root.FUSE is allowed to mount by true user (I.e. admin can't disallow it)? Istill didn't check it fully.

The question is how long these limits will stay...User namespaces uncovered already a pile of issues wrt. to mounting.

Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiI'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

This limitation is not true anymore. Plug in a USB stick into a recentLinux desktop, it will automatically mount it... Also think of usernamespaces and FUSE.

Not really (well, true, some sort though). It is still controlled by rootor capability, right? I.e. still controlled by admin of system.

Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)

Post by OGAWA HirofumiI read user namespaces last time, it doesn't allow to mount the blockdevice by namespace's root.FUSE is allowed to mount by true user (I.e. admin can't disallow it)? Istill didn't check it fully.

The question is how long these limits will stay...User namespaces uncovered already a pile of issues wrt. to mounting.

Well, anyway, I don't object like that simple patch.

My worry is, I feel we need something like online-fsck finally if wetackled fully to avoid issues (I still didn't analyze about this issueseriously and fully), and measurable overheads.

And I myself have interest to online/runtime-fsck (i.e. detect and fix)though, I don't have interest to make it generic operations, and I wouldnot have interest to tackle for all FSes...

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by Richard WeinbergerThe question is how long these limits will stay...User namespaces uncovered already a pile of issues wrt. to mounting.

Well, anyway, I don't object like that simple patch.My worry is, I feel we need something like online-fsck finally if wetackled fully to avoid issues (I still didn't analyze about this issueseriously and fully), and measurable overheads.And I myself have interest to online/runtime-fsck (i.e. detect and fix)though, I don't have interest to make it generic operations, and I wouldnot have interest to tackle for all FSes...

At the end of the day we have to treat every filesystem provided by the user asuser input and must not trust it.

Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiI'm still not sure whether this is right direction or not though,because mount operation is root only and untrusted image should run fsckbefore. But, also, Oops is clearly unexpected. Hmmm...

This limitation is not true anymore. Plug in a USB stick into a recentLinux desktop, it will automatically mount it... Also think of usernamespaces and FUSE.

Not really (well, true, some sort though). It is still controlled by rootor capability, right? I.e. still controlled by admin of system.

Fact is, I can plugin a USB stick to my buddies Laptop and make it trigger a BUG_ON. :)

Post by OGAWA HirofumiI read user namespaces last time, it doesn't allow to mount the blockdevice by namespace's root.FUSE is allowed to mount by true user (I.e. admin can't disallow it)? Istill didn't check it fully.

The question is how long these limits will stay...User namespaces uncovered already a pile of issues wrt. to mounting.

Well, anyway, I don't object like that simple patch.My worry is, I feel we need something like online-fsck finally if wetackled fully to avoid issues (I still didn't analyze about this issueseriously and fully), and measurable overheads.And I myself have interest to online/runtime-fsck (i.e. detect and fix)though, I don't have interest to make it generic operations, and I wouldnot have interest to tackle for all FSes...

VFAT suffers from the issue because it is using dentry aliases to have fast lookupsfor short and long names. Without aliases the VFS will catch the loop just fine as on ext2 (I'vetested it).Let's change vfat_lookup() to verify that both the alias and dentry have the same parent otherwisewe're facing a loop.

Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiWell, anyway, I don't object like that simple patch.My worry is, I feel we need something like online-fsck finally if wetackled fully to avoid issues (I still didn't analyze about this issueseriously and fully), and measurable overheads.And I myself have interest to online/runtime-fsck (i.e. detect and fix)though, I don't have interest to make it generic operations, and I wouldnot have interest to tackle for all FSes...

What about this one?

Looks like strange. If we want to tackle this at per-FS. We should notreturn double linked dir at first. Since double linked breaks dirhierarchy, even if this one can avoid that Oops, double linked can beeasily the cause of another Oops, deadlock, etc.

Well, this patch is untested though. For example, somethings likefollowing. But, again, this fixes only one of cases in double linked.(And to fix fully, my mind was already talked.)

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Looks like strange. If we want to tackle this at per-FS. We should notreturn double linked dir at first. Since double linked breaks dirhierarchy, even if this one can avoid that Oops, double linked can beeasily the cause of another Oops, deadlock, etc.Well, this patch is untested though. For example, somethings likefollowing. But, again, this fixes only one of cases in double linked.(And to fix fully, my mind was already talked.)

Hmm... Why hadn't d_splice_alias() caught that, though? Look: in thatcase we see that inode is non-NULL, a directory and has an alias(namely, dentry->d_parent). So we hit this:new = __d_find_any_alias(inode);if (new) {if (!IS_ROOT(new)) {spin_unlock(&inode->i_lock);dput(new);return ERR_PTR(-EIO);}if (d_ancestor(new, dentry)) {spin_unlock(&inode->i_lock);dput(new);return ERR_PTR(-EIO);}and depending on whether that ->d_parent had been the filesystem root,we hit either the former or the latter. IOW, we should've doneexactly that...

FWIW, there *is* a bug in that path - we ought to have done iput(inode)on both failure exits in order to follow the calling conventions. But thatdoesn't look like it would oops right there...

Could somebody repost the oops stack trace? The bug in d_splice_alias()is real (and fairly old), but I'd like to understand if there's anythingelse in the game...--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Moreover, for directories we don't want to bother with that codepathat all - d_splice_alias() will do that d_move() just fine there.--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hm, sounds like I'm missing something. what case has different->d_parent on alias if prevented by my check?

You meant the case of more complex double linked loop?

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hm, sounds like I'm missing something. what case has different->d_parent on alias if prevented by my check?

You meant the case of more complex double linked loop?

Yes. This is why I came up with the alias->d_parent == dentry->d_parent check.

Thanks,//richard--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique()merge happens, we'll be able to clean vfat_lookup() a bit more.

a) don't bother with ->d_time for positives - we only check it for negativesanyway.b) make sure to set it at unlink and rmdir time - at *that* point soon-to-benegative dentry matches then-current directory contentsc) don't go into renaming of old alias in vfat_lookup() unless it hasthe same parent (which it will, unless we are seeing corrupted image) *and*is a non-directoryd) use (for now) d_materialise_unique() instead of d_splice_alias() - that onewill do renames of old directory aliases just fine (and pretty soon so willd_splice_alias(), but this bug is -stable fodder)

mutex_unlock(&MSDOS_SB(sb)->s_lock);--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

Untested interim fix follows; as soon as d_splice_alias()/d_materialise_unique()merge happens, we'll be able to clean vfat_lookup() a bit more.a) don't bother with ->d_time for positives - we only check it for negativesanyway.b) make sure to set it at unlink and rmdir time - at *that* point soon-to-benegative dentry matches then-current directory contentsc) don't go into renaming of old alias in vfat_lookup() unless it hasthe same parent (which it will, unless we are seeing corrupted image) *and*is a non-directoryd) use (for now) d_materialise_unique() instead of d_splice_alias() - that onewill do renames of old directory aliases just fine (and pretty soon so willd_splice_alias(), but this bug is -stable fodder)

Ah, this calls d_move() for directory?

I recalled why it didn't change alias if directory. Changing result ofgetcwd() may become the surprise of apps... If doesn't surprise, looksgood.

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

Ah, I see. Checked latest linus tree, looks like added some checks (Iwas still working on 3.6.15, sorry).

So, Richard, can you add comment such as

/* Check FS corruption, will handle by d_splice_alias() */

for that less understandable check, and resend patch with yourSigned-off-by: (better to Cc: akpm)?

Acked-by: OGAWA Hirofumi <***@mail.parknet.co.jp>

Thanks.

--OGAWA Hirofumi <***@mail.parknet.co.jp>--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Hmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

What happened to the idea of failing with -EIO in that case?

In the case of a disk filesystem it seems like it can only happen due tocorruption and I'd assumed we're better off failing the first time wenotice it.

(That said, I haven't thought it through beyond that, so it's notsomething I have that strong an opinion on.)

--b.--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by Al ViroHmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

What happened to the idea of failing with -EIO in that case?

Quiet d_move() is definitely better in this case. Failing with EIO is whatd_splice_alias() does *now*, and that's why the patch I've posted usesd_materialise_unique(). Patches in my local tree switch that case ofd_splice_alias() to what d_materialise_unique() does (and killd_materialise_unique() completely - compat #define is left, but allin-tree users are switched to d_splice_alias())...--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by Al ViroHmm... Not in the current mainline (and not because of want_discon - that'sgone already). However, with the fixes I've got in the local tree itwill both find and move it - same as d_materialise_unique() would in thecurrent mainline.

What happened to the idea of failing with -EIO in that case?

Quiet d_move() is definitely better in this case.

Can you explain why? (Apologies if I'm missing the obvious.)

--b.

Post by Al ViroFailing with EIO is whatd_splice_alias() does *now*, and that's why the patch I've posted usesd_materialise_unique(). Patches in my local tree switch that case ofd_splice_alias() to what d_materialise_unique() does (and killd_materialise_unique() completely - compat #define is left, but allin-tree users are switched to d_splice_alias())...

--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html

Post by OGAWA HirofumiHm, sounds like I'm missing something. what case has different->d_parent on alias if prevented by my check?

A cross-directory link.--To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" inthe body of a message to ***@vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html