Comments

On Fri, 2010-11-26 at 14:50 +0100, Wolfgang Wegner wrote:
> [<c0371e40>] (mutex_lock+0x4/0x14) from [<c015cfc4>] (make_reservation+0x74/0x364)
> [<c015cfc4>] (make_reservation+0x74/0x364) from [<c015d79c>] (ubifs_jnl_write_inode+0x80/0x1e4)
> [<c015d79c>] (ubifs_jnl_write_inode+0x80/0x1e4) from [<c0163c14>] (ubifs_write_inode+0x5c/0xbc)
> [<c0163c14>] (ubifs_write_inode+0x5c/0xbc) from [<c00cfbec>] (writeback_single_inode+0x120/0x238)
> [<c00cfbec>] (writeback_single_inode+0x120/0x238) from [<c00d08a8>] (writeback_inodes_wb+0x3d4/0x4a8)
> [<c00d08a8>] (writeback_inodes_wb+0x3d4/0x4a8) from [<c00d0ad4>] (wb_writeback+0x158/0x1ec)
> [<c00d0ad4>] (wb_writeback+0x158/0x1ec) from [<c00d0cb8>] (wb_do_writeback+0x6c/0x1cc)
> [<c00d0cb8>] (wb_do_writeback+0x6c/0x1cc) from [<c00d0e38>] (bdi_writeback_task+0x20/0x98)
> [<c00d0e38>] (bdi_writeback_task+0x20/0x98) from [<c0099020>] (bdi_start_fn+0x8c/0x100)
> [<c0099020>] (bdi_start_fn+0x8c/0x100) from [<c00543bc>] (kthread+0x7c/0x84)
> [<c00543bc>] (kthread+0x7c/0x84) from [<c002744c>] (kernel_thread_exit+0x0/0x8)
> Code: ebfffd21 e28dd014 e8bd80f0 e3a03000 (e1001093)
> ---[ end trace 162376f104dd0abc ]---
Well, for some reason the write-back code thinks UBIFS still has dirty
inodes, but UBIFS was re-mounted R/O and it should not have dirty
inodes. I do not know where the bug is.
> I am a bit puzzled about this all.
> - is the flush-ubifs_0_1 process expected to run after the filesystem
> has been mounted read-only?
Yes, in .32 it wakes up every 5 seconds. But it should find that there
is nothing to do and go back to sleep. In newer kernels it does not wake
up unless there is something to do.
> - What can I do to further debug this?
Difficult to say.
First of all, try to enable UBIFS debugging - just CONFIG_UBIFS_FS_DEBUG,
not messages. Is the problem still reproducible?
Also, it is interesting where exactly wb_writeback() is called - there
are 2 places.
And it is interesting which inode number is being written by write-back
code. And is it the same every time you oops or not?
Try to reproduce with the following patch:

Hi,
thank you for the hints!
I am currently trying to reproduce the error even without modified
kernel - it seems I changed enough of my init procedure to somehow
avoid the trigger condition. :-(
As soon as I can reproduce the error, I will try what you suggested.
Best regards,
Wolfgang

On Mon, 2010-11-29 at 14:18 +0100, Wolfgang Wegner wrote:
> Hi Artem,
>
> I can now reproduce the Oops with CONFIG_UBIFS_FS_DEBUG and your
> patch.
OK, I thought one of ubi_asserts() would trigger. They do not.
> However, I had to manually apply the hunks because my tree
> seems to differ enough for patch to not apply it automatically...
I sent the patch against the ubifs-v2.6.32 tree, which has all the UBIFS
patches back-ported, and which I recommend using:
http://linux-mtd.infradead.org/doc/ubifs.html#L_source
(hmm, need to update that page, now there are more back-port trees)
> And here is the complete console log in case I missed some
> important information:
Well, because of the ps output, it is difficult to read kernel prints.
Also, the situation becomes more difficult because you have several
UBIFS file-systems, so many messages are irrelevant. Would be nice to
print them only for the instance which oopses.
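
One way to make such a console log easier to read after the fact is to
drop everything that is not a kernel print. The following is only a
minimal sketch in Python, assuming that the interesting lines are the
"qqq:" debug prints, UBIFS messages, and the oops backtrace. The function
name and the pattern list are illustrative assumptions and are not part of
any patch in this thread. It does not separate one UBIFS instance from
another; that would still need a change to the prints themselves.

```python
import re

# Assumed prefixes of lines worth keeping: the "qqq:" debug prints,
# UBIFS messages, backtrace entries such as "[<c0371e40>] (...)",
# the "Code:" line and the "---[ end trace ... ]---" marker.
KERNEL_LINE = re.compile(r"^(qqq:|UBIFS|\[<[0-9a-f]{8}>\]|Code:|---\[)")


def kernel_lines(log_text):
    """Return only the lines of a console log that look like kernel prints."""
    return [
        line
        for line in log_text.splitlines()
        if KERNEL_LINE.match(line.strip())
    ]
```

Anything that does not start with one of the assumed prefixes, such as ps
output, is silently dropped, so the pattern list should be extended if
other kernel messages matter.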
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
> qqq: kupdated for ubifs
So, not 100% sure, but probably this is kupdated writeback. For some
reason mm thinks ubifs has dirty data, although it should not have,
because the re-mounting path does a full sync.
> qqq: writing inode 3298
Does this always happen for inode 3298? Or the inode number changes?
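
To answer that across several reproductions, the "qqq: writing inode N"
prints can be collected from each saved console log. A minimal sketch in
Python, assuming one saved log per oops and assuming the last such print
before the oops is the relevant one; the function names are hypothetical
helpers, and the message format is taken from the log quoted above.

```python
import re

# Message format as it appears in the quoted log: "qqq: writing inode 3298"
INODE_RE = re.compile(r"qqq: writing inode (\d+)")


def last_inode(log_text):
    """Return the inode number from the last 'writing inode' print, or None."""
    numbers = INODE_RE.findall(log_text)
    return int(numbers[-1]) if numbers else None


def same_inode_every_time(logs):
    """True if every log that has a 'writing inode' print ends on the same inode."""
    inodes = {last_inode(text) for text in logs}
    inodes.discard(None)  # ignore logs without any such print
    return len(inodes) == 1
```

If same_inode_every_time() returns False, the set of inode numbers itself
is worth posting, since it shows whether a small group of inodes is
involved or the number is effectively arbitrary.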

Hi Artem,
On Mon, Nov 29, 2010 at 06:45:57PM +0200, Artem Bityutskiy wrote:
> I cannot judge for sure, but this looks like non-UBIFS issue in 2.6.32,
> so merging the stable 2.6.32.X tree can help. I make sense in general to
> merge stable tree as well.
the problem is I still have some "marvell" tree
(from git.marvell.com/orion.git) and am not sure if there is anything
special in it.
Would the "stock" 2.6.35.x from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
be an option in your opinion, too? I seem to remember the marvell tree
was completely merged in between (right now digging through the commits
to make it clear), so I could get an all shiny, new kernel when having
to do any kind of merge at all...
Regards,
Wolfgang

On Mon, 2010-11-29 at 18:02 +0100, Wolfgang Wegner wrote:
> Hi Artem,
>
> On Mon, Nov 29, 2010 at 06:45:57PM +0200, Artem Bityutskiy wrote:
> >
> > I cannot judge for sure, but this looks like non-UBIFS issue in 2.6.32,
> > so merging the stable 2.6.32.X tree can help. I make sense in general to
> > merge stable tree as well.
>
> the problem is I still have some "marvell" tree
> (from git.marvell.com/orion.git) and am not sure if there is anything
> special in it.
Mostly vendors change drivers and board files. They usually do not touch
core functionality. So you can try to merge 2.6.32
> Would the "stock" 2.6.35.x from
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> be an option in your opinion, too? I seem to remember the marvell tree
> was completely merged in between (right now digging through the commits
> to make it clear), so I could get an all shiny, new kernel when having
> to do any kind of merge at all...
Probably 2.6.35 would be good.
Anyway, the problem is that I do not really have time to drive you
through debugging of your issue. I only have time to get "big" results
from you and provide some help in form of my opinion :-(

On Thu, 2010-12-02 at 05:35 +0200, Artem Bityutskiy wrote:
> On Mon, 2010-11-29 at 18:02 +0100, Wolfgang Wegner wrote:
> > Hi Artem,
> >
> > On Mon, Nov 29, 2010 at 06:45:57PM +0200, Artem Bityutskiy wrote:
> > >
> > > I cannot judge for sure, but this looks like non-UBIFS issue in 2.6.32,
> > > so merging the stable 2.6.32.X tree can help. I make sense in general to
> > > merge stable tree as well.
> >
> > the problem is I still have some "marvell" tree
> > (from git.marvell.com/orion.git) and am not sure if there is anything
> > special in it.
>
> Mostly vendors change drivers and board files. They usually do not touch
> core functionality. So you can try to merge 2.6.32
Err, I meant 2.6.32.x, of course, the so-called -stable tree which Greg
KH maintains.

Hi Artem,
On Thu, Dec 02, 2010 at 05:35:34AM +0200, Artem Bityutskiy wrote:
> Mostly vendors change drivers and board files. They usually do not touch
> core functionality. So you can try to merge 2.6.32
there had in fact been some special ARM and/or kirkwood things in this
2.6.32 release, but meanwhile this is all integrated into mainline, as
far as I can see.
> Probably 2.6.35 would be good.
I merged our (very few) local changes into 2.6.36 and am now running
this.
> Anyway, the problem is that I do not really have time to drive you
> through debugging of your issue. I only have time to get "big" results
> from you and provide some help in form of my opinion :-(
Thank you very much for your hints and help so far!
The big result I can give is that I already got this oops once with
2.6.36, too - but could not reproduce it. I also have some other thing
to fix right now, but in case I get back to the problem, I will of
course post results (if any) - thanks to you I now got some pointers
where I can start debugging, this is already great and valuable help.
Regards,
Wolfgang

On Thu, 2010-12-02 at 10:17 +0100, Wolfgang Wegner wrote:
> Hi Artem,
>
> On Thu, Dec 02, 2010 at 05:35:34AM +0200, Artem Bityutskiy wrote:
> >
> > Mostly vendors change drivers and board files. They usually do not touch
> > core functionality. So you can try to merge 2.6.32
>
> there had in fact been some special ARM and/or kirkwood things in this
> 2.6.32 release, but meanwhile this is all integrated into mainline, as
> far as I can see.
>
> > Probably 2.6.35 would be good.
>
> I merged our (very few) local changes into 2.6.36 and am now running
> this.
>
> > Anyway, the problem is that I do not really have time to drive you
> > through debugging of your issue. I only have time to get "big" results
> > from you and provide some help in form of my opinion :-(
>
> Thank you very much for your hints and help so far!
>
> The big result I can give is that I already got this oops once with
> 2.6.36, too - but could not reproduce it. I also have some other thing
> to fix right now, but in case I get back to the problem, I will of
> course post results (if any) - thanks to you I now got some pointers
> where I can start debugging, this is already great and valuable help.
The same oops? If yes, then this is not a 2.6.32-specific issue, and you
should preserve your 2.6.32 setup and dig further, I think. Having a
setup where you can reproduce the bug is a very nice thing :-)