On Mon 15-02-10 16:41:17, Jan Engelhardt wrote:
>
> On Monday 2010-02-15 15:49, Jan Kara wrote:
> >On Sat 13-02-10 13:58:19, Jan Engelhardt wrote:
> >> >>
> >> >> This fixes it by using the passed in page writeback count, instead of
> >> >> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> >> >> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> >> >> finish properly even when new pages are being dirtied.
> >>
> >> It seems so. Jens, Jan Kara, your patch does not entirely fix this.
> >> While there is no sync/fsync to be seen in these traces, I can
> >> tell there's a livelock, without Dirty decreasing at all.
> >
> > I don't think this is directly connected with my / Jens' patch.
>
> I start to think so too.
>
> >Similar traces happen even without the patch (see e.g.
> >http://bugzilla.kernel.org/show_bug.cgi?id=14830). But maybe the patch
> >makes it worse... So are you able to reproduce these warnings and
> >without the patch they did not happen?
>
> Your patch speeds up the slow sync; without the patch, there was
> no real chance to observe the hard lockup, as the slow sync would
> take up all time.
>
> So far, no reproduction. It seems to be just as you say.
>
> > Where in the code is jbd2_journal_commit_transaction+0x218/0x15e0?
>
> 0000000000569554 <jbd2_journal_commit_transaction>:
>   56976c:  40 04 ee 62  call 6a50f4 <schedule>
>
> Since there is an obvious schedule() call in jbd2_journal_commit_transaction's
> C code, I think that's where it is.

OK. Thanks. It seems some process is spending excessive time with a
transaction open (jbd2_journal_commit_transaction waits for all handles of
a transaction to be dropped). If you see the traces again, try to obtain
stack traces of all the other processes, and maybe we can catch that process
and see whether it's doing something unexpected.

The patch can have an influence on this because we now pass a larger
nr_to_write to ext4_writepages, so maybe that makes some corner case more
likely.
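
To make the dependency above concrete, here is a toy userspace model of the
rule "commit waits for all handles of a transaction to be dropped". It is
only a sketch, not kernel code: toy_transaction, toy_handle_start,
toy_handle_stop and toy_commit_try are hypothetical names, the t_updates
field name is borrowed from jbd2 on the assumption that it counts open
handles there, and the real commit path sleeps in schedule() where this
model simply returns false.

```c
/* Toy userspace model, NOT kernel code. All names are illustrative. */
#include <stdbool.h>

struct toy_transaction {
    int  t_updates;   /* handles currently open against this transaction */
    bool committing;  /* set once commit was requested; no new handles join */
};

/* A task opens a handle. Returns false if the transaction is already
 * locked down for commit and cannot accept new handles. */
bool toy_handle_start(struct toy_transaction *t)
{
    if (t->committing)
        return false;
    t->t_updates++;
    return true;
}

/* The task drops its handle again. */
void toy_handle_stop(struct toy_transaction *t)
{
    if (t->t_updates > 0)
        t->t_updates--;
}

/* Commit side. Returns true if commit may proceed, false if it would
 * have to keep waiting because some handle is still open. */
bool toy_commit_try(struct toy_transaction *t)
{
    t->committing = true;
    return t->t_updates == 0;
}
```

In this model a single task that starts a handle and never stops it keeps
toy_commit_try() returning false forever, which is the kind of situation
the stack traces of the other processes should help to identify.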