On Thu, Mar 29, 2012 at 02:52:40PM -0700, Linus Torvalds wrote:
> >
> > up 30 minutes with that reverted, no problems so far..
>
> Goodie. Let it run for a while more, and really pound on it.
>
> Ted, are there any downsides to just reverting that commit (ie any
> subtle interactions) for now? That's assuming that Dave's testing
> continues to confirm that it is that one commit.

That commit fixes a race which is seen when you write into fallocated
(and hence uninitialized) disk blocks under *very* heavy memory
pressure. Furthermore, although theoretically it could trigger under
normal direct I/O writes, it only seems to trigger if you are issuing
a huge number of AIO writes, such that a just-written page can get
evicted from memory, and then read back into memory, before the
workqueue has a chance to update the extent tree.
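
For illustration only, the basic I/O pattern involved (preallocate a
file, then write into the preallocated range) looks roughly like the
sketch below. It is not a reproducer: it uses plain synchronous
pwrite() as a stand-in for AIO and applies no memory pressure, so it
is not expected to hit the race. The helper name prealloc_and_write()
and its parameters are made up for this example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel tree or any test suite):
 * preallocate blksz * nblocks bytes at `path`, then overwrite every
 * block.  Returns the number of bytes written, or -1 on any error.
 */
static long prealloc_and_write(const char *path, size_t blksz, size_t nblocks)
{
    assert(blksz > 0);

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /*
     * Reserve the space without writing any data.  With glibc this
     * tries fallocate() first; on a file system that supports it, the
     * range is allocated but not yet initialized.
     */
    if (posix_fallocate(fd, 0, (off_t)(blksz * nblocks)) != 0) {
        close(fd);
        return -1;
    }

    char *buf = malloc(blksz);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 0xab, blksz);

    /* Write into the preallocated range, one block at a time. */
    long total = 0;
    for (size_t i = 0; i < nblocks; i++) {
        ssize_t n = pwrite(fd, buf, blksz, (off_t)(i * blksz));
        if (n != (ssize_t)blksz) {
            total = -1;
            break;
        }
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Calling prealloc_and_write("/tmp/example.bin", 4096, 8) should write
eight 4 KiB blocks into a freshly preallocated file. The workload
described above differs in that it keeps a very large number of
asynchronous writes in flight while the machine is short on memory.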

This race has been around for a little over a year, and no one noticed
until two months ago; it only happens under fairly exotic conditions,
and in fact even after trying very hard to create a simple repro under
lab conditions, we could only reproduce the problem and confirm the
fix on production servers running MySQL on very fast PCIe-attached
flash devices.

Given that Dave was able to hit this problem pretty quickly, if we
confirm that this commit is at fault, the only reasonable thing to do
is to revert it IMO.