On (01/07/08 18:04), Alexander Beregalov didst pronounce:
> 2008/7/1 Mel Gorman <mel@csn.ul.ie>:
> > I still have no useful reaction to this. According to Christoph Hellwig,
> > this lockup has been appearing since lockdep was introduced but for some
> > reason is easier to trigger now. It bisected to the two-zonelist changes
> > but it still looks like a red herring as I cannot see how reclaim has
> > changed significantly as a result of that patch.
>
> Do you wait reaction from me? Can I help?
> As I mentioned, the lockup does not happen when lockdep is disabled.
>

Sorry for the slow response Alexander.

This bug is likely fixed by commit 494de90098784b8e2797598cefdd34188884ec2e, which will be visible publicly once maintenance on master.kernel.org finishes. I included it below for convenience.

The lockdep warning still exists, but it is a false positive and should be relatively hard to trigger again. It would be nice to have confirmation of this.

The non-NUMA case of build_zonelist_cache() would initialize the zlcache_ptr of both node_zonelists[] entries to NULL.

This is problematic because non-NUMA has only a single node_zonelists[] entry, so trying to zero the non-existent second one overwrote the nr_zones field instead.

As kswapd uses this value to determine what reclaim work is necessary, the result is that kswapd never reclaims. This causes processes to stall frequently in low-memory situations, as they always have to enter direct reclaim. This patch initialises zlcache_ptr correctly.