On Friday January 27, chase.venters@clientec.com wrote:
> Greetings,
> Just a quick recap - there are at least 4 reports of 2.6.15 users
> experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
> have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
> After dealing with this leak for a while, I decided to do some dancing around
> with git bisect. I've landed on a possible point of regression:
>
> commit: a9701a30470856408d08657eb1bd7ae29a146190
> [PATCH] md: support BIO_RW_BARRIER for md/raid1
>
> I spent about an hour and a half reading through the patch, trying to see if
> I could make sense of what might be wrong. The result (after I dug into the
> code to make a change I foolishly thought made sense) was a hung kernel.
> This is important because when I rebooted into the kernel that had been
> giving me trouble, it started an md resync and I'm now watching (at least
> during this resync) the slab usage for scsi_cmd_cache stay sane:
>
> turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
> scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
> slabdata 3 3 0
>

This suggests that the problem happens when a BIO_RW_BARRIER write is
sent to the device. With this patch, md flags all superblock writes
as BIO_RW_BARRIER. However, md is not likely to update the superblock
often during a resync.

There is a (rough) count of the number of superblock writes in the
"Events" counter which "mdadm -D" will display.
You could try collecting the 'Events' counter together with the
'active_objs' count from /proc/slabinfo and graphing the pairs to see
whether the relationship is linear.
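
Something like the following could do the collecting. It is only a
rough sketch: the function names and the /dev/md0 default are my own
placeholders, it assumes 'active_objs' is the second field of the
scsi_cmd_cache line in /proc/slabinfo, and it assumes the "Events"
line from "mdadm -D" is either a plain number or of the form "0.N"
(in which case it keeps N). Adjust to whatever your mdadm prints.

```python
import re
import subprocess


def parse_active_objs(slabinfo_text, cache="scsi_cmd_cache"):
    """Return active_objs (second field) for the named slab cache, or None."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == cache:
            return int(fields[1])
    return None


def parse_events(mdadm_text):
    """Return the Events counter from 'mdadm -D' output, or None.

    Assumption: the value is either 'N' or '0.N'; the part after the
    last dot is used.
    """
    m = re.search(r"^\s*Events\s*:\s*(\S+)", mdadm_text, re.MULTILINE)
    if not m:
        return None
    return int(m.group(1).split(".")[-1])


def pearson(pairs):
    """Pearson correlation of (events, active_objs) pairs, or None.

    A value close to 1.0 means the pairs lie near a rising straight line.
    """
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # one of the counters never changed
    return sxy / (sxx * syy) ** 0.5


def sample(device="/dev/md0"):
    """Take one (events, active_objs) sample. Likely needs root."""
    mdadm_out = subprocess.run(
        ["mdadm", "-D", device], capture_output=True, text=True, check=True
    ).stdout
    with open("/proc/slabinfo") as f:
        slab = f.read()
    return parse_events(mdadm_out), parse_active_objs(slab)
```

Call sample() every minute or so while the machine is doing normal
I/O, keep the pairs in a list, and either plot them or pass them to
pearson(). If the leak tracks superblock writes, the result should be
close to 1.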

I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
command to the device, and the driver for your particular device may
well be losing a scsi_cmd_cache allocation when doing that, but I leave
that to someone who knows more about that code.