On Sun, 2011-03-13 at 11:48 +1100, Dave Chinner wrote:
Thanks for your response, Dave.
<snip>
> As i said before, the debug check is known to be racy. Having it
> trigger is not necessarily a sign of a problem. I have only ever
> tripped it once since the way the check operates was changed.
> There's no point in spending time trying to analyse it and explain
> it as we already know why and how it can trigger in a racy manner.
Oh, maybe I misunderstood. In your earlier reply you mentioned that you
wanted to know whether the problem was consistently reproducible. Since
it was, I went on to debug it.
If it is not an issue, it would be a good idea to reduce that ASSERT to
a WARN_ON_ONCE(), as you mentioned.
>
> > Then I started comparing the behavioral differences between the two
> > architectures, and I found that on POWER I see more threads at a time
> > (a max of 4 threads) in the function xlog_grant_log_space(), whereas on
> > x86_64 I see a max of only two, and mostly it is only one.
> >
> > I also noted that on POWER test case 011 takes about 8 seconds, whereas
> > on x86_64 it takes about 165 seconds.
> >
> > So, I ventured into the core of test case 011, dirstress, and found that
> > simply creating thousands of files under a directory takes a very long
> > time on x86_64 compared to POWER (1m15s vs. 2s).
>
> On my x86-64 boxes, test 011 takes 3s with CONFIG_XFS_DEBUG=y, all
> lock checking turned on, memory poisoning active, etc. With a
> production kernel, it usually takes 1s. Even on a single SATA drive.
>
> So, without knowing anything about your x86-64 machine, I'd say
> there's something wrong with it or its configuration. Try turning
> off barriers and seeing if that makes it go faster....
The slowness happened on two x86_64 blades.
On the blade where the storage is an SSD device, nobarrier helped
drastically.
==========
[root@test27 chandra]# mount -o nobarrier
/dev/disk/by-id/wwn-0x5000a7203002f7e4-part1 /mnt/xfsMntPt/
[root@test27 chandra]# time ./b /mnt/xfsMntPt/d1/ 10000 1
i 0
real 0m1.983s
user 0m0.026s
sys 0m1.365s
===================
Whereas, on the blade where the storage is a SAN disk, it didn't help
much. Note that I verified the disk itself is performing fine by running
the same test on an ext4 filesystem on it.
===================
[root@test65 chandra]# mount /dev/sdb1 /mnt/xfs
[root@test65 chandra]# mount /dev/sdb2 /mnt/ext4
[root@test65 chandra]# tail -2 /proc/mounts
/dev/sdb1 /mnt/xfs xfs rw,seclabel,relatime,attr2,noquota 0 0
/dev/sdb2 /mnt/ext4 ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
[root@test65 chandra]# time ./b /mnt/ext4/d1 10000 1
i 0
real 0m0.332s
user 0m0.006s
sys 0m0.264s
[root@test65 chandra]# time ./b /mnt/xfs/d1 10000 1
i 0
real 1m35.620s
user 0m0.012s
sys 0m0.735s
[root@test65 chandra]# mount -o nobarrier /dev/sdb1 /mnt/xfs
[root@test65 chandra]# tail -2 /proc/mounts
/dev/sdb2 /mnt/ext4 ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sdb1 /mnt/xfs xfs rw,seclabel,relatime,attr2,nobarrier,noquota 0 0
[root@test65 chandra]# time ./b /mnt/xfs/d1 10000 1
i 0
real 1m6.772s
user 0m0.011s
sys 0m0.739s
========================
What else could affect the behavior like this?
Also, note that on POWER I get the fast performance even with barriers on.
Thanks,
chandra
<snip>