
Hi Chris,
I have to adjust the Wow! statement from my previous post. I could not get to openSUSE (2.6.32 kernel) last night, but today I did. Without changing the readahead value, but using noatime and nodatasum, a new record here:

Not enough data to draw the conclusion that the readahead default value is too small for near-state-of-the-art storage, i.e., SAS2 HDD and SSD, but it surely looks that way.
So I changed the default 4096 to 12288 in the /sys/devices/virtual/bdi/btrfs-*/read_ahead_kb files and ran it again ... no love:
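For reference, a minimal sketch of how those sysfs knobs can be bumped; the btrfs-* glob and the 12288 value are the ones from the test above, and the write needs root to take effect:

```shell
# Bump readahead on every btrfs backing-device-info (bdi) entry.
RA_KB=12288                                # three times the 4096KB default
for f in /sys/devices/virtual/bdi/btrfs-*/read_ahead_kb; do
    [ -w "$f" ] || continue                # skip if no btrfs bdi exists, or not root
    echo "$RA_KB" > "$f"
    echo "$(dirname "$f"): read_ahead_kb=$(cat "$f")"
done
```

Note the value is per-bdi, which is why it snaps back to the default when the filesystem is unmounted and remounted.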

I am using the same IOzone parameters and getting basically the same results. So as not to appear too crazy, I changed it to drop the CPU utilization (it is useless anyway), started mounting/unmounting between each test (there was some indication that the previous cache was being used), and set the stride to a smaller value (the RAID uses a 64k stripe).
I won't bore you with useless data. I tried several strides (1*64, 2*64, ..., 192*64) and none mattered. READ is about the same.
I had to stop using the auto unmount & mount function in IOzone, as every time it ran, read_ahead_kb was reset to the default 4096 value. I poked around a little; my guess is that it is a kernel value I cannot change without rebuilding the kernel or module. I'll look a bit more later. ...

I also tried increasing read_ahead_kb to 32,768 ... even 64MB! No difference for READ that way either:

While composing this, I saw that you posted.
You're welcome and thank you for the suggestions.

I will try those suggestions, especially the deadline scheduler, as I meant to change that and forgot about it. The current scheduler is the default, CFQ.
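A minimal sketch of the scheduler switch, assuming sda is the RAID-0 device from the earlier posts (writing the new value needs root):

```shell
# Show the current I/O scheduler (the active one is bracketed) and
# switch the assumed device, sda, to deadline.
DEV=sda
SCHED_FILE="/sys/block/$DEV/queue/scheduler"
if [ -r "$SCHED_FILE" ]; then
    cat "$SCHED_FILE"                      # e.g. "noop deadline [cfq]"
    [ -w "$SCHED_FILE" ] && echo deadline > "$SCHED_FILE"
fi
```

This only affects the running system; it reverts to the compiled-in default on reboot unless set via the elevator= boot parameter.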

Probably shouldn't get too much into the 9211 HBA card specifics, but it is a pretty typical HBA: no cache, and no readahead or writeback.
It does allow setting the HDD cache on or off, which is a new widget. It was set to on, but I cannot verify it still is. ... The LSI Linux software is not only lame but also proprietary, so I cannot fix it.
I assume the HDD cache is being used, because the boot log indicates the kernel thinks it is enabled:
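One way to check the drive's write-cache-enable (WCE) bit directly, rather than trusting the boot log; this sketch assumes sdparm is installed and that the SAS drive is sda:

```shell
# Query the write-cache-enable (WCE) field of the drive's caching mode page.
DEV=/dev/sda
if command -v sdparm >/dev/null 2>&1 && [ -r "$DEV" ]; then
    sdparm --get=WCE "$DEV"    # WCE 1 means the drive's write cache is on
else
    echo "sdparm not available or $DEV not readable; skipping"
fi
```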

Just so it is clear, I'm not complaining. I mean, who can complain about 500MB/s ± 35MB/s?
I'm trying to assist, so if there is some other way you want this run, just say so. (The IOzone test takes ~1m15s on this new SAS2 setup, so it is painless, especially compared to *ATA and PAS disks.)
... or even some other app, if you think IOzone may be fiddling with the results somehow.


-Ric

Thanks for trying this out. I think the best thing to do would be to nail down exactly how fast the device is:

dd if=/dev/xxx of=/dev/zero bs=20M iflag=direct count=409

/dev/xxx is whatever you built btrfs on top of. This should be a read-only benchmark, and since we're using O_DIRECT, it takes kernel readahead out of the picture.


-chris

Hi, thanks for the post. I am happy to do whatever I can to assist.
btrfs (which I pronounce "better f s") is, or at least has the potential to be, a truly world-class fs. I thank you and all the developers for doing the work, and Oracle for funding it. I know it is in Oracle's best interest to have such a filesystem, but making it GPL-licensed ... gotta love 'em for at least that.

I was involved with other tasks but got to this today.
Under openSUSE 11.2 (kernel 2.6.32-3), the SAS2 IR RAID-0 is device sda.
Background:

READ is a little slower than the target value but pretty close. I have no clue what the margin of error is for IOzone results ...

I'm not sure if this means the WRITE IOzone results are inflated, or if the Hitachi algorithm and buffer are doing that great a job for WRITEs, ... or something else.
At the least, it appears that kernel 2.6.32-3 is not helping, or that I and openSUSE have a config that is keeping it from helping.

If the next thing is to use 2.6.33, I will have to build one. The openSUSE factory version (for openSUSE 11.3) is broken (here) ... A build is fine; just a little extra time.


-Ric

Great. Different parts of the drive can perform differently, or it could be an alignment issue that the write cache is hiding.

The easiest way to tell is to do the read test farther down the drive. Where does sda16 start?

Let's pretend it starts 500GB into the drive. You can use rough numbers; we don't need it down to the KB.

500 * 1024 / 20 gives us the number of 20MB blocks into the drive that we need to skip to get to 500GB, which is 25600.
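The arithmetic above can be checked in the shell (the 500GB offset is the pretend value from this example):

```shell
# Recompute the dd skip count: how many 20MB blocks to skip
# to start reading 500GB into the drive.
OFFSET_GB=500
BS_MB=20
SKIP=$(( OFFSET_GB * 1024 / BS_MB ))
echo "skip=$SKIP"              # skip=25600
```

That $SKIP value is what goes into dd's skip= argument alongside bs=20M.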

Hey Chris,
I did manage to get 2.6.33 running on Mandriva. I ran some iozone tests, and the results are basically the same, except that both write and read are slower. The gap narrowed a bit as a result.

The odd thing is that the ext4 & ext3 partitions tested are now exhibiting the same thing: slower reads than writes.
Also, I did a cat of the max_hw_sectors_kb values, and they were smaller than max_sectors_kb.
It could be the different companies, Mandriva vs. SUSE, but I'll have to get 2.6.33 from openSUSE running before I can check whether some fiddling was done.
I don't follow the kernel changes much anymore. Is somebody making changes to improve writes?
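A quick sketch of that comparison, assuming the device in question is sda; both files live under the block device's queue directory in sysfs:

```shell
# Compare the request-size limits the kernel reports for sda:
# max_hw_sectors_kb is the hardware limit, max_sectors_kb the soft cap.
Q=/sys/block/sda/queue
if [ -r "$Q/max_sectors_kb" ] && [ -r "$Q/max_hw_sectors_kb" ]; then
    echo "max_sectors_kb    = $(cat "$Q/max_sectors_kb")"
    echo "max_hw_sectors_kb = $(cat "$Q/max_hw_sectors_kb")"
fi
```

Normally max_sectors_kb is capped at max_hw_sectors_kb, which is why the observation above looks odd.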

...
The partition starts ~205GB into the RAID, i.e., about 35% of it is a clone of another system drive, followed by the 380+GB that is formatted as btrfs.

The
dd if=/dev/sda of=/dev/zero bs=20M skip=25600 count=409 iflag=direct
run will read into the last 10% of the formatted space of the RAID-0.

I won't be able to test with this partition anymore.
I don't suppose it matters, since the WRITE-faster-than-READ issue is exhibited on all drive types here, but if you want more tests run, they will have to be done elsewhere.