On November 08, 2002 at 10:08, Greg Oster wrote:
> Byron Servies writes:
> > Hi there!
> >
> > I am having a serious performance problem with a raidframe
> > device. I was certain I had seen something like this on the list
> > before, but failed to find appropriate threads while searching the
> > netbsd site.
> >
> > The problem, I believe, is with the raidframe stripe device
> > (configuration and disklabel information below). I had hoped
> > that the raidframe geometry problem was in effect, but that does
> > not appear to be the case unless I have mis-read the PR.
> > Initially, transfers to the array were fine, but at some point
> > after it passed 40% full, throughput dropped dramatically
> > (some dumpfs and df info below).
>
> If you run a benchmark (e.g. pkgsrc/benchmarks/bonnie) on the RAID set,
> what sort of performance do you see?
For the raid set:
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
owl 2000 35731 81.5 41439 25.4 10259 10.0 25762 79.4 37772 19.3 134.3 1.4
I have 4 of the same drive (all of which will eventually be
in the raid set after I rebuild the machine), so I ran bonnie
on one of the standalone disks, too:
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
owl 2000 24777 56.9 25879 17.3 7235 7.4 23069 68.4 24427 15.8 120.1 1.1
systat iostat was consistent for both runs at about 90%.
>
> > When I open a new ftp connection to my netbsd server and send
> > a file to the raidframe device, transfer begins and then
> > immediately stalls for 10-20 seconds before continuing at a
> > reduced but relatively normal rate.
>
> If you monitor the progress of the benchmark (e.g. with 'systat iostat'),
> do you see those pauses as well?
>
> > Using a non-raidframe device, transfer is very fast, as expected.
>
> Hmm.
At the beginning of an ftp session to the raid device, systat iostat
reports a lot of activity to the array: roughly 200 KB/s and 100 tps.
Almost all of it is going to the first disk in the array (the second
disk is nearly idle).
The directory I am writing to has a large number of files.
Writing to its parent also causes a long wait before transfer
begins, but writing to an empty subdirectory starts right
up. Perhaps this is the effect of the 63 secPerSU you mention
below?
I will re-configure the raid array and see if that helps.
> > My NIC is in full duplex mode and while I am receiving
> > CRC errors occasionally from the tlp driver, I do not
> > see them during ftp transfers.
>
> Hmm... do the NIC and the IDE controller share the same IRQ?
I am running a Promise 66 controller for these drives and
the NIC is on a separate card.
> Are the drives on different IDE channels?
No, which I know is a problem. They are master/slave on a
Promise 66 card. I should be running them as masters on
separate channels, and when I move to all 4 drives in the
array, I will use both the Promise card's 2 channels and the
2 HighPoint channels so that all 4 drives will be masters
on their chains.
> Are the drives reporting any sort of read/write errors?
None are showing up on the console or in dmesg.
> Any sort of power-management on the drives?
I believe I have turned all that off, but I will check.
> > I am running kernel 1.6
> > beta 2 on a 1.5.2 base; I was to upgrade to 1.6 this
> > weekend, but I needed to complete this other operation
> > (backup of another machine) to the raid device first.
> >
> > If anyone has any pointers on what might be wrong or where
> > I should be looking, I would appreciate it.
> >
> > Byron
> >
> > -- raid0.conf
> >
> > START array
> > # numRow numCol numSpare
> > 1 2 0
> >
> > START disks
> > /dev/wd1h
> > /dev/wd2h
> >
> > START layout
> > # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
> > 63 1 1 0
>
> Is this really 63? 63 might help scatter directory bits better, but
> I'm not sure it would be better than 64 for general performance.
> (For a 32K write, for example, you'll be putting (at most) 31.5K on one
> disk, and then have to do a separate IO for the remaining 0.5K to the other
> disk... With an 8K write, even that could end up being split over both disks,
> which could be slightly slower. I'm not sure that this is the cause of the
> performance problems, but it probably isn't helping anything :( )
Yes, it's 63. I have forgotten why I chose this value; it was
after an afternoon of reading raidframe docs, though.
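For what it's worth, the split you describe can be sketched with a
small Python helper (hypothetical code, assuming RAID 0 round-robin
striping and writes aligned from the start of the array):

```python
SECTOR = 512  # bytes per sector

def split_write(offset, length, sect_per_su, num_cols=2):
    """Return bytes landing on each disk for a write of `length` bytes
    at byte `offset`, given sect_per_su sectors per stripe unit."""
    su_bytes = sect_per_su * SECTOR
    per_disk = [0] * num_cols
    pos, remaining = offset, length
    while remaining > 0:
        disk = (pos // su_bytes) % num_cols   # round-robin across columns
        room = su_bytes - (pos % su_bytes)    # bytes left in this stripe unit
        chunk = min(room, remaining)
        per_disk[disk] += chunk
        pos += chunk
        remaining -= chunk
    return per_disk

# A 32K write at the start of the array with sectPerSU=63:
print(split_write(0, 32 * 1024, 63))   # [32256, 512] -> 31.5K + 0.5K
# The same write with sectPerSU=64 stays on one disk:
print(split_write(0, 32 * 1024, 64))   # [32768, 0]
```

That matches your 31.5K + 0.5K example, and shows why 64 keeps a
32K write on a single disk.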
Thanks for the quick response,
Byron