Hello. Another thought I had while looking at the bug report for
kern/46136: what if you swap to something other than the raid controller?
I wonder if you're running out of memory in some arena, and that arena is
the raid driver's memory, and that's causing everything to bottleneck.
The page daemon would flail around looking for something to flush, but it
can't, because you're deadlocked trying to write swap pages that can't be
written while the amr(4) driver is waiting for memory. I had a similar
issue some years ago with the raidframe driver, and the problem didn't
manifest itself in obvious ways. However, it was real enough, and was
eventually discovered and fixed. Again, I don't know if this is the
problem, but I thought I'd toss it out there.
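If you want to try the swap experiment, something like the following might
do as a starting point. It's only a sketch: the device names (/dev/ld0b for
swap on the amr(4) logical disk, /dev/wd0b for a swap partition on a plain
disk) are my guesses, not your actual layout, and the little function just
prints the swapctl(8) commands so you can look them over before running
them as root.

```shell
#!/bin/sh
# Print (do not run) the commands to move swap from one device to another.
# Both device names are assumptions; "swapctl -l" shows what is really in use.
swap_move_cmds() {
    old=$1   # swap currently behind the raid controller (assumed name)
    new=$2   # swap partition on a non-raid disk (assumed name)
    echo "swapctl -l"          # list the current swap devices
    echo "swapctl -a $new"     # add the non-raid swap device first
    echo "swapctl -d $old"     # then remove the one behind the controller
    echo "swapctl -l"          # confirm the result
}

swap_move_cmds /dev/ld0b /dev/wd0b
```

Adding the new device before removing the old one means the system is never
left without swap while pages are migrated off the old device.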
Also, I'm not sure about this, but it looks like the driver is
multithreaded with the older raid controllers, but with the newer ones,
it's not. I'm not sure if the one you have qualifies as old or new from
the driver's perspective.
Christoz's comments about async disks also seem worth looking at. Any
chance you could look at the mode pages for each disk?
-thanks
-Brian
On May 30, 4:51pm, Brian Buhrow wrote:
} Subject: Re: Severe netbsd-6 NFS server-side performance issues
} Hello. I went back and re-read your original message. It would be
} interesting to know what those nfsd's are doing when they're in d state.
} What does ps -l show? Is it possible to build a kernel with debugging
} symbols and then get things in a plugged up state, then drop into the ddb
} debugger and get a crash dump? If you can do that, then gdb can be used on
} the crash file after the fact to determine what each process is doing at
} the time of the crash. I still think it has something to do with the raid
} controller even if it's not obvious. Is anyone else using the crazy
} version of firmware you picked up?
} Also, what happens if you run two or more bonnie processes
} simultaneously? That ought to stress the raid out and help demonstrate the
} problem.
} -Brian
}
} On May 30, 10:31pm, Hauke Fath wrote:
} } Subject: Re: Severe netbsd-6 NFS server-side performance issues
} } At 12:45 Uhr -0700 30.5.2012, Brian Buhrow wrote:
} } >Hmm. A panic a week? I wouldn't call that stable at all.
} }
} } That's relative... ;)
} }
} } >However, one issue at a time.
} } > Does the raid controller have some sort of battery backed up cache?
} }
} } Yes.
} }
} } >Which driver supports these raid controllers? mfi(4)?
} }
} } amr0 at pci4 dev 0 function 0: AMI RAID <MegaRAID SCSI 320-4X>
} } amr0: interrupting at ioapic2 pin 4
} } amr0: firmware 421D, BIOS H434, 256MB RAM
} } ld0 at amr0 unit 0: RAID 5, optimal
} } ld0: 1675 GB, 218759 cyl, 255 head, 63 sec, 512 bytes/sect x 3514368000 sectors
} }
} } -- I had fun upgrading the firmware (the original version would install the
} } RAID, but not boot), since the controller turned out to be a Fujitsu OEM,
} } although it said LSI all over the board, and LSI refused to support it. In
} } the end, I pried out a totally undocumented firmware from an iso image off
} } an obscure Fujitsu ftp server. It worked, and even fixed the immediate
} } problem.
} }
} } > I'm not sure of the problem, but it sounds like some sort of write
} } >back caching switch got flipped in the raid controller, and now writes,
} } >with interleaving reads, are really slow. I'm assuming you've looked at
} } >the BIOS of the raid controller to see if there are any settings which
} } >might have to do with caching?
} }
} } Yes, and they are as I wanted them, battery present, cache set to
} } write-back. I ran bonnie++ on the array last week to check whether the
} } dismal performance is disk related. From the results I got I'd say it is
} } not.
} }
} } > Is it possible to revert to the NetBSD-5.x system and see if your
} } >performance goes back to where it was after the raid upgrade but before
} } >the OS upgrade?
} }
} } Tricky. The installation is netbsd-6, and the disk array is well-filled, so
} } I'd have to attach a sata disk for the netbsd-5 system, plus rebuild the
} } packages installed. Might be worth a try, though, although as I said, the
} } netbsd-5 stability was not satisfactory.
} }
} } > If you export a non-raid disk, what kind of nfs performance do you get
} } >on the system when writing to that disk? (That will help you figure out if
} } >it's a raid problem or an OS problem.)
} }
} } Hm - I could try that quickly tomorrow. Might well give some information.
} }
} } OTOH, the bonnie++ run appears to have cleared the disk subsystem.
} }
} } > I realize that I have more questions than answers, but I think the
} } >question you're trying to answer is, what changed? The answer is, I think,
} } >something about the raid system.
} }
} } Nothing really changed, that's what baffles me. The machine rebooted after
} } a panic at some point, and came up with bad performance. I had updated the
} } kernel in that context, but as I mentioned, the old kernel gave the same
} } result.
} }
} } At this point, I am very much banging my head against a wall, so questions,
} } by forcing me to re-think the situation from the ground up, are quite
} } helpful.
} }
} } hauke
} }
} }
} }
} }
} }
} } --
} } "It's never straight up and down" (DEVO)
} }
} }
} >-- End of excerpt from Hauke Fath
}
}
>-- End of excerpt from Brian Buhrow