My colleagues and I in a university high-performance computing lab will soon attempt to install Gentoo on a dual Opteron workstation that will run visualizations on a GeoWall (see http://www.geowall.org), and on an additional dozen Athlon64 workstations in the same lab. I'm posting the specs below because I'm interested in whether anyone knows of problems I'm likely to run into (despite trying to avoid hardware known to have trouble with Gentoo). We will be glad to report on our installation experience after we've done it. We wonder if (and hope that) Gentoo 2004.1 with official AMD64 support is just around the corner.

Which Tyan motherboard are you using? They have model numbers, like 2885. Generally speaking, I've seen the Arima dual Opterons running better than the dual Opteron Tyan boards; fewer problems in general. Also, if you're running DDR333 or DDR400, it pays to use vendor-approved memory. The cheap stuff can get you into trouble; it's worth the extra $20/DIMM or whatever the difference is.

Regarding the 3ware card, are you planning on RAID 5? If you have the resources, do a benchmark between RAID 5 through the 3ware card and JBOD on the 3ware card with software RAID 5. My experience was that software RAID was almost twice as fast. We tried the 8506 with eight, six, and four 200GB SATA drives. That benchmark was actually done on a Xeon machine, but you should see similar results on the dual Opteron.
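If you want to reproduce that comparison, a rough sequence looks like this (a sketch only: the device names, mount point, and sizes are assumptions for whatever your setup is):

```shell
# Read-throughput sketch for hardware RAID5 vs. JBOD + software RAID5.
# Device names are assumptions; adjust for your system, and average
# several runs, since single passes are noisy.
#
#   hdparm -tT /dev/sda     # 3ware hardware RAID5, exported as one unit
#   hdparm -tT /dev/md0     # software RAID5 built over the JBOD drives
#
# For writes, time a sequential write larger than RAM, e.g.:
#   dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=8192

# Small helper to turn a byte count and elapsed seconds into MB/s.
mb_per_sec() {
    bytes=$1
    secs=$2
    echo $(( bytes / secs / 1048576 ))
}

mb_per_sec 8589934592 60    # 8 GiB written in 60 s
```

Run each configuration with the same disks and the same test file size, otherwise the numbers don't compare.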

And BTW, 4GB is a lot of RAM, but it's not a monster machine anymore. The current quad Opteron from Celestica supports 32GB if you use 2GB DIMMs.

Quote:

Regarding the 3ware card, are you planning on RAID 5? If you have the resources, do a benchmark between RAID 5 through the 3ware card and JBOD on the 3ware card with software RAID 5. My experience was that software RAID was almost twice as fast. We tried the 8506 with eight, six, and four 200GB SATA drives. That benchmark was actually done on a Xeon machine, but you should see similar results on the dual Opteron.

Interesting. At my department we are buying one, possibly two, dual 2GHz Opterons with 6GB of memory and 30 250GB hard drives (6TB RAID 5) attached via four 8506 cards. They are going to be used for data analysis with large (100GB+) data sets, so we need every MB/s we can get. This is a purely I/O-bound application.
Do you think it would still pay off to use a JBOD configuration, or does that take up too many CPU cycles with 30 drives? I was wondering whether a two-layered structure would be optimal for performance, i.e. use hardware RAID 0 and combine the RAID sets of the different controllers using software RAID 5, effectively giving RAID 50. That way a data read (typically a 2-4GB chunk) would be spread out over all the controllers and PCI buses.
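A layered setup like that can be written down with the raidtools this thread already uses (raidtab/mkraid). Here is a sketch of an /etc/raidtab for the software RAID 5 layer, assuming each controller exports its hardware RAID 0 set to Linux as /dev/sda through /dev/sdd (device names and chunk size are assumptions, not something from this thread):

```
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           4
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              64

    # each "disk" below is one controller's hardware RAID 0 set
    device                  /dev/sda
    raid-disk               0
    device                  /dev/sdb
    raid-disk               1
    device                  /dev/sdc
    raid-disk               2
    device                  /dev/sdd
    raid-disk               3
```

After writing the file, running mkraid /dev/md0 builds the array; watch /proc/mdstat while it reconstructs.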

Do you have any links to the test results?
What would be the optimal filesystem: XFS? ReiserFS? (Reiser4?)

Quote:

Interesting. At my department we are buying one, possibly two, dual 2GHz Opterons with 6GB of memory and 30 250GB hard drives (6TB RAID 5) attached via four 8506 cards. They are going to be used for data analysis with large (100GB+) data sets, so we need every MB/s we can get. This is a purely I/O-bound application.

If this work can be spread across more than one machine, your best performance might come from limiting yourself to 15 or 16 disks per box. Is all this stuff going in a 6U chassis?!

Two 8506-8s on one PCI bus is enough strain; I don't know what would happen with four. I know that a pair of channel-bonded Myrinet cards is enough to nearly chew up the bus capacity in a PC architecture.

haugboel wrote:

Do you think it would still pay off to use a JBOD configuration, or does that take up too many CPU cycles with 30 drives? I was wondering whether a two-layered structure would be optimal for performance, i.e. use hardware RAID 0 and combine the RAID sets of the different controllers using software RAID 5, effectively giving RAID 50. That way a data read (typically a 2-4GB chunk) would be spread out over all the controllers and PCI buses.

I envision a few problems in this scenario. First of all, definitely put labels on both ends of every cable you use. Second, what about heat? How are you going to cool this monster set of drives? And finally, your question about CPU cycles: the real answer is that I don't know your application. You say it's I/O-bound rather than CPU-bound, so you'll have to do your own benchmarking to get the answer. Even if your performance is okay, think about the capacity you're losing:
30 x 250GB drives = 7500GB raw.
For RAID 5, every member has to be treated as the size of the smallest one. With 30 drives across four controllers, some controllers would have 7 drives and others 8; if you set each controller to RAID 0, they report to Linux as 7x250 = 1750GB or 8x250 = 2000GB.
With RAID 5 you get N-1 members' worth of capacity, and you also throw away the extra capacity of the 8-disk controllers (what I'm saying here is: buy two more disks and give yourself 8 disks per controller). On top of that you lose a full controller's worth of capacity to parity:
7 x 250 x (4-1) = 5250GB, which is not the same as 6TB.

Then you format this with your FS of choice, and lose even more.
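The arithmetic above, spelled out as shell arithmetic so you can redo it with your own drive counts (numbers taken from this post; nothing here is measured):

```shell
# Capacity math for 30 x 250GB drives behind 4 controllers, with
# hardware RAID0 per controller and software RAID5 across controllers.
raw=$(( 30 * 250 ))          # 7500 GB of raw disk
small=$(( 7 * 250 ))         # smallest controller set: 7 drives striped
big=$(( 8 * 250 ))           # largest controller set: 8 drives striped

# Software RAID5 treats every member as the size of the smallest,
# and one member's worth of space goes to parity:
usable=$(( small * (4 - 1) ))
echo "raw=${raw}GB usable=${usable}GB lost=$(( raw - usable ))GB"
```

That 2250GB of loss, before the filesystem even takes its cut, is the point of the argument above.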

haugboel wrote:

Do you have any links to the test results?
What would be the optimal filesystem: XFS? ReiserFS? (Reiser4?)

No links to the test results, but you can replicate everything I did yourself. Put the drives in, configure them how you want (press Alt-3 at boot, edit /etc/raidtab, run mkraid), then use multiple passes of hdparm, or even try out sample data with your actual application.

I'm personally prejudiced towards Reiser3. It's very solid, and how much trouble are you going to be in if your data gets toasted? I don't think Reiser4 is quite ready for production data. XFS is supposed to have some performance advantages over ReiserFS in certain tests; you should try both to see which gets better raw numbers on your data and your read/write patterns. One warning I've heard from others is that if you do choose XFS, keep your box on a UPS, because it doesn't recover from disaster very well. Again, this is just second-hand; I've never played with XFS.

Quote:

If this work can be spread across more than one machine, your best performance might come from limiting yourself to 15 or 16 disks per box. Is all this stuff going in a 6U chassis?!

We are going to put it in a tower (or should I call it a cube?). We found one rackmount chassis that was big enough, but it was from a different supplier, so we have to stick with a tower. The problem is that we need 6GB of memory to keep our data sets in memory, so it's not only the hard disks that cost money: the rest of the hardware actually weighs in at around 1/3 of the price.
Therefore we try to stick as many hard disks in one chassis as possible. Also, it is easier to administer one big box than two small ones.

backebergd wrote:

Two 8506-8s on one PCI bus is enough strain; I don't know what would happen with four. I know that a pair of channel-bonded Myrinet cards is enough to nearly chew up the bus capacity in a PC architecture.

The Tyan S2880GNR motherboard we will use has two PCI-X buses, so there should be enough bandwidth. Depending on how it goes, we may opt for a quad Opteron for the second system, if one comes with three or four PCI buses.

backebergd wrote:

haugboel wrote:

Do you think it would still pay off to use a JBOD configuration, or does that take up too many CPU cycles with 30 drives? I was wondering whether a two-layered structure would be optimal for performance, i.e. use hardware RAID 0 and combine the RAID sets of the different controllers using software RAID 5, effectively giving RAID 50. That way a data read (typically a 2-4GB chunk) would be spread out over all the controllers and PCI buses.

I envision a few problems in this scenario. First of all, definitely put labels on both ends of every cable you use. Second, what about heat? How are you going to cool this monster set of drives? And finally, your question about CPU cycles: the real answer is that I don't know your application. You say it's I/O-bound rather than CPU-bound, so you'll have to do your own benchmarking to get the answer.

The case is designed to be a file server, so cooling should be OK. We get the system already assembled and stress-tested, and it is going to sit in a server room with wind-tunnel-like ventilation.
After doing some research on the topic (i.e. googling) I found this link: http://home.fnal.gov/~yocum/storageServerTechnicalNote.html which shows that:
- it is not trivial to get full performance
- a combination of hardware + software RAID is indeed the best
After thinking about it myself, I have reached the conclusion that maybe the best way to partition the system is to make two RAID 5 arrays of 4 disks each on the first two controllers and one such array on each of the third and fourth, then make a RAID 0 scratch disk out of the remaining 6 disks (3 each on controllers 3 and 4). Then use software RAID 0 (which should be less stressful CPU-wise than software RAID 5) to combine the different arrays. We would end up with 6x750GB RAID 5 + 2x750GB RAID 0, i.e. 6TB of raw drive space as 4.5TB RAID 5 + 1.5TB RAID 0, and the redundancy isn't bad either. We have a 42-disk SCSI array partitioned into 7 RAID 5 sets of 6 disks and have never had a fatal error. The only thing that makes me nervous is the stability of software RAID 0. The system is on a UPS.
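For what it's worth, the capacity numbers in that layout check out. A quick shell sanity check (drive counts taken from the post above, nothing measured):

```shell
# Six 4-disk RAID5 sets, each losing one disk to parity, combined
# with software RAID0, plus the six leftover disks as RAID0 scratch.
raid5_total=$(( 6 * (4 - 1) * 250 ))   # six sets of 3 x 250GB usable
scratch=$(( 6 * 250 ))                 # six disks, plain stripe
echo "RAID5 volume: ${raid5_total}GB, scratch: ${scratch}GB"
```

That matches the 4.5TB + 1.5TB split: only the parity disks (6 x 250GB) are lost out of the 7.5TB raw.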
We are limited to 30 disks by the number of hot-swap cages we can fit in the system: 10 cages of three disks each, but maybe we can work around that with a dirty hack.

backebergd wrote:

No links to the test results, but you can replicate everything I did yourself. Put the drives in, configure them how you want (press Alt-3 at boot, edit /etc/raidtab, run mkraid), then use multiple passes of hdparm, or even try out sample data with your actual application.

I'm personally prejudiced towards Reiser3. It's very solid, and how much trouble are you going to be in if your data gets toasted? I don't think Reiser4 is quite ready for production data. XFS is supposed to have some performance advantages over ReiserFS in certain tests; you should try both to see which gets better raw numbers on your data and your read/write patterns. One warning I've heard from others is that if you do choose XFS, keep your box on a UPS, because it doesn't recover from disaster very well. Again, this is just second-hand; I've never played with XFS.

Mentioning Reiser4 was just for the hell of it... After seeing the tests in the link above and other places, I think we will try out both XFS and ReiserFS and see what works best. The box should arrive about 3 weeks from now, but we want to be well prepared.

Quote:

Which Tyan motherboard are you using? They have model numbers, like 2885. Generally speaking, I've seen the Arima dual Opterons running better than the dual Opteron Tyan boards; fewer problems in general. Also, if you're running DDR333 or DDR400, it pays to use vendor-approved memory. The cheap stuff can get you into trouble; it's worth the extra $20/DIMM or whatever the difference is.

Regarding the 3ware card, are you planning on RAID 5? If you have the resources, do a benchmark between RAID 5 through the 3ware card and JBOD on the 3ware card with software RAID 5. My experience was that software RAID was almost twice as fast. We tried the 8506 with eight, six, and four 200GB SATA drives. That benchmark was actually done on a Xeon machine, but you should see similar results on the dual Opteron.

And BTW, 4GB is a lot of RAM, but it's not a monster machine anymore. The current quad Opteron from Celestica supports 32GB if you use 2GB DIMMs.

Thanks for the feedback.

The board we're using is the Tyan 2885 "K8W".

Yes, we are intending to do RAID 5. Our rationale in going with the 3ware card was that we didn't want to burden the Opterons with parity calculations and I/O housekeeping (we want them to concentrate on live simulations), and 3ware seems to have a particularly good commitment to Linux (and the Gentoo AMD64 tech notes say that the card works). Also, RAID 5 provides us a sort of backup scheme, as we have no other; our apps aren't mission-critical or anything, but half a terabyte of data is not easy to back up any other way without spending many extra dollars. Also, the 8-port 3ware card will let us grow our in-box data storage larger than the onboard software RAID would have allowed. We'll see whether the 3ware card investment (about 8% of the entire system cost) was worth it.

And I agree, 4GB isn't ridiculously huge these days (and we can upgrade to 8GB since we're using 1GB sticks; 2GB sticks are WAY too expensive to consider now). We do expect some of our visualization apps to make use of all the memory we can throw at them. And yes, we're using approved memory, which our vendor (www.reasonco.com) will stress-test for us anyway (but we'll be doing the Gentoo install ourselves).

Oh no... your 8506-8 configuration seems identical to what we've ordered. I must look into this immediately and see if I can have my vendor ask 3ware about it. I made my storage subsystem choice partly based on the fact that the Gentoo AMD64 Tech Notes by Brad House at http://dev.gentoo.org/~brad_mssw/amd64-tech-notes.html said the 3ware 85xx controllers were known to work, and I also took to heart what Mr. House said about software vs. hardware RAID. But if there's a conflict between the 3ware boards and Tyan's boards, then I need to solve it or find another solution.

Can I ask (being a RAID noob): how bad, qualitatively, is the performance you experience in the "working" configuration with the board attached to PCI-X bus 2?

I will certainly share any info I can track down on this.

-- Ed

doerrfleischfee wrote:

Hello, may I ask whether you are encountering any problems with 3ware Escalade RAID controllers and the Tyan S2885 mobo?

My box has a 3ware 8506-8 attached to PCI-X bus B on a K8W, with four WD740GD disks in RAID 5.
While setting up this machine I have been confronted with the following problems:

If you check the whole archive(!) from November until now, the 3ware question comes up multiple times, and it seems a workaround was found. Furthermore, it should only be a problem for people with more than 3.5GB of memory. There are also problems with the AGP controller on the Tyan board; on the SUSE mailing list people upgrade to the latest (beta) BIOS. YMMV, but if I had known, I wouldn't have ordered a Tyan board.
Just my 2c - Troels

I just did and I'm very concerned. I will also try to get answers from Tyan and 3Ware.

This leads me to another question: is there another 8-port SATA controller out there that works with Gentoo under the 2.6.x kernel on dual Opterons on a Tyan K8W? I wonder if the Adaptec ones work without trouble.

Reading the 3ware 8506 / Tyan K8W incompatibility threads, I found a reference to the following recent note in 3ware's knowledge base! It seems to imply that the Escalade controllers won't work in Tyan dual Opteron workstation boards at all unless one uses a special riser card (not a good option for tower cases).

Quote:

FROM 3Ware's Knowledge Base:
Q10964 - Software Configuration: I am having trouble with 2880 and 2885 Tyan motherboards that support AMD Opteron chipsets.

3ware recommends installing the Escalade 7506/8506 series controllers with a 3ware recommended riser card and only in slots 1 or 2 on the Tyan S2880 and the Tyan S2885. These slots are located closest to the memory modules (DIMM sockets) on the motherboards. A 3ware recommended riser card from Adex is required (see attached tech brief). Please review the linked technical brief that describes issues that are known that affect interoperability between 3ware 66 MHz RAID controllers, and certain riser card and Motherboard combinations.

Placing a 66MHz 3ware Escalade RAID controller in slots 1 or 2 without a 3ware recommended riser card, or in slots 3,4 or 5 on these Tyan platforms may impact data integrity. The attached document describes these issues in more detail.

I will try early this week to get answers from 3ware and Tyan. Either I'm going to find indications soon that this problem will be solved imminently, or I'm going to have to select another ATA RAID controller (or another dual Opteron motherboard). (In case I have to do that, are there any recommendations for ATA RAID controllers that ARE known to simply work without problems on K8Ws running Gentoo AMD64 with 2.6.x kernels? Is this configuration just too bleeding-edge at the moment?)

The place where we bought our box (see above for the spec) specialises in 3ware and only sells Tyan motherboards, so for the moment they don't offer any Opteron solutions.
To me it sounds like a hardware problem (timing on the PCI bus), so I don't see how you can fix that in software. Anyway, we wanted to play it safe, and they got an MSI mainboard for us. It is not the optimal solution (it only has three PCI-X slots), so I would like to know if anybody knows of a mainboard that has 4 or more PCI-X slots on at least 2 buses and is compatible with 3ware. It has to be Opteron, because we need 64-bit addressing. A quad-CPU board would also be fine (if they can be had somewhere). The reason for asking is that, if everything works out fine, we will probably build one more box. I will try to post some performance data here whenever we get the box (approx. 3 weeks from now).

3ware has released its next generation of SATA RAID adapters. According to 3ware, the new 9500 series is almost twice as fast as the old ones, but I wasn't able to find any information about compatibility with Tyan boards...