Which of these (PVFS2, Lustre, GPFS) have some level of redundancy?
===============================================
David Coornaert (dcoorna at dbm.ulb.ac.be)
Belgian Embnet Node (http://www.be.embnet.org)
Université Libre de Bruxelles
Laboratoire de Bioinformatique
12, Rue des Professeurs Jeener & Brachet
6041 Gosselies
BELGIQUE
Tél: +3226509975
Fax: +3226509998
===============================================
DGS wrote:
>>Now that we have all 63 up and running it looks like we are
>>getting performance issues with NFS much in the same way
>>that others have reported here. Even moderate job loads
>>produce trouble (nfsstat -c shows lots of retransmissions),
>
>Are you using NFS over TCP? If not, you probably should. That
>introduces some reliability problems, in that NFS/TCP is no
>longer stateless. If the file server goes down, clients may
>hang. But since your file server is your head node, it's mostly
>a moot point. Lose the head node, and you lose the cluster
>anyway.
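For reference, forcing NFS over TCP is a mount option. A minimal sketch follows; the server name, export path, and mount point are placeholders, and exact option support varies by kernel and distribution:

```shell
# Mount an NFS export over TCP rather than UDP (server:/export and
# /mnt/export are placeholders).
mount -t nfs -o tcp,hard,intr server:/export /mnt/export

# Or as an /etc/fstab line:
# server:/export  /mnt/export  nfs  tcp,hard,intr  0 0
```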
>>
>>grid engine execds don't report back in so qhost shows nodes not
>>responding though eventually they will return. On occasion one of
>>the switches stops and that whole "side" of the cluster disappears.
>>so we reboot the switch and are back in action. Anyway here are my
>>questions (thanks for your patience in reading through this)
>>
>>Has anyone had similar problems with these SMC switches?
>>I'm not accustomed to having the switches die like this.
>>
>>In terms of improving NFS performance I've already
>>put SGE spool onto the local nodes to try to improve things
>>but it only helps a little. There are various NFS tuning
>>documents with respect to clusters (using tcp, atime, rsize,
>>wsize, etc options to mount). I've experimented with a few of
>>these (rsize, wsize) though with only very marginal positive impact.
>>For those with larger clusters and similar issues, have you found
>>a subset of these options to be more key or influential than others?
>
>If you use NFS/TCP, the "rsize" and "wsize" parameters are
>irrelevant. The Linux NFS how-to suggests raising the 'sysctl'
>values of "net.core.rmem_max" and "net.core.rmem_default" higher
>than their usual values of 64k. You should also pay attention
>to the number of 'nfsd' processes running on your server. The
>rule of thumb is eight per CPU. In principle, the more clients
>you have the more 'nfsd' processes you want. But multiple server
>processes contend for resources themselves, so you reach a point
>of diminishing returns in starting more.
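As a sketch of the tuning described above, assuming a Linux NFS server; the buffer size and thread count are illustrative assumptions to adjust per workload, not recommendations:

```shell
# Raise socket receive buffers above the ~64k defaults, per the
# Linux NFS how-to (262144 bytes = 256k is an illustrative value).
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.rmem_max=262144

# To persist the change, add to /etc/sysctl.conf:
# net.core.rmem_default = 262144
# net.core.rmem_max = 262144

# Start more nfsd threads; the rule of thumb above is roughly eight
# per CPU, so 16 for a two-CPU server:
rpc.nfsd 16
```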
>>
>>One scenario that has been discussed is bonding two NICs
>>on the v40z in conjunction with switch trunking. Does anyone
>>have any opinions or ideas on this ?
>
>If your switch can trunk, go ahead. I trunk together gigabit
>ethernet interfaces on a FreeBSD file server. I've heard rumours
>to the effect that a four-way trunk on Linux can be slower than
>a two-way, due to problems in the bonding driver. Regard that
>as just hearsay, however, because I don't have any experience
>with such things on Linux. You might consider using jumbo
>frames, if your switches support that.
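A rough sketch of the two-NIC bonding plus jumbo-frame setup discussed above, assuming a reasonably recent Linux with iproute2; the interface names, bonding mode, and MTU are assumptions, and the switch ports must be trunked to match:

```shell
# Bond two gigabit NICs with LACP (802.3ad); eth0/eth1 are placeholders
# for your actual interfaces.
ip link add bond0 type bond mode 802.3ad
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up

# Enable jumbo frames only if every NIC and switch in the path
# supports a 9000-byte MTU.
ip link set bond0 mtu 9000
```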
>>
>>Lastly, is it even worth it to keep messing with NFS? Or should
>>we maybe go for GFS?
>
>There are a number of parallel or cluster file systems in
>addition to GFS, like PVFS2 (free), Lustre (sort of free),
>GPFS (free to universities), TeraFS (commercial), and Ibrix
>(commercial). They may not work well for hosting home
>directories, because they're not optimized for that sort
>of I/O load. They're also, in my experience, rather less
>than stable. We built a fifty node cluster with just GPFS,
>no NFS and very little local disk. The results were quite
>disappointing.
>
>File I/O is one of the major unsolved problems of cluster
>computing. Anybody who tells you otherwise is trying to
>sell you something.
>
>David S.
>
>_______________________________________________
>Bioclusters maillist - Bioclusters at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters