From timchipman at MailAndNews.com Tue Jan 1 15:09:45 2002
From: timchipman at MailAndNews.com (Tim Chipman)
Date: Tue, 1 Jan 2002 15:09:45 -0500
Subject: Query: PVM Setup on Scyld Beowulf?
Message-ID: <3C34813C@MailAndNews.com>
Can anyone give me pointers as to what is required to get PVM running
smoothly on a "typical" diskless-slave-node Beowulf setup (i.e., one
head node with a disk that netboots all diskless slave nodes)? [This is
using the most current "freebie" version of Scyld, as purchased from
CheapBytes.]
I am able to locate pvmd, start it, and monitor the status of PVM,
but I can't see:
- how to build a good setup file that will get the slave nodes working smoothly;
- where to put binaries on the head node so they are available to the
  slave nodes (via NFS shares, I guess?);
- how to set the environment variables required for PVM on the slave
  nodes (a sketch of a generic hostfile follows below).
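[For context: on a generic (non-Scyld) PVM installation, the pieces listed
above live in a hostfile plus a couple of environment variables. A minimal
sketch, with illustrative hostnames and paths:

    # PVM hostfile: one line per host; dx= says where pvmd lives,
    # ep= is the search path for task executables on that host
    node1  dx=/usr/share/pvm3/lib/pvmd  ep=/home/me/pvm3/bin/LINUX
    node2  dx=/usr/share/pvm3/lib/pvmd  ep=/home/me/pvm3/bin/LINUX

PVM_ROOT and PVM_ARCH are normally set in each account's shell startup
file on every node, and the console is started with "pvm hostfile".
Whether Scyld's diskless environment changes any of this is exactly the
question here.]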
Any pointers on this would certainly be greatly appreciated.
Thanks very much!
Tim Chipman
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Wed Jan 2 03:37:01 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Wed, 2 Jan 2002 09:37:01 +0100 (MET)
Subject: Dot (career update) (fwd)
Message-ID:
-- Eugen* Leitl leitl
______________________________________________________________
ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.leitl.org
57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3
---------- Forwarded message ----------
Date: Tue, 1 Jan 2002 18:33:07 -0500
From: Eirikur Hallgrimsson
To: FoRK at xent.com
Subject: Re: Dot (career update)
Atoms. I'm in the software-frosting of atoms business at the moment.
I get to play around with Free Software, but the revenue definitely comes
from selling iron.
I've been meaning to post a career-update to FoRK.
I'm at Microway, (www.microway.com), purveyors of fine number-crunching to
the trade. Microway has taken over the boards/processors/systems
business from the former Alpha Processors, Inc (now API Networks).
http://www.microway.com/pr/microway-api.html
The volume business is Intel/AMD dual processor rackmount compute engines,
but a surprising number of customers (a 200-box order from a state U the
other day) go for standard PC tower cases. My guess was that they think
they can re-use systems in tower cases more easily, but even the 1U boxes
come with motherboard video.
Microway actually makes Alpha mother and daughterboards. A 35-person
company with surface-mount and reflow machines, etc. It's a cute little
company, and doing really well. It's privately-held and mom and pop
still manage it personally.
They hired me in as a consultant because I knew Alpha processors and
Beowulf clustering from my gig doing an interactive demo of cluster
scaling for Compaq. I traveled to a can't-name national lab to install
the first big cluster with Microway-made motherboards.
Most orders are custom in some way. Particularly the software. I'm
currently working on putting together available cluster management and
monitoring tools to add to our standard Linux installs. "The market" for
clustered systems seems to be moving toward integrated systems with
management software and a GUI. Makes sense. I'm looking at what I can
put together from Free and Open Source bases to provide an offering that
is less spartan than just cloned Linux installs with the MPI/PVM parallel
programming libraries. Input very welcome. I'm thinking that we
shouldn't be doing a VA-Linux complex custom software solution. After
all, VA tanked and we are going strong with the present minimalist
approach. What I *think* we should do is to ship one of the free
cluster-on-a-CDROM configurations
http://rocks.npaci.edu/
www.openclustergroup.org/
with a little custom configuration and some of Microway's traditional
scripts, but I don't know of one that is complete enough. I may be better
off mixing and matching from the available components. Microway doesn't
need the killer feature (automatic remote installation of the OS on
clients) of these packages, because we'd probably just clone disks as we
do today. I'm not allergic to doing some actual development, but, as I
say, I don't know if it's justified. I would like to come up with a
pretty and functional GUI layered on top of the open tools. Maybe that
will be my background project.
My strawman cluster is presently Rocks-based, but I have to add a bunch
of stuff manually. What the customer gets with Rocks, though, is a
system where the sysadmin can say "reload node-93" and 93 will get a
complete reinstallation from the master. You update a directory on the
server and can reinstall all the nodes pretty easily.
This is a totally different work environment for me. I was working for a
small company (Ajilon/SQP) for a while, but mostly they hired me out back
to Compaq.
Eirikur
http://xent.com/mailman/listinfo/fork
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From per at computer.org Wed Jan 2 17:25:17 2002
From: per at computer.org (Per Jessen)
Date: Wed, 02 Jan 2002 23:25:17 +0100
Subject: beowulf list still alive ?
Message-ID: <3C3201A60004923F@mta1n.bluewin.ch> (added by postmaster@bluewin.ch)
All,
I haven't seen any mails from the list since Dec10 - also, I can't resolve
www.beowulf.org to anything.
Anyone listening ?
/Per
regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.
Windows 2001: "I'm sorry Dave ... I'm afraid I can't do that."
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Wed Jan 2 19:20:00 2002
From: becker at scyld.com (Donald Becker)
Date: Wed, 2 Jan 2002 19:20:00 -0500 (EST)
Subject: beowulf list still alive ?
In-Reply-To: <3C3201A60004923F@mta1n.bluewin.ch> (added by postmaster@bluewin.ch)
Message-ID:
On Wed, 2 Jan 2002, Per Jessen wrote:
> I haven't seen any mails from the list since Dec10 - also, I can't resolve
> www.beowulf.org to anything.
Yes, the list is still alive, but the nameserver problems with
beowulf.org are continuing.
For some people that means that messages from beowulf at beowulf.org are
dropped by spam filters. Unfortunately those are exactly the people
that should be getting this message...
Hopefully this will all be cleared up by tomorrow, although with Network
Solutions the resolution is unpredictable.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From senthilk at engin.umich.edu Wed Jan 2 20:53:04 2002
From: senthilk at engin.umich.edu (Senthil Kandasamy)
Date: Wed, 02 Jan 2002 20:53:04 -0500
Subject: Scyld Beowulf / Diskless nodes / Installation trouble
Message-ID: <5.1.0.14.0.20020102205242.00a9e360@mail.alum.rpi.edu>
Hi Guys,
Hopefully someone can help me out.
First of all, I am a Chemical Engineer/Biophysicist who is fairly familiar
with Linux.
I am trying to install/fix Beowulf on a cluster recently purchased by our
research group.
This cluster was bought before I joined the group, and Scyld Beowulf had
been installed on it (improperly).
Since no one else in our group was interested in parallel computing, nobody
had noticed that although one could send computational jobs to
the individual nodes, the cluster could not handle parallel jobs on
multiple nodes ("could not connect to host" is the error I get when I run
mpirun).
We have 1 master + 15 diskless nodes, all dual processors.
Scyld Beowulf (without the support, i.e. the $2 version) has been
installed on it.
However, I suspect that the NFS mounting of the individual nodes has not
been done correctly.
Since I do not have any documentation (I could not find any on the
installation disk) on how to set up diskless nodes, I am kind of helpless.
The resources on the net and newsgroups have not been very helpful.
I tried to reinstall the Scyld/Red Hat CD on the cluster, but the setup
process never really seems to be concerned with NFS mounting.
Once the setup is finished, the nodes are up and running and can handle
individual jobs using bpsh.
But I can never connect to the nodes when I try to run a parallel job using
mpirun.
Is there any definitive (and up-to-date) documentation/howto on how to
install a diskless Beowulf cluster?
Any help would be greatly appreciated. It just kills me to see ~30 GFlops
just sitting there unutilized while I try to find computer time on other
supercomputers.
Thanks.
Senthil
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Wed Jan 2 23:12:18 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 2 Jan 2002 23:12:18 -0500 (EST)
Subject: beowulf list still alive ?
In-Reply-To: <3C3201A60004923F@mta1n.bluewin.ch> (added by postmaster@bluewin.ch)
Message-ID:
On Wed, 2 Jan 2002, Per Jessen wrote:
> All,
>
> I haven't seen any mails from the list since Dec10 - also, I can't resolve
> www.beowulf.org to anything.
>
> Anyone listening ?
Sure. I think we are waiting for Scyld's nameservice problems (caused
more by networksolutions.com's general incompetence than by anything
else) to resolve, maybe by tomorrow.
Then we'll likely all get a deluge of messages, and be properly grateful
for a two week hiatus, cleverly timed to run over the winter
holidays...;-)
rgb
>
> /Per
>
>
> regards,
> Per Jessen, Zurich
> http://www.enidan.com - home of the J1 serial console.
>
> Windows 2001: "I'm sorry Dave ... I'm afraid I can't do that."
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From pesch at attglobal.net Thu Jan 3 09:22:54 2002
From: pesch at attglobal.net (pesch at attglobal.net)
Date: Thu, 03 Jan 2002 09:22:54 -0500
Subject: beowulf list still alive ?
References:
Message-ID: <3C34693E.F8A2306F@attglobal.net>
I seem to be getting many mails twice - like this one...
"Robert G. Brown" wrote:
> On Wed, 2 Jan 2002, Per Jessen wrote:
>
> > All,
> >
> > I haven't seen any mails from the list since Dec10 - also, I can't resolve
> > www.beowulf.org to anything.
> >
> > Anyone listening ?
>
> Sure. I think we are waiting for Scyld's nameservice problems (caused
> more by networksolutions.com's general incompetence than by anything
> else) to resolve, maybe by tomorrow.
>
> Then we'll likely all get a deluge of messages, and be properly grateful
> for a two week hiatus, cleverly timed to run over the winter
> holidays...;-)
>
> rgb
>
> >
> > /Per
> >
> >
> > regards,
> > Per Jessen, Zurich
> > http://www.enidan.com - home of the J1 serial console.
> >
> > Windows 2001: "I'm sorry Dave ... I'm afraid I can't do that."
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From haohe at me1.eng.wayne.edu Fri Jan 4 02:19:20 2002
From: haohe at me1.eng.wayne.edu (Hao He)
Date: Fri, 4 Jan 2002 2:19:20 -0500
Subject: Help needed: MPI problem
Message-ID: <200201040728.CAA25493@me1.eng.wayne.edu>
Hi, there.
Just installed mpich 1.2.0 and PGI compiler 3.2-4 on my cluster with RH7.1.
Each node has 2 Pentium III processors and 2 3Com 905 NICs.
When I run a simple program such as myname.c on the cluster, I sometimes get error messages like the following:
.....
My name is w3
p12_1012: p4_error: OOPS: semop lock failed
: 7241731
My name is w7
.....
This may happen more than once in a single test run.
It seems there is something wrong with the semaphore operations.
But why does this happen, and how do I resolve it?
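[The test program itself is not shown; judging from the output, a minimal
MPI program of roughly this shape would produce it. A sketch, not
necessarily the actual myname.c:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char name[MPI_MAX_PROCESSOR_NAME];
        int len;

        MPI_Init(&argc, &argv);
        /* report the hostname of the node this rank landed on */
        MPI_Get_processor_name(name, &len);
        printf("My name is %s\n", name);
        MPI_Finalize();
        return 0;
    }
]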
I appreciate your suggestion and help.
Best regards,
Hao He
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From gropp at mcs.anl.gov Fri Jan 4 08:52:26 2002
From: gropp at mcs.anl.gov (William Gropp)
Date: Fri, 04 Jan 2002 07:52:26 -0600
Subject: Help needed: MPI problem
In-Reply-To: <200201040728.CAA25493@me1.eng.wayne.edu>
Message-ID: <5.1.0.14.2.20020104074905.04965dc0@localhost>
At 02:19 AM 1/4/2002 -0500, Hao He wrote:
>Hi, there.
>
>Just installed mpich 1.2.0 and PGI compiler 3.2-4 on my cluster with RH7.1.
>Each node has 2 Pentium III processors and 2 3Com 905 NICs.
>When I running some simple program such as myname.c on the cluster,
>sometimes I got error messages look like following:
>
>.....
>My name is w3
>p12_1012: p4_error: OOPS: semop lock failed
>: 7241731
>My name is w7
>.....
Bug reports about MPICH should be sent to mpi-maint at mcs.anl.gov.
To answer your question, this means that MPICH received an error return
when attempting to use a semaphore. This could be because the pool of
available semaphores has been exhausted; you can try the command
mpich/sbin/cleanipcs to free them. You should also update to the most
recent version of MPICH, 1.2.2.3.
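[For the curious: on Linux 2.4, the kernel exposes System V semaphore
usage in /proc/sysvipc/sem (a header line, then one line per allocated
semaphore set); this is what tools like ipcs read. A quick sketch to
count the sets in use, assuming that file layout:

    /* count allocated SysV semaphore sets via /proc/sysvipc/sem */
    #include <stdio.h>

    int main(void)
    {
        char line[256];
        int n = -1;     /* start at -1 so the header line is not counted */
        FILE *fp = fopen("/proc/sysvipc/sem", "r");
        if (fp == NULL) {
            perror("/proc/sysvipc/sem");
            return 1;
        }
        while (fgets(line, sizeof line, fp) != NULL)
            n++;
        fclose(fp);
        printf("%d semaphore sets in use\n", n);
        return 0;
    }
]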
Bill
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sshealy at asgnet.psc.sc.edu Fri Jan 4 16:21:39 2002
From: sshealy at asgnet.psc.sc.edu (Scott Shealy)
Date: Fri, 4 Jan 2002 16:21:39 -0500
Subject: Network Characteristics and Applications
Message-ID:
I know the question I am about to ask has the usual answer "It depends" ....
but I was hoping that some people could answer with some specific experiences.
The Question... Which specific parallel applications/algorithms/problem
classes benefit significantly from bandwidth increases, decreased network
latency, or a combination of both?
I am trying to classify applications/algorithms/problem classes by whether
they would benefit significantly from network upgrades, with the following
options:
- 100 Mbit switched Ethernet running TCP/IP
- n-way channel-bonded 100 Mbit switched Ethernet running TCP/IP
- switched Gigabit Ethernet running TCP/IP
- Myrinet
I know the first three solutions don't offer much in the way of latency
improvements. So..
Which applications/algorithms/problem classes scale no matter what?
Which applications/algorithms/problem classes scale with increased bandwidth?
Which applications/algorithms/problem classes scale with decreased latency?
Which applications/algorithms/problem classes scale with decreased latency
and increased bandwidth?
I have read a lot of theory on this, and a bunch of stuff by
Foster etc., and can make statements like "an application that uses
frequent short messages will benefit from a lower-latency network." I can
write formulas that use Tc and Ts and pull ratios out the wazoo... But I
am looking for real-world applications that people have experience with and
can share details on, or who can infer some of the ratios for me...
Also if you know of applications/algorithms/problem classes that don't scale
at all that would be of interest too...
Also if anyone knows a general reference that discusses this... that would be
of great interest too.
Thanks for any experience you are willing to share...
Scott Shealy
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Fri Jan 4 17:01:53 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Fri, 4 Jan 2002 17:01:53 -0500
Subject: Network Characteristics and Applications
In-Reply-To: ; from sshealy@asgnet.psc.sc.edu on Fri, Jan 04, 2002 at 04:21:39PM -0500
References:
Message-ID: <20020104170153.A11190@wumpus.foo>
Yea! beowulf.org is back!
> The Question... Which specific parallel applications/algorithms/problem
> classes benefit significantly from bandwidth increases,decreased network
> latency or a combination of both?
Here are some gross generalizations that might help:
With most algorithms, the less data per cpu, the more bandwidth and
latency count. So runs with lots of cpus, or smaller datasets, are
harder.
Example: Climate modeling generally involves running a relatively
coarse grid for a large number of timesteps. It's hard to get a good
speedup unless you have a really great machine, and so there was some
brouhaha recently about how the US needed to buy (Japanese) vector
machines for this problem. (However, I don't think this is the case;
the climate people simply need to use best practices with MPI.)
Example: QCD, quantum chromodynamics. QCD computes on a 4 dimensional
grid. Sometimes people want to compute large grids, sometimes
small. Less data on a node means relatively more communications and
lower required latencies. Steve Gottlieb has a theoretical slide
demonstrating this:
http://physics.indiana.edu/~sg/utah/performance_model.html
If you want to build a QCD machine that sustains 10 TFlop/s over a
wide range of grid sizes, this is a hard problem. For example, if I
have a 200 MF/s sustained processor, I can get to a local grid size of
4^4 using Myrinet and 12^4 using fast ethernet. 12^4 is so large of a
grid that it isn't so useful for fast computations.
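[The grid-size dependence is the usual surface-to-volume argument: for a
local 4-dimensional grid of edge L on each node, roughly

    T_comp \propto L^4,   T_comm \propto L^3,   so   T_comm / T_comp \propto 1/L

i.e. shrinking the local grid raises the relative communication cost,
which is why the slower network needs the larger (12^4) local grid to
stay efficient.]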
Example: Weather forecasting. Similar to climate, but there are
multiple kinds of forecasts: regional, national, global, each with
more data. The regional forecast is *hardest* to speed up because it
has the least data. You can get a speedup of say 8x today with fast
ethernet before you hit a wall. But if you're doing global forecasts,
you can get much bigger. The 10x number comes from an experiment that
the Utah people did for their upcoming Olympic forecasts. Meanwhile,
while doing the FSL bid, I computed that an extra 100 usec of latency
wouldn't hurt their 40km national forecast at all, and the average
bandwidth needed was 1/3 gigabit/sec, at 40-odd cpus.
2) With other algorithms, the range of data sizes people want to use
is in a fairly linear area of performance on some hardware. One
example of this is CHARMM on the Cray T3E, which has a great
interconnect (and a slow processor) by today's standards.
I actually built a little tool using the MPI profiling interface which
does some gross computations of compute/comm ratios. I'd like to turn
it into a tool usable by the community; would anyone like to volunteer
to help? With such a tool you could take existing MPI codes and find
out how they behave in practice.
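[The profiling interface in question is standard MPI: every call also
exists under a PMPI_ shadow name, so a wrapper library linked ahead of
the MPI library can time the MPI_ entry points. A bare-bones sketch of
the idea, not Greg's actual tool:

    /* commratio.c: crude comm/compute accounting via the MPI
     * profiling interface; a real tool would also wrap MPI_Recv,
     * MPI_Wait, the collectives, etc. */
    #include <stdio.h>
    #include <mpi.h>

    static double comm_time = 0.0, t_start = 0.0;

    int MPI_Init(int *argc, char ***argv)
    {
        int rc = PMPI_Init(argc, argv);
        t_start = PMPI_Wtime();
        return rc;
    }

    int MPI_Send(void *buf, int count, MPI_Datatype type, int dest,
                 int tag, MPI_Comm comm)
    {
        double t0 = PMPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        comm_time += PMPI_Wtime() - t0;
        return rc;
    }

    int MPI_Finalize(void)
    {
        double total = PMPI_Wtime() - t_start;
        printf("comm: %.2f s of %.2f s total (%.1f%%)\n",
               comm_time, total, 100.0 * comm_time / total);
        return PMPI_Finalize();
    }
]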
greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From steve at helix.nih.gov Sat Jan 5 13:10:30 2002
From: steve at helix.nih.gov (Steve Fellini)
Date: Sat, 5 Jan 2002 13:10:30 -0500 (EST)
Subject: charmm scalability on 2.4 kernels
Message-ID: <200201051810.g05IAU65015061@helix.nih.gov>
Hi All,
Many of our users run charmm dynamics using Ewald/PME calculations,
which require 3D FFTs. When run in parallel on a cluster, these
calculations scale moderately well to around 2-4 nodes (4-8
processors) when run on either nodes on a Myrinet network, or on nodes
running on fast ethernet with Linux 2.2.16 kernels with Josip
Loncaric's tcp fix (see
http://biowulf.nih.gov/charmm-bench/image003.gif).
However for nodes running on ethernet with Linux 2.4.12 (i.e., without
the tcp fix) scalability is very poor - so much so that there's no
point in running on more than one node (see
http://biowulf.nih.gov/charmm-bench/image004.gif and image005.gif).
Has anyone successfully configured/tuned the 2.4 kernel to improve
scalability of parallel jobs with non-trivial communications?
Thanks,
Steve
--
Steven Fellini
Center for Information Technology
National Institutes of Health
steven.fellini at nih.gov
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From erayo at cs.bilkent.edu.tr Sat Jan 5 16:55:42 2002
From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa))
Date: Sat, 5 Jan 2002 23:55:42 +0200
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <200201051810.g05IAU65015061@helix.nih.gov>
References: <200201051810.g05IAU65015061@helix.nih.gov>
Message-ID:
On Saturday 05 January 2002 20:10, Steve Fellini wrote:
> Hi All,
>
> Many of our users run charmm dynamics using Ewald/PME calculations,
> which require 3D FFTs. When run in parallel on a cluster, these
> calculations scale moderately well to around 2-4 nodes (4-8
> processors) when run on either nodes on a Myrinet network, or on nodes
> running on fast ethernet with Linux 2.2.16 kernels with Josip
> Loncaric's tcp fix (see
> http://biowulf.nih.gov/charmm-bench/image003.gif).
>
> However for nodes running on ethernet with Linux 2.4.12 (i.e., without
> the tcp fix) scalability is very poor - so much so that there's no
> point in running on more than one node (see
> http://biowulf.nih.gov/charmm-bench/image004.gif and image005.gif).
>
This is rather surprising; I'd expect 2.4.x to actually improve upon 2.2.x.
It was said before on this list that 2.4 already incorporates the TCP fixes.
Has anybody encountered a similar situation before?
Thanks,
--
Eray Ozkural (exa)
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ostampflee at mac.com Sun Jan 6 01:21:04 2002
From: ostampflee at mac.com (Owen Stampflee)
Date: Sat, 5 Jan 2002 22:21:04 -0800
Subject: Beowulf Application Frameworks
Message-ID: <8FE91D86-026D-11D6-8C2C-00039357A560@mac.com>
Hi All,
I am a newcomer to the Beowulf world and am building a small 5-node
cluster using Macintosh LCIIs running NetBSD. The network is shared-media
10BASE-T. I don't expect much in the way of performance, but this is
just for fun.
I am looking for some theory behind message passing and load balancing
and general Beowulf application framework information. I intend to
develop applications in C++ if that makes any difference.
Thanks,
Owen Stampflee
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sun Jan 6 08:21:10 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 6 Jan 2002 08:21:10 -0500 (EST)
Subject: Beowulf Application Frameworks
In-Reply-To: <8FE91D86-026D-11D6-8C2C-00039357A560@mac.com>
Message-ID:
On Sat, 5 Jan 2002, Owen Stampflee wrote:
> Hi All,
>
> I am new comer to the Beowulf world and am building a small 5 node
> cluster using Macintosh LCIIs running NetBSD. The network consists of a
> 10BASE-T shared-media. I dont expect much for performance but this is
> just for fun.
>
> I am looking for some theory behind message passing and load balancing
> and general Beowulf application framework information. I intend to
> develop applications in C++ if that makes any difference.
There are various resources on http://www.phy.duke.edu/brahma, including
talks and an online book on cluster computing (that includes some theory
chapters). There are also links to many other useful sites, especially
the primary beowulf website (www.beowulf.org), the beowulf underground
site, repositories (netlib), and an online book on parallel computing.
Finally, there are some excellent texts out there on parallel computing
itself, and of course the book "How to Build a Beowulf" by the
builders of the original Beowulf cluster.
I'm not exactly sure what you are looking for, but hopefully you'll find
some of the above useful. If not, ask the list again with a bit more
detail -- that's why it's here;-)
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mudguy at speedfactory.net Sun Jan 6 12:35:11 2002
From: mudguy at speedfactory.net (Sam Harper)
Date: Sun, 6 Jan 2002 12:35:11 -0500
Subject: Firewire/IEEE-1394
Message-ID:
Has anybody considered or implemented the use of IEEE-1394 as a network
adapter? Will anybody consider it when the revised 1394b is implemented?
-Sam Harper
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rlatham at plogic.com Sun Jan 6 17:44:50 2002
From: rlatham at plogic.com (Rob Latham)
Date: Sun, 6 Jan 2002 17:44:50 -0500
Subject: charmm scalability on 2.4 kernels
In-Reply-To: ; from erayo@cs.bilkent.edu.tr on Sat, Jan 05, 2002 at 11:55:42PM +0200
References: <200201051810.g05IAU65015061@helix.nih.gov>
Message-ID: <20020106174450.C22936@otto.plogic.internal>
On Sat, Jan 05, 2002 at 11:55:42PM +0200, Eray Ozkural (exa) wrote:
> On Saturday 05 January 2002 20:10, Steve Fellini wrote:
> This is rather surprising, I'd expect 2.4.x to actually improve upon 2.2.x.
> It was said before on this list that 2.4 already incorporates TCP fixes. Has
> anybody met a similar situation before?
it was said before that the tcp fixes don't make any difference.
with 2.2 you'd see some fast times and then a few really really high
times. with 2.4 you get really consistent times that are a bit slower
than the times you'd get with 2.2 + the tcp fixes.
==rob
--
Rob Latham
Paralogic Inc.
EAE8 DE90 85BB 526F 3181
1FCF 51C4 B6CB 08CC 0897
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From klight at appliedthermalsciences.com Mon Jan 7 09:55:03 2002
From: klight at appliedthermalsciences.com (Ken Light)
Date: Mon, 7 Jan 2002 09:55:03 -0500
Subject: Network Characteristics and Applications
Message-ID:
Along these lines, any comments on the performance/scalability of CFD codes?
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Mon Jan 7 10:24:11 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Mon, 7 Jan 2002 10:24:11 -0500
Subject: Network Characteristics and Applications
In-Reply-To: ; from klight@appliedthermalsciences.com on Mon, Jan 07, 2002 at 09:55:03AM -0500
References:
Message-ID: <20020107102411.A22897@wumpus.foo>
On Mon, Jan 07, 2002 at 09:55:03AM -0500, Ken Light wrote:
> Along these lines, any comments on the performance/scalability of CFD codes?
2 of the 3 examples I gave were CFD: climate and weather.
My CFD customers often have huge datasets (10 Gbytes, say), and those
scale pretty big (50-100 nodes) with just fast ethernet. The weather
datasets are in the 60 Mbyte to 1 Gbyte range.
greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From tru at pasteur.fr Mon Jan 7 13:22:39 2002
From: tru at pasteur.fr (Tru)
Date: Mon, 7 Jan 2002 19:22:39 +0100
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <20020106174450.C22936@otto.plogic.internal>; from rlatham@plogic.com on Sun, Jan 06, 2002 at 05:44:50PM -0500
References: <200201051810.g05IAU65015061@helix.nih.gov> <20020106174450.C22936@otto.plogic.internal>
Message-ID: <20020107192239.A4682@xiii.bis.pasteur.fr>
On Sun, Jan 06, 2002 at 05:44:50PM -0500, Rob Latham wrote:
> On Sat, Jan 05, 2002 at 11:55:42PM +0200, Eray Ozkural (exa) wrote:
> > On Saturday 05 January 2002 20:10, Steve Fellini wrote:
>
> > This is rather surprising, I'd expect 2.4.x to actually improve upon 2.2.x.
> > It was said before on this list that 2.4 already incorporates TCP fixes. Has
> > anybody met a similar situation before?
>
> it was said before that the tcp fixes don't make any difference.
>
I think the bad speedup comes from dual vs. single CPU nodes,
given the parallel behaviour of CHARMM.
YMMV, but on single-CPU Athlon nodes over fast ethernet,
here is what we have:
#cpus    speedup (elapsed time)
  2       1.2
  4       2.0
  8       3.3
We still gain something although it is not good!
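[In parallel-efficiency terms, E(p) = S(p)/p, the table above works out to

    E(2) = 1.2/2 = 0.60,   E(4) = 2.0/4 = 0.50,   E(8) = 3.3/8 = 0.41

so efficiency falls steadily as nodes are added, but never collapses.]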
Best regards,
Tru
more details follow:
---------------------
100 steps run
CHARMM c27b4
LAM MPI 6.5.4 (recompiled)
redhat 7.1xfs (kernel 2.4.9-13SGI_XFS_1.0.2)
single cpu node 1.2GHz (9x133) athlon cpu
fast ethernet 3com 3c905C-TX
stock gnu compilers
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_16h53.out-x1:
Parallel load balance (sec.):
Node Eext Eint Wait Comm List Integ Total
0 134.1 137.1 0.0 0.0 60.6 5.0 336.8
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 5.80 MINUTES
CPU TIME: 5.81 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_15h01.out-x2:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
2 66.1 157.1 0.0 4.9 31.0 2.9 261.9
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 4.50 MINUTES
CPU TIME: 3.17 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_16h16.out-x2:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
2 66.1 157.0 0.0 4.9 33.8 2.9 264.7
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 4.60 MINUTES
CPU TIME: 3.26 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_14h38.out-x4:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
4 33.4 103.6 0.0 7.1 18.1 1.6 163.8
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 2.90 MINUTES
CPU TIME: 1.80 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_15h53.out-x4:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
4 33.4 102.9 0.0 7.3 18.0 1.6 163.1
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 2.88 MINUTES
CPU TIME: 1.80 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_14h22.out-x8:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
8 17.0 58.5 0.1 8.2 10.8 0.9 95.5
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 1.75 MINUTES
CPU TIME: 1.00 MINUTES
k7-9x133_c27b4-lam-6.5.4_milan_pme.07-01_15h38.out-x8:
PARALLEL> Average timing for all nodes:
Node Eext Eint Wait Comm List Integ Total
8 17.0 58.0 0.1 9.4 10.8 1.0 96.2
$$$$$ JOB ACCOUNTING INFORMATION $$$$$
ELAPSED TIME: 1.75 MINUTES
CPU TIME: 1.03 MINUTES
--
Dr Tru Huynh | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:tru at pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From fant at vpharm.com Mon Jan 7 13:41:25 2002
From: fant at vpharm.com (Andrew Fant)
Date: Mon, 07 Jan 2002 13:41:25 -0500 (EST)
Subject: aggregate cluster stats
Message-ID:
I am currently running a 118-processor cluster, with bigbrother and larrd to monitor
system status and gather performance and utilization data on a node by node basis.
However, my management is now requesting aggregate statistics, and a web page
showing load, etc, across the entire cluster.
Has anybody hacked something like this themselves? I would rather stick close to
bigbrother and larrd, just to simplify implementation, but I have been playing with
SGI's open source release of PCP, and I am not averse to switching to another
(free) solution if it can simplify the process.
Thanks for any suggestions,
Andy
--
Andrew Fant | | email: fant at vrtx.com
HPC Geek | | phone: (617)444-6100
Vertex Pharmaceuticals| Disclaimer: Who would be crazy |
Cambridge, MA 02139 | enough to claim these opinions? | ICBM: 42.35N 71.09W
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Mon Jan 7 14:01:34 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Mon, 7 Jan 2002 14:01:34 -0500
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <20020107192239.A4682@xiii.bis.pasteur.fr>; from tru@pasteur.fr on Mon, Jan 07, 2002 at 07:22:39PM +0100
References: <200201051810.g05IAU65015061@helix.nih.gov> <20020106174450.C22936@otto.plogic.internal> <20020107192239.A4682@xiii.bis.pasteur.fr>
Message-ID: <20020107140134.A23200@wumpus.foo>
On Mon, Jan 07, 2002 at 07:22:39PM +0100, Tru wrote:
> I think the bad speedup comes from dual VS single cpu nodes
> regarding parallel behaviour of CHARMM.
If so, that's easy enough to check: You can run only one process on a
dual cpu node, for benchmarking purposes. I don't think that's his
problem, though.
greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jon at minotaur.com Mon Jan 7 14:36:42 2002
From: jon at minotaur.com (Jon E. Mitchiner)
Date: Mon, 7 Jan 2002 14:36:42 -0500
Subject: Shared diskspace between nodes
Message-ID: <042a01c197b2$a2165530$0302a8c0@Roaming>
Greetings!
I presently run a 40-node cluster, dual 1GHz machines with a 20GB hard
drive in each system. This gives me roughly 15GB per machine (a safe
estimate) after the OS, installed programs, some data, etc., or roughly
600GB of space across the 40 nodes that I am not currently utilizing.
Right now, we are saving data on various nodes and moving it around when
space gets tight on a machine. This is getting time consuming, as some of us
have to look on different nodes to find out where our data is currently
residing. I am considering saving all directory names in a database and
then making a GUI interface via the web so it's easy to find the location of
data directories, rather than hunting for them (especially if someone moved
my directory to another machine without letting me know).
I am curious whether there is a program out there that might be able to use
the space that we are not utilizing -- such as linking the file space
between nodes so that I can set up a "large" data partition sharable by
all nodes. Some redundancy would be nice. I'm curious whether there is a
software solution (either GPL-licensed or commercial) to utilize the space
better.
Optimally, it would be nice to see all "shared" drives as one large
partition mounted on all nodes, with all the data handled by a daemon
or something like that.
Does anyone have any ideas, suggestions, or programs that might be able to
do something similar?
Thanks!
Regards,
Jon E. Mitchiner
Minotaur Technologies
http://www.minotaur.com
AOL IM [http://www.aol.com/aim] MinotaurT
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Mon Jan 7 14:40:16 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 7 Jan 2002 14:40:16 -0500 (EST)
Subject: aggregate cluster stats
In-Reply-To:
Message-ID:
On Mon, 7 Jan 2002, Andrew Fant wrote:
>
> I am currently running a 118 processor cluster, with bigbrother and larrd to monitor
> system status and gather performance and utilization data on a node by node basis.
> However, my management is now requesting aggregate statistics, and a web page
> showing load, etc, across the entire cluster.
>
> Has anybody hacked something like this themselves? I would rather stick close to
> bigbrother and larrd, just to simplify implementation, but I have been playing with
> SGI's open source release of PCP, and I am not adverse to switching to another
> (free) solution if it can simplify the process.
You can look at procstatd, available on http://www.phy.duke.edu/brahma.
The GUI it comes with is clunky and will soon be replaced, but the
daemon itself returns an easily parsed ascii packet, and you can chop it
up and turn it into anything you like -- realtime web page, GUI, ascii
report, graph(s).
I've been working hard on procstatd 2.0 for the last week (and hope to
finish it over the next week). This is a near-complete rewrite that
will make it easier to parse fields, easier to add new statistics to
monitor (from /proc or elsewhere), possible to add job-specific
tracking, and possible to request a raw dump of any /proc file. (Now it
only delivers "cooked" statistics but I'm adding a command that should
allow raw proc files or lines from files to be monitored).
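[As an illustration of the kind of "cooked" statistic such a daemon
derives from /proc: the per-node load average comes straight out of
/proc/loadavg. A minimal sketch, not procstatd's actual code:

    /* read the 1-minute load average from /proc/loadavg, whose
     * first three fields look like "0.24 0.18 0.11" */
    #include <stdio.h>

    double loadavg_1min(void)
    {
        double load = -1.0;     /* -1.0 signals an error */
        FILE *fp = fopen("/proc/loadavg", "r");
        if (fp != NULL) {
            if (fscanf(fp, "%lf", &load) != 1)
                load = -1.0;
            fclose(fp);
        }
        return load;
    }

    int main(void)
    {
        printf("load: %.2f\n", loadavg_1min());
        return 0;
    }
]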
Accompanying the new procstatd should be a new GUI (gwatchman) in native
C and GTk that will use the new simpler and more consistent field
schema. But be warned that although I have a functioning shell of
gwatchman cut, the guts will have to await the new procstatd API
(which it largely inspired). It might be a few weeks before all this is
out of pre-alpha.
rgb
>
> Thanks for any suggestions,
> Andy
>
>
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From siegert at sfu.ca Mon Jan 7 14:47:11 2002
From: siegert at sfu.ca (Martin Siegert)
Date: Mon, 7 Jan 2002 11:47:11 -0800
Subject: aggregate cluster stats
In-Reply-To: ; from fant@vpharm.com on Mon, Jan 07, 2002 at 01:41:25PM -0500
References:
Message-ID: <20020107114711.A17896@stikine.ucs.sfu.ca>
On Mon, Jan 07, 2002 at 01:41:25PM -0500, Andrew Fant wrote:
>
> I am currently running a 118 processor cluster, with bigbrother and larrd to monitor
> system status and gather performance and utilization data on a node by node basis.
> However, my management is now requesting aggregate statistics, and a web page
> showing load, etc, across the entire cluster.
>
> Has anybody hacked something like this themselves? I would rather stick close to
> bigbrother and larrd, just to simplify implementation, but I have been playing with
> SGI's open source release of PCP, and I am not adverse to switching to another
> (free) solution if it can simplify the process.
I am doing this using bigbrother and larrd. You only need to change the script
that calculates the load (probably bb-local.sh) to use ruptime instead of
uptime, and then simply add up the numbers. I don't know whether you want
to do fancier things, but I just have a bigbrother client running on the master
node and collect all data from the slave nodes using r-commands. The results
are sent to the webserver.
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
========================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From pdiaz88 at terra.es Mon Jan 7 16:33:02 2002
From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=)
Date: Mon, 7 Jan 2002 21:33:02 +0000
Subject: Shared diskspace between nodes
In-Reply-To: <042a01c197b2$a2165530$0302a8c0@Roaming>
References: <042a01c197b2$a2165530$0302a8c0@Roaming>
Message-ID: <02010721330203.02888@duero>
Maybe the Parallel Virtual File System (PVFS) could be a good start:
http://parlweb.parl.clemson.edu/pvfs/
From my experience it's pretty stable. There are others, like GFS
(http://www.sistina.com/products_gfs.htm), but I have not tried them yet.
Cheers
Pedro
On Monday 07 January 2002 19:36, Jon E. Mitchiner wrote:
> Greetings!
>
> I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
> system. This gives me roughly 15GB (safe estimate) after the OS, installed
> programs, some data, etc on each machine. This gives me roughly 600GB of
> space that I am not currently utilizing on 40 nodes.
>
> Right now, we are saving data on various nodes, and moving it around when
> space gets tight on a machine. This is getting time consuming as some of
> us have to look on different nodes to find out where your data is currently
> residing. I am considering saving all directory names in a database and
> then making a GUI interface via the web so its easy to find the location of
> data directories, rather than looking for it (especially if someone moved
> my directory to another machine without letting me know).
>
> I am curious if there is a program out there that might be able to utilize
> the space that we are not utilizing -- such as linking the file space
> between nodes so that way I can set up a "large" data partition sharable by
> all nodes. Some redunancy would be nice. Im curious if there is a
> software solution (either GPL licensed, or commercial) to utilize the space
> better.
>
> Optimally, it would be nice to see all "shared" drives as one large
> partition to be mounted to all nodes and all the data is handled by a
> daemon or something like that.
>
> Does anyone have any ideas, suggestions, or programs that might be able to
> do something similar?
>
> Thanks!
>
> Regards,
>
> Jon E. Mitchiner
> Minotaur Technologies
> http://www.minotaur.com
> AOL IM [http://www.aol.com/aim] MinotaurT
>
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
/*
* Pedro Diaz Jimenez: pdiaz88 at terra.es, pdiaz at acm.asoc.fi.upm.es
* http://acm.asoc.fi.upm.es/~pdiaz
*
* GPG KeyID: E118C651
* Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C65
*
*/
--
"Make crime pay. Become a Lawyer."
-- Will Rogers.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From math at velocet.ca Mon Jan 7 15:35:35 2002
From: math at velocet.ca (Velocet)
Date: Mon, 7 Jan 2002 15:35:35 -0500
Subject: Shared diskspace between nodes
In-Reply-To: <042a01c197b2$a2165530$0302a8c0@Roaming>; from jon@minotaur.com on Mon, Jan 07, 2002 at 02:36:42PM -0500
References: <042a01c197b2$a2165530$0302a8c0@Roaming>
Message-ID: <20020107153535.E73014@velocet.ca>
On Mon, Jan 07, 2002 at 02:36:42PM -0500, Jon E. Mitchiner's all...
> Greetings!
>
> I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
> system. This gives me roughly 15GB (safe estimate) after the OS, installed
> programs, some data, etc on each machine. This gives me roughly 600GB of
> space that I am not currently utilizing on 40 nodes.
>
> Right now, we are saving data on various nodes, and moving it around when
> space gets tight on a machine. This is getting time consuming as some of us
> have to look on different nodes to find out where your data is currently
> residing. I am considering saving all directory names in a database and
> then making a GUI interface via the web so its easy to find the location of
> data directories, rather than looking for it (especially if someone moved my
> directory to another machine without letting me know).
>
> I am curious if there is a program out there that might be able to utilize
> the space that we are not utilizing -- such as linking the file space
> between nodes so that way I can set up a "large" data partition sharable by
> all nodes. Some redunancy would be nice. Im curious if there is a software
> solution (either GPL licensed, or commercial) to utilize the space better.
>
> Optimally, it would be nice to see all "shared" drives as one large
> partition to be mounted to all nodes and all the data is handled by a daemon
> or something like that.
>
> Does anyone have any ideas, suggestions, or programs that might be able to
> do something similar?
raid 5 over nbd! :)
not sure how practical that is, but it works in theory.
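In rough strokes, and purely as an untested sketch (hostnames, the port,
and device names below are made up; assumes nbd-server/nbd-client and
mdadm are on hand), it would go something like:

  # on each node donating space: export a spare partition over TCP
  nbd-server 2000 /dev/hda3

  # on the node assembling the shared space: attach the remote disks
  nbd-client node1 2000 /dev/nbd0
  nbd-client node2 2000 /dev/nbd1
  nbd-client node3 2000 /dev/nbd2

  # stripe them into a RAID5 set, make a filesystem, share it out
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/nbd0 /dev/nbd1 /dev/nbd2
  mke2fs /dev/md0
  mount /dev/md0 /shared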
/kc
>
> Thanks!
>
> Regards,
>
> Jon E. Mitchiner
> Minotaur Technologies
> http://www.minotaur.com
> AOL IM [http://www.aol.com/aim] MinotaurT
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From pdiaz88 at terra.es Mon Jan 7 17:12:10 2002
From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=)
Date: Mon, 7 Jan 2002 22:12:10 +0000
Subject: Shared diskspace between nodes
In-Reply-To: <20020107153535.E73014@velocet.ca>
References: <042a01c197b2$a2165530$0302a8c0@Roaming> <20020107153535.E73014@velocet.ca>
Message-ID: <02010722121004.02888@duero>
I've tried it. It worked fine for trivial operations, but under heavy use it
was prone to locks. That was some time ago (maybe six months), so it could be
better now
Cheers
Pedro
On Monday 07 January 2002 20:35, Velocet wrote:
> On Mon, Jan 07, 2002 at 02:36:42PM -0500, Jon E. Mitchiner's all...
>
> > Greetings!
> >
> > I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
> > system. This gives me roughly 15GB (safe estimate) after the OS,
> > installed programs, some data, etc on each machine. This gives me
> > roughly 600GB of space that I am not currently utilizing on 40 nodes.
> >
> > Right now, we are saving data on various nodes, and moving it around when
> > space gets tight on a machine. This is getting time consuming as some of
> > us have to look on different nodes to find out where your data is
> > currently residing. I am considering saving all directory names in a
> > database and then making a GUI interface via the web so it's easy to find
> > the location of data directories, rather than looking for it (especially
> > if someone moved my directory to another machine without letting me
> > know).
> >
> > I am curious if there is a program out there that might be able to
> > utilize the space that we are not utilizing -- such as linking the file
> > space between nodes so that way I can set up a "large" data partition
> > sharable by all nodes. Some redundancy would be nice. I'm curious if
> > there is a software solution (either GPL licensed, or commercial) to
> > utilize the space better.
> >
> > Optimally, it would be nice to see all "shared" drives as one large
> > partition to be mounted to all nodes and all the data is handled by a
> > daemon or something like that.
> >
> > Does anyone have any ideas, suggestions, or programs that might be able
> > to do something similar?
>
> raid 5 over nbd! :)
>
> not sure how practical that is, but it works in theory.
>
> /kc
>
> > Thanks!
> >
> > Regards,
> >
> > Jon E. Mitchiner
> > Minotaur Technologies
> > http://www.minotaur.com
> > AOL IM [http://www.aol.com/aim] MinotaurT
--
/*
* Pedro Diaz Jimenez: pdiaz88 at terra.es, pdiaz at acm.asoc.fi.upm.es
* http://acm.asoc.fi.upm.es/~pdiaz
*
* GPG KeyID: E118C651
* Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C65
*
*/
--
If one morning I walked on top of the water across the Potomac River, the
headline that afternoon would read: PRESIDENT CAN'T SWIM.
-- Lyndon B. Johnson
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Mon Jan 7 16:22:21 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Mon, 7 Jan 2002 16:22:21 -0500
Subject: BWBUG meeting Tue, January 8
Message-ID: <20020107162221.A23510@wumpus.foo>
The next BWBUG (Baltimore/Washington Beowulf Users Group) meeting is
Tuesday January 8 (that's tomorrow), 2002, at 3pm, in the Logicon
Greenbelt auditorium.
The speaker will be Bill Carlson, and the topic is:
Unified Parallel C (UPC), and Beowulf Clusters
Since Bill didn't give me an abstract, I took the liberty of writing
one:
Unified Parallel C (UPC) is a compiler-based approach to explicit
parallel programming. Unlike most parallel programming languages and
libraries, the syntax for UPC's additions to C fits on a business
card. The underlying machine model exploited by UPC is termed SALC:
Shared Address, Local Consistency. This is exactly what the Cray T3E
provided. However, modern cluster interconnects such as Quadrics,
Myrinet, and even Ethernet with MVIA drivers, offer the capability of
supporting UPC, and in fact Compaq supports UPC on their Alpha SC
clusters.
See http://bwbug.org/ for directions and links to more info about UPC.
You can also sign up for a mailing list to hear about future meetings.
Our February meeting will be Tuesday Feb 12, at 3pm.
If you'd like to attend the meeting via videoconference, please
contact me.
-- greg (lindahl at conservativecomputer.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Mon Jan 7 16:34:30 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Mon, 7 Jan 2002 16:34:30 -0500
Subject: Shared diskspace between nodes
In-Reply-To: <042a01c197b2$a2165530$0302a8c0@Roaming>; from jon@minotaur.com on Mon, Jan 07, 2002 at 02:36:42PM -0500
References: <042a01c197b2$a2165530$0302a8c0@Roaming>
Message-ID: <20020107163430.A23531@wumpus.foo>
On Mon, Jan 07, 2002 at 02:36:42PM -0500, Jon E. Mitchiner wrote:
> Right now, we are saving data on various nodes, and moving it around when
> space gets tight on a machine. This is getting time consuming as some of us
> have to look on different nodes to find out where your data is currently
> residing. I am considering saving all directory names in a database and
> then making a GUI interface via the web so it's easy to find the location of
> data directories, rather than looking for it (especially if someone moved my
> directory to another machine without letting me know).
Other people have suggested PVFS, which is a bit less reliable than
what you have now. Another route to go is a simple perl script,
with an interface like:
mkdatadir foo 100M
This creates a symlink "foo" to diskspace somewhere on the cluster
that has 100 megabytes free. The script starts by checking the local
node, and if there isn't enough room there, it can randomly look
around the cluster for a node that's up, and create a symlink to the
appropriate NFS directory (assuming you have N^2 automounts.)
That's basically automating what you do today, and it's as reliable as
what you do today. BTW, if you move someone's files, leave behind a
symlink.
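A minimal sketch of such a script (done in shell rather than perl; the
node list, the /net/<node> automount paths, and the df parsing are all
illustrative, not a tested tool, and it takes the size as a plain number
of megabytes):

  #!/bin/sh
  # mkdatadir <name> <megabytes>: create a data dir on a node with
  # enough free space and leave a symlink to it in the current dir
  NAME=$1; NEED=$2
  for node in node01 node02 node03; do
      # column 4 of 'df -m' is free megabytes on that node's /data
      FREE=`df -m /net/$node/data | awk 'NR==2 {print $4}'`
      if [ "$FREE" -ge "$NEED" ]; then
          mkdir /net/$node/data/$NAME
          ln -s /net/$node/data/$NAME $NAME
          exit 0
      fi
  done
  echo "no node with ${NEED}MB free" >&2
  exit 1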
greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From scheinin at crs4.it Tue Jan 8 03:19:34 2002
From: scheinin at crs4.it (Alan Scheinine)
Date: Tue, 8 Jan 2002 09:19:34 +0100 (MET)
Subject: Shared diskspace between nodes
Message-ID: <200201080819.JAA04087@dylandog.crs4.it>
What about automount? When files move, only one master record
on a NIS server needs to be changed, and automount makes the
symbolic links on the fly. There was a long thread about NIS
not being able to handle many requests at one time, which might
be a drawback.
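For example, the maps might look like this (hostnames and paths are made
up, and the real thing would live in NIS maps rather than local files):

  # /etc/auto.master: mount data directories on demand under /data
  /data   /etc/auto.data

  # /etc/auto.data: one key per directory, pointing at its current home
  jon     -rw,hard,intr   node07:/export/data/jon
  sims    -rw,hard,intr   node12:/export/data/sims

When a directory moves, only its line in the map changes.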
Alan Scheinine scheinin at crs4.it
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From JIHAMO at uk.ibm.com Tue Jan 8 06:38:34 2002
From: JIHAMO at uk.ibm.com (Jens Ihamo)
Date: Tue, 8 Jan 2002 11:38:34 +0000
Subject: Shared diskspace between nodes
Message-ID:
Hi, have a look at AFS - it can be configured to do exactly what you described
in the scenario.
There's both the commercial IBM AFS with support etc., and the free OpenAFS
available.
Other options include Coda and Arla, but AFS seems to be considered mature
code and is the most widely used in the field (NCSA, CERN, etc.).
http://www.transarc.ibm.com/Product/EFS/index.html
http://www.ibm.com/developerworks/oss/
http://www.OpenAFS.org
Regards
Jens Ihamo
>From: "Jon E. Mitchiner"
>To:
>Subject: Shared diskspace between nodes
>Date: Mon, 7 Jan 2002 14:36:42 -0500
>
>Greetings!
>
>I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
>system. This gives me roughly 15GB (safe estimate) after the OS, installed
>programs, some data, etc on each machine. This gives me roughly 600GB of
>space that I am not currently utilizing on 40 nodes.
>
>...
>
>Jon E. Mitchiner
>Minotaur Technologies
>http://www.minotaur.com
>AOL IM [http://www.aol.com/aim] MinotaurT
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mack.joseph at epa.gov Tue Jan 8 07:56:34 2002
From: mack.joseph at epa.gov (Joseph Mack)
Date: Tue, 08 Jan 2002 07:56:34 -0500
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
Message-ID: <3C3AEC82.BC51393E@epa.gov>
I was surprised to see a beowulf at RDU airport on a recent departure.
Rather than a stack of commodity compute nodes, it is a fibreglass statue
of a wolf, about 4' long x 2' high. The wolf's coat has printed
circuit board patterns on it. Being somewhat mystified and
having initially assumed I had just a statue of a wolf in front of me,
I looked at the sign to see the title "Beowulf" with the name of the artist
and sponsor on it. The beowulf is on a platform edged by rectangular
circuit boards alternately on-end:on-side to give the profile of the
top of a castle wall.
The beowulf is just beyond the pick-up and drop-off area for terminal A.
Unfortunately it's in an area where most people are driving and trying
to avoid colliding with unpredictably moving objects; police, cars
pulling in and out and people on the crosswalks. Most people won't
notice the statue. I only saw it as I had time to kill and was going
for a walk outside the terminal.
The artist is Greg Carter and the sponsor is the Greater Raleigh Chamber
of Commerce.
A google search for Greg Carter finds
http://www.cyberpiggy.com/
with a link to more information and a picture of the statue at
http://www.cyberpiggy.com/beowulf/index.html
Except for the circuit board motifs, I would have regarded the
name "beowulf" on the statue as a co-incidence. Clicking on the
various photos on the website brings up text, and reveals that
the artist knows about beowulf computers.
Phone calls to the Chamber of Commerce have failed to get any
more information. They're on another line even at the early hours
of the morning (busy people) and haven't been able to get back
to me despite their earnest assurances that they would do so
at the earliest opportunity.
I've cc'ed Greg on this. Hi Greg, thanks for the statue.
Joe
--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph at epa.gov ph# 919-541-0007, RTP, NC, USA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at linuxpower.org Tue Jan 8 09:17:42 2002
From: agrajag at linuxpower.org (Jag)
Date: Tue, 8 Jan 2002 06:17:42 -0800
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
In-Reply-To: <3C3AEC82.BC51393E@epa.gov>; from mack.joseph@epa.gov on Tue, Jan 08, 2002 at 07:56:34AM -0500
References: <3C3AEC82.BC51393E@epa.gov>
Message-ID: <20020108061742.A12802@kotako.analogself.com>
On Tue, 08 Jan 2002, Joseph Mack wrote:
> I was surprised to see a beowulf at RDU airport on a recent departure.
>
> Rather than a stack of commodity compute nodes, it is a fibreglass statue
> of a wolf, about 4' long x 2' high. The wolf's coat has printed
> circuit board patterns on it. Being somewhat mystified and
> having initially assumed I had just a statue of a wolf in front of me,
> I looked at the sign to see the title "Beowulf" with the name of the artist
> and sponsor on it. The beowulf is on a platform edged by rectangular
> circuit boards alternately on-end:on-side to give the profile of the
> top of a castle wall.
>
> The beowulf is just beyond the pick-up and drop-off area for terminal A.
> Unfortunately it's in an area where most people are driving and trying
> to avoid colliding with unpredictably moving objects; police, cars
> pulling in and out and people on the crosswalks. Most people won't
> notice the statue. I only saw it as I had time to kill and was going
> for a walk outside the terminal.
As a resident of Raleigh, NC, let me try to shed some light on this. At
some point last year, statues of wolves similar to the one you saw were
put up all over Raleigh in public places. There are several on the NCSU
campus, as well as some downtown and in various other places in the city,
such as the airport. Each statue is a wolf, but they're all done with a
different theme. As I understand it, local artists were commissioned to
make these statues.
I really like the idea of one of these statues being devoted to beowulf
computing; I just wish it were in a better location. I drove by this
statue the other day and noticed it, but was unable to make out the
smaller wolves and thus realize its meaning.
Jag
From rmcgaugh at atipa.com Tue Jan 8 09:32:15 2002
From: rmcgaugh at atipa.com (Rocky McGaugh)
Date: Tue, 08 Jan 2002 08:32:15 -0600
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
References: <3C3AEC82.BC51393E@epa.gov>
Message-ID: <3C3B02EF.8010305@atipa.com>
Joseph Mack wrote:
> with a link to more information and a picture of the statue at
>
> http://www.cyberpiggy.com/beowulf/index.html
>
His site sure isn't very Mozilla friendly... :(
--
Rocky McGaugh
Atipa Technologies
rocky at atipatechnologies.com
rmcgaugh at atipa.com
1-785-841-9513 x3110
http://1087800222/
perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");'
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rmcgaugh at atipa.com Tue Jan 8 09:54:51 2002
From: rmcgaugh at atipa.com (Rocky McGaugh)
Date: Tue, 08 Jan 2002 08:54:51 -0600
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
References: <3C3AEC82.BC51393E@epa.gov> <3C3B02EF.8010305@atipa.com>
Message-ID: <3C3B083B.2090402@atipa.com>
Rocky McGaugh wrote:
> Joseph Mack wrote:
>
>
>> with a link to more information and a picture of the statue at
>>
>> http://www.cyberpiggy.com/beowulf/index.html
>>
>
>
> his site sure isnt very mozilla friendly...:(
>
Whoops! I over-reacted. After first having a very nasty Moz crash,
I went back and all looks to be working cool now.
Moz sometimes acts funny when you leave it open for weeks at a time... heh.
My apologies, Greg. :)
--
Rocky McGaugh
Atipa Technologies
rocky at atipatechnologies.com
rmcgaugh at atipa.com
1-785-841-9513 x3110
http://1087800222/
perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");'
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From josip at icase.edu Tue Jan 8 10:41:06 2002
From: josip at icase.edu (Josip Loncaric)
Date: Tue, 08 Jan 2002 10:41:06 -0500
Subject: charmm scalability on 2.4 kernels
References: <200201051810.g05IAU65015061@helix.nih.gov>
Message-ID: <3C3B1312.6B71388A@icase.edu>
"Eray Ozkural (exa)" wrote:
>
> It was said before on this list that 2.4 already incorporates TCP fixes. Has
> anybody met a similar situation before?
Actually, Linux kernel 2.4 incorporates an improved yet very different
TCP stack where my TCP fixes do not help much in my usual test case
(point-to-point streaming of small messages). However, Steve's
scalability problems with the stock 2.4 TCP are very interesting, since
they involve many machines, i.e. a completely different communication
pattern from my usual point-to-point tests.
Old 2.2+fix combination was pretty efficient at aggregating small
messages into larger TCP packets before sending, in fact better than
stock 2.4. Packet aggregation is something that depends on delicate
timing of congestion control events on both sender and receiver; this is
very sensitive to the application's communication pattern. Perhaps a
2.4 TCP fix would need to be developed after all...
Sincerely,
Josip
--
Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bogdan.costescu at iwr.uni-heidelberg.de Tue Jan 8 13:32:18 2002
From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Tue, 8 Jan 2002 19:32:18 +0100 (CET)
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <20020107140134.A23200@wumpus.foo>
Message-ID:
On Mon, 7 Jan 2002, Greg Lindahl wrote:
> On Mon, Jan 07, 2002 at 07:22:39PM +0100, Tru wrote:
>
> > I think the bad speedup comes from dual VS single cpu nodes
> > regarding parallel behaviour of CHARMM.
>
> If so, that's easy enough to check: You can run only one process on a
> dual cpu node, for benchmarking purposes.
That is actually what I have observed during the last 3 years of running
different versions of kernels, MPI libraries and CHARMM. Running using
only one transport (TCP or shared mem) is always better than mixing them,
e.g. (using LAM 6.5.6):
  CPUs   nodes   real time (min)   transports
    4      4          5.95         TCP
    4      2          7.08         TCP+USYSV
As you can see, the difference is quite significant.
With 8 single CPU nodes over FE, the scalability of PME goes down to 50%;
using Myrinet (with SCore), it's around 75% - so the algorithm is not
quite Beowulf-friendly. However, I haven't noticed any significant change
in scalability between runs with 2.2.x and 2.4.x kernels.
I obtained a behaviour similar to that in the graphs when I used TCP
as IPC on the same node instead of shared memory. Apropos, could the
zero-copy kernel stuff be used to improve this situation?
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From spoel at xray.bmc.uu.se Tue Jan 8 13:51:27 2002
From: spoel at xray.bmc.uu.se (David van der Spoel)
Date: Tue, 8 Jan 2002 19:51:27 +0100 (CET)
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <3C3B1312.6B71388A@icase.edu>
Message-ID:
On Tue, 8 Jan 2002, Josip Loncaric wrote:
>Actually, Linux kernel 2.4 incorporates an improved yet very different
>TCP stack where my TCP fixes do not help much in my usual test case
>(point-to-point streaming of small messages). However, Steve's
>scalability problems with the stock 2.4 TCP are very interesting, since
>they involve many machines, i.e. a completely different communication
>pattern from my usual point-to-point tests.
>
>Old 2.2+fix combination was pretty efficient at aggregating small
>messages into larger TCP packets before sending, in fact better than
>stock 2.4. Packet aggregation is something that depends on delicate
>timing of congestion control events on both sender and receiver; this is
>very sensitive to the application's communication pattern. Perhaps a
>2.4 TCP fix would need to be developed after all...
Actually for molecular dynamics I think the problem is mainly latency.
Scali/Myrinet is a big win for large numbers of processors (see
www.gromacs.org/benchmarks/scaling.php for my own benchmarks).
However for me it also helped quite a bit to increase the TCP and Shared
memory short message size for LAM (I use 512 kb). Apparently short and
long messages are treated differently.
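If memory serves, that is set when LAM itself is built, along these lines
(the flag names here are from memory - check the LAM docs before relying
on them; the values match the 512 kb mentioned above):

  ./configure --with-tcp-short=524288 --with-shm-short=524288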
Groeten, David.
________________________________________________________________________
Dr. David van der Spoel, Biomedical center, Dept. of Biochemistry
Husargatan 3, Box 576, 75123 Uppsala, Sweden
phone: 46 18 471 4205 fax: 46 18 511 755
spoel at xray.bmc.uu.se spoel at gromacs.org http://zorn.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mack.joseph at epa.gov Tue Jan 8 13:58:03 2002
From: mack.joseph at epa.gov (Joseph Mack)
Date: Tue, 08 Jan 2002 13:58:03 -0500
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
Message-ID: <3C3B413B.7403BF94@epa.gov>
The beowulf statue (and possibly the other wolf statues) is only
on display for a while. They will be auctioned off on Apr 5 at
the History Museum in Raleigh NC by Steve Gruber (the Arts Commission
consultant). If you want the one and only fibreglass beowulf statue,
contact Steve at fundboy at earthlink.net.
Joe
--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph at epa.gov ph# 919-541-0007, RTP, NC, USA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Tue Jan 8 16:02:46 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue, 8 Jan 2002 16:02:46 -0500 (EST)
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
In-Reply-To: <3C3AEC82.BC51393E@epa.gov>
Message-ID:
On Tue, 8 Jan 2002, Joseph Mack wrote:
> I was surprised to see a beowulf at RDU airport on a recent departure.
>
> Rather than a stack of commodity compute nodes, it is a fibreglass statue
> of a wolf, about 4' long x 2' high. The wolf's coat has printed
> circuit board patterns on it. Being somewhat mystified and
> having initially assumed I had just a statue of a wolf in front of me,
> I looked at the sign to see the title "Beowulf" with the name of the artist
> and sponsor on it. The beowulf is on a platform edged by rectangular
> circuit boards alternately on-end:on-side to give the profile of the
> top of a castle wall.
Dear Joe and Greg,
Hmmm, if that is the blue wolf with the exotic cooling fins on the back
that I passed yesterday dropping off my mother-in-law at the airport one
has to ask:
Does a beo-wulf run any faster in the snow?
(Recalling our long discussion of exotic cooling mechanisms for cpus and
noting our recent heavy snow...;-)
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mack.joseph at epa.gov Tue Jan 8 16:15:37 2002
From: mack.joseph at epa.gov (Joseph Mack)
Date: Tue, 08 Jan 2002 16:15:37 -0500
Subject: beowulf at RDU airport (Raleigh-Durham, NC, USA)
References:
Message-ID: <3C3B6179.90C90E25@epa.gov>
"Robert G. Brown" wrote:
> Dear Joe and Greg,
>
> Hmmm, if that is the blue wolf
The beowulf wolf is the red one at the far end of the terminal.
The blue one you saw is at the near end of the terminal.
The blue wolf has a non-beowulf theme.
> Does a beo-wulf run any faster in the snow?
good question. Greg?
Joe
--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph at epa.gov ph# 919-541-0007, RTP, NC, USA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From louie at chikka.com Wed Jan 9 06:01:07 2002
From: louie at chikka.com (louie miranda)
Date: Wed, 9 Jan 2002 19:01:07 +0800
Subject: beowulf -- HD Space q?
Message-ID: <031401c198fc$f15f4de0$2601a8c0@nocras>
Hi, I'm no beowulf expert.
I just want to ask: is it possible for a beowulf to cluster the servers' HD space?
Thanks, please shed some light.
louie.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jlb17 at duke.edu Wed Jan 9 06:25:30 2002
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Wed, 9 Jan 2002 06:25:30 -0500 (EST)
Subject: beowulf -- HD Space q?
In-Reply-To: <031401c198fc$f15f4de0$2601a8c0@nocras>
Message-ID:
On Wed, 9 Jan 2002 at 7:01pm, louie miranda wrote
> Hi, I'm no beowulf expert.
> I just want to ask: is it possible for a beowulf to cluster the servers' HD space?
>
Hmmm, well, your timing is good. Look in the archives of the mailing
list for the last few days -- there's a thread entitled "Shared
diskspace between nodes" that explores a number of options.
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hanzl at noel.feld.cvut.cz Wed Jan 9 08:54:34 2002
From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz)
Date: Wed, 09 Jan 2002 14:54:34 +0100
Subject: export disks from nodes?
Message-ID: <20020109145434W.hanzl@unknown-domain>
Is it possible to have scyld nodes with local filesystems, used mostly
locally but also exported?
- anybody figured out all the things needed for NFS server on node?
- can PVFS do that? (without slowing down mostly local HD access)
- any other idea?
Thanks
Vaclav
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at scyld.com Wed Jan 9 08:44:43 2002
From: agrajag at scyld.com (Sean Dilda)
Date: Wed, 9 Jan 2002 08:44:43 -0500
Subject: export disks from nodes?
In-Reply-To: <20020109145434W.hanzl@unknown-domain>; from hanzl@noel.feld.cvut.cz on Wed, Jan 09, 2002 at 02:54:34PM +0100
References: <20020109145434W.hanzl@unknown-domain>
Message-ID: <20020109084443.A8780@blueraja.scyld.com>
On Wed, 09 Jan 2002, hanzl at noel.feld.cvut.cz wrote:
> Is it possible to have scyld nodes with local filesystems, used mostly
> locally but also exported?
Theoretically, yes. However, I've never done it so I don't know how
hard it is or exactly what problems you'll run into.
>
> - anybody figured out all the things needed for NFS server on node?
This is the main problem, getting the daemons and the files they require
onto the nodes, but even that can be tricky.
>
> - can PVFS do that? (without slowing down mostly local HD access)
This is probably your best bet, but I'm not sure how it would slow down
local HD access. With PVFS you take a number of slave nodes and specify
them to serve data, and a filesystem is created that is split among all
those nodes, much like RAID striping. If anything, accessing a file off
a PVFS filesystem should be faster than off an NFS filesystem due to the
distributed nature of it.
Sean
From hanzl at noel.feld.cvut.cz Wed Jan 9 11:11:21 2002
From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz)
Date: Wed, 09 Jan 2002 17:11:21 +0100
Subject: export disks from nodes?
In-Reply-To: <20020109084443.A8780@blueraja.scyld.com>
References: <20020109145434W.hanzl@unknown-domain>
<20020109084443.A8780@blueraja.scyld.com>
Message-ID: <20020109171121B.hanzl@unknown-domain>
> > - anybody figured out all the things needed for NFS server on [scyld] node?
>
> This is the main problem, getting the daemons and the files they require
> onto the nodes, but even that can be tricky.
Might be worth creating a light, stripped-down version of the NFS machinery -
has anybody heard of anything like this? Maybe there is a one-floppy
Linux distribution with an NFS server to start with?
> > - can PVFS do that? (without slowing down mostly local HD access)
>
> This is probably your best bet, but I'm not sure how it would slow down
> local HD access. With PVFS you take a number of slave nodes and specify
> them to serve data, and a filesystem is created that is split among all
> those nodes, much like RAID striping. If anything, accessing a file off
> a PVFS filesystem should be faster than off an NFS filesystem due to the
> distributed nature of it.
My application is rather special because it is nearly perfect regarding
HD/CPU communication - it is quite happy with every HD streaming data
to the CPU in the same box for tens of minutes; then results are quickly
mixed over the network and all this repeats. There is not much left for
PVFS to help with - the application itself keeps all HDs busy and all CPUs
busy, so it makes no sense to send data over the network.
Network access to all data is infrequent; I just want it to be
possible and relatively easy (easier than bpcp'ing them).
However I am not a PVFS expert, maybe PVFS could help even under these
circumstances (any opinions?).
Regards and Thanks
Vaclav
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From d.l.farley at larc.nasa.gov Wed Jan 9 10:37:42 2002
From: d.l.farley at larc.nasa.gov (Doug Farley)
Date: Wed, 09 Jan 2002 10:37:42 -0500
Subject: Understanding Benchmark Results
Message-ID: <3C3C63C6.7080106@larc.nasa.gov>
Is there a site somewhere, or a paper, that describes how to interpret
and deal with some of the voluminous amounts of output from programs like
HPL and ScaLAPACK for benchmarking purposes? Perhaps one that gives
examples of output and compares some from various other clusters and
commercial servers so I can do some comparative analysis?
Thanks
--
Douglas Farley
Data Analysis and Imaging Branch
Systems Engineering Competency
NASA Langley Research Center
< D.L.FARLEY at LaRC.NASA.GOV >
< Phone +1 757 864-8141 >
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at scyld.com Wed Jan 9 13:50:55 2002
From: agrajag at scyld.com (Sean Dilda)
Date: Wed, 9 Jan 2002 13:50:55 -0500
Subject: export disks from nodes?
In-Reply-To: <20020109171121B.hanzl@unknown-domain>; from hanzl@noel.feld.cvut.cz on Wed, Jan 09, 2002 at 05:11:21PM +0100
References: <20020109145434W.hanzl@unknown-domain> <20020109084443.A8780@blueraja.scyld.com> <20020109171121B.hanzl@unknown-domain>
Message-ID: <20020109135055.B8780@blueraja.scyld.com>
On Wed, 09 Jan 2002, hanzl at noel.feld.cvut.cz wrote:
> My application is rather special because it is nearly perfect regarding
> HD/CPU communication - it is quite happy with every HD streaming data
> to the CPU in the same box for tens of minutes; then results are quickly
> mixed over the network and all this repeats. There is not much left for
> PVFS to help with - the application itself keeps all HDs busy and all CPUs
> busy, so it makes no sense to send data over the network.
>
> Network access to all data is infrequent; I just want it to be
> possible and relatively easy (easier than bpcp'ing them).
bpcp does have a recursive mode, so if it's a matter of recursively
copying, you can just do that. Otherwise, if you want easy access, I
might also suggest having a central store for the data on the master
node that you can easily tweak, and have a script that'll send the data
out to the slave nodes whenever you change anything. However that'll
require you to copy everything over even for a minor change. If you do
have a lot of data, you can look into using rsync (with bpsh as its rsh
implementation) instead of bpcp. rsync will only transfer over what's
changed, thus saving a lot of bandwidth. I've never tried rsync with
bpsh, but I think it should work.
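Something like this might do it (untested, per the above; it assumes the
rsync binary and its libraries are present on the slave node, and the
paths are made up):

  # push /data from the master to node 0, sending only the changes
  rsync -av --rsh=bpsh /data/ 0:/data/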
Sean
From pw at osc.edu Wed Jan 9 15:56:18 2002
From: pw at osc.edu (Pete Wyckoff)
Date: Wed, 9 Jan 2002 15:56:18 -0500
Subject: ANNOUNCEMENT: mpiexec mailing list and mpich/p4 support
Message-ID: <20020109155618.I17296@osc.edu>
Mpiexec is a replacement program for the standard "mpirun" script that
people have traditionally used to start parallel jobs. Mpiexec is used
specifically to initialize a parallel job from within a PBS batch or
interactive environment.
Mpiexec uses the task manager library of PBS to spawn copies of the
executable on the nodes in a PBS allocation. This is much faster than
invoking a separate rsh once for each process. Another benefit is that
resources used by the spawned processes are accounted correctly with
mpiexec, and reported in the PBS logs. Plus there are lots of knobs you
can twist to control job placement, input and output stream handling,
and other variations.
The distribution, including instructions for CVS access, can be found
at
http://www.osc.edu/~pw/mpiexec/
We've recently created a mailing list for mpiexec, mpiexec at osc.edu.
You can subscribe using the standard mailman techniques; see
http://email.osc.edu/mailman/listinfo/mpiexec
for information and archives.
The latest news is the addition of support for those who use ethernet for
message passing, using MPICH with its P4 library. The other MPI
libraries supported are MPICH/GM (Myrinet) and EMP (research gigabit
ethernet). I'd love to support LAM as well, but could use some help
with that. Mpiexec is developed on a linux/ia64 environment, but
there's no reason it shouldn't work on clusters using other POSIX-like
operating systems. Patches to support other systems will be happily
accepted.
To use mpiexec in your cluster, you'll need to be willing to apply a
small patch to your PBS distribution to use all the functionality of
mpiexec. If you use MPICH/P4, you'll need to apply a rather large patch
to MPICH, although the MPICH developers are working to apply much of it
to their official distribution.
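For illustration, a PBS job script using it could look something like
this (the resource line and program name are made up, and mpiexec's exact
defaults are described in its man page):

  #!/bin/sh
  #PBS -l nodes=4:ppn=2
  #PBS -l walltime=1:00:00
  cd $PBS_O_WORKDIR
  # spawns one copy per allocated processor via the PBS task manager
  mpiexec ./my_mpi_program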
-- Pete
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From chris at chris-wilson.co.uk Wed Jan 9 16:38:20 2002
From: chris at chris-wilson.co.uk (Chris Wilson)
Date: Wed, 9 Jan 2002 21:38:20 +0000
Subject: Help needed: MPI problem
In-Reply-To: <5.1.0.14.2.20020104074905.04965dc0@localhost>; from gropp@mcs.anl.gov on Fri, Jan 04, 2002 at 07:52:26AM -0600
References: <200201040728.CAA25493@me1.eng.wayne.edu> <5.1.0.14.2.20020104074905.04965dc0@localhost>
Message-ID: <20020109213820.B1965@florence.intimate.mysticnet.org.uk>
On Fri, Jan 04, 2002 at 07:52, William Gropp wrote:
[...]
> Bug reports about MPICH should be sent to mpi-maint at mcs.anl.gov.
Is there a specific list for MPICH problems? I'm trying to work out why I
am getting "connection to process 0" failures in the middle of runs
and why serv_p4 briefly throws a wobbler and refuses to spawn children.
Among many other occasional glitches.
--
Chris Wilson (^_`) spam to bit.bucket at dev.null
florence: an old i686, running kernel 2.4.16 at 1992.28 bogoMIPS.
Anything that, in happening, causes itself to happen again,
happens again. -- THHGTTG
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rajkumar at csse.monash.edu.au Thu Jan 10 02:37:34 2002
From: rajkumar at csse.monash.edu.au (Rajkumar Buyya)
Date: Thu, 10 Jan 2002 18:37:34 +1100
Subject: Cluster Computing course Presentation Slides Online
Message-ID: <3C3D44BE.5D42FBB1@csse.monash.edu.au>
Dear All,
I am pleased to inform you that slides for our book High Performance
Cluster Computing, Vol. 1 (Chapters 1, 4, 7, 9, 10, 14, 16, 17, 18, 19, 20)
are now available for download. Thanks to Hai Jin--he prepared them while
teaching a graduate course on "High Performance Cluster Computing" at
his university.
We have put up the complete source version (i.e., the PowerPoint files),
which you can modify and use in your presentations. To download, visit:
http://www.buyya.com/cluster/ OR
http://www.csse.monash.edu.au/~rajkumar/cluster/
and go to the section "Presentation Slides" for the download link.
Thanks
Raj
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ssy at prg.cpe.ku.ac.th Thu Jan 10 02:50:36 2002
From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul)
Date: Thu, 10 Jan 2002 14:50:36 +0700
Subject: aggregate cluster stats
Message-ID: <002a01c199ab$805cb060$1a226c9e@cpe.ku.ac.th>
You can also use SCMS from www.opensce.org. It has a GUI frontend to see
all node statistics and a command-line tool to grab the statistics in text
form. You can also add new types of statistics by compiling them as .so
modules. Hope this might help.
> I am currently running a 118 processor cluster, with bigbrother and larrd
> to monitor system status and gather performance and utilization data on a
> node by node basis.
> However, my management is now requesting aggregate statistics, and a
> web page showing load, etc, across the entire cluster.
> Has anybody hacked something like this themselves? I would rather stick
> close to bigbrother and larrd, just to simplify implementation, but I have
> been playing with SGI's open source release of PCP, and I am not averse
> to switching to another (free) solution if it can simplify the process.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From aby_sinha at yahoo.com Thu Jan 10 04:00:42 2002
From: aby_sinha at yahoo.com (Abhishek Sinha)
Date: Thu, 10 Jan 2002 14:30:42 +0530
Subject: IRC channel on beowulf ???
Message-ID: <3C2A4F440017132A@mail.san.yahoo.com> (added by postmaster@mail.san.yahoo.com)
Hello All
Is there an IRC channel dedicated to "beowulfery" ** ? I have been looking
for one on beowulf / parallel computing.
Thanks in advance
Abby
** Copyright RGB :)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alberto at delta.ft.uam.es Thu Jan 10 09:42:34 2002
From: alberto at delta.ft.uam.es (Alberto Ramos)
Date: Thu, 10 Jan 2002 15:42:34 +0100
Subject: Beowulf with Gigabit Ethernet
Message-ID: <20020110154234.A17039@delta.ft.uam.es>
Hi all. I'm new to this world, but here in one university of Madrid we are
planning to construct a beowulf for lattice QCD numerical simulations.
We are thinking of 8-16 dual Athlon MP processors based on the new Tyan
MPX mainboard. We have a problem with the communication. Myrinet is very
expensive for us, and we are thinking of Gigabit over copper, but we don't
know if this has some problems with latencies. In other words, do you think
that the improvement from Fast Ethernet to Gigabit-over-copper Ethernet is
good enough?
Also, we don't know very well what the latencies of Gigabit over copper are
compared with Fast Ethernet and Myrinet; maybe some of you can help us.
Thanks to all.
Alberto.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at scyld.com Thu Jan 10 10:13:22 2002
From: agrajag at scyld.com (Sean Dilda)
Date: Thu, 10 Jan 2002 10:13:22 -0500
Subject: IRC channel on beowulf ???
In-Reply-To: <3C2A4F440017132A@mail.san.yahoo.com>; from aby_sinha@yahoo.com on Thu, Jan 10, 2002 at 02:30:42PM +0530
References: <3C2A4F440017132A@mail.san.yahoo.com>
Message-ID: <20020110101322.A11750@blueraja.scyld.com>
On Thu, 10 Jan 2002, Abhishek Sinha wrote:
> Hello All
>
> Is there an IRC channel dedicated to "beowulfery" ** ? I have been looking
> for one on beowulf / parallel computing.
I personally don't know of any, but I wouldn't be opposed to seeing one
start up. To me, the obvious place to start such a channel would be
#beowulf on irc.openprojects.net
From alazur at plogic.com Thu Jan 10 10:34:44 2002
From: alazur at plogic.com (Adam Lazur)
Date: Thu, 10 Jan 2002 10:34:44 -0500
Subject: IRC channel on beowulf ???
In-Reply-To: <3C2A4F440017132A@mail.san.yahoo.com>
References: <3C2A4F440017132A@mail.san.yahoo.com>
Message-ID: <20020110153443.GA27168@clustermonkey.org>
Abhishek Sinha (aby_sinha at yahoo.com) said:
> Is there an IRC channel dedicated to "beowulfery" ** ? I have been looking
> for one on beowulf / parallel computing.
The closest thing to that which I've seen is #clusters on
irc.openprojects.net, which the linux-cluster ml [1] guys (and now ocf
[2]?) use.
--
Adam Lazur
Special Forces, Paralogic Inc.
[1] http://mail.nl.linux.org/linux-cluster/
[2] http://opencf.org/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From gran at scali.no Thu Jan 10 09:42:55 2002
From: gran at scali.no (=?iso-8859-1?Q?=D8ystein?= Gran Larsen)
Date: Thu, 10 Jan 2002 15:42:55 +0100
Subject: Beowulf with Gigabit Ethernet
References: <20020110154234.A17039@delta.ft.uam.es>
Message-ID: <3C3DA86F.189FCC96@scali.no>
Alberto Ramos wrote:
> Hi all. I'm new to this world, but here in one university of Madrid we are
> planning to construct a beowulf for lattice QCD numerical simulations.
>
> We are thinking of 8-16 dual Athlon MP processors based on the new Tyan
> MPX mainboard. We have a problem with the communication. Myrinet is very
> expensive for us, and we are thinking of Gigabit over copper, but we don't
> know if this has some problems with latencies. In other words, do you think
> that the improvement from Fast Ethernet to Gigabit-over-copper Ethernet is
> good enough?
>
> Also, we don't know very well what the latencies of Gigabit over copper are
> compared with Fast Ethernet and Myrinet; maybe some of you can help us.
>
> Thanks to all.
>
> Alberto.
Check our ScaMPI datasheet at www.scali.com/download/doc/ScaMPI-DS-A4.pdf.
It uses ping-pong performance to illustrate the performance of SCI, Myrinet and Ethernets.
-Øystein
--
Øystein Gran Larsen, Dr.Scient mailto:gran at scali.no Tel:+47 2262-8982
---------------------------------------------------------------------
MPI?SCI=HPC -- Scalable Linux Systems -- www.scali.com
Subscribe to our mailing lists at http://www.scali.com/support
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From patrick at myri.com Thu Jan 10 11:39:34 2002
From: patrick at myri.com (Patrick Geoffray)
Date: Thu, 10 Jan 2002 11:39:34 -0500
Subject: Beowulf with Gigabit Ethernet
References: <20020110154234.A17039@delta.ft.uam.es>
Message-ID: <3C3DC3C6.24B0614@myri.com>
Hello Alberto,
Alberto Ramos wrote:
> know if this has some problems with latencies. In other words, do you think
> that the improvement from Fast Ethernet to Gigabit-over-copper Ethernet is
> good enough?
No. The latency of IP over GigE/copper is roughly the same as IP over Fast
Ethernet (it's even a little bit higher, because the minimum GigE packet is a
little bit larger than for Fast Ethernet).
> Also, we don't know very well what the latencies of Gigabit over copper are
> compared with Fast Ethernet and Myrinet; maybe some of you can help us.
If it's IP, that's the same thing. The part of the NIC overhead in the IP
latency is very tiny.
If your protocol is IP and your concern is latency, go with Fast Ethernet.
If your protocol is MPI, don't go GigE.
Patrick
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hanzl at noel.feld.cvut.cz Thu Jan 10 13:12:45 2002
From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz)
Date: Thu, 10 Jan 2002 19:12:45 +0100
Subject: export disks from nodes?
In-Reply-To: <20020109084443.A8780@blueraja.scyld.com>
References: <20020109145434W.hanzl@unknown-domain>
<20020109084443.A8780@blueraja.scyld.com>
Message-ID: <20020110191245M.hanzl@unknown-domain>
>> - anybody figured out all the things needed for NFS server on node?
>
>This is the main problem, getting the daemons and the files they require
>onto the nodes, but even that can be tricky.
I tried and got stuck on this:
# bpsh 0 rpc.mountd --no-nfs-version 3
svc_tcp.c - cannot getsockname or listen: Invalid argument
mountd: cannot create tcp service.
** Any explanation is more than welcome. **
Related (I hope) source code can be seen here:
http://www.ajk.tele.fi/libc/rpc/svc_tcp.c.html
For the rest, NFS server on scyld node seemes possible. So far I
managed to start these (more or less related) parts:
# bpsh 0 rpcinfo -p
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 806 status
100024 1 tcp 808 status
100021 1 udp 1025 nlockmgr
100021 3 udp 1025 nlockmgr
100011 1 udp 943 rquotad
100011 2 udp 943 rquotad
100003 2 udp 2049 nfs
The rpc.mountd problem might be related to resolver/names oddities,
e.g. netstat says:
# bpsh 0 netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 .0:1592 .-1:1936 ESTABLISHED
tcp 0 0 .0:1591 .-1:1936 ESTABLISHED
tcp 0 0 .0:1590 .-1:1936 ESTABLISHED
tcp 0 0 .0:1589 .-1:1937 CLOSE
tcp 1 0 .0:1432 .-1:1852 CLOSE_WAIT
...
tcp 0 0 .0:615 .-1:2223 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
This certainly does not look usual, and mountd might get confused.
Thanks for any comments
Vaclav
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From erayo at cs.bilkent.edu.tr Thu Jan 10 12:53:07 2002
From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa))
Date: Thu, 10 Jan 2002 19:53:07 +0200
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <3C3B1312.6B71388A@icase.edu>
References: <200201051810.g05IAU65015061@helix.nih.gov> <3C3B1312.6B71388A@icase.edu>
Message-ID:
Greetings Josip,
On Tuesday 08 January 2002 17:41, Josip Loncaric wrote:
> Actually, Linux kernel 2.4 incorporates an improved yet very different
> TCP stack where my TCP fixes do not help much in my usual test case
> (point-to-point streaming of small messages). However, Steve's
> scalability problems with the stock 2.4 TCP are very interesting, since
> they involve many machines, i.e. a completely different communication
> pattern from my usual point-to-point tests.
>
I have some results regarding 2.4.x. Actually, the 2.2.x was more consistent
in terms of latency. I'm attaching the output of the mpptest performance
test suite on our 32-node beowulf system.
The kernel version is 2.4.14 on all nodes. Network hardware is Intel 82557
Ethernet PRO 100 (rev 8), and 3COM SuperStack II 3900 100BaseTX switch.
These are for point to point, bisection bandwidth and broadcast tests
included in the suite. I am very concerned by this situation as it might have
an impact on our research in fine-grained algorithms. (But so far it doesn't
seem to have.)
To plot the results:
gnuplot pt2pt.mpl
gnuplot bisect.mpl
gnuplot bcast.mpl
As you can see there are unexpected spikes in the plots. I don't know what to
attribute them to, but AFAIK there were no other parallel applications running
at the time, and I am certain that I took multiple runs because of the
inconsistency of the results.
This is not caused by hardware-level problems, since in 2.2.x (without any
patch) there was a monotonic increase in latency with growing message size.
> Old 2.2+fix combination was pretty efficient at aggregating small
> messages into larger TCP packets before sending, in fact better than
> stock 2.4. Packet aggregation is something that depends on delicate
> timing of congestion control events on both sender and receiver; this is
> very sensitive to the application's communication pattern. Perhaps a
> 2.4 TCP fix would need to be developed after all...
>
In the bisect and bcast plots, there seem to be fewer irregularities but I
have no results now to compare them against what 2.2.x did. So it is hard to
say whether 2.4.x actually improves upon stock 2.2.x.
Sincerely,
--
Eray Ozkural (exa)
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
-------------- next part --------------
A non-text attachment was scrubbed...
Name: beo32-2.4-perf.tar.gz
Type: application/x-gzip
Size: 4299 bytes
Desc: mpptest performance results
URL:
From erayo at cs.bilkent.edu.tr Thu Jan 10 12:59:48 2002
From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa))
Date: Thu, 10 Jan 2002 19:59:48 +0200
Subject: IRC channel on beowulf ???
In-Reply-To: <20020110101322.A11750@blueraja.scyld.com>
References: <3C2A4F440017132A@mail.san.yahoo.com> <20020110101322.A11750@blueraja.scyld.com>
Message-ID:
On Thursday 10 January 2002 17:13, Sean Dilda wrote:
> On Thu, 10 Jan 2002, Abhishek Sinha wrote:
> > Hello All
> >
> > Is there an IRC channel dedicated to "beowulfery" ** ? I have been
> > looking for one on beowulf / parallel computing.
>
> I personally don't know of any, but I wouldn't be opposed to seeing one
> start up. To me, the obvious place to start such a channel would be
> #beowulf on irc.openprojects.net
Unfortunately when I'm there there is nobody else.
Happy beo'ing,
--
Eray Ozkural (exa)
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From patrick at myri.com Thu Jan 10 13:33:10 2002
From: patrick at myri.com (Patrick Geoffray)
Date: Thu, 10 Jan 2002 13:33:10 -0500
Subject: Beowulf with Gigabit Ethernet
References: <20020110154234.A17039@delta.ft.uam.es> <3C3DA86F.189FCC96@scali.no>
Message-ID: <3C3DDE66.35B2D721@myri.com>
Hi Øystein,
Øystein Gran Larsen wrote:
> Check our ScaMPI datasheet at www.scali.com/download/doc/ScaMPI-DS-A4.pdf.
> It uses ping-pong performance to illustrate the performance of SCI, Myrinet and Ethernets.
Which Myrinet hardware are you using in your test? It looks like the old
1.28 Gb/s interfaces, which have not been sold for quite a while (more
than a year).
If it's the 2 Gb/s interfaces, the PCI is obviously the bottleneck
(the MPI curve normally caps at 240 MB/s with this equipment on a good
PCI 64/66).
You may want to specify in your document the Myrinet model and the PCI
characteristics of the test machines, and perhaps the fact that the
Myrinet equipment used is no longer available. This is needed for a
fair interpretation.
Regards
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at scyld.com Thu Jan 10 14:12:38 2002
From: agrajag at scyld.com (Sean Dilda)
Date: Thu, 10 Jan 2002 14:12:38 -0500
Subject: IRC channel on beowulf ???
In-Reply-To: ; from erayo@cs.bilkent.edu.tr on Thu, Jan 10, 2002 at 07:59:48PM +0200
References: <3C2A4F440017132A@mail.san.yahoo.com> <20020110101322.A11750@blueraja.scyld.com>
Message-ID: <20020110141238.B11750@blueraja.scyld.com>
On Thu, 10 Jan 2002, Eray Ozkural (exa) wrote:
> On Thursday 10 January 2002 17:13, Sean Dilda wrote:
> > On Thu, 10 Jan 2002, Abhishek Sinha wrote:
> > > Hello All
> > >
> > > Is there an IRC channel dedicated to "beowulfery" ** ? I have been
> > > looking for one on beowulf / parallel computing.
> >
> > I personally don't know of any, but I wouldn't be opposed to seeing one
> > start up. To me, the obvious place to start such a channel would be
> > #beowulf on irc.openprojects.net
>
> Unfortunately when I'm there there is nobody else.
I've popped in there a few times in the past and it was always empty.
However, right now two other people and I are sitting in there, so you
won't be completely alone if you join now.
From erayo at cs.bilkent.edu.tr Thu Jan 10 14:45:08 2002
From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa))
Date: Thu, 10 Jan 2002 21:45:08 +0200
Subject: diskless nodes? (was Re: Xbox clusters?)
In-Reply-To:
References:
Message-ID:
On Friday 07 December 2001 23:08, Troy Baer wrote:
>
> That's true, *if* you're buying $300 nodes. I'm not, though; our node
> cost tends to be around $2500-3000, because we tend to buy server-class
> SMP mobo's, lots of memory, Myrinet, rackmount cases, and a bunch of other
> stuff to keep me from having to walk/drive over to the machine room (in a
> secured building about 1.5 miles away) every time I need to reboot nodeXX.
>
I wonder what kind of hardware you use to be able to do that. It would be
very convenient for me, as the system I use is 15 miles from my home.
In the setup I use, there is no video/keyboard/mouse on any node. I use a
serial cable when hard debugging is needed; everything else we do over
Ethernet. There is only one thing I can't do: reboot or shut down a node
over the network.
Could you please write a list of the extra gear you have in your system for
remote administration?
Thanks,
--
Eray Ozkural (exa)
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Thu Jan 10 15:16:59 2002
From: becker at scyld.com (Donald Becker)
Date: Thu, 10 Jan 2002 15:16:59 -0500 (EST)
Subject: export disks from nodes?
In-Reply-To: <20020110191245M.hanzl@unknown-domain>
Message-ID:
On Thu, 10 Jan 2002 hanzl at noel.feld.cvut.cz wrote:
> >> - anybody figured out all the things needed for NFS server on node?
> >
> >This is the main problem, getting the daemons and the files they require
> >onto the nodes, but even that can be tricky.
>
> I tried and got stuck on this:
>
> # bpsh 0 rpc.mountd --no-nfs-version 3
> svc_tcp.c - cannot getsockname or listen: Invalid argument
> mountd: cannot create tcp service.
>
> ** Any explanation is more than welcome. **
I'm guessing that the rpc.mountd code doesn't have a default port if
/etc/services doesn't provide a port number.
In our system we omit /etc/services, along with most other /etc/*
configuration files, on the compute/slave nodes. This typically works
great, as applications should know their correct default ports.
/etc/services is (or should be) only needed if you want to configure a
non-standard system.
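If that guess is right, a quick test (a sketch, untested) would be to
hand the node a copy of the file with the BProc remote-copy tool and
retry:
# bpcp /etc/services 0:/etc/services
# bpsh 0 rpc.mountd --no-nfs-version 3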
> For the rest, an NFS server on a Scyld node seems possible. So far I
> managed to start these (more or less related) parts:
>
> The rpc.mountd problem might be related to resolver/names oddities,
> e.g. netstat says:
>
> # bpsh 0 netstat
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 0 0 .0:1592 .-1:1936 ESTABLISHED
A quick description of the BeoNSS name service.
BeoNSS is a cluster-specific name service that avoids the drawbacks of
existing workstation-oriented name services. It eliminates per-node
name files, and the serialization and latency of network name lookups.
BeoNSS provides services for hosts, netgroups, password/usernames, and
ethers.
Hostnames
Cluster hostnames have the form ".<node number>" (a leading dot followed
by the node number). This syntax was chosen because it does not conflict
with valid external (typically DNS) hostnames.
You may reference the local host as ".-2" or "self".
The master is known as ".-1", with aliases of "master" and "master0".
Cluster nodes start at .0 and extend to e.g. .31.
A compute node has a resolvable name if it has been configured, even if
it hasn't been physically added. If you configured 32 nodes and only
added 17, ".31" resolves and ".32" returns "no such host".
The currently released BeoNSS reverse-resolves the master as ".-1".
The new BeoNSS reverse-resolves the master as "master" or "master0",
with a hostname alias of ".-1". This was done to support multiple
masters, but it also improves readability in the case above.
The name "self" never appears in reverse name resolution, and ".-2" is
mostly used in the system internals.
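For example, from the master (a sketch; BeoNSS is a name-service module,
so anything that resolves names the normal way should see these):
# ping -c 1 .0        # first compute node
# ping -c 1 master    # the master, alias for ".-1"
A name like ".31" will resolve (though not answer) as long as node 31 is
configured.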
Netgroups
The BeoNSS system also supports netgroups, primarily for NFS servers.
The only netgroup is "cluster", with a new alias of "cluster0". This
netgroup contains ".0" through the maximum configured slave node,
e.g. ".31", even if not all 32 compute nodes have been added to the
system.
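This makes an NFS export to every configured node a one-liner in
/etc/exports, using the standard @netgroup syntax (the options here are
only illustrative):
/home  @cluster(rw)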
Ethers
The BeoNSS system also reports Ethernet addresses, similar to
/etc/ethers lookups. This service is only available on the master,
which maintains the cluster boot configuration.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From troy at osc.edu Thu Jan 10 15:10:54 2002
From: troy at osc.edu (Troy Baer)
Date: Thu, 10 Jan 2002 15:10:54 -0500
Subject: diskless nodes? (was Re: Xbox clusters?)
In-Reply-To:
Message-ID:
On Thu, 10 Jan 2002, Eray Ozkural (exa) wrote:
> On Friday 07 December 2001 23:08, Troy Baer wrote:
> > That's true, *if* you're buying $300 nodes. I'm not, though; our node
> > cost tends to be around $2500-3000, because we tend to buy server-class
> > SMP mobo's, lots of memory, Myrinet, rackmount cases, and a bunch of other
> > stuff to keep me from having to walk/drive over to the machine room (in a
> > secured building about 1.5 miles away) every time I need to reboot nodeXX.
>
> I wonder what kind of hardware you use for being able to do that. It would be
> very convenient for me as the system I use is 15 miles from my home.
>
> In the setting I use, there is no video/keyboard/mouse for any nodes. I use a
> serial cable in need of hard debugging. Everything else we do on eth. There
> is only one thing I can't do: reboot or shutdown a node from the net.
>
> Could you please write a list of the extra gear you have in your system for
> remote administration?
Each of our cluster systems has a console server with some number of
Cyclades multiport serial cards. The compute nodes are all configured to
send their consoles to a serial port. On some of our older nodes with
mobos that support IPMI, we have a second serial port wired to the console
server for remote BIOS configuration and power control. For the rest we
have networked power controllers. We also have some locally developed
scripts to abstract away the differences, so an admin can just run a command
like "power off node05" and not worry whether it has IPMI or is on a
power controller. BTW, we've got two different types of power controllers,
the widely available APC 8-port 15A controllers and another brand whose
name I don't recall. The coworker of mine who developed our power control
scripts was of the opinion that the APCs are much easier to program for.
We do have a crash cart at each of our machine rooms with a VGA monitor
and keyboard for catastrophic cases where direct attention is required,
but that seems to be used mainly when we're initially configuring a system.
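For anyone wanting to replicate the serial-console piece, the usual
ingredients are a kernel console redirect plus a getty on the same line.
A sketch (the speed and device names are examples, not our exact config):
# lilo.conf: send kernel and boot messages to the first serial port
append="console=ttyS0,9600n8"
# /etc/inittab: offer a login on the same line
S0:2345:respawn:/sbin/agetty 9600 ttyS0 vt100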
--Troy
--
Troy Baer email: troy at osc.edu
Science & Technology Support phone: 614-292-9701
Ohio Supercomputer Center web: http://oscinfo.osc.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From cly at MIT.EDU Thu Jan 10 20:35:57 2002
From: cly at MIT.EDU (Ron Choy)
Date: Thu, 10 Jan 2002 20:35:57 -0500
Subject: Need advice on cluster hardware
Message-ID: <5.0.2.1.2.20020110201213.00af1090@hesiod>
Hi,
My advisor has made me responsible (*gasp*) for purchasing an 8-node
cluster, used mainly for computational linear algebra problems. After
some research I came up with the following configuration:
Frontend (+file server):
Asus A7M266-D (AMD 760MPX)
Enermax EG365P-VE 350W PS (meets the 15A on 12V requirement of the board)
2 x Athlon MP 1800+ (1.533GHz)
2 x Thermaltake VOLCANO 6Cu+ heatsink
2 x Kingston PC2100 512MB ECC Registered
IBM SCSI HD 36GB 10000RPM
Adaptec SCSI Controller
2 x Netgear GA622T gigabit nic
ATI xpert Rage XL 8MB AGP vid
Floppy drive
Sony 52X cdrom drive
Tower case
Compute nodes:
same as front end, minus 1 nic and SCSI adapter, and replace SCSI HD with
IDE HD.
Switch:
Intel 410T 16 port 10/100 switch
(* The reason I have gigabit NICs and a 10/100 switch is that I don't
know whether bandwidth is going to be a limit on the computations, so I
would rather start out small and expand later. Is this a good idea?)
Does this configuration look reasonable? Any known conflicts or driver
issues? I am going to use the 2.4.x kernel.
Thanks!
Ron Choy
MIT LCS, Supercomputing technologies group
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From gran at scali.no Fri Jan 11 05:16:50 2002
From: gran at scali.no (=?iso-8859-1?Q?=D8ystein?= Gran Larsen)
Date: Fri, 11 Jan 2002 11:16:50 +0100
Subject: Beowulf with Gigabit Ethernet
References: <20020110154234.A17039@delta.ft.uam.es> <3C3DA86F.189FCC96@scali.no> <3C3DDE66.35B2D721@myri.com>
Message-ID: <3C3EBB92.AF4C259B@scali.no>
Patrick Geoffray wrote:
> Hi Øystein,
>
> Øystein Gran Larsen wrote:
> > Check our ScaMPI datasheet at www.scali.com/download/doc/ScaMPI-DS-A4.pdf.
> > It uses ping-pong performance to illustrate the performance of SCI, Myrinet and Ethernets.
>
> Which Myrinet hardware are you using in your test ? It looks like old
> 1.28 Gb/s interfaces not sold since quite a while (more than a year).
> If it's the 2 Gb/s interfaces, the PCI is obviously the bottleneck
> (the MPI curve caps normally at 240 MB/s with this equipment on a good
> PCI 64/66).
>
> You may want to precise in your document the Myrinet model and the PCI
> characteristics of the test machines, and eventually the fact that the
> Myrinet equipment used is no longer available. This is required for a
> fair interpretation.
>
> Regards
>
> ----------------------------------------------------------
> | Patrick Geoffray, Ph.D. patrick at myri.com
> | Myricom, Inc. http://www.myri.com
> | Cell: 865-389-8852 685 Emory Valley Rd (B)
> | Phone: 865-425-0978 Oak Ridge, TN 37830
> ----------------------------------------------------------
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Hi Patrick,
I agree that the data sheet could be more detailed. As far as I have been
able to uncover, the Myrinet card in question was a PCI64B with 4 MB SRAM
and a 2 Gbit/s link speed. The performance measurements were performed
outside Scali and made available to us.
From time to time we try to find performance numbers of this kind for
different interconnects, but so far we have not been able to find any
such (official) results for Myrinet products. If they were available, it
could be expected of us to use them when we compare products. In their
absence we must rely on, and trust, helpful individuals who have access
to Myrinet systems for the numbers. If you can direct us to an official
resource with such numbers, we would be very grateful.
By the way, performance numbers of this type for the latest release of
our software are not on our web site yet, but they are on their way. The
numbers for previous releases can be found in the performance section at
www.scali.com.
Sincerely,
Øystein
--
Øystein Gran Larsen, Dr.Scient mailto:gran at scali.no Tel:+47 2262-8982
---------------------------------------------------------------------
MPI×SCI=HPC -- Scalable Linux Systems -- www.scali.com
Subscribe to our mailing lists at http://www.scali.com/support
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rsimac at thermawave.com Fri Jan 11 10:03:35 2002
From: rsimac at thermawave.com (Rob Simac)
Date: Fri, 11 Jan 2002 07:03:35 -0800
Subject: Fastest Intel Processors
Message-ID: <012501c19ab1$24880f30$6564010a@thermawave.com>
What are the fastest Intel processors available to the public? Has the 2.0 GHz part been released? I have also heard rumors that the 2.0 GHz has problems running with Red Hat 7.1. Has anyone else heard this?
Ciao,
Rob.
======================================================
Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
======================================================
"Remember this, foolish mortals, when ye stare headlong into the
mind-paralyzing void, the inky black nothingness of existence, the
hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
few minutes for your eyes to adjust."
Frank M. Carrano, Branford, Conn.
(Bulwer-Lytton Writing Contest Runner-up)
From ctierney at hpti.com Fri Jan 11 11:13:26 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Fri, 11 Jan 2002 09:13:26 -0700
Subject: Fastest Intel Processors
In-Reply-To: <012501c19ab1$24880f30$6564010a@thermawave.com>; from rsimac@thermawave.com on Fri, Jan 11, 2002 at 07:03:35AM -0800
References: <012501c19ab1$24880f30$6564010a@thermawave.com>
Message-ID: <20020111091326.C7185@hpti.com>
The 2.2 GHz CPUs have just been released. This generation of CPU is
built at 0.13 microns and has twice as large an L2 cache (512 KB).
These are the single-processor versions; the Xeon (SMP) chips should
follow shortly (I am guessing). Go look at the roadmaps at
www.theregister.co.uk; they tend to be accurate.
I have not heard that there is a problem with the 2.0 GHz part. Is
it a problem with Red Hat or the Linux kernel specifically? We
had no problems with the 1.7 GHz Xeon chips, but they are not the
2.0 GHz parts you are talking about.
Craig
On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have
> the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
>
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alberto at delta.ft.uam.es Fri Jan 11 11:09:17 2002
From: alberto at delta.ft.uam.es (Alberto Ramos)
Date: Fri, 11 Jan 2002 17:09:17 +0100
Subject: Fastest Intel Processors
In-Reply-To: <012501c19ab1$24880f30$6564010a@thermawave.com>; from rsimac@thermawave.com on Fri, Jan 11, 2002 at 07:03:35AM -0800
References: <012501c19ab1$24880f30$6564010a@thermawave.com>
Message-ID: <20020111170917.A28757@delta.ft.uam.es>
The P4 2.0 GHz is certainly available to the public. I think that on
7 January Intel presented its new P4 2.2 GHz, and AMD its XP 2000+.
About the problems with Red Hat, I have no idea, but it sounds strange...
Alberto.
On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From cozzi at nd.edu Fri Jan 11 11:27:26 2002
From: cozzi at nd.edu (Marc Cozzi)
Date: Fri, 11 Jan 2002 11:27:26 -0500
Subject: Fastest Intel Processors
Message-ID:
googlegear.com shows 2.20 GHz Intels in stock at $609.
http://www.googlegear.com/ggweb/jsp/ProductDetail.jsp?ProductCode=80637-OEM
marc
-----Original Message-----
From: Rob Simac [mailto:rsimac at thermawave.com]
Sent: Friday, January 11, 2002 10:04 AM
To: beowulf at beowulf.org
Subject: Fastest Intel Processors
What is the fastest Intel processors available to the public? Have the
2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz has
problems running with Red Hat 7.1. Has anyone else heard this?
Ciao,
Rob.
======================================================
Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
======================================================
"Remember this, foolish mortals, when ye stare headlong into the
mind-paralyzing void, the inky black nothingness of existence, the
hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
few minutes for your eyes to adjust."
Frank M. Carrano, Branford, Conn.
(Bulwer-Lytton Writing Contest Runner-up)
From astroguy at bellsouth.net Fri Jan 11 11:31:55 2002
From: astroguy at bellsouth.net (.)
Date: Fri, 11 Jan 2002 11:31:55 -0500
Subject: Fastest Intel Processors
In-Reply-To:
Message-ID:
-----Original Message-----
From: . [mailto:astroguy at bellsouth.net]
Sent: Friday, January 11, 2002 11:25 AM
To: Rob Simac
Subject: RE: Fastest Intel Processors
Hi Rob,
I really think this GHz gobbledygook is way overrated. I think, if you
look more closely at the problem, the key to superior performance is
found in the actual construction or architecture of the processor. The
Itanium, for example, only runs at 733 MHz but has 2 MB of on-die L2
cache... The Alpha, in an earlier incarnation, ran at only 233 MHz but
did 64-bit processing... Quite amazing actually. A friend wrote me
that:
"Red Hat is working closely with Compaq to solidify its OS on the Alpha
architecture"... I don't know for sure, first hand... But it is, of course,
the next logical step, since FreeBSD already provides clean UNIX 64-bit
processing code for the Alpha.
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of Rob Simac
Sent: Friday, January 11, 2002 10:04 AM
To: beowulf at beowulf.org
Subject: Fastest Intel Processors
What is the fastest Intel processors available to the public? Have
the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz has
problems running with Red Hat 7.1. Has anyone else heard this?
Ciao,
Rob.
======================================================
Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
======================================================
"Remember this, foolish mortals, when ye stare headlong into the
mind-paralyzing void, the inky black nothingness of existence, the
hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
few minutes for your eyes to adjust."
Frank M. Carrano, Branford, Conn.
(Bulwer-Lytton Writing Contest Runner-up)
From joelja at darkwing.uoregon.edu Fri Jan 11 11:38:35 2002
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Fri, 11 Jan 2002 08:38:35 -0800 (PST)
Subject: Fastest Intel Processors
In-Reply-To: <012501c19ab1$24880f30$6564010a@thermawave.com>
Message-ID:
There are 2.2 GHz P4s; these are based on the .13 micron Northwood core
rather than the Willamette. To date I haven't heard of anyone having issues
with these... drop one in your Socket 478 mainboard and go to town... ;)
joelja
On Fri, 11 Jan 2002, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
>
--
--------------------------------------------------------------------------
Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu
-- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E --
The accumulation of all powers, legislative, executive, and judiciary, in
the same hands, whether of one, a few, or many, and whether hereditary,
selfappointed, or elective, may justly be pronounced the very definition of
tyranny. - James Madison, Federalist Papers 47 - Feb 1, 1788
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From astroguy at bellsouth.net Fri Jan 11 11:40:19 2002
From: astroguy at bellsouth.net (C.Clary)
Date: Fri, 11 Jan 2002 11:40:19 -0500
Subject: Fastest Intel Processors
In-Reply-To: <20020111091326.C7185@hpti.com>
Message-ID:
Hi all,
I really think this GHz gobbledygook is way overrated. I think, if you
look more closely at the problem, the key to superior performance is
found in the actual construction or architecture of the processor. The
Itanium, for example, only runs at 733 MHz but has 2 MB of on-die L2
cache... The Alpha, in an earlier incarnation, ran at only 233 MHz but
did 64-bit processing... Quite amazing actually.
A friend wrote me that:
"Red Hat is working closely with Compaq to solidify its OS on the Alpha
architecture"... I don't know for sure, first hand... But it is, of course,
the next logical step, since FreeBSD already provides clean UNIX 64-bit
processing code for the Alpha.
Chip
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of Craig Tierney
Sent: Friday, January 11, 2002 11:13 AM
To: Rob Simac
Cc: beowulf at beowulf.org
Subject: Re: Fastest Intel Processors
The 2.2 Ghz cpus have just been released. This generation
of cpu is built at 0.13 microns and has twice as big of a
L2 cache (512 KB). These are the single processor versions.
The Xeon (smp) chips should follow shortly (I am guessing).
Go look at the roadmaps at www.theregister.co.uk. They tend to
be accurate.
I have not heard that there is a problem with the 2.0 Ghz. Is
it a problem with RedHat or the Linux kernel specifically? We
had no problems with the 1.7 Ghz Xeon chips, but they are not the
2.0 Ghz that you are talking about.
Craig
On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have
> the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
>
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From timm at fnal.gov Fri Jan 11 11:46:00 2002
From: timm at fnal.gov (Steven Timm)
Date: Fri, 11 Jan 2002 10:46:00 -0600 (CST)
Subject: Boards for Fastest Intel Processors
In-Reply-To: <20020111091326.C7185@hpti.com>
Message-ID:
Does anyone have any recommendations on what type of dual Xeon
motherboard to get to be able to run these new 2.2 GHz processors?
I have heard a lot about the Supermicro boards but am suspicious of them
due to problems with several earlier versions of Supermicro PIII boards.
Is there any word on when Intel will be releasing their new
board with the native Intel chipset instead of a Serverworks chipset?
Steve
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
On Fri, 11 Jan 2002, Craig Tierney wrote:
> The 2.2 Ghz cpus have just been released. This generation
> of cpu is built at 0.13 microns and has twice as big of a
> L2 cache (512 KB). These are the single processor versions.
> The Xeon (smp) chips should follow shortly (I am guessing).
> Go look at the roadmaps at www.theregister.co.uk. They tend to
> be accurate.
>
> I have not heard that there is a problem with the 2.0 Ghz. Is
> it a problem with RedHat or the Linux kernel specifically? We
> had no problems with the 1.7 Ghz Xeon chips, but they are not the
> 2.0 Ghz that you are talking about.
>
> Craig
>
> On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > What is the fastest Intel processors available to the public? Have
> > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > has problems running with Red Hat 7.1. Has anyone else heard this?
> >
> > Ciao,
> > Rob.
> >
> > ======================================================
> > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> >
> > ======================================================
> > "Remember this, foolish mortals, when ye stare headlong into the
> > mind-paralyzing void, the inky black nothingness of existence, the
> > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > few minutes for your eyes to adjust."
> > Frank M. Carrano, Branford, Conn.
> > (Bulwer-Lytton Writing Contest Runner-up)
>
>
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joelja at darkwing.uoregon.edu Fri Jan 11 12:11:05 2002
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Fri, 11 Jan 2002 09:11:05 -0800 (PST)
Subject: Fastest Intel Processors
In-Reply-To: <200201111700.UAA11054@nocserv.free.net>
Message-ID:
The initial review I saw used an Abit BD7-RAID, which is an i845-based DDR
mainboard... the voltage drop is from 1.7 to 1.475 volts, which, coupled
with the die shrink, seems to have knocked about 20 watts off the peak
power consumption.
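That is roughly what you'd expect: dynamic power scales with V^2 at a
given clock, and (1.475/1.7)^2 is about 0.75, so the voltage drop alone
would account for most of a ~20 watt reduction on a part that peaked
somewhere near 80 watts (my rough numbers, not a measurement).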
joelja
On Fri, 11 Jan 2002, Mikhail Kuzminsky wrote:
> According to Joel Jaeggli
> >
> > There are 2.2ghz p4's, these are based on the .13 micron northwood core
> > rather than the willamete. to date I haven't heard of anyone having issues
> > with these... drop one on your socket 478 mainbaord and go to town... ;)
> >
> As I understand it, that won't be right for all motherboards.
> Northwood will have different available voltage values at different
> currents (amperes), so you need a special VRM version, which may not
> be present on your motherboard. At least for the Tualatin core it is
> just as I said.
>
> Mikhail Kuzminsky
> Zelinsky Inst. of Organic Chemistry
> Moscow
>
--
--------------------------------------------------------------------------
Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu
-- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E --
The accumulation of all powers, legislative, executive, and judiciary, in
the same hands, whether of one, a few, or many, and whether hereditary,
selfappointed, or elective, may justly be pronounced the very definition of
tyranny. - James Madison, Federalist Papers 47 - Feb 1, 1788
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From timm at fnal.gov Fri Jan 11 12:20:40 2002
From: timm at fnal.gov (Steven Timm)
Date: Fri, 11 Jan 2002 11:20:40 -0600 (CST)
Subject: <4>eth1: Too much work at interrupt, status=0x30000.
Message-ID:
I am running Linux kernel 2.4.9-12 on a dual PIII system with an Intel
STL2 motherboard. I have two Ethernet interfaces in the system: eth0 is
an eepro100, and eth1 is a Hamachi from Packet Engines.
I was seeing this error message before; I upgraded the kernel to
2.4.9-12 and it is continuing, worse than before.
The version of the hamachi.c driver is
#define DRV_NAME "hamachi"
#define DRV_VERSION "1.01+LK1.0.1"
#define DRV_RELDATE "5/18/2001"
Now the error is so frequent that the system doesn't seem to be
able to boot all the way.
How do we beat this problem? Is a newer version of the driver available?
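One guess I'd like to try (untested): drivers in this family usually take
a max_interrupt_work module parameter, and if this hamachi build exposes
it too, loading the module with a larger limit might at least keep the
message from strangling the boot:
modprobe hamachi max_interrupt_work=400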
Steve Timm
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hanzl at noel.feld.cvut.cz Fri Jan 11 13:35:11 2002
From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz)
Date: Fri, 11 Jan 2002 19:35:11 +0100
Subject: export disks from nodes?
In-Reply-To:
References: <20020110191245M.hanzl@unknown-domain>
Message-ID: <20020111193511U.hanzl@unknown-domain>
>> # bpsh 0 rpc.mountd --no-nfs-version 3
>> svc_tcp.c - cannot getsockname or listen: Invalid argument
>> mountd: cannot create tcp service.
>>
>> ** Any explanation is more than welcome. **
>
>I guessing that the rpc.mountd code doesn't have a default port if
>/etc/services doesn't provide a port number.
Bad luck. It behaves the same with /etc/services in place. Now I have:
# bpsh 0 ls /etc
exports ld.so.cache localtime mtab nsswitch.conf protocols rpc services
# bpsh 0 ls -R /var
/var: lib lock nis run
/var/lib: nfs
/var/lib/nfs: sm sm.bak state
/var/lib/nfs/sm:
/var/lib/nfs/sm.bak:
/var/lock: subsys
/var/lock/subsys:
/var/nis:
/var/run:
Any other ideas?
(Well, I know I should find full sources and look...)
> A quick description of the BeoNSS name service.
> ...
Thanks a lot for description.
Regards
Vaclav
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From dwu at swales.com Fri Jan 11 12:33:22 2002
From: dwu at swales.com (Dominic Wu)
Date: Fri, 11 Jan 2002 09:33:22 -0800
Subject: Fastest Intel Processors
In-Reply-To:
Message-ID:
Samsung bought the Alpha processor from Compaq some time back. Compaq has
all but abandoned what it acquired from DEC. The current merger
attempt with HP is but a sign of Compaq's DEC indigestion.
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of C.Clary
Sent: Friday, January 11, 2002 8:40 AM
To: Craig Tierney; Rob Simac
Cc: beowulf at beowulf.org
Subject: RE: Fastest Intel Processors
Hi all,
I really think this Ghz. goobly-gook is way over rated. I think, if you
look more closley at the problem, the key to superior proformance is in the
found in the actual construction or architecture of the processor. The
Itanium, for example only runs at 733 Mghz but it has (on die) 2meg of
cashe on the L-2... The Alpha, in its previous incarnation ran at only 233
but did 64 bit processing... Quite amazing actually.
A friend wrote me that:
"Red hat is working closely with Compaq to solidify its OS to the Alpha
architecture"... I don't know for sure, first hand... But it is, of course,
the next logical step, since FreeBSD, already provides a clean UNIX 64 bit
processing code for the Alpha.
Chip
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of Craig Tierney
Sent: Friday, January 11, 2002 11:13 AM
To: Rob Simac
Cc: beowulf at beowulf.org
Subject: Re: Fastest Intel Processors
The 2.2 Ghz cpus have just been released. This generation
of cpu is built at 0.13 microns and has twice as big of a
L2 cache (512 KB). These are the single processor versions.
The Xeon (smp) chips should follow shortly (I am guessing).
Go look at the roadmaps at www.theregister.co.uk. They tend to
be accurate.
I have not heard that there is a problem with the 2.0 Ghz. Is
it a problem with RedHat or the Linux kernel specifically? We
had no problems with the 1.7 Ghz Xeon chips, but they are not the
2.0 Ghz that you are talking about.
Craig
On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have
> the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
>
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From dwu at swales.com Fri Jan 11 12:34:11 2002
From: dwu at swales.com (Dominic Wu)
Date: Fri, 11 Jan 2002 09:34:11 -0800
Subject: Fastest Intel Processors
In-Reply-To: <20020111091326.C7185@hpti.com>
Message-ID:
One could, of course, go to the source: http://www.intel.com/ and confirm
the same information (2.2GHz, .13micron, 512K cache, etc.)
What they say about the 512 KB cache, though, is more interesting:
512KB or 256 KB, Level 2 Advanced Transfer Cache
512 KB L2 Advanced Transfer Cache (ATC) is available with speeds 2.20 GHz
and 2 GHz. 256 KB L2 ATC is available with speeds 1.30 GHz to 2 GHz. The
Level 2 ATC delivers a much higher data throughput channel between the Level
2 cache and the processor core. The Advanced Transfer Cache consists of a
256-bit (32-byte) interface that transfers data on each core clock. As a
result, the Intel® Pentium® 4 processor at 2.20 GHz can deliver a data
transfer rate of 70 GB/s. This compares to a transfer rate of 16 GB/s on the
Pentium® III processor at 1 GHz. Features of the ATC include:
Non-Blocking, full speed, on-die level 2 cache
8-way set associativity
256-bit data bus to the level 2 cache
Data clocked into and out of the cache every clock cycle
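(The 70 GB/s figure is just the interface width times the clock:
32 bytes per clock x 2.2e9 clocks per second = 70.4 GB/s.)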
How does a data transfer rate that is almost a five-fold increase over
the P3's sit with our esteemed colleagues?
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of Craig Tierney
Sent: Friday, January 11, 2002 8:13 AM
To: Rob Simac
Cc: beowulf at beowulf.org
Subject: Re: Fastest Intel Processors
The 2.2 Ghz cpus have just been released. This generation
of cpu is built at 0.13 microns and has twice as big of a
L2 cache (512 KB). These are the single processor versions.
The Xeon (smp) chips should follow shortly (I am guessing).
Go look at the roadmaps at www.theregister.co.uk. They tend to
be accurate.
I have not heard that there is a problem with the 2.0 Ghz. Is
it a problem with RedHat or the Linux kernel specifically? We
had no problems with the 1.7 Ghz Xeon chips, but they are not the
2.0 Ghz that you are talking about.
Craig
On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> What is the fastest Intel processors available to the public? Have
> the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> has problems running with Red Hat 7.1. Has anyone else heard this?
>
> Ciao,
> Rob.
>
> ======================================================
> Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
>
> ======================================================
> "Remember this, foolish mortals, when ye stare headlong into the
> mind-paralyzing void, the inky black nothingness of existence, the
> hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> few minutes for your eyes to adjust."
> Frank M. Carrano, Branford, Conn.
> (Bulwer-Lytton Writing Contest Runner-up)
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jeffrey.b.layton at lmco.com Fri Jan 11 13:57:10 2002
From: jeffrey.b.layton at lmco.com (Jeff Layton)
Date: Fri, 11 Jan 2002 13:57:10 -0500
Subject: Fastest Intel Processors
References:
Message-ID: <3C3F3586.BE786B94@lmco.com>
"C.Clary" wrote:
> Hi all,
> I really think this Ghz. goobly-gook is way over rated. I think, if you
> look more closley at the problem, the key to superior proformance is in the
> found in the actual construction or architecture of the processor. The
> Itanium, for example only runs at 733 Mghz but it has (on die) 2meg of
> cashe on the L-2... The Alpha, in its previous incarnation ran at only 233
> but did 64 bit processing... Quite amazing actually.
However, it depends on what you're doing. Our CFD code only ran
50% faster per CPU on an Alpha/Myrinet cluster compared to a
PII/450 cluster with Fast Ethernet (this was about 2 years ago).
We still haven't seen a need to go with Alpha (haven't tested
Itanium yet).
So, again, it depends on what you're doing and what your ultimate
goal is (but then again, the good Beowulf folks already know this :)
Jeff
>
>
> A friend wrote me that:
> "Red hat is working closely with Compaq to solidify its OS to the Alpha
> architecture"... I don't know for sure, first hand... But it is, of course,
> the next logical step, since FreeBSD, already provides a clean UNIX 64 bit
> processing code for the Alpha.
> Chip
>
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
> Behalf Of Craig Tierney
> Sent: Friday, January 11, 2002 11:13 AM
> To: Rob Simac
> Cc: beowulf at beowulf.org
> Subject: Re: Fastest Intel Processors
>
> The 2.2 Ghz cpus have just been released. This generation
> of cpu is built at 0.13 microns and has twice as big of a
> L2 cache (512 KB). These are the single processor versions.
> The Xeon (smp) chips should follow shortly (I am guessing).
> Go look at the roadmaps at www.theregister.co.uk. They tend to
> be accurate.
>
> I have not heard that there is a problem with the 2.0 Ghz. Is
> it a problem with RedHat or the Linux kernel specifically? We
> had no problems with the 1.7 Ghz Xeon chips, but they are not the
> 2.0 Ghz that you are talking about.
>
> Craig
>
> On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > What is the fastest Intel processors available to the public? Have
> > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > has problems running with Red Hat 7.1. Has anyone else heard this?
> >
> > Ciao,
> > Rob.
> >
> > ======================================================
> > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> >
> > ======================================================
> > "Remember this, foolish mortals, when ye stare headlong into the
> > mind-paralyzing void, the inky black nothingness of existence, the
> > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > few minutes for your eyes to adjust."
> > Frank M. Carrano, Branford, Conn.
> > (Bulwer-Lytton Writing Contest Runner-up)
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alex at compusys.co.uk Fri Jan 11 12:40:16 2002
From: alex at compusys.co.uk (alex at compusys.co.uk)
Date: Fri, 11 Jan 2002 17:40:16 +0000 (GMT)
Subject: Beowulf with Gigabit Ethernet
In-Reply-To: <200201111648.g0BGmOR15296@blueraja.scyld.com>
Message-ID:
I think that, in general, point-to-point performance information has
limited value, whatever vendors might quote on their web pages. SCI may
perform quite well if it comes down to just latency and bandwidth
between two machines, but it is a ring topology: if you have more
machines on a ring, they share that same bandwidth.
The price/performance ratio depends on which application you are
running plus how many nodes you need: 3D SCI is more expensive than
Myrinet, while 2D SCI is cheaper.
Alex
> Patrick Geoffray wrote:
>
> > Hi Øystein,
> >
> > Øystein Gran Larsen wrote:
> > > Check our ScaMPI datasheet at www.scali.com/download/doc/ScaMPI-DS-A4.pdf.
> > > It uses ping-pong performance to illustrate the performance of SCI, Myrinet and Ethernets.
> >
> > Which Myrinet hardware are you using in your test ? It looks like old
> > 1.28 Gb/s interfaces not sold since quite a while (more than a year).
> > If it's the 2 Gb/s interfaces, the PCI is obviously the bottleneck
> > (the MPI curve caps normally at 240 MB/s with this equipment on a good
> > PCI 64/66).
> >
> > You may want to precise in your document the Myrinet model and the PCI
> > characteristics of the test machines, and eventually the fact that the
> > Myrinet equipment used is no longer available. This is required for a
> > fair interpretation.
> >
> > Regards
> >
> > ----------------------------------------------------------
> > | Patrick Geoffray, Ph.D. patrick at myri.com
> > | Myricom, Inc. http://www.myri.com
> > | Cell: 865-389-8852 685 Emory Valley Rd (B)
> > | Phone: 865-425-0978 Oak Ridge, TN 37830
> > ----------------------------------------------------------
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
> Hi Patrick,
>
> I agree that the data sheet could be more detailed. As far as I have been able
> to uncover the Myrinet card in question was a PCI64B with 4MB SRAM and 2Gbit/s
> link speed. The performance measurements were performed outside Scali and made
> available to us.
>
> From time to time we try to find performance numbers of this kind for different
> interconnects, but we have so far not been able to find any such (official)
> results for Myrinet products. If they were available it could be expected of us
> to use them when we try to compare products. In their absence we must rely on
> and trust helpful individuals that have access to Myrinet systems for the numbers.
> If you can direct us at an official resource with such numbers we would be very
> grateful.
>
> By the way, this type of performance numbers for the latest release of our
> software is not on our web site yet, but it's on its way. The numbers for previous
> releases can be found in the performance section on www.scali.com
>
> Sincerely,
> Øystein
>
> --
> Øystein Gran Larsen, Dr.Scient mailto:gran at scali.no Tel:+47 2262-8982
> ---------------------------------------------------------------------
> MPI×SCI=HPC -- Scalable Linux Systems -- www.scali.com
> Subscribe to our mailing lists at http://www.scali.com/support
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctierney at hpti.com Fri Jan 11 13:34:58 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Fri, 11 Jan 2002 11:34:58 -0700
Subject: Boards for Fastest Intel Processors
In-Reply-To: ; from timm@fnal.gov on Fri, Jan 11, 2002 at 10:46:00AM -0600
References: <20020111091326.C7185@hpti.com>
Message-ID: <20020111113458.A7345@hpti.com>
Not sure on the availability; even if I did know, it would be under
NDA. The ServerWorks chipset, Grand Champion HE, looks very promising:
PCI-X, DDR RAM, and 6.4 GB/s of memory bandwidth.
The i860 chipset has 3.2 GB/s, and each P4 and Xeon is said to
drive 3.2 GB/s. If Supermicro puts the HE chipset on the dual
board, which I think is going to happen, I am going to drool.
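The arithmetic is what makes it attractive: two processors each said to
drive 3.2 GB/s add up to exactly the HE's 6.4 GB/s, whereas on the i860
the pair has to share 3.2 GB/s.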
The current Xeon at 1.7 GHz puts up a great fight against Alpha
833 MHz chips when you use the right compiler: the Xeon is only about
20% slower, but at 1/4 the cost. The increased cache and memory
bandwidth of the P4 in a dual setup will be better for my codes than
the Alpha. Your codes may vary.
Craig
On Fri, Jan 11, 2002 at 10:46:00AM -0600, Steven Timm wrote:
> Does anyone have any recommendations on what type of dual Xeon
> motherboard to get to be able to run these new 2.2 GHz processors?
> I have heard a lot about the Supermicro boards but am suspect due
> to problems with several earlier versions of Supermicro PIII boards.
>
> Is there any word on when Intel will be releasing their new
> board with the native Intel chipset instead of a Serverworks chipset?
>
> Steve
>
>
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
>
> On Fri, 11 Jan 2002, Craig Tierney wrote:
>
> > The 2.2 Ghz cpus have just been released. This generation
> > of cpu is built at 0.13 microns and has twice as big of a
> > L2 cache (512 KB). These are the single processor versions.
> > The Xeon (smp) chips should follow shortly (I am guessing).
> > Go look at the roadmaps at www.theregister.co.uk. They tend to
> > be accurate.
> >
> > I have not heard that there is a problem with the 2.0 Ghz. Is
> > it a problem with RedHat or the Linux kernel specifically? We
> > had no problems with the 1.7 Ghz Xeon chips, but they are not the
> > 2.0 Ghz that you are talking about.
> >
> > Craig
> >
> > On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > > What is the fastest Intel processors available to the public? Have
> > > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > > has problems running with Red Hat 7.1. Has anyone else heard this?
> > >
> > > Ciao,
> > > Rob.
> > >
> > > ======================================================
> > > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> > >
> > > ======================================================
> > > "Remember this, foolish mortals, when ye stare headlong into the
> > > mind-paralyzing void, the inky black nothingness of existence, the
> > > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > > few minutes for your eyes to adjust."
> > > Frank M. Carrano, Branford, Conn.
> > > (Bulwer-Lytton Writing Contest Runner-up)
> >
> >
> >
> > --
> > Craig Tierney (ctierney at hpti.com)
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From daniel.pfenniger at obs.unige.ch Fri Jan 11 13:42:11 2002
From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger)
Date: Fri, 11 Jan 2002 19:42:11 +0100
Subject: Fastest Intel Processors
References:
Message-ID: <3C3F3203.A682CC0A@obs.unige.ch>
astroguy at bellsouth.net wrote:
> I really think this GHz gobbledygook is way overrated. I think, if you
> look more closely at the problem, the key to superior performance is
> found in the actual construction or architecture of the processor. The
> Itanium, for example, only runs at 733 MHz but has 2 MB of on-die L2
> cache... The Alpha, in an earlier incarnation, ran at only 233 MHz but
> did 64-bit processing... Quite amazing actually.
The key to superior performance in Beowulf clusters is to compare the
performance/*cost* for your application. For the moment Alpha and
especially Itanium solutions appear too expensive.
Dan
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Fri Jan 11 14:13:28 2002
From: wsb at paralleldata.com (W Bauske)
Date: Fri, 11 Jan 2002 13:13:28 -0600
Subject: Boards for Fastest Intel Processors
References:
Message-ID: <3C3F3958.291142C6@paralleldata.com>
Supermicro's dual P4 Xeon board uses the Intel 860 chipset, as do all
dual P4 Xeon boards I'm aware of. Also, from what I read, the current
2.2 GHz chips are not Xeons, so only uniprocessor parts are available
at this point.
Wes
Steven Timm wrote:
>
> Does anyone have any recommendations on what type of dual Xeon
> motherboard to get to be able to run these new 2.2 GHz processors?
> I have heard a lot about the Supermicro boards but am suspect due
> to problems with several earlier versions of Supermicro PIII boards.
>
> Is there any word on when Intel will be releasing their new
> board with the native Intel chipset instead of a Serverworks chipset?
>
> Steve
>
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
>
> On Fri, 11 Jan 2002, Craig Tierney wrote:
>
> > The 2.2 Ghz cpus have just been released. This generation
> > of cpu is built at 0.13 microns and has twice as big of a
> > L2 cache (512 KB). These are the single processor versions.
> > The Xeon (smp) chips should follow shortly (I am guessing).
> > Go look at the roadmaps at www.theregister.co.uk. They tend to
> > be accurate.
> >
> > I have not heard that there is a problem with the 2.0 Ghz. Is
> > it a problem with RedHat or the Linux kernel specifically? We
> > had no problems with the 1.7 Ghz Xeon chips, but they are not the
> > 2.0 Ghz that you are talking about.
> >
> > Craig
> >
> > On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > > What are the fastest Intel processors available to the public? Have
> > > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > > has problems running with Red Hat 7.1. Has anyone else heard this?
> > >
> > > Ciao,
> > > Rob.
> > >
> > > ======================================================
> > > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> > >
> > > ======================================================
> > > "Remember this, foolish mortals, when ye stare headlong into the
> > > mind-paralyzing void, the inky black nothingness of existence, the
> > > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > > few minutes for your eyes to adjust."
> > > Frank M. Carrano, Branford, Conn.
> > > (Bulwer-Lytton Writing Contest Runner-up)
> >
> >
> >
> > --
> > Craig Tierney (ctierney at hpti.com)
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Fri Jan 11 14:38:06 2002
From: wsb at paralleldata.com (W Bauske)
Date: Fri, 11 Jan 2002 13:38:06 -0600
Subject: Fastest Intel Processors
References:
Message-ID: <3C3F3F1E.5E97F6C1@paralleldata.com>
MHz/GHz matter. So does IPC (instructions per clock). Word width
determines which architecture you must have to run on.
Consider that the IBM Power4 chip runs at 1.3 GHz yet it has the
fastest SPECfp2000 of any machine on the list (1169). IBM Power chips
have always used substantial on-chip parallelism. A 2.2 GHz P4 scores
766 on SPECfp2000.
Keep in mind that the way to measure how well a processor works is
to use your own codes and decide which one does it best for the $$$$
spent.
Wes
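For illustration, a minimal harness for that kind of own-code benchmarking
(a Python sketch; work() is just a stand-in for your real kernel):
    import time
    def work():
        # Stand-in for your real computational kernel.
        s = 0.0
        for i in range(1, 1000000):
            s += 1.0 / i
        return s
    # Take the best of several runs; it is the least disturbed by
    # whatever else the machine happens to be doing.
    times = []
    for _ in range(5):
        t0 = time.time()
        work()
        times.append(time.time() - t0)
    print("best of 5: %.3f seconds" % min(times))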
"C.Clary" wrote:
>
> Hi all,
> I really think this GHz gobbledygook is way overrated. I think, if you
> look more closely at the problem, the key to superior performance is
> found in the actual construction or architecture of the processor. The
> Itanium, for example, runs at only 733 MHz but it has 2 MB of on-die
> L2 cache... The Alpha, in its previous incarnation, ran at only 233 MHz
> but did 64-bit processing... Quite amazing actually.
>
> A friend wrote me that:
> "Red Hat is working closely with Compaq to solidify its OS on the Alpha
> architecture"... I don't know for sure, first hand... But it is, of
> course, the next logical step, since FreeBSD already provides a clean
> UNIX 64-bit processing code for the Alpha.
> Chip
>
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
> Behalf Of Craig Tierney
> Sent: Friday, January 11, 2002 11:13 AM
> To: Rob Simac
> Cc: beowulf at beowulf.org
> Subject: Re: Fastest Intel Processors
>
> The 2.2 Ghz cpus have just been released. This generation
> of cpu is built at 0.13 microns and has twice as big of a
> L2 cache (512 KB). These are the single processor versions.
> The Xeon (smp) chips should follow shortly (I am guessing).
> Go look at the roadmaps at www.theregister.co.uk. They tend to
> be accurate.
>
> I have not heard that there is a problem with the 2.0 Ghz. Is
> it a problem with RedHat or the Linux kernel specifically? We
> had no problems with the 1.7 Ghz Xeon chips, but they are not the
> 2.0 Ghz that you are talking about.
>
> Craig
>
> On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > What are the fastest Intel processors available to the public? Have
> > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > has problems running with Red Hat 7.1. Has anyone else heard this?
> >
> > Ciao,
> > Rob.
> >
> > ======================================================
> > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> >
> > ======================================================
> > "Remember this, foolish mortals, when ye stare headlong into the
> > mind-paralyzing void, the inky black nothingness of existence, the
> > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > few minutes for your eyes to adjust."
> > Frank M. Carrano, Branford, Conn.
> > (Bulwer-Lytton Writing Contest Runner-up)
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Fri Jan 11 14:43:24 2002
From: wsb at paralleldata.com (W Bauske)
Date: Fri, 11 Jan 2002 13:43:24 -0600
Subject: Fastest Intel Processors
References: <012501c19ab1$24880f30$6564010a@thermawave.com>
Message-ID: <3C3F405C.1D9EDF5C@paralleldata.com>
> Rob Simac wrote:
>
> What is the fastest Intel processors available to the public? Have the 2.0 Ghz been released?
> I have also heard rumors that the 2.0 Ghz has problems running with Red Hat 7.1. Has anyone else
> heard this?
>
I have a P4 1.9Ghz running fine on RH7.2.
There are newer drivers in RH7.2 that are useful (GbE).
Wes
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bogdan.costescu at iwr.uni-heidelberg.de Fri Jan 11 14:44:54 2002
From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Fri, 11 Jan 2002 20:44:54 +0100 (CET)
Subject: Beowulf with Gigabit Ethernet
In-Reply-To:
Message-ID:
On Fri, 11 Jan 2002 alex at compusys.co.uk wrote:
> I think that in general point to point performance information has a
> limited value, whatever vendors might quote on their web-page. SCI might
> be performing pretty well if it comes down to just latency and bandwidth
> between two machines, but it is a ring topology. If you have
> more machines on a ring they will share that same bandwidth.
As with all generalizations when it comes to parallel performance, there
are cases when a ring topology (or 3D torus) actually does give results:
when communication is done only between neighbours. CHARMM (as was
recently mentioned on this list) can benefit from such a setup; an example
can be found at:
http://arg.cmm.ki.si/vrana/
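For illustration, the neighbours-only pattern that makes a ring pay off is
easy to express in MPI; a minimal sketch using the Python mpi4py bindings
(chosen here purely for illustration, not what CHARMM itself uses):
    # Nearest-neighbour exchange on a ring: every rank talks only to
    # rank+1 and rank-1, so a ring network carries no pass-through
    # traffic. Run with e.g.: mpiexec -n 4 python ring.py
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    right, left = (rank + 1) % size, (rank - 1) % size
    # sendrecv pairs the send and receive so that every rank can use
    # blocking calls without deadlocking.
    from_left = comm.sendrecv("halo from %d" % rank, dest=right, source=left)
    from_right = comm.sendrecv("halo from %d" % rank, dest=left, source=right)
    print("rank %d received: %r, %r" % (rank, from_left, from_right))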
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From SGaudet at turbotekcomputer.com Fri Jan 11 15:47:22 2002
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Fri, 11 Jan 2002 15:47:22 -0500
Subject: Boards for Fastest Intel Processors
Message-ID: <3450CC8673CFD411A24700105A618BD61BECE1@911TURBO>
Hello Steve,
> Does anyone have any recommendations on what type of dual Xeon
> motherboard to get to be able to run these new 2.2 GHz processors?
> I have heard a lot about the Supermicro boards but am suspect due
> to problems with several earlier versions of Supermicro PIII boards.
>
> Is there any word on when Intel will be releasing their new
> board with the native Intel chipset instead of a Serverworks chipset?
Intel will be releasing their own Xeon 603-pin dual motherboard in April.
We've run and shipped the SuperMicro dual Xeon motherboard and it appears
stable.
Cheers,
Steve Gaudet
Linux Solutions Engineer
.....
===================================================================
| Turbotek Computer Corp. tel:603-666-3062 ext. 21 |
| 8025 South Willow St. fax:603-666-4519 |
| Building 2, Unit 105 toll free:800-573-5393 |
| Manchester, NH 03103 e-mail:sgaudet at turbotekcomputer.com |
| web: http://www.turbotekcomputer.com |
===================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From cozzi at nd.edu Fri Jan 11 17:29:04 2002
From: cozzi at nd.edu (Marc Cozzi)
Date: Fri, 11 Jan 2002 17:29:04 -0500
Subject: Fastest Intel Processors
Message-ID:
I'm sure most people here understand the relationships between
architectures and clocks, and not to change the subject, but isn't AMD
moving industry people to adopt a new measurement for the performance
(or relative performance) of newer chips? I think I read that the
proposal was to get as far from the MHz/GHz thing as possible.
Hell, even Steven from Dell knows MHz from GHz.
marc
-----Original Message-----
From: W Bauske [mailto:wsb at paralleldata.com]
Sent: Friday, January 11, 2002 2:38 PM
To: beowulf at beowulf.org
Cc: beowulf at beowulf.org
Subject: Re: Fastest Intel Processors
Mhz/Ghz matter. So does IPC (Instructions per clock). Word width
determines which architecture you must have to run on.
Consider that the IBM Power4 chip runs at 1.3Ghz yet it has the
fastest SPECFP2000 of any machine on the list(1169). IBM Power chips
have always used substantial on chip parallelism. A 2.2Ghz P4 is
766 for SPECFP2000.
Keep in mind that the way to measure how well a processor works is
to use your own codes and decide which one does it best for the $$$$
spent.
Wes
"C.Clary" wrote:
>
> Hi all,
> I really think this GHz gobbledygook is way overrated. I think, if you
> look more closely at the problem, the key to superior performance is
> found in the actual construction or architecture of the processor. The
> Itanium, for example, runs at only 733 MHz but it has 2 MB of on-die
> L2 cache... The Alpha, in its previous incarnation, ran at only 233 MHz
> but did 64-bit processing... Quite amazing actually.
>
> A friend wrote me that:
> "Red Hat is working closely with Compaq to solidify its OS on the Alpha
> architecture"... I don't know for sure, first hand... But it is, of
> course, the next logical step, since FreeBSD already provides a clean
> UNIX 64-bit processing code for the Alpha.
> Chip
>
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
> Behalf Of Craig Tierney
> Sent: Friday, January 11, 2002 11:13 AM
> To: Rob Simac
> Cc: beowulf at beowulf.org
> Subject: Re: Fastest Intel Processors
>
> The 2.2 Ghz cpus have just been released. This generation
> of cpu is built at 0.13 microns and has twice as big of a
> L2 cache (512 KB). These are the single processor versions.
> The Xeon (smp) chips should follow shortly (I am guessing).
> Go look at the roadmaps at www.theregister.co.uk. They tend to
> be accurate.
>
> I have not heard that there is a problem with the 2.0 Ghz. Is
> it a problem with RedHat or the Linux kernel specifically? We
> had no problems with the 1.7 Ghz Xeon chips, but they are not the
> 2.0 Ghz that you are talking about.
>
> Craig
>
> On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > What are the fastest Intel processors available to the public? Have
> > the 2.0 Ghz been released? I have also heard rumors that the 2.0 Ghz
> > has problems running with Red Hat 7.1. Has anyone else heard this?
> >
> > Ciao,
> > Rob.
> >
> > ======================================================
> > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> >
> > ======================================================
> > "Remember this, foolish mortals, when ye stare headlong into the
> > mind-paralyzing void, the inky black nothingness of existence, the
> > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > few minutes for your eyes to adjust."
> > Frank M. Carrano, Branford, Conn.
> > (Bulwer-Lytton Writing Contest Runner-up)
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Fri Jan 11 18:56:06 2002
From: wsb at paralleldata.com (W Bauske)
Date: Fri, 11 Jan 2002 17:56:06 -0600
Subject: Fastest Intel Processors
References:
Message-ID: <3C3F7B96.1255AF58@paralleldata.com>
One could define it as SPECfp2000. AMD might not like that, though,
because it would show that P4s are faster (by virtue of their faster clock).
It does show that XPs are about equal to a P4 clock for clock, which is
something AMD seems to want to deny. Compare the XP 1900+ vs the P4 1.6 GHz:
634 vs 637.
I think AMD would prefer SPECint2000. There the XP 1900+ vs the 1.6 GHz P4
is 701 vs 565.
So that implies that if you find Athlons out-perform a P4 on your code, it
probably has a large integer component (assuming a good compiler for both).
Wes
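For illustration, here are those numbers normalized per GHz (a Python
sketch using only the SPEC figures quoted above):
    # Clock-normalized SPEC CPU2000 scores, using the figures quoted
    # above (Athlon XP 1900+ and P4 both at 1.6 GHz).
    chips = {
        "Athlon XP 1900+": {"ghz": 1.6, "specfp": 634, "specint": 701},
        "Pentium 4 1.6":   {"ghz": 1.6, "specfp": 637, "specint": 565},
    }
    for name, c in sorted(chips.items()):
        print("%-16s  fp/GHz %6.1f   int/GHz %6.1f"
              % (name, c["specfp"] / c["ghz"], c["specint"] / c["ghz"]))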
Marc Cozzi wrote:
>
> I'm sure most people here understand the relationships between
> architectures and clocks, and not to change the subject, but isn't AMD
> moving industry people to adopt a new measurement for the performance
> (or relative performance) of newer chips? I think I read that the
> proposal was to get as far from the MHz/GHz thing as possible.
>
> Hell, even Steven from Dell knows MHz from GHz.
>
> marc
>
> -----Original Message-----
> From: W Bauske [mailto:wsb at paralleldata.com]
> Sent: Friday, January 11, 2002 2:38 PM
> To: beowulf at beowulf.org
> Cc: beowulf at beowulf.org
> Subject: Re: Fastest Intel Processors
>
> Mhz/Ghz matter. So does IPC (Instructions per clock). Word width
> determines which architecture you must have to run on.
>
> Consider that the IBM Power4 chip runs at 1.3Ghz yet it has the
> fastest SPECFP2000 of any machine on the list(1169). IBM Power chips
> have always used substantial on chip parallelism. A 2.2Ghz P4 is
> 766 for SPECFP2000.
>
> Keep in mind that the way to measure how well a processor works is
> to use your own codes and decide which one does it best for the $$$$
> spent.
>
> Wes
>
> "C.Clary" wrote:
> >
> > Hi all,
> > I really think this GHz gobbledygook is way overrated. I think, if you
> > look more closely at the problem, the key to superior performance is
> > found in the actual construction or architecture of the processor. The
> > Itanium, for example, runs at only 733 MHz but it has 2 MB of on-die
> > L2 cache... The Alpha, in its previous incarnation, ran at only 233 MHz
> > but did 64-bit processing... Quite amazing actually.
> >
> > A friend wrote me that:
> > "Red Hat is working closely with Compaq to solidify its OS on the Alpha
> > architecture"... I don't know for sure, first hand... But it is, of
> > course, the next logical step, since FreeBSD already provides a clean
> > UNIX 64-bit processing code for the Alpha.
> > Chip
> >
> > -----Original Message-----
> > From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
> > Behalf Of Craig Tierney
> > Sent: Friday, January 11, 2002 11:13 AM
> > To: Rob Simac
> > Cc: beowulf at beowulf.org
> > Subject: Re: Fastest Intel Processors
> >
> > The 2.2 Ghz cpus have just been released. This generation
> > of cpu is built at 0.13 microns and has twice as big of a
> > L2 cache (512 KB). These are the single processor versions.
> > The Xeon (smp) chips should follow shortly (I am guessing).
> > Go look at the roadmaps at www.theregister.co.uk. They tend to
> > be accurate.
> >
> > I have not heard that there is a problem with the 2.0 Ghz. Is
> > it a problem with RedHat or the Linux kernel specifically? We
> > had no problems with the 1.7 Ghz Xeon chips, but they are not the
> > 2.0 Ghz that you are talking about.
> >
> > Craig
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From drikis at mail.dyu.edu.tw Sat Jan 12 01:15:29 2002
From: drikis at mail.dyu.edu.tw (Drikis Ivars)
Date: Sat, 12 Jan 2002 14:15:29 +0800 (CST)
Subject: Fastest Intel Processors
In-Reply-To: <20020111170917.A28757@delta.ft.uam.es>
Message-ID:
I read a long time ago on www.lwn.net that the Linux kernel had a
limitation on processor speed at 2 GHz. I hope it has been fixed by now.
---------------------------------------------------------------
Dr. Phys. Ivars Drikis
Department of Mechanical Engineering
Da-Yeh University, Changhua, Taiwan 515
tel: 886-4-8528469
> On Fri, Jan 11, 2002 at 07:03:35AM -0800, Rob Simac wrote:
> > What are the fastest Intel processors available to the public?
> > Have the 2.0 Ghz been released? I have also heard rumors that the 2.0
> > Ghz has problems running with Red Hat 7.1. Has anyone else heard
> > this?
> >
> > Ciao,
> > Rob.
> >
> > ======================================================
> > Rob Simac -- Therma-Wave, Inc. -- rsimac at thermawave.com
> > ======================================================
> > "Remember this, foolish mortals, when ye stare headlong into the
> > mind-paralyzing void, the inky black nothingness of existence, the
> > hellish yawning maw of the abyss -- it's pretty damn dark, so give it a
> > few minutes for your eyes to adjust."
> > Frank M. Carrano, Branford, Conn.
> > (Bulwer-Lytton Writing Contest Runner-up)
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From raysonlogin at yahoo.com Sat Jan 12 03:28:26 2002
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Sat, 12 Jan 2002 00:28:26 -0800 (PST)
Subject: Fastest Intel Processors
In-Reply-To: <3C3F7B96.1255AF58@paralleldata.com>
Message-ID: <20020112082826.7267.qmail@web11406.mail.yahoo.com>
The sad thing is that there aren't any good compilers that target the
Athlon. And keep in mind that AMD is using the Intel compiler for
running SPEC.
A lot of tuning is needed in order to get good SPEC results. But if you
look at real-world applications, AMD is the same as or better than Intel,
even for FP code, and the price is cheaper too.
http://www.aceshardware.com/read.jsp?id=45000277
Lastly, Linux kernels used to have problems with processors with a
clock speed greater than 2GHz. The fix was in 2.2.18pre, and I think it
should be in 2.4 stable.
http://lwn.net/2000/1005/kernel.php3
Rayson
> One could define it as SPECfp2000. AMD might not like that, though,
> because it would show that P4s are faster (by virtue of their faster
> clock).
>
> It does show that XPs are about equal to a P4 clock for clock, which
> is something AMD seems to want to deny. Compare the XP 1900+ vs the
> P4 1.6 GHz: 634 vs 637.
>
> I think AMD would prefer SPECint2000. There the XP 1900+ vs the
> 1.6 GHz P4 is 701 vs 565.
>
> So that implies that if you find Athlons out-perform a P4 on your
> code, it probably has a large integer component (assuming a good
> compiler for both).
>
> Wes
>
> Marc Cozzi wrote:
> >
> > I'm sure most people here understand the relationships between
> > architectures and clocks, and not to change the subject, but isn't
> > AMD moving industry people to adopt a new measurement for the
> > performance (or relative performance) of newer chips? I think I
> > read that the proposal was to get as far from the MHz/GHz thing as
> > possible.
> >
> > Hell, even Steven from Dell knows MHz from GHz.
> >
> > marc
> >
> > -----Original Message-----
> > From: W Bauske [mailto:wsb at paralleldata.com]
> > Sent: Friday, January 11, 2002 2:38 PM
> > To: beowulf at beowulf.org
> > Cc: beowulf at beowulf.org
> > Subject: Re: Fastest Intel Processors
> >
> > MHz/GHz matter. So does IPC (instructions per clock). Word width
> > determines which architecture you must have to run on.
> >
> > Consider that the IBM Power4 chip runs at 1.3 GHz yet it has the
> > fastest SPECfp2000 of any machine on the list (1169). IBM Power
> > chips have always used substantial on-chip parallelism. A 2.2 GHz
> > P4 scores 766 on SPECfp2000.
> >
> > Keep in mind that the way to measure how well a processor works is
> > to use your own codes and decide which one does it best for the
> > $$$$ spent.
> >
> > Wes
> >
> > "C.Clary" wrote:
> > >
> > > Hi all,
> > > I really think this GHz gobbledygook is way overrated. I think,
> > > if you look more closely at the problem, the key to superior
> > > performance is found in the actual construction or architecture
> > > of the processor. The Itanium, for example, runs at only 733 MHz
> > > but it has 2 MB of on-die L2 cache... The Alpha, in its previous
> > > incarnation, ran at only 233 MHz but did 64-bit processing...
> > > Quite amazing actually.
> > >
> > > A friend wrote me that:
> > > "Red Hat is working closely with Compaq to solidify its OS on
> > > the Alpha architecture"... I don't know for sure, first hand...
> > > But it is, of course, the next logical step, since FreeBSD
> > > already provides a clean UNIX 64-bit processing code for the
> > > Alpha.
> > > Chip
> > >
> > > -----Original Message-----
> > > From: beowulf-admin at beowulf.org
> [mailto:beowulf-admin at beowulf.org]On
> > > Behalf Of Craig Tierney
> > > Sent: Friday, January 11, 2002 11:13 AM
> > > To: Rob Simac
> > > Cc: beowulf at beowulf.org
> > > Subject: Re: Fastest Intel Processors
> > >
> > > The 2.2 Ghz cpus have just been released. This generation
> > > of cpu is built at 0.13 microns and has twice as big of a
> > > L2 cache (512 KB). These are the single processor versions.
> > > The Xeon (smp) chips should follow shortly (I am guessing).
> > > Go look at the roadmaps at www.theregister.co.uk. They tend to
> > > be accurate.
> > >
> > > I have not heard that there is a problem with the 2.0 Ghz. Is
> > > it a problem with RedHat or the Linux kernel specifically? We
> > > had no problems with the 1.7 Ghz Xeon chips, but they are not the
> > > 2.0 Ghz that you are talking about.
> > >
> > > Craig
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jacsib at lutecium.org Sat Jan 12 04:02:05 2002
From: jacsib at lutecium.org (Jacques B. Siboni)
Date: Sat, 12 Jan 2002 09:02:05 +0000
Subject: [Fwd: icmp: ip reassembly time exceeded]
Message-ID: <3C3FFB8D.7C7C37C7@lutecium.org>
Dear all,
I am forwarding the following mail to the ltsp and beowulf lists, as I
use these concepts and the Mosix group seems to be in a very depressed mood.
The problem I encounter occurs before Mosix even starts. Something new
in the 2.4.xx kernels does not accept certain kinds of fragments; it is
more of an NFS-boot problem.
One (quick and dirty) solution could be to allow the kernel to boot even
with an MTU of less than 1500, which I could not manage to do.
Thanks in advance
Jacques
--
Dr. Jacques B. Siboni mailto:jacsib at Lutecium.org
8 pass. Charles Albert, F75018 Paris, France
Tel. & Fax: 33 (0) 1 42 28 76 78
Home Page: http://www.lutecium.org/jacsib/
-------------- next part --------------
An embedded message was scrubbed...
From: "Jacques B. Siboni"
Subject: icmp: ip reassembly time exceeded
Date: Tue, 08 Jan 2002 12:22:24 +0000
Size: 1520
URL:
From wsb at paralleldata.com Sat Jan 12 04:10:21 2002
From: wsb at paralleldata.com (W Bauske)
Date: Sat, 12 Jan 2002 03:10:21 -0600
Subject: Fastest Intel Processors
References: <20020112082826.7267.qmail@web11406.mail.yahoo.com>
Message-ID: <3C3FFD7D.5BADFD4B@paralleldata.com>
Rayson Ho wrote:
>
> The sad thing is that there aren't any good compilers that target the
> Athlon. And keep in mind that AMD is using the Intel compiler for
> running SPEC.
I suspect the compilers they used (Intel C++ 5.0.1 build 010727Z,
Intel Fortran 5.0.1 build 010727Z, and Compaq Visual Fortran 6.6) are
pretty up to date. Interesting they chose to use Intel's compiler
on their chip.
>
> A lot of tuning is needed in order to get good SPEC results. But if you
> look at real-world applications, AMD is the same as or better than Intel,
> even for FP code, and the price is cheaper too.
>
> http://www.aceshardware.com/read.jsp?id=45000277
>
The problem with Ace's and several other standard sites is that they don't
actually compile the applications. They just live with what the app
vendor did, which is mostly irrelevant for folks who compile their own
codes. It does apply to people who use a Winxx box, though, so it is
relevant to that audience. I can guarantee the binaries I generate for a
P4 will not run on any Athlon, though (SSE2).
Wes
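For illustration, a quick way to check a node before shipping it SSE2-tuned
binaries (a Python sketch; Linux-specific, since it reads /proc/cpuinfo):
    # Check whether this CPU advertises SSE2 before deploying P4-tuned
    # binaries on it (Athlons of this era do not have SSE2).
    def has_flag(flag):
        for line in open("/proc/cpuinfo"):
            if line.startswith("flags"):
                return flag in line.split(":", 1)[1].split()
        return False
    print("sse2 supported:", has_flag("sse2"))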
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From erayo at cs.bilkent.edu.tr Sat Jan 12 10:28:26 2002
From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa))
Date: Sat, 12 Jan 2002 17:28:26 +0200
Subject: Need advice on cluster hardware
In-Reply-To: <5.0.2.1.2.20020110201213.00af1090@hesiod>
References: <5.0.2.1.2.20020110201213.00af1090@hesiod>
Message-ID:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Friday 11 January 2002 03:35, Ron Choy wrote:
> My advisor has made me responsible (*gasp*) for purchasing a 8 node
> cluster, used mainly for computational linear algebra problems. After some
> research I came up with the following configuration:
>
> Frontend (+file server):
> Asus A7M266-D (AMD 760MPX)
> Enermax EG365P-VE 350W PS (meets the 15A on 12V requirement of the board)
> 2 x Athlon MP 1800+ (1.533GHz)
> 2 x Thermaltake VOLCANO 6Cu+ heatsink
> 2 x Kingston PC2100 512MB ECC Registered
> IBM SCSI HD 36GB 10000RPM
> Adaptec SCSI Controller
> 2 x Netgear GA622T gigabit nic
> ATI xpert Rage XL 8MB AGP vid
> Floppy drive
> Sony 52X cdrom drive
> Tower case
>
> Compute nodes:
> same as front end, minus 1 nic and SCSI adapter, and replace SCSI HD with
> IDE HD.
You shouldn't need CD-ROM drives at the nodes. I don't know, but with those
two power-hogging CPUs you might need an extra fan somewhere.
>
> Switch:
> Intel 410T 16 port 10/100 switch
>
> (* The reason why I have gigabit nics and 10/100 switch is that I don't
> know if bandwidth is going to be a limit on the computations so I would
> rather start out small and expand later. (is this a good idea?) )
>
>
Linear algebra problems are likely to require a lot of bisection bandwidth.
Is that switch going to work with your 1000Base-TX (?) NICs at all? I assume
you'd be better off with a switch that suits your hardware. If you have the
budget, go for a gigabit switch.
Will you work on dense or sparse problems? Your requirements are likely to
differ with the type of matrices you will use, and of course the kind of
research you will do. If you are a computational scientist you'd like a
faster network; if you are a computer scientist you might need a slower
network to show that your algorithm is effective in low-bandwidth
configurations!
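For a very rough feel for that communication load, here is a
back-of-envelope sketch (Python, SUMMA-style dense multiply; all constants,
including the 100 us TCP/IP latency, are approximate and purely illustrative):
    # Very rough SUMMA-style estimate: each of p processes moves about
    # 2*n*n/sqrt(p) words for an n x n dense matrix multiply.
    import math
    def comm_seconds(n, p, bw_bytes_s, latency_s=100e-6, word=8):
        bytes_moved = 2.0 * n * n / math.sqrt(p) * word
        messages = 2 * int(math.sqrt(p))      # order-of-magnitude only
        return messages * latency_s + bytes_moved / bw_bytes_s
    for name, bw in [("fast ethernet", 12e6), ("gigabit", 100e6)]:
        print("%-14s ~%5.1f s of communication (n=4096, p=8)"
              % (name, comm_seconds(4096, 8, bw)))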
>
> Does this configuration look reasonable? Any known conflicts or driver
> issues? I am going to use the 2.4.x kernel.
It does look reasonable. Usually there aren't many problems with ASUS boards
and AMD CPUs, but of course testing is the best way to make sure.
Regards,
- --
Eray Ozkural (exa)
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org
iD8DBQE8QFYcfAeuFodNU5wRAr1pAJ9k69yUO/HdRtnUNW6GaMhuPrMUVgCfUeyt
2vENlwERRX0B3W9cZLJfV2s=
=zQgg
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Sat Jan 12 12:08:51 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Sat, 12 Jan 2002 10:08:51 -0700
Subject: Re: Boards for Fastest Intel Processors
In-Reply-To: <200201120840.g0C8egR03385@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020112100522.03a8c7b0@mail.harddata.com>
With regards to your message, where you stated:
>Supermicro's P4 Xeon dual uses the Intel 860 chipset as do all dual
>P4 Xeon boards I'm aware of. Also, the current 2.2Ghz chips are not
>Xeon's from what I read anyway so only singles are available at this
>point.
>
>Wes
Also, the XEON P4 is made on an entirely different and larger fab process.
As a result they draw huge amounts of power and require extraordinary
cooling.
This makes cost and reliability an issue. It also makes fast clock
speeds unlikely until Intel starts fabbing them on newer processes.
Also, Intel has said the 860 chipset is intended for workstations, and that
they have no intention of releasing their own board for it. Claims are that
later this year they are supposed to be releasing some newer-chipset XEON
servers. ServerSet is notable by its absence in this space.
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Sat Jan 12 12:20:16 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Sat, 12 Jan 2002 10:20:16 -0700
Subject: Beowulf digest, Vol 1 #703 - 10 msgs
In-Reply-To: <200201120840.g0C8egR03385@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020112100935.03a8b190@mail.harddata.com>
With regards to your message where you stated:
>Steven Timm wrote:
> >
> > Does anyone have any recommendations on what type of dual Xeon
> > motherboard to get to be able to run these new 2.2 GHz processors?
> > I have heard a lot about the Supermicro boards but am suspect due
> > to problems with several earlier versions of Supermicro PIII boards.
> >
> > Is there any word on when Intel will be releasing their new
> > board with the native Intel chipset instead of a Serverworks chipset?
> >
> > Steve
There are only two currently available choices:
Tyan S2603
Supermicro P4Dxx series.
Unfortunately the Intel 860 chipset is only compatible with RDRAM.
Supermicro has a few variants, but they are all limited to 2GB of RAM.
The Tyan can go to 4GB, but uses a memory riser card to do it, and has a
very large form factor as a result. Also, the Intel cooling shrouds for
the XEON are huge, and actually hang past the motherboard, breaking the
form-factor specification.
Power supply requirements are also very hefty.
In general you are unlikely to fit this board in anything under 4U of
rack space.
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Sat Jan 12 12:49:23 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Sat, 12 Jan 2002 10:49:23 -0700
Subject: 8 node cluster
In-Reply-To: <200201121701.g0CH1FR10857@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020112104417.03aaad50@mail.harddata.com>
With regards to your message where you stated:
>On Friday 11 January 2002 03:35, Ron Choy wrote:
> > My advisor has made me responsible (*gasp*) for purchasing a 8 node
> > cluster, used mainly for computational linear algebra problems. After some
> > research I came up with the following configuration:
> >
> > Frontend (+file server):
> > Asus A7M266-D (AMD 760MPX)
I think the Tyan S2466N, with its onboard dual 3Com NICs, is a better bet.
It is also less $$, and Tyan already has way more MP Athlon experience than ASUS.
> > Enermax EG365P-VE 350W PS (meets the 15A on 12V requirement of the board)
Check the real power output. In my experience Enermax is very optimistic
about their outputs.
I think you might consider Zippy or NMB too (see the power-budget sketch
below).
> > 2 x Athlon MP 1800+ (1.533GHz)
> > 2 x Thermaltake VOLCANO 6Cu+ heatsink
Why? The stock AMD heatsink/fan is good, and the boxed CPUs come with it
plus a 3-year warranty, instead of the one year on tray/OEM parts.
> > 2 x Kingston PC2100 512MB ECC Registered
Whatever. You DO know that 1GB modules are the same price or cheaper (per
MB) than 512s?
Also, CAS2 modules at 1GB are available now, which allow you to cleanly run
4GB on this chipset.
> > IBM SCSI HD 36GB 10000RPM
> > Adaptec SCSI Controller
Consider LSI/Symbios. Open-source drivers are a good thing; Adaptec's are
all reverse-engineered.
As for performance, I do not want to start a war, but we get better results
with Symbios.
> > 2 x Netgear GA622T gigabit nic
Consider the Intel 8490..
> > ATI xpert Rage XL 8MB AGP vid
> > Floppy drive
> > Sony 52X cdrom drive
> > Tower case
With good airflow..
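And the power-budget sketch promised above (Python; every per-component
wattage is an illustrative guess, not a measured value):
    # Rough power-budget check for a dual Athlon MP node. Compare the
    # total plus headroom against the supply's real continuous rating.
    parts = {
        "2 x Athlon MP 1800+": 2 * 66.0,   # approximate watts each
        "motherboard + RAM":   40.0,
        "disk":                15.0,
        "NICs, video, fans":   20.0,
    }
    total = sum(parts.values())
    print("estimated draw: %.0f W" % total)
    print("with 30%% headroom, buy at least a %.0f W (real) supply"
          % (total * 1.3))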
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Jan 12 14:57:07 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 12 Jan 2002 14:57:07 -0500 (EST)
Subject: [Fwd: icmp: ip reassembly time exceeded]
In-Reply-To: <3C3FFB8D.7C7C37C7@lutecium.org>
Message-ID:
On Sat, 12 Jan 2002, Jacques B. Siboni wrote:
> Dear all,
>
> I am forwarding the following mail to the ltsp and beowulf lists, as I
> use these concepts and the Mosix group seems to be in a very depressed
> mood.
>
> The problem I encounter occurs before Mosix even starts. Something new
> in the 2.4.xx kernels does not accept certain kinds of fragments; it is
> more of an NFS-boot problem.
>
> One (quick and dirty) solution could be to allow the kernel to boot even
> with an MTU of less than 1500, which I could not manage to do.
>
> Thanks in advance
>
> Jacques
>
>
Dear Jacques,
This has the look and feel of a hardware problem with your physical
network. The fact that small packets sometimes make it through but big
ones don't is telling indeed. You don't describe your physical network,
but one of the following could easily be the problem:
a) Bad wiring. One cable with an almost-broken conductor can do this. So
can poorly wired connectors at the punchblock or inside the RJ45
connectors.
b) Wiring runs that are too long. 100BT has a maximum radius of 100
m from a switch that can retime packets. If runs are too long, a
collision condition can easily occur as one host brings up the line to
send but the signal doesn't have time to propagate to a host downstream
in time to keep it from ALSO bringing up the line to send. In a high
traffic density network, lots of packets collide and are lost, and
perhaps smaller packets have a better chance of making it through at
least sometimes.
c) Hubs instead of switches, especially too many hubs. Packets sent
to a hub are echoed on all lines, and ANY system trying to send in the
same window will cause a collision. Too many hubs add latency that
reduces the effective diameter of your network and increases the
probability of collisions. Switches actually read a packet and
retransmit it on ONLY the line it is destined for, and retime the packet
besides. This isolates systems from traffic not intended for them and
improves network stability and performance. Offhand I can't remember
the maximum number of "repeaters" (hubs) permitted in a 100BT network --
something like 3 -- because I haven't used hubs for years now, ever
since switches got so cheap. Good switches will also sometimes indicate
lines with a fault condition and isolate those lines.
d) Cheap/bad NICs. It is just my opinion, but this includes all
RTL8139 NICs from any manufacturer. These NICs have exhibited behavior
like that which you describe in my own systems all by themselves on an
otherwise perfect network -- if you flood them with a packet stream,
they can easily end up dropping all but one or two packets in a hundred.
Again, using small packets probably doesn't improve their efficiency (it
just makes for a longer stream with even smaller interpacket gaps), but it
likely does improve the probability that a packet will make it through
before timing out. Unfortunately, RTL8139s are nearly ubiquitous,
since they are available in $10 NICs and some folks cannot resist the
bargain. If you have 8139's, just throw them away and buy a decent NIC
-- eepro100, tulip, 3c905 -- and your problem may magically go away.
e) It's a long shot, but a poorly supported card/driver or
interference with a particular chipset or motherboard or card
combination "can" cause things like this, but frankly I doubt it. I'd
work a-d over pretty thoroughly before I started worrying about problems
in the base linux kernel or network drivers (RTL drivers excluded,
although it isn't really a driver problem per se) or exotic chipset
problems. This is presuming that you are running a reasonably recent
and/or non-SMP production kernel. If you are running a really old
kernel (especially a really old SMP kernel) or an exotic homemade kernel
with strange drivers or the like, after I finished asking "why" I'd
agree that doing something sort of dumb like this could also cause such
a problem.
There are some lovely online guides to the care and feeding of Ethernet
networks, many of them linked to www.phy.duke.edu/brahma or available on
the Scyld website. One or more of them will (for example) tell you the
maximum number of repeaters permitted if in fact you are using hubs and
have a very large physical network.
Hope this helps. I'd advise investing in (minimally) a network cable
tester and if there is any chance at all your cable runs are too long,
in a reflectometer. If you are using hubs or RTL NICs, I'd STRONGLY
recommend swapping them out for switches and decent NICs as rapidly as
possible, especially for the lines connecting to your servers.
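For a quick-and-dirty first impression before buying test gear, a crude
loss probe can help; the following is a hypothetical Python sketch (UDP
only, and kernel buffer overruns count as "loss" too, so treat the result
as a hint, not a diagnosis):
    # Start the receiver first, then blast COUNT datagrams at it from
    # another node and compare counts.
    # Usage:  python probe.py recv
    #         python probe.py send <receiver-host>
    import socket, sys, time
    PORT, COUNT, SIZE = 9999, 1000, 1400
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if sys.argv[1] == "recv":
        sock.bind(("", PORT))
        sock.settimeout(5.0)
        got = 0
        try:
            while True:
                sock.recvfrom(65536)
                got += 1
        except socket.timeout:
            pass
        print("received %d/%d (%.1f%% loss)"
              % (got, COUNT, 100.0 * (COUNT - got) / COUNT))
    else:
        for _ in range(COUNT):
            sock.sendto(b"x" * SIZE, (sys.argv[2], PORT))
            time.sleep(0.001)          # pace the stream a little
        print("sent %d datagrams of %d bytes" % (COUNT, SIZE))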
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From okeefe at brule.borg.umn.edu Sun Jan 13 14:08:45 2002
From: okeefe at brule.borg.umn.edu (Matthew O'Keefe)
Date: Sun, 13 Jan 2002 13:08:45 -0600
Subject: the need for storage area networks [was: Shared diskspace between nodes]
In-Reply-To: <042a01c197b2$a2165530$0302a8c0@Roaming>
References: <042a01c197b2$a2165530$0302a8c0@Roaming>
Message-ID: <20020113130845.A71651@brule.borg.umn.edu>
Jon,
others have provided good suggestions, but I think the ultimate
solution to your problem is a storage area
network between your Beowulf nodes
and a pool of shared storage devices. This approach allows efficient
partitioning and sharing of storage between the Beowulf nodes.
A cluster file system like GFS can be used to map a shared
file system (one that all nodes can mount directly) onto the
shared storage devices. This approach completely removes
your problem: trying to map your data evenly across many nodes,
when the data needs on each node can grow or shrink in
unexpected ways. It also allows you to manage 1 file system,
instead of 40.
Some may object that SANs are expensive, but that is changing.
IP-based SANs are now becoming available, and a cluster of
NFS servers with shared storage and a cluster file system can
also be used to share data across a Beowulf without the full
expense of a SAN. For details see the white paper I wrote
on "Accelerating Technical Computing..." at the Sistina web site
(www.sistina.com).
When running complex parallel applications in production
on a Beowulf cluster (for example, Oracle Real Application Clusters),
a storage area network and cluster file system greatly
simplifies your life.
Matt O'Keefe
On Mon, Jan 07, 2002 at 02:36:42PM -0500, Jon E. Mitchiner wrote:
> Greetings!
>
> I presently run a 40-node cluster, Dual 1GHz with 20GB hard drive on each
> system. This gives me roughly 15GB (safe estimate) after the OS, installed
> programs, some data, etc on each machine. This gives me roughly 600GB of
> space that I am not currently utilizing on 40 nodes.
>
> Right now, we are saving data on various nodes, and moving it around when
> space gets tight on a machine. This is getting time consuming as some of us
> have to look on different nodes to find out where your data is currently
> residing. I am considering saving all directory names in a database and
> then making a GUI interface via the web so its easy to find the location of
> data directories, rather than looking for it (especially if someone moved my
> directory to another machine without letting me know).
>
> I am curious if there is a program out there that might be able to utilize
> the space that we are not utilizing -- such as linking the file space
> between nodes so that way I can set up a "large" data partition sharable by
> all nodes. Some redundancy would be nice. I'm curious if there is a software
> solution (either GPL-licensed or commercial) to utilize the space better.
>
> Optimally, it would be nice to see all "shared" drives as one large
> partition to be mounted to all nodes and all the data is handled by a daemon
> or something like that.
>
> Does anyone have any ideas, suggestions, or programs that might be able to
> do something similar?
>
> Thanks!
>
> Regards,
>
> Jon E. Mitchiner
> Minotaur Technologies
> http://www.minotaur.com
> AOL IM [http://www.aol.com/aim] MinotaurT
>
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john at computation.com Mon Jan 14 00:57:27 2002
From: john at computation.com (John Nelson)
Date: Mon, 14 Jan 2002 00:57:27 -0500
Subject: Bizarre problems when adding a PPC machine...
Message-ID: <3C427347.7050201@computation.com>
Hi all,
I really hate to bother the mailing list, but this one has me somewhat
stumped. I have a four-node cluster comprising Linux machines and one
PPC machine. The Linux machines have been adequately tested and play
well together. The PPC machine is another matter. When I include the
PPC machine (a Mac 8500 running Yellow Dog Linux) in my network
cluster... well, things fall apart. Here's what appears on the console
after running a simple test on my "root" node:
[john at adenine examples]$ ./mpirun -np 4 simpleio
p2_9722: p4_error: Could not allocate memory for commandline args:
553648128
bm_list_24602: (4.056938) Listener: Unable to interrupt client pid=24601.
Connection failed for reason: : Connection refused
p1_1962: p4_error: net_recv read: probable EOF on socket: 1
[john at adenine examples]$ Connection failed for reason: : Connection refused
p3_1283: p4_error: net_recv read: probable EOF on socket: 1
bm_list_24602: (4.076335) Listener: Unable to interrupt client pid=24601.
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Broken pipe
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
Broken pipe
Connection failed for reason: : Connection refused
Broken pipe
bm_list_24602: p4_error: net_recv read: probable EOF on socket: 1
"Connection refused" is a strange, strange message, because RSH seems to be
working well, as do other networking applications. I imagine one
reason could be MPICH version differences between the different
architectures. These are the versions of the MPICH RPMs installed:
PPC: mpich-1.2.0-1a
Linux: mpich-1.2.0-12
But I also compiled and installed the source code on both classes of
machines.
Any ideas? It's probably something simple, but being a Beowulf newbie,
it's beyond me right now.
-- John
--
_________________________________________________________
John T. Nelson
President | Computation.com Inc.
mail: | john at computation.com
company: | http://www.computation.com/
journal: | http://www.computation.org/
_________________________________________________________
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ole at scali.no Mon Jan 14 03:13:24 2002
From: ole at scali.no (Ole W. Saastad)
Date: Mon, 14 Jan 2002 09:13:24 +0100
Subject: Need advice on cluster hardware
References: <200201121702.g0CH2WR10875@blueraja.scyld.com>
Message-ID: <3C429324.EF649A2C@scali.no>
"Eray Ozkural (exa)"
and >Ron Choy wrote:
> > (* The reason why I have gigabit nics and 10/100 switch is that I don't
> > know if bandwidth is going to be a limit on the computations so I would
> > rather start out small and expand later. (is this a good idea?) )
> >
> >
>
> Linear Algebra problems are likely to require a lot of bisection bandwidth.
> Is that switch going to work with your 1000Base-TX (?) NIC's at all? I assume
> you'd be better off with a switch that suits your hardware. If you have the
> budget go for a gigabit switch.
>
> Will you work on dense or sparse problems? Your requirements are likely to
> differ for the type of matrices you will use, and of course the kind of
> research you will make. If you are a computational scientist you'd like a
> faster network, if you are a computer scientist you might need a slower
> network to show that your algorithm is effective in low-bandwidth
> configurations!
>
If you want to start small, start with bonded 100 Mbit ethernet; it is
low cost and gives you relatively high bandwidth. The bandwidth obtained
with gigabit is not as high as you would hope for (more than 100
Mbytes/sec). The high bandwidth you get with 1000T is nice for servers
and file transfers, but as with all ethernet the latency is killing you.
With fast ethernet's 11-12 Mbytes/sec (or approximately double with
bonding) the latency is a problem, but one that is possible to live with.
When using gigabit ethernet, however, the latency associated with TCP/IP
is really killing you. When the price difference between 100 and 1000T
becomes zero, so that you get much better performance at no extra cost,
the picture changes somewhat; but the latency is still your bottleneck,
with numbers running over 100 microseconds for a TCP/IP interconnect,
and it is the TCP/IP protocol that mainly accounts for this latency. In
short, the message is that for most applications it does not help very
much to replace fast ethernet with gigabit ethernet when you must pay a
lot of money for 1000T cards and switches.
I would recommend an interconnect with lower latency, like SCI. With
latency of less than 4 microseconds and measured bandwidth of over 300
MB/sec, SCI outperforms the gigabit network. (Myrinet is an alternative.)
If you want to start small, a few Wulfkits would enable you to set up a
small 2x or 4x cluster to test your application and verify your
bottlenecks.
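As a rough illustration of why the latency dominates, here is a
first-order cost model (a Python sketch; the fast ethernet and SCI figures
follow the numbers above, while the gigabit bandwidth is a nominal
placeholder):
    # First-order cost model: time = latency + size / bandwidth.
    nets = [
        ("fast ethernet", 100e-6, 11e6),
        ("gigabit TCP/IP", 100e-6, 100e6),
        ("SCI", 4e-6, 300e6),
    ]
    for size in (64, 4096, 65536, 1048576):        # bytes per message
        row = "%8d B:" % size
        for name, lat, bw in nets:
            row += "  %s %9.1f us" % (name, (lat + size / bw) * 1e6)
        print(row)
For small messages the wire speed barely matters: the fixed TCP/IP latency
is essentially the whole cost, which is exactly where SCI wins.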
--
Ole W. Saastad, Dr.Scient.
Scali AS P.O.Box 70 Bogerud 0621 Oslo NORWAY
Tel:+47 22 62 89 68(dir) mailto:ole at scali.no http://www.scali.com
ScaMPI: bandwidth .gt. 300 MB/sec. latency .lt. 4 us.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From pw at osc.edu Mon Jan 14 08:57:25 2002
From: pw at osc.edu (Pete Wyckoff)
Date: Mon, 14 Jan 2002 08:57:25 -0500
Subject: Bizarre problems when adding a PPC machine...
In-Reply-To: <3C427347.7050201@computation.com>; from john@computation.com on Mon, Jan 14, 2002 at 12:57:27AM -0500
References: <3C427347.7050201@computation.com>
Message-ID: <20020114085725.B1907@osc.edu>
john at computation.com said:
> I really hate to bother the mailing list, but this one has me somewhat
> stumped. I have a four-node cluster comprising Linux machines and one
> PPC machine. The Linux machines have been adequately tested and play
> well together. The PPC machine is another matter. When I include the
> PPC machine (a Mac 8500 running Yellow Dog Linux) in my network
> cluster... well, things fall apart. Here's what appears on the console
> after running a simple test on my "root" node:
>
>
> [john at adenine examples]$ ./mpirun -np 4 simpleio
> p2_9722: p4_error: Could not allocate memory for commandline args:
> 553648128
Looks like endianness problems in mpich. Complain to those developers,
with followup to this list in case others run into the same problem
later.
-- Pete
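For what it is worth, the number in that error message supports the
diagnosis: 553648128 is 0x21000000, which byte-swapped as a 32-bit integer
is 33, a perfectly plausible argument length. A quick check (Python,
illustrative only):
    # The classic sign of little-endian x86 talking to big-endian PPC.
    import struct
    n = 553648128                       # value from the p4_error message
    swapped, = struct.unpack("<I", struct.pack(">I", n))
    print("%d byte-swapped = %d" % (n, swapped))    # -> 33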
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rbw at networkcs.com Mon Jan 14 11:18:11 2002
From: rbw at networkcs.com (Richard Walsh)
Date: Mon, 14 Jan 2002 10:18:11 -0600 (CST)
Subject: Fastest Intel Processors
In-Reply-To:
Message-ID: <200201141618.KAA64114@us.msp.networkcs.com>
astroguy wrote:
>I really think this GHz gobbledy-gook is way overrated. I think, if you
>look more closely at the problem, the key to superior performance is
>found in the actual construction or architecture of the processor. The
>Itanium, for example, only runs at 733 MHz but it has (on die) 2 MB of
>cache in the L2... The Alpha, in its previous incarnation, ran at only 233
>but did 64-bit processing... Quite amazing actually. A friend wrote me
All singular measures of performance are overrated (does this need to be
explicitly stated?). Benchmarking your codes with a good compiler on the
systems being considered is the optimal approach, but it is often out of
reach. SPEC2000FP (perhaps combined with STREAM) is a nice substitute.
Here are the top ten microprocessors on SPEC2000FP as of a few weeks ago
(32 and 64 bit). Notice the weak correlation with clock. Local memory
(cache) structure, size, and latency matter; superscalability of the core
matters (multiple functional units); instruction type matters (vector and
pseudo-vector SSE); and processor-motherboard bandwidth matters.
Note that the P4 at 2 GHz outperforms the Itanium. Touting the Alpha could
be a mistake. Will we ever see the 21364 in a product? I am still waiting.
IBM leaped to the top of the list with its 1.3 GHz dual-core Power4 in
October (a cool chip). Not sure of the relevance of RedHat on an aging
Alpha 21264. The Sparc III number seems to be due to the anomalously high
performance of the processor on a single case in the 12(?) SPEC2000FP
benchmarks, pushing the average way up. Where is SGI in this race ... mmmm?
System Processor Clock (MHz) SPEC Peak/Base
------ --------- ----- --------------
IBM 690 Power4 1300 1169/1098
Alpha ES45 21264C 1000 960/776
Sun 2050 UltraIII 1050 827/701
Alpha ES40 21264B 833 777/621
Alpha GS80 21264C 1001 756/585
Alpha GS160 21264C 1001 756/585
Dell 530 Pentium 4 2000 734/716
Dell 340 Pentium 4 2000 734/716
Intel 850 Pentium 4 2000 714/704
HP RX461 Itanium 800 701/701
------- --------- ---- --------------
AMD 1900 Athlon XP 1600 634/588
SGI 3200 R14 500 463/436
------ --------- ----- --------------
The Pentium 4 using SSE2 (you need the right compiler), with its fast clock
(delivered by the short stages of its FPU pipeline) and nice bandwidth, is
an excellent performer for the price. For those with no budget constraints,
go for the IBM. Is it only available on IBM custom MCMs or are there
alternatives?
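On the STREAM point above: for a rough memory-bandwidth number on a
candidate box, a crude triad loop will do (my simplification, not the
official STREAM benchmark):

/* triad.c - crude STREAM-style triad; gcc -O2 triad.c -o triad */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 2000000               /* three 16 MB arrays, well past any cache */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    struct timeval t0, t1;
    double secs;
    int i, k;

    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }
    gettimeofday(&t0, NULL);
    for (k = 0; k < REPS; k++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];    /* the triad: a = b + q*c */
    gettimeofday(&t1, NULL);
    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    /* three arrays of 8-byte doubles move through memory per pass */
    printf("triad: %.1f MB/s\n", (double)REPS * 3 * 8 * N / secs / 1e6);
    free(a); free(b); free(c);
    return 0;
}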
A great horse race really ... can you get odds in Las Vegas ... which
microprocessor currency will be the most valuable in six months? Anyone
selling microprocessor futures, options, exotics ... En-ripp-off was
contemplating it, but they choked on their own hype ...
Bye-4-now!
rbw
#---------------------------------------------------
#
# Richard Walsh
# Project Manager, Cluster Computing, Computational
# Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX: 612-337-3467
# FAX: 612-337-3400
# EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com
#
#---------------------------------------------------
# "What you can do, or dream you can, begin it;
# Boldness has genius, power, and magic in it."
# -Goethe
#---------------------------------------------------
# "Without mystery, there can be no authority."
# -Charles DeGaulle
#---------------------------------------------------
# "Why waste time learning when ignornace is
# instantaneous?" -Thomas Hobbes
#---------------------------------------------------
# "In the chaos of a river thrashing, all that water
# still has to stand in line." -Dave Dobbyn
#---------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alberto at delta.ft.uam.es Mon Jan 14 12:36:22 2002
From: alberto at delta.ft.uam.es (Alberto Ramos)
Date: Mon, 14 Jan 2002 18:36:22 +0100
Subject: Advice on NIC cards.
Message-ID: <20020114183622.B17982@delta.ft.uam.es>
Hello all. Reading your comments about Gigabit Ethernet (thanks to all!),
we have decided to use fast Ethernet at first and make some tests.
Choosing the NIC, I think that the NetGear GA620 is a good choice, but I
want to know what you think about that.
Thank you very much.
Alberto.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rross at mcs.anl.gov Mon Jan 14 13:49:17 2002
From: rross at mcs.anl.gov (Robert Ross)
Date: Mon, 14 Jan 2002 12:49:17 -0600 (CST)
Subject: PVFS v1.5.3 release
Message-ID:
Hello all,
The PVFS development team is happy to announce the latest release of the
Parallel Virtual File System (PVFS), version 1.5.3.
PVFS is an open source parallel file system implementation for Linux
clusters that operates over TCP/IP and uses existing disk hardware,
meaning that you can implement a parallel file system on your cluster
without additional hardware costs. This release includes a number of bug
fixes and configuration improvements, many of which were contributed by
users of PVFS. Additional debugging utilities make it ever easier to
configure PVFS on your system, and the newest Linux 2.4 kernels are
supported as well. This release represents a significant improvement in
stability over the previous release, 1.5.2.
As always, the GPL'd source for PVFS is available from:
ftp://ftp.parl.clemson.edu/pub/pvfs
For more information on PVFS, including papers, FAQ, User's Guide, and a
Quick Start guide, see the PVFS home page:
http://www.parl.clemson.edu/pub/pvfs
Regards,
Rob (on behalf of the team)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mack.joseph at epa.gov Mon Jan 14 13:49:49 2002
From: mack.joseph at epa.gov (Joseph Mack)
Date: Mon, 14 Jan 2002 13:49:49 -0500
Subject: Advice on NIC cards.
References: <20020114183622.B17982@delta.ft.uam.es>
Message-ID: <3C43284D.C8EEF21D@epa.gov>
Alberto Ramos wrote:
>
> we have decided to use fast Ethernet at first and make some tests.
.
.
> Choosing the NIC, I think that the NetGear GA620 is a good choice, but I want
> to know what you think about that.
FYI
http://www.sfu.ca/acs/cluster/nic-test.html
Joe
--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph at epa.gov ph# 919-541-0007, RTP, NC, USA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rross at mcs.anl.gov Mon Jan 14 15:00:14 2002
From: rross at mcs.anl.gov (Robert Ross)
Date: Mon, 14 Jan 2002 14:00:14 -0600 (CST)
Subject: PVFS v1.5.3 release
In-Reply-To:
Message-ID:
Oops...make that http://www.parl.clemson.edu/pvfs
So much for symmetry!
Rob
On Mon, 14 Jan 2002, Robert Ross wrote:
> For more information on PVFS, including papers, FAQ, User's Guide, and a
> Quick Start guide, see the PVFS home page:
>
> http://www.parl.clemson.edu/pub/pvfs
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From timm at fnal.gov Mon Jan 14 15:41:20 2002
From: timm at fnal.gov (Steven Timm)
Date: Mon, 14 Jan 2002 14:41:20 -0600 (CST)
Subject: Boards for Fastest Intel Processors
In-Reply-To: <3450CC8673CFD411A24700105A618BD61BECE1@911TURBO>
Message-ID:
Steve--thanks for the note.
Isn't the new Xeon package going to be 478 pins, same as the
"Northwood" version of the P4? Or am I getting confused here?
Will the Supermicro boards such as the P4DC6+ or P4DCE+ support
the new "Prestonia" Xeons or just the old ones?
I am seeing vendors that claim the boards will, but the board
spec sheet itself only talks about the 603-pin package,
which I assume is the older "Foster" package. Is that right?
Steve Timm
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
On Fri, 11 Jan 2002, Steve Gaudet wrote:
> Hello Steve,
>
> > Does anyone have any recommendations on what type of dual Xeon
> > motherboard to get to be able to run these new 2.2 GHz processors?
> > I have heard a lot about the Supermicro boards but am suspect due
> > to problems with several earlier versions of Supermicro PIII boards.
> >
> > Is there any word on when Intel will be releasing their new
> > board with the native Intel chipset instead of a Serverworks chipset?
>
> Intel will be releasing their own Xeon 603 pin dual motherboard in April.
> We've run and shipped the SuperMicro dual Xeon motherboard, and it
> appears stable.
>
> Cheers,
>
> Steve Gaudet
> Linux Solutions Engineer
> .....
>
>
> ===================================================================
> | Turbotek Computer Corp. tel:603-666-3062 ext. 21 |
> | 8025 South Willow St. fax:603-666-4519 |
> | Building 2, Unit 105 toll free:800-573-5393 |
> | Manchester, NH 03103 e-mail:sgaudet at turbotekcomputer.com |
> | web: http://www.turbotekcomputer.com |
> ===================================================================
>
>
>
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From timm at fnal.gov Mon Jan 14 15:39:10 2002
From: timm at fnal.gov (Steven Timm)
Date: Mon, 14 Jan 2002 14:39:10 -0600 (CST)
Subject: Boards for Fastest Intel Processors
In-Reply-To: <3C3F3958.291142C6@paralleldata.com>
Message-ID:
On Fri, 11 Jan 2002, W Bauske wrote:
>
> Supermicro's P4 Xeon dual uses the Intel 860 chipset as do all dual
> P4 Xeon boards I'm aware of. Also, the current 2.2Ghz chips are not
> Xeon's from what I read anyway so only singles are available at this
> point.
>
> Wes
>
It is true that there is no announcement of 2.2 GHz Xeons on the
Intel web site at the moment... but there are vendors on
pricewatch.com that claim they can sell them. I haven't actually
tried to buy one to see what the shipping date is.
Steve
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joachim at lfbs.RWTH-Aachen.DE Tue Jan 15 02:45:03 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Tue, 15 Jan 2002 08:45:03 +0100
Subject: Beowulf with Gigabit Ethernet
Message-ID: <3C43DDFF.D831F4C6@lfbs.rwth-aachen.de>
> On Fri, 11 Jan 2002 alex at compusys.co.uk wrote:
>
> > I think that in general point to point performance information has a
> > limited value, whatever vendors might quote on their web-page. SCI might
> > be performing pretty well if it comes down to just latency and bandwidth
> > between two machines, but it is a ring topology. If you have
> > more machines on a ring they will share that same bandwidth.
You might want to get a little bit more informed on SCI topologies and
their scalability characteristics. One good paper I can recommend is
from the SCI Europe 98 conference, available at
http://www.scali.com/whitepaper/scieurope98/scale_paper.pdf . It gives
some general calculations of all-to-all communication scalability for
SCI torus topologies. The numbers are somewhat outdated, but the
principle is still correct.
Your statement is not well founded because SCI is not a ring topology,
but a point-to-point topology. It's similar to somebody saying
"(centralized) switches do not scale because all the traffic needs to go
through one box". It depends on the switch, I'd rather say. And on some
other things, like packet format, routing method, etc.
Regarding the limited value of point-to-point performance, you are
right, but this applies to all networks.
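To make the scaling argument concrete, a back-of-envelope sketch (my
own rough model, assuming one unidirectional ringlet per dimension and
uniform all-to-all traffic; not Scali's published numbers):

/* torus.c - rough all-to-all channel load on a k x k torus. */
#include <stdio.h>

int main(void)
{
    int k;
    for (k = 2; k <= 8; k++) {
        int n = k * k;        /* nodes */
        int links = 2 * n;    /* one outgoing link per dimension per node */
        /* mean distance in a unidirectional ring of size k is ~k/2 hops,
         * so ~k hops total across the two dimensions */
        double avg_hops = (double)k;
        printf("%dx%d torus: %2d nodes, ~%.1f flows share each link\n",
               k, k, n, n * avg_hops / links);
    }
    return 0;
}

The per-link load grows like k/2 while the node count grows like k*k,
which is quite different from a single shared ring.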
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Tue Jan 15 04:35:42 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Tue, 15 Jan 2002 10:35:42 +0100 (MET)
Subject: SUCCESS!!! ElfBoot loading images over linuxBIOS... (fwd)
Message-ID:
-- Eugen* Leitl leitl
______________________________________________________________
ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.leitl.org
57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3
---------- Forwarded message ----------
Date: 15 Jan 2002 02:26:22 -0700
From: Eric W. Biederman
To: LinuxBIOS
Subject: SUCCESS!!! ElfBoot loading images over linuxBIOS...
I have just successfully created a version of elfboot.c that allows
you to load an image over linuxBIOS while it is running in RAM.
Executables trying to load on top of linuxBIOS have been one of the
biggest problems with the elfboot stuff, so it should be much more
user friendly now.
I move linuxBIOS out of the way to the very top of memory at the last
possible instant, so this should not have much of an impact on
anything else.
Tomorrow my aim is to clean up and check this code in.
And of course a more detailed description.
/* The problem:
* Static executables all want to share the same addresses
* in memory because only a few addresses are reliably present on
* a machine, and implementing general relocation is hard.
*
* The solution:
* - Allocate a buffer twice the size of the linuxBIOS image.
* - Anything that would overwrite linuxBIOS copy into the lower half of
* the buffer.
* - After loading an ELF image copy linuxBIOS to the upper half of the
* buffer.
* - Then jump to the loaded image.
*
* Benefits:
* - Nearly arbitrary standalone executables can be loaded.
* - LinuxBIOS is preserved, so it can be returned to.
* - The implementation is still relatively simple,
* and much simpler than the general case implemented in kexec.
*
*/
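In ordinary user-space terms, the staging trick can be simulated like
this (my paraphrase with memcpy on a toy array, not the real elfboot.c):

/* toy model of the elfboot double-buffer staging scheme */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BIOS_SIZE 16

int main(void)
{
    char mem[64];                     /* pretend physical memory */
    char *bios = mem + 24;            /* "linuxBIOS" resident here */
    char segment[BIOS_SIZE];          /* ELF segment targeting the same spot */
    char *buf = malloc(2 * BIOS_SIZE);

    memset(mem, '.', sizeof(mem));
    memset(bios, 'B', BIOS_SIZE);     /* the resident linuxBIOS image */
    memset(segment, 'E', BIOS_SIZE);  /* the incoming executable */

    /* 1. segment would overwrite linuxBIOS: stage it in the lower half */
    memcpy(buf, segment, BIOS_SIZE);
    /* 2. at the last instant, save linuxBIOS into the upper half ... */
    memcpy(buf + BIOS_SIZE, bios, BIOS_SIZE);
    /* 3. ... then drop the staged segment into place and jump to it */
    memcpy(bios, buf, BIOS_SIZE);

    printf("%.64s\n", mem);           /* 'E's now sit where the 'B's were */
    free(buf);
    return 0;
}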
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hanzl at noel.feld.cvut.cz Tue Jan 15 08:50:53 2002
From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz)
Date: Tue, 15 Jan 2002 14:50:53 +0100
Subject: export disks from nodes?
In-Reply-To: <20020111193511U.hanzl@unknown-domain>
References: <20020110191245M.hanzl@unknown-domain>
<20020111193511U.hanzl@unknown-domain>
Message-ID: <20020115145053U.hanzl@unknown-domain>
> # bpsh 0 rpc.mountd --no-nfs-version 3
> svc_tcp.c - cannot getsockname or listen: Invalid argument
> mountd: cannot create tcp service.
Solved! (And NFS server on scyld node works now.)
The problem was triggered by the presence of a socket on mountd's
stdin. The nasty part: mountd calls getsockname(0,...) in
rpcmisc.c/rpc_init(), called from mountd.c/main(). Normally stdin is
not a socket, getsockname(0,...) returns ENOTSOCK, and mountd starts
up. If stdin is a socket, mountd changes its behavior. When mountd is
bproc-moved, it also has a socket on stdin, so mountd misinterprets
the situation and things go haywire.
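A minimal sketch of the check in question (generic C, mirroring the
rpcmisc.c logic described above):

/* stdinsock.c - is fd 0 a socket, the way mountd tests at startup? */
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr sa;
    socklen_t len = sizeof(sa);

    if (getsockname(0, &sa, &len) < 0 && errno == ENOTSOCK)
        printf("stdin is not a socket: mountd starts up normally\n");
    else
        printf("stdin is a socket: mountd changes its behavior\n");
    return 0;
}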
Maybe mountd should be changed not to rely on this test (and use
an additional option instead).
If you do not want to change mountd, you can avoid the problem by
avoiding a socket on mountd's stdin. Unfortunately, bpsh -n 0 rpc.mountd
is not enough - maybe it redirects the master's /dev/null to the slave
process (through a socket :) - if so, maybe bpsh could be changed to use
the slave's /dev/null.
To start mountd on a slave node, I nfs-mounted /bin etc., did "bpsh 0
bash" and instructed this bash to run rpc.mountd.
Message-ID: <5.1.0.14.2.20020115102010.03b0c040@mail.harddata.com>
With regards to your message at 10:01 AM 1/15/02,
beowulf-request at beowulf.org. Where you stated:
>Date: Mon, 14 Jan 2002 18:36:22 +0100
>From: Alberto Ramos
>To: Lista de correo sobre Beowulf
>Subject: Advice on NIC cards.
>
>
> Hello all. Reading your comments about Gigabit Ethernet (thanks to all!),
>we have decided to use fast Ethernet at first and make some tests.
>
> Choosing the NIC, I think that the NetGear GA620 is a good choice, but I want
>to know what you think about that.
I think you should look at the new Intel 8490XP Gigabit cards.
Same price range as the Netgear, fast, and trouble free..
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Tue Jan 15 12:30:29 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Tue, 15 Jan 2002 10:30:29 -0700
Subject: Beowulf digest, Vol 1 #707 - 9 msgs
In-Reply-To: <200201151701.g0FH1uR02280@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020115102220.03b11810@mail.harddata.com>
With regards to your message at 10:01 AM 1/15/02,
beowulf-request at beowulf.org. Where you stated:
>Message: 6
>Date: Mon, 14 Jan 2002 14:41:20 -0600 (CST)
>From: Steven Timm
>To: Steve Gaudet
>cc:
>Subject: RE: Boards for Fastest Intel Processors
>
>Steve--thanks for the note.
>Isn't the new Xeon package going to be 478 pins, same as the
>"Northwood" version of the P4? Or am I getting confused here?
>
>Will the Supermicro boards such as the P4DC6+ or P4DCE+ support
>the new "Prestonia" Xeons or just the old ones?
>I am seeing vendors that claim the boards will, but the board
>spec sheet itself only talks about the 603-pin package,
>which I assume is the older "Foster" package. Is that right?
>
>Steve Timm
You are right.
The 860 chipset is designed for 603 style packages..
Maybe they ARE coming out with faster XEONs, but so far I have seen nothing
from Intel indicating this for this socket.. And we are an Intel OEM and
server builder program member.
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From SGaudet at turbotekcomputer.com Tue Jan 15 13:30:47 2002
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Tue, 15 Jan 2002 13:30:47 -0500
Subject: Beowulf digest, Vol 1 #707 - 9 msgs
Message-ID: <3450CC8673CFD411A24700105A618BD61BED10@911TURBO>
Hello Maurice,
> With regards to your message at 10:01 AM 1/15/02,
> beowulf-request at beowulf.org. Where you stated:
> >Message: 6
> >Date: Mon, 14 Jan 2002 14:41:20 -0600 (CST)
> >From: Steven Timm
> >To: Steve Gaudet
> >cc:
> >Subject: RE: Boards for Fastest Intel Processors
> >
> >Steve--thanks for the note.
> >Isn't the new Xeon package going to be 478 pins, same as the
> >"Northwood" version of the P4? Or am I getting confused here?
> >
> >Will the Supermicro boards such as the P4DC6+ or P4DCE+ support
> >the new "Prestonia" Xeons or just the old ones?
> >I am seeing vendors that claim the boards will, but the board
> >spec sheet itself only talks about the 603-pin package,
> >which I assume is the older "Foster" package. Is that right?
> >
> >Steve Timm
>
> You are right.
> The 860 chipset is designed for 603 style packages..
> Maybe they ARE coming out with faster XEONs, but so far I
> have seen nothing
> from Intel indicating this for this socket.. And we are an
> Intel OEM and
> server builder program member.
Intel has on their web site news about the new 2.2 GHz Xeon:
Intel® Xeon™ Processor
Performance, Scalability, and Value for Dual-Processor-Based Workstations
The Intel® Xeon™ processor with Intel® NetBurst™ microarchitecture is
Intel's newest, most advanced 32-bit microarchitecture for workstations.
Designed to deliver superior performance, scalability, and reliability, the
Intel Xeon processor is ideally suited for the most demanding workstation
applications. The Intel Xeon processor extends the bandwidth and
performance-enhancing features of the Intel NetBurst microarchitecture with
dual-processor support, providing even greater performance for
multi-threaded applications and multitasking environments. Intel Xeon
processor-based workstations deliver exceptional floating-point performance
for enhanced 3D visualization and intensive scientific calculations.
The World's Most Technologically Advanced IA-32 Workstation Systems Today
Dual-processor workstations based on the Intel® Xeon™ processor are the
most advanced, powerful systems in the IA-32 family. Intel Xeon processors
are designed to deliver performance, headroom, and scalability for existing
and emerging workstation applications, especially multi-threaded and
multitasking applications. Intel Xeon processors maximize performance,
productivity, and reliability with the following features:
Dual-processor support
512K L2 advanced transfer cache memory
Intel® NetBurst™ microarchitecture
400-MHz data bus frequency
On-die thermal sensor
System management bus
Intel® 860 chipset support
>>>>>>>>> 603-pin µPGA Package with Dual-Processor Support
There's more but didn't want to bore everyone.
Cheers,
Steve Gaudet
Linux Solutions Engineer
.....
===================================================================
| Turbotek Computer Corp. tel:603-666-3062 ext. 21 |
| 8025 South Willow St. fax:603-666-4519 |
| Building 2, Unit 105 toll free:800-573-5393 |
| Manchester, NH 03103 e-mail:sgaudet at turbotekcomputer.com |
| web: http://www.turbotekcomputer.com |
===================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From charwell at digitalpulp.com Tue Jan 15 18:52:24 2002
From: charwell at digitalpulp.com (Chris Harwell)
Date: Tue, 15 Jan 2002 18:52:24 -0500 (EST)
Subject: (no subject)
In-Reply-To: <3C449F7D.1DEC704B@myri.com>
Message-ID:
hi,
a little off topic.
i'm having a lot of carrier errors on eth2: Intel Corporation 82557
[Ethernet Pro 100], the head node's connection to the outside world.
trying to search for information on carrier errors is proving difficult.
does anyone know a definition for them and/or where to look for more info?
eth2 Link encap:Ethernet HWaddr 00:03:47:73:8C:A0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7805917 errors:0 dropped:0 overruns:0 frame:0
TX packets:8046787 errors:0 dropped:0 overruns:8 carrier:19182
collisions:23104 txqueuelen:100
RX bytes:2116994613 (2018.9 Mb) TX bytes:447885894 (427.1 Mb)
Interrupt:9 Base address:0x3000
thanks,
--
chris
charwell at digitalpulp.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wiseowl at accessgate.net Tue Jan 15 15:55:44 2002
From: wiseowl at accessgate.net (Doug Shubert)
Date: Tue, 15 Jan 2002 15:55:44 -0500
Subject: InfiniBand Solutions Conference
Message-ID: <3C449750.135D3F10@accessgate.net>
http://www.infinibandta.org/events/
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From patrick at myri.com Tue Jan 15 16:30:37 2002
From: patrick at myri.com (Patrick Geoffray)
Date: Tue, 15 Jan 2002 16:30:37 -0500
Subject: Beowulf with Gigabit Ethernet
References: <20020110154234.A17039@delta.ft.uam.es> <3C3DA86F.189FCC96@scali.no> <3C3DDE66.35B2D721@myri.com> <3C3EBB92.AF4C259B@scali.no>
Message-ID: <3C449F7D.1DEC704B@myri.com>
Hi Øystein,
Øystein Gran Larsen wrote:
> and trust helpful individuals that have access to Myrinet systems for the numbers.
> If you can direct us at an official resource with such numbers we would be very
> grateful.
The curve for the latest hardware is on my laptop, but I have not had
spare time to make it pretty for the web site. It will happen one of
these days.
The problem with comparing curves like that is that they mean little.
In this curve, for example, the PCI bus is the bottleneck for Myrinet,
and I would guess it is also the case for the SCI curve. You do not
want to show the limit of the PCI bus; you want to show the limit of
the interconnect itself. Presented that way, it is impossible to know
what information the graph actually conveys.
> By the way, this type of performance numbers for the latest release of our
> software is not on our web site yet, but it's on its way. The numbers for previous
> releases can be found in the performance section on www.scali.com
It's funny that performance curves are never at the top of the TODO
list, proof that nobody really cares about them :-)
Regards.
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Peter.Koenig at phys.upb.de Wed Jan 16 05:47:45 2002
From: Peter.Koenig at phys.upb.de (Peter H. Koenig)
Date: Wed, 16 Jan 2002 11:47:45 +0100
Subject: charmm scalability on 2.4 kernels
References: <3C3DFEAA.4A1AAC21@phys.upb.de> <3C4557C2.CE938A8E@phys.upb.de>
Message-ID: <3C455A51.92748C73@phys.upb.de>
Hello,
Bogdan Costescu wrote:
> That is actually what I have observed during the last 3 years of
> running different versions of kernels, MPI libraries and CHARMM.
> Running using only one transport (TCP or shared mem) is always better
> than mixing them, e.g. (using LAM-6.5.6):
>
> CPUs nodes real time (min) transports
> 4 4 5.95 TCP
> 4 2 7.08 TCP+USYSV
>
> As you can see, the difference is quite significant.
Do you also have the numbers for 2 dual nodes using only TCP ?
Peter Koenig
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bogdan.costescu at iwr.uni-heidelberg.de Wed Jan 16 07:06:49 2002
From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Wed, 16 Jan 2002 13:06:49 +0100 (CET)
Subject: charmm scalability on 2.4 kernels
In-Reply-To: <3C455A51.92748C73@phys.upb.de>
Message-ID:
On Wed, 16 Jan 2002, Peter H. Koenig wrote:
> Do you also have the numbers for 2 dual nodes using only TCP ?
No, not for 2 dual nodes, I only made a (bigger) test run on 8 nodes:
CPUs Nodes Real-time(min) Transports
8 8 16.00 TCP
16 8 22.86 TCP+USYSV
16 8 27.24 TCP
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Peter.Koenig at phys.uni-paderborn.de Wed Jan 16 07:22:47 2002
From: Peter.Koenig at phys.uni-paderborn.de (Peter H. Koenig)
Date: Wed, 16 Jan 2002 13:22:47 +0100
Subject: Queueing problem
References: <3C3E05FB.41196579@phys.upb.de> <3C455791.1AC1C9B5@phys.upb.de> <3C455D11.881C1D27@phys.upb.de> <3C456FEB.1EDB705E@phys.upb.de>
Message-ID: <3C457097.ED8EC71A@phys.uni-paderborn.de>
Hello,
recently we acquired new machines we want to integrate into our
computational workforce. We are currently using a DQS complex (A) of
alpha-workstations.
The new machines are integrated into two complexes:
(B) a beowulf-style cluster of Linux-PC including a headnode mainly for
parallel applications and development
(C) a pool of workstations for a (student-) computer lab, which can be
used for short calculations
We are also planning on investing in a further cluster (D) which may be
open for other groups.
Since the user base for each of the complexes (except for A and B) is
different we think that we might need to separate the complexes.
The jobs are to be submitted on the workstations (A) and routed to the
appropriate queue for execution. The submission and routing of jobs
should be possible with least involvement of the user. It should be
possible to restrict routing to other complexes to certain rules e.g.
routing to the computer lab should only be possible if a given
percentage of the queues there is idle (for allowing local submissions
of jobs, which should start without larger delays).
Can this be accomplished transparently to the user? Can someone point
me to queuing software which allows the specification of such rules
(even if this means quitting DQS)?
As far as I understand the documentation, DQS _does_ allow routing to
other complexes, but I have neither seen any information on how this can
be accomplished nor on whether rules for routing can be specified.
Peter Koenig
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Peter.Lindgren at experian.com Wed Jan 16 10:07:26 2002
From: Peter.Lindgren at experian.com (Peter Lindgren)
Date: Wed, 16 Jan 2002 09:07:26 -0600
Subject: cluster frustrations
Message-ID:
Last year I began trying to port some of my applications to a cluster. Those applications are very partitionable, so it seemed like a great opportunity. I did a lot of Internet research looking for easy solutions for building clusters. Not having found compendium sites yet, I did it the hard way, slowly building up a list of candidate packages. Scyld was pretty easy to find and also easy to get from LinuxCentral or Cheapbytes. I discovered Oscar and Rocks and SCE and IBM's CSK, and downloaded them all. I studied an MPI primer, and adapted my code for a cluster. I wanted to install different cluster systems, try my particular application and really see which one worked best for me.
I got 10 PCs diverted to the attempt. Our networking people furnished a switch. A guy from our help desk hooked them all up. Then I was on my own...
BUT, I'm not a Unix/Linux administrator. Even though I've installed and played with Linux a number of times on workstations both at home and at work, I've had a lot of trouble getting the cluster working. I've learned a lot and had some minor successes, but it still just seems too hard.
I found the mailing lists for some of these packages. I've followed them with interest and have sometimes gotten perfect to-the-point help (and sometimes no response at all.) I got Scyld Beowulf running (with occasional help from a couple of Unix admins as well as from some guys on the Beowulf list), enough to show that my application could work on a cluster. Each time I tried to install another or later version, however, I had more problems. Right now, things aren't stable and my application often bombs before finishing, although it worked before.
I've tried to install Rocks a number of times. I got through (once) to where the compute nodes were up, but I haven't been able to get the latest version to work yet. In fairness, I haven't tried contacting their list even though they seem willing to help - I'm just too discouraged or shy I guess.
It sure doesn't look like I will get multiple systems installed to do my comparisons. I haven't been able to find any published reviews or comparisons either.
So here are my pleas:
USER community: has anyone independently tested these systems? In particular, paying attention to their ease-of-installation and configuration by those who aren't Unix experts. I'm sure the various groups are TRYING to make installation/configuration as simple as possible, but how far have they gotten?
DEVELOPER community: There are potential users out there who would benefit from cluster computing, but who aren't Unix experts themselves, and don't have such an available expert on staff. I'm not saying a completely non-technical user should be able to do this, but how about a reasonably intelligent engineer/scientist/programmer?
EVERYONE: should I:
just stop expecting to be able to do this myself as a non-admin?
stop expecting such systems will just work when you put in the CD?
get used to banging my head on the wall for a few days or weeks?
get over my reluctance to keep asking for help on the lists?
get an admin devoted to my project?
A reference showing how many OTHER people can manage to install clusters:
http://Beowulf-underground.org/success.html
proving I must be the village idiot.
Some references with links to multiple systems:
http://www.lcic.org/computational.html
http://clusters.top500.org
http://www.csse.monash.edu.au/~rajkumar/cluster/
http://Beowulf-underground.org/
Papers by individual project teams that discuss other projects:
http://www.cacr.caltech.edu/cluster2001/program/talks/oscar.pdf
http://rocks.npaci.edu/papers/ieee-cluster-2001/paper.pdf
P.S. The best analogy I've made to how this feels is attempting to fix your own car. It seems promising, you've maybe done a few things in the past that worked. Now, you've gotten in big trouble. You really ought to just take it in to your mechanic and hope they can fix it (but it's so embarrassing!) And, really you're so mad you just want to take it out on the car - maybe push it off a nearby cliff...
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rabrads at erie.net Wed Jan 16 11:28:37 2002
From: rabrads at erie.net (rick bradshaw)
Date: Wed, 16 Jan 2002 11:28:37 -0500
Subject: aggregate cluster statswul
References:
Message-ID: <3C45AA35.7020401@erie.net>
Andrew,
I have a package called AmIHappy that uses some Perl, PHP, and SOAP and
collects data from up to 128 nodes in a cluster. It is customizable by
just writing a script for whatever you need, and it connects to a MySQL
database to store data on the nodes. It can run at boot time, or
whenever the user wants. It has a web front end that shows the status
of all nodes, failed nodes, or one specific node. Contact me if you
want to see the software.
Rick Bradshaw
rabrads at linux-fan.com
Andrew Fant wrote:
>I am currently running a 118 processor cluster, with bigbrother and larrd to monitor
>system status and gather performance and utilization data on a node by node basis.
>However, my management is now requesting aggregate statistics, and a web page
>showing load, etc, across the entire cluster.
>
>Has anybody hacked something like this themselves? I would rather stick close to
>bigbrother and larrd, just to simplify implementation, but I have been playing with
>SGI's open source release of PCP, and I am not averse to switching to another
>(free) solution if it can simplify the process.
>
>Thanks for any suggestions,
> Andy
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jim at ks.uiuc.edu Wed Jan 16 11:31:06 2002
From: jim at ks.uiuc.edu (Jim Phillips)
Date: Wed, 16 Jan 2002 10:31:06 -0600 (CST)
Subject: cluster frustrations
In-Reply-To:
Message-ID:
Hi,
I've had Scyld running successfully for quite a while, and have even
taught others (http://www.ks.uiuc.edu/Research/namd/tutorial/NCSA2001/).
I know what I'm doing, and have even set up an older non-Scyld cluster,
but I was tearing my hair out for several weeks at the beginning because
of random crashes. These turned out to be hardware and BIOS related
rather than software-related, altough different versions of the software
exhibited the problems to varying degrees.
When you build a cluster, you are often taking consumer-class hardware and
driving it much harder than a normal user. You also have zero error
tolerance across the entire cluster. While in theory this should all be
worked out in testing, cluster users are the only people likely to see
errors in the real world. In our case, the problem was that a BIOS
setting of "optimal" for some PCI bus parameters was leading to occasional
data corruption between the CPU and the network card. Since we had nice
network cards, capable of doing their own checksumming, the errors were
never caught. This was never an issue on the old cluster, which used cheap
"tulip" cards and made the CPU do the checksumming.
A normal user would drive maybe 100 MB per day across that network card,
probably at 10 Mbit, or 1/10 of its peak capacity, and almost all of the data
would be incoming, probably web images. We were driving 100 MB across
every 15 seconds, which is 5000x more opportunities for error. Put 32
machines together and you have over 100,000x the error rate that a typical
user would see. Add in a 10x lower tolerance for program failure and you
could easily say that a cluster user is demanding one million times more
hardware reliability than a normal desktop user.
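The arithmetic above, spelled out with the same round numbers:

/* exposure.c - the back-of-envelope error-exposure estimate */
#include <stdio.h>

int main(void)
{
    double desktop = 100.0;                     /* MB/day for a desktop */
    double cluster = 100.0 * (86400.0 / 15.0);  /* 100 MB every 15 s */
    double per_node = cluster / desktop;        /* ~5760x ("5000x") */
    double nodes = 32.0, tolerance = 10.0;

    printf("per node:        %6.0fx the data volume\n", per_node);
    printf("x 32 nodes:      %6.0fx the error opportunities\n",
           per_node * nodes);
    printf("x 10x tolerance: ~%.0e times the demanded reliability\n",
           per_node * nodes * tolerance);
    return 0;
}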
This is why server-class, error-correcting hardware exists.
-Jim
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Wed Jan 16 12:33:12 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed, 16 Jan 2002 12:33:12 -0500
Subject: cluster frustrations
In-Reply-To: ; from Peter.Lindgren@experian.com on Wed, Jan 16, 2002 at 09:07:26AM -0600
References:
Message-ID: <20020116123312.B17480@wumpus.foo>
On Wed, Jan 16, 2002 at 09:07:26AM -0600, Peter Lindgren wrote:
> P.S. The best analogy I've made to how this feels is attempting to
> fix your own car. It seems promising, you've maybe done a few things
> in the past that worked. Now, you've gotten in big trouble. You
> really ought to just take it in to your mechanic and hope they can
> fix it (but it's so embarrassing!) And, really you're so mad you
> just want to take it out on the car - maybe push it off a nearby
> cliff...
Personally, I'm happy to pay a mechanic to fix my car, because I know
a lot about supercomputing, but not that much about fixing cars.
-- greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joachim at lfbs.RWTH-Aachen.DE Wed Jan 16 13:36:09 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Wed, 16 Jan 2002 19:36:09 +0100
Subject: cluster frustrations
References:
Message-ID: <3C45C819.5B040A53@lfbs.rwth-aachen.de>
Peter Lindgren wrote:
> A reference showing how many OTHER people can manage to install clusters:
> http://Beowulf-underground.org/success.html
> proving I must be the village idiot.
;-)
I'm quite confident that you're not the vi. I bet that 30% of those
"success stories" have already ceased to exist as such, 50% are having
similar problems to yours, and 20% are running "perfectly".
For example, I use a cluster (9 quad-Xeon nodes) which the computing
centre here in Jülich has built and is maintaining. It runs stably with
kernel 2.2, Myrinet and GM 1.1.3 - but with really unsatisfactory
(communication) performance. But they don't get it to run reliably with
the current Linux/GM/MPICH versions which of course should run faster,
better, nicer. I don't blame Linux or Myrinet for these problems - I
just want to show that even people capable of running Crays, SP-2s,
Paragon, any kind of workstations etc. have a hard time setting up and
maintaining a Linux cluster. And the next update is usually the next
nightmare.
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Wed Jan 16 14:03:07 2002
From: becker at scyld.com (Donald Becker)
Date: Wed, 16 Jan 2002 14:03:07 -0500 (EST)
Subject: (no subject)
In-Reply-To:
Message-ID:
On Tue, 15 Jan 2002, Chris Harwell wrote:
> a little off topic.
This should be on the eepro100 at scyld.com list.
> i'm having a lot of carrier errors on eth2: Intel Corporation 82557
> [Ethernet Pro 100], the head node's connection to the outside world.
What driver version? What is the detection message?
> trying to search for information on carrier errors is proving difficult.
It's pretty rare, and the problem is almost always a duplex mismatch.
> does anyone know a definition for them and/or where to look for more info?
Bad cables or a broken link partner are the first places to check.
> eth2 Link encap:Ethernet HWaddr 00:03:47:73:8C:A0
> RX packets:7805917 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8046787 errors:0 dropped:0 overruns:8 carrier:19182
> collisions:23104 txqueuelen:100
Hmmm, this is likely reporting out-of-window collisions.
You are in half duplex mode (judging from the non-zero collision count).
Check that your link partner is also in half duplex mode. If on a
repeater, check that all connected devices are in half duplex mode.
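If you want to check the negotiated speed and duplex programmatically,
here is a sketch using the 2.4-era ETHTOOL_GSET ioctl (assuming the
driver supports it; the mii-diag utility does much the same through the
MII registers):

/* duplex.c - query speed/duplex via SIOCETHTOOL (driver permitting) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
    struct ifreq ifr;
    struct ethtool_cmd ecmd;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, argc > 1 ? argv[1] : "eth2", IFNAMSIZ - 1);
    memset(&ecmd, 0, sizeof(ecmd));
    ecmd.cmd = ETHTOOL_GSET;          /* "get settings" */
    ifr.ifr_data = (char *)&ecmd;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("SIOCETHTOOL (driver may lack ethtool support)");
        return 1;
    }
    printf("%s: %u Mb/s, %s duplex\n", ifr.ifr_name, ecmd.speed,
           ecmd.duplex == DUPLEX_FULL ? "full" : "half");
    close(fd);
    return 0;
}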
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Wed Jan 16 15:07:45 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Wed, 16 Jan 2002 13:07:45 -0700
Subject: Ethernet problem
In-Reply-To: <200201161701.g0GH1YR06319@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020116130618.03fe5050@mail.harddata.com>
With regards to your message at 10:01 AM 1/16/02,
beowulf-request at beowulf.org. Where you stated:
>Date: Tue, 15 Jan 2002 18:52:24 -0500 (EST)
>From: Chris Harwell
>Reply-To:
>To:
>Subject: (no subject)
>
>hi,
>
>a little off topic.
>
>i'm having a lot of carrier errors on eth2: Intel Corporation 82557
>[Ethernet Pro 100], the head node's connection to the outside world.
>
>trying to search for information on carrier errors is proving difficult.
>
>does anyone know a definition for them and/or where to look for more info?
I would suggest signing up for the Scyld EtherExpress Pro mailinglist:
http://www.scyld.com/mailman/listinfo/eepro100
or:
mailto:eepro100-request at scyld.com?subject=subscribe
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Wed Jan 16 15:09:33 2002
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed, 16 Jan 2002 15:09:33 -0500 (EST)
Subject: cluster frustrations
In-Reply-To:
Message-ID:
> This is why server-class, error-correcting hardware exists.
uh, let's not go too far! it's quite possible to drive
well-chosen and carefully-configured commodity hardware
100%, 24/7, wire-speed, platter-level, etc. but it definitely
requires a certain amount of luck/study/experience.
there are shortcuts, of course. for instance, you can buy
very nicely configured building blocks from compaq/dell/etc,
usually from their "business desktop/workstation" lines
and expect robustness under load, albeit often at a slightly
lower performance and/or higher price than white-box,
hand-picked-with-TLC parts...
I still think that beowulf implies commodity parts, which
in many cases rules out "server-class".
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From patrick at myri.com Wed Jan 16 19:35:43 2002
From: patrick at myri.com (Patrick Geoffray)
Date: Wed, 16 Jan 2002 19:35:43 -0500
Subject: cluster frustrations
References: <3C45C819.5B040A53@lfbs.rwth-aachen.de>
Message-ID: <3C461C5F.48F046E7@myri.com>
Joachim,
Joachim Worringen wrote:
> But they don't get it to run reliably with
> the current Linux/GM/MPICH versions which of course should run faster,
> better, nicer. I don't blame Linux or Myrinet for these problems -
Obviously, you do. Inciting another flame war?
I have searched the Myricom support logs and found one help ticket
from Ulrich Detert (help ticket #7197, Wed Jun 13 15:07:59 2001) with
the configuration that you describe and a piece of MPI code supposed
to trigger a malfunction. The code was run the same day with recent GM
and MPICH-GM releases and showed no problem whatsoever. I tried again
a few minutes ago with the current software, and again no problem. The
help ticket was closed Fri Sep 14 10:13:28 2001 with no reply from the
customer.
So if you really experienced problems with this machine, please
contact help at myri.com; this is the first step toward happiness.
> just want to show that even people capable of running Crays, SP-2s,
> Paragon, any kind of workstations etc. have a hard time setting up and
> maintaining a Linux cluster. And the next update is usually the next
You cannot compare Crays/SP2s with do-it-yourself Linux clusters. When
you buy a Cray or an SP(2,3), you get a machine that experts build for
you, you get software that experts install for you, and you often get
someone on-site to hold your hand for the first month or even for the
life of the machine. The only problem is that you pay a lot for that.
Linux clusters are not easy to install; it's wrong to believe they are.
Having access to the Myricom support archive, I can tell you that a
large number of problems are related to customers trying the
do-it-yourself way with no cluster experience, only a Windows
background, who do not know exactly what a kernel is or how to compile
one, who believe that Redhat is pure Linux, and who have never heard of
the MPI specs.
Not surprisingly, customers using a third party, either a big vendor
like IBM or a small one like many people on this list, where people
know what they are doing, usually have a much smoother experience.
But you still have to pay a little for that.
So, the do-it-yourself way? Why not, but if it fails, call the mechanic.
Patrick
----------------------------------------------------------
| Patrick Geoffray, Ph.D. patrick at myri.com
| Myricom, Inc. http://www.myri.com
| Cell: 865-389-8852 685 Emory Valley Rd (B)
| Phone: 865-425-0978 Oak Ridge, TN 37830
----------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Drake.Diedrich at anu.edu.au Wed Jan 16 20:22:44 2002
From: Drake.Diedrich at anu.edu.au (Drake Diedrich)
Date: Thu, 17 Jan 2002 12:22:44 +1100
Subject: Queueing problem
In-Reply-To: <3C457097.ED8EC71A@phys.uni-paderborn.de>
References: <3C3E05FB.41196579@phys.upb.de> <3C455791.1AC1C9B5@phys.upb.de> <3C455D11.881C1D27@phys.upb.de> <3C456FEB.1EDB705E@phys.upb.de> <3C457097.ED8EC71A@phys.uni-paderborn.de>
Message-ID: <20020117122243.A2119@duh.anu.edu.au>
On Wed, Jan 16, 2002 at 01:22:47PM +0100, Peter H. Koenig wrote:
> Hello,
>
> recently we acquired new machines we want to integrate into our
> computational workforce. We are currently using a DQS complex (A) of
> alpha-workstations.
> The new machines are integrated into two complexes:
> (B) a beowulf-style cluster of Linux-PC including a headnode mainly for
> parallel applications and development
> (C) a pool of workstations for a (student-) computer lab, which can be
> used for short calculations
> We are also planning on investing in a further cluster (D) which may be
> open for other groups.
>
> Since the user base for each of the complexes (except for A and B) is
> different we think that we might need to separate the complexes.
Sounds like it. Are students also submitting batch jobs, or only
running interactive jobs? If the goal is just to use the idle cycles on the
students' interactive machines, having qidle start up at login may be all
that is necessary to keep batch jobs from interfering with their work.
Setting the priority to automatically nice all jobs, and
load_masg/load_alarm to discourage scheduling on the C nodes when there are
A/B nodes available should also help reduce impact on them even when qidle
isn't running.
If both AB and C users are submitting jobs, and you want to give each
lower priority to the other's queues, you can do that by putting two queues
on each node, one for the AB users and one for the C users. One should be
subordinated to the other, so that its jobs suspend when someone with
greater priority on that set of nodes queues a job. You'll want lots of
swap so there's no memory impact from suspended jobs. user_acls or REQUIRED
resources can be used to limit jobs to the allowed queue on each node. I
suppose using a consumable resource for the student computers could limit
jobs to a certain fraction of the C nodes (never used them myself), but if
you're suspending completely when students are using the C nodes I see no
reason not to queue jobs on all low priority C queues at once.
I'd restrict parallel jobs to just the B nodes though, with a resource
specified in their queues that I'd strongly encourage all parallel users to
set when queueing their jobs, otherwise a single suspended C-node in a large
parallel job could suspend the entire job, while still tying up many B-nodes
until the student finishes and the parallel computation can continue.
>
> The jobs are to be submitted on the workstations (A) and routed to the
> appropriate queue for execution. The submission and routing of jobs
> should be possible with least involvement of the user. It should be
> possible to restrict routing to other complexes to certain rules e.g.
> routing to the computer lab should only be possible if a given
> percentage of the queues there is idle (for allowing local submissions
> of jobs, which should start without larger delays).
>
> As far as I understand the documentation, DQS _does_ allow routing to
> other complexes, but I have neither seen any information on how this can
> be accomplished nor on whether rules for routing can be specified.
There's the intercell routing, but I've never known anyone to use it. If
you have tight enough ties between cells that users and files are the same
and jobs are likely to be portable, there seems little point in running
separate qmasters.
> Can this be accomplished transparently to the user ? Can someone point
> me to a queuing software which allows the specification of such rules
> (even if this means quitting DQS)?
>
With more work, you could specify required resources on all queues, and
have your own userspace code that runs through orphaned jobs and qalters
their requirements to end up in the appropriate nodes. This might get you
exactly what you want, but I don't know of any examples.
Generic NQS had routing queues that could probably have done this all
more naturally, but didn't support multinode jobs (just SMP jobs). Not sure
how easy PBS plug-in schedulers are to write, or what enhancements SGE has
added to DQS yet.
-Drake
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joachim at lfbs.RWTH-Aachen.DE Thu Jan 17 03:12:24 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Thu, 17 Jan 2002 09:12:24 +0100
Subject: cluster frustrations
References: <3C45C819.5B040A53@lfbs.rwth-aachen.de> <3C461C5F.48F046E7@myri.com>
Message-ID: <3C468768.6F75F37F@lfbs.rwth-aachen.de>
Patrick Geoffray wrote:
>
> Joachim,
>
> Joachim Worringen wrote:
> > But they don't get it to run reliably with
> > the current Linux/GM/MPICH versions which of course should run faster,
> > better, nicer. I don't blame Linux or Myrinet for these problems -
>
> Obviously, you do. Inciting another flame war ?
No, I never intend to incite flame wars, only discussions. I can tell
you a lot of stories about malfunctioning self-made SCI clusters, but I
have no hands-on experience with such a cluster being operated in a
comparable (production) environment, because such customers usually
choose Scali-made systems. And I prefer to talk about hands-on
experience, not second-hand stories. The Scali-equipped systems I know
of run well now, although it hasn't always been that way (mostly due to
bugs/strange features in the last-generation hardware, LC2). But Scali
systems, to stick with these, are well-defined platforms running
qualified kernels etc.; not using such a platform is one source of
problems.
[...]
> So if you really experienced problems with this machine, please
> contact help at myri.com, this is the first step toward happiness.
I had reproducible application aborts when running PMB with 32
processes. I informed Ulrich Detert about this, and he confirmed the
problems. Up to now, they stick with 2.2 (which runs stably, but not as
fast as it could), which does *not* mean that such a system wouldn't
work with 2.4 and current GM - it's only that these guys tried to find
that "golden configuration" during their update (or by chance hit the
one dirty configuration) and didn't succeed.
Once again: I don't doubt that there exist Myrinet systems which run
perfectly. There are just a lot of chances (with self-made clusters in
general) to make mistakes that hinder stable operation.
> You cannot compare Crays/SP2 with do-it-yourself Linux clusters.
Exactly. Paying less money means investing more time. Which may be
equivalent to money.
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rsweet at atos-group.nl Thu Jan 17 05:32:31 2002
From: rsweet at atos-group.nl (rsweet at atos-group.nl)
Date: Thu, 17 Jan 2002 11:32:31 +0100 (MET)
Subject: which kernel on ServerWorksLE/which nfs?
Message-ID:
I'm wondering if the list could indulge my curiosity by sharing which
kernel versions they are running successfully (for me meaning it stays up
for at least several weeks...) on ServerWorks LE chipsets (in my case
Asus CUR_DLS and SuperMicro P3TDE/370)?
In particular is anyone running a 2.4 kernel with any reliability?
If so, what NICs are you using?
Are you using nfsv3 (knfs) or nfsv2/userland nfs?
regards,
-Ryan Sweet
--
Ryan Sweet
Atos Origin Engineering Services
http://www.aoes.nl
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Thu Jan 17 11:29:05 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 17 Jan 2002 11:29:05 -0500 (EST)
Subject: cluster frustrations (Suggestions for same)
In-Reply-To:
Message-ID:
On Wed, 16 Jan 2002, Mark Hahn wrote:
> > This is why server-class, error-correcting hardware exists.
>
> uh, let's not go too far! it's quite possible to drive
> well-chosen and carefully-configured commodity hardware
> 100%, 24/7, wire-speed, platter-level, etc. but it definitely
> requires a certain amount of luck/study/experience.
>
> there are shortcuts, of course. for instance, you can buy
> very nicely configured building blocks from compaq/dell/etc,
> usually from their "business desktop/workstation" lines
> and expect robustness under load, albeit often at a slightly
> lower performance and/or higher price than white-box,
> hand-picked-with-TLC parts...
>
> I still think that beowulf implies commodity parts, which
> in many cases rules out "server-class".
I totally agree with Mark here -- I can buy somewhere between 2 and 4
ECC-memory-equipped nodes over the counter from Intrex (my local vanilla
PC supplier) for the cost of one server-class node of equivalent power
and never experience a moment's difficulty. I can even get custom
configured rackmount systems through them, although I then have to work
a bit harder to ensure a HPWTLC fit.
Here are some simple suggestions for those wishing to build beowulfs or
clusters with truly commodity nodes:
a) Prototype a single node before you buy 8, 16, or 32 of them.
This is easy (and cheap) enough. That way you can test the motherboard,
memory configuration, hard disk setup, video, and ethernet controller
before you buy a lot of them and blow your wad. If one component proves
to be troublesome, either trade it in (with the sale of 31 more systems
hanging in the balance, most vendors become remarkably cooperative about
swapping things around and working with you to find something --
ANYTHING -- that they sell that you'll be happy with;-) or just throw it
away and try something else in its place -- you're saving enough to
throw a couple of SYSTEMS away at the end and come out WAY ahead.
b) Run configurations by the list before buying even a prototype.
This happens all the time, and is a very reasonable thing to do. You
won't ALWAYS find out that your hardware combo isn't right that way (you
should still prototype) but you'll likely get some useful advice or
reassurance. At least the motherboard, memory, and NIC and switch are
excellent things to query if you are in doubt.
c) Use quality components.
This is really a mix of caveat emptor and common sense. Find a vendor
you can work with and trust who will make things right if they sell you
substandard components, and who is unlikely to sell you substandard
components in the first place. Commodity NIC prices range from $10 to
$50, and (as one might expect) there is a bit of you get what you pay
for in that range. There are (or have been in the past) decent NICs
even at the middle of that range, but you've DEFINITELY got to work to
find a good cheap one, and the more expensive ones (eepro100, 3c905) are
more expensive in part because they have the best performance and
stability and features. "Generic" memory is often fine, but sometimes
is a source of endless trouble, so make sure your vendor is willing to
get quality memory (e.g. Kingston) if you encounter trouble with their
OTC brand.
d) It might well be the hardware.
How many times have I experienced inexplicable problems getting the
network to work? Getting an attached camera to work? Getting a system
to boot? Getting a system to work for more than thirty minutes without
crashing? -- and been tearing my hair out and cursing linux and device
drivers and all the ancestors of the creators of same, only to find that
my network cable was broken -- a loose wire that worked if you wiggled it
just right and then broke when it felt like it. Or the card wasn't seated
in the PCI bus properly; once it was, the bttv driver autoloaded like a
charm and it worked perfectly, just like the hardware lists said it
would. The floppy cable was in upside down, or the power connector
wasn't fully seated on the motherboard. A memory stick was in a dusty
slot and not making a good electrical connection.
Just shipping a system can cause cables to bounce loose. If a box is
DOA, ALWAYS open it up and reseat the cables and connectors before
cursing the vendor and sending it back.
It might be that something really is broken, not the configuration or
the software or you at all. It might even be trivial to fix, once you
look for it.
e) Be patient. Work it out.
One thing to remember is that problems with hardware can be like
lightning or shark attacks -- rare and very local. The PARTICULAR
combination of motherboard, memory, device, case may not work for you,
while each one of them works fine for other people combined with other
hardware. Five of the motherboards in an order of twenty five may have
a different flash of the BIOS (one that doesn't work). The power
supplies may be marginal, and work fine for systems with only three
components or while idle but cause instability when run under load.
Like it or not, this sort of thing happens, and getting server-class
packages doesn't necessarily ameliorate the problem (depending on the
vendor and whether or not you're getting them turnkey preconfigured).
Sure, you can ALWAYS pay somebody to do the work for you, but it is
ALWAYS cheaper to do it yourself and, if you go about it sensibly, can
be a fun and rewarding experience.
Just >>expect<< to have to learn some things, to have to solve some
problems, to get better over time. Just because you weren't born
knowing all about TCP/IP, account management, software installation and
operation, programming technique, and all the other things that are at
least useful if not essential to cluster operation doesn't make you an
idiot, it makes you a student. The beowulf list is filled with teachers
(literally -- myself, Walt and Rob Ross, and many, many more) and
students on their way to being teachers. As always, try it yourself,
then look for help. The longer you've been doing it, the easier you
will find it to solve the problems you encounter.
If it makes you feel any better, I've been doing Unix [systems
administration and engineering and cluster computing and etc.] for about
15 years, and have been doing computers in general from punched paper
tape on, and I still put connectors in backwards, fail to seat memory or
a PCI card properly, install something that overwrites a key
configuration file and have to do it all over again, and could make you
cringe with stories of the REALLY dumb things I've done in the past
(tried to copy files from a backup of /etc on another filesystem into
the /etc it was running on at the time, for example -- had to reinstall
the system from tape after that one as I rendered it totally
unbootable).
Live and learn. Experiment and play. Have fun. You'll get better, and
one day you too will be an "expert", even if you only do it a little at
a time.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From josip at icase.edu Thu Jan 17 11:41:52 2002
From: josip at icase.edu (Josip Loncaric)
Date: Thu, 17 Jan 2002 11:41:52 -0500
Subject: which kernel on ServerWorksLE/which nfs?
References:
Message-ID: <3C46FED0.6336CBF1@icase.edu>
rsweet at atos-group.nl wrote:
>
> I'm wondering if the list could indulge my curiosity by sharing which
> kernel versions they are running successfully (for me meaning it stays up
> for at least several weeks...) on ServerWorks LE chipsets (in my case
> Asus CUR_DLS and SuperMicro P3TDE/370)?
>
> In particular is anyone running a 2.4 kernel with any reliability?
We've had good luck with stock Red Hat 7.x distributions and ServerWorks
LE chipsets (SuperMicro 370DLE with dual PIII/800 CPUs). We just
upgraded to Linux kernel 2.4.9-13 (an update to the Red Hat 7.2 kernel),
which also works OK. However, last fall some of the Giganet cLAN
drivers had trouble finding the 64-bit PCI bus after the 2.2->2.4
upgrade. I do not know yet if 2.4.9-13 fixed that problem, or if the
problem is due to the Giganet drivers.
> If so, what NICS are you using?
> Are you using nfsv3 (knfs) or nfsv2/userland nfs?
We use "tulip" style cards (Kingston KNE100TX (21143 chip) on our
ServerWorks LE systems). The on-board Intel-based ethernet also works
OK, but we currently do not use them. NFS is started in its default
configuration, which I believe is nfsv3 (running as knfs, not as a
separate process).
Our ServerWorks LE systems stay up for months, except for one which
stays up only for weeks. These machines get heavy use...
Sincerely,
Josip
--
Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From akostocker at hotmail.com Thu Jan 17 12:32:23 2002
From: akostocker at hotmail.com (Tony Stocker)
Date: Thu, 17 Jan 2002 17:32:23 +0000
Subject: File ownership & permission on compute nodes (Scyld)
Message-ID:
All,
Here's the background:
I'm using the 'freebie' version of Scyld [27bz-8]. We are running several
tests on our cluster comparing performance of our algorithms versus some
heavy metal platforms. When running very I/O intensive algorithms over NFS
we see major performance hits, obviously. So we decided to create
directories on the compute nodes, copy over the input files, and run the
algorithms with output going to each compute node's local disk as well.
We then plan to copy off the output; our assumption is that this should
beat the performance we see over NFS.
Here's the problem:
I can create the directories (as root) just fine. But for some reason I
cannot get file/directory ownership lookups to read the passwd file; they
appear to read the group file fine. What I did was copy /etc/passwd and
/etc/group over to the compute nodes' /etc directory. While this works
for group ownership, it does not for user ownership. I've tried copying
over /etc/shadow and rebooting the compute nodes, but that doesn't seem
to address the problem.
It's as if it's not bothering to look at the passwd file at all.
Here's the plea for help:
Help! :-)
Seriously, if someone could tell me what I might need to copy over to make
the passwd file read and applied so that file ownerships and permissions can
be maintained - that would be great!
Thanks!
-Tony
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From akostocker at hotmail.com Thu Jan 17 12:45:51 2002
From: akostocker at hotmail.com (Tony Stocker)
Date: Thu, 17 Jan 2002 17:45:51 +0000
Subject: beomap help (Scyld)
Message-ID:
All,
Can someone give me some idea of what switches are available for beomap?
Like how do I get it to return compute nodes but not the host node? Or is
there a way to find multiple nodes, ranked perhaps in some fashion?
Info:
Using Scyld 'freebie' version [27bz-8]
Purpose:
Use beomap in shell script to find least busy nodes so that files can be
copied to local hard drives of these nodes and then algorithms run on them.
Thanks!
-Tony
_________________________________________________________________
Join the world's largest e-mail service with MSN Hotmail.
http://www.hotmail.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Thu Jan 17 12:58:53 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Thu, 17 Jan 2002 12:58:53 -0500
Subject: cluster frustrations (Suggestions for same)
In-Reply-To: ; from rgb@phy.duke.edu on Thu, Jan 17, 2002 at 11:29:05AM -0500
References:
Message-ID: <20020117125852.A20801@wumpus.foo>
On Thu, Jan 17, 2002 at 11:29:05AM -0500, Robert G. Brown wrote:
> I totally agree with Mark here -- I can buy somewhere between 2 and 4
> ECC-memory-equipped nodes over the counter from Intrex (my local vanilla
> PC supplier) for the cost of one server-class node of equivalent power
One of the biggest sources of needless arguments on this list is
different definitions. Since when is the node you described first not
a "server-class node"?
If you define "server-class" as "stuff from vendors with huge markups"
then sure, white boxes usable as servers are going to be cheaper. But
if you define "server class" as "boxes with certain features, such as
ECC memory", it's a whole different discussion.
We get similar discussions over "commodity", "beowulf", "cluster", etc
etc.
-- greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Thu Jan 17 14:21:29 2002
From: becker at scyld.com (Donald Becker)
Date: Thu, 17 Jan 2002 14:21:29 -0500 (EST)
Subject: File ownership & permission on compute nodes (Scyld)
In-Reply-To:
Message-ID:
On Thu, 17 Jan 2002, Tony Stocker wrote:
> I'm using the 'freebie' version of Scyld [27bz-8]. We are running several
> tests on our cluster comparing performance of our algorithms versus some
> heavy metal platforms. When running very I/O intensive algorithms over NFS
> we see major performance hits, obviously.
...
> I can create the directories (as root) just fine. But for some reason I can
> not get the file/directory ownership to read the passwd file. It appears to
> read the group file fine. What I did was copy over /etc/passwd and
> /etc/group to the compute nodes' /etc directory.
Verify that your /etc/nsswitch.conf configuration file has both "bproc"
and "files" entries. This file is created by /etc/beowulf/node_up
(really /usr/lib/beoboot/bin/node_up) each time a slave node is started.
> It's as if it's not bothering to look at the passwd file at all.
Correct. For most cluster configurations there is no /etc/passwd file.
User name (really password entry -- 'pwent') information is provided by
the BeoNSS name service. When a user starts a process on a cluster slave
node, the only valid getpwent() entry for that process is that single user.
If you do add "files" to the name service switch creation in the node_up
script, you might also want to 'scp -p /etc/passwd $NODE:/etc' in the
script.
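As a concrete sketch (hypothetical lines; the exact plumbing of node_up
varies by release), the end state on a slave node would be nsswitch.conf
entries like
    passwd: bproc files
    group:  bproc files
with the master's passwd file pushed over so that the "files" method has
data to read:
    scp -p /etc/passwd $NODE:/etc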
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Thu Jan 17 17:31:34 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 17 Jan 2002 17:31:34 -0500 (EST)
Subject: cluster frustrations (Suggestions for same)
In-Reply-To: <20020117125852.A20801@wumpus.foo>
Message-ID:
On Thu, 17 Jan 2002, Greg Lindahl wrote:
> On Thu, Jan 17, 2002 at 11:29:05AM -0500, Robert G. Brown wrote:
>
> > I totally agree with Mark here -- I can buy somewhere between 2 and 4
> > ECC-memory-equipped nodes over the counter from Intrex (my local vanilla
> > PC supplier) for the cost of one server-class node of equivalent power
>
> One of the biggest sources of needless arguments on this list is
> different definitions. Since when is the node you described first not
> a "server-class node"?
>
> If you define "server-class" as "stuff from vendors with huge markups"
> then sure, white boxes usable as servers are going to be cheaper. But
> if you define "server class" as "boxes with certain features, such as
> ECC memory", it's a whole different discussion.
>
> We get similar discussions over "commodity", "beowulf", "cluster", etc
> etc.
All good points and true. I certainly didn't want to start a semantic
argument over what is a server class system since you'll get different
answers from nearly anybody you ask depending on what they want to
serve, their tolerance for problems, the features they require (e.g.
hot swap disks or power supplies) and the depth of their pocketbook.
I was actually referencing the ~$2.5K/ea "server class" boxes that were
mentioned (IIRC in my normally somewhat confused mental state;-) in this
very thread. However, I also generally build my own "servers" for half
that, give or take, depending as you note on detailed features like
SCSI, ECC, amount of disk, local backup device or no.
With linux especially, virtually any OTC system sold, down to weenie and
aged Celerons, can be a server (and a damn good server at that) for a
small cluster or departmental LAN. Most of them will simply run for
years flawlessly when turned on and correctly configured. A few won't.
A high performance cluster (or server) needs to be engineered a bit more
carefully, but the same really holds true there as well.
The point was more "You don't have to spend a fortune to get reliable,
high quality OTC nodes, but you do have to be a smart shopper" than
"server-class (with reference to nodes or otherwise) means anything at
all" to anybody but marketing and management.
Cheerfully corrected,
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joachim at lfbs.RWTH-Aachen.DE Fri Jan 18 02:49:50 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Fri, 18 Jan 2002 08:49:50 +0100
Subject: which kernel on ServerWorksLE/which nfs?
References:
Message-ID: <3C47D39E.BADD466A@lfbs.rwth-aachen.de>
rsweet at atos-group.nl wrote:
>
> I'm wondering if the list could indulge my curiosity by sharing which
> kernel versions they are running successfully (for me meaning it stays up
> for at least several weeks...) on ServerWorks LE chipsets (in my case
> Asus CUR_DLS and SuperMicro P3TDE/370)?
We use the SuperMicro 370DLE, pretty much the same as the P3T..., but for
Coppermine P-IIIs, together with the onboard NIC (Intel). However, we
don't use ethernet for message passing, only for NFS etc.
> In particular is anyone running a 2.4 kernel with any reliability?
We are using 2.4.4 (with some patches for Promise IDE support), the
systems are up for weeks although used heavily for low-level library
development.
> If so, what NICS are you using?
> Are you using nfsv3 (knfs) or nfsv2/userland nfs?
NFSv3. BTW, does anyone have a proven way to improve NFS performance for
Linux clients against a Solaris server? We find that write access from
the clients makes the server's disks extremely active, as if no caching
were taking place. The same server serves NFS smoothly for its Solaris
clients. Increasing the NFS block size did not really help very much.
The net gives many suggestions, but most of them relate to outdated
kernel / NFS client versions.
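(For reference, the block size is a per-mount option on the clients; a
typical invocation, with "sunserver" as a placeholder name, would be
    mount -t nfs -o rsize=8192,wsize=8192 sunserver:/export/home /home
though, as noted above, this alone did not help much here.)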
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rob.myers at gtri.gatech.edu Fri Jan 18 10:14:28 2002
From: rob.myers at gtri.gatech.edu (Rob Myers)
Date: 18 Jan 2002 10:14:28 -0500
Subject: which kernel on ServerWorksLE/which nfs?
In-Reply-To:
References:
Message-ID: <1011366868.6361.34.camel@ransom>
2.4.9 and 2.4.17 have both worked reliably for me on this motherboard.
(2.4.17 did eliminate some ide warnings, however) we have a couple
intel 10/100 nics in all the time. have tested with a netgear ga621 as
well. always running nfsv3 with trond's patches.
it has always been rock solid.
good luck!
rob.
ps- 2.4 is not the kernel of pain!
On Thu, 2002-01-17 at 05:32, rsweet at atos-group.nl wrote:
>
> I'm wondering if the list could indulge my curiosity by sharing which
> kernel versions they are running successfully (for me meaning it stays up
> for at least several weeks...) on ServerWorks LE chipsets (in my case
> Asus CUR_DLS and SuperMicro P3TDE/370)?
>
> In particular is anyone running a 2.4 kernel with any reliability?
>
> If so, what NICS are you using?
> Are you using nfsv3 (knfs) or nfsv2/userland nfs?
>
> regards,
> -Ryan Sweet
>
> --
> Ryan Sweet
> Atos Origin Engineering Services
> http://www.aoes.nl
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From akostocker at hotmail.com Fri Jan 18 11:25:16 2002
From: akostocker at hotmail.com (Tony Stocker)
Date: Fri, 18 Jan 2002 16:25:16 +0000
Subject: How to set time on slave nodes (Scyld)
Message-ID:
Hi Again!
Okay, the host node is set to UTC. We keep it synced with local time
servers. However when I run /usr/lib/beoboot/bin/bdate to update the slave
nodes' time it is setting them to Eastern time. What do I have to do to get
the slave nodes to be set to UTC, and to keep their time synced with the
host node?
Thanks,
Tony
_________________________________________________________________
Chat with friends online, try MSN Messenger: http://messenger.msn.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Thu Jan 17 18:49:33 2002
From: becker at scyld.com (Donald Becker)
Date: Thu, 17 Jan 2002 18:49:33 -0500 (EST)
Subject: Ethernet Flowcontrol...
In-Reply-To: <01ca01c1b802$c6d41e00$0601a8c0@workstation>
Message-ID:
On Thu, 17 Jan 2002, Bill Northrup wrote:
> Hello everyone. I was just about to begin some network tuning and wanted
> to get some input from the list. We have a few gig devices that talk to
> other gig devices as well as the lesser Fast Ethernet nodes. I am well
> aware of the gig frame padding, latency and such with gig e.
Frame padding shouldn't concern most people -- no one is running Gb
Ethernet with repeaters. (IMHO, the MAC changes for Gb half duplex were
a waste of time.)
> However I
> was wondering if anyone is using flow control both rx and tx or asym to
> help with packet flow?
There is little down-side to flow control, and it's usually enabled by
default. Link flow control is transparent, and can minimize overruns
and dropped packets. For the 100Mb->1Gb direction, where presumably
flow control would never be triggered, it has no performance impact.
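For illustration, with a reasonably recent ethtool and a driver that
implements the pause controls (an assumption -- older drivers exposed
this through module options instead), the setting can be inspected and
forced per interface:
    ethtool -a eth0              # show current rx/tx pause settings
    ethtool -A eth0 rx on tx on  # enable flow control in both directions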
> For instance I have a gig master that is trying
> to shove everything down a fast e pipe at the switch. Should one just
> enable TX flow control on the gig segment? The other way from fast e to
> gig e wouldn't be an issue, right? Does relying on the network for flow
> control reduce the overhead encountered on any of the machines or
> possibly off load it to the network devices that may do it better? I'll
> report back to the list what I find, but everyone's mileage is
> different.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From akostocker at hotmail.com Fri Jan 18 12:23:56 2002
From: akostocker at hotmail.com (Tony Stocker)
Date: Fri, 18 Jan 2002 17:23:56 +0000
Subject: bpcp question (Scyld)
Message-ID:
Sick of me yet? :-)
Quick question regarding bpcp, if you want to run bpcp from a slave node and
copy back to the host node, how do you do it? We'd like to run a script on
the slave nodes, and as its final command it would copy an output file back
to the host node. We've tried a simple test but can't get it to work, see
sample command line below. We do *not* want to run the bpcp from the host
node if at all possible.
Command line attempts:
bpsh 0 bpcp /dir/file -1:/dir
bpsh 0 bpcp /dir/file "-1":/dir
bpsh 0 bpcp /dir/file '-1':/dir
bpsh 0 bpcp /dir/file [-1]:/dir
None of these work, most return a response that says that -1 is not a valid
option.
By the way, I'd like to thank everyone for the help I've received so far,
it's been fantastic. Virtually all of the problems/issues I've encountered
so far have been solved - which goes to explain why I have more questions:
I'm progressing. :-)
Thanks again all!
-Tony
_________________________________________________________________
Chat with friends online, try MSN Messenger: http://messenger.msn.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From walke at usna.edu Fri Jan 18 13:47:46 2002
From: walke at usna.edu (LT V. H. Walke)
Date: 18 Jan 2002 13:47:46 -0500
Subject: bpcp question (Scyld)
In-Reply-To:
References:
Message-ID: <1011379666.14495.24.camel@vhwalke.mathsci.usna.edu>
Try:
bpsh 0 bpcp /dir/file .-1:/dir
Note the addition of the ".".
Good luck,
Vann
On Fri, 2002-01-18 at 12:23, Tony Stocker wrote:
>
> Sick of me yet? :-)
>
> Quick question regarding bpcp, if you want to run bpcp from a slave node and
> copy back to the host node, how do you do it? We'd like to run a script on
> the slave nodes, and as its final command it would copy an output file back
> to the host node. We've tried a simple test but can't get it to work, see
> sample command line below. We do *not* want to run the bpcp from the host
> node if at all possible.
>
> Command line attempts:
>
> bpsh 0 bpcp /dir/file -1:/dir
> bpsh 0 bpcp /dir/file "-1":/dir
> bpsh 0 bpcp /dir/file '-1':/dir
> bpsh 0 bpcp /dir/file [-1]:/dir
>
> None of these work, most return a response that says that -1 is not a valid
> option.
>
>
> By the way, I'd like to thank everyone for the help I've received so far,
> it's been fantastic. Virtually all of the problems/issues I've encountered
> so far have been solved - which goes to explain why I have more questions:
> I'm progressing. :-)
>
> Thanks again all!
>
> -Tony
>
>
> _________________________________________________________________
> Chat with friends online, try MSN Messenger: http://messenger.msn.com
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kwood at gshiis.com Fri Jan 18 14:08:05 2002
From: kwood at gshiis.com (Kevin Wood)
Date: Fri, 18 Jan 2002 14:08:05 -0500
Subject: Solaris 8 and Mpich
Message-ID:
Hey there all,
Got a question for you. I am trying to compile mpich 1.2.2.3 on a Solaris 8
Sparc machine and I am getting the following errors:
gcc -I../../../../include -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR
    -DHAVE_LIBSOCKET=1 -DHAVE_LIBNSL=1 -DHAVE_RPC_RPC_H=1
    -DHAVE_NETINET_IN_H=1 -DHAVE_ARPA_INET_H=1 -DHAVE_STDLIB_H=1
    -DHAVE_UNISTD_H=1 -DHAVE_STRING_H=1 -DHAVE_STRINGS_H=1
    -DHAVE_TERMIO_H=1 -DHAVE_TERMIOS_H=1 -DBOTH_STRING_INCS=1
    -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DUSE_STDARG=1 -DHAVE_SIGPROCMASK=1
    -DHAVE_SIGEMPTYSET=1 -DHAVE_SIGADDSET=1 -DHAVE_SIGHOLD=1
    -DHAVE_SIGACTION=1 -DSEMUN_UNDEFINED=1 -DSEMCTL_ARG_UNION=1
    -DHAVE_STRERROR=1 -DHAVE_VPRINTF=1 -DHAVE_SYS_UIO_H=1 -DHAVE_WRITEV=1
    -DNO_ECHO=1 -DHAS_RSHCOMMAND=1 -DHAVE_XDRMEM_CREATE=1 -DHAS_XDR=1
    -DHAVE_LIBSOCKET=1 -DHAVE_LIBNSL=1 -DHAVE_RPC_RPC_H=1
    -DHAVE_NETINET_IN_H=1 -DHAVE_ARPA_INET_H=1 -DHAVE_STDLIB_H=1
    -DHAVE_UNISTD_H=1 -DHAVE_STRING_H=1 -DHAVE_STRINGS_H=1
    -DHAVE_TERMIO_H=1 -DHAVE_TERMIOS_H=1 -DBOTH_STRING_INCS=1
    -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DUSE_STDARG=1 -DHAVE_SIGPROCMASK=1
    -DHAVE_SIGEMPTYSET=1 -DHAVE_SIGADDSET=1 -DHAVE_SIGHOLD=1
    -DHAVE_SIGACTION=1 -DSEMUN_UNDEFINED=1 -DSEMCTL_ARG_UNION=1
    -DHAVE_STRERROR=1 -DHAVE_VPRINTF=1 -DHAVE_SYS_UIO_H=1 -DHAVE_WRITEV=1
    -DNO_ECHO=1 -DHAS_RSHCOMMAND=1 -DHAVE_XDRMEM_CREATE=1 -DHAS_XDR=1
    -DRSHCOMMAND='"/bin/remsh"' -I/opt/mpich/mpid/ch_p4/p4/include
    -I.. -I../include -c p4_debug.c
/usr/ccs/bin/as: "/var/tmp/ccxyWA9i.s", line 576: error: unknown opcode
".subsection"
/usr/ccs/bin/as: "/var/tmp/ccxyWA9i.s", line 576: error: statement syntax
/usr/ccs/bin/as: "/var/tmp/ccxyWA9i.s", line 589: error: unknown opcode
".previous"
/usr/ccs/bin/as: "/var/tmp/ccxyWA9i.s", line 589: error: statement syntax
*** Error code 1
make: Fatal error: Command failed for target `p4_debug.o'
Current working directory /opt/mpich/mpid/ch_p4/p4/lib
*** Error code 1
make: Fatal error: Command failed for target `p4inmpi'
Current working directory /opt/mpich/mpid/ch_p4/p4
*** Error code 1
make: Fatal error: Command failed for target `p4inmpi'
Current working directory /opt/mpich/mpid/ch_p4
*** Error code 1
make: Fatal error: Command failed for target `mpilib'
Current working directory /opt/mpich
*** Error code 1
make: Fatal error: Command failed for target `mpi-modules'
Current working directory /opt/mpich
*** Error code 1
make: Fatal error: Command failed for target `mpi'
Any ideas as to what is going on? I am using gcc and make from
www.sunfreeware.com to compile the code, and I am using the assembler from
Solaris to do the assembly work. Any help would be greatly appreciated.
If anyone has built MPI on Sun, any other information that I might need
would be greatly appreciated.
Thanks
Kevin
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Fri Jan 18 14:28:07 2002
From: becker at scyld.com (Donald Becker)
Date: Fri, 18 Jan 2002 14:28:07 -0500 (EST)
Subject: How to set time on slave nodes (Scyld)
In-Reply-To:
Message-ID:
On Fri, 18 Jan 2002, Tony Stocker wrote:
> Okay, the host node is set to UTC. We keep it synced with local time
> servers. However when I run /usr/lib/beoboot/bin/bdate to update the slave
> nodes' time it is setting them to Eastern time. What do I have to do to get
> the slave nodes to be set to UTC, and to keep their time synced with the
> host node?
I suspect you are misinterpreting some aspect of what is happening.
The timezone setting is stored in /etc/localtime, and is used only by
user-level programs. It is independent of the kernel's clock, which is
kept in UTC/GMT. The "CMOS clock", a hardware device that keeps time when
the machine is powered off, may be set to either localtime or UTC.
'bdate' is a tiny program that copies UTC time from kernel to kernel.
It is run in the node_up script to synchronize the slave clock to the master.
It's a wonderful illustration of how simple cluster tools should be.
#include <stdlib.h>    /* atoi() */
#include <sys/time.h>  /* gettimeofday(), settimeofday() */
#include <sys/bproc.h> /* _bproc_move(), BPROC_DUMP_ALL; header path may vary */
int main(int argc, char **argv)
{
    int node = atoi(argv[1]);          /* Slave node number. */
    struct timeval master_time;
    gettimeofday(&master_time, NULL);  /* Get time on master. */
    _bproc_move(node, BPROC_DUMP_ALL); /* Move to slave node. */
    settimeofday(&master_time, NULL);  /* Set time on slave. */
    return 0;
}
[[ Error checking omitted for clarity. ]]
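For example (a sketch, assuming eight slaves numbered 0-7 and the bdate
path mentioned above), the master could resync the slave clocks hourly
from root's crontab:
    0 * * * * for n in 0 1 2 3 4 5 6 7; do /usr/lib/beoboot/bin/bdate $n; done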
The apparent timezone is unrelated to the kernel's idea of what the
current UTC time is. Each node has its own copy of /etc/localtime,
copied from the master in the node_up script. If you are having
timezone problems, verify that /etc/localtime is set correctly on the
master.
BTW, I'm personally offended by the need for /etc/localtime. Yes,
actually insulted by its very existence. The /etc/* files have long
been a configuration and maintenance issue for clusters. We have put a
lot of thought into eliminating most of the /etc files. But 'localtime'
is stubbornly required by the C library and it just _has_ to be there.
Scyld Beowulf currently requires only three /etc files on the
compute nodes:
/etc/mtab
Created by 'mount', this records the mounted file systems for 'df'.
Someday it will be replaced by /proc/mtab, and thus will never be
out of sync with the kernel.
/etc/nsswitch.conf
The Name Server Switch Configuration file. This configures how
name lookups ("Directory Services") are done. The usual settings
are "files", "NIS" or "DNS" (for hostnames). We use this to
specify Beowulf specific methods for user info, host name,
netgroup, and "ethers" information.
Someday this could be optionally replaced by an environment variable.
/etc/localtime
The static configuration for the local timezone, e.g. Eastern
Standard Time. This really should be part of the NSSwitch system.
That way Beowulf users could eliminate the final configuration file.
Doing this would be useful for other users as well. A new
mechanism could be developed to allow implementing per-user,
per-group or per-tty timezone settings. After all, if you are a
grid user near Chicago using machines in both NM and Livermore, it
would be nice to have consistent time-stamps on the results.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Fri Jan 18 14:34:07 2002
From: becker at scyld.com (Donald Becker)
Date: Fri, 18 Jan 2002 14:34:07 -0500 (EST)
Subject: bpcp question (Scyld)
In-Reply-To:
Message-ID:
On Fri, 18 Jan 2002, Tony Stocker wrote:
> Quick question regarding bpcp, if you want to run bpcp from a slave node and
> copy back to the host node, how do you do it? We'd like to run a script on
...
> bpsh 0 bpcp /dir/file -1:/dir
> bpsh 0 bpcp /dir/file "-1":/dir
> bpsh 0 bpcp /dir/file '-1':/dir
> bpsh 0 bpcp /dir/file [-1]:/dir
The proper cluster hostname syntax is ".-1", ".0", ".1".
A few tools optionally accept just the number, omitting the leading ".".
But as you found out this conflicts with the syntax for options and IP
addresses.
In your case using the alias "master" would make the operation clearer.
bpsh 0 bpcp /dir/file .-1:/dir
bpsh 0 bpcp /dir/file master:/dir
The alias "self" is similarly preferred to ".-2" for clarity.
Future releases will be using the alias "master0" (master), so keep
this in mind if you are writing scripts.
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From agrajag at scyld.com Fri Jan 18 14:59:40 2002
From: agrajag at scyld.com (Sean Dilda)
Date: Fri, 18 Jan 2002 14:59:40 -0500
Subject: bpcp question (Scyld)
In-Reply-To: ; from akostocker@hotmail.com on Fri, Jan 18, 2002 at 05:23:56PM +0000
References:
Message-ID: <20020118145940.A7832@blueraja.scyld.com>
On Fri, 18 Jan 2002, Tony Stocker wrote:
> Command line attempts:
>
> bpsh 0 bpcp /dir/file -1:/dir
> bpsh 0 bpcp /dir/file "-1":/dir
> bpsh 0 bpcp /dir/file '-1':/dir
> bpsh 0 bpcp /dir/file [-1]:/dir
I'd suggest:
bpcp 0:/dir/file /dir
From maurice at harddata.com Sat Jan 19 00:40:37 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Fri, 18 Jan 2002 22:40:37 -0700
Subject: Intel 860 PCI bandwidth problem
Message-ID: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com>
In recent tests on motherboards with Intel 860 chipsets we were seeing
less than wonderful transfer rates using Wulfkit and Myrinet cards.
After some exploration of kernel issues and other hardware forums, we
were still not seeing any reason why this was happening.
Recently Intel published updated chipset errata lists, and I scanned over them.
One issue quickly popped out at me, and I now know what the problem seems
to be:
In the file found at:
ftp://download.intel.com/design/chipsets/specupdt/29071501.pdf
Intel lists errata for the 860 chipset.
One of these states:
"5. Sustained PCI Bandwidth Problem:
During a memory read multiple operation, a PCI master will read more than
one complete cache line from memory. In this situation, the MCH pre-fetches
information from memory to provide optimal performance. However, the MCH
cannot provide information to the PCI master fast enough. Therefore, the
ICH2 terminates the read cycle early to free up the PCI bus for other PCI
masters to claim.
Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
Workaround: None
Status: Intel has no fix planned for this erratum."
This effectively eliminates the 860 chipset motherboards from contention
for HPTC clustering use, IMHO.
Any thoughts from anyone on this?
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at conservativecomputer.com Sat Jan 19 00:58:46 2002
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Sat, 19 Jan 2002 00:58:46 -0500
Subject: Intel 860 PCI bandwidth problem
In-Reply-To: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com>; from maurice@harddata.com on Fri, Jan 18, 2002 at 10:40:37PM -0700
References: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com>
Message-ID: <20020119005846.A3462@wumpus.foo>
On Fri, Jan 18, 2002 at 10:40:37PM -0700, Maurice Hilarius wrote:
> Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
That note sounds like it's not talking about DMA operation. You did
look at the Myrinet Experiences website
http://www.conservativecomputer.com/myrinet/perf.html
and saw that measured PCI DMA performance is in the 200-300 MB/s
range, but it depends on some BIOS and other details, yes? Most non-GM
Myrinet drivers don't use DMA in one or both directions, and so this
problem would look terrible with them, but not GM. I don't know about
SCI.
-- greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sp at scali.com Sat Jan 19 08:47:36 2002
From: sp at scali.com (Steffen Persvold)
Date: Sat, 19 Jan 2002 14:47:36 +0100
Subject: Intel 860 PCI bandwidth problem
References: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com>
Message-ID: <3C4978F8.D48A66A6@scali.com>
Maurice Hilarius wrote:
>
> In recent test on motherboards with Intel 860 chipsets we were seeing less
> than wonderful transfer rates using Wulfkit and Myrinet cards.
>
> After some explorations on kernel issues, and other hardware forums we were
> still not seeing any reason why this was happening.
>
> Recently Intel published updated chipset errata lists, and I scanned over them.
>
> One issue quickly popped out at me, and I now know what the problem seems
> to be:
> In the file found at:
> ftp://download.intel.com/design/chipsets/specupdt/29071501.pdf
>
> Intel lists errata for the 860 chipset.
> One of these states:
> "5. Sustained PCI Bandwidth Problem:
> During a memory read multiple operation, a PCI master will read more than
> one complete cache line from memory. In this situation, the MCH pre-fetches
> information from memory to provide optimal performance. However, the MCH
> cannot provide information to the PCI master fast enough. Therefore, the
> ICH2 terminates the read cycle early to free up the PCI bus for other PCI
> masters to claim.
>
> Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
>
> Workaround: None
>
> Status: Intel has no fix planned for this erratum."
>
> This effectively eliminates the 860 chipset motherboards from contention
> for HPTC clustering use, IMHO.
>
> Any thoughts from anyone on this?
>
This only affects DMA operations which use the PCI command "Memory Read Multiple". Normal Wulfkit
usage (ScaMPI) is with PIO and is therefore not affected by this issue. Instead, PIO performance on
these chipsets is limited by the fact that we cannot get more than a 32-byte burst (yet), giving you
approx. 170 MByte/sec.
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:sp at scali.no | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sp at scali.com Sat Jan 19 08:47:47 2002
From: sp at scali.com (Steffen Persvold)
Date: Sat, 19 Jan 2002 14:47:47 +0100
Subject: Intel 860 PCI bandwidth problem
References: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com> <20020119005846.A3462@wumpus.foo>
Message-ID: <3C497903.BF838BA4@scali.com>
Greg Lindahl wrote:
>
> On Fri, Jan 18, 2002 at 10:40:37PM -0700, Maurice Hilarius wrote:
>
> > Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
>
> That note sounds like it's not talking about DMA operation. You did
> look at the Myrinet Experiences website
>
I think you misunderstood: "memory read multiple" is the PCI command used by most DMA engines
(SCSI, ethernet, SCI, and I would guess Myrinet) when they read from RAM (on the source machine). On
the other side (the destination), "memory write and invalidate" is normally used.
However, this erratum doesn't limit the SCI DMA bandwidth to ~90MB/s either; 210MB/s is the most I've
seen so far.
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:sp at scali.no | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From edwards at icantbelieveimdoingthis.com Sat Jan 19 11:24:04 2002
From: edwards at icantbelieveimdoingthis.com (Art Edwards)
Date: Sat, 19 Jan 2002 09:24:04 -0700
Subject: A couple of questions
Message-ID: <20020119092404.A13397@icantbelieveimdoingthis.com>
I'm using Scyld Beowulf (27Bz-7) with fair success. I have a couple of
unrelated questions:
I should point out that I'm only using 100Mbit ethernet through a 100Mbit
ethernet switch. Also, the nodes are AMD Athlon 1.4 GHz.
1. Does anyone have experience running GAMESS-US under Scyld? I'm
running the MPI version and I'm getting consistent scaling of about 1.8x
each time I double the number of processors. Has anyone had better
experience? The author claims we should be getting nearly perfect
doubling (using the sockets version) for a linux-ethernet cluster.
2. In upgrading to 27Bz-8 has anyone had any difficulty? Are there clear
advantages for a network that is running ethernet?
3. Is there a simple bpsh command to remove all files from /tmp on the slave
nodes?
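(A sketch for question 3, assuming your bpsh supports -a for "all nodes
up"; note the sh -c wrapper, since otherwise the invoking shell would
expand /tmp/* against the master's own /tmp:
    bpsh -a sh -c 'rm -rf /tmp/*'
or, looping explicitly over nodes 0..7:
    for n in 0 1 2 3 4 5 6 7; do bpsh $n sh -c 'rm -rf /tmp/*'; done )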
--
Arthur H. Edwards
712 Valencia Dr. NE
Abq. NM 87108
(505) 256-0834
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From siegert at sfu.ca Mon Jan 21 21:13:44 2002
From: siegert at sfu.ca (Martin Siegert)
Date: Mon, 21 Jan 2002 18:13:44 -0800
Subject: kernel oopses
Message-ID: <20020121181344.A9760@stikine.ucs.sfu.ca>
This is somewhat off topic - sorry for that.
I am trying to use the linux driver for the SK9D21 GigE copper card
on a RedHat 7.2 system with a 2.4.13 kernel.
I have two athlon mp boxes with such a card connected with a cross-over
cable. Running netpipe (NPtcp) from one box to the other causes
a kernel oops.
I am quite sure that the oops is due to the driver
http://www.syskonnect.com/syskonnect/support/driver/htm/sk9dlin_2_4_13.htm
(without that driver the kernel runs fine).
The first thing I would like to do is to log the oops message. Right now
it goes to the console only - it does not appear in the log files,
although syslog sends everything of severity *.info to /var/log/messages.
/proc/kmsg is empty.
There is a /proc/kcore file, but it is a binary file. I tried to read
it using ksymoops, but that generates a huge amount of garbage on the
screen and not the error message I see on the console.
I have tried the -s option to klogd, with no success either.
Is there a way of getting the error message into a file (without typing
it off the console screen)?
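One standard approach is a serial console, so the oops lands on a second
machine instead of the screen (a sketch, assuming a null-modem cable and
a free ttyS0 on both boxes):
    # on the crashing box, add to the kernel boot parameters:
    #   console=ttyS0,9600n8 console=tty0
    # on the capturing box:
    stty -F /dev/ttyS0 9600 raw
    cat /dev/ttyS0 > oops.log
The captured text can then be fed to ksymoops together with the matching
System.map.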
Thanks for suggestions.
Cheers,
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
========================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From torvir at dpsl.net Tue Jan 22 02:38:00 2002
From: torvir at dpsl.net (Rahul Torvi)
Date: Tue, 22 Jan 2002 13:08:00 +0530
Subject: Basic question
Message-ID:
Hi All,
I'm running a search server for our company on one of the servers I have.
This server is built using Java. Can I set up a Beowulf cluster to run this
search server so that I get better results? If so, what do I need to do?
- Rahul T
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wyy at admu.edu.ph Tue Jan 22 04:42:23 2002
From: wyy at admu.edu.ph (Horatio B. Bogbindero)
Date: Tue, 22 Jan 2002 17:42:23 +0800
Subject: Basic question
In-Reply-To: ; from torvir@dpsl.net on Tue, Jan 22, 2002 at 01:08:00PM +0530
References:
Message-ID: <20020122174223.A25040@admu.edu.ph>
On Tue, Jan 22, 2002 at 01:08:00PM +0530, Rahul Torvi wrote (wyy sez):
>
> I'm running a search server for our company on one of the servers I have.
> This server is built using Java. Can I set up a Beowulf cluster to run this
> search server so that I get better results? If so, what do I need to do?
>
of course, you can run your program in a beowulf cluster. but....
you have to recode it of course.
some options are using the Java bindings for LAM-MPI, which will
make your coding simpler. or you can, of course, use native
java RMI, but it will be a wee bit harder.
--
--------------------------------------
William Emmanuel S. Yu
Ateneo Cervini-Eliazo Networks (ACENT)
email : wyy at admu dot edu dot ph
web : http://CNG.ateneo.net/wyu/
phone : 63(2)4266001-4186
GPG : http://CNG.ateneo.net/wyu/wyy.pgp
War spares not the brave, but the cowardly.
-- Anacreon
From joachim at lfbs.RWTH-Aachen.DE Tue Jan 22 06:45:29 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Tue, 22 Jan 2002 12:45:29 +0100
Subject: Basic question
References: <20020122174223.A25040@admu.edu.ph>
Message-ID: <3C4D50D9.4973EF3A@lfbs.rwth-aachen.de>
Horatio B. Bogbindero wrote:
>
> On Tue, Jan 22, 2002 at 01:08:00PM +0530, Rahul Torvi wrote (wyy sez):
> >
> > I'm running a search server for our company on one of the servers I have.
> > This server is built using Java. Can I set up a Beowulf cluster to run this
> > search server so that I get better results? If so, what do I need to do?
> >
> of course, you can run your program in a beowulf cluster. but....
> you have to recode it of course.
Ask the guys at Google how they did it...
But generally, I think it must be a *very* big company with heavy search
traffic if a single server can not perform this task. I really doubt
that this is the case here. Therefore, it might be easier to buy a
bigger machine than to go parallel for this. Did you analyze where the
bottleneck is?
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From torvir at dpsl.net Tue Jan 22 07:13:50 2002
From: torvir at dpsl.net (Rahul Torvi)
Date: Tue, 22 Jan 2002 17:43:50 +0530
Subject: Basic question
In-Reply-To: <3C4D50D9.4973EF3A@lfbs.rwth-aachen.de>
Message-ID:
Hi..
How do I incorporate RMI in a Beowulf cluster? I use an RMI server for my
search. Is it possible with LAM-MPI without changing my current code? What
are the possible ways to use the same code without recoding?
Regards,
Rahult
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
Behalf Of Joachim Worringen
Sent: Tuesday, January 22, 2002 5:15 PM
To: Beowulf mailinglist
Subject: Re: Basic question
Horatio B. Bogbindero wrote:
>
> On Tue, Jan 22, 2002 at 01:08:00PM +0530, Rahul Torvi wrote (wyy sez):
> >
> > I'm running a search server for our company on one of the servers I have.
> > This server is built using Java. Can I set up a Beowulf cluster to run this
> > search server so that I get better results? If so, what do I need to do?
> >
> of course, you can run your program in a beowulf cluster. but....
> you have to recode it of course.
Ask the guys at Google how they did it...
But generally, I think it must be a *very* big company with heavy search
traffic if a single server can not perform this task. I really doubt
that this is the case here. Therefore, it might be easier to buy a
bigger machine than to go parallel for this. Did you analyze where the
bottleneck is?
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From scheinin at crs4.it Tue Jan 22 08:22:28 2002
From: scheinin at crs4.it (Alan Scheinine)
Date: Tue, 22 Jan 2002 14:22:28 +0100 (MET)
Subject: Basic question
Message-ID: <200201221322.OAA20564@dylandog.crs4.it>
With MPI you cannot pass classes between processes. As for the
effort of writing code: MPI is simple when the information
to be passed consists of numbers or a simple array. I did not
save the original message, but if I remember correctly, the
program to be ported is written in Java. In such a case,
different processes probably share information that is not easy
to duplicate with MPI.
> how to incorporate RMI in Beowulf cluster. Since i use RMI server for my
> search. Is it possible in LAM-MPI without changing my current code.
> What are
> the possible ways to use the same code without recoding???
> Regards,
> Rahult
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Tue Jan 22 09:53:20 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Tue, 22 Jan 2002 15:53:20 +0100 (MET)
Subject: CCL:the question about building parallel computing without disk(fwd)
Message-ID:
---------- Forwarded message ----------
Date: Tue, 22 Jan 2002 16:14:53 +0900
From: mystwind at magicn.com
To: chemistry at ccl.net
Subject: CCL:the question about building parallel computing without disk
Dear CCLers
Sorry for disturbing your work.
At the moment I am trying to build a diskless parallel computing setup on
Red Hat Linux 7.2
------------------------------------
Pentium 4 1.7GHz * 4
Red Hat Linux 7.2
Kernel 2.4.7-10
------------------------------------
I've compiled the following basic things into the kernel of the master node.
------------------------------------------------------------
prompt for development and/or incomplete code/drivers = y
kernel automounter support = y
kernel automounter version 4 support (also supports v3) = y
NFS file system support = y
Provide NFSv3 client support = y
NFS server support = y
Provide NFSv3 server support = y
------------------------------------------------------------
but booting failed.
I couldn't find any clue as to what happened.
Is there anyone who can help me?
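For comparison, a 2.4 NFS-root setup generally also needs the following
(per the kernel's Documentation/nfsroot.txt; the server path below is a
placeholder):
    # additional kernel config for an NFS root:
    #   IP: kernel level autoconfiguration (CONFIG_IP_PNP) = y
    #   Root file system on NFS (CONFIG_ROOT_NFS) = y
    # and a boot command line along the lines of:
    #   root=/dev/nfs nfsroot=<server-ip>:/tftpboot/node ip=dhcp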
Anyway thank you for reading it ..
Jino Kim, mystwind at magicn.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john at computation.com Tue Jan 22 10:25:54 2002
From: john at computation.com (John Nelson)
Date: Tue, 22 Jan 2002 10:25:54 -0500 (EST)
Subject: Basic question
In-Reply-To: <20020122174223.A25040@admu.edu.ph>
Message-ID:
This raises an interesting question (well, an interesting question for me),
namely "how does one use the Java bindings?" The C++ bindings are
well-documented. How does one go about building a cluster-aware app with
the Java bindings, though (and do I need additional libraries)?
-- John
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Wed Jan 23 08:14:07 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Wed, 23 Jan 2002 14:14:07 +0100 (MET)
Subject: /. linux desktop clustering
Message-ID:
http://slashdot.org/articles/02/01/22/1854218.shtml
crashlight writes: "A Linux cluster on the desktop--Rocket Calc just
announced their 8- processor "personal" cluster in a mid-tower-sized box.
Starting at $4500, you get 8 Celeron 800MHz processors, each with 256MB
RAM and a 100Mbps ethernet connection. The box also has an integrated
100Mbps switch. Plus it's sexy." Perhaps less sexy, but for a lot less
money, you can also run a cluster of Linux (virtual) machines on your
desktop on middle-of-the-road hardware. See this followup on Grant Gross's
recent piece on Virtual Machines over at Newsforge.
-- Eugen* Leitl leitl
______________________________________________________________
ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.leitl.org
57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From troy at osc.edu Wed Jan 23 17:25:09 2002
From: troy at osc.edu (Troy Baer)
Date: Wed, 23 Jan 2002 17:25:09 -0500
Subject: lperfex version 2.0 released
Message-ID:
Hi all,
I'm happy to announce that lperfex 2.0 is now available. lperfex is a
hardware performance monitoring tool for Linux/IA32 systems, using the
interface provided by Mikael Pettersson's perfctr library version 2.0.
(Work on updating it to use the more recent perfctr 2.3.x is ongoing.)
If you've used Cray's hpm or SGI's perfex, then lperfex should seem
fairly familiar. If not, think of lperfex as a variation on the time
command which can also track low-level hardware events like floating
point operations, cache misses, and so on. It is not intrusive into
the code whose performance it measures and does not require special
compilation or code instrumentation.
lperfex version 2.0 is a rather major change from the previous versions.
First, it uses perfctr rather than libperf for its low-level interface.
This also means that it can count events on non-P6 x86 processors such as
original Pentiums and Athlons. Second, the command line argument handling
has been improved so that symbolic event names (e.g. P6_FLOPS) can be
used instead of the older versions' event numbers. Finally, we have added
anonymous CVS access and a mailing list to support lperfex.
See http://www.osc.edu/~troy/lperfex/ for code and more details.
--Troy
--
Troy Baer email: troy at osc.edu
Science & Technology Support phone: 614-292-9701
Ohio Supercomputer Center web: http://oscinfo.osc.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From henken at seas.upenn.edu Wed Jan 23 18:11:15 2002
From: henken at seas.upenn.edu (henken)
Date: Wed, 23 Jan 2002 18:11:15 -0500 (EST)
Subject: bproc and ssh
Message-ID:
Hello --
We are using bproc as the basis for a new cluster suite (install,
scheduling, management) and are interested in limiting access for ssh'ing
into the nodes. We would like the ssh permissions to be the same as the
bproc permissions -- this would allow us to use the bproc interface as the
main interface for controlling access to the nodes. Has anyone done this?
How?
Thanks!!
Nic
--
Nicholas Henke
Undergraduate - Engineering 2002
--
Senior Architect and Developer
Liniac Project - University of Pennsylvania
http://clubmask.sourceforge.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There's nothing like good food, good beer, and a bad girl.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctierney at hpti.com Thu Jan 24 17:54:57 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Thu, 24 Jan 2002 15:54:57 -0700
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To: <20020124144459.A1561@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Thu, Jan 24, 2002 at 02:44:59PM -0700
References: <20020124110451.A23716@icantbelieveimdoingthis.com> <20020124144459.A1561@icantbelieveimdoingthis.com>
Message-ID: <20020124155457.A9640@hpti.com>
On Thu, Jan 24, 2002 at 02:44:59PM -0700, Art Edwards wrote:
> On Thu, Jan 24, 2002 at 10:17:28AM -0800, alvin at Maggie.Linux-Consulting.com wrote:
> >
> > hi art
> >
> > On Thu, 24 Jan 2002, Art Edwards wrote:
> >
> > > On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
> >
> > ...
> >
> > > > Can anyone tell me what is currently the largest linux-based workstation
> > > > cluster that has been successfully deployed and is being used for
> > > > computational chemistry studies? (largest = number of nodes regardless of
> > > > the speed of each node).
> > > >
> > > Sandia National Laboratories has C-Plant that runs Linux in addition to several
> > > layers of home-grown OS on several thousand nodes. The basic node is a DEC
> > > ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
> >
> > do you happen to know how they manage the huge disk farm???
> > - resumably raid5 systems...
> > - are each raid5 sub-system dual-hosted so that the other cpu
> > can getto the data if one of the cpu cant get to it
> > - does all nodes access the "disk farm" thru the gigabit ethernet
> > or dual-hosted scsi cables ??
> > - how does one optimize a disk farm ?? (hdparm seems too clumbsy)
> >
> > -- in the old days.... 1980's ... there used to be dual-hosted
> > disk controllers where PC-HOST#1 and PC-HOST#2 can both access the same
> > physical CDC/DEC/Fujitsu drives
> > - wish i could find these dual host scsi controllers for todaysPCs
> That is part of the home-grown software. There are parallel IO ports that require
> special calls. I'm a user, not a developer so that is the extent of my expertise.
Sandia's IO system does not fall into the 'todaysPC' category.
You don't need a dual-ported SCSI controller if you have a really
big system. Why not just install 8 Fibre Channel cards in one machine and stripe
across them? Then install 8-16 (or however many you want) gigE cards to provide the
bandwidth to the ENFS servers that provide the IO to the nodes.
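To illustrate the striping itself with the Linux software RAID tools of that
era, a hypothetical /etc/raidtab fragment (device names made up):

raiddev /dev/md0
    raid-level            0
    nr-raid-disks         8
    chunk-size            64
    persistent-superblock 1
    device                /dev/sda1
    raid-disk             0
    device                /dev/sdb1
    raid-disk             1
    # ... and so on for the remaining six devices

Running mkraid /dev/md0 then assembles the stripe set, and mke2fs puts a
filesystem on it.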
Craig
>
> Art Edwards
> >
> > have fun linuxing
> > alvin
> > http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
> >
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From edwards at icantbelieveimdoingthis.com Thu Jan 24 16:44:59 2002
From: edwards at icantbelieveimdoingthis.com (Art Edwards)
Date: Thu, 24 Jan 2002 14:44:59 -0700
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To: ; from alvin@Maggie.Linux-Consulting.com on Thu, Jan 24, 2002 at 10:17:28AM -0800
References: <20020124110451.A23716@icantbelieveimdoingthis.com>
Message-ID: <20020124144459.A1561@icantbelieveimdoingthis.com>
On Thu, Jan 24, 2002 at 10:17:28AM -0800, alvin at Maggie.Linux-Consulting.com wrote:
>
> hi art
>
> On Thu, 24 Jan 2002, Art Edwards wrote:
>
> > On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
>
> ...
>
> > > Can anyone tell me what is currently the largest linux-based workstation
> > > cluster that has been successfully deployed and is being used for
> > > computational chemistry studies? (largest = number of nodes regardless of
> > > the speed of each node).
> > >
> > Sandia National Laboratories has C-Plant that runs Linux in addition to several
> > layers of home-grown OS on several thousand nodes. The basic node is a DEC
> > ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
>
> do you happen to know how they manage the huge disk farm???
> - resumably raid5 systems...
> - are each raid5 sub-system dual-hosted so that the other cpu
> can getto the data if one of the cpu cant get to it
> - does all nodes access the "disk farm" thru the gigabit ethernet
> or dual-hosted scsi cables ??
> - how does one optimize a disk farm ?? (hdparm seems too clumbsy)
>
> -- in the old days.... 1980's ... there used to be dual-hosted
> disk controllers where PC-HOST#1 and PC-HOST#2 can both access the same
> physical CDC/DEC/Fujitsu drives
> - wish i could find these dual host scsi controllers for todaysPCs
That is part of the home-grown software. There are parallel IO ports that require
special calls. I'm a user, not a developer, so that is the extent of my expertise.
Art Edwards
>
> have fun linuxing
> alvin
> http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rbbrigh at valeria.mp.sandia.gov Thu Jan 24 14:07:28 2002
From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell)
Date: Thu, 24 Jan 2002 12:07:28 -0700 (MST)
Subject: CCL:Largest Linux Cluster?
Message-ID: <200201241907.MAA13802@dogbert.mp.sandia.gov>
>
> > Sandia National Laboratories has C-Plant that runs Linux in addition to several
> > layers of home-grown OS on several thousand nodes. The basic node is a DEC
> > ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
>
> do you happen to know how they manage the huge disk farm???
> - resumably raid5 systems...
> - are each raid5 sub-system dual-hosted so that the other cpu
> can getto the data if one of the cpu cant get to it
> - does all nodes access the "disk farm" thru the gigabit ethernet
> or dual-hosted scsi cables ??
> - how does one optimize a disk farm ?? (hdparm seems too clumbsy)
>
> -- in the old days.... 1980's ... there used to be dual-hosted
> disk controllers where PC-HOST#1 and PC-HOST#2 can both access the same
> physical CDC/DEC/Fujitsu drives
> - wish i could find these dual host scsi controllers for todaysPCs
>
The Cplant clusters use a parallel I/O system called ENFS, which is a modified
version of the NFS protocol that avoids locking and doesn't provide full
UNIX semantics. Compute nodes send parallel I/O requests via ENFS to a set
of proxy I/O nodes that mount a back-end filesystem through gigE uplinks.
In the case of the 1792-node cluster, the back-end is an SGI O2K running XFS
(I think).
-Ron
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rbbrigh at valeria.mp.sandia.gov Thu Jan 24 13:30:04 2002
From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell)
Date: Thu, 24 Jan 2002 11:30:04 -0700 (MST)
Subject: CCL:Largest Linux Cluster?
Message-ID: <200201241830.LAA13592@dogbert.mp.sandia.gov>
> >
> > Can anyone tell me what is currently the largest linux-based workstation
> > cluster that has been successfully deployed and is being used for
> > computational chemistry studies? (largest = number of nodes regardless of
> > the speed of each node).
> >
> Sandia National Laboratories has C-Plant that runs Linux in addition to several
> layers of home-grown OS on several thousand nodes. The basic node is a DEC
> ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
Our largest Cplant cluster is currently 1792 compute nodes. They are all
currently running Linux 2.2.18. We don't really do anything special to the
OS, other than add some modules for our Portals communication layer and our
parallel runtime environment, so there's not "several layers of home-grown OS".
I know that we have application groups doing computational chemistry codes,
but I can't give any details about what they are or how they are using Cplant.
-Ron
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From edwards at icantbelieveimdoingthis.com Thu Jan 24 13:04:51 2002
From: edwards at icantbelieveimdoingthis.com (Art Edwards)
Date: Thu, 24 Jan 2002 11:04:51 -0700
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To: ; from Eugene.Leitl@lrz.uni-muenchen.de on Thu, Jan 24, 2002 at 05:55:24PM +0100
References:
Message-ID: <20020124110451.A23716@icantbelieveimdoingthis.com>
On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
>
>
> -- Eugen* Leitl leitl
> ______________________________________________________________
> ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.leitl.org
> 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3
>
> ---------- Forwarded message ----------
> Date: Thu, 24 Jan 2002 07:12:18 -0800
> From: Mark Thompson
> To: chemistry at ccl.net
> Subject: CCL:Largest Linux Cluster?
>
>
> Can anyone tell me what is currently the largest linux-based workstation
> cluster that has been successfully deployed and is being used for
> computational chemistry studies? (largest = number of nodes regardless of
> the speed of each node).
>
Sandia National Laboratories has Cplant, which runs Linux in addition to several
layers of home-grown OS on several thousand nodes. The basic node is a DEC
EV6 with Myrinet. They use no local disk, opting for a huge disk farm.
> Mark
>
>
> =================================
> Mark Thompson
> Planaria Software
> Seattle, WA.
> http://www.planaria-software.com
>
> Download ArgusLab at
> http://www.arguslab.com
> =================================
>
>
> -= This is automatically added to each message by mailing script =-
> CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins
> MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH
> CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70
> Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu
>
>
>
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
Arthur H. Edwards
712 Valencia Dr. NE
Abq. NM 87108
(505) 256-0834
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Eugene.Leitl at lrz.uni-muenchen.de Thu Jan 24 11:55:24 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Thu, 24 Jan 2002 17:55:24 +0100 (MET)
Subject: CCL:Largest Linux Cluster? (fwd)
Message-ID:
-- Eugen* Leitl leitl
______________________________________________________________
ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.leitl.org
57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3
---------- Forwarded message ----------
Date: Thu, 24 Jan 2002 07:12:18 -0800
From: Mark Thompson
To: chemistry at ccl.net
Subject: CCL:Largest Linux Cluster?
Can anyone tell me what is currently the largest linux-based workstation
cluster that has been successfully deployed and is being used for
computational chemistry studies? (largest = number of nodes regardless of
the speed of each node).
Mark
=================================
Mark Thompson
Planaria Software
Seattle, WA.
http://www.planaria-software.com
Download ArgusLab at
http://www.arguslab.com
=================================
-= This is automatically added to each message by mailing script =-
CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins
MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH
CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70
Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alvin at Maggie.Linux-Consulting.com Thu Jan 24 13:17:28 2002
From: alvin at Maggie.Linux-Consulting.com (alvin at Maggie.Linux-Consulting.com)
Date: Thu, 24 Jan 2002 10:17:28 -0800 (PST)
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To: <20020124110451.A23716@icantbelieveimdoingthis.com>
Message-ID:
hi art
On Thu, 24 Jan 2002, Art Edwards wrote:
> On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
...
> > Can anyone tell me what is currently the largest linux-based workstation
> > cluster that has been successfully deployed and is being used for
> > computational chemistry studies? (largest = number of nodes regardless of
> > the speed of each node).
> >
> Sandia National Laboratories has C-Plant that runs Linux in addition to several
> layers of home-grown OS on several thousand nodes. The basic node is a DEC
> ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
do you happen to know how they manage the huge disk farm???
- presumably raid5 systems...
- is each raid5 sub-system dual-hosted so that the other cpu
can get to the data if one of the cpus can't get to it?
- do all nodes access the "disk farm" through gigabit ethernet
or dual-hosted scsi cables??
- how does one optimize a disk farm?? (hdparm seems too clumsy)
-- in the old days... the 1980's... there used to be dual-hosted
disk controllers where PC-HOST#1 and PC-HOST#2 could both access the same
physical CDC/DEC/Fujitsu drives
- i wish i could find these dual-hosted scsi controllers for today's PCs
have fun linuxing
alvin
http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joelja at darkwing.uoregon.edu Thu Jan 24 13:49:16 2002
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Thu, 24 Jan 2002 10:49:16 -0800 (PST)
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To:
Message-ID:
On Thu, 24 Jan 2002 alvin at Maggie.Linux-Consulting.com wrote:
>
> hi art
>
> On Thu, 24 Jan 2002, Art Edwards wrote:
>
> > On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
>
> ...
>
> > > Can anyone tell me what is currently the largest linux-based workstation
> > > cluster that has been successfully deployed and is being used for
> > > computational chemistry studies? (largest = number of nodes regardless of
> > > the speed of each node).
> > >
> > Sandia National Laboratories has C-Plant that runs Linux in addition to several
> > layers of home-grown OS on several thousand nodes. The basic node is a DEC
> > ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
cplant info is here:
http://www.cs.sandia.gov/cplant/
the high-level summary would be something like:
Myrinet-connected nodes in cabinets... cabinets interconnected with Myrinet;
disk storage nodes with raid5 arrays, with their own Myrinet switch and
then cabinet-level interconnects.
an unrelated observation would be that
dual-hosted scsi systems have largely, if not entirely, been supplanted by
fibre channel.
> do you happen to know how they manage the huge disk farm???
> - resumably raid5 systems...
> - are each raid5 sub-system dual-hosted so that the other cpu
> can getto the data if one of the cpu cant get to it
> - does all nodes access the "disk farm" thru the gigabit ethernet
> or dual-hosted scsi cables ??
> - how does one optimize a disk farm ?? (hdparm seems too clumbsy)
>
> -- in the old days.... 1980's ... there used to be dual-hosted
> disk controllers where PC-HOST#1 and PC-HOST#2 can both access the same
> physical CDC/DEC/Fujitsu drives
> - wish i could find these dual host scsi controllers for todaysPCs
>
> have fun linuxing
> alvin
> http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
--------------------------------------------------------------------------
Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu
-- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E --
The accumulation of all powers, legislative, executive, and judiciary, in
the same hands, whether of one, a few, or many, and whether hereditary,
selfappointed, or elective, may justly be pronounced the very definition of
tyranny. - James Madison, Federalist Papers 47 - Feb 1, 1788
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctierney at hpti.com Thu Jan 24 13:55:06 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Thu, 24 Jan 2002 11:55:06 -0700
Subject: CCL:Largest Linux Cluster? (fwd)
In-Reply-To: ; from alvin@Maggie.Linux-Consulting.com on Thu, Jan 24, 2002 at 10:17:28AM -0800
References: <20020124110451.A23716@icantbelieveimdoingthis.com>
Message-ID: <20020124115506.B8788@hpti.com>
Saying that 'they use no local disk, opting for a huge disk farm' is
not exactly correct. All nodes are diskless, and they do have a huge disk
farm. The SAN is for data storage, user codes, etc. The nodes are diskless
because they need to be able to switch parts of the machine from classified to
non-classified depending on the situation, and they cannot have any local storage
if they are to accomplish this. It doesn't take much disk to boot 1000 nodes remotely.
All common files are shared, and only some system-specific things such as /etc
are generated uniquely for each node. I think (though I'm not positive) that /var
and /tmp may be small ram disks, but I don't remember exactly.
Does hdparm work with SCSI? Does it really accomplish much? I didn't think
it did. The SAN is SCSI, so I don't think they are running hdparm to optimize it.
Craig
On Thu, Jan 24, 2002 at 10:17:28AM -0800, alvin at Maggie.Linux-Consulting.com wrote:
>
> hi art
>
> On Thu, 24 Jan 2002, Art Edwards wrote:
>
> > On Thu, Jan 24, 2002 at 05:55:24PM +0100, Eugene Leitl wrote:
>
> ...
>
> > > Can anyone tell me what is currently the largest linux-based workstation
> > > cluster that has been successfully deployed and is being used for
> > > computational chemistry studies? (largest = number of nodes regardless of
> > > the speed of each node).
> > >
> > Sandia National Laboratories has C-Plant that runs Linux in addition to several
> > layers of home-grown OS on several thousand nodes. The basic node is a DEC
> > ev6 with myranet (sp). They use no local disk, opting for a huge disk farm.
>
> do you happen to know how they manage the huge disk farm???
> - resumably raid5 systems...
> - are each raid5 sub-system dual-hosted so that the other cpu
> can getto the data if one of the cpu cant get to it
> - does all nodes access the "disk farm" thru the gigabit ethernet
> or dual-hosted scsi cables ??
> - how does one optimize a disk farm ?? (hdparm seems too clumbsy)
>
> -- in the old days.... 1980's ... there used to be dual-hosted
> disk controllers where PC-HOST#1 and PC-HOST#2 can both access the same
> physical CDC/DEC/Fujitsu drives
> - wish i could find these dual host scsi controllers for todaysPCs
>
> have fun linuxing
> alvin
> http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alvin at Maggie.Linux-Consulting.com Thu Jan 24 14:06:46 2002
From: alvin at Maggie.Linux-Consulting.com (alvin at Maggie.Linux-Consulting.com)
Date: Thu, 24 Jan 2002 11:06:46 -0800 (PST)
Subject: CCL:Largest Linux Cluster? -- ide
In-Reply-To: <20020124115506.B8788@hpti.com>
Message-ID:
hi ya craig
On Thu, 24 Jan 2002, Craig Tierney wrote:
> Saying that 'they use no local disk, opting for a huge disk farm' is
> not exactly correct. All nodes are diskless, and they do have a huge disk
> farm. The san is for data storage, user codes, etc. The nodes are diskless
> because they need to be able to switch parts the machine from classified to
> non-classified depending on the situation. They cannot have any storage locally
> to accomplish this. It doesn't take much disk to boot 1000 nodes remotely.
> All common files are shared, and only some system specific things such as /etc
> are generated unique to each node. I think (not positive) that /var and /tmp
> may be small ram disks, but I don't remember exactly.
>
> Does hdparm work with SCSI? Does it really accomplish much? I didn't think
> it did. They SAN is scsi, so I don't think they are running hdparm to optimize it.
hdparm is for ide.. (i am thinking of ide-based raid5...)
for the other responder...
and yes... fibre channel is good too, but again, are there any dual-hosted
controllers for it???
have fun linuxing
alvin
http://www.Linux-1U.net .. 8x 200GB IDE disks -->> 1.6TeraByte 1U Raid5 ..
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From melione at carcara.lncc.br Fri Jan 25 08:22:47 2002
From: melione at carcara.lncc.br (Eduardo Melione Abreu)
Date: Fri, 25 Jan 2002 11:22:47 -0200 (BRST)
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To: <20020124144459.A1561@icantbelieveimdoingthis.com>
Message-ID:
Hi,
Has anyone yet installed and run a Linux system with
an AMD Athlon processor and the Intel Fortran Compiler (ifc)?
Thanks,
Melione.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From SGaudet at turbotekcomputer.com Fri Jan 25 08:55:39 2002
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Fri, 25 Jan 2002 08:55:39 -0500
Subject: Largest Linux Cluster? (fwd)
Message-ID: <3450CC8673CFD411A24700105A618BD61BED73@911TURBO>
Hello,
> ---------- Forwarded message ----------
> Date: Thu, 24 Jan 2002 07:12:18 -0800
> From: Mark Thompson
> To: chemistry at ccl.net
> Subject: CCL:Largest Linux Cluster?
>
>
> Can anyone tell me what is currently the largest linux-based
> workstation
> cluster that has been successfully deployed and is being used for
> computational chemistry studies? (largest = number of nodes
> regardless of
> the speed of each node).
I can tell you what's coming. IBM has announced a $100 million initiative to build
what promises to be the world's most powerful supercomputer, "Blue Gene", which
will be capable of more than one quadrillion operations per second (one
"petaflop").
Cheers,
Steve Gaudet
Linux Solutions Engineer
.....
===================================================================
| Turbotek Computer Corp. tel:603-666-3062 ext. 21 |
| 8025 South Willow St. fax:603-666-4519 |
| Building 2, Unit 105 toll free:800-573-5393 |
| Manchester, NH 03103 e-mail:sgaudet at turbotekcomputer.com |
| web: http://www.turbotekcomputer.com |
===================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From renambot at cs.vu.nl Fri Jan 25 09:28:27 2002
From: renambot at cs.vu.nl (Luc Renambot)
Date: Fri, 25 Jan 2002 15:28:27 +0100
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To:
Message-ID:
I was thinking that too, since the Athlon MP supports SSE
instructions as well... It's worth a try.
Luc.
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On
> Behalf Of Eduardo Melione Abreu
> Sent: Friday, January 25, 2002 2:23 PM
> To: Beowulf at beowulf.org
> Subject: AMD Athlon with Intel Fortran Compiler
>
>
> Hi,
>
> Do anyone have yet installed and runned a Linux system with
> an AMD Athlon processor and Intel Fortran Compiler (ifc)?
>
> Thanks,
> Melione.
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctierney at hpti.com Fri Jan 25 11:41:45 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Fri, 25 Jan 2002 09:41:45 -0700
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To: ; from melione@carcara.lncc.br on Fri, Jan 25, 2002 at 11:22:47AM -0200
References: <20020124144459.A1561@icantbelieveimdoingthis.com>
Message-ID: <20020125094145.B10127@hpti.com>
I ran the Intel Fortran Compiler on a dual 1.2 GHz AMD system.
I saw speed improvements of 10-30% over the Portland
Group compiler on 3 different Fortran 77 codes.
I was using the SSE instructions when compiling.
The Intel compilers are fast; they are just a bit quirky sometimes.
They complain about Fortran 77 and Fortran 90 syntax unless you tell
them not to. The compiler didn't like some of the code and would fail with
internal compiler errors until I reorganized some of it. I never got
my Fortran 90 program to run with optimization. I don't think this applies
to all F90 codes; mine just had some syntax that the compiler couldn't digest.
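For what it's worth, a hypothetical compile line of the sort used in these
tests -- in Intel's compilers of that era the -xK switch requested SSE code
generation, but check ifc -help for the exact spelling in your release:

ifc -O3 -xK myprog.f -o myprog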
Craig
> Hi,
>
> Do anyone have yet installed and runned a Linux system with
> an AMD Athlon processor and Intel Fortran Compiler (ifc)?
>
> Thanks,
> Melione.
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From matz at wsunix.wsu.edu Fri Jan 25 12:26:06 2002
From: matz at wsunix.wsu.edu (Phillip Matz)
Date: Fri, 25 Jan 2002 09:26:06 -0800
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To: <20020125094145.B10127@hpti.com>
Message-ID: <000001c1a5c5$5f0136e0$1200a8c0@chem.wsu.edu>
The 10-30% improvement you observed, was that compared to SSE code
generated by the latest PGI compilers (Rev 3.3 with SSE for AMD) or an
older PGI compiler that doesn't support SSE?
Regards,
Phil Matz
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On
Behalf Of Craig Tierney
Sent: Friday, January 25, 2002 8:42 AM
To: Eduardo Melione Abreu
Cc: Beowulf at beowulf.org
Subject: Re: AMD Athlon with Intel Fortran Compiler
I ran the Intel Fortran Compiler on an dual 1.2 Ghz AMD system. I was
seen speed improvements of 10-30% over the Portland Group compiler on 3
different fortran 77 codes. I was using the SSE instructions when
compiling.
The Intel compilers are fast, they are just a bit quirky sometimes. They
complain about Fortran 77 and Fortran 90 syntax unless you tell it not
too. It didn't like some of the code and would complain with internal
compiler errors until I reorganized some code. I never got my Fortran
90 program to run with optmization. I don't think it is all F90 codes
just mine had some syntax that it couldn't digest.
Craig
> Hi,
>
> Do anyone have yet installed and runned a Linux system with an AMD
> Athlon processor and Intel Fortran Compiler (ifc)?
>
> Thanks,
> Melione.
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctierney at hpti.com Fri Jan 25 12:33:53 2002
From: ctierney at hpti.com (Craig Tierney)
Date: Fri, 25 Jan 2002 10:33:53 -0700
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To: <000001c1a5c5$5f0136e0$1200a8c0@chem.wsu.edu>; from matz@wsunix.wsu.edu on Fri, Jan 25, 2002 at 09:26:06AM -0800
References: <20020125094145.B10127@hpti.com> <000001c1a5c5$5f0136e0$1200a8c0@chem.wsu.edu>
Message-ID: <20020125103353.A10246@hpti.com>
Sorry for skipping that info. I did the tests a few months
ago, and the compiler version was 3.2-4. I did compile
with SSE; it said there was SSE support (though not for AMD explicitly).
Was I wrong? Is 3.3 actually out? When did it come
out?
Sorry if I misinformed anyone, as version 3.3 may provide more
performance than the one I tested.
Craig
On Fri, Jan 25, 2002 at 09:26:06AM -0800, Phillip Matz wrote:
> The 10-30% improvement you observed, was that compared to SSE code
> generated by the latest PGI compilers (Rev 3.3 with SSE for AMD) or an
> older PGI compiler that doesn't support SSE?
>
> Regards,
>
> Phil Matz
>
>
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On
> Behalf Of Craig Tierney
> Sent: Friday, January 25, 2002 8:42 AM
> To: Eduardo Melione Abreu
> Cc: Beowulf at beowulf.org
> Subject: Re: AMD Athlon with Intel Fortran Compiler
>
>
> I ran the Intel Fortran Compiler on an dual 1.2 Ghz AMD system. I was
> seen speed improvements of 10-30% over the Portland Group compiler on 3
> different fortran 77 codes. I was using the SSE instructions when
> compiling.
>
> The Intel compilers are fast, they are just a bit quirky sometimes. They
> complain about Fortran 77 and Fortran 90 syntax unless you tell it not
> too. It didn't like some of the code and would complain with internal
> compiler errors until I reorganized some code. I never got my Fortran
> 90 program to run with optmization. I don't think it is all F90 codes
> just mine had some syntax that it couldn't digest.
>
> Craig
>
>
> > Hi,
> >
> > Do anyone have yet installed and runned a Linux system with an AMD
> > Athlon processor and Intel Fortran Compiler (ifc)?
> >
> > Thanks,
> > Melione.
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bill at hilfworks.com Fri Jan 25 12:37:31 2002
From: bill at hilfworks.com (Bill Hilf)
Date: Fri, 25 Jan 2002 09:37:31 -0800
Subject: AMD Athlon with Intel
References: <20020124144459.A1561@icantbelieveimdoingthis.com> <20020125094145.B10127@hpti.com>
Message-ID: <3C5197DB.FA4D581D@hilfworks.com>
Craig Tierney wrote:
>
> I ran the Intel Fortran Compiler on an dual 1.2 Ghz AMD system.
> I was seen speed improvements of 10-30% over the Portland
> Group compiler on 3 different fortran 77 codes.
> I was using the SSE instructions when compiling.
>
> The Intel compilers are fast, they are just a bit quirky sometimes.
> They complain about Fortran 77 and Fortran 90 syntax unless you tell
> it not too. It didn't like some of the code and would complain with
> internal compiler errors until I reorganized some code. I never got
> my Fortran 90 program to run with optmization. I don't think it
> is all F90 codes just mine had some syntax that it couldn't digest.
Slightly related -- does anyone have URLs for recent Athlon vs.
Intel benchmarks? Particularly for comp chem applications?
Thanks
Bill
--
-Bill
PGP Fingerprint: 4CE0 D72C C7A2 89B2 6B23 03DC B5E9 77CB E6F3 0D2A
http://pgpkeys.mit.edu:11371/pks/lookup?op=get&exact=on&search=0xE6F30D2A
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From d.l.farley at larc.nasa.gov Fri Jan 25 12:34:03 2002
From: d.l.farley at larc.nasa.gov (Doug Farley)
Date: Fri, 25 Jan 2002 12:34:03 -0500
Subject: AMD Athlon with Intel Fortran Compiler
References:
Message-ID: <3C51970B.7070502@larc.nasa.gov>
Eduardo Melione Abreu wrote:
>Hi,
>
>Do anyone have yet installed and runned a Linux system with
>an AMD Athlon processor and Intel Fortran Compiler (ifc)?
>
>Thanks,
>Melione.
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>
Hi,
Yeah, I have it running on some Athlon MPs. However, the results are
sometimes iffy in my experience: some things will compile and work fine,
others won't. But if you leave off some of the highest optimization
levels, most things have worked well so far.
--
Douglas Farley
Data Analysis and Imaging Branch
Systems Engineering Competency
NASA Langley Research Center
< D.L.FARLEY at LaRC.NASA.GOV >
< Phone +1 757 864-8141 >
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Daniel.Kidger at quadrics.com Fri Jan 25 12:35:52 2002
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Fri, 25 Jan 2002 17:35:52 -0000
Subject: AMD Athlon with Intel Fortran Compiler
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2C00@stegosaurus.bristol.quadrics.com>
>The Intel compilers are fast, they are just a bit quirky sometimes.
>They complain about Fortran 77 and Fortran 90 syntax unless you tell
>it not too. It didn't like some of the code and would complain with
>internal compiler errors until I reorganized some code. I never got
>my Fortran 90 program to run with optmization. I don't think it
>is all F90 codes just mine had some syntax that it couldn't digest.
Our experience is that they are robust and produce very fast code.
When we get 'internal compiler errors', deleting work.pc* and
recompiling mostly cures the problem.
One other quirk is that using ifc to link produces large static binaries,
while using icc gives dynamic linking. The static linking can easily be cured
(find ifc.cfg), but it is odd that the defaults are different.
Yours,
Daniel.
--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
----------------------- www.quadrics.com --------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From timm at fnal.gov Fri Jan 25 12:54:37 2002
From: timm at fnal.gov (Steven Timm)
Date: Fri, 25 Jan 2002 11:54:37 -0600 (CST)
Subject: Dual Athlon MP 1U units
Message-ID:
I am just wondering how many people have managed to get a
cluster of dual Athlon-MP nodes up and running. If so,
which motherboards and chipsets are you using, and has anyone
safely done this in a 1U form factor?
Thanks
Steve Timm
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From matz at wsunix.wsu.edu Fri Jan 25 13:09:25 2002
From: matz at wsunix.wsu.edu (Phillip Matz)
Date: Fri, 25 Jan 2002 10:09:25 -0800
Subject: AMD Athlon with Intel Fortran Compiler
In-Reply-To: <20020125103353.A10246@hpti.com>
Message-ID: <000001c1a5cb$6c6b25b0$1200a8c0@chem.wsu.edu>
No prob, I was just curious.
Yes, 3.3 is out (I got my copy in mid-December) and it supports a switch for
Athlon XP SSE. I am not sure why one needs to specify which architecture
one will be running the SSE-compatible binaries on (it may very well
be only an aesthetic feature that still generates the same SSE code; I
don't know). The association with Athlon XP for SSE compiling may have
more to do with the specific implementation of prefetch (via -tp athlonxp
-Mvect=prefetch), which the compiler needs to know about in order to optimize
the SSE code. I can honestly say I have no real idea why, just that Intel SSE
and Athlon XP SSE are treated differently in the compiler options for PGI 3.3.
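For concreteness, a hypothetical compile line using the switches mentioned
above (consult pgf90 -help for the exact syntax in your release):

pgf90 -fast -tp athlonxp -Mvect=prefetch myprog.f90 -o myprog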
Regards,
Phil
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On
Behalf Of Craig Tierney
Sent: Friday, January 25, 2002 9:34 AM
To: Phillip Matz
Cc: 'Eduardo Melione Abreu'; Beowulf at beowulf.org
Subject: Re: AMD Athlon with Intel Fortran Compiler
Sorry for skipping that info. I did the tests a few months
ago and the compiler version was 3.2-4. I did compile
with SSE. It said there was SSE support (not for AMD explicitly). Was I
wrong? Is 3.3 actually out? When did it come out?
Sorry to misinform people as the version 3.3 may provide more
performance over the one I tested.
Craig
On Fri, Jan 25, 2002 at 09:26:06AM -0800, Phillip Matz wrote:
> The 10-30% improvement you observed, was that compared to SSE code
> generated by the latest PGI compilers (Rev 3.3 with SSE for AMD) or an
> older PGI compiler that doesn't support SSE?
>
> Regards,
>
> Phil Matz
>
>
> -----Original Message-----
> From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On
> Behalf Of Craig Tierney
> Sent: Friday, January 25, 2002 8:42 AM
> To: Eduardo Melione Abreu
> Cc: Beowulf at beowulf.org
> Subject: Re: AMD Athlon with Intel Fortran Compiler
>
>
> I ran the Intel Fortran Compiler on an dual 1.2 Ghz AMD system. I was
> seen speed improvements of 10-30% over the Portland Group compiler on
> 3 different fortran 77 codes. I was using the SSE instructions when
> compiling.
>
> The Intel compilers are fast, they are just a bit quirky sometimes.
> They complain about Fortran 77 and Fortran 90 syntax unless you tell
> it not too. It didn't like some of the code and would complain with
> internal compiler errors until I reorganized some code. I never got
> my Fortran 90 program to run with optmization. I don't think it is
> all F90 codes just mine had some syntax that it couldn't digest.
>
> Craig
>
>
> > Hi,
> >
> > Do anyone have yet installed and runned a Linux system with an AMD
> > Athlon processor and Intel Fortran Compiler (ifc)?
> >
> > Thanks,
> > Melione.
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> Craig Tierney (ctierney at hpti.com)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Craig Tierney (ctierney at hpti.com)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mirabai at linuxlabs.com Fri Jan 25 12:05:41 2002
From: mirabai at linuxlabs.com (mirabai at linuxlabs.com)
Date: Fri, 25 Jan 2002 12:05:41 -0500 (EST)
Subject: Dual Athlon MP 1U units
In-Reply-To:
Message-ID:
We have put together a 51-node dual Athlon 1600MP cluster that has been up and running
for some time now. We are using Tyan Tiger MP motherboards for the slave nodes
and the Thunder K7 for the master, secondary master, and file server, running
Scyld.
We also have 4 1U dual Athlons running Linux. We have encountered numerous power
supply issues as well as overheating, but are due to have a solution soon enough.
On Fri, 25 Jan 2002, Steven Timm wrote:
>
> I am just wondering how many people have managed to get a
> cluster of dual Athlon-MP nodes up and running. If so,
> which motherboards and chipsets are you using, and has anyone
> safely done this in a 1U form factor?
>
> Thanks
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
javabaw, inc., minna mirabai, asst. engineer--- linux labs
..... .... 230 peachtree #2705
linux labs atlanta.ga.us 30303
"mission critical linux" http://www.linuxlabs.com
24hr dispatch: 800.788.9319
---------------------------------------------------------------------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From cblack at eragen.com Fri Jan 25 14:18:06 2002
From: cblack at eragen.com (Chris Black)
Date: Fri, 25 Jan 2002 14:18:06 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To:
References:
Message-ID: <20020125141806.B9413@getafix.EraGen.com>
We run dual 1.2GHz AMD Athlon MP nodes with the Tyan S2462 Thunder
K7 motherboards under 2.4 kernels. At the time we got our nodes, this
was the only dual Athlon board that had been out for any length of time, so we ended
up getting it even though it has onboard SCSI that we don't use. There
is a model out now without the SCSI but with the same chipset. The only issue
is that we needed good power supplies with these boards (460 W or 435 W,
not sure). Appro (or some company with a similar name) makes 1U dual
Athlons, but we decided against them because we were worried about
cooling issues. We actually rackmount our 2U nodes with some space between
them for extra airflow above and below the cases.
Chris Black
On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
>
> I am just wondering how many people have managed to get a
> cluster of dual Athlon-MP nodes up and running. If so,
> which motherboards and chipsets are you using, and has anyone
> safely done this in a 1U form factor?
>
> Thanks
>
> Steve Timm
From Eugene.Leitl at lrz.uni-muenchen.de Fri Jan 25 14:31:43 2002
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Fri, 25 Jan 2002 20:31:43 +0100 (MET)
Subject: Dual Athlon MP 1U units
In-Reply-To:
Message-ID:
On Fri, 25 Jan 2002, Steven Timm wrote:
> I am just wondering how many people have managed to get a
> cluster of dual Athlon-MP nodes up and running. If so,
> which motherboards and chipsets are you using, and has anyone
> safely done this in a 1U form factor?
I'm hearing (iX tests, iirc) that the heat dissipation of 1U dual Athlons makes
high-density rackmounting difficult, both within the cluster (where it could
be solved by sufficient airflow) and for the air conditioning.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From zarquon at zarq.dhs.org Fri Jan 25 17:02:00 2002
From: zarquon at zarq.dhs.org (R C)
Date: Fri, 25 Jan 2002 17:02:00 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To:
References:
Message-ID: <20020125220200.GA2553@zarq.dhs.org>
On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
>
> I am just wondering how many people have managed to get a
> cluster of dual Athlon-MP nodes up and running. If so,
> which motherboards and chipsets are you using, and has anyone
> safely done this in a 1U form factor?
We're in the process of testing our 1U dual Athlon nodes from Racksaver.
Configuration:
16 nodes
Dual Athlon 1.53 GHz (1800+)
512 MB PC2100 Reg/ECC (Crucial / Corsair) (We ordered Crucial ram modules
before the price hike)
20 GB IDE HDs (IBM)
S2462NG (non-scsi version)
The units themselves are solid, and hefty (roughly 30 lbs). They do draw
quite a bit of power (we are waiting for a 2nd 30 amp drop). No
problems with them so far (24 hour burnin, room temperature approx 75-80
deg F, above recommended temperature). They are noisy, as one would
expect from 1U units with these processors. Don't put them in an office.
CPU temperatures after 24 hour runs were in the 49-55 C range.
We haven't gotten our software on all the units yet, but they seem
stable. Once our school actually cut the PO, the order went through
quickly.
Robert Cicconetti
PS. Has anyone gotten Wake-on-Lan working on this motherboard?
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From alvin at Maggie.Linux-Consulting.com Fri Jan 25 21:09:12 2002
From: alvin at Maggie.Linux-Consulting.com (alvin at Maggie.Linux-Consulting.com)
Date: Fri, 25 Jan 2002 18:09:12 -0800 (PST)
Subject: Dual Athlon MP 1U units - rack save
In-Reply-To: <20020125220200.GA2553@zarq.dhs.org>
Message-ID:
hi all
i like these discussions/feedbacks on 1U chassis issues.. :-)
i've heard of problems with the appro dual-amd chassis too,
and yes... it's not trivial to solve
- notice they have tons of airflow too, with
large 4" blowers..
i'd be curious to see an updated status of the racksaver
1U tests...
airflow outside the case might not help... since the CPU
might not have access to the outside airflow when it's
butting up against the hot power supply
- it's better to expose the cpu to the outside
by leaving it close to the chassis edge rather
than up against the power supply
- when the cpu is breathing air from the outside,
the ambient temperature does help in cooling
the cpu core too
also, with one side fan on the cpu for cooling: if that fan dies...
that cpu will get too hot, especially when the cpu is covered
with one of those plastic air-flow housings
we've just gotten our prototype P4/AMD 1U-capable chassis
and are ready for some heat/airflow/power supply testing
- fun stuff ??
one of my simple 1U tests is an infinite kernel compile
while simultaneously creating 2GB-sized files and also exercising
the disks with "tree /" to see how hot it gets ...
if it passes the onboard cpu temp tests... then we're happy
thanx
alvin
http://www.Linux-1U.net ... P4 and AMD - based 1Us...
On Fri, 25 Jan 2002, R C wrote:
> On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
> >
> > I am just wondering how many people have managed to get a
> > cluster of dual Athlon-MP nodes up and running. If so,
> > which motherboards and chipsets are you using, and has anyone
> > safely done this in a 1U form factor?
>
> We're in the process of testing our 1U dual athalon nodes from Racksaver.
>
> Configuration:
> 16 nodes
> Dual Athalon 1.53 (1800+)
> 512 MB PC2100 Reg/ECC (Crucial / Corsair) (We ordered Crucial ram modules
> before the price hike)
> 20 GB IDE HDs (IBM)
> S2462NG (non-scsi version)
>
> The units themselves are solid, and hefty (roughly 30 lbs). They do draw
> quite a bit of power (we are waiting for a 2nd 30 amp drop). No
> problems with them so far (24 hour burnin, room temperature approx 75-80
> deg F, above recommended temperature). They are noisy, as one would
> expect from 1U units with these processors. Don't put them in an office.
> CPU temperatures after 24 hour runs were in the 49-55 C range.
>
> We haven't gotten our software on all the units yet, but they seem
> stable. Once our school actually cut the PO, the order went through
> quickly.
>
> Robert Cicconetti
>
> PS. Has anyone gotten Wake-on-Lan working on this motherboard?
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Fri Jan 25 22:17:48 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 25 Jan 2002 22:17:48 -0500 (EST)
Subject: Dual Athlon MP 1U units
In-Reply-To:
Message-ID:
On Fri, 25 Jan 2002, Steven Timm wrote:
>
> I am just wondering how many people have managed to get a
> cluster of dual Athlon-MP nodes up and running. If so,
> which motherboards and chipsets are you using, and has anyone
> safely done this in a 1U form factor?
We're waiting on a cluster room renovation to have our full cluster
built, but we've brought up individual 2U dual nodes based on the Tyan
Tiger. We've encountered a few minor problems -- the network card
inexplicably but consistently wouldn't work in the first slot of the
riser so we had to swap it with a video card (probably unnecessary in
production but useful for assembly and debugging). We had to reflash
the 3C905s to get them to PXEboot correctly. A few other flakes.
However, once you get everything hammered out, one can PXEboot straight
into a kickstart install and really zip (about 5 minutes for a full
install of a 7.2 "cluster node" kickstart configuration over 100BT), and
they seem to work well enough in the limited tests we've run with only a
few nodes up.
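For anyone who hasn't tried it, the kickstart side is just a short config
file; a minimal sketch (server address, partition sizes, and package group
are hypothetical, not our actual configuration):

install
nfs --server 10.0.0.1 --dir /export/redhat72
lang en_US
keyboard us
network --bootproto dhcp
rootpw changeme
clearpart --all
part / --size 4096 --grow
part swap --size 512
%packages
@ Network Support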
Having messed inside these 2U cases, I personally would not really
recommend 1U duals, even though there are definitely vendors who will
sell them. 2U gives you three riser slots, which is useful. 2U gives
you room for a whole bunch of cooling fans (our cases have several and
we might install still more if we have thermal problems). 2U isn't
exactly >>roomy<< for these motherboards -- 1U would be downright
crowded, and I'd be very worried about heat when all nodes are really
cranking in a stack.
If you like, I'll give an update when we have them racked up. The room
is nearly finished but still needs the racks to be bolted to the floor,
security locks, an X10 or two for remote video monitoring, and we're
still trying to dicker a thermal kill for the master power panels
(anybody have recommendations or comments?). Vendor recommendations for
telco-type patch panels that permit whole bundles of cat5 to be shipped
around at once are also welcome. With luck we might be done in two
weeks.
rgb
>
> Thanks
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
> Fermilab Computing Division/Operating Systems Support
> Scientific Computing Support Group--Computing Farms Operations
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From drikis at mail.dyu.edu.tw Fri Jan 25 22:44:51 2002
From: drikis at mail.dyu.edu.tw (Drikis Ivars)
Date: Sat, 26 Jan 2002 11:44:51 +0800 (CST)
Subject: Dual Athlon MP 1U units
In-Reply-To:
Message-ID:
What about the application of water cooling in 1U?
http://www6.tomshardware.com/cpu/02q1/020102/index.html
Despite http://www6.tomshardware.com/cpu/02q1/020117/index.html,
I can't imagine a real PC system running on water cooling...
---------------------------------------------------------------
Dr. Phys. Ivars Drikis Department of Mechanical Engineering
Da-Yeh University, Changhua
tel: 886-4-8528469 Taiwan 515
On Fri, 25 Jan 2002, Eugene Leitl wrote:
> On Fri, 25 Jan 2002, Steven Timm wrote:
>
> > I am just wondering how many people have managed to get a
> > cluster of dual Athlon-MP nodes up and running. If so,
> > which motherboards and chipsets are you using, and has anyone
> > safely done this in a 1U form factor?
>
> I'm hearing (iX tests, iirc) the heat dissipation of 1U dual Athlons makes
> high density rackmounting difficult. Both within the cluster (which could
> be solved by sufficient airflow) and for air conditioning.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Sat Jan 26 03:30:04 2002
From: wsb at paralleldata.com (W Bauske)
Date: Sat, 26 Jan 2002 02:30:04 -0600
Subject: Dual Athlon MP 1U units
References: <20020125141806.B9413@getafix.EraGen.com> <20020126020610.X59723@velocet.ca>
Message-ID: <3C52690C.BB73A4D8@paralleldata.com>
Velocet wrote:
>
> Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how much
> heat comes off typical power supplies to run these systems?)
>
My TigerMP XP1600 duals take about 1.7 amps at 125 V.
I forget the formula to convert to BTUs; I vaguely remember a factor
of around 3.42, but I'm not sure if that was for watts or volt-amps.
Assuming a VA is approximately a watt, 212.5 * 3.42 = 727 BTU/hr per
system.
At least with that you can calculate your AC load for a rack. Say 40
1Us per rack: 29,080 BTU/hr. A ton of AC is 12,000 BTU/hr, so that's
about 2.5 tons of AC per rack. Of course, you also have 40 x 1.7 amps
going into the rack, for a power load of 68 amps at 125 V.
Those who know the real numbers, please correct me. A VA is really
around .7 - .8 watts, so these calculations are high by maybe 20%.
Figure the extra allows you to plug in the switches/peripherals/servers
in addition to the nodes.
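For anyone who wants to redo this arithmetic, here is a minimal Python
sketch of the same estimate, using the standard 3.412 BTU/hr-per-watt
conversion and treating the measured volt-amps as watts (i.e. assuming
a power factor of 1):

    # Rack heat-load estimate from measured wall current.
    # Worst case: treat VA as watts (power factor = 1).
    amps_per_node = 1.7        # measured at the plug
    volts = 125.0
    nodes_per_rack = 40

    va_per_node = amps_per_node * volts            # 212.5 VA
    btu_per_node = va_per_node * 3.412             # ~725 BTU/hr per system
    btu_per_rack = btu_per_node * nodes_per_rack   # ~29,000 BTU/hr per rack
    tons_of_ac = btu_per_rack / 12000.0            # ~2.4 tons of AC per rack

    print("%.0f BTU/hr per node, %.0f BTU/hr per rack, %.1f tons of AC"
          % (btu_per_node, btu_per_rack, tons_of_ac))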
Wes
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From math at velocet.ca Sat Jan 26 02:06:10 2002
From: math at velocet.ca (Velocet)
Date: Sat, 26 Jan 2002 02:06:10 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To: <20020125141806.B9413@getafix.EraGen.com>; from cblack@EraGen.com on Fri, Jan 25, 2002 at 02:18:06PM -0500
References: <20020125141806.B9413@getafix.EraGen.com>
Message-ID: <20020126020610.X59723@velocet.ca>
On Fri, Jan 25, 2002 at 02:18:06PM -0500, Chris Black's all...
> We run dual 1.2GHz amd athlon mp cpu nodes with the Tyan S2462 Thunder
> K7 motherboards under 2.4 kernels. At the time we got our nodes, this
> was the only dual athlon board out for any length of time, so we ended
> up getting it even though it has onboard scsi that we don't use. There
> is a model out now without the scsi but the same chipset. Only issues
> is that we needed good power supplies with these boards (460 or 435,
> not sure). aapro (or some company with a similar name) makes 1U dual
> athlons, but we decided against it because we were worried about
> cooling issues. We actually rackmount our 2U nodes with some space between
> them for some extra airflow above and below the cases.
What's the power dissipation of running dual 1.2 GHz MPs? How about
1.33 GHz regular Athlons in non-SMP configs, as a comparison? (As well,
how much heat comes off the typical power supplies that run these
systems?)
How many CFM of airflow are needed for typical configs of each to
ensure cooling?
/kc
>
> Chris Black
>
> On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
> >
> > I am just wondering how many people have managed to get a
> > cluster of dual Athlon-MP nodes up and running. If so,
> > which motherboards and chipsets are you using, and has anyone
> > safely done this in a 1U form factor?
> >
> > Thanks
> >
> > Steve Timm
>
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From pdiaz88 at terra.es Sat Jan 26 11:17:06 2002
From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=)
Date: Sat, 26 Jan 2002 16:17:06 +0000
Subject: Open Magazine: Intel C/C++ Compiler Beats GCC (and MS VC++)
Message-ID: <02012616170600.00827@duero>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
http://www.open-mag.com/754088105111.htm
(pretty slow, /. attack in progress...)
Yet Another Benchmark, though
Regards
Pedro
- --
/*
* Pedro Diaz Jimenez: pdiaz88 at terra.es, pdiaz at acm.asoc.fi.upm.es
* http://acm.asoc.fi.upm.es/~pdiaz
*
* GPG KeyID: E118C651
* Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C651
*
*/
- --
Physics isn't a religion. If it were, we'd have a much easier time raising
money
-- Leon Lederman
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org
iD8DBQE8UtaInu53feEYxlERAmkeAKDSIBNSLPUKssZgK4hcOR14pp8KqwCbBiZu
HfJvB26RnoHRuOPCljeBFLs=
=BW2Y
-----END PGP SIGNATURE-----
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bransom at ucdavis.edu Sat Jan 26 12:02:37 2002
From: bransom at ucdavis.edu (Ben Ransom)
Date: Sat, 26 Jan 2002 09:02:37 -0800
Subject: Dual Athlon MP 1U units
In-Reply-To: <3C52690C.BB73A4D8@paralleldata.com>
References:
<20020125141806.B9413@getafix.EraGen.com>
<20020126020610.X59723@velocet.ca>
Message-ID: <5.0.2.1.0.20020126084920.02650240@maemail.ucdavis.edu>
Are PFC (power factor correction) power supplies standard, or something
special wrt providing only required power and therefore less heat? A
vendor told me that a dual 1900 Athlon on Tyan 2466 mboard can be run in 1U
if done with a 300w PFC power supply. Do they know some great secret, or
are they stretching physics?
PS: AMD shows one vendor's case approved with the Athlon MP running at all
available clock speeds, i.e. that would include 1900+
-Ben Ransom
At 02:30 AM 1/26/2002 -0600, you wrote:
>Velocet wrote:
> >
> > Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how
> much
> > heat comes off typical power supplies to run these systems?)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joelja at darkwing.uoregon.edu Sat Jan 26 12:30:20 2002
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Sat, 26 Jan 2002 09:30:20 -0800 (PST)
Subject: Dual Athlon MP 1U units
In-Reply-To: <5.0.2.1.0.20020126084920.02650240@maemail.ucdavis.edu>
Message-ID:
On Sat, 26 Jan 2002, Ben Ransom wrote:
> Are PFC (power factor correction) power supplies standard,
One would assume that by power factor correction they mean reducing
the difference between working power and apparent power... In general,
switching power supplies are fairly efficient anyway, so one would have
to evaluate their claims based on the design of the power supply. The
crucial thing to bear in mind with SMP Athlons is whether the power
supply can provide the current rise needed when the CPUs switch from
idle to going full-bore.
> or something
> special wrt providing only required power and therefore less heat?
Switching power supplies only provide the required power anyway. If you
plug a dual Athlon mainboard based system with a 560 watt power supply
into AC power, it should draw something like 1.7 amps... if you plug in
a 300 watt power supply instead, it should still draw 1.7 amps.
> A
> vendor told me that a dual 1900 Athlon on Tyan 2466 mboard can be run in 1U
> if done with a 300w PFC power supply. Do they know some great secret, or
> are they stretching physics?
Probably their power supply has an extra set of caps on the 3.3 and 5
volt rails to cope with the rise in current demand.
> PS: AMD shows one vendor's case approved with the Athlon MP running at all
> available clock speeds, i.e. that would include 1900+
> -Ben Ransom
>
>
> At 02:30 AM 1/26/2002 -0600, you wrote:
> >Velocet wrote:
> > >
> > > Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> > > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how
> > much
> > > heat comes off typical power supplies to run these systems?)
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
--
--------------------------------------------------------------------------
Joel Jaeggli Academic User Services joelja at darkwing.uoregon.edu
-- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E --
The accumulation of all powers, legislative, executive, and judiciary, in
the same hands, whether of one, a few, or many, and whether hereditary,
selfappointed, or elective, may justly be pronounced the very definition of
tyranny. - James Madison, Federalist Papers 47 - Feb 1, 1788
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sandy at storm.ca Sat Jan 26 12:37:22 2002
From: sandy at storm.ca (Sandy Harris)
Date: Sat, 26 Jan 2002 12:37:22 -0500
Subject: Dual Athlon MP 1U units
References:
<20020125141806.B9413@getafix.EraGen.com>
<20020126020610.X59723@velocet.ca> <5.0.2.1.0.20020126084920.02650240@maemail.ucdavis.edu>
Message-ID: <3C52E952.9B5A8DCD@storm.ca>
For 1U cases, I'd be inclined to consider the Tualatin-core .13 micron
Pentium IIIs with 512 KB cache.
Checking datasheets on AMD's site, I find they quote both "maximum" and
"typical" power for each CPU. The range from lowest typical to highest
max, in watts, is:
  Athlon MP    41.3    54.7
  Athlon XP    53.8    70
Intel's numbers aren't directly comparable, since they give only one
"design power" number. Their ranges, in watts, are:
  Xeon         55      77.5
  P4           48.9    71.8
In both cases, the new 2200 MHz parts on .13 micron draw significantly
lower power than some of the older, slower parts.
Meanwhile, the .13 micron P IIIs with 512 KB cache are at
  Tualatin     27.9    31.2
for the 1133 to 1400 MHz parts, and reviews say the performance of the
1266 is comparable to an 1800 or so P4.
Seems to me these are the obvious CPU choice for a 1U chassis unless
other specific factors, like Athlon floating point performance,
weigh very heavily in your application.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bari at onelabs.com Sat Jan 26 12:52:35 2002
From: bari at onelabs.com (Bari Ari)
Date: Sat, 26 Jan 2002 11:52:35 -0600
Subject: Dual Athlon MP 1U units
References: <20020125141806.B9413@getafix.EraGen.com> <20020126020610.X59723@velocet.ca> <3C52690C.BB73A4D8@paralleldata.com>
Message-ID: <3C52ECE3.7030607@onelabs.com>
W Bauske wrote:
> Velocet wrote:
>
>>Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
>>1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how much
>>heat comes off typical power supplies to run these systems?)
>>
>>
>
> My TigerMP XP1600 duals take about 1.7amps at 125v.
>
> Forgot the formula to convert to btu's. Vaguely remember a factor
> of around 3.42. Not sure if that was for Watt's or VoltAmps. Assuming
> a VA is approximately a Watt, 212.5 * 3.42 = 727 btu per system.
>
> At least with that you can calculate your AC load for a rack. Say 40
> 1U's per rack, 29080 btu's. A ton of AC is 12000 btu's. So, 2.5 ton's
> of AC per rack. Course, you have 40x1.7 amps going into the rack for
> a power load of 68 Amps at 125v.
>
> Those that know the real numbers, please correct. A VA is really around
> .7 - .8 watts, so these calculations are high by maybe 20%. Figure
> the extra allows you to plug in the switches/peripherals/servers in addition
> to the nodes.
Power is measured in volt-amps (VA) and in watts. Both numbers are
important in preparing wiring, power conditioning, and cooling.
A system's VA rating is a function of the voltage and amperage of a
system. A system's watt rating is that system's VA rating multiplied by
its "Power Factor". You can convert among amps, volts, VA, power factor,
and watts using the following formulas:
VA = amps x volts
VA = watts / power factor
watts = VA x power factor
amps = watts / (volts x power factor)
"Power factor" is a number between zero and one representing the portion
of the power drawn by a system that actually delivers energy to the
system. A system with a power factor of one (sometimes called "unity"
power factor) is making full use of the energy it draws. A system with a
power factor of 0.75 is effectively using only three-quarters of the
energy it draws. Typical PC power supplies are not power factor
corrected and they can range from 0.7 - 0.9. Power factor corrected
power supplies typically are rated at 0.99.
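These conversions in a minimal Python sketch (the 0.75 power factor
below is just an illustrative value from the uncorrected range):

    # Conversions among volts, amps, VA, watts, and power factor.
    volts, amps = 125.0, 1.7
    power_factor = 0.75        # illustrative; uncorrected supplies: 0.7-0.9

    va = volts * amps                            # apparent power: 212.5 VA
    watts = va * power_factor                    # real power: ~159 W
    amps_check = watts / (volts * power_factor)  # recovers the 1.7 A draw

    print("%.1f VA, %.1f W real, %.2f A" % (va, watts, amps_check))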
All the power consumed by a computer system must end up somewhere. For
ordinary air-cooled systems, the place it ends up is in the surrounding
air, in the form of heat. Every watt drawn by a system is eventually
dissipated as heat. This tends to raise the temperature of the air in
the room that houses the system. Some method is therefore needed to keep
the temperature within the required range. The typical method is to
install additional air conditioning capacity.
Air conditioner capacity is generally measured in Btu per hour
(Btu/hr), in tons, or in kilowatts (kW).
A Btu, or British thermal unit, is the amount of energy needed to change
the temperature of one pound of water by one degree Fahrenheit.
One ton of air conditioning removes 12,000 Btu of heat energy per hour.
It is important to calculate the total thermal load of the systems you
will be installing and determine if the existing air conditioning system
can handle the additional load. If not, you must provide additional
cooling capacity.
The thermal load can be determined as follows:
- Add up the wattages of all the items in the room.
- Calculate Btu/hr by multiplying the total wattage by 3.4129.
- Calculate tons of air conditioning load by multiplying the total
  wattage by 0.000285.
(1 kBtu/hr = 1000 Btu/hr; 12,000 Btu/hr = 1 ton of air conditioning
load.)
The calculations described here give results that represent the
equipment's maximum thermal output.
Even if a system approaches its maximum rated wattage or "worst-case"
thermal output occasionally, it is highly unlikely it will do so for
very long. Sizing the air conditioning system for "worst-case" thermal
output, however, helps to minimize system problems later.
Besides the computer equipment being added to a site, when calculating
required air conditioner capacity, be sure to take into account the heat
load from computer equipment already installed at the site, non-computer
equipment already installed or to be installed, and other factors, such
as solar gain, outside ambient air temperatures, and even the number of
people.
One thing I don't get into here is the long-term reliability of the
system based on its temperature. You can also factor in what maximum
temperature you wish to keep the CPU die below to determine the
system's mean time between failures (MTBF). Keeping an Athlon die under
40 deg C will greatly increase its MTBF vs. its specified maximum of
90 deg C.
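One common way to put a number on that temperature/MTBF relationship
(not spelled out above) is an Arrhenius acceleration factor. In the
Python sketch below, the 0.7 eV activation energy is an assumed,
typical value for silicon wear-out mechanisms, not a measured Athlon
figure:

    # Arrhenius acceleration factor between two die temperatures.
    import math

    k = 8.617e-5             # Boltzmann constant, eV/K
    ea = 0.7                 # assumed activation energy, eV
    t_cool = 40.0 + 273.15   # die held at 40 C, in kelvin
    t_hot = 90.0 + 273.15    # die at its 90 C limit, in kelvin

    accel = math.exp((ea / k) * (1.0 / t_cool - 1.0 / t_hot))
    print("MTBF at 40 C is roughly %.0fx the MTBF at 90 C" % accel)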
Bari
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From math at velocet.ca Sat Jan 26 15:35:39 2002
From: math at velocet.ca (Velocet)
Date: Sat, 26 Jan 2002 15:35:39 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To: <3C52690C.BB73A4D8@paralleldata.com>; from wsb@paralleldata.com on Sat, Jan 26, 2002 at 02:30:04AM -0600
References: <20020126020610.X59723@velocet.ca> <3C52690C.BB73A4D8@paralleldata.com>
Message-ID: <20020126153539.J59723@velocet.ca>
On Sat, Jan 26, 2002 at 02:30:04AM -0600, W Bauske's all...
> Velocet wrote:
> >
> > Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how much
> > heat comes off typical power supplies to run these systems?)
> >
>
> My TigerMP XP1600 duals take about 1.7amps at 125v.
>
> Forgot the formula to convert to btu's. Vaguely remember a factor
> of around 3.42. Not sure if that was for Watt's or VoltAmps. Assuming
> a VA is approximately a Watt, 212.5 * 3.42 = 727 btu per system.
>
> At least with that you can calculate your AC load for a rack. Say 40
> 1U's per rack, 29080 btu's. A ton of AC is 12000 btu's. So, 2.5 ton's
> of AC per rack. Course, you have 40x1.7 amps going into the rack for
> a power load of 68 Amps at 125v.
>
> Those that know the real numbers, please correct. A VA is really around
> .7 - .8 watts, so these calculations are high by maybe 20%. Figure
> the extra allows you to plug in the switches/peripherals/servers in addition
> to the nodes.
not to mention the power supplies themselves (or was that part of the
measurement of your 1.7A?)
2.5 tons of A/C being required sounds right - that keeps the volume
of air involved at a neutral temperature (say 68 F) - but how many CFM
of air are required to move the heat off the processors (well, the
heatsinks) fast enough to keep them comfortable? (I don't know what
comfortable is - 50-55 C?) Is there a rule of thumb calc for that?
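For the bulk-airflow half of this question there is a standard
sea-level HVAC rule of thumb, BTU/hr = 1.08 x CFM x deltaT(F). A
minimal Python sketch, assuming a 20 F allowed inlet-to-outlet air
temperature rise (heatsink-to-air transfer is a separate question):

    # Airflow needed to carry a given heat load out of a chassis,
    # using the sea-level rule of thumb BTU/hr = 1.08 * CFM * deltaT_F.
    watts = 212.0        # per-node draw, from the measurement quoted above
    delta_t_f = 20.0     # assumed allowed air temperature rise, deg F

    cfm = watts * 3.412 / (1.08 * delta_t_f)
    print("~%.0f CFM per node to hold a %.0f F rise" % (cfm, delta_t_f))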
/kc
>
> Wes
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Peter.Koenig at phys.uni-paderborn.de Wed Jan 16 05:36:50 2002
From: Peter.Koenig at phys.uni-paderborn.de (Peter H. Koenig)
Date: Wed, 16 Jan 2002 11:36:50 +0100
Subject: charmm scalability on 2.4 kernels
References: <3C3DFEAA.4A1AAC21@phys.upb.de>
Message-ID: <3C4557C2.CE938A8E@phys.upb.de>
Hello,
Bogdan Costescu wrote:
> That is actually what I have observed during the last 3 years of running
> different versions of kernels, MPI libraries and CHARMM. Running using
> only one transport (TCP or shared mem) is always better than mixing them,
> f.e (using LAM-6.5.6):
>
> CPUs nodes real time (min) transports
> 4 4 5.95 TCP
> 4 2 7.08 TCP+USYSV
>
> As you can see, the difference is quite significant.
Do you also have the numbers for 2 dual nodes using only TCP?
Peter Koenig
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Peter.Koenig at phys.upb.de Wed Jan 16 05:59:29 2002
From: Peter.Koenig at phys.upb.de (Peter H. Koenig)
Date: Wed, 16 Jan 2002 11:59:29 +0100
Subject: Queueing problem
References: <3C3E05FB.41196579@phys.upb.de> <3C455791.1AC1C9B5@phys.upb.de>
Message-ID: <3C455D11.881C1D27@phys.upb.de>
Hello,
recently we acquired new machines we want to integrate into our
computational workforce. We are currently using a DQS complex (A) of
alpha-workstations.
The new machines are integrated into two complexes:
(B) a Beowulf-style cluster of Linux PCs, including a head node, mainly
for parallel applications and development
(C) a pool of workstations for a (student-) computer lab, which can be
used for short calculations
We are also planning on investing in a further cluster (D) which may be
open for other groups.
Since the user base for each of the complexes (except for A and B) is
different, we think that we might need to separate the complexes.
The jobs are to be submitted on the workstations (A) and routed to the
appropriate queue for execution. The submission and routing of jobs
should be possible with the least involvement from the user. It should
be possible to restrict routing to other complexes by certain rules,
e.g. routing to the computer lab should only be possible if a given
percentage of the queues there is idle (to allow local submissions
of jobs, which should start without long delays).
Can this be accomplished transparently to the user? Can someone point
me to queuing software that allows the specification of such rules
(even if this means quitting DQS)?
As far as I understand the documentation, DQS _does_ allow routing to
other complexes, but I have neither seen any information on how this can
be accomplished nor on whether rules for routing can be specified.
Peter Koenig
--
They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.
Benjamin Franklin, "Historical Review of Pennsylvania", 1759
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From mjk at sdsc.edu Wed Jan 16 12:01:05 2002
From: mjk at sdsc.edu (mjk at sdsc.edu)
Date: Wed, 16 Jan 2002 17:01:05 GMT
Subject: cluster frustrations
In-Reply-To:
(Peter.Lindgren@experian.com)
References:
Message-ID: <200201161701.g0GH15d06316@localhost.localdomain>
> I've tried to install Rocks a number of times. I got through (once)
> to where the compute nodes were up, but I haven't been able to get
> the latest version to work yet. In fairness, I haven't tried
> contacting their list even though they seem willing to help - I'm
> just too discouraged or shy I guess.
We're happy to help; just let us know what weirdness you're seeing.
For our software, the bottom line is: if Red Hat supports it, so do we.
If it doesn't work on Red Hat, it's fairly nasty hardware (but still
possible). We've also got a very friendly user base now, with other
people answering questions before we do.
> A reference showing how many OTHER people can manage to install
> clusters: http://Beowulf-underground.org/success.html proving I must
> be the village idiot.
This certainly isn't true. If you cannot set up a cluster, it's our
fault, not yours. Clusters are still way too difficult to set up, run,
and upgrade - that's why we're still here. I think I speak for the
other groups here also. Try them all and figure out what works for
you.
--
Mason Katz mjk at sdsc.edu
Group Lead, Cluster Development 858-822-3651
Grid and Cluster Computing
San Diego Supercomputer Center
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jimlux at jpl.nasa.gov Wed Jan 16 13:12:02 2002
From: jimlux at jpl.nasa.gov (Jim Lux)
Date: Wed, 16 Jan 2002 10:12:02 -0800
Subject: cluster frustrations
References:
Message-ID: <000901c19eb9$4c170540$02a8a8c0@office1>
Jim Phillips wrote:
> When you build a cluster, you are often taking consumer-class hardware and
> driving it much harder than a normal user. You also have zero error
> tolerance across the entire cluster. While in theory this should all be
> worked out in testing, cluster users are the only people likely to see
> errors in the real world. In our case, the problem was that a BIOS
> setting of "optimal" for some PCI bus parameters was leading to occasional
> data corruption between the CPU and the network card. Since we had nice
> network cards, capable of doing their own checksumming, the errors were
> never caught. This was never an issue on the old cluster, which used cheap
> "tulip" cards and made the CPU do the checksumming.
Indeed, with most consumer operating systems (e.g. Windows), disk and
network errors are silently retried, and with the relatively low disk and
network rates for most desktop applications, you'd never notice a, say,
1% error rate, since the few milliseconds added for the retry probably
aren't significant in the several-second response time expected by the
user. I doubt most users would notice the difference between it taking
1 second to paint a web page and 1.01 seconds. It has to get really,
really bad before there is a user-noticeable degradation, probably on
the order of 20-30% loss.
In the server case, and particularly in the computationally intensive
cluster computing area, where you are loading up the machines to the limit
(or, at least, you're trying to), and you've got users (i.e. system admins)
who are sensitive to small variations in performance, that 1% error rate
would be quite noticeable, particularly if it causes cascading problems
which amplify it. (Nobody would notice if I worked 1% slower at my desk, due
to slightly slower network or disk speed, since the uncertainty in my work
output (per unit time) is much much greater than that.)
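To put a rough number on that intuition, here is a toy Python model of
how an error rate inflates average completion time when each failed
operation is silently retried (independent attempts, and the model
ignores the extra latency of each retry):

    # Toy model: average slowdown from silent retries at error rate p.
    # Expected attempts per operation is 1 / (1 - p).
    for p in (0.01, 0.10, 0.30):
        expected_attempts = 1.0 / (1.0 - p)
        slowdown_pct = (expected_attempts - 1.0) * 100.0
        print("error rate %4.1f%%: ~%.1f%% slowdown" % (p * 100, slowdown_pct))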
This just goes to show that good performance monitoring tools that let you
see the raw error rates are important.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sp at scali.com Sat Jan 19 08:21:14 2002
From: sp at scali.com (Steffen Persvold)
Date: Sat, 19 Jan 2002 14:21:14 +0100
Subject: Intel 860 PCI bandwidth problem
References: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com>
Message-ID: <3C4972C9.93933132@scali.com>
Maurice Hilarius wrote:
>
> In recent test on motherboards with Intel 860 chipsets we were seeing less
> than wonderful transfer rates using Wulfkit and Myrinet cards.
>
> After some explorations on kernel issues, and other hardware forums we were
> still not seeing any reason why this was happening.
>
> Recently Intel published updated chipset errata lists, and I scanned over them.
>
> One issue quickly popped out at me, and I now know what the problem seems
> to be:
> In the file found at:
> ftp://download.intel.com/design/chipsets/specupdt/29071501.pdf
>
> Intel lists errata for the 860 chipset.
> One of these states:
> "5. Sustained PCI Bandwidth Problem:
> During a memory read multiple operation, a PCI master will read more than
> one complete cache line from memory. In this situation, the MCH pre-fetches
> information from memory to provide optimal performance. However, the MCH
> cannot provide information to the PCI master fast enough. Therefore, the
> ICH2 terminates the read cycle early to free up the PCI bus for other PCI
> masters to claim.
>
> Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
>
> Workaround: None
>
> Status: Intel has no fix planned for this erratum."
>
> This effectively eliminates the 860 chipset motherboards from contention
> for HPTC clustering use, IMHO.
>
> Any thoughts from anyone on this?
>
This only affects DMA operations, which use the PCI command "Memory Read Multiple". Normal Wulfkit
usage (ScaMPI) is with PIO and is therefore not affected by this issue. Instead, PIO performance on
these chipsets is limited by the fact that we cannot get more than a 32-byte burst (yet), giving you
approx. 170 MByte/sec.
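As a rough illustration of why short bursts cap throughput like this,
here is a Python sketch. The 64-bit/66 MHz bus and especially the
per-burst overhead below are assumptions picked only to show the shape
of the calculation, not measured i860 figures:

    # Rough PCI efficiency model: data cycles per burst vs. per-burst
    # overhead (arbitration, address phase, turnaround -- a guess).
    bus_bytes_per_cycle = 8    # assumed 64-bit bus
    bus_mhz = 66.0             # assumed 66 MHz bus
    burst_bytes = 32
    overhead_cycles = 8.0      # assumption, not a measured chipset number

    data_cycles = burst_bytes / float(bus_bytes_per_cycle)
    efficiency = data_cycles / (data_cycles + overhead_cycles)
    mbytes_per_sec = bus_bytes_per_cycle * bus_mhz * efficiency
    print("~%.0f MByte/sec effective" % mbytes_per_sec)   # ~176 here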
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:sp at scali.no | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From sp at scali.com Sat Jan 19 08:33:58 2002
From: sp at scali.com (Steffen Persvold)
Date: Sat, 19 Jan 2002 14:33:58 +0100
Subject: Intel 860 PCI bandwidth problem
References: <5.1.0.14.2.20020118223108.067a4330@mail.harddata.com> <20020119005846.A3462@wumpus.foo>
Message-ID: <3C4975C6.49F164B3@scali.com>
Greg Lindahl wrote:
>
> On Fri, Jan 18, 2002 at 10:40:37PM -0700, Maurice Hilarius wrote:
>
> > Implication: The early termination limits the maximum bandwidth to ~90 MB/s.
>
> That note sounds like it's not talking about DMA operation. You did
> look at the Myrinet Experiences website
>
I think you misunderstood: "memory read multiple" is the PCI command used by most DMA engines
(SCSI, Ethernet, SCI, and I would guess Myrinet) when they read from RAM (on the source machine). On the
other side (the destination), "memory write and invalidate" is normally used.
However, this erratum doesn't limit the SCI DMA bandwidth to ~90 MB/s either; 210 MB/s is the most I've
seen so far.
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:sp at scali.no | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From uccatvm at ucl.ac.uk Mon Jan 21 17:27:17 2002
From: uccatvm at ucl.ac.uk (uccatvm)
Date: Mon, 21 Jan 2002 22:27:17 +0000 (GMT)
Subject: Pentium4 cluster with Myrinet
Message-ID: <200201212227.g0LMRHS14654@socrates-a.ucl.ac.uk>
Hi all,
We are in the process of procuring a fairly large computer, and one
option we are looking at is a Beowulf-type Intel (or AMD) cluster of
around 50 nodes with Myrinet. For the types of applications we are
looking at, a single node Pentium 4 with RDRAM performs much better
than a PIII, probably largely due to the better memory bandwidth.
One of the vendors tells us that the current generation of P4 or Xeon
processors are less optimised for I/O, and are therefore less suitable
for a massively parallel machine, and we are recommended to go for a
Pentium III cluster instead. Do members of this list know about
serious issues of this kind with P4s?
We have also heard horror stories about dual Pentium III machines,
with up to 40% performance loss if the second CPU is also running a
calculation. Is this really so bad? I would expect Xeons to be less
prone to this effect, because the likely bottleneck is the memory
bandwidth. Is that so? Is there any advantage in using RDRAM or DDRAM
with a PIII?
How do Athlons with DDRAM compare (both on the I/O / communication issue
and general floating point performance)?
There are different groups involved, but one of the applications we
would like to run is NWChem, a quantum chemistry program. Computation
patterns are much like Gaussian, but the program is designed for
massively parallel computers, so it runs very efficiently in parallel
(given a fast interconnect). Like Gaussian, it is very memory demanding,
doing floating point calculations on large arrays, and also uses a fair
amount of (local) scratch disk I/O.
I would be grateful for any answers to the questions above.
See you,
Tanja
--
=====================================================================
Tanja van Mourik
Royal Society University Research Fellow
Chemistry Department
University College London phone: +44 (0)20-7679-4663
20 Gordon Street e-mail: work: T.vanMourik at ucl.ac.uk
London WC1H 0AJ, UK home: tanja at netcomuk.co.uk
http://www.chem.ucl.ac.uk/people/vanmourik/index.html
=====================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From prashs08 at eng.uab.edu Thu Jan 24 15:18:01 2002
From: prashs08 at eng.uab.edu (prashs08 at eng.uab.edu)
Date: Thu, 24 Jan 2002 14:18:01 -0600
Subject: Linux Cluster New User
Message-ID: <8E2D5B75E682D3118D06009027467E725A22C8@engem0.eng.uab.edu>
Hello everyone,
I'm new to this world. I'm trying to build a Linux cluster; can anyone
suggest the best approach? I have about 16 nodes to set up in my system.
I'm running Red Hat 7.2 and Mandrake 8.1.
Please also send me links where I can learn more.
Thank you in advance
Prant
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rottler at emerald.ucsc.edu Fri Jan 25 18:28:43 2002
From: rottler at emerald.ucsc.edu (Lee Rottler)
Date: Fri, 25 Jan 2002 15:28:43 -0800
Subject: Dual Athlon MP 1U units
References: <20020125220200.GA2553@zarq.dhs.org>
Message-ID: <3C51EA2B.DE923168@es.ucsc.edu>
We have a 132-node dual Athlon cluster from RackSaver with a high
bandwidth SCI interconnect from Dolphin and Wulfkit software
from Scali. Component-wise it is very similar to Robert's cluster,
included below.
Our Configuration
132 nodes
Dual Athlon 1.4 (1500)
1024 MB PC2100 (Corsair)
61.5 GB IDE HD
Tyan dual mobo w/ dual NICs
1 frontend server
1 data i/o node (cluster NFS server)
Both of these are connected to the cluster via an Intel
gigabit card. Initially we had 3Com, but had trouble with the
supplied Linux driver. This may be a red herring, since
yesterday we discovered that all the memory slots had
dual-bank DDR sticks, leading to all kinds of stability
problems independent of the Gb NIC cards. Tyan boards want
dual-bank memory sticks only in slots #1 and #2; the #3 and
#4 slots must have single-bank DDR sticks. Sigh.
Aside from teething problems, I am extremely happy with this
machine. Linpack clocked in at 301.8 Gflops running on all
264 processors. One of the user MPI codes was run on 128
processors on this machine and on the same number on
Seaborg, and we were 31.2% faster. Once we have everything
configured the way I want it, I will run the full Pallas
benchmark and report back.
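For context, a quick back-of-envelope efficiency check on that Linpack
figure, assuming the usual two double-precision flops per clock for the
Athlon (one add pipe plus one multiply pipe):

    # HPL efficiency: achieved Gflops vs. theoretical peak.
    cpus = 264
    ghz = 1.4
    flops_per_cycle = 2        # assumed: 1 add + 1 multiply per clock

    peak_gflops = cpus * ghz * flops_per_cycle    # ~739 Gflops
    achieved_gflops = 301.8
    print("~%.0f%% of peak" % (100.0 * achieved_gflops / peak_gflops))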
We are still in the configuration and testing phase, but
with regard to heat problems in our 1Us, I have not seen
any. We have had a 32-node, single-CPU Athlon cluster from
RackSaver since last March, and although there were problems
due to the MSI mobo, there was not a single heat-related
failure, and I do not expect that to be a problem with the
dual 1Us. RackSaver has taken a lot of care in optimizing
the air flow through their boxes to get optimum cooling. In
my experience heat is a non-issue with RackSaver 1Us. This
is only my experience (YMMV).
Cheers,
Lee
R C wrote:
>
> On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
> >
> > I am just wondering how many people have managed to get a
> > cluster of dual Athlon-MP nodes up and running. If so,
> > which motherboards and chipsets are you using, and has anyone
> > safely done this in a 1U form factor?
>
> We're in the process of testing our 1U dual athalon nodes from Racksaver.
>
> Configuration:
> 16 nodes
> Dual Athalon 1.53 (1800+)
> 512 MB PC2100 Reg/ECC (Crucial / Corsair) (We ordered Crucial ram modules
> before the price hike)
> 20 GB IDE HDs (IBM)
> S2462NG (non-scsi version)
>
> The units themselves are solid, and hefty (roughly 30 lbs). They do draw
> quite a bit of power (we are waiting for a 2nd 30 amp drop). No
> problems with them so far (24 hour burnin, room temperature approx 75-80
> deg F, above recommended temperature). They are noisy, as one would
> expect from 1U units with these processors. Don't put them in an office.
> CPU temperatures after 24 hour runs were in the 49-55 C range.
>
> We haven't gotten our software on all the units yet, but they seem
> stable. Once our school actually cut the PO, the order went through
> quickly.
>
> Robert Cicconetti
>
> PS. Has anyone gotten Wake-on-Lan working on this motherboard?
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
* Lee Rottler rottler at es.ucsc.edu *
* System Administrator/Scientific Programmer Office: (831) 459-5059 *
* High Performance Computing FAX: (831) 459-3074 *
* IGPP - Earth Sciences *
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ljain at angstrom.com Sat Jan 26 13:30:59 2002
From: ljain at angstrom.com (Lalit Jain)
Date: Sat, 26 Jan 2002 12:30:59 -0600
Subject: Dual Athlon MP 1U units
In-Reply-To: <5.0.2.1.0.20020126084920.02650240@maemail.ucdavis.edu>
Message-ID: <009601c1a697$a031b810$b201a8c0@LJAIN>
Hi,
I do know for a fact that 300W PFC power supplies will work for dual AMD
systems with up to 3 GB of memory and dual IDE HDDs. The caveat is that
300W will not drive AGP cards; assume an extra 110 W for AGP,
especially for high performance cards such as the ATI FireGL2. BTW --
certain high performance AGP cards will not work properly with dual AMD
systems -- even when using the mem=nopentium option in LILO...
PFC power supplies are standard and are required for certain
international deployments.
Lalit Jain
Angstrom Microsystems, Inc.
27 Drydock Ave
Boston, MA 02210
617-695-0137 ext 11
-----Original Message-----
From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org] On
Behalf Of Ben Ransom
Sent: Saturday, January 26, 2002 11:03 AM
To: beowulf at beowulf.org
Subject: Re: Dual Athlon MP 1U units
Are PFC (power factor correction) power supplies standard, or something
special wrt providing only required power and therefore less heat? A
vendor told me that a dual 1900 Athlon on Tyan 2466 mboard can be run in
1U
if done with a 300w PFC power supply. Do they know some great secret,
or
are they stretching physics?
PS: AMD shows one vendor's case approved with the Athlon MP running at
all
available clock speeds, i.e. that would include 1900+
-Ben Ransom
At 02:30 AM 1/26/2002 -0600, you wrote:
>Velocet wrote:
> >
> > Whats the power dissipation of running dual 1.2 GHz Mp's? How about
for
> > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well,
how
> much
> > heat comes off typical power supplies to run these systems?)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kandas2 at alum.rpi.edu Wed Jan 2 20:50:38 2002
From: kandas2 at alum.rpi.edu (Senthil Kandasamy)
Date: Wed, 02 Jan 2002 20:50:38 -0500
Subject: Skyld Beowulf/ Diskless nodes /Installation trouble
Message-ID: <5.1.0.14.0.20020102205009.00aa1dc0@mail.alum.rpi.edu>
Hi Guys,
Hopefully someone can help me out.
First of all, I am a chemical engineer/biophysicist who is fairly
familiar with Linux.
I am trying to install/fix Beowulf on a cluster recently purchased by
our research group.
This cluster was bought before I joined the group, and Scyld Beowulf
had been installed on it (improperly).
Since no one else in our group was interested in parallel computing,
nobody had noticed that, though one could send computational jobs to
the individual nodes, it could not handle parallel jobs on multiple
nodes ("could not connect to host" is the error I get when I mpirun).
We have 1 master +15 diskless nodes, all dual processors.
The Scyld Beowulf (without the support, i.e. the $2 version) has been
installed on it.
However, I suspect that the NFS mounting of the individual nodes has not
been done correctly.
Since I do not have any documentation (I could not find any on the
installation disk) on how to set up diskless nodes, I am kind of
helpless. The resources on the net and in newsgroups have not been very
helpful.
I tried to reinstall the Scyld/Red Hat CD on the cluster, but the setup
process never really seems to be concerned with NFS mounting.
Once the setup is finished, the nodes are up and running and can handle
individual jobs using bpsh.
But I can never connect to the nodes when I try to run a parallel job
using mpirun.
Is there any definitive (and up-to-date) documentation/HOWTO on how to
install a diskless Beowulf cluster?
Any help would be greatly appreciated. It just kills me to see ~30
GFlops just sitting there unutilized while I try to find computer time
on other supercomputers.
Thanks.
Senthil
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joachim at lfbs.RWTH-Aachen.DE Mon Jan 14 11:12:14 2002
From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen)
Date: Mon, 14 Jan 2002 17:12:14 +0100
Subject: Beowulf with Gigabit Ethernet
References: <200201120841.g0C8fxR03408@blueraja.scyld.com>
Message-ID: <3C43035E.56ABF70F@lfbs.rwth-aachen.de>
> On Fri, 11 Jan 2002 alex at compusys.co.uk wrote:
>
> > I think that in general point to point performance information has a
> > limited value, whatever vendors might quote on their web-page. SCI might
> > be performing pretty well if it comes down to just latency and bandwidth
> > between two machines, but it is a ring topology. If you have
> > more machines on a ring they will share that same bandwidth.
You might want to get a little bit more informed on SCI topologies and
their scalability characteristics. One good paper I can recommend is
from the SCI Europe 98 conference, available at
http://www.scali.com/whitepaper/scieurope98/scale_paper.pdf . It gives
some general calculations of all-to-all communication scalability with
SCI-torus-topologies. The numbers are somewhat outdated, but the
principle is still correct.
Your statement is not well founded because SCI is not a ring-topology,
but a point-to-point topology. It's similar to somebody saying
"(Centralized) Switches do not scale because all the traffic needs to to
through one box". It depends on the switch, I'd rather say. And on some
other things, like packet format, routing method etc.
Regarding the limited value of point-to-point performance, you are
right, but this applies to all networks.
Joachim
--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From wsb at paralleldata.com Sat Jan 26 16:35:15 2002
From: wsb at paralleldata.com (W Bauske)
Date: Sat, 26 Jan 2002 15:35:15 -0600
Subject: Dual Athlon MP 1U units
References: <20020126020610.X59723@velocet.ca> <3C52690C.BB73A4D8@paralleldata.com> <20020126153539.J59723@velocet.ca>
Message-ID: <3C532113.DCFF5C71@paralleldata.com>
Velocet wrote:
>
> On Sat, Jan 26, 2002 at 02:30:04AM -0600, W Bauske's all...
> > Velocet wrote:
> > >
> > > Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> > > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how much
> > > heat comes off typical power supplies to run these systems?)
> > >
> >
> > My TigerMP XP1600 duals take about 1.7amps at 125v.
> >
> > Forgot the formula to convert to btu's. Vaguely remember a factor
> > of around 3.42. Not sure if that was for Watt's or VoltAmps. Assuming
> > a VA is approximately a Watt, 212.5 * 3.42 = 727 btu per system.
> >
> > At least with that you can calculate your AC load for a rack. Say 40
> > 1U's per rack, 29080 btu's. A ton of AC is 12000 btu's. So, 2.5 ton's
> > of AC per rack. Course, you have 40x1.7 amps going into the rack for
> > a power load of 68 Amps at 125v.
> >
> > Those that know the real numbers, please correct. A VA is really around
> > .7 - .8 watts, so these calculations are high by maybe 20%. Figure
> > the extra allows you to plug in the switches/peripherals/servers in addition
> > to the nodes.
>
> not to mention the power supplies themselves (or was that part of the
> measurement of your 1.7A?)
>
The 1.7 amps is at the plug; everything is included.
It was measured while 100% busy on several different applications.
Note that while sitting idle, the Athlon system still drew the same
amount of power, so I suspect it doesn't idle down, as was mentioned in
another post. My P4s do idle down when not busy. They take from .85 to
1.3 amps when busy, depending on the speed of the processor, and about
half the max when idling.
You can go to your local HW store and buy a meter for this. The only
trick is that you have to put the sensor around only one of the wires,
not both, or it can't read the current. I split open a power cable
(carefully) so I could separate the wires and measure only one. Be
careful not to nick the wires inside if you try it; we don't want
anyone getting shocked. You could also have an electrician make a
special cord for you if you want to be safe. If you do nick it, try
again on a new power cord. They're cheap...
> 2.5 tons of A/C is required, that sounds right - that keeps the volume
> of air involved at neutral temperature (say 68F) - but how many CFM's
> of air are required to move the heat off the processors (well, heatsinks)
> fast enough to keep them comfortable (I dont know what comfortable is -
> 50-55C?) Is there a rule of thumb calc for that?
>
That's related to the heatsink's thermal transfer efficiency. It'd be
nice if there were a simple gauge to determine the CFM of a chassis,
but I don't know of one. I've seen transfer ratings on several HS
manufacturer pages, so you might try looking up your HS and seeing what
it says. It was something like degrees per watt of power, I think.
Bari might know more about it since he works with dense packaging.
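That degrees-per-watt figure is the heatsink's thermal resistance. A
minimal die-temperature estimate built on it, where every number below
is an illustrative assumption rather than a datasheet value:

    # Die temperature from heatsink thermal resistance (theta, C/W):
    # T_die = T_air + watts * theta. All values assumed for illustration.
    theta_c_per_w = 0.45     # assumed heatsink+fan thermal resistance
    cpu_watts = 66.0         # assumed CPU dissipation
    air_c = 35.0             # assumed in-chassis air temperature

    die_c = air_c + cpu_watts * theta_c_per_w
    print("estimated die temperature: ~%.0f C" % die_c)   # ~65 C here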
Wes
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From DMcBride at ctwenterprises.com Sat Jan 26 17:41:04 2002
From: DMcBride at ctwenterprises.com (David McBride)
Date: Sat, 26 Jan 2002 16:41:04 -0600
Subject: Channel Bonding
Message-ID: <9D349147A9D3D411B593009027B70B0303397E@CTW01>
I am a total newbie to Beowulf and was wondering if channel bonding
could be used to combine the bandwidth of two cable modems into a
single, faster link to get some killer speed.
Thanks,
David
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Jan 26 19:06:51 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 26 Jan 2002 19:06:51 -0500 (EST)
Subject: Dual Athlon MP 1U units
In-Reply-To:
Message-ID:
On Sat, 26 Jan 2002, Joel Jaeggli wrote:
> evaluate their claims based on the design of the power supply. The crucial
> thing to bear in mind with smp athlons is can the power-supply provide the
> current rise need when the cpu's switch from idle to going full-bore.
Or in order to turn on an ATX-style motherboard. A system can easily
place the greatest demand on a power supply in the first second after
it starts up. If a whole lot of systems are trying to start up at once,
it can also be doing so with a wildly fluctuating line voltage as the
inductive surge hits the mains.
Remember also that the wattage of a power supply refers to the total
power deliverable on all lines, and it has several lines (to, e.g., the
peripherals, the motherboard, and switching). Peak power deliverable to
the particular lines servicing the motherboard may be less. AMD
requires "certified" power supplies, suggesting that there are power
supplies that nominally have the rated total power capacity but don't
actually provide enough power, perhaps fast enough, somewhere they
think they need it.
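A Python sketch of that per-rail point; the rail limits and loads below
are made-up illustrations (real supplies print per-rail maximums on
their labels):

    # A "big enough" total wattage can still starve one rail.
    rail_limit_w = {"+3.3V": 140.0, "+5V": 150.0, "+12V": 180.0}  # assumed
    rail_load_w = {"+3.3V": 155.0, "+5V": 90.0, "+12V": 60.0}     # assumed

    for rail in rail_limit_w:
        load, limit = rail_load_w[rail], rail_limit_w[rail]
        status = "OK" if load <= limit else "OVERLOADED"
        print("%s: %.0f W of %.0f W -> %s" % (rail, load, limit, status))

    # Total is 305 W, comfortably under a 460 W rating, yet the
    # +3.3 V rail above is over its individual budget.
    print("total: %.0f W" % sum(rail_load_w.values()))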
Be cautious. Prototyping is a good idea. So is getting a vendor to
accept all the risk if they try to sell you something nominally subspec.
Just remember that you ALWAYS risk at least your time...
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Jan 26 18:20:10 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 26 Jan 2002 18:20:10 -0500 (EST)
Subject: Dual Athlon MP 1U units
In-Reply-To: <3C52690C.BB73A4D8@paralleldata.com>
Message-ID:
On Sat, 26 Jan 2002, W Bauske wrote:
> Velocet wrote:
> >
> > Whats the power dissipation of running dual 1.2 GHz Mp's? How about for
> > 1.33Ghz regular athlons in non-SMP configs as comparison? (As well, how much
> > heat comes off typical power supplies to run these systems?)
> >
>
> My TigerMP XP1600 duals take about 1.7amps at 125v.
>
> Forgot the formula to convert to btu's. Vaguely remember a factor
> of around 3.42. Not sure if that was for Watt's or VoltAmps. Assuming
> a VA is approximately a Watt, 212.5 * 3.42 = 727 btu per system.
>
> At least with that you can calculate your AC load for a rack. Say 40
> 1U's per rack, 29080 btu's. A ton of AC is 12000 btu's. So, 2.5 ton's
> of AC per rack. Course, you have 40x1.7 amps going into the rack for
> a power load of 68 Amps at 125v.
A ton of AC removes almost exactly 3500 watts continuously. That's your
factor of ~3.41 BTU/hr per watt. With this number you can work with
nice SI watts and forget archaic old BTUs, although frankly the "ton"
unit is even worse...;-)
Power has been discussed on the list before a few times. It depends on
RMS voltage, RMS current, and relative phase (power factor). The 120 V
and 1.7 A you read off a wall meter are already RMS values, so the
apparent power is about 204 VA and the real (average) power is that
times the power factor. I believe that somebody pointed out once that
the power factor for most hardware is close to 1, so phase differences
probably don't reduce this a whole lot, but I haven't measured it
myself and don't know.
At 40 1Us/rack, this is about 8200 W/rack, or at >>least<< 2.3 tons of
AC per rack to remove the heat. However, the heat removal capability of
AC is itself a bit amorphous. The efficiency depends on things like the
ambient air temperature that it is trying to cool and the ambient
temperature of the environment where it is (eventually) trying to dump
the heat. To be safe you need to keep the ambient air entering the rack
quite cool, since your rack is basically an 8 kW space heater. You need
to be especially careful with airflow, since the nodes in the middle
have basically no way of rejecting heat EXCEPT to the airflow. Then, as
Wes noted, there are the other peripherals that might be in the rack --
switches, surge protectors, UPS, etc. -- which also draw current. 2.5
tons of AC or a bit more is probably better.
One useful way to imagine the rack is as a stack of metal boxes
containing two 75W light bulbs each, all turned on inside the boxes,
with the boxes so tightly closed that hardly any light escapes. If
>>anything<< interrupts the cooling air, those boxes will get mighty hot
-- hot enough to short things out and maybe start a fire -- very
quickly.
That's basically why I worry about 1U duals. In principle they'll work
-- keep the outside air cool, pull as much cold air through the cases as
you can possibly arrange, keep the air clean (so the fans don't clog),
monitor thermal sensors and kill if they start getting too hot. You can
see, though, that they are a design that taunts Murphy's Law. Not too
robust. A little thing like an AC blower motor that blows a circuit
breaker at 3 am can reduce your $65K rack of hardware to a pile of junk
in the thirty minutes it takes you to find out and do something about
it, if you don't have a fully automated (and functioning) shutdown setup.
Not that a stack of 2U duals is MUCH better. It's still hot -- we have
XP 1800+'s and will probably see more like 150-160W/box. If we only put
12 per rack, though, we can leave gaps between the cases and get some
cooling from the surfaces of the cases and in any event the cases have
much larger air volumes, more room for air to flow through, and more
room for bigger fans. With luck we'll have SOME time to react (or for
our automated sentries to react) if the room AC fails and the power
doesn't.
But yes, we'll need 3 racks for what you put into one. There is a
fundamental tradeoff. Space versus power density. The smaller the
volume into which you concentrate your systems, the more power per unit
volume you burn (and must get rid of) and the more careful your
engineering must be to do it robustly. Careful engineering in turn
costs money and risk, which is traded off against the nontrivial cost of
space into which to put racks. Our new space is pretty expensive (we
have 75 KW of power capacity matched to 75 KW of chiller capacity -- the
AC blower/heat exchanger unit is the size of my entire office and eats
1/4 of the room). At the moment we're not crowded, so we're going for a
relatively low density. In three years, we may need to start repacking
or replacing with more tightly packed nodes as we grow, but in the
meantime we'll enjoy slightly reduced risk and greater robustness of
design.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Jan 26 19:26:07 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 26 Jan 2002 19:26:07 -0500 (EST)
Subject: Dual Athlon MP 1U units
In-Reply-To: <3C52ECE3.7030607@onelabs.com>
Message-ID:
On Sat, 26 Jan 2002, Bari Ari wrote:
> VA = amps x volts
> VA = watts / power factor
> watts = VA x power factor
> amps = watts / (volts x power factor)
>
> "Power factor" is a number between zero and one representing the portion
> of the power drawn by a system that actually delivers energy to the
> system. A system with a power factor of one (sometimes called "unity"
> power factor) is making full use of the energy it draws. A system with a
> power factor of 0.75 is effectively using only three-quarters of the
> energy it draws. Typical PC power supplies are not power factor
Dear Bari,
I liked all of your description except that of the power factor. As I
understand it, the power factor is the cosine of the phase difference
between the line voltage and the drawn current. When they are in phase,
the (rms) VA = the actual power consumed, in watts, as it is in a light
bulb. When they are \pi/2 out of phase (as they are for e.g. a perfect
capacitor or inductor hooked across an AC power supply) the VA can be
quite high (depending on the impedance of the circuit element) but the
power factor can be zero! No power is actually delivered to the
circuit, on average -- energy is stored in the capacitor and then given
back to the line. It does not actually appear as heat in the room.
Real loads are generally somewhere in between. Loads that are "mostly
resistive" have the highest power factors and only the resistive "part"
appears as heat; loads that are capacitive or inductive can have current
that lags or leads the voltage and draw less power than one might think
looking at the peak voltage and current.
The peak voltage and current are still important, of course. The power
delivery lines do have to be able to handle the peak current as they
burn energy as (1/2) I^2 R_line for that peak current.
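(A quick numerical illustration, reusing the 120V/1.7A figures from
earlier in the thread; bc -l provides c() for cosine and a() for
arctangent, so pi is 4*a(1):

   echo "120 * 1.7 * c(0)" | bc -l          # in phase: 204W delivered
   echo "120 * 1.7 * c(4*a(1)/2)" | bc -l   # pi/2 out of phase: ~0W

the VA drawn is 204 in both cases, but the power delivered differs.)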
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Sat Jan 26 18:21:42 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sat, 26 Jan 2002 18:21:42 -0500 (EST)
Subject: Please forward to beowulf.org (fwd)
Message-ID:
Forwarding this for Lee Rottler, see below. He's had trouble posting in
his own name. Probably hitting an anti-spam block by accident.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
---------- Forwarded message ----------
Date: Sat, 26 Jan 2002 01:16:10 -0800
From: Lee Rottler
To: rgb at phy.duke.edu
Subject: Please forward to beowulf.org
Hi Robert,
I tried unsubscribing then resubscribing with no luck. Here is the
message I wanted to post. Murphy's Law says that as soon as you
post it the other three tries will make it to the list. :-)
------------------------------------------------------------------
We have a 132 node dual Athlon cluster from Racksaver with high
bandwidth SCI interconnect from Dolphin and Wulfkit software
from Scali. Component wise it is very similar to Robert's cluster
included below.
Our Configuration
132 nodes
Dual Athlon 1.4 (1500)
1024 MB PC2100 (Corsair)
61.5 GB IDE HD
Tyan dual mobo w/ dual NICs
1 frontend server
1 data i/o node (cluster NFS server)
Both of these are connected to the cluster via an Intel
gigabit card. Initially we had 3Com cards but had trouble with the
supplied Linux driver. This may be a red herring, since
yesterday we discovered that all the memory slots had dual-bank
DDR sticks, leading to all kinds of stability problems
independent of the Gb NIC cards. Tyan boards want dual-bank
memory sticks only in slots #1 and #2; the #3 and #4
slots must have single-bank DDR sticks. Sigh.
Aside from teething problems I am extremely happy with this
machine. Linpack clocked in at 301.8 Gflops running on all
264 processors. One of the users' MPI codes was run on 128
processors on this machine and on the same number of
processors on Seaborg, and we were 31.2% faster. Once we
have everything configured the way I want it I will run the
full Pallas benchmark and report back.
We are still in the configuration and testing phase, but with
regard to heat problems in our 1Us I have not seen any. We
purchased a 32 node/single cpu Athlon cluster from Racksaver
last March, and although there were problems due to the MSI
mobo there was not a single heat related failure; I do not
expect heat to be a problem with the dual 1Us either.
RackSaver has taken a lot of care in optimizing the air flow
through their boxes to get optimum cooling. In my experience
heat is a non-issue with RackSaver 1Us. This is only my
experience (YMMV).
Cheers,
Lee
R C wrote:
>
> On Fri, Jan 25, 2002 at 11:54:37AM -0600, Steven Timm wrote:
> >
> > I am just wondering how many people have managed to get a
> > cluster of dual Athlon-MP nodes up and running. If so,
> > which motherboards and chipsets are you using, and has anyone
> > safely done this in a 1U form factor?
>
> We're in the process of testing our 1U dual Athlon nodes from Racksaver.
>
> Configuration:
> 16 nodes
> Dual Athlon 1.53 (1800+)
> 512 MB PC2100 Reg/ECC (Crucial / Corsair) (We ordered Crucial ram modules
> before the price hike)
> 20 GB IDE HDs (IBM)
> S2462NG (non-scsi version)
>
> The units themselves are solid, and hefty (roughly 30 lbs). They do draw
> quite a bit of power (we are waiting for a 2nd 30 amp drop). No
> problems with them so far (24 hour burnin, room temperature approx 75-80
> deg F, above recommended temperature). They are noisy, as one would
> expect from 1U units with these processors. Don't put them in an office.
> CPU temperatures after 24 hour runs were in the 49-55 C range.
>
> We haven't gotten our software on all the units yet, but they seem
> stable. Once our school actually cut the PO, the order went through
> quickly.
>
> Robert Cicconetti
>
> PS. Has anyone gotten Wake-on-Lan working on this motherboard?
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
* Lee Rottler rottler at es.ucsc.edu *
* System Administrator/Scientific Programmer Office: (831) 459-5059 *
* High Performance Computing FAX: (831) 459-3074 *
* IGPP - Earth Sciences *
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From djholm at fnal.gov Sat Jan 26 20:06:11 2002
From: djholm at fnal.gov (Don Holmgren)
Date: Sat, 26 Jan 2002 19:06:11 -0600
Subject: Pentium 4 cluster with Myrinet
In-Reply-To: <200201251908.g0PJ8GR04566@socrates-a.ucl.ac.uk>
Message-ID:
The current P4 motherboards with RDRAM are built with either Intel's
i850 (P4) or i860 (dual Xeon) chipsets. On the i860-based boards, the
P64H bridge used to implement the 64/66 PCI bus doesn't do a good
job; AFAIK this problem existed also with the i840 chipset. DMA
performance on this bus is far below what you'd expect. Using Myrinet's
gm_debug command, which outputs the results of some DMA timings which
occur during driver initialization, the best I've seen is about 225
MB/s for bus_read, and 315 MB/s for bus_write. On many dual Xeon
motherboards, the P64H register is often configured by the BIOS to a
setting (Soft_DT_Timer) appropriate for 4 64/66 slots. With that
setting, the rates are even worse: 145 for bus_read, 315 for
bus_write. If your dual Xeon board has only two 64/66 slots, you can
set the Soft_DT_Timer value appropriately to get the better DMA
rate. In gm_1.5, there's a #define (GM_INTEL_860) you can set in
drivers/linux/gm/gm_arch.c which will set the Soft_DT_Timer value. Or,
you can set it via setpci (look for device 8086:1360, make sure that
offset 0x50 has a value of 0x04).
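(For concreteness, with the pciutils tools that is something like the
following -- same device ID and offset as described above, run as root,
and double-check against your own board before writing to config space:

   lspci -d 8086:1360            # is the P64H bridge present?
   setpci -d 8086:1360 50.B      # read the byte at config offset 0x50
   setpci -d 8086:1360 50.B=04   # set it to 0x04 for a 2-slot bus

setpci treats the register offset and value as hex.)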
On i850-based P4 boards, you'll only have a 32/33 bus. There's a
problem with those slots as well, with DMA rates down in the 90 MB/sec
range.
In practice, the i860 PCI problem limits Myrinet bandwidth to roughly
165 MB/sec (measured with gm_allsize) with the correct P64H setting, and
to roughly 125 MB/sec with the wrong setting. The best Myrinet
performance, at least according to www.myri.com pages, is something
close to 250 MB/sec on PIII motherboards based on one of the ServerWorks
chipsets.
So, yes, some (but not all!) PIII motherboards have better 64/66 PCI
buses than the current crop of Xeon motherboards. New Xeon motherboards
will be out soon (promised this quarter) based on a new ServerWorks
chipset. If past performance is a good indicator, these should have
better PCI performance. However, they also support interleaved DDR, not
RDRAM, so memory bandwidth may suffer. I don't know about i845-based P4
motherboards; perhaps Greg Lindahl's web page with gm_debug results
includes one of these systems (also DDR, not RDRAM).
Whether or not you'll suffer with the poor 64/66 bus on dual Xeons
depends on your code, of course. For our codes (lattice QCD), there's
such a huge benefit from the memory bandwidth boost from the combination
of RDRAM and the 400 MHz FSB of P4s/Xeons that we're willing to put up
with the PCI bus woes (I'm in the process of ordering 48 duals to expand
our cluster). There's much less advantage of RDRAM on a PIII board
because PIII's have only a 100 (or 133) MHz FSB. STREAMS memory
bandwidth ("Copy") numbers on PIII RDRAM boards are about 800 MB/sec
IIRC; on our P4 and Xeon boards the STREAMS number is about 1400 MB/sec
and better than 2000 MB/sec if one hand codes with SSE or uses one of
the SSE-capable compilers.
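(If you want to check STREAMS numbers on your own boxes: the benchmark
is a single C source file available from
http://www.cs.virginia.edu/stream/ and, modulo compiler and flags,
runs roughly as simply as:

   gcc -O3 -o stream stream.c
   ./stream        # the "Copy" row is the number quoted above

exact figures will of course depend on compiler, flags, and memory.)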
I have a small cluster of dual Athlons based on the 760MP chipset; these
have 64/33 PCI buses. gm_debug numbers on these are 240 for bus_read,
227 for bus_write, and gm_allsize has a large message asymptote of about
170 MB/sec. I've not had a chance to test a 760MPX-based motherboard,
but I believe there are good reports (or at least rumors) about its
64/66 PCI performance. For our lattice QCD code, these systems fall
far behind our comparable P4 and Xeon systems. YMMV, of course,
depending on your code; we had to tweak our codes to take advantage of
(ok, defend against) the 64-byte cache line size on the P4 (128-byte
cache lines for software prefetch).
Don Holmgren
Fermilab
On Fri, 25 Jan 2002, uccatvm wrote:
> Hi all,
>
> We are in the process of procuring a fairly large computer, and one
> option we are looking at is a Beowulf-type Intel (or AMD) cluster of
> around 50 nodes with Myrinet. For the types of applications we are
> looking at, a single node Pentium 4 with RDRAM performs much better
> than a PIII, probably largely due to the better memory bandwidth.
> One of the vendors tells us that the current generation of P4 or Xeon
> processors are less optimised for I/O, and are therefore less suitable
> for a massively parallel machine, and we are recommended to go for a
> Pentium III cluster instead. Do members of this list know about
> serious issues of this kind with P4s?
>
> We have also heard horror stories about dual Pentium III machines,
> with up to 40% performance loss if the second CPU is also running a
> calculation. Is this really so bad? I would expect Xeons to be less
> prone to this effect, because the likely bottleneck is the memory
> bandwidth. Is that so? Is there any advantage in using RDRAM or DDRAM
> with a PIII?
>
> How do Athlons with DDRAM compare (both on the I/O / communication issue
> and general floating point performance)?
>
> There are different groups involved, but one of the applications we
> would like to run is NWChem, a quantum chemistry program. Computation
> patterns are much like Gaussian, but the program is designed for
> massively parallel computers, so it runs very efficiently in parallel
> (given a fast interconnect). Like Gaussian, it is very memory demanding,
> doing floating point calculations on large arrays, and also uses a fair
> amount of (local) scratch disk I/O.
>
> I would be grateful for any answers to the questions above.
>
> See you,
>
> Tanja
> --
> =====================================================================
> Tanja van Mourik
> Royal Society University Research Fellow
> Chemistry Department
> University College London phone: +44 (0)20-7679-4663
> 20 Gordon Street e-mail: work: T.vanMourik at ucl.ac.uk
> London WC1H 0AJ, UK home: tanja at netcomuk.co.uk
>
> http://www.chem.ucl.ac.uk/people/vanmourik/index.html
> =====================================================================
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bari at onelabs.com Sat Jan 26 21:58:11 2002
From: bari at onelabs.com (Bari Ari)
Date: Sat, 26 Jan 2002 20:58:11 -0600
Subject: Dual Athlon MP 1U units
References:
Message-ID: <3C536CC3.1070904@onelabs.com>
Robert G. Brown wrote:
> On Sat, 26 Jan 2002, Bari Ari wrote:
>
>
>>VA = amps x volts
>>VA = watts / power factor
>>watts = VA x power factor
>>amps = watts / (volts x power factor)
>>
>>"Power factor" is a number between zero and one representing the portion
>>of the power drawn by a system that actually delivers energy to the
>>system. A system with a power factor of one (sometimes called "unity"
>>power factor) is making full use of the energy it draws. A system with a
>>power factor of 0.75 is effectively using only three-quarters of the
>>energy it draws. Typical PC power supplies are not power factor
>>
>
> Dear Bari,
>
> I liked all of your description except that of the power factor. As I
> understand it, the power factor is the cosine of the phase difference
> between the line voltage and the drawn current. When they are in phase,
> the (rms) VA = the actual power consumed, in watts, as it is in a light
> bulb. When they are \pi/2 out of phase (as they are for e.g. a perfect
> capacitor or inductor hooked across an AC power supply) the VA can be
> quite high (depending on the impedance of the circuit element) but the
> power factor can be zero! No power is actually delivered to the
> circuit, on average -- energy is stored in the capacitor and then given
> back to the line. It does not actually appear as heat in the room.
>
> Real loads are generally somewhere in between. Loads that are "mostly
> resistive" have the highest power factors and only the resistive "part"
> appears as heat; loads that are capacitive or inductive can have current
> that lags or leads the voltage and draw less power than one might think
> looking at the peak voltage and current.
>
> The peak voltage and current are still important, of course. The power
> delivery lines do have to be able to handle the peak current as they
> burn energy as (1/2) I^2 R_line for that peak current.
>
> rgb
>
>
True. That is why Watts, rather than Volt-Amperes, are used to
determine the amount of heat generated by a system.
A power factor corrected power supply will match the capacitive loads of
the semiconductors on the motherboard to raise the power factor closer
to 1. Resistive loads account for very little on a well designed
motherboard.
Motors are inductive loads that can be corrected with capacitance across
the load. Power companies may place large capacitors across the power
lines outside large factories that have high inductive motor loads to
correct the power factor.
Bari
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bari at onelabs.com Sat Jan 26 22:55:02 2002
From: bari at onelabs.com (Bari Ari)
Date: Sat, 26 Jan 2002 21:55:02 -0600
Subject: Dual Athlon MP 1U units
References: <3C536CC3.1070904@onelabs.com>
Message-ID: <3C537A16.60801@onelabs.com>
Bari Ari wrote:
> A power factor corrected power supply will match the capacitive loads of
> the semiconductors on the motherboard to raise the power factor closer
> to 1. Resistive loads account for very little on a well designed
> motherboard.
Just to clarify -- Resistive loads other than the semiconductors.
Bari
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From math at velocet.ca Sun Jan 27 01:34:29 2002
From: math at velocet.ca (Velocet)
Date: Sun, 27 Jan 2002 01:34:29 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To: ; from rgb@phy.duke.edu on Sat, Jan 26, 2002 at 06:20:10PM -0500
References: <3C52690C.BB73A4D8@paralleldata.com>
Message-ID: <20020127013429.N59723@velocet.ca>
On Sat, Jan 26, 2002 at 06:20:10PM -0500, Robert G. Brown's all...
> On Sat, 26 Jan 2002, W Bauske wrote:
>
> That's basically why I worry about 1U duals. In principle they'll work
> -- keep the outside air cool, pull as much cold air through the cases as
> you can possibly arrange, keep the air clean (so the fans don't clog),
> monitor thermal sensors and kill if they start getting too hot. You can
> see, though, that they are a design that taunts Murphy's Law. Not too
> robust. A little thing like an AC blower motor that blows a circuit
> breaker at 3 am can reduce your $65K rack of hardware to a pile of junk
> in the thirty minutes it takes you to find out and do something about
> it, if you don't have fully automated (and functioning) shutdown setup.
This sounds like you shouldn't have closed boxes at all -- why not
much more open cases instead, so that if some big critical fan
somewhere does shut down, you aren't risking a meltdown of your entire
cluster... if it's at least slightly open to the air of the room it's
in, hopefully regular convection or other air currents would be enough
to keep things cool.
This makes a case (*ahem*) for a thermal power switch placed inside
the rack -- if it's 50C (or whatever) in the rack, it's time to cut
the power. I am sure these things exist and shouldn't be too
expensive. Anyone using them?
> Not that a stack of 2U duals is MUCH better. It's still hot -- we have
> 1800 XP's and probably will have more like 150-160W/box. If we only put
> 12 per rack, though, we can leave gaps between the cases and get some
> cooling from the surfaces of the cases and in any event the cases have
In case the fans in the case fail, you mean...?
> much larger air volumes, more room for air to flow through, and more
> room for bigger fans. With luck we'll have SOME time to react (or for
> our automated sentries to react) if the room AC fails and the power
> doesn't.
Why not custom mount a large number of boards in a common space with
a similar number of fans? Then if 1 or 2 (or half) of the fans fail,
there aren't 1 or 2 or more boards risking burnout due to zero
cooling; instead, all the boards in that enclosure share half the
cooling -- half being better than none, and half being great when you
put in 3 times the airflow that was actually required. I'm sure
people have thought of this before, and there's a reason why it's not
more popular. Just wondering what all your experience is out there.
> relatively low density. In three years, we may need to start repacking
> or replacing with more tightly packed nodes as we grow, but in the
> meantime we'll enjoy slightly reduced risk and greater robustness of
> design.
In 3 years we'll hopefully have CPUs that burn 10W at 5GHz instead! :)
/kc
>
> rgb
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From maurice at harddata.com Sun Jan 27 02:03:23 2002
From: maurice at harddata.com (Maurice Hilarius)
Date: Sun, 27 Jan 2002 00:03:23 -0700
Subject: Beowulf digest, Vol 1 #721 - 11 msgs
In-Reply-To: <200201270041.g0R0fFR20053@blueraja.scyld.com>
Message-ID: <5.1.0.14.2.20020127000007.039c0ec0@mail.harddata.com>
>From: "Lalit Jain"
>To:
>Subject: RE: Dual Athlon MP 1U units
>Date: Sat, 26 Jan 2002 12:30:59 -0600
>Organization: Angstrom Microsystems
>
>Hi,
>
>I do know for a fact that 300W PFC power supplies will work for dual AMD
>systems with up to 3 GB of memory and dual IDE HDDs. The caveat is that
>300W will not drive AGP cards. Assume an extra 110 W for AGP,
>especially for high performance cards such as ATI FireGL2. BTW --
>Certain high performance AGP cards will not properly work with dual AMD
>systems -- even when using the mem-nopentium option in LILO...
What you are talking about are AGP PRO cards, which can use UP TO 115W.
Since a cluster is not a graphics/CAD workstation, I doubt you would want
to run AGP Pro cards.
A typical AGP card suitable for a machine in a cluster, such as an ATI
Rage or TNT-2, uses about 2W to 4W.
>PFC power supplies are standard and are required for certain
>international deployments.
I do not know what you mean by "standard", but they are certainly better.
Most PFC rated supplies are better regulated, and often actually
deliver more current than their non-PFC equivalents. They also do not
feed back signal noise onto the AC mains.
With our best regards,
Maurice W. Hilarius Telephone: 01-780-456-9771
Hard Data Ltd. FAX: 01-780-456-9772
11060 - 166 Avenue mailto:maurice at harddata.com
Edmonton, AB, Canada http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From niles at scyld.com Sun Jan 27 05:07:36 2002
From: niles at scyld.com (Rick Niles)
Date: Sun, 27 Jan 2002 05:07:36 -0500
Subject: Skyld Beowulf/ Diskless nodes /Installation trouble
Message-ID: <200201271007.g0RA7bf28643@bowler.niles.scyld.com>
> Since no one else in our group was interested in parallel computing,
> nobody had noticed the fact that though one could send computational
> jobs to the individual nodes, it could not handle parallel jobs on
> multiple nodes ("could not connect to host" is the error I get when I
> mpirun)
>
> However, I suspect that the NFS mounting of the individual nodes has not
> been done correctly.
I don't think NFS has anything to do with your problem as NFS is not
required for MPICH and Scyld. Unless perhaps you're relying on NFS to
push the data around to the nodes.
Check out the file: /etc/beowulf/fstab
> But I can never connect to the nodes when I try to run a parallel job using
> mpirun.
What is the error message? Is it something like:
"Connection failed: Permission denied" ?
> Is there any definitive (and up-to-date) documentation/howto on how to
> install a diskless beowulf cluster?
Diskless is the default...it should just work. (use floppies or the
CDROM to boot the nodes.)
Rick Niles
Scyld Computing Corp.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From carlos at baldric.uwo.ca Sun Jan 27 10:49:06 2002
From: carlos at baldric.uwo.ca (Carlos O'Donell Jr.)
Date: Sun, 27 Jan 2002 10:49:06 -0500
Subject: Dual Athlon MP 1U units
In-Reply-To: ; from rgb@phy.duke.edu on Sat, Jan 26, 2002 at 06:20:10PM -0500
References: <3C52690C.BB73A4D8@paralleldata.com>
Message-ID: <20020127104906.F1423@systemhalted>
> Power has been discussed on the list before a few times. It depends on
> peak voltage, peak current, and relative phase (power factor). If peak
> voltage is 120V, peak current is 1.7A, and they are in phase, peak power
> is 204W but average power is only 1/\sqrt{2} = 0.707 of this or around
> 144W. I believe that somebody pointed out once that the power factor
> for most hardware is close to 1 so phase differences probably don't
> reduce this a whole lot, but I haven't measured itself and don't know.
>
Just to insert my $0.02 :)
Small points to make about computer power supplies in general:
a. The supplies for a computer are standard issue switching supplies
(capacitor and diode rectified AC at the inlet).
b. Your supply connected to the wall is a non-linear load.
What does this mean for you in general?
Although the components on your motherboard may be resistive in
general, the _way_ the supply is designed causes the following issues:
1. Poor power factor (as seen at the wall)
1a. Not to be confused with the PF seen by your supply.
2. Harmonic distortion of your power (loading at sine wave peaks)
3. Requires active power factor correction on the circuits feeding
the computers (you can't correct non-linear loads with simple
capacitor circuits).
There are in fact many papers on this subject; a quick net search
will turn them up. Power engineering has been tackling this problem
for years in large buildings with thousands of computers.
c.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From dinamica at webcable.com.br Sun Jan 27 12:31:37 2002
From: dinamica at webcable.com.br (dinamica at webcable.com.br)
Date: Sun, 27 Jan 2002 15:31:37 -0200
Subject: questions on SMP/duals/parallel
Message-ID: