It looks like this is a universal problem... I am surprised that there
is no mention of it on the mpich web page. LAM can work for some of our
users, but as far as I know LAM does not support the spawning of multiple
jobs by the same user from a single machine. Our clusters have front-end
systems which support job runs on several subclusters, and it is
quite common for users to want to start more than one job on multiple
subclusters. I guess I will try submitting a bug report to the mpich
people and see what happens...
Thanks,
--JIM
Mark Hartner wrote:
>>7.0 through 7.3. All of these systems exhibit the same problem with
>>mpich 1.2.3, upon reboot. Mpich 1.2.1 and LAM MPI do not exhibit this
>>behavior. Has anyone experienced this problem, or does anyone know what could be
>>causing it?
>
>We saw the exact same problem on our cluster. We even saw it with a simple
>'hello world' program. Our solution was to switch to LAM MPI. We had a
>little trouble getting LAM MPI and MPE working, but eventually got it
>working. We can send you the bug fix if you want to use LAM and MPE.
>
>Mark