On Fri, Jan 18, 2008 at 08:33:10PM -0500, Jeff Squyres wrote:
> Barry --
>
> Could you check what apps are still running when it hangs? I.e., I
> assume that all the uptime's are dead; are all the orted's dead on the
> remote nodes? (orted = our helper process that is launched on the
> remote nodes to exert process control, funnel I/O back and forth to
> mpirun, etc.)

Here's the stack trace of the orted process on node 01. The "uname"
process was long gone (and had sent its output back with no difficulty).