[Scalapack] MPI errors with BLACS on Ubuntu 9.10

From: Ask Hjorth Larsen
Date: Tue, 24 Apr 2012 16:29:10 +0200 (CEST)

Dear Julie Langou,

Thank you for the response, and sorry for the unusually long delay.
I confirm that ScaLAPACK works correctly when installed manually on
Ubuntu. This is with scalapack 2.0.1 and the standard openmpi package on
Ubuntu 10.04. Even in this release, the packaged scalapack still does not
run correctly (with openmpi it now hangs halfway through the test).
Best regards
Ask Hjorth Larsen
On Mon, 30 Nov 2009, julie langou wrote:

Ask, I would tend to agree with your conclusion.
We provide a python installer to install BLACS and ScaLAPACK.
(http://www.netlib.org/scalapack/)
Maybe you could give it a try and see if it solves your problem.
Regards
Julie
On Nov 26, 2009, at 4:39 PM, Ask Hjorth Larsen wrote:
Dear BLACS developers
I have installed the default Ubuntu 9.10 packages with
BLACS/ScaLAPACK (libblacs-mpi1, libblacs-mpi-dev and so on),
using OpenMPI. I get an error when attempting to call
Cblacs_gridinit with certain MPI communicators.

I have attached a simple example program (blacs.c and a
corresponding makefile) which exhibits this behaviour; a sketch
approximating it appears after the error note below. The same
program runs fine on several different non-Ubuntu computers with
BLACS/ScaLAPACK, which leads me to believe that the problem could be
related to the Debian package.

The program is run with 8 CPUs (mpirun -np 8 testblacs). If
Cblacs_gridinit is called on a subcommunicator for ranks 0, 1,
2, 3, then everything works. If instead it is called on ranks
0, 2, 4, 6, then it gives the following error:
[askm:3868] *** An error occurred in MPI_Group_incl
[askm:3868] *** on communicator MPI_COMM_WORLD
[askm:3868] *** MPI_ERR_RANK: invalid rank
[askm:3868] *** MPI_ERRORS_ARE_FATAL (your MPI job will now
abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3868 on
node askm exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
(The above error is caused specifically by Cblacs_gridinit, not
the explicit creation of the MPI group in the program)
In case this is relevant, here are the output and error files
from running the cblacs tests as per the command 'mpirun -np 4
cblacs_test_shared-openmpi' (-np 8 gives identical output):
http://www.student.dtu.dk/~ashj/opendir/cblacstest.out
http://www.student.dtu.dk/~ashj/opendir/cblacstest.err
None of the tests fail, but some of them are skipped.
Any help in understanding or fixing this, or pointers to other
places to direct this question (or to report it, if it is a bug),
would be greatly appreciated.
Best regards
Ask Hjorth Larsen
Center for Atomic-scale Materials Design
Technical University of Denmark
<blacs.c><makefile.txt>
_______________________________________________
Scalapack mailing list
Scalapack@Domain.Removed
http://lists.eecs.utk.edu/mailman/listinfo/scalapack
**********************************************
Julie Langou; Research Associate in Computer Science
Innovative Computing Laboratory;
University of Tennessee from Denver, Colorado ;-)
julie@Domain.Removed; http://www.cs.utk.edu/~julie/