(Please forget about the segfault. It was my mistake.)
I am using OpenMPI 1.7.2 (built with gcc 4.7.2) to run the program. I
configured it with the platform file
contrib/platform/lanl/cray_xe6/optimized_lustre and
--enable-mpirun-prefix-by-default. As I said, the program works fine
with aprun, but fails with mpirun/mpiexec.
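
For reference, the configure step looked roughly like the sketch below. The --prefix value is an assumption inferred from the ~/openmpi/bin/mpirun path used in the run that follows, and the exact form of the platform argument is reconstructed from memory, not copied from my shell history.

```shell
# Run from the top of the OpenMPI 1.7.2 source tree.
# --with-platform points at the LANL Cray XE6 platform file shipped in contrib/.
# --prefix is assumed here; adjust it to the real install location.
./configure \
    --with-platform=contrib/platform/lanl/cray_xe6/optimized_lustre \
    --enable-mpirun-prefix-by-default \
    --prefix="$HOME/openmpi"
make all install
```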

knteran_at_mzlogin01:~/test-openmpi> ~/openmpi/bin/mpirun -np 4 ./a.out
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
./a.out

Either request fewer slots for your application, or make more slots available
for use.