I'm implementing a non-Hermitian eigensolver using ScaLAPACK routines. So far, I still use the LAPACK routine ztrevc to compute the eigenvectors after the Schur form has been computed by pzlahqr. I also obtain correct results if I use pztrevc, but this routine seems to be far slower than the serial ztrevc. Has anyone observed the same? What's the reason for this? Is there any other simple way to compute the eigenvectors in parallel?

To answer your question, I have asked around.
Below is the answer from Mark Fahey (currently at Oak Ridge National Lab., US).
Julien.

When I contributed pzlahqr and pztrevc a few years back, it was known (see reference below) that pztrevc had scaling issues. If I remember correctly, I created a subroutine pzlatrs that was used to do numerical scaling (similar to the serial zlatrs), but this routine does not scale in the parallel sense. If numerical scaling is not needed, then a direct call to the Level 2 PBLAS routine pztrsv rather than pztrevc would be much faster.

Since contributing pzlahqr and pztrevc, I have not revisited pztrevc.

@Article{Fahey:2003:APE,
  author   = "Mark R. Fahey",
  title    = "Algorithm 826: A Parallel Eigenvalue Routine for Complex {Hessenberg} Matrices",
  journal  = "{ACM} Transactions on Mathematical Software",
  volume   = "29",
  number   = "3",
  pages    = "326--336",
  month    = sep,
  year     = "2003",
  URL      = "http://doi.acm.org/10.1145/838250.838256",
  abstract = "A code for computing the eigenvalues of a complex Hessenberg matrix is presented. This code computes the Schur decomposition of a complex Hessenberg matrix. Together with existing ScaLAPACK routines, the eigenvalues of dense complex matrices can be directly computed using a parallel QR algorithm. This parallel complex Schur decomposition routine was developed to fill a void in the ScaLAPACK library and was based on the parallel real Schur decomposition routine already in ScaLAPACK. The real-arithmetic version was appropriately modified to make it work with complex arithmetic and implement a complex multiple bulge QR algorithm. This also required the development of new auxiliary routines that perform essential operations for the complex Schur decomposition, and that will provide additional linear algebra computation capability to the parallel numerical library community.",
}

Thanks to Julien and to Mark Fahey. pztrsv works well and is faster than ztrevc, at least for matrices of size 512 or larger on 8 processors. Since the equation system defining the eigenvectors is underdetermined and cannot be solved by pztrsv as it stands, I used the following trick: for eigenvector i of the upper triangular matrix, I added the equation "(i-th component of eigenvector i) = 1" to the i-th equation of the system, which gives a unique solution. Works fine.
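
For readers who want to see the trick spelled out, here is a minimal serial sketch in Python with NumPy/SciPy. It is not the ScaLAPACK code from this thread: scipy.linalg.solve_triangular stands in for pztrsv, scipy.linalg.schur stands in for the Schur step, and the function names triangular_eigenvectors and demo are made up for illustration. It reads the trick as solving (T - lambda_i*I + e_i*e_i^T) x = e_i, which is one interpretation of the description above. It assumes distinct eigenvalues and does no scaling against overflow, which is the protection pztrevc provides.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular


def triangular_eigenvectors(T):
    """Right eigenvectors of an upper triangular T, one triangular solve each.

    Assumes the diagonal entries (eigenvalues) of T are distinct.
    """
    n = T.shape[0]
    X = np.zeros((n, n), dtype=complex)
    for i in range(n):
        # T - lambda_i * I is singular: its (i, i) entry is exactly zero.
        M = T.astype(complex) - T[i, i] * np.eye(n)
        # The trick: add the equation x_i = 1 to row i of the system.
        M[i, i] += 1.0
        rhs = np.zeros(n, dtype=complex)
        rhs[i] = 1.0
        # Plain back substitution (the serial counterpart of a triangular solve).
        X[:, i] = solve_triangular(M, rhs, lower=False)
    return X


def demo(n=6, seed=0):
    """Relative residual ||A V - V diag(lambda)|| / ||A|| for a random complex A."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    T, Q = schur(A, output="complex")   # A = Q T Q^H with T upper triangular
    X = triangular_eigenvectors(T)      # eigenvectors of T
    V = Q @ X                           # back-transform to eigenvectors of A
    lam = np.diag(T)
    return np.linalg.norm(A @ V - V * lam) / np.linalg.norm(A)
```

With distinct eigenvalues, rows below i force the trailing components to zero, row i then gives x_i = 1, and the remaining rows are an ordinary back substitution. If two eigenvalues coincide or nearly coincide, the modified matrix is singular or badly conditioned, and a scaled solver becomes necessary.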