I'm running a ScaLAPACK code to solve for the eigenvectors and eigenvalues of symmetric matrices of order 2000 x 2000, using the series of subroutines pdsytrd, pdstebz, pdstein, and pdormtr. I have the program set up correctly, but the performance seems to only improve up to about 8 processors; using more than 9 processors produces the same or even longer runtimes. I am using the grid geometries recommended in the user's guide (1 x np for np < 9, and square grids for np >= 9) and have tried blocking factors of 20, 64 (the value recommended in the user's guide), and 100, but the performance improvement seems to top out at 8 processors. Are there any changes I can make or tricks I can use to improve the scalability?

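If you keep the matrix size constant and keep increasing the number of processors, the time to solution will decrease at first, but not forever. At some point it will stagnate and then start to increase, because each processor has less and less computation to do while the communication cost does not shrink with it. That's completely normal.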

For a 2000 x 2000 matrix, it is quite plausible that the optimal number of processors for bisection and inverse iteration is about 8.

To improve the scalability, you need to increase the size of your matrix.
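
To make this concrete, here is a rough toy model in Python. It is only a sketch and not a measurement of ScaLAPACK: I am assuming the run time is a compute term proportional to n^3 / p plus a latency term proportional to n * log2(p), and the constants C_FLOP and C_LAT are made up for illustration. The shape of the curve is the point, not the actual numbers.

```python
import math

# Hypothetical cost constants, chosen only for illustration (not measured).
C_FLOP = 1e-9   # assumed seconds per unit of compute work
C_LAT = 1e-4    # assumed seconds per communication step


def model_time(n, p):
    """Toy run-time model: the compute term shrinks with p, the latency term grows."""
    compute = C_FLOP * n**3 / p
    communication = C_LAT * n * math.log2(p)
    return compute + communication


def best_procs(n, max_p=1024):
    """Processor count in 1..max_p that minimizes the toy model for order n."""
    return min(range(1, max_p + 1), key=lambda p: model_time(n, p))


if __name__ == "__main__":
    for n in (2000, 4000, 8000):
        p = best_procs(n)
        print(f"n={n}: model optimum at p={p}, time={model_time(n, p):.3f}")
```

In this model the best processor count for a fixed n is finite, and it moves up as n grows, which is why a larger matrix scales to more processors.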