I recently ran routine ZHEEVD (LAPACK 3.3) and noticed very poor performance. When using the work space size given by a query to ZHEEVD, it looks like the call to routine ZUNMTR is not given enough work space for optimal performance. This does NOT happen for DSYEVD! Because of the lack of work space, the routine can run significantly slower (by a factor of 2 or more). Maybe someone wants to give it a quick try.

As a related problem: I noticed similar behavior in SSYEVR (LAPACK 3.4.2). For this routine, the call to SORMTR seems to be rather slow. It might be the same problem of not providing optimal work space, but I am too lazy to check.

The following is what I understand from reading the source code; it could be wrong.

ZHEEVD is being lazy and only asking for the optimal ZHETRD workspace size, without considering the size desired by ZUNMTR. If you use an unchanged ILAENV in Netlib LAPACK, these two happen to want the same block size, so if it were as simple as that, there should be no performance difference. The problem, however, is that by the time ZUNMTR gets called, there is only a size N workspace remaining, so it essentially uses the completely unblocked code. The computation of the optimal workspace is wrong: around line 290 of ZHEEVD, it compares the minimum size LWMIN with N + ZHETRD_opt, when it really should use something more like LWMIN + ZHETRD_opt - N.
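
A rough sketch of that accounting in Python, in case the arithmetic helps. It assumes JOBZ = 'V', where the documented minimum is LWMIN = 2*N + N**2, and it assumes (from my reading of the source, so it could be wrong) that ZHEEVD sets aside N entries for TAU and N**2 entries for the eigenvector copy before passing whatever is left to ZUNMTR. The function names, N = 1000, and the block size of 32 are made up for illustration.

```python
def lwmin_jobz_v(n):
    """Documented minimum LWORK for ZHEEVD with JOBZ = 'V' and N > 1."""
    return 2 * n + n * n


def left_for_zunmtr(n, lwork):
    """Workspace remaining when ZUNMTR is called.

    Assumption: N entries are used for TAU and N*N for the eigenvector copy.
    """
    return lwork - n - n * n


def reported_opt(n, zhetrd_opt):
    """Size compared in the query as described above: max(LWMIN, N + ZHETRD_opt)."""
    return max(lwmin_jobz_v(n), n + zhetrd_opt)


def suggested_opt(n, zhetrd_opt):
    """Size suggested above instead: LWMIN + ZHETRD_opt - N."""
    return lwmin_jobz_v(n) + zhetrd_opt - n


n = 1000          # hypothetical matrix order
nb = 32           # hypothetical block size from ILAENV
zhetrd_opt = n * nb

# With the reported size, ZUNMTR is left with only N entries.
print(left_for_zunmtr(n, reported_opt(n, zhetrd_opt)))   # 1000

# With the suggested size, ZUNMTR gets N*NB entries, enough for blocked code
# if it wants the same block size as ZHETRD.
print(left_for_zunmtr(n, suggested_opt(n, zhetrd_opt)))  # 32000
```

Since N**2 dominates for any realistic N, the max() always picks LWMIN in this sketch, which is why the ZHETRD term never shows up in the reported size.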

As a temporary measure, it seems you can take the optimal size returned by the ZHEEVD query, add the optimal size returned by a ZUNMTR query, subtract N, and the result should be "optimal".
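
Written out as a tiny Python helper, the workaround is just the following. The two "opt" inputs would come from workspace queries (LWORK = -1) to ZHEEVD and ZUNMTR; the helper name and the numbers below are placeholders, not measured values.

```python
def padded_lwork(n, zheevd_opt, zunmtr_opt):
    """Workaround size: ZHEEVD query result + ZUNMTR query result - N."""
    return zheevd_opt + zunmtr_opt - n


# Hypothetical query results for N = 1000.
n = 1000
zheevd_opt = 2 * n + n * n   # what the ZHEEVD query would report if it returns LWMIN
zunmtr_opt = 32 * n          # assumed ZUNMTR query result (N times a block size of 32)

lwork = padded_lwork(n, zheevd_opt, zunmtr_opt)
print(lwork)  # 1033000
```

You would then allocate WORK with this length and pass it as LWORK to ZHEEVD.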