Thanks for pointing this out. I believe you are referring to the master thesis of Jack Poulson for PDSYNTRD, and the paper "Accumulating Householder Transformations, Revisited" from Joffrain, Low, Quintana-Ortí, van de Geijn, and Van Zee for the "faster Householder algorithm".

Well, yes, we might definitely want to consider adding these. I am not too familiar with the work personally.

My master's thesis included work on an algorithm similar to PDSYNGST, not PDSYNTRD, but, at the time, I was unfortunately not aware of the existence of the work of Sears/Stanley/Henry and the resulting PDSYNGST/PZHENGST algorithm. Since ScaLAPACK already contains the lower-triangular routines, it is simply a matter of transcription to write the upper triangular routines. Because the upper triangle is the conjugate-transpose of the lower-triangle, simply conjugate-transposing each lower-triangular update yields the upper-triangular update, for both PDSYNGST/PZHENGST and PDSYNTRD/PZHENTRD.

This is simply A21 := A21 tril(A11)^{-H}, and so the conjugate-transpose update is A21^H := tril(A11)^{-1} A21^H, which is clearly equivalent to A12 := triu(A11)^{-H} A12. That update is translated back into PBLAS as

This is obviously a simple case (and forgive me if I translated back into PBLAS incorrectly), but the idea is pretty simple: conjugate-transpose the updates and you can easily translate a lower-triangular storage symmetric routine into an upper-triangular storage version.

This approach would require barely any work for PDSYNGST/PZHENGST, but PDSYNTRD/PZHENTRD is quite detailed...

As for the UT transform blocked Householder application algorithm, it relies on having computed 'true' (generalized) Householder reflectors. In particular, since identity is not a Householder transform, the identity shortcuts would have to be removed in favor of a legal reflection of the form:H = I - 2 [1; zeros(n-1,1)] [1 zeros(1,n-1)], so that H [alpha11; 0] = [-alpha11; 0].