NERSC Initiative for Scientific Exploration (NISE) 2011 Awards

Improving Scalability of an Eigensolver for the Kohn-Sham Electronic Structure Problem

Grady Schofield, University of Texas at Austin

Associated NERSC Project: Computational Study of Electronic and Structural Properties and Functionalization of Nanostructures (m433)

Principal Investigator: James Chelikowsky, University of Texas at Austin

NISE Award:

1,000,000 Hours

Award Date:

March 2011

In the Kohn-Sham electronic structure problem, the electron density of a system of atoms is computed from the eigenvectors of the Kohn-Sham equation, where one eigenvector must be computed for each valence electron in the system. For large systems, this makes it necessary to form an approximation subspace composed of many vectors. The algorithm in our code, PARSEC, that forms this approximation space creates what is essentially a type of Krylov subspace. We have noticed that even for moderately sized systems, beginning the iterative process with a group of vectors, rather than a single vector, does not degrade the performance of the algorithm: the final approximating subspace does not need to be substantially larger than the space created from a single starting vector. In fact, when the potentials in the Kohn-Sham equation induce nearly degenerate states, i.e., states whose eigenvalues agree to ten or so digits, as occurs in systems with a high degree of symmetry (crystals, for instance), beginning the subspace creation process with multiple starting vectors improves the robustness of the algorithm in resolving the nearly degenerate subspaces to high accuracy.
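The advantage of a block start in the presence of degeneracy can be illustrated with a small NumPy sketch (the matrix, its size, and the function names below are purely illustrative, not PARSEC's actual discretization): a Krylov space built from a single starting vector intersects a degenerate eigenspace in at most one direction, while a block of starting vectors can capture the whole eigenspace.

```python
import numpy as np

# Illustrative 6x6 symmetric matrix whose two lowest eigenvalues are
# exactly degenerate (a toy stand-in for a symmetric system's Hamiltonian).
rng = np.random.default_rng(2)
Q_rand, _ = np.linalg.qr(rng.standard_normal((6, 6)))
H = Q_rand @ np.diag([1.0, 1.0, 3.0, 4.0, 5.0, 6.0]) @ Q_rand.T

def krylov_basis(H, B, steps, tol=1e-10):
    """Orthonormal basis of the (block) Krylov space span{B, HB, H^2 B, ...}."""
    blocks = [B]
    for _ in range(steps):
        blocks.append(H @ blocks[-1])
    U, s, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return U[:, s > tol * s[0]]  # drop numerically dependent directions

def ritz_values(H, Q):
    """Rayleigh-Ritz eigenvalue estimates on the subspace spanned by Q."""
    return np.linalg.eigvalsh(Q.T @ H @ Q)

K_single = krylov_basis(H, rng.standard_normal((6, 1)), steps=5)  # one start vector
K_block  = krylov_basis(H, rng.standard_normal((6, 2)), steps=5)  # block of two

# The single-vector space sees the degenerate eigenvalue 1.0 only once;
# the block-start space resolves both copies.
n_single = np.sum(np.isclose(ritz_values(H, K_single), 1.0, atol=1e-6))
n_block  = np.sum(np.isclose(ritz_values(H, K_block),  1.0, atol=1e-6))
```

Here the single-vector Krylov space saturates at dimension five, one direction per distinct eigenvalue, so Rayleigh-Ritz on it reports the degenerate eigenvalue only once, whereas the block start recovers the full two-dimensional degenerate eigenspace.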

In the solution of the eigenproblem, we are interested in a sufficiently large number of the eigenpairs with the smallest eigenvalues. To create the approximation subspace for these eigenvectors, we apply an operator that enhances the eigencomponents of a vector at the low end of the spectrum. This operator is a high-degree polynomial in the Hamiltonian matrix, and its application to a vector is computationally intensive. Attempts to decrease the time to solution by adding more processors to a given job are eventually hindered by the communication overhead of the matrix-vector multiplication routine. As long as system size is scaled up with the number of processor cores, PARSEC scales well, and decreasing the time to solution by increasing the number of processors works well. However, certain applications require the code to run for a very long time on a relatively small system, which leads to a small Hamiltonian. This is the case in molecular dynamics simulations, where many thousands of time steps must be taken sequentially, solving the eigenproblem at each step. Trying to decrease the time to solution by parallelizing the matrix-vector multiplication of such small Hamiltonian matrices is like trying to split a single floating-point operation over multiple processor cores: communication overhead quickly dominates.
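The filtering step can be sketched in a few lines. A standard choice for such a polynomial filter is a Chebyshev recurrence in the Hamiltonian; the NumPy example below (matrix size, degree, and function names are illustrative, not taken from PARSEC) damps eigencomponents inside an unwanted interval [a, b] while amplifying those below a.

```python
import numpy as np

def chebyshev_filter(H, X, degree, a, b):
    """Apply a degree-m Chebyshev polynomial of H to the block X,
    damping eigencomponents with eigenvalues in [a, b] and amplifying
    those below a (a standard filtered-subspace-iteration construction)."""
    e = (b - a) / 2.0            # half-width of the damped interval
    c = (b + a) / 2.0            # its center
    Y = (H @ X - c * X) / e      # degree-1 Chebyshev term T_1
    X_prev = X                   # degree-0 term T_0
    for _ in range(2, degree + 1):
        # three-term recurrence: T_{k} = 2 t T_{k-1} - T_{k-2}
        Y_new = 2.0 * (H @ Y - c * Y) / e - X_prev
        X_prev, Y = Y, Y_new
    return Y

# Small demonstration on a random symmetric matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = (A + A.T) / 2
evals, V = np.linalg.eigh(H)

# Damp everything above the 5 smallest eigenvalues.
X = rng.standard_normal((50, 5))
Y = chebyshev_filter(H, X, degree=10, a=evals[5], b=evals[-1])

# Fraction of the filtered block lying in the 5 lowest eigendirections.
low_frac = np.linalg.norm(V[:, :5].T @ Y) / np.linalg.norm(Y)
```

After a single degree-10 filter application the block is dominated by the low-end eigencomponents; in practice such filtered blocks are then orthonormalized and refined by Rayleigh-Ritz.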

We propose to further develop the block subspace creation algorithm by assigning the application of each filter operator to a separate set of processors, so that filtering multiple starting vectors takes the wall-clock time of filtering only one. This development would increase the scalability of the code in any chemical scenario with anywhere from one hundred to many thousands of valence electrons. Moreover, in the small-system regime this parallelization scheme would provide a means to decrease the time to solution in important computationally intensive scenarios that involve smaller Hamiltonians, such as long molecular dynamics runs, and in computations used to bootstrap GW calculations for excited states, where a large number of eigenvectors associated with non-valence electrons must be calculated but the systems are usually small.
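The structure of the proposed scheme, each filter application running on its own group of processors, can be caricatured as follows. The sketch uses a Python thread pool as a stand-in for independent MPI processor groups, and a simple power-type filter as a stand-in for the polynomial filter (it enhances the dominant end of the spectrum rather than the low end, which is immaterial to the parallel structure); all names and sizes are illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def apply_filter(H, X, degree=8):
    """Stand-in for the polynomial filter: repeated application of H
    to a block of starting vectors (a plain power-type filter for brevity)."""
    for _ in range(degree):
        X = H @ X
        X = X / np.linalg.norm(X, axis=0)  # keep column norms bounded
    return X

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
H = (A + A.T) / 2
blocks = [rng.standard_normal((200, 4)) for _ in range(4)]

# Each block of starting vectors is filtered independently, so the filter
# applications can run concurrently; the worker threads here stand in for
# the separate processor groups of the proposed scheme.
with ThreadPoolExecutor(max_workers=4) as pool:
    filtered = list(pool.map(lambda X: apply_filter(H, X), blocks))

subspace = np.hstack(filtered)   # combined approximation subspace
Q, _ = np.linalg.qr(subspace)    # orthonormalize before Rayleigh-Ritz
```

Because the blocks never communicate during filtering, the wall-clock cost of the filtering stage is that of a single block, which is the point of placing each filter application on its own processor group.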

PARSEC is a well-developed, heavily parallelized electronic structure code with three layers of parallelization built into it. It performs the computations for spin up/down, multiple k-points, and multiple mirror images of symmetric systems simultaneously across independent groups of processors. It parallelizes matrix-vector multiplication by distributing vectors across processors. It also breaks the spectrum into slices and computes each slice across independent groups of processors using windowing filter operators. Adding another layer of parallelization is a substantial development effort whose testing and debugging will require substantial computing resources, given the nature of the feature we propose to add. PARSEC is freely available to the scientific community, and investment in its development will meet the NISE program's objectives by improving the scalability of the code and making it easier for scientists to carry out computationally intensive calculations. This research would allow our electronic structure code to scale to a number of processors at least an order of magnitude larger than is currently possible. The goal is to improve the scaling of PARSEC over a wide range of system sizes, including MD simulations where many thousands of time steps must currently be taken sequentially.