Abstracts

Performance analysis of various parallelization methods for BLAS3 routines on cluster architectures

Timo Betcke, TU Hamburg-Harburg
Adviser: Inge Gutheil (ZAM)

Traditional parallel computers are based either on the shared-memory or on the distributed-memory model. The new ZAMpano cluster of the Central Institute for Applied Mathematics at Research Center Jülich combines both models of memory management. Within each node of the cluster, either OpenMP, a thread-based parallelization technique for shared-memory systems, or MPI, which is based on message passing between separate processes, can be used; between nodes, data are exchanged via MPI. Linear algebra operations can therefore be parallelized in two ways: by using MPI communication exclusively, even when processors on the same node share data, or by using a hybrid data distribution model with OpenMP within a node and MPI between nodes. In this article, the performance of both parallelization models is analyzed on the ZAMpano cluster and on an HP 9000 N-Class Enterprise cluster at the Technical University of Hamburg-Harburg by measuring the performance of the ScaLAPACK matrix-multiplication routines and of my own OpenMP routines based on BLAS3. The goal is to create an efficient BLAS3 matrix multiplication that takes advantage of both OpenMP and MPI.
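
The hybrid model can be pictured as a two-level row-block decomposition of C = A·B. The Python sketch below is only a sequential stand-in, not MPI or OpenMP code: the outer loop plays the role of the MPI processes (one contiguous row block of A per node, with B assumed to be replicated on every node), and a ThreadPoolExecutor plays the role of the OpenMP threads inside a node. All function names are hypothetical, and the 1D row distribution is a simplification of ScaLAPACK's 2D block-cyclic layout.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) blocks of near-equal size."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def multiply_rows(a, b, start, stop):
    """Rows start..stop-1 of C = A*B: the work assigned to one thread."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(start, stop)]


def hybrid_matmul(a, b, nodes, threads):
    """C = A*B with a two-level (node, thread) row-block decomposition."""
    c = []
    # Outer level: one contiguous row block of A per node (the MPI level).
    for n_start, n_stop in split_range(len(a), nodes):
        # Inner level: the node's rows are shared among threads (the OpenMP level).
        with ThreadPoolExecutor(max_workers=threads) as pool:
            futures = [pool.submit(multiply_rows, a, b, n_start + lo, n_start + hi)
                       for lo, hi in split_range(n_stop - n_start, threads)]
            for f in futures:
                c.extend(f.result())  # collect the row blocks in order ("gather")
    return c
```

Because of CPython's global interpreter lock, the threads here bring no real speedup; the sketch only shows how the index ranges are divided between the node level and the thread level.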

The Ewald Summation method: Calculating long-range interactions

Zoi Cournia, Athens
Adviser: Godehard Sutmann (ZAM)

An implementation of the Ewald summation method for handling long-range interactions is presented. An accurate description of long-range electrostatic forces is necessary for realistic simulations of the structure and dynamics of polar or charged chemical systems. For such systems, treated under periodic boundary conditions, the Ewald summation method is known to give the electrostatic interactions with controllable accuracy. The algorithm has been implemented as a stand-alone Fortran 90 module that receives all relevant information through parameter lists from the calling program and can therefore easily be included in existing Molecular Dynamics or Monte Carlo programs. As an example, it has been added to the parallel program DMMD, developed at the Central Institute for Applied Mathematics (ZAM) at Forschungszentrum Jülich. The module has been tested on a molten-salt system.
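
To make the splitting concrete, the sketch below computes the Ewald energy of a neutral system in a cubic periodic box, assuming Gaussian units with the Coulomb constant set to 1 and conducting (tin-foil) boundary conditions. It is a plain-Python illustration with hypothetical names and simple cubic cutoffs, not the Fortran 90 module described above. The energy is the sum of a short-range real-space part of erfc-screened pair terms, a reciprocal-space part over wave vectors k = 2πm/L, and the self-energy correction -α/√π Σ q_i². The cutoffs `n_images` and `m_max` must be large enough for both sums to converge for the chosen α.

```python
import itertools
import math


def ewald_energy(positions, charges, box, alpha, n_images=1, m_max=5):
    """Ewald Coulomb energy of a neutral system in a cubic periodic box.

    Gaussian units with the Coulomb constant set to 1; conducting
    (tin-foil) boundary conditions, so no surface term is added.
    """
    n = len(positions)
    volume = box ** 3

    # Real-space part: erfc-screened pair terms over nearby periodic images.
    e_real = 0.0
    shifts = list(itertools.product(range(-n_images, n_images + 1), repeat=3))
    for i in range(n):
        for j in range(n):
            for s in shifts:
                if i == j and s == (0, 0, 0):
                    continue  # skip only the direct self-pair; images of i are kept
                d = [positions[i][a] - positions[j][a] + s[a] * box for a in range(3)]
                r = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
                e_real += 0.5 * charges[i] * charges[j] * math.erfc(alpha * r) / r

    # Reciprocal-space part: sum over k = 2*pi*m/L with m != 0.
    e_recip = 0.0
    for m in itertools.product(range(-m_max, m_max + 1), repeat=3):
        if m == (0, 0, 0):
            continue  # the k = 0 term is dropped for a neutral system
        k = [2.0 * math.pi * m[a] / box for a in range(3)]
        k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
        # Structure factor S(k) = sum_j q_j exp(i k.r_j), split into real/imag parts.
        phases = [k[0] * p[0] + k[1] * p[1] + k[2] * p[2] for p in positions]
        s_re = sum(q * math.cos(ph) for q, ph in zip(charges, phases))
        s_im = sum(q * math.sin(ph) for q, ph in zip(charges, phases))
        weight = (2.0 * math.pi / volume) * math.exp(-k2 / (4.0 * alpha ** 2)) / k2
        e_recip += weight * (s_re ** 2 + s_im ** 2)

    # Self-energy correction for the Gaussian screening charges.
    e_self = -alpha / math.sqrt(math.pi) * sum(q * q for q in charges)
    return e_real + e_recip + e_self
```

As a check, a rock-salt cell with unit nearest-neighbour distance (four Na+ at (0,0,0), (1,1,0), (1,0,1), (0,1,1) and four Cl- at (1,0,0), (0,1,0), (0,0,1), (1,1,1) in a box of side 2) should give an energy per ion pair close to minus the NaCl Madelung constant, about -1.7476, and the result should not change noticeably when α is varied within a reasonable range.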

Parallelization of the Multigrid Solver for Flows in Complex Geometries: FASTEST2D-LBR

Jun-Mei Shi, Erlangen
Advisers: Bernhard Steffen, Astrid Goeke (ZAM)

This paper documents the parallelization of the general flow and transport solver FASTEST2D-LBR. First, the numerical methods used in the code are explained. Then the essential parts of the parallelization are discussed in detail: partitioning of the block-structured grid, connection of the blocks, handling of data dependencies at block interfaces, the data structures used for communication, and the implementation of the communication in the code.
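
The handling of data dependencies at block interfaces can be illustrated on a toy problem. The sketch below uses hypothetical names and is not related to the actual FASTEST2D-LBR data structures: it splits a 2D grid into blocks along x, gives each block one ghost layer on each side, copies the neighbouring boundary layers into the ghost layers before every sweep (the step that MPI messages would perform in a parallel code), and then applies one Jacobi sweep for the Laplace equation to each block. With a correct halo exchange, the blocked result is identical to a sweep over the undivided grid.

```python
def split(n, parts):
    """Contiguous (lo, hi) index ranges of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, lo = [], 0
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def make_blocks(u, parts):
    """Each block owns x-slices lo..hi-1 plus one ghost slice on each side."""
    ny = len(u[0])
    return [{"lo": lo, "u": [[0.0] * ny] + [list(u[i]) for i in range(lo, hi)] + [[0.0] * ny]}
            for lo, hi in split(len(u), parts)]


def exchange_halos(blocks):
    """Copy boundary slices into the neighbours' ghost slices."""
    for left, right in zip(blocks, blocks[1:]):
        left["u"][-1] = list(right["u"][1])   # right's first owned slice
        right["u"][0] = list(left["u"][-2])   # left's last owned slice


def jacobi_block(block, nx, ny):
    """One Jacobi sweep on the block's owned interior points."""
    lo, old = block["lo"], block["u"]
    new = [list(col) for col in old]
    for li in range(1, len(old) - 1):
        gi = lo + li - 1                      # global x index
        if gi == 0 or gi == nx - 1:
            continue                          # Dirichlet boundary stays fixed
        for j in range(1, ny - 1):
            new[li][j] = 0.25 * (old[li - 1][j] + old[li + 1][j] + old[li][j - 1] + old[li][j + 1])
    block["u"] = new


def jacobi_global(u):
    """Reference sweep on the undivided grid."""
    nx, ny = len(u), len(u[0])
    new = [list(col) for col in u]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new


def gather(blocks):
    """Reassemble the owned slices (ghosts dropped) into one grid."""
    return [col for b in blocks for col in b["u"][1:-1]]
```

Running several sweeps on, say, three blocks, each sweep preceded by `exchange_halos`, and comparing `gather(blocks)` with repeated `jacobi_global` calls shows that the interface treatment reproduces the serial result exactly.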

Installation and Test of the DRAMA library on the ZAMpano cluster

Carsten Urbach, FU Berlin
Adviser: Bernd Körfgen (ZAM)

To solve large-scale problems on parallel computers, the work must be distributed evenly among the processors, because it is very inefficient to have one processor working while the others wait; in other words, the workload has to be balanced. In addition, parallel computations inevitably involve communication between the processors, so for good performance the communication must be distributed evenly among the processors as well.

There is thus always the task of partitioning the problem among the processors such that the calculation and communication costs are minimized. Specifying these costs requires a so-called cost model.
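
A cost model of this kind can be sketched in a few lines. The Python function below is a hypothetical example, not DRAMA's actual cost model: each element carries a computational weight, each pair of neighbouring elements assigned to different processors counts as one cut edge whose communication cost is charged to both processors, and the predicted run time is that of the most heavily loaded processor. The constants `t_calc` and `t_comm` are illustrative assumptions.

```python
def partition_cost(weights, edges, owner, nprocs, t_calc=1.0, t_comm=0.5):
    """Estimate per-processor cost, run time (max cost) and edge cut of a partition.

    weights: computational weight of each element
    edges:   pairs (u, v) of neighbouring elements
    owner:   owner[e] is the processor assigned to element e
    """
    calc = [0.0] * nprocs
    comm = [0.0] * nprocs
    for elem, w in enumerate(weights):
        calc[owner[elem]] += t_calc * w        # calculation load
    cut = 0
    for u, v in edges:
        if owner[u] != owner[v]:               # neighbours on different processors
            cut += 1
            comm[owner[u]] += t_comm           # both sides must send/receive
            comm[owner[v]] += t_comm
    per_proc = [c + m for c, m in zip(calc, comm)]
    return per_proc, max(per_proc), cut        # slowest processor dominates
```

For a chain of eight equal elements on two processors, a contiguous split costs 4.5 per processor, while an alternating split, although equally balanced in computation, costs 7.5 because all seven edges are cut. A partitioner therefore has to minimize both terms, not just balance the element count.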

If the distribution is done only once, e.g. before the computation, the speed of the partitioning algorithm is of minor importance. If, however, it has to be repeated several times during the calculation, as in adaptive mesh refinement or in dynamic processes such as crash-test simulations, the performance of the partitioning algorithm plays a major role.

The DRAMA library provides a common interface to several existing libraries of partitioning algorithms, coupled with a cost model for mesh-based parallel applications. Mesh-based means that a complex domain is discretized into (finite) elements represented by their nodes.
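
One simple and fast way to partition such a mesh is recursive coordinate bisection (RCB): the element centroids are sorted along the coordinate direction of largest extent and split so that each half receives a number of elements proportional to the number of parts assigned to it, and the procedure recurses on both halves. The sketch below is given only as an illustration of a geometric partitioner; it is not DRAMA's interface, it ignores element weights and communication, and the function name is hypothetical.

```python
def rcb(points, ids, nparts):
    """Recursive coordinate bisection of elements `ids` into `nparts` parts.

    points[i] is the centroid (a tuple of coordinates) of element i.
    Returns a list of `nparts` lists of element ids.
    """
    if not ids:
        return [[] for _ in range(nparts)]
    if nparts == 1:
        return [list(ids)]
    dim = len(points[ids[0]])
    # Cut perpendicular to the coordinate direction with the largest extent.
    extents = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
               for d in range(dim)]
    axis = extents.index(max(extents))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    # Split the element count in proportion to the parts on each side.
    left_parts = nparts // 2
    cut = len(ordered) * left_parts // nparts
    return (rcb(points, ordered[:cut], left_parts)
            + rcb(points, ordered[cut:], nparts - left_parts))
```

On a 4 × 4 grid of unit squares, partitioning into four parts returns the four 2 × 2 quadrants. RCB is cheap enough to rerun often, but its cuts follow coordinate directions only, so the resulting communication volume is usually higher than that of graph-based partitioners.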

The report is structured as follows. After the theoretical basis of the finite element method and the DRAMA library have been described, the performance of the various available algorithms and the functionality of the DRAMA cost model are discussed. A finite element code was implemented to test DRAMA within an application code, and first results of this test are presented.

Simulation of colloidal systems in aqueous solution

Software for simulating the dynamical behavior of a colloidal system in aqueous solution is presented. The target application is the precipitation of iron in water when an electrolyte is added to the suspension. The transport of the electrolyte is described by a continuous density field that obeys the diffusion equation, while the colloidal particles are modeled with molecular dynamics techniques. The two processes are coupled through electrostatic interactions, which makes the computation expensive. The simulation model and its numerical approximations are discussed in detail, and several parallelization strategies for the software are introduced. Finally, benchmark results are presented; among other things, they show that load balancing between the processing elements is necessary at low densities of colloidal particles.
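
The continuum part of such a model can be advanced with a standard explicit finite-difference scheme. The following is a minimal one-dimensional sketch, assuming periodic boundaries and a constant diffusion coefficient; the report's actual discretization is not specified here, the function name is hypothetical, and the electrostatic coupling to the colloids is omitted. The forward-time, centred-space (FTCS) update is stable only for D·Δt/Δx² ≤ 1/2, which the function checks before stepping.

```python
def diffuse(c, d_coef, dx, dt, steps):
    """Explicit FTCS steps of dc/dt = D * d2c/dx2 on a periodic 1D grid."""
    r = d_coef * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx^2 <= 1/2")
    n = len(c)
    for _ in range(steps):
        # c[i - 1] wraps to the last cell for i = 0; (i + 1) % n wraps at the end.
        c = [c[i] + r * (c[i - 1] - 2.0 * c[i] + c[(i + 1) % n]) for i in range(n)]
    return c
```

With a stable step, the total amount of electrolyte (the sum over all cells) is conserved up to rounding, and an initial spike spreads out symmetrically without producing negative densities, since each new value is a non-negative combination of old ones.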