Abstract

Scientific applications today address complex problems in nature. As a consequence, they place high demands on computational power and on an I/O infrastructure that provides high-performance access to data. Supercomputers satisfy these demands so that grand challenges can be tackled. From the operator's point of view, it is important to keep the available resources of such a multi-million-dollar machine busy, while the end user is concerned with the runtime of the application. It is therefore vital to avoid idle times in an application and congestion of the network and of the I/O subsystem, in order to ensure maximum concurrency and thus efficiency. Load balancing is a technique that tackles these issues. While load balancing has been evaluated in detail for the computational parts of a program, load imbalance in complex storage environments in High Performance Computing must be analyzed as well. Parallel file systems such as Lustre, GPFS, or PVFS2 are often deployed to meet the need for a fast I/O infrastructure. This thesis evaluates the impact of unbalanced workloads in such parallel file systems, using PVFS2 as an example, and extends the environment to allow dynamic (and adaptive) load balancing. Several cases leading to unbalanced workloads are discussed, namely unbalanced access patterns, inhomogeneous hardware, and rebuilds after crashes in an environment promising high availability. Important factors affecting performance are described; this makes it possible to build simple performance models with which the impact of such load imbalances can be assessed. Potential countermeasures for these unbalanced workloads are also discussed. While most cases could be alleviated by static load balancing mechanisms, dynamic load balancing seems important to compensate for environments with fluctuating performance characteristics.
The thesis designs and implements extensions to the software environment that provide capabilities to detect bottlenecks and to fix them by moving data from more heavily loaded servers to less heavily loaded ones. To this end, further mechanisms are integrated into PVFS2 that allow and support dynamic decisions by a load balancer to move data. A first heuristic is implemented using these extensions to demonstrate how they can be used to build a dynamic load balancer. Experiments are run with balanced as well as unbalanced workloads to show the server behavior. A few experiments with the developed load balancer in a real environment are also performed. The results reveal problematic issues and demonstrate that load balancing techniques can be applied successfully to increase productivity.