Department

Advisor(s)

Second Advisor

Keywords

Subject Categories

Computer Sciences

Abstract

It is often assumed that computational load balance cannot be achieved in parallel and distributed systems without the use of a priori domain knowledge, including precedence constraints and locality information. Hence, in distributed-memory architectures, locality maintenance and load balancing are seen as user-level activities involving compiler and runtime system support in software. Most efforts on locality-conscious data remapping for load balancing require the availability of a global data dependency graph. All such software schemes need an explicit phase for the remapping-system execution, during which the application execution is halted. These schemes view load balancing and locality maintenance as top-down problems in the sense that the user maps knowledge about domain-data interactions into the corresponding execution locality characteristics of architecture-level data abstractions.

The research reported in this dissertation takes a completely different approach, showing excellent prospects for a bottom-up automated approach. In this approach, locality/load-balance are seen as ultimately concerned with architecture-level abstractions. This dissertation presents the first (to our knowledge) architecture-level scheme for extracting locality concurrent with the application execution. An artificial neural network coprocessor is used for dynamically monitoring processor reference streams to learn temporally emergent utilities of data elements in ongoing local computations. Successful extraction of locality and load information at this level considerably eases the programming burden on the user.

The bottom-up approach presented in this dissertation facilitates use of kernel-level load balancing schemes, further easing the user programming burden. A kernel-level scheme migrates data to processor memories evincing higher utilities during load-balancing. This dissertation presents the successful implementation of a locality-conscious attached kernel scheduling algorithm in time-shared multi-processor systems. By resolving the notion of "tasks" according to our memory-mapped semantics, we integrate the unit of load balancing and locality maintenance. This allows our system to migrate lightweight memory units incrementally to balance computational load while maintaining locality properties. This approach also avoids the problems inherent in migrating heavyweight tasks, by taking advantage of current trends supporting data migration over process migration.
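As a rough illustration of the idea in the preceding paragraph, the following toy sketch migrates lightweight memory units from the most loaded processor to the least loaded one, preferring units whose estimated utility is higher at the destination. It is not the dissertation's algorithm: the function name `rebalance`, the `utility` table (a stand-in for what the coprocessor would learn), the per-unit `load` values, and the greedy selection rule are all assumptions made for illustration only.

```python
def rebalance(owner, utility, load, processors, tolerance=1.0):
    """Greedy toy rebalancer (hypothetical, for illustration only).

    owner:      dict unit -> processor currently holding the unit
    utility:    dict (unit, processor) -> assumed utility estimate
    load:       dict unit -> positive computational load tied to the unit
    processors: list of processor ids (may include idle processors)
    """
    owner = dict(owner)  # work on a copy, leave the caller's mapping intact
    proc_load = {p: 0.0 for p in processors}
    for unit, p in owner.items():
        proc_load[p] += load[unit]

    moves = []
    while True:
        src = max(processors, key=lambda p: proc_load[p])
        dst = min(processors, key=lambda p: proc_load[p])
        gap = proc_load[src] - proc_load[dst]
        if gap <= tolerance:
            break  # balanced enough
        # Only units smaller than the gap strictly reduce the imbalance.
        candidates = [u for u, p in owner.items()
                      if p == src and 0 < load[u] < gap]
        if not candidates:
            break
        # Prefer the unit that gains the most utility by moving to dst.
        best = max(candidates,
                   key=lambda u: utility.get((u, dst), 0.0)
                   - utility.get((u, src), 0.0))
        owner[best] = dst
        proc_load[src] -= load[best]
        proc_load[dst] += load[best]
        moves.append((best, src, dst))
    return owner, moves


if __name__ == "__main__":
    owner = {"a": 0, "b": 0, "c": 0, "d": 0}
    load = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
    utility = {("a", 0): 0.9, ("a", 1): 0.1, ("b", 0): 0.8, ("b", 1): 0.2,
               ("c", 0): 0.2, ("c", 1): 0.9, ("d", 0): 0.1, ("d", 1): 0.7}
    print(rebalance(owner, utility, load, [0, 1]))
```

In this small example all four units start on processor 0; the sketch moves the two units with higher utility on processor 1 and leaves the others in place, so load is evened out without discarding the locality information.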

Performance results from an execution-driven simulation evaluating the proposed coprocessor and kernel-level scheduler are presented for three applications. The applications chosen represent the range of load and locality fluxes encountered in parallel programs, with (a) static load and locality characteristics (Unstructured Mesh), (b) slowly varying localities for fixed dataset sizes (Barnes-Hut), and (c) rapidly fluctuating localities with slowly varying dataset sizes (WaTor). The WaTor application belongs to the class of dynamic and adaptive applications, which has been little explored by other researchers. The performance results indicate the viability and success of the coprocessor in concurrently extracting locality information about ongoing computations. Such locality information is used successfully by the kernel-level attached scheduling techniques in achieving and maintaining both computational locality and load balance. This is evidenced by improved individual application turnaround times as well as enhanced system throughput.

Access

Surface provides description only. Full text is available to ProQuest subscribers. Ask your Librarian for assistance.