Author

Date of Award

Degree Type

Degree Name

Department

Computer Science

First Advisor

Dr. Raheem Beyah - Committee Chair

Second Advisor

Dr. Anu Bourgeois - Committee Member

Third Advisor

Dr. Upkar Varshney - Committee Member

Abstract

We present a novel method for passive resource discovery in cluster grid environments where resources constantly utilize inter-node communication. Our method non-intrusively identifies resources that have available memory or CPU cycles; this is critical for lowering queue wait times in large cluster grid networks, and for memory-intensive cluster grid applications such as Gaussian (a computational chemistry package) and the Weather Research and Forecasting (WRF) modeling package. The benefits include: (1) low message complexity, (2) scalability, (3) load balancing support, and (4) low maintenance overhead. Using several test-beds (i.e., a small local test-bed and a 50-node Deterlab test-bed), we demonstrate the feasibility of our method with experiments utilizing TCP, UDP and ICMP network traffic. Using this technique, we observed a correlation between memory or CPU load and the timely response of network traffic. On nodes that are highly utilized because of multiprogramming, numerous active processes require context switching or paging. The latency associated with these context switches or paging manifests as a delay signature within the packet transmission process. Our method detects this delay signature to determine the utilization of network resources. The delay signature is the keystone that provides a correlation between network traffic and the internal state of the source node. We characterize the delay signature due to CPU utilization by (1) identifying the types of assembly language instructions that cause this delay and (2) describing how performance-enhancing techniques (e.g., instruction pipelining, caching) affect it, using the LEON3 processor implemented on a 40 MHz development board.
At the software level, results for medium-sized networks show that our method can consistently and accurately identify nodes with available memory or CPU cycles (< 70% availability). At the hardware level, our results show that excessive context switching in active applications increases the average memory access time, thus adding delay to the execution of LD instructions. Additionally, when these instructions are used internally to send network packets on heavily utilized nodes, they introduce the delay signature into network traffic.
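
The idea of inferring node load from packet timing can be illustrated with a minimal sketch. It is not the method evaluated in this work: the function names, the `baseline_gap` and `slack` parameters, and the sample timestamps are all hypothetical, and a simple mean-gap threshold stands in for whatever delay-signature detection the thesis actually uses.

```python
from statistics import mean


def inter_packet_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive packet arrival times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def classify_nodes(captures, baseline_gap, slack=1.5):
    """Label each node from passively observed packet arrival times.

    captures: dict mapping a node name to a list of arrival timestamps.
    baseline_gap: assumed mean gap for an idle node (hypothetical calibration).
    slack: assumed multiplier above which the extra delay is treated as load.
    """
    labels = {}
    for node, timestamps in captures.items():
        gaps = inter_packet_gaps(timestamps)
        if not gaps:
            # Fewer than two packets seen: no gap to measure.
            labels[node] = "unknown"
        elif mean(gaps) > slack * baseline_gap:
            labels[node] = "loaded"
        else:
            labels[node] = "available"
    return labels


# Illustrative, made-up arrival times (seconds) for three nodes.
captures = {
    "node-a": [0.000, 0.010, 0.020, 0.031],
    "node-b": [0.000, 0.025, 0.047, 0.080],
    "node-c": [0.000],
}
labels = classify_nodes(captures, baseline_gap=0.010)
```

Because the classifier only reads timestamps of traffic the nodes already exchange, it sends no probe messages, which is the property the abstract describes as low message complexity.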