A Management System for Service Level Agreements in Grid-Based Systems

Abstract

Grid-based systems have increased the opportunity for users to deploy and execute their applications using Grid resources. These resources vary in reliability and performability, particularly when demand is high. If a Grid application is executed at such times, its performance may suffer and its results may be delayed. To overcome this problem, application management is needed to support Quality of Service (QoS) requirements. The Distributed Aircraft Maintenance Environment (DAME) is an example of a Grid-based system in which users wish to attach QoS requirements to their applications.
In light of this, an adaptive SLA (Service Level Agreement) management system is presented which can interpret application requirements and deliver management through application adaptation. An SLA specification is presented which improves contract non-repudiation through elements that allow requirements and guarantees to be specified and provenance to be recorded. To predict the execution time of an application, a technique based on historical observations is proposed; this approach is well suited to Grid-based systems, which perform many runs of the same application. The prediction is combined with application monitoring to determine the progress made by the application at run-time. Progress is determined by comparing an estimate of the application's remaining execution time with an execution schedule. If progress is insufficient, a rule-based control algorithm infers control actions which adapt the behaviour of the application.
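The prediction, progress check and rule-based control loop described above can be illustrated with a minimal sketch. It assumes that the historical prediction is the mean of past run times, that monitoring reports the fraction of work completed, and that the execution schedule reduces to a single deadline. The function names, the 10% tolerance and the action labels ("continue", "adapt", "migrate") are hypothetical and do not come from the system itself.

```python
from statistics import mean


def predict_total_time(history: list[float]) -> float:
    """Predict total run time as the mean of past runs of the same application."""
    if not history:
        raise ValueError("at least one historical observation is required")
    return mean(history)


def estimate_remaining(predicted_total: float, elapsed: float, fraction_done: float) -> float:
    """Scale the historical prediction by the slowdown observed so far.

    Assumption: monitoring reports fraction_done, the share of work completed.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    expected_elapsed = predicted_total * fraction_done  # where history says we should be
    slowdown = elapsed / expected_elapsed                # > 1 means running slower than usual
    return predicted_total * (1.0 - fraction_done) * slowdown


# Hypothetical rule base, checked in order: the first rule whose condition
# holds for (slack, deadline) decides the control action.
RULES = [
    (lambda slack, deadline: slack >= 0.0, "continue"),               # on schedule
    (lambda slack, deadline: slack >= -0.1 * deadline, "adapt"),      # small overrun
    (lambda slack, deadline: True, "migrate"),                        # large overrun
]


def infer_action(elapsed: float, remaining: float, deadline: float) -> str:
    """Compare projected completion with the schedule and infer an action."""
    slack = deadline - (elapsed + remaining)
    for condition, action in RULES:
        if condition(slack, deadline):
            return action
    raise RuntimeError("rule base has no default rule")


if __name__ == "__main__":
    total = predict_total_time([100.0, 110.0, 90.0])  # mean of past runs
    remaining = estimate_remaining(total, elapsed=80.0, fraction_done=0.5)
    print(infer_action(80.0, remaining, deadline=120.0))
```

Keeping the rules as an ordered list of (condition, action) pairs makes the control policy easy to extend without altering the progress calculation; a real system would also supply rules for which adaptation to apply.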
Experimental analysis is conducted on a local Grid test-bed and on a large-scale Grid infrastructure, the White Rose Grid. The results show that the solution supports application executions with attached time or performance constraints, and that use of the system prevents application failure or delay. Migration is useful in reducing the execution time of applications when performance degradation occurs. Mechanisms for automated monitoring and provenance capture are also presented, both of which support the operation of the SLA management system.
Adaptive SLA management benefits the users of Grid-based systems such as DAME by providing Grid application management in support of QoS guarantees. This contrasts with current best-effort provision, which offers no such guarantee. The ability to provide these guarantees, together with an SLA specification, makes commercial exploitation of Grid-based systems more realistic.