QoS in Best-Effort Distributed Computing Infrastructures

Best-effort exploitation of Distributed Computing Infrastructures (BE-DCI) allows operators to maximize the usage of their infrastructures, and users to access the unused capacity at a lower cost. Because providers do not guarantee that computing resources remain available to the user for the complete execution of an application, BE-DCIs offer a diminished Quality of Service (QoS) compared to traditional infrastructures. By profiling the execution of Bag-of-Tasks (BoT) applications on several kinds of BE-DCIs, we observe that more than 20% of the BoTs see their task completion rate halved because of resource volatility.

We developed the SpeQuloS framework, which enhances the QoS of BoT applications executed on BE-DCIs by reducing their execution time, improving its predictability, and reporting an estimated completion time to users. SpeQuloS monitors the execution of the BoT on the BE-DCI and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT is executed. We investigated several strategies to decide when Cloud resources should be provisioned and how many. Performance evaluation using simulations shows that SpeQuloS can reduce BoT completion time by 26%. We obtained preliminary results after a complex deployment on part of the European Desktop Grid Infrastructure (EDGI). We are also investigating an alternative usage scenario in which SpeQuloS mixes EC2 Spot instances and regular instances to cut the total computing price while maintaining the same execution time.
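The provisioning decision described above can be illustrated with a minimal sketch. It is not the SpeQuloS implementation: the completion-threshold trigger, the default value of 0.9, and the names `BoTStatus`, `cloud_workers_to_start`, `credit_workers` and `tasks_per_worker` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BoTStatus:
    """Snapshot of a monitored Bag-of-Tasks (hypothetical structure)."""
    total_tasks: int
    completed_tasks: int


def cloud_workers_to_start(status, credit_workers, threshold=0.9, tasks_per_worker=1):
    """Return how many Cloud workers to start for the current BoT status.

    Assumed strategy: do nothing until the completed fraction reaches
    `threshold`, then request enough workers for the remaining tasks,
    capped by the number of workers the user's credit allows.
    """
    if status.total_tasks <= 0:
        return 0
    completion = status.completed_tasks / status.total_tasks
    if completion < threshold:
        return 0  # the BE-DCI is still expected to make good progress
    remaining = status.total_tasks - status.completed_tasks
    wanted = math.ceil(remaining / tasks_per_worker)
    return min(wanted, credit_workers)
```

With these assumptions, a BoT of 1000 tasks with 500 completed triggers no Cloud usage, while the same BoT with 950 completed and a credit of 20 workers requests 20 workers.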

Peer-to-peer routing for communications dependability

The extended use of IP networks for telecommunication leads to new dependability concerns. Incidents that hit the network and disturb the delivery of communications to users cannot always be prevented. Network recovery mechanisms are used to redirect communications to a non-failing part of the network. The goal of this process is to be fast enough to maintain good delivery of network services to users.

We studied a new kind of recovery mechanism based on peer-to-peer routing (also called overlay routing). This system uses a virtual network made of a set of network nodes that perform routing operations. Its advantages are that it can be used to recover any kind of IP communication and that it does not depend on the network infrastructure. It also allows end-to-end protection of a communication. This contrasts with usual recovery mechanisms, such as routing protocols, which only operate inside each of the networks used to forward a communication and do not cooperate with one another. Finally, this mechanism is deployed by network users, and can thus be adapted to the dependability needs of each of their communications.
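The idea of recovering a communication through an overlay node can be sketched as follows. This is a simplified illustration, not the system studied here: the single-relay restriction, the latency-based choice, and the names `recover_path`, `latency` and `failed_links` are assumptions.

```python
import math


def recover_path(src, dst, latency, failed_links=frozenset()):
    """Pick an overlay path from src to dst.

    latency: dict mapping a directed link (a, b) to a measured latency in ms.
    failed_links: set of directed links currently detected as failed.
    Returns [src, dst] when the direct path works, [src, relay, dst] when a
    single overlay relay can bypass the failure, or None otherwise.
    """
    def cost(a, b):
        if (a, b) in failed_links:
            return math.inf
        return latency.get((a, b), math.inf)

    # Keep the direct path as long as it is usable.
    if cost(src, dst) < math.inf:
        return [src, dst]

    # Otherwise try every other overlay node as a one-hop relay.
    relays = {node for link in latency for node in link} - {src, dst}
    best = min(sorted(relays), key=lambda r: cost(src, r) + cost(r, dst), default=None)
    if best is None or cost(src, best) + cost(best, dst) == math.inf:
        return None
    return [src, best, dst]
```

Because the relay is chosen among nodes controlled by users, the sketch needs no cooperation from the underlying networks, which is the property the paragraph above emphasizes.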

We studied and designed such a system, whose main goal is to enhance the dependability of users' communications when they are affected by an incident. We studied incident detection, which is the first step of any recovery operation. We introduced our peer-to-peer based network recovery system and its implementation. We evaluated its efficiency with experiments performed in various test beds and simulations, and we compared it to other recovery mechanisms.
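As an illustration of incident detection preceding recovery, here is a minimal probe-based detector. The detection method, the class name `ProbeFailureDetector` and the `max_missed` parameter are assumptions; the text above does not specify how detection is performed.

```python
class ProbeFailureDetector:
    """Suspect a path after several consecutive unanswered probes (assumed policy)."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # consecutive losses tolerated before suspicion
        self.missed = 0

    def record(self, reply_received):
        """Record one probe result; return True while the path is suspected failed."""
        # A reply resets the counter, a loss increments it.
        self.missed = 0 if reply_received else self.missed + 1
        return self.missed >= self.max_missed
```

A lower `max_missed` detects incidents faster at the price of more false alarms on lossy paths, which is one way the mechanism could be tuned to each user's dependability needs.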

We showed that our system allows fast communication recovery when users need it. Moreover, its resource consumption is moderate and proportional to users' needs. The system can significantly improve communications dependability when incidents hit the network, particularly if the recovery mechanisms deployed by network operators cannot provide the dependability level sought by a user.