Grid'5000 user report for

User information

Experiments

Large-scale performance tests (Distributed algorithms) (Programming) [in progress]

Description: Several consensus and atomic broadcast algorithms are tested on a distributed system with a large number of nodes to examine how well they scale as the size of the system increases. The number of nodes available in Grid'5000 allows us to perform experiments that would not have been possible in our laboratory. Furthermore, by running experiments across several Grid'5000 sites, we can also measure the influence of inter-site latency on the performance of the different algorithms.

Results:

Wide-area network performance tests (Networking) [in progress]

Description: The performance of several atomic broadcast algorithms is measured in various wide-area network settings (i.e. with a small number of nodes distributed over several sites). The large number of sites in Grid'5000 allows us to test several settings: by selecting which sites participate in an experiment, we can vary the latencies in the wide-area network and thus measure the effect of latency on the performance of the algorithms. This would not have been possible in a local-area network cluster or in a grid with few sites.

Results:

Publications

Since the introduction of the concept of failure detectors, several consensus and atomic broadcast algorithms based on these detectors have been published. The performance of these algorithms is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. Some algorithms reach decisions in few communication steps but require more messages to do so. Others save messages at the expense of an additional communication step to diffuse the decision to all processes in the system. This trade-off is heavily influenced by the network latency and the message processing times. Performance evaluations of these algorithms, in both simulated and real environments, have been published. These evaluations often consider a symmetrical setup: all processes are on the same network and have identical peer-to-peer latencies. In this paper, we evaluate the performance of three consensus and atomic broadcast algorithms using failure detectors in several wide area networks. We specifically focus on the case of a system with three processes, two of which are on a local area network and the third on a distant site, and examine how this setting affects the performance of all three algorithms.
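The trade-off described in the abstract can be illustrated with a rough cost model. The sketch below is not taken from the paper: it assumes that each communication step costs one network latency, that messages are processed sequentially at a fixed cost, and that the two algorithm profiles (`FEW_STEPS`, `FEW_MESSAGES`) have the hypothetical step and message counts shown. Under these assumptions, it shows how the faster profile changes as latency grows relative to processing time.

```python
def decision_time(steps, messages, latency_ms, processing_ms):
    """Rough estimate of the time to reach a decision (illustrative model only).

    Assumes each communication step costs one network latency and each
    message costs one fixed processing slot, handled sequentially.
    """
    return steps * latency_ms + messages * processing_ms


# Hypothetical profiles chosen for illustration, not measured values.
FEW_STEPS = {"steps": 2, "messages": 12}     # fewer steps, more messages
FEW_MESSAGES = {"steps": 3, "messages": 6}   # extra step, fewer messages


def faster(latency_ms, processing_ms):
    """Return the name of the profile with the lower estimated decision time."""
    t_steps = decision_time(latency_ms=latency_ms,
                            processing_ms=processing_ms, **FEW_STEPS)
    t_msgs = decision_time(latency_ms=latency_ms,
                           processing_ms=processing_ms, **FEW_MESSAGES)
    return "few_steps" if t_steps < t_msgs else "few_messages"


if __name__ == "__main__":
    # Low latency (LAN-like): message processing dominates.
    print(faster(latency_ms=0.1, processing_ms=0.5))
    # High latency (WAN-like): communication steps dominate.
    print(faster(latency_ms=50.0, processing_ms=0.5))
```

With these assumed numbers, the message-saving profile is estimated to be faster at low latency, while the profile with fewer communication steps is faster once latency dominates. Real measurements depend on factors this model ignores, such as asymmetric latencies between sites.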

Author:

Ekwall, Richard and Schiper, André

Details:

http://infoscience.epfl.ch/search.py?recid=88139

Keywords:

Atomic broadcast; Wide area network; Performance evaluation

Unit:

LSR

Collaborations

Success stories and benefits from Grid'5000

Overall benefits

Extremely large number of nodes, which is useful for the scalability tests

Large number of sites in different geographical locations, which is useful for the performance tests that are affected by the latency between different sites.

Very recent, dual-processor compute nodes

Easy cluster management facilities, which facilitate the management of large test sets, as well as the reproducibility of the tests. The kadeploy suite is extremely useful in this aspect.

All of these benefits enable us to launch tests at a much larger scale than in our usual testbeds. Since our experiments measure the performance of large-scale distributed systems, these properties are essential.