Abstract

Large-scale parallel applications performing global synchronization may spend a significant amount of execution time waiting for the completion of a barrier operation. Consequently, numerous research works have focused on reducing the communication costs of synchronization primitives. However, so far there has been no exhaustive comparison of barrier algorithms. This paper will investigate significant representatives of this family of algorithms and evaluate their diverging characteristics, with the purpose of assessing their properties within the context of a specific scenario. The first part of this work will introduce four run-time complexity classes, to which all barrier algorithms are known to belong. Then, the LogP model will be used to analyze the behavior and predict the running time of a representative algorithm of each class. As these performance predictions will be scrutinized with the help of measurements conducted on original implementations based on the Open MPI framework, this work will show how to leverage the flexible component architecture of this new MPI implementation, which has proved to be an ideal research tool.