The Need for Language Support for Fault-Tolerant Distributed Systems

Cezara Dragoi, Thomas Henzinger, Damien Zufferey

Fault-tolerant distributed algorithms play an important role in many
critical/high-availability applications. These algorithms are
notoriously difficult to implement correctly, due to asynchronous
communication and the occurrence of faults, such as the network
dropping messages or computers crashing. Nonetheless, there is
surprisingly little language and verification support for building
distributed systems based on fault-tolerant algorithms. In this
paper, we present some of the challenges that a designer has to
overcome to implement a fault-tolerant distributed system. We then
review different models that have been proposed to reason about
distributed algorithms and sketch how such a model can form the basis
for a domain-specific programming language. Adopting a high-level
programming model can simplify the programmer's task and make the code
amenable to automated verification, while still compiling to
efficient executable code. We conclude by summarizing the current
status of an ongoing language design and implementation project that
is based on this idea.