SC IJPE 26

Hua Chai and Wenbing Zhao

Abstract:

In this paper, we discuss an often-ignored but very important issue: how to recover slow replicas quickly in a fault-tolerant system. Even when the replicas are deployed on identically equipped computing nodes, some replicas may lag behind under heavy load for various reasons. Recovering slow replicas quickly is important because failing to do so can result in reduced throughput, high jitter in end-to-end latency, and a reduced replication degree.