Parallel Computing, Failure Recovery, and Extreme Values

Abstract

A task of random size T is split into M subtasks of lengths T_1, …, T_M, each of which is sent to one of M parallel processors. Each processor may fail at a random time before completing its allocated task, and then has to restart it from the beginning. If X_1, …, X_M are the total task times at the M processors, the overall total task time is Z_M = max_{i=1,…,M} X_i. Limit theorems as M → ∞ are given for Z_M, allowing the distribution of T to depend on M. In some cases the limits are classical extreme value distributions; in others they are of a different type.
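To make the model concrete, the following is a minimal simulation sketch of the restart mechanism and of the maximum Z_M. It is not the paper's method. It rests on several illustrative assumptions that the abstract does not state: the task size is a fixed number, the split is equal (T_i = T/M), failure times are independent and exponential with a common rate, and a restart begins immediately after a failure. The names restart_time, simulate_z, and fail_rate are hypothetical.

```python
import random


def restart_time(task_len, fail_rate, rng):
    """Total time to finish one subtask under restart-from-scratch failures.

    Assumption (not from the source): times to failure are i.i.d.
    exponential with rate fail_rate, and restarts are instantaneous.
    """
    if fail_rate == 0:
        return task_len  # no failures, so the subtask runs once
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(fail_rate)
        if time_to_failure >= task_len:
            # This attempt completes before the next failure.
            return total + task_len
        # The attempt is lost; the work done so far is wasted.
        total += time_to_failure


def simulate_z(total_len, m, fail_rate, rng):
    """One draw of Z_M = max(X_1, ..., X_M) under an equal split T_i = T/M."""
    sub_len = total_len / m  # assumed equal split
    return max(restart_time(sub_len, fail_rate, rng) for _ in range(m))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_z(10.0, 5, 0.5, rng) for _ in range(10_000)]
    print("estimated mean of Z_M:", sum(draws) / len(draws))
```

Every draw of Z_M is at least T/M, since each processor must run its subtask once without interruption. Repeating the experiment for increasing M gives a rough empirical view of the distribution whose limit the paper studies.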
