Description

As the computing power of supercomputers continues to increase
exponentially the mean time between failures (MTBF) is
decreasing. Checkpoint-restart has historically been the method
of choice for recovering from failures. However, such methods
become increasingly inefficient as the time required to complete
a checkpoint-restart cycle approaches the MTBF. There is
therefore a need to explore different ways of making computations
fault tolerant. This thesis...[Show more] studies generalisations of the sparse
grid combination technique with the goal of developing and
analysing a holistic approach to the fault tolerant computation
of partial differential equations (PDEs). Sparse grids allow one
to reduce the computational complexity of high dimensional
problems with only small loss of accuracy. A drawback is the need
to perform computations with a hierarchical basis rather than a
traditional nodal basis. We survey classical error estimates for
sparse grid interpolation and extend results to functions which
are non-zero on the boundary. The combination technique
approximates sparse grid solutions via a sum of many coarse
approximations which need not be computed with a hierarchical
basis. Study of the combination technique often assumes that
approximations satisfy an error splitting formula. We adapt
classical error splitting results to our slightly different
convention of combination level.
Literature on the application of the combination technique to
hyperbolic PDEs is scarce, particularly when solved with explicit
finite difference methods. We show a particular family of finite
difference discretisations for the advection equation solved via
the method of lines has solutions which satisfy an error
splitting formula. As a consequence, classical error splitting
based estimates are readily applied to finite difference
solutions of many hyperbolic PDEs. Our analysis also reveals how
repeated combinations throughout the computation leads to a
reduction in approximation error.
Generalisations of the combination technique are studied and
developed at depth. The truncated combination technique is a
modification of the classical method used in practical
applications and we provide analogues of classical error
estimates. Adaptive sparse grids are then studied via a lattice
framework. A detailed examination reveals many results regarding
combination coefficients and extensions of classical error
estimates. The framework is also applied to the study of
extrapolation formula. These extensions of the combination
technique provide the foundations for the development of the
general coefficient problem. Solutions to this problem allow one
to combine any collection of coarse approximations on nested
grids. Lastly, we show how the combination technique is made
fault tolerant via application of the general coefficient
problem. Rather than recompute coarse solutions which fail we
instead find new coefficients to combine remaining solutions.
This significantly reduces computational overheads in the
presence of faults with only small loss of accuracy. The latter
is established with a careful study of the expected error for
some select cases. We perform numerical experiments by computing
combination solutions of the scalar advection equation in a
parallel environment with simulated faults. The results support
the preceding analysis and show that the overheads are indeed
small and a significant improvement over traditional
checkpoint-restart methods.