ASSURE: Automatic Software Self-healing Using REscue points

Software failures in server applications are a significant
threat to system availability. ASSURE is a
system that introduces rescue points to recover
software from unknown faults, while maintaining both system
integrity and availability, by mimicking system behavior under known
error conditions. Rescue points are locations in existing application
code that handle a given set of
programmer-anticipated failures; ASSURE automatically
repurposes and tests them to safely enable fault recovery
from a larger class of (unanticipated) faults. When a fault
occurs at an arbitrary location in the program, ASSURE
restores execution to an appropriate rescue point and induces the program
to recover execution by virtualizing the
program's existing error-handling facilities. Rescue points
are identified using fuzzing, implemented using a fast coordinated
checkpoint-restart mechanism that handles multi-process and
multi-threaded applications, and, after testing,
are injected into production code using binary patching. We
have implemented an ASSURE Linux prototype that operates without
application source code and without base operating system kernel
changes.
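
The rescue-point idea described above can be illustrated with a small, purely conceptual sketch. It is not ASSURE's implementation: the real system works on binaries and uses process-level checkpoint-restart, whereas this toy uses a deep copy of a dictionary as a stand-in for a checkpoint and a Python exception as a stand-in for an arbitrary fault. All names (RescuePoint, handle_request, parse_length, the -1 error value) are hypothetical and chosen only for illustration.

```python
import copy


class RescuePoint:
    """Toy model of a rescue point (hypothetical, not ASSURE's API).

    Wraps a function that already returns a known error value for
    failures its programmer anticipated.
    """

    def __init__(self, func, error_value):
        self.func = func
        self.error_value = error_value

    def __call__(self, state, *args):
        # Stand-in for a checkpoint taken on entry to the rescue point.
        snapshot = copy.deepcopy(state)
        try:
            return self.func(state, *args)
        except Exception:
            # Unanticipated fault somewhere below the rescue point:
            # roll the state back to the checkpoint ...
            state.clear()
            state.update(snapshot)
            # ... and make the function appear to fail in the way the
            # caller already knows how to handle.
            return self.error_value


def parse_length(request):
    # Unanticipated fault: an empty request raises ZeroDivisionError.
    return 100 // len(request)


def handle_request(state, request):
    if request is None:
        return -1  # failure the programmer anticipated
    state["served"] += 1
    state["last"] = request
    return parse_length(request)


state = {"served": 0, "last": None}
guarded = RescuePoint(handle_request, error_value=-1)

ok_result = guarded(state, "abcd")     # normal execution
fault_result = guarded(state, "")      # fault, rolled back, error returned
```

In this sketch the caller of guarded sees the same -1 it would see for an anticipated failure, and the partial updates made before the fault are undone, which is the behavior the abstract attributes to rescue points at a much coarser, whole-process granularity.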