ABHRANTA: Locating Bugs that Manifest at Large System Scales

A key challenge in developing large-scale applications (large in system size, input size, or both) is finding bugs that are latent at the small scales used for testing and manifest only when the program is deployed at large scales. Traditional statistical techniques fail here because no error-free run is available at deployment scale to train on. Prior work used scaling models to detect anomalous behavior at large scales without being trained on correct behavior at that scale. However, that work cannot localize bugs automatically. In this paper, the authors extend that work with an automatic diagnosis technique based on feature reconstruction, and validate their design through case studies with two real bugs, one from an MPI library and one from a DHT-based file-sharing application.
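
To make the idea of reconstruction-based localization concrete, the following is a minimal sketch under stated assumptions, not the authors' model. It assumes each behavioral feature grows linearly in the logarithm of the system scale, fits that trend from bug-free small-scale runs, extrapolates it to the deployment scale, and ranks features by how far the observed value falls from the extrapolated one. The feature names (`send_calls`, `reduce_iters`), the data, and the helpers `fit_line` and `rank_features` are hypothetical and chosen only for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rank_features(train_runs, observed, scale):
    """Rank features by relative reconstruction error at `scale`.

    train_runs: list of (scale, {feature: value}) from bug-free small runs.
    observed:   {feature: value} measured in the suspect large-scale run.
    Assumption (not from the paper): each feature is linear in log2(scale).
    """
    xs = [math.log2(s) for s, _ in train_runs]
    errors = {}
    for name, value in observed.items():
        ys = [feats[name] for _, feats in train_runs]
        a, b = fit_line(xs, ys)
        predicted = a + b * math.log2(scale)  # extrapolate to the large scale
        errors[name] = abs(value - predicted) / max(abs(predicted), 1e-9)
    # Largest deviation first: the most suspicious feature leads the list.
    return sorted(errors, key=errors.get, reverse=True)


# Hypothetical training data from small, bug-free runs (4, 8, 16 processes).
train = [
    (4, {"send_calls": 20.0, "reduce_iters": 2.0}),
    (8, {"send_calls": 30.0, "reduce_iters": 3.0}),
    (16, {"send_calls": 40.0, "reduce_iters": 4.0}),
]
# Hypothetical measurements from a suspect run at 1024 processes.
suspect = {"send_calls": 81.0, "reduce_iters": 40.0}

ranking = rank_features(train, suspect, 1024)
print(ranking)
```

In this toy setup no correct run at 1024 processes is needed: the expected values come purely from extrapolating the small-scale trend, and the feature that deviates most from its reconstruction points to the part of the program to inspect first.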