With the advent of ever larger parallel machines, debugging
is becoming increasingly challenging. In particular, applications
at this scale tend to behave non-deterministically and are prone to
race conditions. Furthermore, gaining access to such large machines
for long debugging sessions is generally infeasible. In this paper,
we present a three-step algorithm for what we call ``processor
extraction'': a procedure that records the execution of a set of
processors from a parallel application and can replay any of them in a
controlled environment. Our technique introduces very little
interference into the recorded program because it separates the
elimination of non-determinism from the detailed recording of
processors. To improve robustness and accuracy, we further
augment our algorithm with a self-correction mechanism.