Suffering-Focused AI Safety: In Favor of “Fail-Safe” Measures

Summary

AI-safety efforts focused on reducing suffering should place particular emphasis on avoiding risks of astronomical disvalue. Even among scenarios in which uncontrolled AI destroys humanity, outcomes might differ enormously in the amount of suffering they produce. Rather than concentrating all our efforts on one specific future we would like to bring about, we should identify the futures we least want to bring about and work on ways to steer AI trajectories away from them. A “fail-safe” approach to AI safety is especially promising because avoiding very bad outcomes might be much easier than making sure we get everything right. This cause is also neglected, even though there is broad consensus among different moral views that avoiding the creation of vast amounts of suffering in our future is an ethical priority.