Abstract: Consequential decisions are increasingly informed by sophisticated
data-driven predictive models. For accurate predictive models, deterministic
threshold rules have been shown to be optimal in terms of utility, even under a
variety of fairness constraints. However, consistently learning accurate models
requires access to ground truth data. Unfortunately, in practice, some data can
only be observed if a certain decision was taken. Thus, collected data always
depends on potentially imperfect historical decision policies. As a result,
learned deterministic threshold rules are often suboptimal. We address this
problem from the perspective of sequential policy learning. We first
show that, if decisions are taken by a faulty deterministic policy, the
observed outcomes under this policy are insufficient to improve it. We then
describe how this undesirable behavior can be avoided using stochastic
policies. Finally, we introduce a practical gradient-based algorithm to learn
stochastic policies that effectively leverage the outcomes of decisions to
improve over time. Experiments on both synthetic and real-world data illustrate
our theoretical results and show the efficacy of our proposed algorithm.
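
To make the setting concrete, the following is a minimal sketch of learning a stochastic decision policy when outcomes are observed only for positive decisions. It is not the algorithm proposed in the paper: it uses a plain score-function (REINFORCE-style) gradient estimate, and the one-dimensional feature, the outcome model sigmoid(2x), the decision cost of 0.5, the logistic policy form, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sample_population(rng, n):
    """Draw features x and outcomes y (assumed model: P(y=1|x) = sigmoid(2x))."""
    x = rng.normal(size=n)
    y = (rng.random(n) < sigmoid(2.0 * x)).astype(float)
    return x, y


def expected_utility(theta, cost, x):
    """Expected utility of the logistic policy pi(d=1|x) = sigmoid(theta[0]*x + theta[1]).

    Uses the assumed ground-truth outcome model, so it is only available
    for evaluation in this synthetic sketch, never to the learner.
    """
    pi = sigmoid(theta[0] * x + theta[1])
    return float(np.mean(pi * (sigmoid(2.0 * x) - cost)))


def train(theta, cost=0.5, steps=300, n=500, lr=2.0, seed=0):
    """Gradient ascent on utility using only outcomes of positive decisions."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        x, y = sample_population(rng, n)
        pi = sigmoid(theta[0] * x + theta[1])
        d = rng.random(n) < pi  # decisions sampled from the current policy
        # The outcome y enters only where d is True; a negative decision
        # yields utility 0 and its outcome is never used.
        reward = np.where(d, y - cost, 0.0)
        # For d = 1, grad of log pi(d=1|x) w.r.t. theta is (1 - pi) * [x, 1].
        score = 1.0 - pi
        grad = np.array([np.mean(reward * score * x), np.mean(reward * score)])
        theta += lr * grad
    return theta


if __name__ == "__main__":
    x_eval, _ = sample_population(np.random.default_rng(123), 20000)
    theta_init = [0.0, -2.0]  # hypothetical poor initial policy: rarely decides positively
    theta_final = train(theta_init)
    print("utility before:", expected_utility(theta_init, 0.5, x_eval))
    print("utility after: ", expected_utility(theta_final, 0.5, x_eval))
```

Because the policy assigns nonzero probability to a positive decision for every x, each round still yields labeled outcomes across the feature space, which is what allows the gradient estimate to correct a poor initial policy in this toy setting.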