Robust Submodular Observation Selection

Abstract

In many applications, one has to actively select among a set of
expensive observations before making an informed decision. For
example, in environmental monitoring, we want to select locations to
measure in order to most effectively predict spatial phenomena.
Often, we want to select observations which are robust against a
number of possible objective functions. Examples include minimizing
the maximum posterior variance in Gaussian Process regression,
robust experimental design, and sensor placement for outbreak
detection. In this paper, we present the Submodular
Saturation algorithm, a simple and efficient algorithm with strong
theoretical approximation guarantees for cases where the possible
objective functions exhibit submodularity, an intuitive
diminishing returns property. Moreover, we prove that better
approximation algorithms do not exist unless NP-complete
problems admit efficient algorithms. We show how our algorithm can
be extended to handle complex cost functions (incorporating non-unit
observation cost or communication and path costs). We also show how
the algorithm can be used to near-optimally trade off expected-case
(e.g., the Mean Square Prediction Error in Gaussian Process
regression) and worst-case (e.g., maximum predictive variance)
performance. We show that many important machine learning problems
fit our robust submodular observation selection formalism, and
provide extensive empirical evaluation on several real-world
problems. For Gaussian Process regression, our algorithm compares
favorably with state-of-the-art heuristics described in the
geostatistics literature, while being simpler, faster and providing
theoretical guarantees. For robust experimental design, our
algorithm performs favorably compared to SDP-based algorithms.
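
As an illustration of the diminishing returns property mentioned above, the following minimal sketch brute-force checks submodularity on a toy sensor-coverage problem. It then shows that the worst case over two submodular objectives (their pointwise minimum) need not itself be submodular, which is one way to see why the robust problem is harder than the single-objective one. The two "scenarios", the sensor names, and the helper functions `coverage` and `is_submodular` are hypothetical and chosen only for illustration; this is not the Submodular Saturation algorithm.

```python
from itertools import combinations


def coverage(regions, selected):
    """Number of distinct cells covered by the selected sensors."""
    covered = set()
    for sensor in selected:
        covered |= regions[sensor]
    return len(covered)


def is_submodular(f, ground_set):
    """Brute-force check of diminishing returns.

    f is submodular if, for all A subset of B and every s not in B,
    f(A + s) - f(A) >= f(B + s) - f(B).
    Exponential in |ground_set|, so only suitable for toy examples.
    """
    items = list(ground_set)
    subsets = [frozenset(c)
               for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for s in items:
                if s in b:
                    continue
                gain_small = f(a | {s}) - f(a)
                gain_large = f(b | {s}) - f(b)
                if gain_small < gain_large:
                    return False
    return True


# Hypothetical toy data: two scenarios, each with its own coverage map.
ground = {"a", "b"}
scenario_1 = {"a": {1}, "b": set()}   # only sensor a is useful
scenario_2 = {"a": set(), "b": {1}}   # only sensor b is useful


def f1(selected):
    return coverage(scenario_1, selected)


def f2(selected):
    return coverage(scenario_2, selected)


def worst_case(selected):
    """Robust objective: performance under the least favorable scenario."""
    return min(f1(selected), f2(selected))


f1_ok = is_submodular(f1, ground)
f2_ok = is_submodular(f2, ground)
worst_ok = is_submodular(worst_case, ground)
```

In this toy instance each coverage objective passes the check, while the worst-case objective fails it: adding sensor b to the empty set gains nothing, but adding it to {a} gains one unit, so the marginal gain increases rather than diminishes.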