Percentile objective criteria in limiting average Markov Control Problems

Abstract

Infinite horizon Markov Control Problems, or Markov Decision
Processes (MDPs, for short), have been extensively studied since
the 1950s. One of the most commonly considered versions is
the so-called "limiting average reward" model. In this model
the controller aims to maximize the expected value of the limit-average
("long-run average") of an infinite stream of single-stage
rewards or outputs. There are now a number of good algorithms
for computing optimal deterministic policies in limiting average
MDPs. In this paper we adopt the point of view that there are
many natural situations where the controller is interested in finding
a policy that achieves a specified target level of the long-run average
reward with a sufficiently high probability, that is, a percentile.
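The percentile criterion described above — asking for P(limit-average reward ≥ target) ≥ some percentile, rather than maximizing the expected limit-average — can be illustrated with a small Monte Carlo sketch. The example below is hypothetical and not taken from the paper: a three-state chain where state 0 branches once into one of two absorbing states, so the limit-average reward is a genuinely random quantity (1 with probability 0.7, else 0), and we estimate the probability of meeting a target level by simulation.

```python
import random

# Hypothetical example (not from the paper): from state 0 the chain moves to
# an absorbing "good" state 1 (per-step reward 1) with probability P_GOOD,
# else to an absorbing "bad" state 2 (per-step reward 0). The limit-average
# reward is therefore 1 with probability P_GOOD and 0 otherwise.
P_GOOD = 0.7


def long_run_average(rng, horizon=2000):
    """Simulate one trajectory and return its finite-horizon average reward,
    which approximates the limit-average for a long horizon."""
    state = 0
    total = 0.0
    for _ in range(horizon):
        if state == 0:
            state = 1 if rng.random() < P_GOOD else 2
        total += 1.0 if state == 1 else 0.0
    return total / horizon


def percentile_estimate(target, runs=5000, seed=0):
    """Monte Carlo estimate of P(limit-average reward >= target)."""
    rng = random.Random(seed)
    hits = sum(long_run_average(rng) >= target for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    # With target 0.5, the true probability is P_GOOD = 0.7; the controller
    # meets a 0.6 percentile requirement but not a 0.8 one.
    print(percentile_estimate(target=0.5))
```

A policy satisfies a percentile objective with target level 0.5 and percentile 0.6 exactly when this probability is at least 0.6; in this toy chain there are no decisions to make, but in a genuine MDP one would compare this quantity across candidate policies.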