The applications of performance potential to sensitivity analysis and optimization on Markov chains

This thesis is dedicated to the applications of performance potential in the sensitivity problems and decision problems on Markov chains. In the first chapter, we will briefly review the concept of performance potential and its important role in the perturbation analysis (PA) and Markov decision processes (MDP). In the second chapter, we obtain some further results for two basic single sample path based sensitivity analysis techniques for Markov chains: the potential based perturbation analysis (PA) and the likelihood ratio based ensemble average importance sampling (EAIS) method. We show that the two techniques can be derived from the same equation and therefore are closely related. A comparison between the implementations of these two techniques on a single sample path is interesting: In PA, state transitions are counted forward, and in EAIS, they are counted backward. We show that the fundamental equations in PA can be derived by applying EAIS. In this sense, the two techniques are equivalent. In the third chapter, we present a time aggregation approach to Markov decision processes. A sample path of a Markov process is divided into consecutive segments, each segment starts with a state in a subset S1 of the state space S. The starting points of all the segments form an embedded chain. The embedded chain is always Markov, which is in contrast to state aggregation which requires strict conditions for the aggregated chain to be Markov. Properly defining the performance function for this embedded chain by aggregating the performance of the original chain on the segment bewteen two consecutive embedded points, we can turn the problem of the policy iteration of the original Markov chain on the subset S1 into the same problem for the embedded Markov chain, which has a smaller state space. Iteratively applying this approach to a set of complementary subsets, we can turn the policy iteration of the original Markov chain into a series of policy iterations of the embedded chains with each subset as state space. Single sample path based estimation algorithms are developed; with these algorithms, the time aggregation approach can be implemented on-line for practical systems. A numerical example is presented to illustrate the ideas and possible computation savings.

Note

Thesis (M.Phil.)--Hong Kong University of Science and Technology, 1998