Abstract

Approximating stochastic processes by scenario trees is important in decision analysis. In this paper we focus on improving the quality with which smaller, tractable trees approximate a given tree. In particular, we propose and analyze an iterative algorithm to construct improved approximations: given a stochastic process in discrete time and starting with an arbitrary approximating tree, the algorithm improves both the probabilities on the tree and the related path values of the smaller tree, leading to significantly improved approximations of the initial stochastic process. The quality of the approximation is measured by the process distance (nested distance), which was introduced recently. For the important case of quadratic process distances the algorithm finds locally best approximating trees in finitely many iterations by generalizing multistage k-means clustering.

Notes

Acknowledgments

We wish to thank two anonymous referees for their constructive criticism and their dedication in reviewing the paper; their valuable comments significantly improved both the content and the presentation. Parts of this paper are addressed in the book Multistage Stochastic Optimization (Springer) by Pflug and Pichler, which also covers many more topics in multistage stochastic optimization and which had to be completed before the final acceptance of this paper.

Compliance with ethical standards

Funding

This research was partially funded by the Austrian Science Fund (FWF), project P 24125-N13, and by the Research Council of Norway, Grant 207690/E20.

Appendix 1: Scenario approximation with Wasserstein distances

Given a probability measure P we ask for an approximating probability measure located on \(\Xi ^{\prime }\), that is to say, a measure whose support is contained in \(\Xi ^{\prime }\). The following proposition reveals that the pushforward measure \(P^{{\mathbf {T}}}\), where the mapping \({\mathbf {T}}\) is defined in (ii) of the proposition, is the best approximation of P located on \(\Xi ^{\prime }\), i.e., \(P^{{\mathbf {T}}}\) satisfies

As addressed in the introduction, the approximation can be improved by relocating the scenarios themselves and by allocating adapted probabilities to these scenarios. The following two sections address these issues by applying Proposition 1.

Optimal probabilities

The optimal measure \(P^{{\mathbf {T}}}\) in Proposition 1 notably does not depend on the order r. Moreover, given a probability measure P, Proposition 1 (ii) allows one to find the best approximation located on finitely many points \(Q=\left\{ q_{1},\dots ,q_{n}\right\} \). The points \(q_{j}\in Q\) are often called quantizers, and we adopt this notion in what follows (see the œuvre of Gilles Pagès, e.g., Bally et al. (2005), for a comprehensive treatment).

According to Proposition 1 the best approximating measure for \(P=\sum \nolimits _{i}p_{i}\delta _{\xi _{i}}\), which is located on Q, is given by \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\). For a discrete measure this can be formulated by a linear program as

Observe as well that the matrix \(\pi ^{*}\) in (29) has just \(\left| \Xi \right| \) non-zero entries, as every row i of \(\pi ^{*}\) contains just one non-zero entry \(\pi _{i,j}^{*}\). This is a simplification in comparison with Remark 2, since the solution \(\pi \) of (4) has \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries if the probability measure \(P^{\prime }\) is specified.

Finally, given the support points Q, it is an easy exercise to look up the closest points according to (29) and to sum up their probabilities according to (30), so that the solution of (27), the closest measure to P located on Q, is immediately obtained as \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\).
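The lookup-and-sum step above can be sketched in a few lines. The following is a minimal illustration (not the paper's implementation), assuming a discrete measure given by atoms `xi` with probabilities `p` and fixed quantizers `q`; each atom sends its entire mass to its nearest quantizer, which is exactly the single non-zero entry per row of \(\pi ^{*}\).

```python
import numpy as np

def optimal_probabilities(xi, p, q):
    """Closest measure to P = sum_i p_i * delta_{xi_i} supported on the
    quantizers q: each atom sends its full mass to its nearest quantizer."""
    # distance from every atom xi_i to every quantizer q_j
    d = np.linalg.norm(xi[:, None, :] - q[None, :, :], axis=2)
    nearest = d.argmin(axis=1)       # the single non-zero column per row of pi*
    p_star = np.zeros(len(q))
    np.add.at(p_star, nearest, p)    # sum the probabilities per quantizer
    return p_star

# two atoms close to q_0 = 0, one close to q_1 = 1
xi = np.array([[0.0], [0.1], [1.0]])
p  = np.array([0.5, 0.25, 0.25])
q  = np.array([[0.0], [1.0]])
print(optimal_probabilities(xi, p, q))  # -> [0.75 0.25]
```

Note that the resulting probabilities are the same for every order r of the Wasserstein distance, as only the nearest-quantizer assignment matters.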

Optimal supporting points—facility location

Given the previous results on optimal probabilities, the problem of finding a sufficiently good approximation of P in the Wasserstein distance reduces to the problem of finding good locations Q, that is, to minimizing the function

Minimizing (33) with respect to the quantizers \(\left\{ q_{1},\dots ,q_{n}\right\} \) is often referred to as facility location, as in Drezner and Hamacher (2002). This problem is not convex and no closed-form solution exists in general; it hence has to be handled with adequate numerical algorithms. Moreover, it is well known that facility location problems are NP-hard.

For the important case of the quadratic Wasserstein distance, Proposition 1 (iii) and its proof give rise to an adaptation of the k-means clustering algorithm [also referred to as Lloyd’s algorithm, cf. Lloyd (1982)], which is described in Algorithm 2. In this case the conditional average is the best approximation in terms of the Euclidean norm, so that the algorithm terminates after finitely many iterations at a local minimum.
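A minimal sketch of this weighted k-means iteration may clarify the alternation; it is an illustration of the general scheme, not the paper's Algorithm 2 verbatim. Atoms are reassigned to their nearest quantizer, and each quantizer is then replaced by the conditional (probability-weighted) average of its cluster, which is optimal for the quadratic Wasserstein distance.

```python
import numpy as np

def lloyd(xi, p, q0, max_iter=100):
    """Weighted k-means (Lloyd's algorithm) for the quadratic Wasserstein
    distance: alternate nearest-quantizer assignment with probability-
    weighted centroid updates until the quantizers stop moving."""
    q = q0.astype(float).copy()
    for _ in range(max_iter):
        d = np.linalg.norm(xi[:, None, :] - q[None, :, :], axis=2)
        nearest = d.argmin(axis=1)          # assignment step, cf. (29)
        q_new = q.copy()
        for j in range(len(q)):
            mask = nearest == j
            if p[mask].sum() > 0:           # conditional average = new quantizer
                q_new[j] = (p[mask, None] * xi[mask]).sum(axis=0) / p[mask].sum()
        if np.allclose(q_new, q):
            break                           # local minimum reached
        q = q_new
    return q, nearest

xi = np.array([[0.0], [0.2], [0.9], [1.1]])
p  = np.full(4, 0.25)
q, labels = lloyd(xi, p, np.array([[0.0], [1.0]]))
print(q.ravel())  # quantizers converge to 0.1 and 1.0
```

Since each of the two steps can only decrease the (finite-valued) objective (33) and there are finitely many possible assignments, the iteration must terminate, but only at a local minimum depending on the starting quantizers `q0`.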

Theorem 4

The measures \(P^{k}\) generated by Algorithm 2 are successively improved approximations of P; they satisfy

For distances other than the quadratic Wasserstein distance, \(P^{k}\) is possibly a good starting point for solving (33), but in general it is not already a local (let alone global) minimum.

Appendix 2: Stochastic processes and trees

Any tree induces a filtration

Any tree with height T and finitely many nodes \({\mathcal {N}}\) naturally induces a filtration \({\mathcal {F}}\): first, use \({\mathcal {N}}_{T}\) as the sample space. For any \(n\in {\mathcal {N}}\) define the atom \(a\left( n\right) \subset {\mathcal {N}}_{T}\) in a backward recursive way by

From the construction of the atoms it is evident that \({\mathcal {F}}_{0}=\left\{ \emptyset ,\,{\mathcal {N}}_{T}\right\} \) for a rooted tree and that \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots {\mathcal {F}}_{T}\right) \) is a filtration on the sample space \({\mathcal {N}}_{T}\), i.e. it holds that \({\mathcal {F}}_{t}\subset {\mathcal {F}}_{t+1}\). Notice that node m is a predecessor of n, i.e. \(m\in {\mathcal {A}}(n)\), if and only if

It is natural to introduce the notation \(i_{t}:=\nu _{t}\left( i\right) \) which denotes the state of the tree process for any final outcome \(i\in {\mathcal {N}}_{T}\) at stage t. It then holds that \(i_{T}=i\), and moreover that \(i_{t}\in {\mathcal {A}}(i_{\tau })\) whenever \(t\le \tau \), and finally—for a rooted tree—\(i_{0}=0\). The sample path from the root node 0 to a final node \(i\in {\mathcal {N}}_{T}\) is
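The backward recursion for the atoms can be made concrete with a small sketch (an illustration under the stated conventions, with the root labeled 0 and a hypothetical `children` map in place of the paper's node set \({\mathcal {N}}\)): a leaf is its own atom, and the atom of an inner node is the union of the atoms of its children, so \(a(n)\) collects exactly the final nodes reachable from n.

```python
def atoms(children, root=0):
    """Atom a(n): the leaves of N_T reachable from node n, computed by the
    backward recursion a(n) = union of a(m) over children m; a(n) = {n}
    for leaves."""
    a = {}
    def rec(n):
        kids = children.get(n, [])
        a[n] = frozenset([n]) if not kids else frozenset().union(*(rec(m) for m in kids))
        return a[n]
    rec(root)
    return a

# small rooted tree of height 2: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5}
children = {0: [1, 2], 1: [3, 4], 2: [5]}
a = atoms(children)
print(sorted(a[1]))  # -> [3, 4]
print(sorted(a[0]))  # -> [3, 4, 5]
```

The sigma algebra \({\mathcal {F}}_{t}\) is then generated by the atoms of the nodes at stage t, and \(m\in {\mathcal {A}}(n)\) corresponds precisely to \(a(n)\subset a(m)\).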

Any filtration induces a tree

On the other hand, given a filtration \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots {\mathcal {F}}_{T}\right) \) on a finite sample space \(\Omega \), it is possible to define a tree representing the filtration: just consider the sets \(A_{t}\) that collect all atoms generating \({\mathcal {F}}_{t}\) (\({\mathcal {F}}_{t}=\sigma \left( A_{t}\right) \)), and define the nodes

Hence filtrations on a finite sample space and finite trees are equivalent structures up to possibly different labels, and in the following, we will not distinguish between them.

Measures on trees

Let P be a probability measure on \({\mathcal {F}}_{T}\), such that \(\left( {\mathcal {N}}_{T},{\mathcal {F}}_{T},P\right) \) is a probability space. The notions introduced above allow us to extend the probability measure to the entire tree via the definition (cf. Fig. 3)
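This extension can be sketched directly (a minimal illustration, again assuming a hypothetical `children` map for the tree): the unconditional probability of a node n is the P-mass of its atom, i.e., the sum of the probabilities of the leaves below n, computed by the same backward recursion as the atoms themselves.

```python
def node_probabilities(children, leaf_prob, root=0):
    """Extend a measure P on the leaves to every node of the tree:
    P(n) = P(a(n)) = sum of P over the leaves below n."""
    P = {}
    def rec(n):
        if n not in children:            # leaf: its own probability
            P[n] = leaf_prob[n]
        else:                            # inner node: mass of its atom
            P[n] = sum(rec(m) for m in children[n])
        return P[n]
    rec(root)
    return P

children = {0: [1, 2], 1: [3, 4], 2: [5]}
P = node_probabilities(children, {3: 0.3, 4: 0.2, 5: 0.5})
print(P[1], P[0])  # -> 0.5 1.0
```

In particular \(P(0)=1\) for the root of any rooted tree, and conditional (one-step transition) probabilities are obtained as ratios \(P(n)/P(m)\) for a direct predecessor m of n.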

In multistage stochastic programming, a decision maker can influence the results to be expected at the very end of the process by making a decision \(x_{t}\) at any stage t, based on the information available up to the time the decision is made, that is, \(\xi _{0:t}\). The decision has to be taken prior to the next observation \(\xi _{t+1}\) (e.g., a decision about a new portfolio allocation has to be made before the next day's security prices are known).

This nonanticipativity property of the decisions is modeled by the assumption that any \(x_{t}\) is measurable with respect to \({\mathcal {F}}_{t}\) (\(x_{t}\lhd {\mathcal {F}}_{t}\)), such that again
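On a finite tree, measurability of \(x_{t}\) with respect to \({\mathcal {F}}_{t}\) has a simple combinatorial meaning that can be checked directly: \(x_{t}\) must be constant on every atom of \({\mathcal {F}}_{t}\). The following sketch (an illustration with hypothetical names, not notation from the paper) verifies this for a decision given as a map from final nodes to values.

```python
def is_nonanticipative(x_t, atoms_t):
    """x_t is F_t-measurable iff it is constant on every atom of F_t,
    i.e., scenarios indistinguishable at stage t get the same decision."""
    return all(len({x_t[i] for i in atom}) == 1 for atom in atoms_t)

# F_1 generated by the atoms {3, 4} and {5}: scenarios 3 and 4 cannot be
# distinguished at stage 1, so x_1 must agree on them
atoms_1 = [frozenset({3, 4}), frozenset({5})]
print(is_nonanticipative({3: 1.0, 4: 1.0, 5: 2.0}, atoms_1))  # -> True
print(is_nonanticipative({3: 1.0, 4: 7.0, 5: 2.0}, atoms_1))  # -> False
```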