Approximating Solutions for Nonlinear Dynamic Tracking Games

Abstract

This paper presents the OPTGAME algorithm, developed to iteratively approximate equilibrium solutions of ‘tracking games’, i.e. discrete-time nonzero-sum dynamic games with a finite number of players who face quadratic objective functions. Such a tracking game describes the behavior of decision makers who act upon a nonlinear discrete-time dynamical system and who aim at minimizing the deviations from individually desirable paths of multiple states over a joint finite planning horizon. The OPTGAME algorithm approximates the noncooperative feedback Nash and Stackelberg equilibrium solutions and the noncooperative open-loop Nash solution, as well as the cooperative Pareto-optimal solution.

Appendix

Derivation of the Feedback Nash Equilibrium Solution for an LQDG

All players have access to the complete state information and seek control rules that respond to the currently observed state. Here we describe the corresponding feedback Nash equilibrium solution for iteration step \(k\), \(\{\hat{\mathbf {x}}_t^*\!(k)\}_{t=1}^T\) and \(\{\hat{\mathbf {u}}_t^{i*}\!(k)\}_{t=1}^T\) \(\forall i\!\in \!\{1,\ldots ,n\}\), which minimizes Eq. 1 subject to Eq. 5 by applying the method of dynamic programming. We set up player \(i\)’s (\(i\!=\!1,\ldots ,n\)) individual cost-to-go function for the terminal period, \(T\) (Eq. 51).
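Eq. 51 itself is not reproduced in this appendix. As a hedged sketch, assuming the period-\(T\) summand of Eq. 1 is a standard quadratic tracking loss, with state penalty matrix \(\boldsymbol{\Omega }_T^{i}\), control penalty matrix \(\mathbf {R}_T^{i}\), and desired paths \(\tilde{\mathbf {x}}_T^{i}\) and \(\tilde{\mathbf {u}}_T^{i}\) as generic stand-ins for the paper's notation, the terminal cost-to-go has the form

\[ J^{i}_T\!(k) = \tfrac{1}{2}\big[\hat{\mathbf {x}}_T^{*}\!(k)-\tilde{\mathbf {x}}_T^{i}\big]'\boldsymbol{\Omega }_T^{i}\big[\hat{\mathbf {x}}_T^{*}\!(k)-\tilde{\mathbf {x}}_T^{i}\big] + \tfrac{1}{2}\big[\mathbf {u}_T^{i}\!(k)-\tilde{\mathbf {u}}_T^{i}\big]'\mathbf {R}_T^{i}\big[\mathbf {u}_T^{i}\!(k)-\tilde{\mathbf {u}}_T^{i}\big], \]

whose expansion yields quadratic and linear terms in \(\hat{\mathbf {x}}_T^{*}\!(k)\) and \(\mathbf {u}_T^{i}\!(k)\) plus a constant remainder.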

In Eq. 51, the scalar \(\xi _T\!(k)\) is the sum of all terms that do not depend on \(\hat{\mathbf {x}}_T^{*}\!(k)\) and \(\mathbf {u}_T^{i}\!(k)\) and is thus without any relevance for our further calculations. From Eq. 5 we know that the optimal state vector for the terminal period can be derived from the state vector optimized for the previous time period, \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\), and the optimal control variables \(\hat{\mathbf {u}}_{T}^{i*}\!(k)\) \(\forall i\!\in \!\{1,\ldots ,n\}\). To derive the latter, in Eq. 51 we replace \(\hat{\mathbf {x}}_T^{*}\!(k)\) by the right-hand side of Eq. 5 and compute the optimal values \(J^{i*}_T\!(k)\) \(\forall i\!\in \!\{1,\ldots ,n\}\) by minimizing \(J^{i}_T\!(k)\) with respect to \(\mathbf {u}_T^i\!(k)\) (Eq. 52).
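In the same generic notation (a sketch of the structure of this minimization rather than of Eq. 52 itself), substituting \(\hat{\mathbf {x}}_T^{*}\!(k)=\mathbf {A}_T\!(k)\hat{\mathbf {x}}_{T-1}^{*}\!(k)+\sum \nolimits _{j=1}^{n}\mathbf {B}_T^{j}\!(k)\mathbf {u}_T^{j}\!(k)+\mathbf {c}_T\!(k)\) and setting the first derivative with respect to \(\mathbf {u}_T^{i}\!(k)\) equal to zero gives first-order conditions of the form

\[ \mathbf {R}_T^{i}\big[\mathbf {u}_T^{i}\!(k)-\tilde{\mathbf {u}}_T^{i}\big] + \mathbf {B}_T^{i}\!(k)'\boldsymbol{\Omega }_T^{i}\big[\hat{\mathbf {x}}_T^{*}\!(k)-\tilde{\mathbf {x}}_T^{i}\big] = \mathbf {0}, \qquad i=1,\ldots ,n, \]

which, after the substitution, are linear in all players' controls \(\mathbf {u}_T^{1}\!(k),\ldots ,\mathbf {u}_T^{n}\!(k)\) and affine in \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\).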

Under the assumption that all players act simultaneously, we can derive optimal control variables of the form \(\hat{\mathbf {u}}_{T}^{i*}\!(k)\!=\mathbf {G}_T^{i}\!(k)\hat{\mathbf {x}}_{T-1}^*\!(k)+\mathbf {g}_T^{i}\!(k)\). Plugging these into Eq. 52, we arrive at Eq. 13 and Eq. 14 for \(t=T\) respectively, from which we can compute the feedback matrices \(\mathbf {G}_T^{i}\!(k)\) and \(\mathbf {g}_T^{i}\!(k)\). The optimal state can then be determined by \(\hat{\mathbf {x}}_{T}^*\!(k)\!=\mathbf {K}_T\!(k)\hat{\mathbf {x}}_{T-1}^*\!(k)+\mathbf {k}_T\!(k)\) (cf. Eq. 17 for \(t=T\)), with \(\mathbf {K}_T\!(k)\) and \(\mathbf {k}_T\!(k)\) given by Eq. 11 and Eq. 12 for \(t=T\) respectively.
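The form of \(\mathbf {K}_T\!(k)\) and \(\mathbf {k}_T\!(k)\) follows from plugging the feedback rules into the linearized dynamics: presumably, in line with Eq. 11 and Eq. 12,

\[ \mathbf {K}_T\!(k)=\mathbf {A}_T\!(k)+\sum \nolimits _{j=1}^{n}\mathbf {B}_T^{j}\!(k)\mathbf {G}_T^{j}\!(k), \qquad \mathbf {k}_T\!(k)=\sum \nolimits _{j=1}^{n}\mathbf {B}_T^{j}\!(k)\mathbf {g}_T^{j}\!(k)+\mathbf {c}_T\!(k). \]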

For the derivation of the period-\((T\!-\!1)\) parameter matrices of the value function, i.e., the Riccati matrices for time period \(T\!-\!1\), we set up the cost-to-go function \(J^{i*}_T\!(k)+J^{i}_{T-1}\!(k)\) and replace \(\hat{\mathbf {x}}_{T}^*\!(k)\) and \(\hat{\mathbf {u}}_{T}^{i*}\!(k)\) by Eq. 17 and Eq. 18 for \(t=T\) respectively, which yields an expression that is quadratic in \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\) and \(\mathbf {u}_{T-1}^{i}\!(k)\).
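Because Eq. 17 and Eq. 18 are affine in \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\), this substituted cost-to-go is again quadratic. As a sketch of the shape that Eq. 53 must therefore take (inferred from the role of \(\psi _{T-1}\!(k)\) described below, not reproduced from the paper),

\[ J^{i*}_T\!(k)+J^{i}_{T-1}\!(k) = \tfrac{1}{2}\,\hat{\mathbf {x}}_{T-1}^{*}\!(k)'\,\mathbf {P}_{T-1}^{i}\!(k)\,\hat{\mathbf {x}}_{T-1}^{*}\!(k) + \mathbf {p}_{T-1}^{i}\!(k)'\,\hat{\mathbf {x}}_{T-1}^{*}\!(k) + \big(\text{terms in } \mathbf {u}_{T-1}^{i}\!(k)\big) + \psi _{T-1}\!(k). \]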

The scalar \(\psi _{T-1}\!(k)\) is without any relevance for the further calculations since it is the sum of all terms that do not depend on \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\) and \(\mathbf {u}_{T-1}^{i}\!(k)\). Collecting all terms containing \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\) and comparing coefficients with Eq. 53, we can identify the Riccati matrices for \(T\!-\!1\); they are determined by Eq. 9 and Eq. 10 for \(t=T\!-\!1\) respectively. Then we minimize the objective function of player \(i\) (\(i=1,\ldots ,n\)), i.e., the cost-to-go function \(J^{i*}_T\!(k)+J^{i}_{T-1}\!(k)\), analogously to what was done for period \(T\): in Eq. 53 we replace \(\hat{\mathbf {x}}_{T-1}^{*}\!(k)\) by the linearized system dynamics, \(\mathbf {A}_{T-1}\!(k)\hat{\mathbf {x}}_{T-2}^*\!(k)+\sum \nolimits _{i=1}^n\mathbf {B}_{T-1}^{i}\!(k)\mathbf {u}_{T-1}^{i}\!(k)+\mathbf {c}_{T-1}\!(k)\), compute the expression’s first derivative with respect to \(\mathbf {u}_{T-1}^{i}\!(k)\) \(\forall i\!\in \!\{1,\ldots ,n\}\), and set the derivative equal to zero. A little algebra yields optimal control variables of the form \(\hat{\mathbf {u}}_{T-1}^{i*}\!(k)\!=\mathbf {G}_{T-1}^{i}\!(k)\hat{\mathbf {x}}_{T-2}^*\!(k)+\mathbf {g}_{T-1}^{i}\!(k)\), with \(\mathbf {G}_{T-1}^{i}\!(k)\) and \(\mathbf {g}_{T-1}^{i}\!(k)\) obtained by solving the \(2n\) linear matrix equations given by Eq. 13 and Eq. 14 \(\forall i\!\in \!\{1,\ldots ,n\}\) for \(t\!=\!T\!-\!1\). The optimal state variable for period \(t\!=\!T\!-\!1\) is then determined by \(\hat{\mathbf {x}}_{T-1}^*\!(k)\!=\mathbf {K}_{T-1}\!(k)\hat{\mathbf {x}}_{T-2}^*\!(k)+\mathbf {k}_{T-1}\!(k)\) (cf. Eq. 17 for \(t\!=\!T\!-\!1\)), with \(\mathbf {K}_{T-1}\!(k)\) and \(\mathbf {k}_{T-1}\!(k)\) given by Eq. 11 and Eq. 12 for \(t\!=\!T\!-\!1\) respectively.
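The \(2n\) linear matrix equations have a convenient stacked structure. The following is a minimal NumPy sketch of one backward step, assuming the generic tracking loss introduced above (each player penalizing deviations of the state and of their own control only), with \(\boldsymbol{\Lambda }^{i}\) and \(\boldsymbol{\lambda }^{i}\) denoting the quadratic and linear coefficients of player \(i\)'s cost as a function of the current state; all names are illustrative, not taken from the OPTGAME code.

```python
import numpy as np

def feedback_step(A, B, c, Lam, lam, R, u_des):
    """Solve the stacked first-order conditions of one period (sketch).

    A      : (nx, nx) linearized system matrix A_t(k)
    B      : list of n arrays (nx, m_i), input matrices B_t^i(k)
    c      : (nx,) intercept c_t(k)
    Lam/lam: per-player quadratic/linear cost coefficients in x_t
    R      : per-player control penalty matrices R_t^i
    u_des  : per-player desired controls
    Returns per-player gains G^i and intercepts g^i with
    u_t^{i*} = G^i x_{t-1} + g^i.
    """
    n = len(B)
    off = np.concatenate(([0], np.cumsum([Bi.shape[1] for Bi in B])))
    M = np.zeros((off[-1], off[-1]))     # coefficient matrix of the stacked FOCs
    N = np.zeros((off[-1], A.shape[1]))  # right-hand side: coefficient of x_{t-1}
    v = np.zeros(off[-1])                # right-hand side: constant part
    for i in range(n):
        r = slice(off[i], off[i + 1])
        BtL = B[i].T @ Lam[i]            # B_t^i' Λ^i, reused across blocks
        for j in range(n):
            M[r, off[j]:off[j + 1]] = BtL @ B[j]
        M[r, r] += R[i]
        N[r, :] = -BtL @ A
        v[r] = R[i] @ u_des[i] - BtL @ c - B[i].T @ lam[i]
    G_stack = np.linalg.solve(M, N)      # feedback part of all controls
    g_stack = np.linalg.solve(M, v)      # constant part of all controls
    return ([G_stack[off[i]:off[i + 1]] for i in range(n)],
            [g_stack[off[i]:off[i + 1]] for i in range(n)])
```

Since the \(n\) conditions determining the \(\mathbf {G}^{i}\) blocks and the \(n\) conditions determining the \(\mathbf {g}^{i}\) blocks share the same coefficient matrix, the \(2n\) equations reduce to two stacked linear solves.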

The procedure sketched for \(t\!=\!T\) and \(t\!=\!T\!-\!1\) can be extended to period \(t\!=\!T\!-\!2\) and generalized to any other period \(t\!=\!\tau \) (\(\tau \ge 1\)) by induction. The existence of uniquely determined Riccati matrices for all periods \(t\!\in \!\{1,\ldots ,T\}\) of the LQDG, i.e., of the game in which each player \(i\) seeks to minimize Eq. 1 subject to Eq. 5, can readily be verified following, e.g., Basar and Olsder (1999), provided the penalty matrices for the states are nonnegative definite (as we assumed).

To conclude, the LQDG at iteration step \(k\) is solved by starting with the terminal conditions \(\mathbf {P}_T^{i}\!(k)\) and \(\mathbf {p}_T^{i}\!(k)\) and integrating the Riccati equations (Eqs. 9 and 10) backward in time. Utilizing the Riccati matrices, \(\mathbf {P}_t^{i}\!(k)\) and \(\mathbf {p}_t^{i}\!(k)\), and the feedback matrices, \(\mathbf {G}_t^{i}\!(k)\) and \(\mathbf {g}_t^{i}\!(k)\), computed for all players and all time periods, i.e., \(\forall i\!\in \!\{1,\ldots ,n\}\) and \(\forall t\!\in \!\{1,\ldots ,T\}\), the \(k\)th iteration of the feedback Nash equilibrium path of the state variable, \(\{\hat{\mathbf {x}}_t^*\!(k)\}_{t=1}^T\), and the \(k\)th iteration of player \(i\)’s equilibrium path of their own control variable, \(\{\hat{\mathbf {u}}_t^{i*}\!(k)\}_{t=1}^T\), are determined by Eq. 17 and Eq. 18 respectively, both initialized with \(\hat{\mathbf {x}}_{0}^*\!(k)\!=\!\bar{\mathbf {x}}_{0}\) (where \(\mathbf {K}_t\!(k)\) and \(\mathbf {k}_t\!(k)\) are defined by Eq. 11 and Eq. 12 respectively).
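Putting the pieces together, here is a hedged sketch of one complete backward-forward pass, under the same assumptions as above (generic quadratic tracking loss; zero-based code indices \(t=0,\ldots ,T\!-\!1\) versus \(1,\ldots ,T\) in the text) and reusing the feedback_step sketch:

```python
def solve_lqdg(A, B, c, Omega, x_des, R, u_des, x0):
    """One backward-forward pass for the LQDG of iteration step k (sketch).

    All arguments are per-period lists of length T; B[t], Omega[t], R[t],
    x_des[t], and u_des[t] are in turn per-player lists.
    """
    T, n = len(A), len(B[0])
    # terminal condition: only the final-period state penalty remains
    Lam = [Omega[-1][i].copy() for i in range(n)]
    lam = [-Omega[-1][i] @ x_des[-1][i] for i in range(n)]
    K, k, G, g = [None] * T, [None] * T, [None] * T, [None] * T
    for t in reversed(range(T)):        # backward Riccati-type recursion
        G[t], g[t] = feedback_step(A[t], B[t], c[t], Lam, lam, R[t], u_des[t])
        K[t] = A[t] + sum(B[t][j] @ G[t][j] for j in range(n))  # cf. Eq. 11
        k[t] = c[t] + sum(B[t][j] @ g[t][j] for j in range(n))  # cf. Eq. 12
        if t > 0:                       # fold period t into the cost-to-go
            for i in range(n):
                du = g[t][i] - u_des[t][i]
                lam[i] = (K[t].T @ (Lam[i] @ k[t] + lam[i])
                          + G[t][i].T @ R[t][i] @ du
                          - Omega[t - 1][i] @ x_des[t - 1][i])
                Lam[i] = (Omega[t - 1][i] + K[t].T @ Lam[i] @ K[t]
                          + G[t][i].T @ R[t][i] @ G[t][i])
    x, u = [x0], []                     # forward pass, Eq. 17- and Eq. 18-style
    for t in range(T):
        u.append([G[t][i] @ x[-1] + g[t][i] for i in range(n)])
        x.append(K[t] @ x[-1] + k[t])
    return x[1:], u                     # equilibrium state and control paths
```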

Given the optimal values of the states and controls, the scalar values of the players’ loss functions can be evaluated.
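For completeness, a sketch of this final evaluation under the same assumed loss, summing each player's quadratic deviations along the computed path:

```python
def tracking_losses(x, u, Omega, x_des, R, u_des):
    """Scalar loss of each player along the equilibrium path (sketch)."""
    n = len(u[0])
    J = [0.0] * n
    for t in range(len(x)):
        for i in range(n):
            dx = x[t] - x_des[t][i]      # state deviation of player i at t
            du = u[t][i] - u_des[t][i]   # control deviation of player i at t
            J[i] += 0.5 * (dx @ Omega[t][i] @ dx + du @ R[t][i] @ du)
    return J
```

With the trajectories returned by solve_lqdg above, tracking_losses(x, u, Omega, x_des, R, u_des) yields one scalar loss per player.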