Introduction to stochastic control theory

August 2018 · 9 minute read

I had my first contact with stochastic control theory in one of my Master’s courses on Continuous Time Finance. I found the subject really interesting and decided to write my thesis on optimal dividend policy, which mainly comes down to solving stochastic control problems.

In this post I want to give you a brief overview of stochastic control theory based on excerpts from my thesis. Let’s get started.

What is stochastic control theory?

Imagine you are a CEO and have to decide whether the company should pay dividends and, if so, what the payout ratio should be. You will probably start by looking at your cash balance. Then you ask yourself: how much cash can I actually pay out as dividends without risking insolvency? You remember that last year raw material prices rose unexpectedly and unplanned repairs had to be made, which squeezed the company wallet quite a lot, but sales of an old product also rose. Most of these events were not really foreseeable and were therefore random from the company’s perspective. This is where stochastic control theory comes into play. It helps us find a dividend policy, i.e. a control law, that maximizes the expected value of all future discounted dividend payments, i.e. the value function. The evolution of the company’s cash reserve is called the state process.

The following section closely follows the chapter “Stochastic Control Theory” from Björk (2009).

A fairly general class of stochastic control problems can be written like this:

\[
\begin{aligned}
\max_{u \in \mathcal{U}} \quad & \mathbb{E}\left[\int_0^T F(t, X_t, u_t)\,dt + \Phi(X_T)\right] \\
\text{s.t.} \quad & dX_t = \mu(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t, \qquad X_0 = x_0,
\end{aligned}
\]

where \(F\) is the running payoff, \(\Phi\) the terminal payoff, \(\mu, \sigma\) are the drift and volatility of the controlled stochastic differential equation, and \(u_t\) is the control law that is used to control or steer the state process \(X\).

Before we continue, we need to define which type of control law we allow in our problems. It is quite natural to demand that the control law should only depend on past values of the state process. Feedback control laws are one class of control laws that satisfy this property, and also the one we will consider. Formally, a feedback control law is a deterministic function of time and the current state:

\[
u_t = u(t, X_t).
\]
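
To make this concrete, here is a minimal Python sketch (not from the thesis) that simulates a controlled state process with the Euler-Maruyama scheme. The drift, volatility and feedback law are toy choices loosely inspired by the dividend story above; all names and parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def mu(t, x, u):
    """Toy drift: the cash reserve earns income at rate 1 and pays dividends at rate u."""
    return 1.0 - u

def sigma(t, x, u):
    """Toy volatility: constant business risk."""
    return 0.5

def feedback_law(t, x):
    """A feedback control law u(t, x): pay dividends proportionally to the current reserve."""
    return 0.2 * max(x, 0.0)

def simulate(x0=1.0, T=10.0, n_steps=1_000):
    """Euler-Maruyama discretization of dX_t = mu dt + sigma dW_t under the feedback law."""
    dt = T / n_steps
    x, path = x0, [x0]
    for i in range(n_steps):
        t = i * dt
        u = feedback_law(t, x)                # the control only sees (t, X_t)
        dw = rng.normal(0.0, np.sqrt(dt))     # Brownian increment with variance dt
        x = x + mu(t, x, u) * dt + sigma(t, x, u) * dw
        path.append(x)
    return np.array(path)

print(f"terminal cash reserve: {simulate()[-1]:.3f}")
```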

In most practical circumstances, the control law will also have to obey additional control constraints: we require \(u(t, x) \in U\) for a fixed set \(U \subset \mathbb{R}^k\), and we call the class of control laws that satisfy this constraint the admissible control laws \(\mathcal{U}\). With this, we can make the following definition: the value function

\[
\mathcal{J}(t, x, u) = \mathbb{E}\left[\int_t^T F(s, X_s^u, u_s)\,ds + \Phi(X_T^u) \,\Big|\, X_t = x\right]
\]

measures the expected payoff of using the control law \(u\) from state \((t, x)\) onward, and the optimal value function is

\[
V(t, x) = \sup_{u \in \mathcal{U}} \mathcal{J}(t, x, u).
\]

If there exists an admissible \(u^*\) with \(\mathcal{J}(t, x, u^*) = V(t, x)\), we call \(u^*\) the optimal control law for the given problem. But how do we actually find the optimal value function and the associated control law?
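
One pragmatic (and very much non-rigorous) way to get a feeling for this definition is to estimate \(\mathcal{J}(0, x_0, u)\) by Monte Carlo for a whole family of feedback laws and pick the best one. The sketch below does this for the toy dividend model from the previous snippet, restricting the search to laws of the form \(u(t, x) = \theta x\) so that the supremum collapses to a one-dimensional grid search; the true optimal law need not lie in this family, and all parameters are again made up:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def value_estimate(theta, x0=1.0, T=10.0, r=0.05, n_steps=500, n_paths=2_000):
    """Monte Carlo estimate of the value function J(0, x0, u) for u(t, x) = theta * x:
    the expected discounted dividend stream in the toy model
    dX_t = (1 - u_t) dt + 0.5 dW_t, with dividends stopped once the reserve hits 0."""
    dt = T / n_steps
    x = np.full(n_paths, x0)
    alive = np.ones(n_paths, dtype=bool)   # paths that have not been ruined yet
    total = np.zeros(n_paths)
    for i in range(n_steps):
        u = theta * x                       # the feedback law under consideration
        total += np.where(alive, np.exp(-r * i * dt) * u * dt, 0.0)
        x = x + (1.0 - u) * dt + 0.5 * rng.normal(0.0, np.sqrt(dt), size=n_paths)
        alive &= x > 0.0                    # once ruined, no further dividends count
    return total.mean()

# Approximate the supremum over the family u(t, x) = theta * x by a grid search.
thetas = np.linspace(0.0, 1.0, 11)
values = [value_estimate(th) for th in thetas]
print(f"best parameter in the family: theta = {thetas[np.argmax(values)]:.1f}")
```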

In essence, we are interested in answering two questions:

1. Does an optimal control law exist?
2. If yes, how can we find it?

We will focus on the second question. To find the optimal control law, we will rely on dynamic programming. To deal with some technical problems, we make the following (sometimes rather strong) assumptions:

- An optimal control law \(u^*\) exists.
- The optimal value function \(V\) is \(C^{1,2}\).
- Interchanging limits and expectations is justified wherever needed.

Dynamic programming principle

The idea of dynamic programming can be summarized as follows:

Consider two strategies:
- Strategy I: Use the optimal control law \(u^*\) and
- Strategy II: Use an arbitrary control law \(u\) on \([t, t+h]\) and then switch to the optimal control law \(u^*\) for the remaining time interval \((t+h, T]\).

Then compute the expected value of both strategies. By definition, strategy II cannot be better than strategy I, which gives us the inequality

\[
V(t, x) \geq \mathbb{E}\left[\int_t^{t+h} F(s, X_s^u, u_s)\,ds + V\!\left(t+h, X_{t+h}^u\right) \,\Big|\, X_t = x\right],
\]

with equality if we choose \(u = u^*\) on \([t, t+h]\). Applying Itô’s formula to \(V(t+h, X_{t+h}^u)\), dividing by \(h\) and letting \(h \to 0\), we obtain a partial differential equation, called the Hamilton-Jacobi-Bellman equation (HJBE), that we can use to solve our problem.
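
Before going through the technical details, note that the dynamic programming idea translates almost verbatim into code once we discretize time and space. The sketch below (my own illustration, not from Björk) runs the Bellman recursion backwards for a toy problem with \(dX_t = u_t\,dt + \sigma\,dW_t\), \(F(t, x, u) = -u^2\) and \(\Phi(x) = -x^2\), approximating the Brownian increment by a fair coin flip. At every grid point it compares "use an arbitrary \(u\) now, then behave optimally" across all candidate controls and keeps the best, which is exactly the strategy-I-versus-strategy-II comparison:

```python
import numpy as np

# Toy problem: dX_t = u_t dt + s dW_t, maximize E[ -int_0^T u_t^2 dt - X_T^2 ].
T, s = 1.0, 0.5
n_t, n_x = 100, 201
dt = T / n_t
xs = np.linspace(-3.0, 3.0, n_x)         # state grid
us = np.linspace(-3.0, 3.0, 61)          # control grid

V = -xs**2                               # terminal condition V(T, x) = Phi(x)
policy = np.zeros((n_t, n_x))
for k in range(n_t - 1, -1, -1):
    # Bellman recursion: V(t, x) = max_u { F(x, u) dt + E[V(t + dt, X_{t+dt})] },
    # where the noise s * sqrt(dt) * Z is approximated by a coin flip Z = +-1.
    x_up = xs[None, :] + us[:, None] * dt + s * np.sqrt(dt)
    x_dn = xs[None, :] + us[:, None] * dt - s * np.sqrt(dt)
    cont = 0.5 * (np.interp(x_up, xs, V) + np.interp(x_dn, xs, V))
    q = -us[:, None] ** 2 * dt + cont    # value of "u now, optimal from t + dt on"
    policy[k] = us[np.argmax(q, axis=0)] # the maximizing u is a feedback law u*(t, x)
    V = q.max(axis=0)                    # strategy I: behave optimally from t onward

print(f"V(0, 0) ~ {V[n_x // 2]:.4f}")
```

As the grids are refined, \(V(0, 0)\) should approach the closed-form value \(-\sigma^2 \ln(1 + T)\) of this problem, which we derive by hand in the worked example after the HJBE section below.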

Two technical remarks on this derivation: it can be shown that if one component of a covariation \([A, B]_t\) is continuous and the other is of finite variation (here \(s \mapsto s\) is both), the covariation is zero; the same logic applies to \([X, s]\). When we applied the conditional expectation, we used the fact that given \((t, x)\) the term \(V(t, x)\) is known (i.e. it is \(\mathcal{F}_t\)-measurable), so we can bring it outside of the conditional expectation operator. Sufficient integrability means that \(V_x \sigma^u\) needs to be bounded, so that the stochastic integral is a martingale with expectation zero in our case. Plugging this result back into the inequality from before and taking the supremum over all admissible \(u\) gives us:

Hamilton-Jacobi-Bellman equation

Under the assumptions above, the optimal value function \(V\) solves

\[
\frac{\partial V}{\partial t}(t, x) + \sup_{u \in U}\left\{ F(t, x, u) + \mathcal{A}^u V(t, x) \right\} = 0, \qquad V(T, x) = \Phi(x),
\]

where \(\mathcal{A}^u\) denotes the infinitesimal generator of the state process, i.e. \(\mathcal{A}^u V = \mu\,\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2\,\frac{\partial^2 V}{\partial x^2}\) in the one-dimensional case. Moreover, for each \((t, x) \in [0,T] \times \mathbb{R}^n\) the supremum in the HJBE is attained by \(u = u^*(t, x)\).

This means that the optimal value function \(V\) solves the Hamilton-Jacobi-Bellman equation, but not every function that solves the HJBE is the optimal value function. Solving the HJBE is therefore only a necessary, not a sufficient, condition for a solution to be optimal. However, in certain cases the HJBE is also a sufficient condition for the optimal control problem: this is known as the verification theorem for dynamic programming.
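
To see the HJBE in action on a problem that can be solved by hand, consider a standard linear-quadratic example (my own illustration, with \(F\), \(\Phi\), \(\mu\) and \(\sigma\) chosen purely for tractability): let \(dX_t = u_t\,dt + \sigma\,dW_t\), \(F(t, x, u) = -u^2\) and \(\Phi(x) = -x^2\), with unconstrained controls. The HJBE reads

\[
\frac{\partial V}{\partial t} + \sup_{u \in \mathbb{R}}\left\{ -u^2 + u\,\frac{\partial V}{\partial x} \right\} + \frac{1}{2}\sigma^2\,\frac{\partial^2 V}{\partial x^2} = 0, \qquad V(T, x) = -x^2.
\]

The inner supremum is attained at \(u^* = \frac{1}{2}\frac{\partial V}{\partial x}\). Plugging in the ansatz \(V(t, x) = a(t)\,x^2 + b(t)\) reduces the PDE to the ODEs \(a' = -a^2\) and \(b' = -\sigma^2 a\) with \(a(T) = -1\) and \(b(T) = 0\), which yield

\[
V(t, x) = -\frac{x^2}{1 + T - t} - \sigma^2 \ln(1 + T - t), \qquad u^*(t, x) = -\frac{x}{1 + T - t}.
\]

Note that the optimal control is again a feedback law, and that solving the HJBE produced both the optimal value function and the optimal control law at once.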

Different Hamilton-Jacobi-Bellman equations need different verification theorems. For example, the HJBE associated with the optimal dividend problem is slightly different and requires a different verification theorem (see e.g. Schmidli, ‘Stochastic Control in Insurance’, in case you are interested). We continue to follow Björk and present a verification theorem for the HJBE above.

Verification theorem:

Suppose we have two functions \(H(t,x)\) and \(g(t,x)\) such that:

\(H\) is sufficiently integrable and solves the Hamilton-Jacobi-Bellman equation: