Random Stopping Times in Stopping Problems

A mixed strategy is a probability distribution over the set of pure strategies. When a player implements a mixed strategy, she chooses a pure strategy at the outset of the game, and follows that pure strategy all through the game.

A behavior strategy is a function that assigns a mixed action to each of the player’s information sets. When a player implements a behavior strategy, whenever the play reaches an information set the player chooses an action according to the mixed action that corresponds to that information set.

Thus, if the play visits the same information set twice, a mixed strategy chooses the same action on both visits, while a behavior strategy chooses the action on each visit independently of past play.
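
To make the contrast concrete, here is a minimal sketch for a hypothetical information set with two actions, "a" and "b", that the play visits twice. The function names and the parameter p (the probability of "a") are assumptions made only for this illustration.

```python
from typing import Dict, Tuple

Outcome = Tuple[str, str]  # (action at first visit, action at second visit)


def mixed_outcomes(p: float) -> Dict[Outcome, float]:
    """Mixed strategy: one draw at the outset, reused at both visits."""
    return {("a", "a"): p, ("b", "b"): 1.0 - p}


def behavior_outcomes(p: float) -> Dict[Outcome, float]:
    """Behavior strategy: an independent draw at each visit."""
    probs = {"a": p, "b": 1.0 - p}
    return {(x, y): probs[x] * probs[y] for x in probs for y in probs}


mixed = mixed_outcomes(0.5)
behavior = behavior_outcomes(0.5)
```

With p = 0.5 the mixed strategy never produces ("a", "b"), while the behavior strategy produces each of the four pairs with probability 0.25.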

Kuhn’s well-known theorem states that in extensive-form games with perfect recall, mixed strategies and behavior strategies are equivalent: it is irrelevant whether the player makes her choices when the play visits an information set or at the outset of the game, because perfect recall implies in particular that the play can cross each information set at most once.

Kuhn’s Theorem holds for finite games: the depth of the game is finite, and there are finitely many actions at each decision node. The theorem can be extended to infinite trees (countably many nodes), provided the number of actions at each decision node is finite.

In stopping problems the number of nodes has the cardinality of the continuum, and therefore Kuhn’s Theorem does not apply. Stopping problems are a model used in mathematical finance: at every stage an agent receives payoff-relevant information, and she has to choose when to stop (when to sell stocks, or when to exercise options). The payoff is given by some stochastic process.

Yuri Kifer asked me a question that boiled down to the following: in the model of stopping problems, are mixed strategies and behavior strategies equivalent? My first response was “sure”. Yuri wanted to see a proof. When I tried to write down the proof, a technical issue emerged. It took some time and the energy of three people to provide a definite answer: “sure”.

The underlying probability space is (Ω,F,p). Time is discrete, and there is a filtration (F(n)), that is, an increasing sequence of sub-sigma-algebras of F. The sub-sigma-algebra F(n) captures the information that the decision maker has at stage n. For convenience we assume that F is the sigma-algebra generated by the union of all F(n), and we denote F(∞) = F.

A stopping problem is given by a stochastic process (X(n)) that is adapted to the filtration (here n ranges over N* = N ∪ {∞}). That is, for each n, the random variable X(n) is measurable with respect to the sigma-algebra F(n). The random variable X(∞) is the decision maker’s payoff if she never stops.

A pure strategy of the decision maker is a stopping time σ, that is, a function σ : Ω → N* such that for every natural number n the event {σ = n} belongs to F(n). In particular, the decision maker can decide never to stop (σ = ∞). The expected payoff to the decision maker if she implements the pure strategy σ is E[X(σ)].
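
A stopping time may use only the information available at the stage at which it stops: whether σ = n must be decided by F(n). Here is a minimal sketch of that requirement on a toy space, assuming Ω is the set of coin-toss sequences of a fixed length and F(n) is generated by the first n tosses. The names is_stopping_time, first_head and before_first_head are hypothetical, and None stands for "never stop".

```python
from itertools import product
from typing import Callable, Optional, Tuple

Path = Tuple[str, ...]
Rule = Callable[[Path], Optional[int]]


def is_stopping_time(sigma: Rule, horizon: int) -> bool:
    """Check that sigma(w) = n depends only on the first n tosses of w."""
    omegas = list(product("HT", repeat=horizon))
    for w in omegas:
        n = sigma(w)
        if n is None:
            continue
        # Every path that agrees with w up to stage n must also stop at n.
        for v in omegas:
            if v[:n] == w[:n] and sigma(v) != n:
                return False
    return True


def first_head(w: Path) -> Optional[int]:
    """Stop at the first stage showing heads (stages are numbered from 1)."""
    return next((i + 1 for i, c in enumerate(w) if c == "H"), None)


def before_first_head(w: Path) -> Optional[int]:
    """Stop one stage before the first head: this needs to see the future."""
    n = first_head(w)
    return None if n is None or n == 1 else n - 1


ok = is_stopping_time(first_head, 3)
bad = is_stopping_time(before_first_head, 3)
```

The first rule passes the check, and the second fails because it would have to know the next toss before it is made.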

A mixed stopping time is a probability distribution over (pure) stopping times. To avoid having to define a probability space structure on the space of stopping times, we adopt Aumann’s definition of mixed strategies in large spaces: a mixed stopping time is a measurable function μ : [0,1] x Ω → N* such that for every r in [0,1], μ(r,•) is a stopping time. The interpretation is that the decision maker chooses at the outset of the game a number r in the unit interval according to the Lebesgue measure, and implements the stopping time μ(r,•). The decision maker’s expected payoff is then E[X(μ(r,•))], where the expectation is taken over both r and ω.

A behavior stopping time chooses a probability to stop at every information set. Therefore it is given by an adapted sequence (β(n)) of [0,1]-valued functions, where n ranges over all natural numbers. At each stage n the decision maker stops with probability β(n,ω), where ω is the true state of nature. The probability that the decision maker never stops is then the infinite product Π(1-β(n)), where the product is over all natural numbers n. The decision maker’s expected payoff is then

E[ Σ (1-β(1))(1-β(2))…(1-β(n-1)) β(n) X(n) + Π(1-β(n)) X(∞) ],

where the sum and the product are over all natural numbers n.
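
To make the stopping probabilities concrete, here is a minimal sketch for a single fixed state ω and finitely many stages. The function names are hypothetical. The sketch also assumes that β(n) = 0 after the listed stages, so the probability of surviving them counts as the probability of never stopping.

```python
from typing import List, Tuple


def stopping_distribution(betas: List[float]) -> Tuple[List[float], float]:
    """Stopping probabilities along one path, for stages 1..len(betas).

    Returns (stop_probs, survive): stop_probs[n-1] is the probability of
    stopping exactly at stage n, and survive is the probability of not
    having stopped by the last listed stage.
    """
    survive = 1.0  # probability of not having stopped before this stage
    stop_probs = []
    for beta in betas:
        stop_probs.append(survive * beta)
        survive *= 1.0 - beta
    return stop_probs, survive


def path_payoff(betas: List[float], payoffs: List[float],
                payoff_never: float) -> float:
    """Payoff along one path, averaged over the randomization only."""
    stop_probs, survive = stopping_distribution(betas)
    stopped = sum(p * x for p, x in zip(stop_probs, payoffs))
    return stopped + survive * payoff_never


stop_probs, never = stopping_distribution([0.5, 0.5, 0.5])
value = path_payoff([0.5, 0.5, 0.5], [8.0, 4.0, 0.0], 16.0)
```

Averaging path_payoff over the states ω would give the expected payoff of the behavior stopping time.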

Every behavior stopping time (β(n)) has an equivalent mixed stopping time: given a behavior stopping time, define a random variable ρ(n) to be the probability that the decision maker stops before or at time n:

ρ(n) := 1-(1-β(1))(1-β(2))…(1-β(n)).

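Then define μ(r,•) to be the first n such that ρ(n) ≥ r, and μ(r,•) = ∞ if there is no such n. Since r is uniformly distributed, the probability of stopping at or before stage n is then exactly ρ(n).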
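
Here is a minimal sketch of this construction for one fixed state ω, taking μ(r,•) to be the first n with ρ(n) ≥ r, so that a uniform r stops at or before stage n with probability ρ(n). The function names are hypothetical, and None stands for "never stop" within the listed stages.

```python
from typing import List, Optional


def rho(betas: List[float]) -> List[float]:
    """Cumulative stopping probabilities: rho(n) = 1 - (1-b(1))...(1-b(n))."""
    out, survive = [], 1.0
    for beta in betas:
        survive *= 1.0 - beta
        out.append(1.0 - survive)
    return out


def mu(r: float, betas: List[float]) -> Optional[int]:
    """First stage n (numbered from 1) with rho(n) >= r, or None if none.

    The boundary case r = 0 has Lebesgue measure zero and does not affect
    expected payoffs.
    """
    for n, value in enumerate(rho(betas), start=1):
        if value >= r:
            return n
    return None


betas = [0.5, 0.5]
cum = rho(betas)
# Length of the set of r that stops exactly at stage n: rho(n) - rho(n-1).
lengths = [cum[0]] + [cum[n] - cum[n - 1] for n in range(1, len(cum))]
```

In this example the lengths are β(1) = 0.5 and (1-β(1))β(2) = 0.25, which are the probabilities with which the behavior stopping time stops at stages 1 and 2.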

The converse is also true: every mixed stopping time has an equivalent behavior stopping time. The naive way to construct the behavior stopping time is to calculate for each n the conditional probability of stopping at stage n. A technical issue arises since the conditional probability is determined only up to sets of measure 0. But what is important for us is the expected payoff: a mixed stopping time is equivalent to a behavior stopping time if they yield the same expected payoff in all stopping problems. To calculate expectations, what happens in events with measure 0 is not important, and every mixed stopping time has an equivalent behavior stopping time.
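
In symbols, the naive construction can be sketched as follows. This is only an illustration and not necessarily the construction used in the paper: λ denotes the Lebesgue measure on [0,1], and a_n is an auxiliary symbol introduced here.

```latex
a_n := \mathbf{E}\bigl[\, \lambda\{ r \in [0,1] : \mu(r,\cdot) = n \} \,\big|\, \mathcal{F}(n) \bigr],
\qquad
\beta(n) := \frac{a_n}{1 - \sum_{k<n} a_k}
\quad \text{on the event } \Bigl\{ \sum_{k<n} a_k < 1 \Bigr\},
\qquad
\beta(n) := 0 \ \text{elsewhere.}
```

By induction on n, (1-β(1))…(1-β(n-1)) = 1 - (a_1 + … + a_{n-1}) almost surely, so the behavior stopping time stops at stage n with conditional probability a_n, as the mixed stopping time does. Each a_n is defined only up to a set of measure 0, which is where the technical issue enters.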

All details will appear in a new paper by your humble servant, Boris Tsirelson, and Nicolas Vieille.

2 comments

I think the problem you describe is already covered by the version of Kuhn’s theorem in “Repeated Games” by Mertens, Sorin, Zamir, where it is Theorem 1.8 of the second chapter. They also show that one can deal with the conditioning on measure zero sets in a satisfactory way. For games, this is necessary since whether an information set has measure zero or not depends on the behavior of other players, and the behavior of the players is not fixed in Kuhn’s theorem.