Stopping Problems

by The Bayesian Observer

Optimal stopping problems frequently pop up in real-life decision making. I have been collecting a bunch of stopping problems, with the aim of clustering similar problems together and finding general solution strategies for specific flavors of problems. Here is my (growing) list:

Hiring, Marriage, Parking, Home selling: You are to hire an employee from a pool of N candidates. Each candidate has a fixed integer rank, but the rank of a candidate becomes known only after the interview. After each interview, you must decide whether to hire that candidate or to interview the next one. On moving to the next candidate, you lose the opportunity to hire the one you just interviewed. If you hire none of the first N-1 candidates, you must hire the Nth. When should you stop interviewing? Other versions of the same problem: Marriage: Replace candidates with potential mates. House selling: The goal is to sell a house at maximum profit. Parking: The goal is to park your car as close as possible to the entrance of a supermarket. You can either take an empty spot or pass it up in the hope of finding a better one. Further variations: (i) N is potentially infinite, but there is a cost proportional to the amount of time you take to decide. (ii) In the house selling problem, instead of ranks, one observes a real number as the score of a candidate. The goal is to stop at the candidate with as high a score as possible.

Bug finding, proof-reading: Each review of a piece of code or writing costs $C. The number of bugs is a random number with some known distribution. An undetected bug costs $B. Each time a review is done, each bug still present is detected and fixed independently with some known probability p. When should you stop reviewing?
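To get a feel for the trade-off, here is a minimal sketch under one specific assumption of mine (not part of the problem statement): the number of bugs is Poisson with mean `lam`. Under that assumption the number of bugs still undetected after k reviews is Poisson with mean lam*(1-p)^k, independent of how many bugs were found so far, so the decision depends only on k and the expected total cost of stopping after k reviews is k*C + B*lam*(1-p)^k. The function names and the example numbers are hypothetical.

```python
def expected_cost(k, C, B, lam, p):
    """Expected total cost of stopping after exactly k reviews.

    Assumes a Poisson(lam) number of bugs (an illustrative assumption).
    """
    return k * C + B * lam * (1 - p) ** k


def best_review_count(C, B, lam, p, k_max=1000):
    """Smallest k at which one more review no longer pays for itself."""
    k = 0
    # One more review costs C and is expected to remove lam * (1-p)^k * p bugs,
    # each of which would otherwise cost B.
    while k < k_max and B * lam * (1 - p) ** k * p > C:
        k += 1
    return k


if __name__ == "__main__":
    k = best_review_count(C=1.0, B=10.0, lam=5.0, p=0.5)
    print(k, expected_cost(k, C=1.0, B=10.0, lam=5.0, p=0.5))
```

With other priors on the number of bugs, the number found so far carries information about the number remaining, and the stopping rule would have to depend on it.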

Gaussian process excursions: You are given a Gaussian process X(n), n = 0, 1, 2, ..., such that each X(n) is Gaussian distributed with mean zero, with a known covariance function K(i,j) between X(i) and X(j). You are allowed to observe a maximum of N successive samples of X(n), and at each step you decide whether to stop or continue. The reward upon stopping at step n is X(n). This problem generalizes problem #1 by allowing observations to be correlated rather than IID.

A drunkard can walk either 1 step west or 1 step east at each time step, with equal probability. When must she stop, in order to maximize her distance from where she started? What if the number of steps is limited to N? Another version of the same problem: You have a shuffled deck of cards and you must draw cards from it, one at a time. A red card gives you $1 and a black card makes you $1 poorer. When should you stop picking cards in order to maximize profit?
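The card version has a finite horizon, so it can be solved exactly by working backwards over the number of cards left. Below is a minimal sketch, assuming a standard deck of 26 red and 26 black cards (the deck composition is my assumption) and a hypothetical function name `deck_value`. The value of a state is the expected additional profit under optimal play; stopping is always worth 0 additional profit, so you keep drawing only while the expected value of drawing is positive.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def deck_value(red, black):
    """Expected additional profit under optimal play with these cards left."""
    if red == 0:
        return 0.0  # only losing cards remain, so stop now
    if black == 0:
        return float(red)  # only winning cards remain, so draw them all
    total = red + black
    draw = (red / total) * (1 + deck_value(red - 1, black)) \
        + (black / total) * (-1 + deck_value(red, black - 1))
    # Continue only if drawing beats stopping (worth 0 additional profit).
    return max(0.0, draw)


if __name__ == "__main__":
    print(deck_value(26, 26))
```

Even though the full deck always nets out to zero, the option to stop early makes the game worth a positive amount to the player.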

You are to toss an unbiased coin repeatedly. When you stop tossing, your reward is the fraction of tosses that came up heads. When do you stop tossing? What if you are allowed only N tosses?
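For the version limited to N tosses, the state is just (heads so far, tosses so far), and the optimal expected reward can be computed by backward induction. This is a minimal sketch with a hypothetical function name `coin_value`; it assumes you must toss at least once so that the fraction is defined.

```python
from functools import lru_cache


def coin_value(N):
    """Expected reward under optimal play with at most N tosses."""

    @lru_cache(maxsize=None)
    def v(heads, tosses):
        stop = heads / tosses  # reward for stopping right now
        if tosses == N:
            return stop  # no tosses left, forced to stop
        cont = 0.5 * v(heads + 1, tosses + 1) + 0.5 * v(heads, tosses + 1)
        return max(stop, cont)

    # The first toss is mandatory: heads or tails with equal probability.
    return 0.5 * v(1, 1) + 0.5 * v(0, 1)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, coin_value(n))
```

For N = 2 the rule is easy to check by hand: stop after a first head (reward 1), and toss again after a first tail (expected reward 1/4), for an overall value of 0.625.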

You are to throw an unbiased die a maximum of N times. When you stop throwing, your reward is the number on the upturned face of the die. How do you decide when to stop? This is a bit like the version of the Hiring problem in #1 above in which each candidate has a score rather than just a rank.

Two players A and B play a game. Player A writes down two different integers chosen from the set {1, 2, …, 100} on two separate pieces of paper and places them face downward on a table. Player B’s task is to pick one of the pieces of paper, and either stop if she believes it is the larger of the two numbers, or pick the 2nd one if she believes the 2nd one is the larger one. What should be the strategy of player B? What should be the strategy of player A if her goal is to make player B lose?
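One well-known strategy for player B (offered here as an illustration, not as a claim about the optimal answer to either question) is to draw a random threshold, turn over one paper at random, keep it if it exceeds the threshold, and switch otherwise. The sketch below computes B's exact winning probability by enumeration, assuming the threshold is uniform over the 99 half-integers 1.5, 2.5, ..., 99.5; the function name and that choice of threshold distribution are my assumptions.

```python
from fractions import Fraction


def win_probability(a, b, low=1, high=100):
    """Chance that B ends with the larger of a and b using a random threshold."""
    # Thresholds sit halfway between consecutive integers: low+0.5, ..., high-0.5.
    thresholds = [k + Fraction(1, 2) for k in range(low, high)]
    wins = 0
    for t in thresholds:
        # B is equally likely to turn over either paper first.
        for first, second in ((a, b), (b, a)):
            kept = first if first > t else second
            wins += int(kept == max(a, b))
    return Fraction(wins, 2 * len(thresholds))


if __name__ == "__main__":
    print(win_probability(50, 51))  # adjacent numbers: barely better than 1/2
    print(win_probability(1, 100))  # extreme numbers: B always wins
```

Under these assumptions B wins with probability 1/2 + (b - a)/198 for numbers a < b, which suggests that A does best by writing down adjacent numbers.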

A discrete-time stochastic process that produces iid random variables from a known distribution changes at a certain time T to a different distribution. The task is to detect the change as quickly as possible after it has occurred, assuming that the cost of detection is proportional to the time since the change and that there is a fixed cost to a false alarm.

[One-armed bandit] As a doctor, you have two possible treatments for a certain disease. Treatment A is known to work with probability p_A, and treatment B has an unknown probability of working, but with some prior distribution over that probability. Given N patients to be treated sequentially, how do you decide which treatment to use, if the goal is to maximize the number of patients cured? This is called a 1-armed bandit problem because of the following equivalent problem: A casino slot-machine has two levers. Pulling a lever provides the player with either $1 or $0. The lever on the left is known to have a probability p_left of providing $1 in winnings, while the lever on the right has an unknown probability. How do you go about picking levers if you have N tries? If you have infinitely many tries? These are stopping problems because using treatment A teaches you nothing new: its success probability is already known. So if, by virtue of trials of treatment B, it ever becomes optimal to use treatment A, then it is optimal to continue using treatment A thenceforth. The problem therefore boils down to one in which one need only find a time at which to switch from using treatment B to treatment A.

Finite number of steps

It turns out that stopping problems are non-trivial to solve when the number of tries is infinite (e.g., Question #2 above). But when the number of tries is finite, dynamic programming is often useful in arriving at an optimal strategy.

For a simple example of how DP can help find an optimal stopping strategy, let us take the die-throwing problem (Question #3) with N=3. A finite horizon provides a way to reason about the problem at each step by working backwards. After having thrown the die 2 times, it makes sense to throw it a 3rd time only if the expected outcome of the 3rd throw is better than the outcome of the 2nd throw. Since throws are IID, the expected outcome of the 3rd throw is 3.5, and therefore it makes sense to go into the 3rd throw only if the 2nd throw is a 1, 2 or 3, but not if it is a 4, 5 or 6. Applying the same logic, working backwards one stage: after having thrown the die the first time, it makes sense to throw a 2nd time only if the expected reward from continuing the game is higher than the present reward. The expected reward from going into the 2nd throw can be computed by considering two disjoint cases: the game stops with the 2nd throw (the 2nd throw is a 4, 5 or 6, with average reward 5), or the game continues into the 3rd throw (with expected reward 3.5). This gives: (3/6)(4+5+6)/3 + (3/6)(1+2+3+4+5+6)/6 = 2.5 + 1.75 = 4.25. Therefore the 1st throw is good enough to end the game with only if it is > 4.25, that is, if it is a 5 or a 6.
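The same backward induction works for any N. Here is a minimal sketch, with a hypothetical function name `die_value`, that computes the expected reward under optimal play using exact fractions: with one more throw available, you keep a face only if it beats the value of continuing with the remaining throws.

```python
from fractions import Fraction


def die_value(n_throws, faces=6):
    """Expected reward under optimal play with n_throws throws available."""
    value = Fraction(0)  # with no throws left there is nothing to gain
    for _ in range(n_throws):
        # Keep a face only if it beats the value of continuing.
        value = sum(max(Fraction(f), value) for f in range(1, faces + 1)) / faces
    return value


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, die_value(n), float(die_value(n)))
```

The value with k throws left is also the threshold for the throw before it: with 2 throws still to come the threshold is 4.25, which reproduces the "stop on a 5 or a 6" rule for the first of three throws.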

Problem #1 has been shown to have a relatively simple solution, which was pretty surprising when I first came across it. When the goal is to maximize the probability of picking the single best candidate, the solution, for reasonably large values of N, is: observe the first ~37% (a fraction 1/e) of the candidates without stopping, and then pick the first one that beats all the ones before it. The intuition is that observing some of the candidates without stopping provides information about the quality of candidates, which provides a baseline for making a decision. Stopping too soon is suboptimal because you do not have sufficient information to make an informed decision. Stopping too late is suboptimal because you do not have much choice left. The 37% solution is provably optimal and finds the best candidate about 37% of the time in repeated trials. A proof is easy to find online.
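The 37% figure can be checked numerically for a finite N. The sketch below uses hypothetical function names and assumes candidates arrive in uniformly random order. It evaluates the rule "skip the first r candidates, then take the first one that beats everyone seen so far": the best candidate sits at position i with probability 1/N, and the rule picks it exactly when the best of the first i-1 candidates falls among the first r, which has probability r/(i-1).

```python
import math


def success_probability(n, r):
    """P(picking the best of n) when skipping the first r candidates."""
    if r == 0:
        return 1.0 / n  # the rule simply takes the very first candidate
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def best_cutoff(n):
    """Number of candidates to skip that maximizes the success probability."""
    return max(range(n), key=lambda r: success_probability(n, r))


if __name__ == "__main__":
    n = 100
    r = best_cutoff(n)
    print(r, success_probability(n, r), 1 / math.e)
```

For N = 3 this gives a success probability of 1/2 when skipping one candidate, and as N grows both the best cutoff fraction r/N and the success probability approach 1/e.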