3
Stochastic games on graphs
Games for synthesis
- Reactive system synthesis = finding a winning strategy in a two-player game
Game played on a graph
- Infinite number of rounds
- Players' moves determine the successor state
- Outcome = infinite path in the graph
When is randomness more powerful? When is randomness for free?
- … in game structures?
- … in strategies?

4
Stochastic games on graphs: classification according to Information & Interaction (Round 1)
- Interaction: how players' moves are combined.
- Information: what is visible to the players.

8
Mode of interaction (classification according to Information & Interaction)
General case: concurrent & stochastic
- Player 1's move and Player 2's move together determine a probability distribution on the successor state.
- Players choose their moves simultaneously and independently.
- 1½-player games (MDP) if A_1 or A_2 is a singleton.

21
Value
- The probability of a finite prefix of a play induces a unique probability measure on measurable sets of plays, and a value function for Player 1.
- Our reductions preserve values and existence of optimal strategies.
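
The formulas on this slide did not survive extraction. As a hedged sketch only, the standard textbook definitions for the complete-observation case would read as follows. The symbols σ and π (strategies of Player 1 and Player 2), δ (transition function) and φ (objective, a measurable set of plays) are assumptions introduced here and may differ from the notation used in the talk.

```latex
% Probability of a finite prefix under strategies \sigma and \pi, with s_0 = s:
\[
  \Pr\nolimits_{s}^{\sigma,\pi}(s_0 s_1 \dots s_n)
  = \prod_{i=0}^{n-1} \sum_{a \in A_1} \sum_{b \in A_2}
    \sigma(s_0 \dots s_i)(a)\,\pi(s_0 \dots s_i)(b)\,\delta(s_i,a,b)(s_{i+1})
\]
% Value function for Player 1 with respect to an objective \varphi:
\[
  v(s) = \sup_{\sigma} \inf_{\pi} \Pr\nolimits_{s}^{\sigma,\pi}(\varphi)
\]
```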

22
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

23
Preliminary: from rational probabilities to probabilities ½ only
1. Make all probabilistic states have two successors

33
Reduction: simulate a probabilistic state with a concurrent state
- Probability to go from s to s'_0: …
- If …, then …
- Each player can unilaterally decide to simulate the original game.

34
Reduction: simulate a probabilistic state with a concurrent state
- Player 1 can obtain at least the value v(s) by playing all actions uniformly at random: v'(s) ≥ v(s)
- Player 2 can force the value to be at most v(s) by playing all actions uniformly at random: v'(s) ≤ v(s)
- Hence v'(s) = v(s).
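
The two inequalities above can be checked on a single state with a small sketch. It assumes the concurrent state offers actions {0, 1} to both players and that the successor index is the XOR of the two actions; this combination rule and the successor values are assumptions made for illustration, since the slide's formula was lost.

```python
from fractions import Fraction


def expected_value(p1, p2, succ_values):
    """Expected successor value when both players mix over actions {0, 1}."""
    return sum(
        p1[a] * p2[b] * succ_values[a ^ b]  # assumed rule: successor index = a XOR b
        for a in (0, 1)
        for b in (0, 1)
    )


half = Fraction(1, 2)
uniform = [half, half]
pure = {0: [Fraction(1), Fraction(0)], 1: [Fraction(0), Fraction(1)]}

# Hypothetical values of the two successors of the probabilistic state s.
succ_values = [Fraction(1, 4), Fraction(1)]
# Value v(s) of the original probabilistic state (probability 1/2 each).
v = half * succ_values[0] + half * succ_values[1]

# Player 1 plays uniformly: no pure reply of Player 2 pushes the value below v(s).
lower = min(expected_value(uniform, pure[b], succ_values) for b in (0, 1))
# Player 2 plays uniformly: no pure reply of Player 1 pushes the value above v(s).
upper = max(expected_value(pure[a], uniform, succ_values) for a in (0, 1))
```

Checking pure replies is enough here because the expected value is linear in the opponent's mixed action.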

35
Reduction
- For {complete, one-sided, partial} observation, given a game with rational probabilities we can construct a concurrent game with deterministic transition function such that values are preserved and existence of optimal observation-based strategies is preserved.
- The reduction is in polynomial time for complete-observation parity games.

38
Partial information
- Information leak from the back edges…
- No information leakage if all probabilities have the same denominator q → use 1/q = gcd of all probabilities in G
- The reduction is then exponential.
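
A minimal sketch of the common-denominator idea for one probabilistic state. It assumes both players get q actions and the successor is looked up from (a + b) mod q; this rule and the example distribution are assumptions for illustration. The slide takes q over all probabilities in G, whereas the sketch computes it for a single state.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # Python 3.9+


def concurrent_table(dist):
    """Return (q, table), where table[r] is the successor chosen for residue r."""
    q = lcm(*(p.denominator for p in dist.values()))
    # A successor with probability k/q owns k of the q residues.
    table = [s for s, p in dist.items() for _ in range(int(p * q))]
    return q, table


def induced_distribution(dist, mix1, mix2):
    """Successor distribution when the players mix over actions 0..q-1."""
    q, table = concurrent_table(dist)
    out = {s: Fraction(0) for s in dist}
    for a, b in product(range(q), repeat=2):
        out[table[(a + b) % q]] += mix1[a] * mix2[b]  # assumed rule: (a + b) mod q
    return out


# Hypothetical probabilistic state with three successors.
dist = {"s1": Fraction(1, 3), "s2": Fraction(1, 6), "s3": Fraction(1, 2)}
q, table = concurrent_table(dist)
uniform = [Fraction(1, q)] * q
point_mass = [Fraction(int(i == 4)) for i in range(q)]  # always play action 4
```

With either player uniform, the other player's choice does not change the induced distribution. The number of actions is q, which can be exponential in the bit size of the probabilities, in line with the slide's remark on the cost of the reduction.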

39
Example

40
Outline
Randomness in game structures
- for free with complete-observation, concurrent
- for free with one-sided, turn-based
Randomness in strategies
- for free in (PO)MDP
Corollary: undecidability results

42
Reduction: simulate a probabilistic state with imperfect-information turn-based states
(Figure labels: Player 1 states, Player 2 state, Player 1 observation.)

43
Reduction: simulate a probabilistic state with imperfect-information turn-based states
Each player can unilaterally decide to simulate the probabilistic state by playing uniformly at random:
- Player 2 chooses states (s,0), (s,1) uniformly at random
- Player 1 chooses actions 0, 1 uniformly at random
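
The role of Player 1's observation can be illustrated with a short sketch. It assumes Player 2 first moves to (s, c) with a hidden bit c, Player 1 then plays an action a without seeing c, and the successor index is c XOR a; the XOR rule is an assumption made for illustration.

```python
from fractions import Fraction


def successor_distribution(p2_hidden, p1_action):
    """Distribution over the two successors, indexed 0 and 1.

    p2_hidden[c]    : probability that Player 2 moves to state (s, c)
    p1_action[c][a] : probability that Player 1 plays action a in state (s, c)
    """
    dist = [Fraction(0), Fraction(0)]
    for c in (0, 1):
        for a in (0, 1):
            dist[c ^ a] += p2_hidden[c] * p1_action[c][a]  # assumed rule: c XOR a
    return dist


half = Fraction(1, 2)
uniform = [half, half]
always_0 = [Fraction(1), Fraction(0)]

# Observation-based: (s,0) and (s,1) look the same to Player 1, so the same
# distribution over actions is used in both states.
obs_based = [always_0, always_0]
# Not observation-based: Player 1 copies the hidden bit c.
peeking = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
```

Against a uniform Player 2, `obs_based` yields the uniform distribution over successors, while `peeking` would force successor 0 with probability 1. This is why (s,0) and (s,1) must share one Player 1 observation.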

44
Reduction
- Games with rational probabilities can be reduced to turn-based games with deterministic transitions and (at least) one-sided complete observation.
- Values and existence of optimal strategies are preserved.

45
Randomness for free
In transition function (this talk):

46
When randomness is not for free
Complete-information turn-based ( ) games
- in deterministic games, value is either 0 or 1 [Martin98]
- MDPs with reachability objective can have values in [0,1]
Randomness is not for free.
1½-player games (MDP & POMDP)
- in the deterministic partial-information case, value is either 0 or 1 [see later]
- MDPs have value in [0,1]
Randomness is not for free.

54
Randomness for free in strategies
1½-player game (POMDP), s_0 initial state.
For every randomized observation-based strategy, there exists a pure observation-based strategy that achieves at least the same value from s_0.
Proof. (Assume alphabet of size 2, and fan-out = 2.) Given σ, we show that the value of σ can be obtained as the average of the values of pure strategies σ_x:

60
Randomness for free in strategies
Given an infinite sequence x = (x_n)_{n≥0} ∈ [0,1]^ω, define σ_x for all s_0, s_1, …, s_n.
- σ_x is a pure and observation-based strategy!
- σ_x plays like σ, assuming that the result of the coin tosses is the sequence x.
- The value of σ is the «average of the outcome» of the strategies σ_x.
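
The averaging argument can be checked exactly on a toy instance. The sketch below assumes a hypothetical two-step MDP (a POMDP with full observation) with a reachability objective, and assumes that σ_x plays action a after a history of length n+1 exactly when x_n ≤ σ(history)(a); the slide's own definition of σ_x was lost, so this threshold rule is an assumption.

```python
from fractions import Fraction as F
from itertools import product
from math import prod

# Hypothetical MDP: (state, action) -> {successor: probability}.
DELTA = {
    ("s0", "a"): {"s1": F(1, 2), "s2": F(1, 2)},
    ("s0", "b"): {"s2": F(1)},
    ("s1", "a"): {"goal": F(1)},
    ("s1", "b"): {"sink": F(1)},
    ("s2", "a"): {"sink": F(1)},
    ("s2", "b"): {"goal": F(1, 2), "sink": F(1, 2)},
}
# Hypothetical randomized strategy sigma: history -> probability of action "a".
SIGMA = {("s0",): F(1, 3), ("s0", "s1"): F(3, 4), ("s0", "s2"): F(1, 4)}
HORIZON = 2


def value(prob_a, hist=("s0",), steps=HORIZON):
    """Probability of reaching 'goal' within `steps` moves under a strategy."""
    if hist[-1] == "goal":
        return F(1)
    if steps == 0 or (hist[-1], "a") not in DELTA:
        return F(0)
    p = prob_a(hist)
    return sum(
        weight * pr * value(prob_a, hist + (succ,), steps - 1)
        for action, weight in (("a", p), ("b", 1 - p))
        for succ, pr in DELTA[(hist[-1], action)].items()
    )


def pure(x):
    """sigma_x: play "a" iff the coin x_n is at most sigma(history)("a")."""
    return lambda hist: F(int(x[len(hist) - 1] <= SIGMA[hist]))


def cells(depth):
    """Split [0,1] at sigma's thresholds for this depth: (representative, length)."""
    ts = sorted({F(0), F(1)} | {p for h, p in SIGMA.items() if len(h) == depth + 1})
    return [(hi, hi - lo) for lo, hi in zip(ts, ts[1:])]


# sigma_x is constant on each cell, so the integral over x is a finite sum.
grid = list(product(*(cells(d) for d in range(HORIZON))))
randomized = value(lambda hist: SIGMA[hist])
average = sum(
    prod(w for _, w in combo) * value(pure(tuple(x for x, _ in combo)))
    for combo in grid
)
best_pure = max(value(pure(tuple(x for x, _ in combo))) for combo in grid)
```

In this instance `randomized` and `average` coincide, so at least one σ_x must do at least as well as σ; `best_pure` picks such a strategy. The finite horizon and full observation are simplifications for the example, not part of the statement on the slide.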