MAT109 · Erich Prisner · Franklin College · 2007-2009

Theory Chapter 5:
Sequential Games II:
Perfect Information and Randomness

Note to the Teacher: ....................
Play SEQUENTIAL QUIZ SHOW with three students, 5 real questions, like "Who won the Fields Medal in Mathematics in 1980?".

RANDOM NIM(n,p): n stones lie on the desk. As in NIM, Ann and Beth alternate to remove either one or two stones. The player who has to move but cannot (since there is no stone left) loses, and the payoffs are also identical to the preceding game. However between these moves of Ann and Beth, randomly either 0 or 1 stones are also removed (with probability p and 1-p respectively).

Class Activity: Play at least 20 plays of RANDOM NIM(5,0.5) against the computer . Put in the values "5" and "0.5" into the text field before you start. Try to win more often than lose.

The game is sequential, with two players, but between these moves of the players there are the random moves. Such games are called sequential with randomness, and are discussed in this chapter.

1. Extensive Form extended

To describe and discuss sequential games with some random moves, we need to merge the former concept of an extensive form of a sequential game as described here, and the concept of a probability tree discussed in the probability chapter. In addition to the vertices of the extensive form that correspond to positions of the game where one of the players is supposed to make a decision, we need random vertices. At these vertices the random moves are performed. Therefore, there are arcs out of these random vertices to other vertices, positions that can be reached from these random positions. These arcs are labeled by the appropriate probabilities to reach the other position from the random position. Of course, the probabilities attached to all outgoing arcs of such a random vertex must sum up to 1. It is also understood at the moment that all these probabilities are known to every player; this is part of the complete information requirement. Thus now we have three kinds of vertices: random vertices, position vertices belonging to some player, and end vertices. The start vertex is either a random vertex or belongs to some player.

Here is the digraph for RANDOM NIM(5,0.5). The random vertices are in green, and the probability labels of the random moves are suppressed, since all of them have the value 0.5. The number shown at each vertex indicates the number of stones still on the desk at that time.

2. Analyzing the Game: Backward Induction again

We will analyze such games using backward induction. At each vertex, the expected payoffs of all players will be written. Remember that such expected payoffs are only meaningful if we have at least an interval scale for the payoffs. .......

BAUSTELLE: Note that in reality things are more complicated. Monetary reward is transferred by so-called utility functions into something which is comparable. Note that almost everybody would prefer sure $1,000,000 over an expected payoff of the same amount by getting $2,000,000 with probability 1/2 and nothing with probability 0.
It is quite obvious how to attack such payoffs at random vertices: If all successors of such a vertex have assigned payoffs, we just use the expected values at that random vertex.

Procedure for Backward Induction with Randomness: The goal is to have expected values for each player's payoff assigned at every vertex. As long as not all vertices have values attached, do the following:

(1) Find a vertex with no values attached yet, but with the property that all its successors in the digraph have values attached. Unless all vertices have values, such a vertex can always been found. Call this vertex "V".

(2) At vertex V, either one player, say "X" has to move, or it is a random vertex. In the first case we proceed to step (P3), in the second case to step (R3).

(P3) If V is a player vertex belonging to player X, then we proceed as in sequential games without randomness. We identify the successor vertex of V for which the value for player X is highest. This is the vertex where player X wants to move to (since the values will turn into payoffs eventually). Let's call this other vertex "W". Move to step (P4).

(P4) Then we copy the values of vertex W to vertex V. Go back to step (1).

(R3) If V is a random vertex, then all successors may occur with the given probabilities. Therefore the value at V for any given player is the expected value of the values of that player at the successor vertices, i.e. the sum of the products of probabilities to arrive there and the values. Compute this expected value for all player values. Go back to step (1)

Such games have been systematically investigated by Kuhn. Zermelo's result holds for them accordingly, just for expected payoffs.

Theorem [Kuhn 1953]: Every finite perfect-information sequential game can be analyzed using backward induction. The resulting recommended moves have the property that any player cannot win by changing some of her moves, provided all other players keep playing their moves.

Although Nash equibiria have not been defined for sequential games yet, this condition has the same flavor.

Click here to see the expected values in our example: Ann has a slight advantage in that version of the game.

3. Decision Theory: Alone against Nature

In this section we look at the special case of 1-player games. These games are more optimization problems than games, but backward induction can be demonstrated nicely at them.

1-CANDIDATE QUIZ SHOW: You play a quiz show with up to 6 questions. The questions are worth $1, $1, $2, $4, $8, and $16. After each round, you have the option to stop and take the money you have won so far, or hear the next question. If you have heard the next question, there is no way out---you have to answer it correctly, to go into the next round. If you give the wrong answer, all your money is lost. You also know that in round n the probability of giving a wrong answer is 2/(9-n). (Thus it increases from 2/8 over 2/7, 2/6, 2/5, 2/4 to 2/3 over the six rounds.) When do you stop?
Class Activity: Play the game in this applet.

Here is the extensive form as game tree. At each round you have the option to dare the next question or to stop. If you stop, an edge goes to a leaf, an end situation, and the red number written at that leaf shows how much you have won. If you dare the next question, an edge goes to a green vertex, where a random experiment is performed. This random experiment has two outcomes---the answer can be correct or wrong. The probabilities for the two outcomes are written below the green edge. If the answer was wrong, you have again reached a leaf, a final situation, with a payoff of 0. Otherwise, you are facing the decision whether to stop or to dare the next question of the next round.

We try to fill in the missing expected payoffs starting at the end. If you have reached the rightmost green random vertex, i.e. if you have answered the first five questions successfully and dared to answer the last one, your expected payoff will be (2/3)·0 + (1/3)·32 = 32/3. Therefore, at the player's last decision vertex, you have to decide between expected payoff of 32/3 if you dare the last question, and a sure payoff of 16 if you stop. Since 16 > 32/3, it is rational to stop there. Therefore every rational player will stop there, and the expected payoff at that vertex is 16. Now at the second last random vertex, if you are trying to answer the fifth question, your expected payoff is (2/4)·0 + (2/4)·16 = 8. Therefore if you face the decision whether to allow the fifth question (after answering four questions successfully), you face the choice between an expected payoff of 8 if you dare the next question, and a sure payoff of 8 if you stop. We make the assumption here that you always go for the higher payoff, expected or sure, therefore in this case you would be indifferent, but expect 8 at that decision.

If you now go one step further to the right, the expected payoff of the next (random) vertex is (2/5)·0 + (3/5)·8 = 24/5, which is higher than the 4 you would get by stopping after the third question. So you would in any case try to answer the fourth question, and never try to answer the last question. If this procedure is continued, we get an expected payoff of 12/7 at the very beginning.

5 Envelopes

The amount of $10000 is distributed into 5 envelopes. One envelope, the "bomb", remains empty. In the basic version the player knows how much is put into each envelope, let's say $1000, $2000, $3000, $4000, and 0. Then the envelopes are shuffled and in each round you, the player, can either choose to get one of the remaining envelopes or keep the money you have collected so far and quit. If you ask for a new envelope, randomly one of the remaining envelopes is selected and handed to the player. If this envelope contains nothing (the bomb), the player you lose all the money you collected in the previous envelopes, otherwise you add the money to the money taken so far.

Class Activity: Play it in this applet.

The extensive form can be seen here. Since it is so huge, and since the situations where you have drawn first $2000, and then $3000, or where you have drawn first $3000, and then $2000 are really identical situation for you, it is better to identify such situations and move to the digraph. This digraph can be seen in this Excel sheet, where backwards induction is done automatically. There are five envelopes, containing amounts A, B, C, D, and 0. You can change the amounts A, B, C, D by typing in different numbers into cells B4 to B7. Actually the random vertices are omitted, all vertices are decision vertices, they are labeled by the bills received so far, and they are colored black if they are final situations (with nothing to decide), red if you rather stop, and green if you rather proceed. The expected value if proceeding is shown in the grey cell below the situation name, and the payoff if stopping, the accumulated payoff, is shown in the yellow cell. In each case, the payoff of that decision vertex is the maximum of both values. Therefore the expected payoff of the original version is $3200.

Instead of giving an analysis, which is left as a project, let us just give a few hints. The extensive form can be seen here. Since it is so huge, and since the situations where you have drawn first $2000, and then $3000, or where you have drawn first $3000, and then $2000 are really identical situation for you, it is better to identify such situations and move to the digraph. If we label the envelopes by A, B, C, D, and E, E containing 0, then the vertices where the player has to decide are that where nothing has been drawn yet (the start position), the four cases where one of A, B, C, or D has been drawn, the six cases where two of A, B, C, D have been drawn, the four cases where three of A, B, C, D have been drawn, and maybe also the case where four envelopes have been drawn, A, B, C, and D. The other cases, where envelope E was among those drawn, are end vertices and don't require a decision from the player. Thus 1+4+6+4+1 vertices belong to the player. Furthermore, for each such situation, provided the player continues, there is a random vertex. Thus the game digraph would have 1+4+6+4+1=16 vertices belonging to the player, 16 random vertices, and the end vertex where the player has drawn envelope E and gains nothing, and the cases where the player stops after having drawn some envelopes. There are 4+6+4+1=15 such cases, so there are 16 end vertices in the game digraph.

OH-NO

The player rolls a die repeatedly. After each roll, as much dollars as the die shows are added to a heap, which initially is empty. The game ends if either

Class Activity: Play it in this applet.

Analysis: There is obviously only one player, playing against Nature. Unfortunately it is not a finite game---the extensive form is infinite. For this reason, backward induction cannot be applied, but we can still solve the game using a similar idea. Intuition may tell us that we want to roll the die again provided the heap is small, but may want to stop if the heap has reached some size. The question is: From which heap size on would we stop?
 
The positions for the player are determined by the value of the heap obtained so far, so we denote these positions by integers. Each position n has an expected value v(n) for the player attached---the expected payoff the player would obtain when having reached that position and playing optimally from then on. Since in every position the player has the option to stop, obtaining the value of the heap as payoff, v(n) ≥ n for all natural numbers n. If the player is at position n and continues, he or she will expect
(1/6)·0 + (1/6)·(v(n+2)) + (1/6)·(v(n+3)) + (1/6)·(v(n+4)) + (1/6)·(v(n+5)) + (1/6)·(v(n+6)).
Since we started just at any position n, and don't use backward induction, we don't know the expected values at these positions n+2, ... n+6 yet, but we know that they are at least n+2, ... n+6. Therefore, the player at position n who continues to play expects at least
(1/6)·(n+2) + (1/6)·(n+3) + (1/6)·(n+4) + (1/6)·(n+5) + (1/6)·(n+6) = (1/6)·(5n+20).
That means that the player will continue playing if this value (1/6)·(5n+20) is larger than n, i.e. if 5n+20 > 6n, or 20 > n.
 
We still don't know what to do when facing heap sizes of 20 or more, but using the assumption that the best strategy is to continue for heap sizes below some critical value m and stop after, we can easily obtain that 20 is this critical heap size. Thus the best strategy is to roll until you have 20 points. Then you stop. The expected value, however, is smaller than 20 if using this optimal strategy. How large is it?

But ...

... what about the two approaches we have learned so far: Normal (Matrix) Form for simultaneous games and the Extensive Form for sequential games. Is each form limited to these restricted sort of games? The answer is no, just go on reading.
...
...

References

Exercises