RANDOM NIM(n,p): n stones lie on the desk. As in NIM, Ann and Beth alternate to remove either one or two stones. The player who has to move but cannot (since there is no stone left) loses, and the payoffs are also identical to the preceding game. However between these moves of Ann and Beth, randomly either 0 or 1 stones are also removed (with probability p and 1-p respectively).
The game is sequential, with two players, but between these moves of the players there are the random moves. Such games are called sequential with randomness, and are discussed in this chapter.
To describe and discuss sequential games with some random moves, we need to merge the former concept of an extensive form of a sequential game as described here, and the concept of a probability tree discussed in the probability chapter. In addition to the vertices of the extensive form that correspond to positions of the game where one of the players is supposed to make a decision, we need random vertices. At these vertices the random moves are performed. Therefore, there are arcs out of these random vertices to other vertices, positions that can be reached from these random positions. These arcs are labeled by the appropriate probabilities to reach the other position from the random position. Of course, the probabilities attached to all outgoing arcs of such a random vertex must sum up to 1. It is also understood at the moment that all these probabilities are known to every player; this is part of the complete information requirement. Thus now we have three kinds of vertices: random vertices, position vertices belonging to some player, and end vertices. The start vertex is either a random vertex or belongs to some player.
Here is the digraph for RANDOM NIM(5,0.5). The random vertices are in green, and the probability labels of the random moves are suppressed, since all of them have the value 0.5. The number shown at each vertex indicates the number of stones still on the desk at that time.

We will analyze such games using backward induction. At each vertex, the expected payoffs of all players will be written. Remember that such expected payoffs are only meaningful if we have at least an interval scale for the payoffs. .......
|
(1) Find a vertex with no values attached yet, but with the property that all its successors in the digraph have values attached. Unless all vertices have values, such a vertex can always been found. Call this vertex "V". (2) At vertex V, either one player, say "X" has to move, or it is a random vertex. In the first case we proceed to step (P3), in the second case to step (R3). | |
|
(P3) If V is a player vertex belonging to player X, then we proceed as in sequential games without randomness. We identify the successor vertex of V for which the value for player X is highest. This is the vertex where player X wants to move to (since the values will turn into payoffs eventually). Let's call this other vertex "W". Move to step (P4). (P4) Then we copy the values of vertex W to vertex V. Go back to step (1). |
(R3) If V is a random vertex, then all successors may occur with the given probabilities. Therefore the value at V for any given player is the expected value of the values of that player at the successor vertices, i.e. the sum of the products of probabilities to arrive there and the values. Compute this expected value for all player values. Go back to step (1) |
Such games have been systematically investigated by Kuhn. Zermelo's result holds for them accordingly, just for expected payoffs.
Although Nash equibiria have not been defined for sequential games yet, this condition has the same flavor.
In this section we look at the special case of 1-player games. These games are more optimization problems than games, but backward induction can be demonstrated nicely at them.
Here is the extensive form as game tree. At each round you have the option to dare the next question or to stop. If you stop, an edge goes to a leaf, an end situation, and the red number written at that leaf shows how much you have won. If you dare the next question, an edge goes to a green vertex, where a random experiment is performed. This random experiment has two outcomes---the answer can be correct or wrong. The probabilities for the two outcomes are written below the green edge. If the answer was wrong, you have again reached a leaf, a final situation, with a payoff of 0. Otherwise, you are facing the decision whether to stop or to dare the next question of the next round.
The amount of $10000 is distributed into 5 envelopes. One envelope, the "bomb", remains empty. In the basic version the player knows how much is put into each envelope, let's say $1000, $2000, $3000, $4000, and 0. Then the envelopes are shuffled and in each round you, the player, can either choose to get one of the remaining envelopes or keep the money you have collected so far and quit. If you ask for a new envelope, randomly one of the remaining envelopes is selected and handed to the player. If this envelope contains nothing (the bomb), the player you lose all the money you collected in the previous envelopes, otherwise you add the money to the money taken so far.
The extensive form can be seen here. Since it is so huge, and since the situations where you have drawn first $2000, and then $3000, or where you have drawn first $3000, and then $2000 are really identical situation for you, it is better to identify such situations and move to the digraph. This digraph can be seen in this Excel sheet, where backwards induction is done automatically. There are five envelopes, containing amounts A, B, C, D, and 0. You can change the amounts A, B, C, D by typing in different numbers into cells B4 to B7. Actually the random vertices are omitted, all vertices are decision vertices, they are labeled by the bills received so far, and they are colored black if they are final situations (with nothing to decide), red if you rather stop, and green if you rather proceed. The expected value if proceeding is shown in the grey cell below the situation name, and the payoff if stopping, the accumulated payoff, is shown in the yellow cell. In each case, the payoff of that decision vertex is the maximum of both values. Therefore the expected payoff of the original version is $3200.
Instead of giving an analysis, which is left as a project, let us just give a few hints. The extensive form can be seen here. Since it is so huge, and since the situations where you have drawn first $2000, and then $3000, or where you have drawn first $3000, and then $2000 are really identical situation for you, it is better to identify such situations and move to the digraph. If we label the envelopes by A, B, C, D, and E, E containing 0, then the vertices where the player has to decide are that where nothing has been drawn yet (the start position), the four cases where one of A, B, C, or D has been drawn, the six cases where two of A, B, C, D have been drawn, the four cases where three of A, B, C, D have been drawn, and maybe also the case where four envelopes have been drawn, A, B, C, and D. The other cases, where envelope E was among those drawn, are end vertices and don't require a decision from the player. Thus 1+4+6+4+1 vertices belong to the player. Furthermore, for each such situation, provided the player continues, there is a random vertex. Thus the game digraph would have 1+4+6+4+1=16 vertices belonging to the player, 16 random vertices, and the end vertex where the player has drawn envelope E and gains nothing, and the cases where the player stops after having drawn some envelopes. There are 4+6+4+1=15 such cases, so there are 16 end vertices in the game digraph.
The player rolls a die repeatedly. After each roll, as much dollars as the die shows are added to a heap, which initially is empty. The game ends if either