Very often games contains some random features. In poker and other card games, the first move is not done by the players but by the dealer, who deals the cards randomly. Other possibilities are that outcomes don't only depend on the players' moves, but also on some random elements. Or, if the outcomes only depend on the player's moves, still the payoffs attached to these outcomes may change randomly. Later we will also see that even some completely deterministic games, like ROCK-SCISSORS-PAPER are best played using some random device. For this reason, the theory of probability has to be discussed.
A lot of mathematical reasoning is concerned with trying to predict outcomes of certain situations. If I create a sphere of radius 20cm, how large will the surface area be? If I put together 20ml of a 20% acid solution and 30ml of a 50% acid solution, what kind of solution do I get? What will the speed of a falling apple be after 0.1 seconds? If the roulette ball has an initial speed of 3 meter per seconds, and a given initial direction, and the roulette wheel is spinning at a certain speed, where will the ball end? If the interest rate increases by 0.05, and productivity increases by 3%, how will it affect unemployment rate? Unfortunately, in many situations, models from the sciences, social sciences, or economics are not strong enough to predict these outcomes. Or, the models may be accurate, but the data available is just not sufficient, which is the case in the roulette example. In those cases, the outcomes seem random or unpredictable to us.
This is where Probability Theory and Statistics enter. There is usually a better prediction than "I don't know what the outcome will be, it is random anyway." Although you cannot predict the outcome in these cases, you may be able to describe the likelihood of the different possible outcomes. And this may tell you "what you may expect". Of course, expectations are not always what you get, but we will see that in the long run they will.
Probability Theory uses some terminology which we have to get used to. For instance, the situations whose possible outcomes are to be predicted are called experiments. Of course, this notion implies that, like scientific experiments, these experiments are not singular and isolated, but can rather be repeated over and over in a controlled way since their settings and conditions are very clearly defined. Examples are rolling a die, throwing a coin, or selecting a card from a shuffled 52-card deck. But the weather tomorrow, or the outcome of elections, are also experiments. We concentrate on one feature of the experiment, like which number the die shows, or what card is chosen, thereby neglecting other possible features (as, for instance, how long the die rolls). Therefore when describing an experiment you should always describe the procedure and what you are looking at.
Very often only finitely many possible outcomes are possible in these experiments---these outcomes are called the simple events. (Non-simple) events are groups of simple events. For instance, each one of the 52 possible cards is a simple event of the experiment of drawing a card from a shuffled deck of cards. Examples of events are all aces (combining four simple events) or all diamonds (combining 13 simple events), but an event does not have to be expressible as elegantly as in these two examples. Sometimes events can only be described by listing all the simple events they consist of. For instance, drawing a club King, or heart 8, or diamond 3 is also an event. Formal mathematics language calls these "groups" of simple event "subsets" of the set of simple events.
The task of Probability Theory is to assign a value p(A) to each event A. This value p(A) is called the probability that the outcome of the experiment belongs to A. These numbers are between 0 and 1, where 0 means "impossible" and 1 means "absolutely certain", but everything between these two extremes is possible.
There are different ways to obtain these probabilities. One way is to perform the experiment often and measure how often the event occurs. If you perform the experiment n times, and if in k of the cases event A occurs, then the empirical probability p(A) of that event is defined to be the relative frequency k/n of the event. An obvious disadvantage is that we have to perform the experiment repeatedly, and also that the probabilities we get vary slightly.
However, we are considering so-called a priori models here, where we want to avoid having to perform the experiment at all. Rather we are interested in obtaining values of theoretical probabilities p(A) using mathematical reasoning (theory). These values should predict the relative frequency (empirical probability) as described above in the following sense:
Assume that the theoretical probability for the event "head" in the experiment "flipping a coin" is 1/2. Assume that frequencies of "head" after 100, 1000, and 10000 throws are 49, 550, and 5200. Then the corresponding relative frequencies are 49/100=0.49, 550/1000=0.55, and 5200/10000=0.52. The "closer and closer" part stated above does not say that these relative frequencies cannot temporarily move away from 0.5, like in the example from the close 0.49 to the a little disappointing 0.55. What it says is that eventually the relative frequency will come closer to 0.5 than it was before almost surely. We can also formulate it like this. A deviation of, say 0.03 (value below 0.47 or above 0.53) is always possible, but the probability for such a fixed deviation decreases as the number of trials increases. Actually the probability for such a fixed deviation in the relative frequency approaches 0. But note also that the absolute deviation can always exceed a fixed value. In the example above, the absolute deviation between the number of heads and the predicted number of heads is -1, 50, 200, and these numbers may increase slowly as the number of trials increases.
Let me mention a common misconception of the Law of Large Numbers. You throw a coin and get a sequence of "Head", "Tail", "Tail", "Head", "Tail", "Head", "Tail", "Tail", "Tail", "Tail", "Tail". How likely is "Head" next? "Many people would say it is higher than 50%, since it is "due". Others may question the fairness of the coin and rather predict another "Tail". However, if we know that we have a fair coin, that the theoretical probability of 50% is the right one, then the probability for "Head" is still exactly 50% for all future rounds. Probability Theory and the Law of Large Numbers are not "fair" in the way of actively fighting against an existing deviation. Rather this deviation may absolutely persist, but then the relative deviation will still go to 0.Note that the probabilities of all simple events always add up to 1, since the event containing all simple events will occur surely. Therefore it is simple to find these probabilities when it is known that all outcomes are equally likely (like for a fair die). If there are n possible outcomes, n simple events, then each probability equals 1/n.
Note that the probability p(A) of any event A equals the sum of the probabilities of the simple events contained in it. If all simple events have the same probability then p(A) equals the number of simple events in A divided by the total number of simple events.
More difficult is the situation when the outcomes are not equally likely, like for a biased die. Sometimes one has to use the empirical probability p(A), which is defined as the relative frequency, the number of observed outcomes of A divided by the total number of experiments. In such a case it is sometimes possible to look at the situation differently, to define the simple events differently in such a way that they have again equal probabilities. Take the following three examples:
Very often, a numerical value is attached to each outcome of an experiment. In our context of games it usually expresses how valuable the outcome is, the payoff for somebody. Such a value is usually called a random variable, abbreviated by a letter, say X. If the simple events (possible outcomes) of an experiment are A1, A2,... An and the payoffs, the amounts attached to these simple events are X1, X2,... Xn, respectively, then how high a payoff would you expect? Probably the average of all these values, provided all outcomes are equally likely, i.e. if p(A1) = p(A2) = ... = p(An) = 1/n. Note that the average in this case equals the sum of the products p(Ai)·Xi of probabilities and amount attached. If the outcomes are not equally likely, the expectation would be tilted a little in the direction of the payoffs attached to the more likely outcomes. Instead of an average, we take a weighted average, where the probabilities of the outcomes are the weights. Therefore, the expected value for the random variable X is defined by
E(X) = p(A1)·X1 + p(A2)·X2 + ... + p(An)·Xn.
Although the expected value is usually not just the average of all possible values the random variable can take, it is an average. The same as the probability of an outcome approaches the relative frequency of that outcome if one repeats the experiment often, the expected value of a random variable approaches the average of the random variable when repeating the experiment often.
Therefore the expected value of a random value is not the value that occurs most frequently. This value is usually called "mode" but not used in our context.
Note that expected values can only be computed for numerical values. If you a draw a marble from a container containing six blue ones and six yellow ones, the expected outcome is not a green marble. Actually there is no such thing as "expected outcome", just expected value of the payoff/random variable.
Assume there are three possible outcomes: "bad", "good", and "very good", and let's assume we encode these outcomes by the numbers 0, 1, and 2. Then would you prefer a 60% chance of a very good outcome and a 40% chance of a bad outcome over a good outcome? Well, the arithmetic is easy: 60%·2+40%·0 = 1.2 > 1, so you would. However one could question whether the assignment of the numbers 0, 1, and 2 to the three outcomes is really appropriate. Maybe the difference between "very good" and "good" is small, but the difference between "good" and "bad" is large? In that case, risking a bad outcome in 40% of the cases would seem rather reckless.
There are four types of measurement scales. A nominal scale is the weakest one. There are only possible outcomes, like "red", "green", "blue", for example, but no natural order in it. This type of measurement scale cannot be used for payoffs/random variables, since they must be expressed in numbers, and in order to do so, the possible outcomes must have a natural ordering. Features of outcomes are ordinal if there is such a natural ordering. An example is given above with "bad", "good", and "very good". However, in order to use such outcomes for expected values, the differences between the outcomes must also be comparable. This third level is called an interval scale. In the previous example, the outcomes are interval if we can tell whether "good" is closer to "bad" or to "very good", and to what extend. Then of course encoding by 0,1,2 may not be appropriate, maybe rather 0,2,3 is more appropriate. The final fourth level is a ratio scale, having a natural zero, but for our purposes, in expected values and games, we will never require ratio scales.
Some experiments consist of two or more sub-experiments which are
performed sequentially. Such an experiment is called multistep experiment.
In the simplest example,
we first have an experiment with possible outcomes A and B. If the outcome is A, a second experiment is
performed with possible outcomes C and D. If the outcome of the first experiment is B, a second
experiment with possible outcomes E and F is performed.
The graph to the right visualizes the situation, and it is called a
probability tree. Starting with the leftmost vertex,
one of the two vertices in the middle is reached after the first experiment, and then
one of the four vertices to the right can be reached.
The probabilities for the corresponding outcome are written below the corresponding line.
Of course the similarity to extensive forms or Game Trees in sequential games is obvious.
What was called position there is now called
situation---the combination of all what is known so far.
Now how likely is outcome A in the first round A and outcome C in the second round? The probability for that combined event is the product of the probabilities on the arcs, p(A)·p(C).
Very often in multistep experiments, the possible outcomes of the second-round experiments
are the same, but the probabilities for these outcomes differ depending on the situation, on the outcome
of the first-round experiment. An example is shown to the right.
Since more than one arc in the tree has C as outcome,
but the probabilities are maybe different, we need a more involved notation.
Let p(C|A) denote a conditional probability,
the probability for outcome C provided we are in situation A.
Thus p(C|A) and p(C|B) may not be identical.
Totally, for the combined experiment, four
different outcomes are possible: A and C (A&C), A and D (A&D), B and C (B&C), and B and D (B&D).
How likely is each? It turns out that the total probability for each of the combined outcomes,
of the four vertices to the right, are the products of the probabilities of the path
leading to this vertex when starting at the leftmost vertex. Therefore
Example 9: AIDS tests, even when good and useful, are not perfect. Assume your hospital uses one that gives gives a positive result in 97.7% of the cases if the person has the AIDS virus, and a negative result in 92,6% of the cases where the person is not infected. Further assume that 0.5% in the population in your city carries the AIDS virus.
If someone from your city tells you that he or she has been tested positive with this test, what is the probability that he or she really carries the virus? [A. Rossman, B. Chance 2004]
We model this as a two-step experiment.
In the first round, the outcome is AIDS infected ("A") with probability 0.5% and not AIDS
infected ("H" for healthy) with probability 99.5%. Then, in the second step, the test is positive ("P")
or negative ("N") with probabilities depending on the outcome of the first step.
The first and third outcome of the two-step experiment combined are the outcome of a positive test.
The first outcome of the two-step experiment means the person has AIDS and is tested positive.
The probability for that equals
P(A)·P(P|A) = 0.5%·97.7% = 0.4985%
The third outcome means the person is not AIDS infected but still has been tested positive.
The probability for that outcome equals
P(H)·P(P|H) = 99.5%·7.4% = 7.363%
Then the probability that the person tested positive is AIDS infected equals
0.4985%/(0.4985%+7.363%) = 6.34%, so there is not real reason for panic yet.
Let's say we draw three cards from a stack. A, C, respectively E mean a red card in the first,
second, respectively third drawing, and B, D, respectively F mean a black card in the first,
second, respectively third drawing. Assume that at the end we don't care about the ordering
but rather only on how many black and red cards are drawn altogether. Therefore
we don't distinguish between A&C&F and A&D&E and B&C&E---in each of these three cases
we have two red cards and one black one.
Also after the second drawing we don't have to distinguish
between the situation A&D --- a red card in the first drawing and a black one in the second---
and B&C---a black card in the first drawing and a red one in the second. In both cases we have
one red and one black cards in the hand, and 25 red cards and 25 black cards remaining in the stack.
So the prospects for the future, for the third drawing are also identical.
This is expressed by the fact that both subtrees starting at A&D respectively B&C are identical.
In these cases we are allowed to identify these states (and also the subtrees starting at these
vertices). What we get by doing so is called a
probability digraph.
Now in such a digraph, the probability for a combined event
like A&C&F would be obtained like this. You look at all paths starting at the leftmost (start) vertex,
always moving to the right, and arriving at the vertex in question. For each of these
paths you multiply the probabilities along all edges. And then you make the sum of all these
products, summed over all these paths. In the example above, there are three such paths,
A,C,F, A,D,E, and B,C,E. Therefore
P(A&C&F=A&D&E=B&C&E) =
p(A)·p(C|A)·p(F|A&C) +
p(A)·p(D|A)·p(E|A&D) +
p(B)·p(C|B)·p(E|B&C).
Please see the following for another application of this method, and also for remarks on the history of probability theory.
Let's now turn back to Games. Very often, the payoffs in a simultaneous game do not only depend on the choices of the players, but also have a random component. Consider the following example:
Later we will see that sequential games involving randomness are modeled by defining a random player, called "Nature". For simultaneous games this is not necessary. All we have to do is to replace the payoffs by expected payoffs.
| 2 | 4 | 5 | |
| 2 | 9500, 9500 | 13500, 11000 | 13500,13750 |
| 4 | 11000, 13500 | 19000, 19000 | 27000, 13750 |
| 5 | 13750, 13500 | 13750, 27000 | 23750, 23750 |
The probability of an event equals the sum of the probabilities of the simple events "contained" in that event. Therefore, if we know all probabilities of the simple events, computing these probabilities is rather simple. This is in particular true in the many cases where all simple events have equal probabilities. Still, in some cases manually counting all these simple events seems to be a little too tedious. For instance, let A be the event that a randomly chosen card from a 52 card deck is either a King or red. So how many cards, simple events, does A contain? .................