Administrative info

PA3 out in the next day or so.

Review

Recall the coin-flipping game from last time. We flip a biased coin that has probability p of heads n times; for each heads we win $1, and for each tails we lose $1. We are interested in how much total money we win. In general, we determined that if we flip the coin n times, then W(ω) = 2 H(ω) - n, where H(ω) is the number of heads in ω. We abbreviate this statement as W = 2H - n.

We then demonstrated that the distribution of H is

Pr[H = i] = C(n, i) p^i (1-p)^(n-i) for integer i, 0 <= i <= n.

This is a binomial distribution with parameters n and p, denoted by H ~ Bin(n, p). Once we computed the distribution of H, we computed the distribution of W = 2H - n. We have

Pr[W = j] = Pr[2H - n = j] = Pr[H = (j+n)/2] = C(n, (j+n)/2) p^[(j+n)/2] (1-p)^[n-(j+n)/2]

for integer (j+n)/2, 0 <= (j+n)/2 <= n. Solving for j, we get -n <= j <= n as the range of values W can take on, but with the caveat that W takes on only even values if n is even and only odd values if n is odd, so that (j+n)/2 is an integer.

Recall the exam example from last time. We had n students, each of whom receives a random exam back. We were interested in how many students get their own exam back. Calling this random variable X, we then defined X_i to be an indicator random variable that is 1 if the ith student gets his or her own exam back and 0 otherwise. Then X = X_1 + ... + X_n.

We defined the expected value of a random variable X to be

E(X) = ∑_{ω ∈ Ω} X(ω) Pr[ω],

or equivalently

E(X) = ∑_{a ∈ A} a * Pr[X = a],

where A is the set of all values that X can take on. We determined that in the coin-flipping example, E(W) = 6p - 3 when n = 3. In the example of passing back exams, for n = 3, we had E(X) = 3 Pr[X=3] + 1 Pr[X=1] + 0 Pr[X=0] = 3/6 + 1/2 = 1.

Suppose we roll a fair die. We calculated the expected value of N, the number that shows: Pr[N=i] = 1/6 for 1 <= i <= 6, so E(N) = 1 Pr[N=1] + 2 Pr[N=2] + ... + 6 Pr[N=6] = 1 * 1/6 + 2 * 1/6 + ... 
+ 6 * 1/6 = 7/2. We then computed, in a tedious manner, that if we roll two dice, the expected value of their sum S is E(S) = 7.

As a final example, suppose we pick 100 Californians uniformly at random. How many Democrats do we expect out of this group, given that 44.5% of Californians are Democrats? Intuitively, we'd expect 44.5, but how can we arrive at that without computing a large distribution?

Linearity of Expectation

Suppose we have two random variables X and Y, and let Z = X + Y. What is E(Z)? We have, from the first definition of expectation,

E(Z) = ∑_{ω ∈ Ω} Z(ω) Pr[ω] = ∑_{ω} (X(ω) + Y(ω)) Pr[ω] = ∑_{ω} X(ω) Pr[ω] + ∑_{ω} Y(ω) Pr[ω] = E(X) + E(Y).

Thus, E(X+Y) = E(X) + E(Y). We can similarly show that E(cX) = c E(X), where c is a constant. These two facts are known as "linearity of expectation."

Linearity of expectation is a powerful tool for computing expectations. We have already seen examples of defining a random variable in terms of other random variables, which allows us to use linearity of expectation. Let's go back to the example of rolling two dice. Let N_1 be the value of the first die, N_2 the value of the second die, and S = N_1 + N_2 the sum. We computed E(N_1) = E(N_2) = 7/2, so E(S) = E(N_1) + E(N_2) = 7, as before. This computation, however, is much simpler than using the distribution of S.

In the exam example, let us compute the distribution of X_i, which is 1 if the ith student gets his or her own exam back and 0 otherwise. There are n choices of exam, only one of which is a match, so Pr[X_i=1] = 1/n and Pr[X_i=0] = 1 - 1/n. What is E(X_i)? It is E(X_i) = Pr[X_i=1] = 1/n. Note that in general, for an indicator random variable Y, E(Y) = Pr[Y=1]. Now let us compute E(X), where X is the total number of students who get their own exam back. Since X = X_1 + ... + X_n, we have E(X) = E(X_1) + ... + E(X_n) = 1/n + ... + 1/n = 1. This matches what we got when n = 3. 
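Both computations above can be checked by brute force. The sketch below (the function names are mine, not from the lecture) enumerates all n! equally likely ways of handing exams back to confirm E(X) = 1, and sums over all 36 dice outcomes to confirm E(S) = 7:

```python
from itertools import permutations

def expected_own_exams(n):
    """Exact E(X): average number of students who get their own exam back,
    over all n! equally likely ways of handing the exams back."""
    perms = list(permutations(range(n)))
    # A "fixed point" of a permutation is a student i who receives exam i.
    fixed = sum(sum(1 for i, e in enumerate(p) if i == e) for p in perms)
    return fixed / len(perms)

def expected_dice_sum():
    """E(S) computed from the full distribution of the sum of two fair dice:
    each of the 36 outcomes (a, b) has probability 1/36."""
    return sum(a + b for a in range(1, 7) for b in range(1, 7)) / 36

print(expected_own_exams(3))   # 1.0
print(expected_own_exams(6))   # 1.0 again -- independent of n
print(expected_dice_sum())     # 7.0, matching E(N_1) + E(N_2)
```

The brute-force versions take O(n!) and O(36) work respectively; linearity of expectation gets the same answers with almost no computation.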
Notice that the expected number of students who get their own exam back is always 1, regardless of n! This is quite surprising.

Let us proceed in the same manner to compute E(H), the expected number of heads when flipping a biased coin n times. Let H_i be an indicator random variable that is 1 if the ith flip is heads and 0 otherwise. Then Pr[H_i=1] = p, so E(H_i) = p. The number of heads is just H = H_1 + ... + H_n, so E(H) = E(H_1) + ... + E(H_n) = p + ... + p = np. Again, this is much simpler than using the distribution of H. In general, for a random variable X ~ Bin(n, p), we have E(X) = np.

In our coin-flipping game, our winnings W were given by W = 2H - n. Thus, E(W) = 2 E(H) - n = 2np - n = n(2p - 1). (Note that the expectation of a constant c is just c, so E(n) = n.) Plugging in n = 3, we get E(W) = 6p - 3, as before. Again, this method is much easier than using the distribution of W.

Finally, how many Democrats do we expect in a group of 100 random Californians? Let D be the number of Democrats, and let D_i be an indicator random variable that is 1 if the ith person is a Democrat. Then Pr[D_i=1] = 0.445, so E(D_i) = 0.445, and E(D) = E(D_1) + ... + E(D_100) = 44.5, as we expected. We could also have noticed that D ~ Bin(100, 0.445) and immediately concluded that E(D) = 100 * 0.445 = 44.5.

Geometric Distribution

We have already seen one important distribution, the binomial distribution. We will look at two more important discrete distributions.

Suppose I take a written driver's license test. Since I don't study, I only have a probability p of passing the test, mostly by getting lucky. Let T be the number of times I have to take the test before I pass. (Assume I can take it as many times as necessary, perhaps by paying a non-negligible fee.) What is the distribution of T? (Fun fact: A South Korean woman took the test 950 times before passing.) [Note: By "before passing," we mean that she passed on the 950th attempt, not the 951st. We may use this phrase again.] 
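As a quick aside before analyzing T: the binomial shortcut E(H) = np can be checked against the definition of expectation, by summing i * Pr[H = i] over the binomial pmf from the review. This sketch (function name mine) does that and also confirms E(W) = n(2p - 1) via linearity:

```python
from math import comb

def binomial_expectation(n, p):
    """E(H) computed directly from the pmf Pr[H = i] = C(n, i) p^i (1-p)^(n-i),
    using the definition E(H) = sum of i * Pr[H = i]."""
    return sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))

n, p = 3, 0.6
EH = binomial_expectation(n, p)
print(EH)           # ~1.8, matching np = 3 * 0.6
print(2 * EH - n)   # E(W) = 2E(H) - n = n(2p - 1) = ~0.6

# The Democrats example: D ~ Bin(100, 0.445) gives E(D) = 44.5.
print(binomial_expectation(100, 0.445))  # ~44.5
```

The pmf-based sum takes n + 1 terms (and binomial coefficients); the indicator argument gives np with no computation at all.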
Before we determine the distribution of T, we should figure out what the sample space of the experiment is. An outcome consists of a series of 0 or more failures followed by a success, since I keep retaking the test until I pass it. Thus, if f denotes a failure and c denotes passing, we get the outcomes

Ω = {c, fc, ffc, fffc, ffffc, ...}.

How many outcomes are there? There is no upper bound on how many times I will have to take the test, since I can get very unlucky and keep failing. So the number of outcomes is infinite!

What is the probability of each outcome? Well, let's assume that the result of a test is independent each time I take it. (I really haven't studied, so I'm just guessing blindly each time.) Then the probability of passing a test is p and of failing is 1-p, so we get

Pr[c] = p, Pr[fc] = (1-p)p, Pr[ffc] = (1-p)^2 p, ...

Do these probabilities add up to 1? Well, their sum is

∑_{ω ∈ Ω} Pr[ω] = ∑_{i=0}^∞ (1-p)^i p = p ∑_{i=0}^∞ (1-p)^i = p * 1/(1-(1-p)) [sum of geom. series ∑ r^i is 1/(1-r) if -1 < r < 1] = 1.

So this probability assignment is valid. We continue with this example next time.
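We cannot sum infinitely many terms in code, but we can check numerically that the partial sums of (1-p)^i p approach 1 as we allow more attempts. A minimal sketch (function name mine, not from the notes):

```python
def prob_pass_within(p, attempts):
    """Pr[T <= attempts]: probability of passing within the first `attempts`
    tries, i.e. the partial sum of (1-p)^i * p for i = 0 .. attempts-1."""
    return sum((1 - p)**i * p for i in range(attempts))

p = 0.3
for n in (1, 10, 100):
    print(n, prob_pass_within(p, n))  # climbs toward 1 as n grows
```

The partial sum equals 1 - (1-p)^attempts (the only way not to pass within the first `attempts` tries is to fail every one of them), which tends to 1 since 0 < 1-p < 1, agreeing with the geometric-series calculation above.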