Administrative info

HW6 out.

Review

Recall that a random variable is a function from the sample space Ω to the real numbers R. It assigns a real number to each sample point. For each value a that a random variable X can take on, X = a is the event consisting of all outcomes ω for which X(ω) = a. This event has probability

Pr[X = a] = ∑_{ω : X(ω)=a} Pr[ω].

The events X = a partition the sample space: they are disjoint for distinct a, and their union is the whole sample space.

The distribution of a random variable X is the set of probabilities Pr[X = a] for every possible value a. These probabilities must add up to 1, since the events X = a partition the sample space.

Recall the coin flipping game from last time. We flip a biased coin n times, where each flip has probability p of heads; for each heads we win $1, and for each tails we lose $1. We are interested in how much total money we win. Let's call W our winnings. If we flip the coin three times, then

W(hhh) = 3    W(hth) = 1     W(thh) = 1     W(tth) = -1
W(hht) = 1    W(htt) = -1    W(tht) = -1    W(ttt) = -3

In general, we determined that if we flip the coin n times, then W(ω) = 2 H(ω) - n, where H(ω) is the number of heads in ω. We abbreviate this statement as W = 2H - n. We then demonstrated that the distribution of H is

Pr[H = i] = C(n, i) p^i (1-p)^(n-i)  for integer i, 0 <= i <= n.

This is a binomial distribution with parameters n and p, denoted by H ~ Bin(n, p).

As another example, suppose we throw n balls into m bins, uniformly at random, with each ball thrown independently of the others. Let B_1 be the number of balls in bin 1. What is the distribution of B_1? Each ball has probability 1/m of going into the first bin, and there are n balls. Let's make an analogy with coin flipping: think of throwing a ball as flipping a coin, and a ball landing in bin 1 as a heads. Then this is the same as flipping a coin n times with a bias of p = 1/m, so the distribution of B_1 is B_1 ~ Bin(n, 1/m). Any time we have an experiment with n mutually independent trials, each of which has probability p of success, the number of successes X has distribution X ~ Bin(n, p).

Once we computed the distribution of H, we computed the distribution of W = 2H - n. We have

Pr[W = j] = Pr[2H - n = j] = Pr[H = (j+n)/2] = C(n, (j+n)/2) p^((j+n)/2) (1-p)^(n-(j+n)/2)

for integer (j+n)/2, 0 <= (j+n)/2 <= n. Solving for j, we get -n <= j <= n as the range of values W can take on, but with the caveat that it takes on only even values if n is even and only odd values if n is odd, so that (j+n)/2 is an integer. This demonstrated the value of writing one random variable in terms of others: computing the distribution of W directly would have been more difficult than computing it from H.

Another example of writing a random variable in terms of others uses indicator random variables. Recall the exam example from last time. We had n students, each of whom receives a random exam back, and we were interested in how many students get their own exam back. Calling this random variable X, we defined X_i to be an indicator random variable that is 1 if the ith student gets his or her own exam back and 0 otherwise. Then X = X_1 + ... + X_n. In the case of n = 3, the outcomes and values were:

outcome  (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
X_1         1        1        0        0        0        0
X_2         1        0        0        0        0        1
X_3         1        0        1        0        0        0
X           3        1        1        0        0        1
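Tables like this one can be checked by machine for any small n. Here is a minimal Python sketch (not part of the original notes; the function name is an illustrative choice) that enumerates all n! equally likely permutations, computes X as the sum of the indicators X_i, and recovers the distribution of X.

from fractions import Fraction
from itertools import permutations

def fixed_point_distribution(n):
    """Distribution of X = X_1 + ... + X_n over all n! equally likely
    permutations, where X_i = 1 if student i gets his or her own exam back."""
    perms = list(permutations(range(n)))
    dist = {}
    for perm in perms:
        x = sum(perm[i] == i for i in range(n))  # X(perm) = X_1 + ... + X_n
        dist[x] = dist.get(x, 0) + Fraction(1, len(perms))
    return dist

# For n = 3 this matches the table above:
# Pr[X = 3] = 1/6, Pr[X = 1] = 1/2, Pr[X = 0] = 1/3.
print(fixed_point_distribution(3))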
Expectation

Sometimes we are interested in less information than the entire distribution of a random variable. In particular, we may be interested in an "average" value. For example, if I play the coin flipping game many times, I want to know how much I'd win on average in each game. Let's review the distribution of W_f, the amount won, when the game has 3 flips and uses a fair coin:

Pr[W_f = 3] = 1/8    Pr[W_f = 1] = 3/8    Pr[W_f = -1] = 3/8    Pr[W_f = -3] = 1/8

What is the "average" value? It looks like it should be 0. What if we were using a coin with bias p? Then the amount won W_p has distribution

Pr[W_p = 3] = p^3    Pr[W_p = 1] = 3 p^2 (1-p)    Pr[W_p = -1] = 3 p (1-p)^2    Pr[W_p = -3] = (1-p)^3

Now what is the "average" value? It's no longer obvious.

Formally, we define the "expected value" of a random variable X to be

E(X) = ∑_{ω ∈ Ω} X(ω) Pr[ω].

Each sample point ω contributes its value X(ω) to the expected value, weighted by how likely that outcome is. The expected value is also known as the "expectation," "average," or "mean." Since X(ω) = a for every outcome ω in the event X = a, Pr[X = a] = ∑_{ω : X(ω)=a} Pr[ω], and the events X = a partition the sample space, we can equivalently write

E(X) = ∑_{a ∈ A} a Pr[X = a],

where A is the set of all values that X can take on.

Now that we have a formal definition of expectation, let us determine the expectations of W_f and W_p. For W_f, we have

E(W_f) = 3 Pr[W_f=3] + 1 Pr[W_f=1] - 1 Pr[W_f=-1] - 3 Pr[W_f=-3]
       = 3 * 1/8 + 1 * 3/8 - 1 * 3/8 - 3 * 1/8
       = 0,

as expected. For W_p, we have

E(W_p) = 3 Pr[W_p=3] + 1 Pr[W_p=1] - 1 Pr[W_p=-1] - 3 Pr[W_p=-3]
       = 3 * p^3 + 1 * 3 p^2 (1-p) - 1 * 3 p (1-p)^2 - 3 * (1-p)^3
       = 3p^3 + 3p^2 - 3p^3 - 3p + 6p^2 - 3p^3 - 3 + 9p - 9p^2 + 3p^3
       = 6p - 3.

In the example of passing back exams, we had

outcome  (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
X           3        1        1        0        0        1

What is E(X)? Using the first definition, we get

E(X) = 3 Pr[(1,2,3)] + 1 Pr[(1,3,2)] + ...
     = 3 * 1/6 + 1 * 1/6 + 1 * 1/6 + 0 * 1/6 + 0 * 1/6 + 1 * 1/6
     = 1.

We can also compute this using the distribution: Pr[X = 3] = 1/6, Pr[X = 1] = 1/2, and Pr[X = 0] = 1/3, so

E(X) = 3 Pr[X=3] + 1 Pr[X=1] + 0 Pr[X=0] = 3/6 + 1/2 = 1.

Suppose we play a game of roulette. There are 38 slots on a roulette wheel: 18 are black, 18 are red, and two are green. If we bet $1 on black (which we always do), how much do we expect to win in a single game? Let W be our winnings. If black comes up, we win $1; otherwise we lose $1. So then

Pr[W = 1] = 18/38 = 9/19    Pr[W = -1] = 20/38 = 10/19
E(W) = 1 Pr[W=1] - 1 Pr[W=-1] = 9/19 - 10/19 = -1/19.

Suppose we roll a fair die. What is the expected value of the resulting number? Let N be a random variable corresponding to the number. Then Pr[N = i] = 1/6 for 1 <= i <= 6, and

E(N) = 1 Pr[N=1] + 2 Pr[N=2] + ... + 6 Pr[N=6]
     = 1 * 1/6 + 2 * 1/6 + ... + 6 * 1/6
     = 1/6 (1 + 2 + ... + 6)
     = 1/6 * (6 * 7 / 2)
     = 7/2.

Note that this isn't actually a value that N can take on, but it is the expectation, or average.

Suppose we roll two fair dice. Let S be the sum of the values on the two dice. What is E(S)? We can compute the distribution

Pr[S = 2] = 1/36    Pr[S = 3] = 1/18    Pr[S = 4] = 1/12    ...    Pr[S = 12] = 1/36.

This is quite tedious, and at the end, we get E(S) = 7 = 2 E(N). It seems like there should be a simpler way to arrive at this.

As a final example, suppose we pick 100 Californians uniformly at random. How many Democrats do we expect in this group, given that 44.5% of Californians are Democrats? Intuitively, we'd expect 44.5, but how can we arrive at that without computing a large distribution?
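Both of the last two questions can at least be checked numerically. Below is a short Python sketch (again mine, not from the notes): it computes E(S) exactly by enumerating all 36 equally likely outcomes of the two dice, and estimates the expected number of Democrats by simulation, treating each of the 100 picks as an independent draw with success probability 0.445, which closely approximates uniform sampling from a very large population.

from fractions import Fraction
import random

# Exact E(S) for the sum S of two fair dice: each of the 36 outcomes
# (a, b) has probability 1/36 and contributes (a + b)/36 to E(S).
e_s = sum(Fraction(a + b, 36) for a in range(1, 7) for b in range(1, 7))
print(e_s)  # 7

# Estimate the expected number of Democrats among 100 random Californians,
# modeling each pick as an independent draw with probability 0.445.
trials = 10_000
total = sum(random.random() < 0.445 for _ in range(100 * trials))
print(total / trials)  # close to 44.5

The simulation agrees with the intuition of 44.5, but it is only an estimate; the question of how to get that answer exactly, without computing a large distribution, is where we pick up next.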