Administrative info

HW5 due tomorrow. HW6 out tonight.

Review

In defining events, we noted that we often do not care about a specific outcome of a random experiment, but whether or not the outcome is part of a special set of outcomes, which we called an event. For example, when flipping a fair coin 100 times, we may care about whether or not the number of heads and tails is the same. So we define E as the event that we get exactly 50 heads and compute Pr[E] = C(100, 50) / 2^100.

Random Variables

Sometimes what we care about is a numerical value associated with an outcome. For example, if we receive $1 for each heads in 100 flips of a fair coin and lose $1 for each tails, we care about how much total money we win or lose. For any particular outcome ω, this is just the number of heads in ω minus the number of tails. We can compute the amount of money we win or lose for every one of the 2^100 outcomes. This is a "random variable."

A random variable is a function that assigns a value to each sample point. More formally, a random variable X is a function from Ω, the sample space, to R, the set of real numbers. The value of the random variable at sample point ω is denoted X(ω), like any other function. (Note: a random variable is neither random nor a variable, since it is a function. Why is it called a random variable? I don't know, but since the outcome of an experiment is random, the value of the random variable is a function of a random outcome.)

Let's go back to coin flipping. Suppose I flip a fair coin once. Let X be a random variable that is +1 if I get heads and -1 if I get tails. What is X(ω) for each outcome ω? Well, X(h) = +1 and X(t) = -1. What if I flip a fair coin three times, where X is the amount I win if I win $1 for each heads and lose $1 for each tails?
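A random variable really is just a function on the sample space, so we can write it as one. Here is a minimal sketch of the winnings random variable for three fair coin flips, representing each outcome as a string of 'h'/'t' characters (the names X and omega come from the notes; the string encoding is our own choice):

```python
from itertools import product

def X(omega):
    # Winnings: +$1 per heads, -$1 per tails.
    return omega.count('h') - omega.count('t')

# Enumerate all 2^3 outcomes of three fair coin flips and tabulate X.
outcomes = [''.join(flips) for flips in product('ht', repeat=3)]
for omega in outcomes:
    print(omega, X(omega))
```

Running this reproduces the table of values of X computed below.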
Then

  X(hhh) = 3    X(hht) = 1    X(hth) = 1    X(htt) = -1
  X(thh) = 1    X(tht) = -1   X(tth) = -1   X(ttt) = -3

In general, if I flip a fair coin n times and X is the amount I win, then X(ω) = H(ω) - T(ω), where H(ω) is the number of heads in ω and T(ω) is the number of tails in ω. Notice that H and T are also random variables, since they assign a real number to each sample point. Defining a random variable in terms of simpler random variables is a very useful procedure. (Notice a common theme with induction, counting, probability, and now random variables? All involve reducing a hard problem to simpler problems.) We can also note that H(ω) + T(ω) = n, since each flip must be heads or tails. So we can further write X(ω) = H(ω) - (n - H(ω)) = 2 H(ω) - n.

Suppose that rather than handing back your exams individually in section, we hand back a random exam to each person in lecture. Let X be the number of students who get their own exam back. What is X(ω) for each sample point ω? First, let us determine the sample space. Each outcome is a permutation of the n students in class, {1, ..., n}. For example, ω = (2, 3, ..., n, 1) corresponds to each student i getting the exam of student (i mod n) + 1. Since each outcome is a permutation, there are n! outcomes, which we assume to have uniform probability.

Now let's define a series of simpler random variables. Let X_i be the random variable that is 1 if the ith person gets his or her own exam back and 0 otherwise. Such a 0/1-valued random variable is called an "indicator random variable" and is a very common and useful type of random variable. Then we have X(ω) = X_1(ω) + ... + X_n(ω).

As a concrete example, suppose n = 3. Then the values of the X_i and of X on each outcome are

        (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
  X_1      1        1        0        0        0        0
  X_2      1        0        0        0        0        1
  X_3      1        0        1        0        0        0
  X        3        1        1        0        0        1

We can use the same procedure in the case of coin flipping. Here, X is the total amount won in n flips.
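The indicator decomposition X = X_1 + ... + X_n can be checked directly by enumerating the permutations. A sketch for n = 3, using the names X and X_i from the notes (the 1-based tuple encoding of a permutation is our own choice):

```python
from itertools import permutations

def X(omega):
    # X = X_1 + ... + X_n, where X_i is the indicator that student i
    # receives his or her own exam (omega[i-1] == i).
    return sum(1 for i, exam in enumerate(omega, start=1) if exam == i)

# Enumerate all 3! = 6 outcomes and tabulate X.
for omega in permutations((1, 2, 3)):
    print(omega, X(omega))
```

The printed values match the bottom row of the table above.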
Let X_i be the amount won in the ith flip: +1 if it is heads, -1 if it is tails. Then X(ω) = X_1(ω) + ... + X_n(ω). In the case of n = 3, we have

        hhh  hht  hth  htt  thh  tht  tth  ttt
  X_1    1    1    1    1   -1   -1   -1   -1
  X_2    1    1   -1   -1    1    1   -1   -1
  X_3    1   -1    1   -1    1   -1    1   -1
  X      3    1    1   -1    1   -1   -1   -3

as before.

Distributions

As with events, we often don't care about the value of a random variable at each outcome. Rather, we care about the probability that the random variable takes on any particular value. In fact, we can define events in terms of random variables. We define "X = a" to be the event that the random variable X takes on the value a. More formally,

  (X = a) ≡ {ω : ω ∈ Ω ∧ X(ω) = a},

i.e. X = a is the set of outcomes ω for which X(ω) = a. In the above example with three coin flips, we have

  (X = 3) = {hhh}
  (X = 1) = {hht, hth, thh}
  (X = -1) = {htt, tht, tth}
  (X = -3) = {ttt}.

Now since each X = a is an event, we can compute the probability Pr[X = a]. With the coin flips, we have

  Pr[X = 3] = 1/8    Pr[X = 1] = 3/8    Pr[X = -1] = 3/8    Pr[X = -3] = 1/8

This set of probabilities is called the "distribution" of the random variable X. We can also draw a graph to depict the distribution:

  Pr[X=a]
      |
  3/8 |        *     *
      |        *     *
  1/8 |  *     *     *     *
    --+--+--+--+--+--+--+--+--
        -3 -2 -1  0  1  2  3   a

Note that since X is a function from Ω to R, each ω has exactly one value a such that X(ω) = a, so ω is in exactly one of the events X = a. Thus, the events X = a partition the sample space. This means that

  (1) (X = a_1) ∩ (X = a_2) = ∅ if a_1 ≠ a_2
  (2) ∪_{a ∈ A} (X = a) = Ω, where A is the set of all possible values that X(ω) can take on.

These two facts imply that the sum of all probabilities in the distribution of X is 1.

In the example of passing back exams, what is the distribution of X, the number of students who get their own exam back? For n = 3, we have Pr[X = 3] = 1/6, Pr[X = 1] = 1/2, Pr[X = 0] = 1/3. What about arbitrary n? Let's come back to that later.
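Since the sample space here is uniform, the distribution of X can be computed by enumerating all 2^n outcomes and counting how many land in each event X = a. A sketch (the function name `distribution` is our own):

```python
from collections import Counter
from itertools import product

def distribution(n):
    # Count outcomes by their value of X = (#heads - #tails), then divide
    # by |Omega| = 2^n, valid because the sample space is uniform.
    counts = Counter(omega.count('h') - omega.count('t')
                     for omega in (''.join(f) for f in product('ht', repeat=n)))
    return {a: c / 2**n for a, c in counts.items()}

print(distribution(3))
```

For n = 3 this yields Pr[X = ±3] = 1/8 and Pr[X = ±1] = 3/8, matching the distribution above, and the probabilities sum to 1 because the events X = a partition Ω.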
Let's take another look at the coin flipping example, but for arbitrary n. The distribution of X, the amount of money won, seems non-trivial. But since we know that X(ω) = 2 H(ω) - n, let's first compute the distribution of H, the number of heads.

If we flip a fair coin n times, in how many outcomes are there exactly i heads? This is just choosing i out of the n flips to be heads, so there are C(n, i) such outcomes. Then |H = i| = C(n, i), so Pr[H = i] = C(n, i) / 2^n. This is the distribution of H, where i is an integer with 0 <= i <= n. Here is a graph of the distribution of H when n = 5:

  Pr[H=a]
        |
  10/32 |        *  *
        |        *  *
        |        *  *
   5/32 |     *  *  *  *
        |     *  *  *  *
   1/32 |  *  *  *  *  *  *
      --+--+--+--+--+--+--+--
         0  1  2  3  4  5     a

You can see the beginnings of a bell curve. It follows that the distribution of X is

  Pr[X = i] = Pr[2H - n = i] = Pr[H = (i+n)/2],

where -n <= i <= n. If (i+n)/2 is not an integer (e.g. if i is odd and n is even), then this probability is 0.

Now suppose we are flipping a biased coin with probability p of heads. Then what is the distribution of H, the number of heads? First, how many outcomes are in H = i? As before, there are C(n, i). But now we can't just divide by the size of the sample space, since the probability space is not uniform. Instead, we use the definition of the probability of an event: it is the sum of the probabilities of the outcomes in the event. What is the probability of each outcome in H = i? We already computed this in a previous lecture as p^i (1-p)^(n-i). So

  Pr[H = i] = C(n, i) p^i (1-p)^(n-i),

where i is an integer with 0 <= i <= n. This is known as the "binomial distribution" with parameters n and p, where p is the probability of getting heads in any one flip and n is the number of flips. We use the shorthand H ~ Bin(n, p) to denote that H is a random variable with a binomial distribution with parameters n and p.
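The binomial formula is easy to evaluate directly. A sketch (the function name `binomial_pmf` is our own; for p = 1/2 the formula reduces to C(n, i) / 2^n, the fair-coin distribution above):

```python
from math import comb

def binomial_pmf(n, p, i):
    # Pr[H = i] = C(n, i) p^i (1-p)^(n-i) for H ~ Bin(n, p).
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Fair coin, n = 5: reproduces the bar heights 1/32, 5/32, 10/32, ...
n, p = 5, 0.5
for i in range(n + 1):
    print(i, binomial_pmf(n, p, i))
```

Summing binomial_pmf(n, p, i) over i = 0, ..., n gives 1, as it must, since the events H = i partition the sample space.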
The graph of a binomial distribution with parameters n and p is bell-shaped, though it will be skewed to one side if p is not 1/2. See the reader for an example. The binomial distribution arises in any experiment consisting of n independent trials, each with probability of success p. As another example, suppose we send n packets over a network, choosing the path from source to destination randomly and independently for each packet. Suppose that the probability that a single packet reaches its destination is p. Then if X is the number of packets that reach the destination, X ~ Bin(n, p), so Pr[X = i] = C(n, i) p^i (1-p)^(n-i) for 0 <= i <= n.
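Once we recognize X ~ Bin(n, p), questions about the packets reduce to sums of the pmf. A sketch with made-up values n = 10 and p = 0.9 (the function name `pmf` and the threshold 8 are our own choices, not from the notes):

```python
from math import comb

def pmf(n, p, i):
    # Pr[X = i] for X ~ Bin(n, p): i of the n packets arrive.
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.9
# Probability that every packet arrives is p^n, the i = n term.
print("Pr[all 10 arrive] =", pmf(n, p, n))
# Probability that at least 8 arrive: sum the pmf over i = 8, 9, 10.
print("Pr[at least 8]    =", sum(pmf(n, p, i) for i in range(8, n + 1)))
```

This kind of tail sum is how one would check, say, whether a redundancy scheme sends enough packets for the message to get through with high probability.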