Administrative info

HW6 is due tomorrow.
MT2 is next Tuesday, in the same location and with the same policies as MT1. It covers material through polling/LLN (Wednesday).

Review

We have already seen one important distribution, the binomial distribution. A random variable X ~ Bin(n, p) has the distribution

  Pr[X = i] = C(n, i) p^i (1-p)^(n-i)

for integer i, 0 <= i <= n. This distribution arises whenever we have a fixed number of trials n, the trials are mutually independent, the probability of success of any one trial is p, and we are counting the number of successes.

We also computed the expectation of a binomial distribution using indicator random variables:

  X = X_1 + ... + X_n

  X_i = { 1 if the ith trial is successful
        { 0 otherwise

  E(X_i) = Pr[X_i = 1] = p

  E(X) = E(X_1) + ... + E(X_n) = np.

Now we turn our attention to two more important discrete distributions.

Geometric Distribution

Suppose I take a written driver's license test. Since I don't study, I only have a probability p of passing the test, mostly by getting lucky. Let T be the number of times I have to take the test before I pass. (Assume I can take it as many times as necessary, perhaps by paying a non-negligible fee.) What is the distribution of T?

(Fun fact: A South Korean woman took the test 950 times before passing.)

[Note: By "before passing," we mean that she passed on the 950th attempt, not the 951st. We may use this phrase again.]

Before we determine the distribution of T, we should figure out what the sample space of the experiment is. An outcome consists of a series of 0 or more failures followed by a success, since I keep retaking the test until I pass it. Thus, if f is a failure and c is passing, we get the outcomes

  Ω = {c, fc, ffc, fffc, ffffc, ...}.

How many outcomes are there? There is no upper bound on how many times I will have to take the test, since I can get very unlucky and keep failing. So the number of outcomes is infinite!

What is the probability of each outcome? Well, let's assume that the result of a test is independent each time I take it. (I really haven't studied, so I'm just guessing blindly each time.) Then the probability of passing a test is p and of failing is 1-p, so we get

  Pr[c] = p, Pr[fc] = (1-p)p, Pr[ffc] = (1-p)^2 p, ...

Do these probabilities add to 1? Well, their sum is

  ∑_{ω ∈ Ω} Pr[ω] = ∑_{i=0}^∞ (1-p)^i p
                  = p ∑_{i=0}^∞ (1-p)^i
                  = p · 1/(1-(1-p))    (sum of geom. series ∑ r^i is 1/(1-r) if -1 < r < 1)
                  = 1.

So this probability assignment is valid.

Since the event T = i has only the single outcome f^{i-1}c, we get

  Pr[T=1] = p, Pr[T=2] = (1-p)p, Pr[T=3] = (1-p)^2 p, ...

as the distribution of T, and the probabilities sum to 1, as required for a random variable.

The distribution of T is known as a "geometric distribution" with parameter p, T ~ Geom(p). This arises anytime we have a sequence of independent trials, each of which has probability p of success, and we want to know when the first success occurs. (This is unlike the binomial distribution, where we wanted to know how many successes occur in a fixed number n of independent trials.)
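Here is a quick way to sanity-check this distribution numerically: a minimal Python sketch that simulates the repeated-test experiment and compares the empirical frequencies of T with the formula Pr[T = i] = (1-p)^(i-1) p. The pass probability p = 0.3 and the number of simulated runs are arbitrary choices made only for illustration.

  import random

  def sample_T(p):
      # Number of attempts up to and including the first pass,
      # where each attempt passes independently with probability p.
      attempts = 1
      while random.random() >= p:
          attempts += 1
      return attempts

  p = 0.3          # arbitrary illustrative pass probability
  runs = 100000    # arbitrary number of simulated experiments
  counts = {}
  for _ in range(runs):
      t = sample_T(p)
      counts[t] = counts.get(t, 0) + 1

  print(" i  empirical  (1-p)^(i-1) p")
  for i in range(1, 7):
      empirical = counts.get(i, 0) / runs
      exact = (1 - p) ** (i - 1) * p
      print("%2d  %9.4f  %13.4f" % (i, empirical, exact))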
Now how many times can I expect to take the test before passing? We want E(T). We get

  E(T) = p + 2(1-p)p + 3(1-p)^2 p + ... = ∑_{i=1}^∞ i (1-p)^{i-1} p.

This isn't a pure geometric series, so directly computing the sum is harder. Let's use another method. It turns out that for any random variable X that only takes on values in N,

  E(X) = Pr[X >= 1] + Pr[X >= 2] + Pr[X >= 3] + ... = ∑_{i=1}^∞ Pr[X >= i].

Proof: Let p_i = Pr[X = i]. Then by definition,

  E(X) = 0 p_0 + 1 p_1 + 2 p_2 + 3 p_3 + 4 p_4 + ...
       = p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 + p_4 + p_4 + p_4) + ...
       = (p_1 + p_2 + p_3 + p_4 + ...) + (p_2 + p_3 + p_4 + ...) + (p_3 + p_4 + ...) + ...
         (combining columns from the previous step)
       = Pr[X >= 1] + Pr[X >= 2] + Pr[X >= 3] + Pr[X >= 4] + ...

Now what is Pr[T >= i]? This is the probability that I fail the first i-1 tests, so Pr[T >= i] = (1-p)^(i-1). Then

  E(T) = ∑_{i=1}^∞ Pr[T >= i]
       = ∑_{i=1}^∞ (1-p)^(i-1)
       = ∑_{j=0}^∞ (1-p)^j    (with j = i - 1)
       = 1/(1-(1-p))          (geometric series)
       = 1/p.

So I expect to take the test 1/p times before passing.

Here's another way to calculate E(T).

       E(T) = p + 2p(1-p) + 3p(1-p)^2 + 4p(1-p)^3 + ...
  (1-p)E(T) =     p(1-p)  + 2p(1-p)^2 + 3p(1-p)^3 + ...
      pE(T) = p + p(1-p)  +  p(1-p)^2 +  p(1-p)^3 + ... = 1

  E(T) = 1/p

In the second line, we multiplied E(T) by (1-p) and added some whitespace to line up terms with the previous line. Then we subtracted (1-p)E(T) from E(T) to get the third line. The resulting right-hand side is the sum of the probabilities of each event T = i, so it must be 1.

To summarize, for a random variable X ~ Geom(p), we've computed

  (1) Pr[X = i] = (1-p)^(i-1) p
  (2) Pr[X >= i] = (1-p)^(i-1)
  (3) E(X) = 1/p.

Other examples of geometrically distributed random variables are the number of runs before a system fails, the number of shots that must be taken before hitting a target, and the number of coin flips before heads appears.

Coupon Collector Redux

Recall the coupon collector problem. We buy cereal boxes, each of which contains a baseball card for one of the n Giants players. How many boxes do I expect to buy before I get a Panda card?

Let P be the number of boxes I buy before I get the Panda. Each time I buy a box, I have a 1/n chance of getting the Panda, and the boxes are independent. So P ~ Geom(1/n), and E(P) = n.

Now suppose I want the entire team. Let T be the number of boxes I buy to get the entire team. It is tempting to define a separate random variable for each player,

  P = # of boxes to get the Panda
  B = # of boxes to get the Beard
  F = # of boxes to get the Freak
  ...

but T ≠ P + B + F + ... (Can you see why? If we just consider these three players and it takes me 1 box to get the Panda, 2 to get the Beard, 3 to get the Freak, then T = 3, but P + B + F = 1 + 2 + 3 = 6.) So we need another approach.

Let's instead define random variables P_i as the number of boxes it takes to get a new player after I have collected i-1 distinct players. (In the above example, P_1 = P_2 = P_3 = 1, so T = P_1 + P_2 + P_3.) Then it is the case that

  T = P_1 + ... + P_n,

and we can appeal to linearity of expectation.

Now E(P_i) is not constant for all i. In particular, I always get a new player in the first box, so Pr[P_1 = 1] = 1 and E(P_1) = 1. But then for the second box, I can get the same player as the first, so Pr[P_2 = 1] ≠ 1. Note, however, that I do have probability (n-1)/n of getting a new player, and P_2 counts the boxes until the first occurrence of a new player. So P_2 ~ Geom((n-1)/n), and E(P_2) = n/(n-1). By the same reasoning, P_i ~ Geom((n-i+1)/n), so E(P_i) = n/(n-i+1).

So by linearity of expectation,

  E(T) = n/n + n/(n-1) + n/(n-2) + ... + n/2 + n/1 = n ∑_{i=1}^n 1/i.

The above sum has a good approximation

  ∑_{i=1}^n 1/i ≈ ln(n) + γ,

where γ ≈ 0.5772 is Euler's constant. So we get

  E(T) ≈ n(ln(n) + 0.58).

Recall our previous result, where we computed that in order to have a 50% chance of getting all n cards, we needed to buy

  n ln(2n) = n(ln(n) + ln(2)) ≈ n(ln(n) + 0.69)

boxes. It is not the case in general that Pr[X > E(X)] ≈ 1/2. The simplest counterexample is an indicator random variable Y with Pr[Y = 1] = p. Then E(Y) = p, so Pr[Y > E(Y)] = Pr[Y = 1] = p ≠ 1/2. So the two results for coupon collecting are not directly comparable.
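To close out the coupon collector discussion, here is a minimal Python simulation sketch: it estimates E(T) empirically and compares it with the exact value n ∑_{i=1}^n 1/i and the approximation n(ln(n) + γ). The team size n = 25 and the number of simulated runs are arbitrary choices made only for illustration.

  import math
  import random

  def boxes_to_collect_all(n):
      # Buy boxes (each containing a uniformly random one of the n players)
      # until every player has appeared at least once; return the box count.
      seen = set()
      boxes = 0
      while len(seen) < n:
          seen.add(random.randrange(n))
          boxes += 1
      return boxes

  n = 25         # arbitrary illustrative team size
  runs = 10000   # arbitrary number of simulated experiments
  empirical = sum(boxes_to_collect_all(n) for _ in range(runs)) / runs
  exact = n * sum(1.0 / i for i in range(1, n + 1))   # E(T) = n (1 + 1/2 + ... + 1/n)
  approx = n * (math.log(n) + 0.5772)                 # E(T) ≈ n (ln(n) + gamma)
  print("empirical:", empirical)
  print("exact    :", exact)
  print("approx   :", approx)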
Poisson Distribution

Suppose we throw n balls into n/λ bins, where n is large and λ is a constant. We are interested in how many balls land in bin 1. Call this X; then X ~ Bin(n, λ/n), and E(X) = λ. In more detail, the distribution is

  Pr[X = i] = C(n, i) (λ/n)^i (1 - λ/n)^(n-i), for 0 <= i <= n.

We know n is large, so let's approximate this distribution. Let's define p_i ≡ Pr[X = i]. Then we have

  p_0 = Pr[X = 0] = (1 - λ/n)^n.

Recall the Taylor series for e^x:

  e^x = 1 + x + x^2/2! + x^3/3! + ...

Plugging in x = -y, we get e^{-y} ≈ 1 - y for small y, so

  (1 - λ/n) ≈ e^{-λ/n}
  (1 - λ/n)^n ≈ (e^{-λ/n})^n = e^{-λ}.

Thus, p_0 ≈ e^{-λ}.

What about p_i in the general case? Let's look at the ratio p_i/p_{i-1}.

  p_i/p_{i-1} = [C(n,i) (λ/n)^i (1-λ/n)^{n-i}] / [C(n,i-1) (λ/n)^{i-1} (1-λ/n)^{n-i+1}]
              = [C(n,i) λ/n] / [C(n,i-1) (1-λ/n)]
              = [C(n,i) λ/n] / [C(n,i-1) (n-λ)/n]
              = C(n,i)/C(n,i-1) · λ/(n-λ).

Now let's look at the ratio C(n,i)/C(n,i-1). We have

  C(n,i)/C(n,i-1) = (n!/[i!(n-i)!]) / (n!/[(i-1)!(n-i+1)!])
                  = (i-1)!/i! · (n-i+1)!/(n-i)!
                  = 1/i · (n-i+1)
                  = (n-i+1)/i.

Plugging this into our expression for p_i/p_{i-1}, we get

  p_i/p_{i-1} = (n-i+1)/i · λ/(n-λ) = (n-i+1)/(n-λ) · λ/i.

Now in the limit n -> ∞, (n-i+1)/(n-λ) -> 1, so

  p_i/p_{i-1} ≈ λ/i,
  p_i ≈ p_{i-1} λ/i.

This gives us a recurrence:

  p_0 = exp(-λ)
  p_1 = exp(-λ) λ
  p_2 = exp(-λ) λ^2/2
  p_3 = exp(-λ) λ^3/(2*3)
  p_4 = exp(-λ) λ^4/(2*3*4)
  ...
  p_i = exp(-λ) λ^i/i!

So we get a new distribution

  Pr[X = i] = (λ^i)/i! e^{-λ}, i ∈ N.

This is a "Poisson distribution" with parameter λ, and we write X ~ Poiss(λ). (Note that though in the original binomial distribution i is restricted to 0 <= i <= n, here it is not.)

Let's check to make sure this is a proper distribution. We have

  ∑_{i=0}^∞ p_i = ∑_{i=0}^∞ (λ^i)/i! e^{-λ}
                = e^{-λ} ∑_{i=0}^∞ (λ^i)/i!
                = e^{-λ} e^{λ}    (using the Taylor series above)
                = 1.

Now let's compute E(X):

  E(X) = ∑_{i=0}^∞ i (λ^i)/i! e^{-λ}
       = e^{-λ} ∑_{i=0}^∞ i (λ^i)/i!
       = e^{-λ} ∑_{i=1}^∞ i (λ^i)/i!
       = e^{-λ} ∑_{i=1}^∞ (λ^i)/(i-1)!
       = λ e^{-λ} ∑_{i=1}^∞ (λ^(i-1))/(i-1)!
       = λ e^{-λ} ∑_{j=0}^∞ (λ^j)/j!    (with j = i - 1)
       = λ e^{-λ} e^{λ}
       = λ.

This is the same as the expectation of the original binomial distribution. The Poisson distribution is widely used for modeling rare events. It is a good approximation of the binomial distribution when n >= 20 and p <= 0.05, and a very good approximation when n >= 100 and np <= 10.
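To see the quality of the approximation concretely, here is a minimal Python sketch that compares the Bin(n, λ/n) probabilities with the Poiss(λ) probabilities for a few values of n. The value λ = 2 and the particular choices of n are arbitrary illustrative values; as n grows with np = λ held fixed, the two columns should agree more and more closely.

  import math

  def binom_pmf(n, p, i):
      # Pr[X = i] for X ~ Bin(n, p)
      return math.comb(n, i) * p**i * (1 - p)**(n - i)

  def poisson_pmf(lam, i):
      # Pr[X = i] for X ~ Poiss(lam)
      return lam**i / math.factorial(i) * math.exp(-lam)

  lam = 2.0                   # arbitrary illustrative value of lambda
  for n in (20, 100, 1000):   # arbitrary values of n; np = lambda stays fixed
      p = lam / n
      print("n =", n)
      for i in range(6):
          print("  i=%d  binomial=%.5f  poisson=%.5f"
                % (i, binom_pmf(n, p, i), poisson_pmf(lam, i)))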