Administrative info

- MT2 stats: μ ≈ 77, σ ≈ 16
- Updated MT2 solutions posted on Piazza
- Exams can be picked up from the Soda front office
- PA3 due Thursday
- HW8 due next Wednesday
- Final exam next Thursday

Review

Recall that a probability space consists of the following:
(1) A random experiment.
(2) The sample space Ω (the set of possible outcomes).
(3) The probability of each possible outcome of the experiment.

Further recall that the probabilities must satisfy the following:
(1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
(2) ∑_{ω∈Ω} Pr[ω] = 1

An event is a subset of the sample space, i.e. a set of outcomes from the sample space. The probability of an event E is the sum of the probabilities of the outcomes in E: Pr[E] = ∑_{ω∈E} Pr[ω]. In the case of a uniform distribution, this simplifies to Pr[E] = |E|/|Ω|. A random variable is a function from the sample space Ω to the reals.

Continuous Probability

Suppose that SPECTRE has captured James Bond. They handcuff him, knock him out, and stuff him in the back of a plane to be transported from their hideout to their secret underground lair 1000 miles away. When the plane lands at the lair, they discover that Bond has apparently woken up mid-flight, slipped out of his handcuffs, killed the flight attendant, and parachuted out of the plane, to land in the desert somewhere along the 1000-mile flight path. Since SPECTRE knows nothing about when Bond escaped, his escape point is equally likely to be anywhere along this 1000-mile segment. What is the probability that he is at each point along this segment?

In the discrete version of this problem, we would assume that he could be at any of a finite number of positions, say one per mile. The sample space would be Ω = {1, 2, ..., 1000}, with Pr[ω] = 1/|Ω| = 1/1000 for each sample point ω ∈ Ω. If we try to follow the same procedure here, we would define the sample space as the set of real numbers Ω = [0, 1000]. There are infinitely many possibilities, so we get Pr[ω] = 0 for each sample point!
And this is in fact the case; the probability that Bond lands at any exact position is 0. How do we make sense of this sample space? Rather than working with individual outcomes, we work directly with events. Recall that an event E is a subset of the sample space, and that in a uniform sample space, the probability of E is the size of E divided by the size of the sample space: Pr[E] = |E|/|Ω|. By analogy here, we use intervals as events. For example, the interval [0, 50] is the event that Bond lands within 50 miles of the hideout. What is the probability of this event? Again, it should be the "size" of this interval divided by the "size" of the sample space. Exactly what "size" means requires measure theory and is beyond the scope of this class, but it should be intuitively clear that the size we need here is the length of an interval. (Later, we will see another definition of "size" for infinite sets that is not applicable here.) Thus,

Pr[[0, 50]] = (length of [0, 50]) / (length of [0, 1000]) = 50/1000 = 1/20.

Suppose that SPECTRE sends out its henchmen in special dune buggies (with frickin' lasers!) from each base. However, these buggies have only a 50-mile range. What is the probability that Bond landed in range of the buggies? Let E be the event that Bond landed within 50 miles of either base. Then E = [0, 50] ∪ [950, 1000], so

Pr[E] = Pr[[0, 50]] + Pr[[950, 1000]] = 1/20 + 1/20 = 1/10,

since the intervals are disjoint.

Suppose that one of the buggies finds Bond. In running away, he shoots over his shoulder at the buggy. Suppose that he hits a random spot on the buggy, which is 5 feet long and 4 feet high. The gas tank presents a circular target with a radius of 1 foot. What is the probability that Bond hits the gas tank, causing the buggy to explode and allowing him to escape? By analogy with the 1D case, the sample space here is the rectangle Ω = [0, 5] × [0, 4], and the "size" of Ω is its area, 20 square feet.
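As a sanity check on the interval-as-event computation, here is a minimal Monte Carlo sketch in Python (the function name and trial count are my own, not from the notes): we sample landing points uniformly on [0, 1000] and count how often they fall in E = [0, 50] ∪ [950, 1000].

```python
import random

def estimate_range_prob(trials=200_000, seed=0):
    """Monte Carlo estimate of Pr[E] for E = [0, 50] ∪ [950, 1000],
    with the landing point uniform on [0, 1000]. Exact answer: 1/10."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, 1000)  # uniformly random landing point
        if x <= 50 or x >= 950:   # within 50 miles of either base
            hits += 1
    return hits / trials
```

With 200,000 trials the estimate lands close to the exact value 1/10, illustrating that "length of E divided by length of Ω" is the right notion of probability here.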
Then the "size" of the event G, that Bond hits the gas tank, is the area of the gas tank, π square feet. So the probability that he hits the gas tank is Pr[G] = π/20.

Continuous Random Variables

In discrete probability, we defined random variables as functions from the sample space Ω to R. In reality, the range of a discrete random variable must be a finite or countably infinite subset of R (we will define what countably infinite means later). It is impossible for the range to be a continuous subset of R. In continuous probability, however, the range of a random variable may be a continuous subset of R. For example, if we define a random variable X corresponding to Bond's landing position, then X takes on any value in the range [0, 1000], with uniform probability. Thus, we have Pr[X = a] = 0 for all possible values of a.

As with the events above, we can instead use intervals Pr[a < X <= b], which have meaningful probabilities. (Note that it doesn't matter whether we include the endpoints, since they have 0 probability.) If we know the value of Pr[X <= x] for all values of x, then we can compute the probability of an interval as

Pr[a < X <= b] = Pr[X <= b] - Pr[X <= a].

Thus, if we have a function G(x) = Pr[X <= x], we have all the information we need about a continuous random variable. This is called the "cumulative distribution function", abbreviated "cdf". Note that it is necessary to define this function for all values of x ∈ R. In the case of Bond's position, Pr[X <= x] is the probability of the interval [0, x] when x is in the range [0, 1000]. Thus, the cdf is

       { 0       if x < 0
G(x) = { x/1000  if 0 <= x <= 1000
       { 1       if x > 1000.

Now what is the probability that Bond is within 50 miles of the center? We have

Pr[450 < X <= 550] = Pr[X <= 550] - Pr[X <= 450] = G(550) - G(450) = 550/1000 - 450/1000 = 1/10.

As a more complex example, suppose that Bond hits the gas tank when shooting at the buggy at some uniformly random location on the gas tank.
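Before working through that example, the cdf G and the interval computation above can be sketched in Python (a minimal sketch; the function names are my own):

```python
def G(x):
    """cdf of Bond's position X, uniform on [0, 1000]: G(x) = Pr[X <= x].
    Defined for all real x, as a cdf must be."""
    if x < 0:
        return 0.0
    if x > 1000:
        return 1.0
    return x / 1000

def interval_prob(cdf, a, b):
    """Pr[a < X <= b] = cdf(b) - cdf(a)."""
    return cdf(b) - cdf(a)
```

For example, `interval_prob(G, 450, 550)` reproduces the 1/10 computed above, and `G` returns 0 below the range and 1 above it.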
Let Y be the distance (in feet) of where he hits from the center of the tank. What is the cdf of Y? The tank has radius 1 (we will leave off units, but know that we are working in feet and square feet). The probability that he hits less than y from the center is the area of the circle of radius y divided by the total area of the tank, or πy^2/(π·1^2) = y^2. Thus, the cdf of Y is

       { 0    if y < 0
F(y) = { y^2  if 0 <= y <= 1
       { 1    if y > 1.

Now we can determine the probability of any interval for Y. For example,

Pr[0.5 < Y <= 0.6] = Pr[Y <= 0.6] - Pr[Y <= 0.5] = F(0.6) - F(0.5) = 0.36 - 0.25 = 0.11.

While the cdf lets us do probability calculations, it does not give us a good idea about where the value of the random variable is more likely to be. For example, Y above is not uniformly distributed but is more likely to be further away from the center, since there is more area to hit there. This is hard to tell from the cdf. What we'd like to know is the probability in some tiny interval around a particular y: Pr[y < Y <= y + δ]. If we compare the value of this probability for different y, we get an idea of where the value of the random variable Y is more likely to be located.

Of course, the probability above depends on how small an interval we use, i.e. the size of δ. To remove this dependency, let's look at the ratio Pr[y < Y <= y + δ] / δ. This is the probability per unit length near y, or the "probability density" at y. To get an exact expression, we take the limit

lim_{δ->0} Pr[y < Y <= y + δ] / δ = lim_{δ->0} (Pr[Y <= y + δ] - Pr[Y <= y]) / δ = lim_{δ->0} (F(y + δ) - F(y)) / δ = d/dy F(y),

which is just the definition of the derivative of F. This leaves us with a function f(y) = d/dy F(y), where f is the "probability density function" or "pdf". The fundamental theorem of calculus tells us how to undo this operation to get from the pdf back to the cdf, by integrating: F(y) = ∫_{-∞}^y f(x) dx.
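The density-as-a-limit idea above can be checked numerically: for small δ, the ratio (F(y + δ) - F(y)) / δ should approach f(y) = 2y for the gas-tank cdf. A minimal sketch (function names are my own):

```python
def F(y):
    """cdf of Y, the distance of the hit from the tank center (radius 1):
    F(y) = y^2 on [0, 1], 0 below, 1 above."""
    if y < 0:
        return 0.0
    if y > 1:
        return 1.0
    return y * y

def density_at(cdf, y, delta=1e-6):
    """Approximate density: Pr[y < Y <= y + delta] / delta for small delta."""
    return (cdf(y + delta) - cdf(y)) / delta
```

At y = 0.5 the approximation is close to 2 · 0.5 = 1, and at y = 0.9 it is close to 1.8, confirming that the density grows with the distance from the center.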
Thus, the pdf and cdf contain the same information, and we can obtain probabilities for intervals by integrating:

Pr[a < Y <= b] = F(b) - F(a) = ∫_a^b f(x) dx.

We can make sense of this by discretizing: dividing the interval into a large number n of smaller intervals, each of size δ = (b - a) / n, we get Pr[y < Y <= y + δ] ≈ f(y) δ, so

Pr[a < Y <= b] ≈ ∑_{i=0}^{n-1} Pr[a + iδ < Y <= a + (i+1)δ] ≈ ∑_{i=0}^{n-1} f(a + iδ) δ.

Taking the limit as n -> ∞ gives us the integral above.

As an example, what is the pdf of X, Bond's position? It is

                   { 0       if x < 0
g(x) = d/dx G(x) = { 1/1000  if 0 <= x <= 1000
                   { 0       if x > 1000.

The probability density is 0 outside the range [0, 1000] and uniform within that range, as we expect. Then we can use the pdf to compute

Pr[450 < X <= 550] = ∫_{450}^{550} g(x) dx = ∫_{450}^{550} 1/1000 dx = x/1000 |_{450}^{550} = 550/1000 - 450/1000 = 1/10.

The pdf of Y, the distance from the center of the gas tank to the location of Bond's shot, is

                   { 0   if y < 0
f(y) = d/dy F(y) = { 2y  if 0 <= y <= 1
                   { 0   if y > 1.

So the density is higher further away from the center, as expected. Since the pdf and cdf give us the same information, we will use whichever is more convenient. Often, the cdf is easier to determine directly. However, the pdf is what allows us to compute expectations and variances.

Continuous Expectation and Variance

In the discrete case, the expectation of a random variable Z is E(Z) = ∑_{a ∈ A} a · Pr[Z = a], where A is the set of all values that Z can take on. Continuous random variables can take on any real number. However, if we discretize a continuous random variable Y, we get something like

E(Y) ≈ ∑_{b ∈ B} b · Pr[b < Y <= b + δ] = ∑_{b ∈ B} b f(b) δ,

where B is a countably infinite set of values that are δ apart. Then if we undo the discretization, we get

E(Y) = ∫_{-∞}^{+∞} x f(x) dx.

This is the expectation of a continuous random variable.
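The discretization ∑ f(a + iδ) δ above is exactly a left Riemann sum, so it can be checked directly in code. Here is a minimal sketch (function names are my own) using the two pdfs from this section:

```python
def riemann_prob(pdf, a, b, n=100_000):
    """Approximate Pr[a < Y <= b] = ∫_a^b f(x) dx by the left Riemann sum
    ∑_{i=0}^{n-1} f(a + i*delta) * delta with delta = (b - a) / n."""
    delta = (b - a) / n
    return sum(pdf(a + i * delta) for i in range(n)) * delta

def g(x):
    """pdf of Bond's position X: 1/1000 on [0, 1000], 0 elsewhere."""
    return 1 / 1000 if 0 <= x <= 1000 else 0.0

def f(y):
    """pdf of the hit distance Y: 2y on [0, 1], 0 elsewhere."""
    return 2 * y if 0 <= y <= 1 else 0.0
```

With large n, `riemann_prob(g, 450, 550)` approaches 1/10 and `riemann_prob(f, 0.5, 0.6)` approaches 0.11, matching the cdf computations above.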
The variance of a continuous random variable is exactly the same as for a discrete random variable: Var(Y) = E(Y^2) - E(Y)^2, where E(Y) is as above and

E(Y^2) = ∫_{-∞}^{+∞} x^2 f(x) dx.

As an example, what is E(X), Bond's expected position? It is

E(X) = ∫_{-∞}^{+∞} x g(x) dx = ∫_0^{1000} x/1000 dx = x^2/2000 |_0^{1000} = 1000^2/2000 = 1000/2 = 500,

as we would expect. Then

E(X^2) = ∫_{-∞}^{+∞} x^2 g(x) dx = ∫_0^{1000} x^2/1000 dx = x^3/3000 |_0^{1000} = 1000^3/3000 = 1000^2/3.

Then

Var(X) = E(X^2) - E(X)^2 = 1000^2/3 - 1000^2/4 = 1000^2/12.

In general, for a random variable Z that is uniformly distributed in the interval [0, d], we get

E(Z) = d/2
Var(Z) = d^2/12.

A random variable W that is uniformly distributed in the interval [a, a+d] is just W = Z + a, so we get

E(W) = E(Z) + a = a + d/2
Var(W) = Var(Z) = d^2/12.
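These closed-form results, E(W) = a + d/2 and Var(W) = d^2/12 for W uniform on [a, a+d], can be sanity-checked by simulation. A minimal sketch (function name, seed, and trial count are my own):

```python
import random

def uniform_mean_var(a, d, trials=200_000, seed=1):
    """Monte Carlo estimates of E(W) and Var(W) for W uniform on [a, a+d].
    Theory predicts E(W) = a + d/2 and Var(W) = d^2/12."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    xs = [rng.uniform(a, a + d) for _ in range(trials)]
    mean = sum(xs) / trials
    # sample variance via E(W^2) - E(W)^2, mirroring the formula above
    var = sum((x - mean) ** 2 for x in xs) / trials
    return mean, var
```

For Bond's position (a = 0, d = 1000) the estimates come out near 500 and 1000^2/12 ≈ 83,333; shifting the interval (a = 100, d = 200) moves the mean to about 200 but leaves the variance near 200^2/12, as the W = Z + a argument predicts.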