Administrative info
  MT2 stats: μ ≈ 77, σ ≈ 16
  Updated MT2 solutions posted on Piazza
  Exams can be picked up from Soda front office
  PA3 due Thursday
  HW8 due next Wednesday
  Final exam next Thursday

Review
  Recall that a probability space consists of the following:
  (1) A random experiment.
  (2) The sample space (set of possible outcomes).
  (3) The probability of each possible outcome of the experiment.

  Further recall that the probabilities must satisfy the following:
  (1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
  (2) ∑_{ω∈Ω} Pr[ω] = 1

  An event is a subset of the sample space, i.e. a set of outcomes
  from the sample space. The probability of an event E is the sum of
  the probabilities of the outcomes in E:
    Pr[E] = ∑_{ω∈E} Pr[ω].
  In the case of a uniform distribution, this simplifies to
    Pr[E] = |E|/|Ω|.
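The uniform-case formula can be sketched in a few lines of Python (the die-roll example and the function name are ours, not from the notes):

```python
from fractions import Fraction

def event_prob(event, sample_space):
    """Pr[E] = |E| / |Omega| for a uniform discrete sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

# Hypothetical example: the event "roll an even number" on a fair die.
omega = {1, 2, 3, 4, 5, 6}
evens = {2, 4, 6}
print(event_prob(evens, omega))  # → 1/2
```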

  A random variable is a function from the sample space Ω to the
  reals.

Continuous Probability
  Suppose that SPECTRE has captured James Bond. They handcuff him,
  knock him out, and stuff him in the back of a plane to be
  transported from their hideout to their secret underground lair 1000
  miles away. When the plane lands at the lair, they discover that
  Bond has apparently woken up mid-flight, slipped out of his
  handcuffs, killed the flight attendant, and parachuted out of the
  plane, to land in the desert somewhere along the 1000-mile flight
  path.

  Since SPECTRE knows nothing about when Bond escaped, his escape
  point is equally likely to be anywhere along this 1000-mile segment.
  What is the probability that he is at each point along this segment?

  In the discrete version of this problem, we would assume that he
  could be at any of a finite number of positions, say one per mile.
  The sample space would be
    Ω = {1, 2, ..., 1000},
  with
    Pr[ω] = 1/|Ω| = 1/1000
  for each sample point ω ∈ Ω.

  If we try to follow the same procedure here, we would define the
  sample space as the set of real numbers
    Ω = [0, 1000].
  There are infinitely many possibilities, so we get
    Pr[ω] = 0
  for each sample point! And this is in fact the case; the probability
  that Bond lands at any exact position is 0.

  How do we make sense of this sample space? Rather than working with
  outcomes, we work directly with events. Recall that an event E is a
  subset of the sample space, and that in a uniform sample space, the
  probability of E is the size of E divided by the size of the sample
  space:
    Pr[E] = |E|/|Ω|.
  By analogy here, we use intervals as events. For example, the
  interval [0, 50] is an event that Bond lands within 50 miles from
  the hideout. What is the probability of this event? Again, it should
  be the "size" of this interval divided by the "size" of the sample
  space. Exactly what "size" means requires measure theory and is
  beyond the scope of this class, but it should be intuitively clear
  that the size we need here is the length of an interval. (Later, we
  will see another definition of "size" for infinite sets that is not
  applicable here.) Thus,
    Pr[[0, 50]] = (length of [0, 50]) / (length of [0, 1000])
      = 1/20.
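A quick Monte Carlo sanity check of the 1/20 answer (the trial count, seed, and function name are our own sketch):

```python
import random

def landing_within(trials=100_000, lo=0.0, hi=50.0, length=1000.0, seed=0):
    """Monte Carlo estimate of Pr[Bond lands in [lo, hi]] when his
    landing point is uniform on [0, length]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if lo <= rng.uniform(0, length) <= hi)
    return hits / trials

print(landing_within())  # should be close to 1/20 = 0.05
```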

  Suppose that SPECTRE sends out its henchmen in special dune buggies
  (with frickin' lasers!) from each base. However, these buggies have

  only a 50-mile range. What is the probability that Bond landed in
  range of the buggies?

  Let E be the event that Bond landed within 50 miles of either base.
  Then
    E = [0, 50] ∪ [950, 1000],
  so
    Pr[E] = Pr[[0, 50]] + Pr[[950, 1000]]
                                   (since the intervals are disjoint)
      = 1/10.
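Because the two intervals are disjoint, the simulated frequency of their union should match the sum 1/20 + 1/20. A sketch (trial count and seed are arbitrary choices of ours):

```python
import random

def in_buggy_range(trials=100_000, seed=1):
    """Monte Carlo estimate of Pr[Bond lands within 50 miles of either
    end of the 1000-mile flight path], i.e. of [0, 50] ∪ [950, 1000]."""
    rng = random.Random(seed)
    def hit():
        x = rng.uniform(0, 1000)
        return x <= 50 or x >= 950
    return sum(hit() for _ in range(trials)) / trials

print(in_buggy_range())  # should be close to 1/10 = 0.1
```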

  Suppose that one of the buggies finds Bond. In running away, he
  shoots over his shoulder at the buggy. Suppose that he hits a random
  spot on the buggy, which is 5 feet long and 4 feet high. The gas
  tank presents a circular target with a radius of 1 foot. What is the
  probability that Bond hits the gas tank, causing the buggy to
  explode and allowing him to escape?

  By analogy with the 1D case, the sample space here is the rectangle
    Ω = [0, 5] × [0, 4],
  and the "size" of Ω is its area, 20 square feet. Then the
  "size" of the event G, that Bond hits the gas tank, is the area of
  the gas tank, π square feet. So the probability that he hits the
  gas tank is
    Pr[G] = π/20.
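The 2D case can be checked the same way by throwing uniform points at the 5×4 rectangle. The notes don't say where the tank sits on the buggy, so the center below is an assumption (any placement with the circle fully inside the rectangle gives the same probability):

```python
import math
import random

def hits_gas_tank(trials=200_000, seed=2):
    """Monte Carlo estimate of Pr[G]: a uniform point on the 5x4 buggy
    lands inside the radius-1 gas tank. Exact answer: pi / 20."""
    rng = random.Random(seed)
    cx, cy = 2.5, 2.0  # assumed tank center; not specified in the notes
    hits = 0
    for _ in range(trials):
        x, y = rng.uniform(0, 5), rng.uniform(0, 4)
        if (x - cx) ** 2 + (y - cy) ** 2 <= 1:
            hits += 1
    return hits / trials

print(hits_gas_tank(), math.pi / 20)  # estimate vs exact
```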

Continuous Random Variables
  In discrete probability, we defined random variables as functions
  from the sample space Ω to R. In reality, the range of a
  discrete random variable must be a finite or countably infinite
  subset of R (we will define what countably infinite means later). It
  is impossible for the range to be a continuous subset of R.

  In continuous probability, however, the range of a random variable
  may be a continuous subset of R. For example, if we define a random
  variable X corresponding to Bond's landing position, then X takes on
  any value in the range [0, 1000], with uniform probability. Thus, we
  have Pr[X = a] = 0 for all possible values of a.

  As with the events above, we can instead use intervals
    Pr[a < X <= b]
  that have meaningful probabilities. (Note that it doesn't matter if
  we include the endpoints or not, since they have 0 probability.) If
  we know the value of
    Pr[X <= x]
  for all values of x, then we can compute the probability of an
  interval as
    Pr[a < X <= b] = Pr[X <= b] - Pr[X <= a].
  Thus, if we have a function
    G(x) = Pr[X <= x],
  we have all the information we need about a continuous random
  variable. This is called the "cumulative distribution function",
  abbreviated "cdf". Note that it is necessary to define this function
  for all values of x ∈ R.

  In the case of Bond's position, Pr[X <= x] is the probability of the
  interval [0, x] when x is in the range [0, 1000]. Thus, the cdf is
           { 0       if x < 0
    G(x) = { x/1000  if 0 <= x <= 1000
           { 1       if x > 1000.
  Now what is the probability that Bond is within 50 miles of the center?
  We have
    Pr[450 < X <= 550] = Pr[X <= 550] - Pr[X <= 450]
      = G(550) - G(450)
      = 550/1000 - 450/1000
      = 1/10.
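This cdf translates directly into a small Python function (the function name G follows the notes; the spot checks are ours):

```python
def G(x):
    """cdf of Bond's landing position X, uniform on [0, 1000]."""
    if x < 0:
        return 0.0
    if x > 1000:
        return 1.0
    return x / 1000

# Probability that Bond is within 50 miles of the midpoint:
print(G(550) - G(450))  # ≈ 0.1 (up to floating point)
```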

  As a more complex example, suppose that Bond hits the gas tank when
  shooting at the buggy at some uniformly random location on the gas
  tank. Let Y be the distance (in feet) of where he hits from the center
  of the tank. What is the cdf of Y?

  The area of the tank is π (we will leave off units, but know that we
  are working in feet and square feet). The probability that he hits
  less than y from the center is the area of the circle of radius y
  divided by the total area, or πy^2/(π·1^2) = y^2. Thus, the
  cdf of Y is
           { 0    if y < 0
    F(y) = { y^2  if 0 <= y <= 1
           { 1    if y > 1.

  Now we determine the probability of any interval for Y. For example,
    Pr[0.5 < Y <= 0.6] = Pr[Y <= 0.6] - Pr[Y <= 0.5]
      = F(0.6) - F(0.5)
      = 0.36 - 0.25
      = 0.11.
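As with G, the cdf of Y is easy to encode and spot-check (function name F follows the notes):

```python
def F(y):
    """cdf of Y, the distance of Bond's shot from the tank's center."""
    if y < 0:
        return 0.0
    if y > 1:
        return 1.0
    return y ** 2

print(F(0.6) - F(0.5))  # ≈ 0.11 (up to floating point)
```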

  While the cdf lets us do probability calculations, it does not give
  us a good idea about where the value of the random variable is more
  likely to be. For example, Y above is not uniformly distributed but
  is more likely to be further away from the center, since there is
  more area to hit there. This is hard to tell from the cdf.

  What we'd like to know is the probability in some tiny interval
  around a particular y:
    Pr[y < Y <= y + δ].
  If we compare the value of this probability for different y, we get
  an idea of where the value of the random variable Y is more likely
  to be located.

  Of course, the probability above depends on how small an interval
  we use, i.e. the size of δ. To remove this dependency, let's
  look at the ratio
    Pr[y < Y <= y + δ] / δ.
  This is the probability per unit length near y, or the "probability
  density" at y. To get an exact expression, we take the limit
    lim_{δ->0} Pr[y < Y <= y + δ] / δ
      = lim_{δ->0} (Pr[Y <= y + δ] - Pr[Y <= y]) / δ
      = lim_{δ->0} (F(y + δ) - F(y)) / δ
      = d/dy F(y),
  which is just the definition of the derivative. This leaves us with a
  function
    f(y) = d/dy F(y),
  where f(y) is the "probability density function" or "pdf". Of
  course, the fundamental theorem of calculus also tells us how to
  undo this operation to get from the pdf to the cdf, by integrating:
    F(y) = ∫_{-∞}^y f(x) dx.
  Thus, the pdf and cdf contain the same information, and we can
  obtain probabilities for intervals by integrating:
    Pr[a < Y <= b] = F(b) - F(a)
      = ∫_a^b f(x) dx.
  We can make sense of this by discretizing: divide the interval
  [a, b] into a large number n of smaller intervals, each of size
    δ = (b - a) / n.
  Then
    Pr[y < Y <= y + δ] ≈ f(y) δ,
  so
    Pr[a < Y <= b]
      ≈ ∑_{i=0}^{n-1} Pr[a + iδ < Y <= a + (i+1)δ]
      ≈ ∑_{i=0}^{n-1} f(a + iδ) δ.
  Taking the limit of n -> ∞ gives us the integral above.
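The discretized sum is easy to compute directly. Below is a sketch of the left Riemann sum above, applied to the pdf f(y) = F'(y) = 2y of Y from the gas-tank example (the function name and choice of n are ours):

```python
def riemann_prob(pdf, a, b, n=100_000):
    """Approximate Pr[a < Y <= b] = ∫_a^b pdf(x) dx by the left
    Riemann sum ∑_{i=0}^{n-1} pdf(a + i*δ) * δ with δ = (b - a)/n."""
    delta = (b - a) / n
    return sum(pdf(a + i * delta) for i in range(n)) * delta

# pdf of Y (distance of the shot from the tank's center): f(y) = 2y on [0, 1]
f = lambda y: 2 * y if 0 <= y <= 1 else 0.0
print(riemann_prob(f, 0.5, 0.6))  # ≈ F(0.6) - F(0.5) = 0.11
```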

  As an example, what is the pdf of X, Bond's position? It is
                       { 0       if x < 0
    g(x) = d/dx G(x) = { 1/1000  if 0 <= x <= 1000
                       { 0       if x > 1000.
  The probability density is 0 outside the range [0, 1000] and uniform
  within that range, as we expect. Then we can use the pdf to compute
    Pr[450 < X <= 550] = ∫_{450}^{550} g(x) dx
      = ∫_{450}^{550} 1/1000 dx
      = x/1000 |_{450}^{550}
      = 550/1000 - 450/1000
      = 1/10.

  The pdf of Y, the distance from the center of the gas tank to the
  location of Bond's shot, is
                       { 0   if y < 0
    f(y) = d/dy F(y) = { 2y  if 0 <= y <= 1
                       { 0   if y > 1.
  So the density is higher further away from the center, as expected.

  Since the pdf and cdf give us the same information, we will use
  whichever is more convenient. Often, the cdf is easier to determine
  directly. However, the pdf is what allows us to compute expectations
  and variances.

Continuous Expectation and Variance
  In the discrete case, the expectation of a random variable Z is
    E(Z) = ∑_{a ∈ A} a * Pr[Z = a],
  where A is the set of all values that Z can take on.

  Continuous random variables can take on any real number. However, if
  we discretize a continuous random variable Y, we get something like
    E(Y) ≈ ∑_{b ∈ B} b Pr[b < Y <= b + δ]
      ≈ ∑_{b ∈ B} b f(b) δ,
  where B is a countably infinite set of values that are δ
  apart. Then if we undo the discretization, we get
    E(Y) = ∫_{-∞}^{+∞} x f(x) dx.
  This is the expectation of a continuous random variable.
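The discretized sum ∑ b f(b) δ gives a direct way to approximate this integral numerically. A sketch using the pdf f(y) = 2y of Y, whose exact expectation is ∫₀¹ y · 2y dy = 2/3 (function name and n are our choices):

```python
def expectation(pdf, lo, hi, n=100_000):
    """Approximate E(Y) = ∫ x pdf(x) dx by the left Riemann sum
    ∑ b pdf(b) δ from the discretization above."""
    delta = (hi - lo) / n
    return sum((lo + i * delta) * pdf(lo + i * delta) for i in range(n)) * delta

pdf_y = lambda y: 2 * y  # pdf of Y on [0, 1], from F(y) = y^2
print(expectation(pdf_y, 0.0, 1.0))  # ≈ 2/3
```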

  The variance of a continuous random variable is exactly the same as
  for a discrete random variable:
    Var(Y) = E(Y^2) - E(Y)^2,
  where E(Y) is as above and
    E(Y^2) = ∫_{-∞}^{+∞} x^2 f(x) dx.

  As an example, what is E(X), Bond's expected position? It is
    E(X) = ∫_{-∞}^{+∞} x g(x) dx
      = ∫_{0}^{1000} x/1000 dx
      = x^2/2000 |_0^{1000}
      = 1000^2/2000
      = 1000/2 = 500,
  as we would expect.

  Then
    E(X^2) = ∫_{-∞}^{+∞} x^2 g(x) dx
      = ∫_{0}^{1000} x^2/1000 dx
      = x^3/3000 |_0^{1000}
      = 1000^3/3000
      = 1000^2/3.
  Then
    Var(X) = E(X^2) - E(X)^2
      = 1000^2/3 - 1000^2/4
      = 1000^2/12.
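The value 1000^2/12 ≈ 83333.3 can be sanity-checked by computing the sample variance of uniform draws (trial count and seed are arbitrary choices of ours):

```python
import random
import statistics

def sample_var(trials=200_000, d=1000.0, seed=3):
    """Sample variance of uniform [0, d] draws, to compare with d^2/12."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, d) for _ in range(trials)]
    return statistics.pvariance(xs)

print(sample_var(), 1000.0 ** 2 / 12)  # estimate vs exact
```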

  In general, for a random variable Z that is uniformly distributed in
  the interval [0, d], we get
    E(Z) = d/2
    Var(Z) = d^2/12.
  A random variable W that is uniformly distributed in the interval
  [a, a+d] is just
    W = Z + a,
  so we get
    E(W) = E(Z) + a
      = a + d/2
    Var(W) = Var(Z)
      = d^2/12.
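A small simulation illustrates the shift: the mean moves by a while the variance stays at d^2/12 (the parameter a = 200 and the other constants below are arbitrary choices of ours):

```python
import random
import statistics

def shifted_uniform_stats(a=200.0, d=1000.0, trials=200_000, seed=4):
    """Empirical mean and variance of W = Z + a with Z uniform on [0, d].
    Shifting by a moves the mean by a but leaves the variance unchanged."""
    rng = random.Random(seed)
    ws = [a + rng.uniform(0, d) for _ in range(trials)]
    return statistics.fmean(ws), statistics.pvariance(ws)

mean, var = shifted_uniform_stats()
print(mean, var)  # mean ≈ 200 + 1000/2 = 700, var ≈ 1000^2/12
```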