Administrative info
  Updated MT2 solutions posted on Piazza
  Exams can be picked up from Soda front office
  PA3 due today
  HW8 due next Wednesday
  Final exam next Thursday
  No regrades for HW8 (not enough time) or final exam (UCB policy)

Review
  In the continuous sample spaces we consider in this class, the
  probability of any particular outcome is 0. So instead of working
  with outcomes, we work directly with events.

  EX: James Bond jumps out of a plane and lands at a position
      uniformly distributed in [0, 1000]. The probability that he
      lands in an interval [a, b], where 0 <= a <= b <= 1000, is
      (b-a)/1000.

  Often, we work directly with random variables. A continuous random
  variable has a range that includes a continuous subset of R. Thus,
  for a random variable X, Pr[X = a] = 0 for any a. Again, we work
  with intervals, such as Pr[a < X <= b], which can have non-zero
  probability.

  We can describe a continuous random variable X in two ways.
  (1) The cumulative distribution function (cdf):
        F(x) = Pr[X <= x].
      Then
        Pr[a < X <= b] = F(b) - F(a).
      The cdf F(x) must satisfy a few properties.
      (a) 0 <= F(x) <= 1 for all x∈R, since F(x) is a
          probability.
      (b) F(x) must be monotonically increasing:
            F(x) <= F(y) if x <= y.
  (2) The probability density function (pdf):
        f(x) = d/dx F(x).
      Then
        F(x) = ∫_{-∞}^x f(y) dy
        Pr[a < X <= b] = ∫_a^b f(y) dy.
      The pdf f(x) must satisfy a few properties.
      (a) f(x) >= 0 for all x∈R. Otherwise it would be possible
          to find an interval for which the integral is negative,
          resulting in a negative probability.
      (b) ∫_{-∞}^{+∞} f(x) dx = 1, i.e. the probability
          Pr[-∞ < X < +∞] is 1.

  The pdf tells us where there is a higher probability density. Its
  graph is similar to the distribution graph for a discrete random
  variable.

  EX: Let X be Bond's landing position when he jumps out of the plane.
      Then the cdf of X is
               { 0       if x < 0
        G(x) = { x/1000  if 0 <= x <= 1000
               { 1       if x > 1000
      The pdf of X is
               { 0       if x < 0
        g(x) = { 1/1000  if 0 <= x <= 1000
               { 0       if x > 1000
      A plot of the pdf:
        g(x)
                 ___________
        1/1000  |           |
        ________|           |________
                0         1000      x
      As you can see, he is equally likely to land anywhere in
      [0, 1000].
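
  We can check this uniform example numerically. Below is a quick
  Monte Carlo sanity check in Python (not part of the notes; the
  interval [a, b] = [200, 450], the sample size, and the seed are
  arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1000, size=1_000_000)  # X ~ Uniform[0, 1000]

    a, b = 200, 450
    empirical = np.mean((a < X) & (X <= b))   # fraction of samples in (a, b]
    analytic = (b - a) / 1000                 # (b - a)/1000, from the cdf
    print(empirical, analytic)                # both should be close to 0.25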

  EX: James Bond shoots at a 1 foot radius gas tank, hitting any point
      on it with uniform probability. What is the pdf of the distance
      from the center to where he hits?

      Let Y = distance of hit from center. The probability of hitting
      within distance y of the center is the ratio of the areas of the
      two disks, πy^2/π(1)^2 = y^2. Then the cdf of Y is
               { 0    if y < 0
        F(y) = { y^2  if 0 <= y <= 1
               { 1    if y > 1
      The pdf of Y is
               { 0    if y < 0
        f(y) = { 2y   if 0 <= y <= 1
               { 0    if y > 1
      A plot of the pdf:
        f(y)
        2           /
                   /
        1         /
        _________/   _________
                0   1        y
      This shows us that there is higher density away from the center
      than closer to it. So Bond is more likely to hit further away
      from the center than closer to it.
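
  We can simulate this example as well. The sketch below (again not
  from the notes; the sample size and seed are arbitrary) samples hit
  points uniformly over the unit disk by rejection and compares the
  empirical Pr[Y <= y] against the cdf F(y) = y^2 derived above:

    import numpy as np

    rng = np.random.default_rng(1)
    pts = rng.uniform(-1, 1, size=(2_000_000, 2))
    pts = pts[np.sum(pts**2, axis=1) <= 1]    # keep points inside the disk
    Y = np.sqrt(np.sum(pts**2, axis=1))       # distance of hit from center

    for y in (0.25, 0.5, 0.9):
        print(y, np.mean(Y <= y), y**2)       # empirical cdf vs F(y) = y^2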

  As you can see in the above example, the pdf is not restricted to the
  range [0, 1]. This is because the pdf is a density, not an
  actual probability. We defined the pdf as
    f(y) = lim_{δ->0} Pr[y < Y <= y + δ] / δ,
  so it is the limit of a tiny probability divided by a tiny length,
  which can give us any non-negative value.
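
  For instance, for the dartboard variable Y above, we have
  Pr[y < Y <= y + δ] = F(y + δ) - F(y) = (y + δ)^2 - y^2, and the
  ratio approaches f(y) = 2y as δ shrinks. A tiny numeric
  illustration (the point y = 0.6 is an arbitrary choice):

    y = 0.6
    for delta in (0.1, 0.01, 0.001):
        ratio = ((y + delta)**2 - y**2) / delta
        print(delta, ratio)    # tends to f(y) = 2y = 1.2 as δ -> 0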

  We derived an expression for the expectation of a random variable:
    E(X) = ∫_{-∞}^{+∞} x f(x) dx.
  Then the variance is defined as in the discrete case, with
    E(X^2) = ∫_{-∞}^{+∞} x^2 f(x) dx.

  EX: What is E(X), Bond's expected landing position? It is
        E(X) = ∫_{-∞}^{+∞} x g(x) dx
          = ∫_{0}^{1000} x/1000 dx
          = x^2/2000 |_0^{1000}
          = 1000^2/2000
          = 1000/2 = 500.
      Then
        E(X^2) = ∫_{-∞}^{+∞} x^2 g(x) dx
          = ∫_{0}^{1000} x^2/1000 dx
          = x^3/3000 |_0^{1000}
          = 1000^3/3000
          = 1000^2/3.
      Then
        Var(X) = E(X^2) - E(X)^2
          = 1000^2/3 - 1000^2/4
          = 1000^2/12.
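
  These values are easy to confirm by simulation (a sketch; the
  sample size and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 1000, size=2_000_000)
    print(X.mean(), 500)            # E(X) = 500
    print(X.var(), 1000**2 / 12)    # Var(X) = 1000^2/12 ≈ 83333.3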

  In general, for a random variable Z that is uniformly distributed in
  the interval [0, d], we get
    E(Z) = d/2
    Var(Z) = d^2/12.
  A random variable W that is uniformly distributed in the interval
  [a, a+d] is just
    W = Z + a,
  so we get
    E(W) = E(Z) + a
      = a + d/2
    Var(W) = Var(Z)
      = d^2/12.
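
  A minimal check that the shift moves the mean but not the variance
  (a = 7 and d = 4 are arbitrary illustrative values):

    import numpy as np

    rng = np.random.default_rng(3)
    Z = rng.uniform(0, 4, size=1_000_000)   # Z ~ Uniform[0, d], d = 4
    W = Z + 7                               # W ~ Uniform[a, a+d], a = 7
    print(W.mean(), 7 + 4/2)                # E(W) = a + d/2 = 9
    print(W.var(), 4**2 / 12)               # Var(W) = d^2/12 ≈ 1.33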

Exponential Distribution
  Recall that if we have a number of independent trials, each of which
  has a probability p of success, then the number of trials T until
  the first success follows a geometric distribution
    T ~ Geom(p),
  so
    Pr[T = i] = p(1-p)^{i-1} for all i∈Z^+,
  and
    Pr[T > i] = (1-p)^i for all i∈N.

  Suppose now that we perform a large number of trials every second,
  where each trial has a small probability p of success, so that we
  perform a trial every δ seconds for some small δ. By
  linearity of expectation, the average rate of success λ per
  second is
    λ = p / δ,
  since there are 1/δ trials per second, each with probability
  of success p, so we have
    p = λ δ.
  Let S be the number of seconds until the first success. Then
    Pr[T > k] = Pr[S > kδ]   (since each trial takes δ seconds)
      = Pr[S > t],
  where t = kδ and t >= 0 since k >= 0. Then
    Pr[S > t] = Pr[T > k]
      = (1 - p)^k            (since T ~ Geom(p))
      = (1 - p)^{t/δ}        (since k = t / δ)
      = (1 - λδ)^{t/δ}       (since p = λ δ)
      ≈ (e^{-λδ})^{t/δ}      (since λδ is small, so 1 - λδ ≈ e^{-λδ})
      = e^{-λ t}.
  Finally, we get, for t >= 0,
    Pr[S <= t] = 1 - Pr[S > t]
      = 1 - e^{-λ t}
  as the cdf of S, which gives us a pdf
    f(t) = d/dt (1 - e^{-λ t})
      = λ e^{-λ t}.
  Both the pdf and cdf are 0 if t < 0.
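
  The limiting step above is easy to check numerically: with p = λδ
  and k = t/δ trials, (1 - p)^k approaches e^{-λt} as δ shrinks. In
  the sketch below, λ = 2 and t = 1 are arbitrary choices:

    import math

    lam, t = 2.0, 1.0
    for delta in (0.1, 0.01, 0.001):
        p = lam * delta                # success probability per trial
        k = round(t / delta)           # number of trials in t seconds
        print(delta, (1 - p)**k, math.exp(-lam * t))   # (1-p)^k -> e^{-λt}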

  This is the "exponential distribution," which has pdf
    f(t) = { λ e^{-λ t}   if t >= 0
           { 0            if t < 0
  It is the continuous version of the geometric distribution and tells
  us how long we need to wait for a success, if successes can occur at
  any time and λ is the average rate of success per unit time.
  We write
    S ~ Exp(λ).
  Computing the expectation and variance of an exponential random
  variable requires integration by parts, and we get
    E(S) = 1/λ
    Var(S) = 1/λ^2.
  These are similar to the geometric distribution, where we got an
  expectation of 1/p and a variance of (1-p)/p^2.
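
  A quick sampling check of these formulas (λ = 1.5 is an arbitrary
  choice; note that numpy parameterizes the exponential by its scale
  1/λ rather than by the rate λ):

    import numpy as np

    lam = 1.5
    rng = np.random.default_rng(5)
    S = rng.exponential(scale=1/lam, size=2_000_000)
    print(S.mean(), 1/lam)       # E(S) = 1/λ
    print(S.var(), 1/lam**2)     # Var(S) = 1/λ^2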

  Note that though p is restricted to [0, 1], since it is a
  probability, λ can be any non-negative value, since it is the
  average rate of success. In particular, it may be the case that we
  expect many successes in a unit of time, in which case λ will
  be greater than 1.

  Recall the relationship between the binomial and the geometric
  distribution. They both examine what happens when we have a series
  of independent trials, each with probability p of success. The
  binomial
  distribution tells us how many successes we get in a fixed number of
  trials, while the geometric distribution tells us in which trial the
  first success occurs.

  The exponential distribution has a similar relationship to the
  Poisson distribution. They both examine what happens when we have a
  particular average rate of success λ. The Poisson tells us
  how many successes we get in a fixed unit length of time, and the
  exponential tells us at what time the first success occurs.
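
  The duality shows up directly in simulation: the first success
  comes after time t exactly when there are zero successes in [0, t].
  A sketch (λ = 1.2 and t = 1 are arbitrary choices):

    import numpy as np

    lam, t = 1.2, 1.0
    rng = np.random.default_rng(6)
    n = 1_000_000
    S = rng.exponential(scale=1/lam, size=n)   # time of first success
    R = rng.poisson(lam * t, size=n)           # number of successes in [0, t]
    print(np.mean(S > t))     # Pr[S > t]
    print(np.mean(R == 0))    # Pr[R = 0]; both ≈ e^{-1.2} ≈ 0.301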

  EX: Suppose a web server processes on average 1.2 requests per
      second. Then the amount of time between requests follows an
      exponential distribution with λ = 1.2. Suppose a request
      comes in. What is the probability that a new request will come
      in within the next second?

      Let S be the amount of time until the next request. Then
        S ~ Exp(1.2).
      Then
        Pr[S <= 1] = ∫_0^1 1.2 e^{-1.2 t} dt
          = -e^{-1.2 t} |_0^1
          = -e^{-1.2} + 1
          ≈ 0.7.

      Note that we could use the Poisson distribution to solve this
      problem. Let R be the number of requests in the next second.
      Then
        R ~ Poiss(1.2).
      Then
        Pr[S <= 1] = Pr[R >= 1]
          = 1 - Pr[R = 0]
          = 1 - 1.2^0/0! e^{-1.2}
          = 1 - e^{-1.2},
      as we computed before.
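
  Both computations from this example in code, using only the
  standard library:

    import math

    lam = 1.2
    via_exponential = 1 - math.exp(-lam * 1)   # cdf of Exp(1.2) at t = 1
    via_poisson = 1 - lam**0 / math.factorial(0) * math.exp(-lam)
    print(via_exponential, via_poisson)        # both ≈ 0.699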

Normal Distribution
  A random variable X has a "normal distribution", also called a
  "Gaussian distribution,", if it has a pdf of the form
    f(x) = 1/√{2πσ^2} e^{-(x-μ)^2/(2σ^2)}
  for some values of μ and σ. It can then be shown that
    ∫_{-∞}^{+∞} f(x) dx = 1,
  as required for a pdf, and that
    E(X) = μ
    Var(X) = σ^2,
  hence the parameters μ and σ. We write
    X ~ N(μ, σ^2).

  The pdf of a normal distribution is a symmetric bell-shaped curve
  centered at μ, with a width determined by σ.

  The "standard normal distribution" has parameters μ = 0, σ
  = 1. So if Y is a standard normal, then
    Y ~ N(0, 1),
  and the pdf of Y is
    g(y) = 1/√{2π} e^{-y^2/2}.
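
  A numeric sanity check for the standard normal (a sketch; the grid
  and sample size are arbitrary): the pdf should integrate to 1, and
  samples should have mean ≈ 0 and variance ≈ 1.

    import numpy as np

    y = np.linspace(-10, 10, 200_001)
    g = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
    print(np.sum(g) * (y[1] - y[0]))     # Riemann sum ≈ 1

    rng = np.random.default_rng(7)
    Y = rng.normal(0, 1, size=2_000_000)
    print(Y.mean(), Y.var())             # ≈ 0 and ≈ 1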

  More on the normal distribution next time.