Administrative info
  HW5 due tomorrow
  HW6 out tonight

Review
  In defining events, we noted that we often do not care about a
  specific outcome of a random experiment but whether or not the
  outcome is part of a special set of outcomes, which we called
  events. For example, when flipping a fair coin 100 times, we may
  care about whether or not the number of heads and tails is the same.
  So we define E as the event that we get 50 heads and compute Pr[E] =
  C(100, 50) / 2^100.
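
  As a quick numerical check, here is a short Python sketch (math.comb
  is in the standard library as of Python 3.8):

    from math import comb

    # Pr[E] = C(100, 50) / 2^100: the chance of exactly 50 heads
    print(comb(100, 50) / 2**100)   # about 0.0796, i.e. roughly 8%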

Random Variables
  Sometimes, what we care about is a numerical value of an outcome.
  For example, if we receive $1 for each heads in 100 flips of a fair
  coin and lose $1 for each tails, we care about how much total money
  we earn or lose. For any particular outcome ω, this is just
  the number of heads in ω minus the number of tails. We can
  compute the amount of money we win or lose for every one of the 2^100
  outcomes. This is a "random variable."

  A random variable is a function that assigns a value to each sample
  point. More formally, a random variable X is a function from
  Ω, the sample space to R, the set of real numbers. The value
  of the random variable at sample point ω is denoted as
  X(ω) (like any other function). (Note: a random variable is
  neither random nor a variable, since it is a function. Why is it
  called a random variable? I don't know, but since the outcome of the
  experiment is random, the value the random variable takes on is
  random as well.)

  Let's go back to coin flipping. Suppose I flip a fair coin once. Let
  X be a random variable that is +1 if I get heads, -1 if I get tails.
  What is X(ω) for each outcome ω? Well, X(h) = +1 and
  X(t) = -1.

  What if I flip a fair coin three times, where X is the amount I win
  if I win $1 for each heads and lose $1 for each tails? Then
    X(hhh) = 3    X(hth) = 1     X(thh) = 1     X(tth) = -1
    X(hht) = 1    X(htt) = -1    X(tht) = -1    X(ttt) = -3
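
  We can reproduce this table by brute force. Here is a minimal Python
  sketch that enumerates all 2^3 outcomes and evaluates X on each,
  representing heads and tails as the characters 'h' and 't':

    from itertools import product

    # each omega is a tuple such as ('h', 't', 'h')
    for omega in product('ht', repeat=3):
        x = omega.count('h') - omega.count('t')   # X(omega) = H - T
        print(''.join(omega), x)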

  In general, if I flip a fair coin n times, then if X is the amount I
  win, X(ω) = H(ω) - T(ω), where H(ω) is the
  number of heads in ω and T(ω) is the number of tails in
  ω. Notice that H and T are also random variables, since they
  assign a real number to each sample point. Defining a random
  variable in terms of simpler random variables is a very useful
  procedure. (Notice a common theme with induction, counting,
  probability, and now random variables? All involve reducing a hard
  problem to simpler problems.)

  We can actually note that H(ω) + T(ω) = n, since each
  flip must be heads or tails. So we can further write
    X(ω) = H(ω) - (n - H(ω))
      = 2 H(ω) - n.
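
  As a sanity check, a tiny sketch that confirms this identity on all
  outcomes for n = 3:

    from itertools import product

    n = 3
    for omega in product('ht', repeat=n):
        h, t = omega.count('h'), omega.count('t')
        assert h - t == 2 * h - n   # both forms of X(omega) agree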

  Suppose that rather than handing back your exams individually in
  section, we hand back a random exam to each person in lecture. Let
  X be the number of students who get their own exams back. What
  is X(ω) for each sample point ω?

  First, let us determine the sample space. Each outcome is just a
  permutation of the n students in class {1, ..., n}. For example,
  ω = (2, 3, ..., n, 1) corresponds to student i getting the
  exam of student (i mod n) + 1. Since each outcome is a permutation,
  there are n! outcomes, which we assume to have uniform probability.

  Now let's define a series of simpler random variables. Let X_i be a
  random variable that is 1 if the ith person gets his or her own
  homework back, 0 otherwise. Such a 0/1-valued random variable is
  called an "indicator random variable" and is a very common and
  useful type of random variable. Then we have
    X(ω) = X_1(ω) + ... + X_n(ω).

  As a concrete example, suppose n = 3. Then the outcomes are
    (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
  Then the values of the X_i are
       1        1        0        0        0        0       X_1
       1        0        0        0        0        1       X_2
       1        0        1        0        0        0       X_3
  and the values of X are
       3        1        1        0        0        1        X.
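
  A short Python sketch rebuilds this table: X_i(ω) is 1 exactly when
  position i of the permutation holds the value i, and
  itertools.permutations generates the n! outcomes:

    from itertools import permutations

    n = 3
    for omega in permutations(range(1, n + 1)):
        # x[i] is the indicator X_{i+1}: did student i+1 get his or
        # her own exam back?
        x = [1 if omega[i] == i + 1 else 0 for i in range(n)]
        print(omega, x, sum(x))   # the last column is X(omega)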

  We can use the same procedure in the case of coin flipping. Here, X
  is the total amount won in n flips. Let X_i be the amount won in the
  ith flip, +1 if it is heads, -1 if it is tails. Then
    X(ω) = X_1(ω) + ... + X_n(ω).
  Then in the case of n = 3, we have
     hhh   hht   hth   htt   thh   tht   tth   ttt
      1     1     1     1    -1    -1    -1    -1           X_1
      1     1    -1    -1     1     1    -1    -1           X_2
      1    -1     1    -1     1    -1     1    -1           X_3
      3     1     1    -1     1    -1    -1    -3            X
  as before.

Distributions
  As with events, we often don't care about the value of a random
  variable at each outcome. Rather, we care about the probability that
  the random variable takes any particular value. In fact, we can
  define events in terms of random variables. We define "X = a" to be
  the event that the random variable X takes on value a. More
  formally,
    X = a ≡ {ω : ω ∈ Ω ∧ X(ω) = a},
  i.e. X = a is the set of outcomes ω for which X(ω) = a.

  In the above example with three coin flips, we have
    (X = 3) = {hhh}
    (X = 1) = {hht, hth, thh}
    (X = -1) = {htt, tht, tth}
    (X = -3) = {ttt}.

  Now since each X = a is an event, we can compute the probability
  Pr[X = a]. With the coin flips, we have
    Pr[X = 3] = 1/8
    Pr[X = 1] = 3/8
    Pr[X = -1] = 3/8
    Pr[X = -3] = 1/8
  This set of probabilities is called the "distribution" of the random
  variable X.
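
  Such a distribution is easy to compute by counting outcomes. Here is
  a sketch for the three-flip example; the final check, that the
  probabilities sum to 1, is justified just below:

    from collections import Counter
    from itertools import product

    n = 3
    # tally X(omega) over all 2^n equally likely outcomes
    counts = Counter(omega.count('h') - omega.count('t')
                     for omega in product('ht', repeat=n))
    dist = {a: c / 2**n for a, c in counts.items()}
    print(dist)   # {3: 0.125, 1: 0.375, -1: 0.375, -3: 0.125}
    assert abs(sum(dist.values()) - 1) < 1e-9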

  We can also draw a graph to depict the distribution.
    Pr[X=a]
           |
       3/8 |       *     *
           |       *     *
       1/8 | *     *     *     *
           --+--+--+--+--+--+--+--
            -3 -2 -1  0  1  2  3  a

  Note that since X is a function from Ω to R, each ω has
  exactly one value a such that X(ω) = a, so ω is in exactly
  one of the events X = a. Thus, the events X = a partition the sample
  space. This means that
  (1) (X = a_1) ∩ (X = a_2) = ∅ if a_1 ≠ a_2
  (2) ∪_{a ∈ A} (X = a) = Ω, where A is the set of all
      possible values that X(ω) can take on.
  These two facts imply that the sum of all probabilities in the
  distribution of X is 1.

  In the example of passing back exams, what is the distribution of X,
  the number of students who get their own exam back? For n = 3, we
  have
    Pr[X = 3] = 1/6
    Pr[X = 1] = 1/2
    Pr[X = 0] = 1/3.
  What about arbitrary n? Let's come back to that later.
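
  In the meantime, we can tabulate the distribution for a small n by
  brute force (a sketch; since there are n! outcomes, this is only
  feasible for small n):

    from collections import Counter
    from itertools import permutations
    from math import factorial

    n = 4
    # X(omega) = number of students who get their own exam back
    counts = Counter(sum(omega[i] == i + 1 for i in range(n))
                     for omega in permutations(range(1, n + 1)))
    for a in sorted(counts):
        print('Pr[X = %d] = %d/%d' % (a, counts[a], factorial(n)))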

  Let's take another look at the coin flipping example, but for
  arbitrary n. The distribution of X, the amount of money won, seems
  non-trivial. But since we know that X(ω) = 2 H(ω) - n,
  let's first compute a distribution for H, the number of heads.

  If we flip a fair coin n times, in how many outcomes are there
  exactly i heads? This is just choosing i out of the n flips to be
  heads, so there are C(n, i) outcomes. Then |H = i| = C(n, i), so
  Pr[H = i] = C(n, i) / 2^n. This is the distribution of H, where i is
  an integer 0 <= i <= n.
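
  As a sketch, here is this distribution as a Python function
  (pr_heads is our name for it, not a standard one):

    from math import comb

    def pr_heads(n, i):
        # Pr[H = i] = C(n, i) / 2^n for a fair coin
        return comb(n, i) / 2**n

    print([pr_heads(5, i) for i in range(6)])
    # i.e. [1/32, 5/32, 10/32, 10/32, 5/32, 1/32]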

  Here is a graph representation of the distribution of H when n = 5:
    Pr[H=i]
           |       *  *
      9/32 |       *  *
           |       *  *
      7/32 |       *  *
           |       *  *
      5/32 |    *  *  *  *
           |    *  *  *  *
      3/32 |    *  *  *  *
           |    *  *  *  *
      1/32 | *  *  *  *  *  *
           --+--+--+--+--+--+--
          0  1  2  3  4  5  i
  You can see the beginnings of a bell curve.

  It follows that the distribution of X is Pr[X = i] = Pr[2H-n = i] =
  Pr[H = (i+n)/2], where -n <= i <= n. If (i+n)/2 is not an integer
  (e.g. i is odd and n is even), then this is 0.
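
  In code, this change of variable is just a reindexing of the
  distribution of H (a self-contained sketch; pr_winnings is an
  illustrative name):

    from math import comb

    def pr_winnings(n, i):
        # Pr[X = i] = Pr[H = (i + n) / 2]; zero when (i + n) / 2 is
        # not an integer or i is out of range
        if (i + n) % 2 != 0 or not -n <= i <= n:
            return 0
        return comb(n, (i + n) // 2) / 2**n

    print([pr_winnings(3, i) for i in range(-3, 4)])
    # i.e. [1/8, 0, 3/8, 0, 3/8, 0, 1/8]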

  Now suppose we are flipping a biased coin with probability p of heads.
  Then what is the distribution of H, the number of heads?

  First, how many outcomes are in H = i? As before, there are C(n, i).
  But now we can't just divide by the size of the sample space, since
  it is not uniform. Instead, we use the definition of the probability
  of an event, that it is the sum of the probabilities of the outcomes
  in the event. What is the probability of each outcome in H = i? We
  already computed this in a previous lecture as p^i (1-p)^(n-i). So
    Pr[H = i] = C(n, i) p^i (1-p)^(n-i),
  where i is an integer 0 <= i <= n.

  This is known as the "binomial distribution" with parameters p and
  n, where p is the probability of getting heads in any one flip and n
  is the number of flips. We use the shorthand H ~ Bin(n, p) to denote
  that H is a random variable with a binomial distribution with
  parameters n and p.
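
  The formula translates directly into a sketch (binomial_pmf is our
  name for it):

    from math import comb

    def binomial_pmf(n, p, i):
        # Pr[H = i] = C(n, i) p^i (1 - p)^(n - i)
        return comb(n, i) * p**i * (1 - p)**(n - i)

    print(binomial_pmf(5, 0.5, 2))   # 0.3125 = 10/32, the fair-coin case
    print(binomial_pmf(5, 0.7, 2))   # 0.1323: mass shifts toward more heads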

  The graph of a binomial distribution with parameters p and n is
  bell-shaped, though it will be skewed in one direction if p is not
  1/2. See the reader for an example.

  The binomial distribution comes up in any experiment with n
  independent trials, each with probability of success p. As another
  example, suppose we are sending n packets over a network, where we
  choose the path from source to destination randomly and
  independently for each packet. Suppose that the probability that a
  single packet reaches its destination is p. Then if X is the number
  of packets that reach the destination, X ~ Bin(n, p), so Pr[X = i] =
  C(n, i) p^i (1-p)^(n-i) for 0 <= i <= n.
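
  As a usage example, here is a sketch of a quantity we might actually
  care about: the probability that at least k of the n packets arrive
  (the values n = 10, p = 0.9, k = 9 are illustrative):

    from math import comb

    def pr_at_least(n, p, k):
        # sum the Bin(n, p) distribution over i = k, ..., n
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    print(pr_at_least(10, 0.9, 9))   # about 0.736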