Administrative info
  HW6 out

Review
  Recall that a random variable is a function from the sample space
  Ω to the real numbers R. It assigns a real number to each
  sample point.

  For each value a that a random variable X can take on, X = a is an
  event that consists of all the outcomes ω for which X(ω)
  = a. This event has probability
    Pr[X = a] = ∑_{ω : X(ω)=a} Pr[ω].

  The events X = a partition the sample space, so they are disjoint
  for distinct a and their union is the sample space.

  The distribution of a random variable X is the set of probabilities
  Pr[X = a] for every possible value of a. The probabilities must add
  up to 1, since the X = a partition the sample space.

  Recall the coin flipping game from last time. We flip a biased coin
  that has probability p of heads n times, and for each heads we win
  $1 and we lose $1 for each tails. We are interested in how much
  total money we win.

  Let's call W our winnings. If we flip the coin three times, then
    W(hhh) = 3    W(hth) = 1     W(thh) = 1     W(tth) = -1
    W(hht) = 1    W(htt) = -1    W(tht) = -1    W(ttt) = -3

  In general, we determined that if we flip the coin n times, then
  W(ω) = 2 H(ω) - n, where H(ω) is the number of
  heads in ω. We abbreviate this statement as W = 2H - n.

  We then demonstrated that the distribution of H is
    Pr[H = i] = C(n, i) p^i (1-p)^(n-i)
  for integer i, 0 <= i <= n. This is a binomial distribution with
  parameters n and p, denoted by H ~ Bin(n, p).
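
  As a quick sanity check (a sketch, not from lecture; the values n = 3
  and p = 0.5 are arbitrary), we can compute this distribution in
  Python and verify that the probabilities sum to 1:

    from math import comb

    def binomial_pmf(n, p):
        # Pr[H = i] = C(n, i) p^i (1-p)^(n-i) for i = 0, ..., n
        return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

    pmf = binomial_pmf(n=3, p=0.5)
    print(pmf)        # [0.125, 0.375, 0.375, 0.125]
    print(sum(pmf))   # 1.0, since the events H = i partition the sample space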

  As another example, suppose we throw n balls into m bins, uniformly
  at random, with each ball thrown independently of the others. Let
  B_1 be the number of balls in bin 1. What is the distribution of
  B_1?

  Each ball has probability 1/m of going into the first bin, and there
  are n balls. Let's make an analogy with coin flipping. Let's refer
  to throwing a ball as flipping a coin, and a ball going into bin 1
  as a heads. Then this is the same as flipping a coin n times with a
  bias of p = 1/m, so the distribution of B_1 is B_1 ~ Bin(n, 1/m).
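
  A short simulation (a sketch, not from the notes; n, m, and the trial
  count are arbitrary) makes the analogy concrete: throw the balls many
  times and compare the empirical frequencies of B_1 to Bin(n, 1/m).

    import random
    from math import comb

    n, m, trials = 4, 3, 100_000
    counts = [0] * (n + 1)
    for _ in range(trials):
        # each ball independently lands in bin 1 with probability 1/m
        b1 = sum(1 for _ in range(n) if random.randrange(m) == 0)
        counts[b1] += 1

    for i in range(n + 1):
        empirical = counts[i] / trials
        exact = comb(n, i) * (1/m)**i * (1 - 1/m)**(n - i)
        print(i, round(empirical, 3), round(exact, 3))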

  Anytime we have an experiment with n mutually independent trials,
  each of which has probability p of success, then the number of
  successes X has distribution X ~ Bin(n, p).

  Once we computed the distribution of H, we computed the distribution
  of W = 2H - n. We have
    Pr[W = j] = Pr[2H - n = j]
      = Pr[H = (j+n)/2]
      = C(n, (j+n)/2) p^[(j+n)/2] (1-p)^[n-(j+n)/2]
  for integer (j+n)/2, 0 <= (j+n)/2 <= n. Solving for j, we get -n
  <= j <= n as the range of values W can take on, with the caveat
  that W only takes on even values if n is even and odd values if n
  is odd, so that (j+n)/2 is an integer.
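
  To make the change of variables concrete, here is a small sketch (not
  from lecture; n and p are arbitrary) that builds the distribution of
  W directly from that of H using Pr[W = j] = Pr[H = (j+n)/2]:

    from math import comb

    def winnings_pmf(n, p):
        # as i runs over 0..n, j = 2i - n runs over -n, -n+2, ..., n
        return {2*i - n: comb(n, i) * p**i * (1 - p)**(n - i)
                for i in range(n + 1)}

    print(winnings_pmf(3, 0.5))   # {-3: 0.125, -1: 0.375, 1: 0.375, 3: 0.125}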

  This demonstrated the value of writing one random variable in terms
  of others. Computing the distribution of W directly would have been
  more difficult than computing it from H.

  Another example of writing a random variable in terms of others is
  with indicator random variables. Recall the exam example from last
  time. We had n students, each of whom receives a random exam back. We
  were interested in how many students get their own exam back.
  Calling this random variable X, we then defined X_i to be an
  indicator random variable that is 1 if the ith student gets his or
  her own exam back and 0 otherwise. Then X = X_1 + ... + X_n.

  In the case of n = 3, we had
    (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1),
  and the values of the X_i were
       1        1        0        0        0        0       X_1
       1        0        0        0        0        1       X_2
       1        0        1        0        0        0       X_3,
  so the values of X were
       3        1        1        0        0        1        X.
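
  The table above can also be generated by brute force; the sketch
  below (not from lecture) enumerates the permutations for n = 3 and
  computes each X_i and X.

    from itertools import permutations

    n = 3
    for perm in permutations(range(1, n + 1)):
        # X_i = 1 exactly when student i gets exam i back, i.e. perm[i-1] == i
        indicators = [1 if perm[i] == i + 1 else 0 for i in range(n)]
        print(perm, indicators, sum(indicators))   # the sum is X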

Expectation
  Sometimes we are interested in less information than the entire
  distribution of a random variable. In particular, we may be
  interested in an "average" value. For example, if I play the coin
  flipping game many times, I want to know how much I'd win on average
  in each game.

  Let's review the distribution of W_f, the amount won, when the game
  has 3 flips and uses a fair coin.
    Pr[W_f=3] = 1/8
    Pr[W_f=1] = 3/8
    Pr[W_f=-1] = 3/8
    Pr[W_f=-3] = 1/8
  What is the "average" value? It looks like it should be 0.

  What if we were using a coin with bias p? Then the amount won W_p
  has distribution
    Pr[W_p=3] = p^3
    Pr[W_p=1] = 3 p^2 (1-p)
    Pr[W_p=-1] = 3 p (1-p)^2
    Pr[W_p=-3] = (1-p)^3
  Now what is the "average" value? It's no longer obvious.

  Formally, we define the "expected value" of a random variable X to
  be
    E(X) = ∑_{ω ∈ Ω} X(ω) Pr[ω].
  Each sample point ω contributes its value of the random
  variable X(ω) to the expected value according to how likely
  the outcome is.

  The expected value is also known as the "expectation," "average," or
  "mean."

  Since X(ω) = a for any outcome in X = a, Pr[X = a] =
  ∑_{ω : X(ω)=a} Pr[ω], and the X = a partition
  the sample space, we can equivalently write
    E(X) = ∑_{a ∈ A} a * Pr[X = a],
  where A is the set of all values that X can take on.
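
  Both forms of the definition are easy to compute; the sketch below
  (using a made-up sample space of two fair coin flips with X the
  number of heads, not an example from lecture) evaluates the
  expectation once by summing over sample points and once by summing
  over values, and the two agree.

    from collections import defaultdict

    # toy sample space: outcome -> probability, and a random variable X
    prob = {'hh': 0.25, 'ht': 0.25, 'th': 0.25, 'tt': 0.25}
    X = {'hh': 2, 'ht': 1, 'th': 1, 'tt': 0}   # number of heads

    # E(X) = sum over sample points omega of X(omega) Pr[omega]
    e1 = sum(X[w] * prob[w] for w in prob)

    # E(X) = sum over values a of a * Pr[X = a]
    dist = defaultdict(float)
    for w in prob:
        dist[X[w]] += prob[w]
    e2 = sum(a * pa for a, pa in dist.items())

    print(e1, e2)   # both 1.0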

  Now that we have a formal definition for expectation, let us determine
  the expectation of W_f and W_p. For W_f, we have
    E(W_f) = 3 Pr[W_f=3] + 1 Pr[W_f=1] - 1 Pr[W_f=-1] - 3 Pr[W_f=-3]
      = 3 * 1/8 + 1 * 3/8 - 1 * 3/8 - 3 * 1/8
      = 0,
  as expected. For W_p, we have
    E(W_p) = 3 Pr[W_p=3] + 1 Pr[W_p=1] - 1 Pr[W_p=-1] - 3 Pr[W_p=-3]
      = 3 * p^3 + 1 * 3 p^2 (1-p) - 1 * 3 p (1-p)^2 - 3 * (1-p)^3
      = 3p^3 + 3p^2 - 3p^3 - 3p + 6p^2 - 3p^3 - 3 + 9p - 9p^2 + 3p^3
      = 6p - 3.
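
  A quick numerical check (a sketch; the choice p = 0.7 is arbitrary)
  confirms the algebra:

    p = 0.7
    e_wp = 3*p**3 + 3*p**2*(1-p) - 3*p*(1-p)**2 - 3*(1-p)**3
    print(e_wp, 6*p - 3)   # both 1.2, up to floating-point rounding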

  In the example of passing back exams, we had
    (1,2,3)  (1,3,2)  (2,1,3)  (2,3,1)  (3,1,2)  (3,2,1)
       3        1        1        0        0        1        X.
  What is E(X)? Using the first definition, we get
    E(X) = 3 Pr[(1,2,3)] + 1 Pr[(1,3,2)] + ...
      = 3 * 1/6 + 1 * 1/6 + 1 * 1/6 + 0 * 1/6 + 0 * 1/6 + 1 * 1/6
      = 1.
  We can also compute this using the distribution:
    Pr[X=3] = 1/6
    Pr[X=1] = 1/2
    Pr[X=0] = 1/3
    E(X) = 3 Pr[X=3] + 1 Pr[X=1] + 0 Pr[X=0]
      = 3/6 + 1/2
      = 1.

  Suppose we play a game of roulette. There are 38 slots on a roulette
  wheel, 18 of which are black, 18 of which are red, and two of which
  are green. If we bet $1 on black (which we always do), how much do
  we expect to win in a single game?

  Let W be our winnings. If black comes up, we win $1, otherwise we
  lose $1. So then
    Pr[W=1] = 18/38 = 9/19
    Pr[W=-1] = 20/38 = 10/19
    E(W) = 1 Pr[W=1] - 1 Pr[W=-1]
      = 9/19 - 10/19
      = -1/19.

  Suppose we roll a fair die. What is the expected value of the
  resulting number? Let N be a random variable corresponding to the
  number. Then
    Pr[N=i] = 1/6 for 1 <= i <= 6, and
    E(N) = 1 Pr[N=1] + 2 Pr[N=2] + ... + 6 Pr[N=6]
      = 1 * 1/6 + 2 * 1/6 + ... + 6 * 1/6
      = 1/6 (1 + 2 + ... + 6)
      = 1/6 * 6 * 7 / 2
      = 7/2.
  Note that this isn't actually a value that N can take on, but it is
  the expectation or average.

  Suppose we roll two fair dice. Let S be the sum of the values on the
  two dice. What is E(S)? We can compute the distribution
    Pr[S=2] = 1/36
    Pr[S=3] = 1/18
    Pr[S=4] = 1/12
    ...
    Pr[S=12] = 1/36.
  This is quite tedious, and at the end, we get E(S) = 7 = 2 E(N). It
  seems like there should be a simpler way to arrive at this.
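
  The tedious part can at least be delegated to a short sketch (not
  from the notes) that enumerates all 36 equally likely outcomes,
  builds the distribution of S as exact fractions, and confirms
  E(S) = 7.

    from collections import defaultdict
    from fractions import Fraction

    dist = defaultdict(Fraction)
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            dist[d1 + d2] += Fraction(1, 36)

    print(dict(dist))                            # Pr[S = s] for s = 2, ..., 12
    print(sum(s * p for s, p in dist.items()))   # 7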

  As a final example, suppose we pick 100 Californians uniformly at
  random. How many Democrats do we expect out of this group, given
  that 44.5% of Californians are Democrats? Intuitively, we'd expect
  44.5, but how can we arrive at that without computing a large
  distribution?