Administrative info
  MT1 tomorrow! 5-7pm in 10 Evans
  MT1 policies
  - 1 cheat sheet (8.5x11, double sided)
  - no calculators
  HW4 solutions posted
  HW5 posted

Review
  Recall that a probability space consists of the following:
  (1) A random experiment.
  (2) The sample space (set of possible outcomes).
  (3) The probability of each possible outcome of the experiment.
  Further recall that the probabilities must satisfy the following:
  (1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
  (2) ∑_{ω∈Ω} Pr[ω] = 1
  An event is a subset of the sample space, i.e. a set of outcomes
  from the sample space. The probability of an event E is the sum of
  the probabilities of the outcomes in E:
    Pr[E] = ∑_{ω∈E} Pr[ω].
  In the case of a uniform distribution, this simplifies to
    Pr[E] = |E|/|Ω|.
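  As a minimal sketch of these definitions, the snippet below stores a
  distribution as a Python dict from outcomes to probabilities. It uses
  Fraction for exact arithmetic and computes Pr[E] by summing over the
  outcomes in E. The helper name `prob` and the fair-die example are
  illustrative assumptions, not notation from the course.

```python
from fractions import Fraction

def prob(event, dist):
    """Pr[E]: sum of Pr[ω] over the outcomes ω in the event E (hypothetical helper)."""
    return sum(dist[w] for w in event)

# A fair six-sided die: uniform distribution on Ω = {1, ..., 6}
omega = range(1, 7)
die = {w: Fraction(1, len(omega)) for w in omega}

# Sanity checks on the two axioms
assert all(0 <= p <= 1 for p in die.values())
assert sum(die.values()) == 1

even = {2, 4, 6}
print(prob(even, die))                  # 1/2
print(Fraction(len(even), len(omega)))  # 1/2, the uniform shortcut |E|/|Ω|
```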

Probability Theory (cont.)
  Let's start off with some examples to refresh our memory.
    EX: If I toss a fair coin 100 times, what is the probability that
        I get exactly 50 heads?
    ANS: We know that this experiment has a uniform distribution with
         |Ω| = 2^100. Let E be the event that there are exactly
         50 heads. Then we know from counting that |E| = C(100, 50).
         Thus, Pr[E] = C(100, 50)/2^100, or about 0.08.
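    A quick numerical check of this answer, as a sketch in Python with
    exact rational arithmetic. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Uniform space: all 2^100 head/tail sequences are equally likely.
n = 100
p_exactly_50 = Fraction(comb(n, 50), 2**n)   # |E| / |Ω|
print(float(p_exactly_50))                   # about 0.0796
```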
    EX: If I toss a fair coin 100 times, what is the probability that
        I get more heads than tails?
    ANS: Let F be the event that there are more heads than tails, and G
         be the event that there are more tails than heads. Swapping
         heads and tails in every toss gives a 1-to-1 correspondence
         between outcomes in F and G, so |F| = |G|. Also note that the
         sets E (as defined above), F, and G are disjoint, and that
         Ω = E ∪ F ∪ G.
         Thus, |Ω| = |E| + |F| + |G| = |E| + 2|F| = C(100, 50) +
         2|F| = 2^100. Thus, |F| = (2^100 - C(100, 50))/2, and Pr[F] =
         |F|/|Ω| = 1/2 - C(100, 50)/2^101 ≈ 0.46.
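    The symmetry argument can be cross-checked by counting F directly,
    summing C(100, k) over k = 51, ..., 100 heads. This is a sketch; the
    names e, f, and total simply mirror |E|, |F|, and |Ω|.

```python
from fractions import Fraction
from math import comb

n = 100
total = 2**n                    # |Ω|
e = comb(n, 50)                 # |E|: exactly 50 heads
f = (total - e) // 2            # |F| from |Ω| = |E| + 2|F|

# Cross-check |F| by summing directly over k = 51..100 heads
assert f == sum(comb(n, k) for k in range(51, n + 1))

p_more_heads = Fraction(f, total)
assert p_more_heads == Fraction(1, 2) - Fraction(e, 2**(n + 1))
print(float(p_more_heads))      # about 0.46
```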
    EX: Suppose I roll a red and a blue die.
        Ω = {(a, b) : 1 <= a,b <= 6}
        Pr[ω] = 1/36 for all ω
        What is the probability of
        (a) the red die showing 6?
            E = {(6, b) : 1 <= b <= 6}
            Pr[E] = |E|/|Ω| = 1/6
        (b) at least one die showing 6?
            E1 = {(6, b) : 1 <= b <= 6}
            E2 = {(a, 6) : 1 <= a <= 6}
            E = E1 ∪ E2
            |E| = |E1| + |E2| - |E1 ∩ E2| = 11
            Pr[E] = 11/36
            Note that in general, Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B],
            whether or not the sample space has a uniform distribution.
        (c) the dice sum to 7?
            E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
            Counting argument: for each value i of the red die, there
            is exactly one value of the blue die, namely 7-i, that
            gives a sum of 7, so there are 6 outcomes in total.
            Pr[E] = 1/6
        (d) the dice sum to 10?
            E = {(4, 6), (5, 5), (6, 4)}
            Pr[E] = 1/12
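    Because this space is small and uniform, all four answers can be
    checked by brute-force enumeration. The sketch below is in Python;
    `pr` is a hypothetical helper that counts |E|/|Ω| for an event given
    as a predicate. It also checks the inclusion-exclusion identity from
    part (b).

```python
from fractions import Fraction
from itertools import product

# Ω = {(a, b) : 1 <= a, b <= 6}; a is the red die, b the blue die
omega = list(product(range(1, 7), repeat=2))

def pr(pred):
    """Uniform case: Pr[E] = |E| / |Ω|, where E = {ω : pred(ω)}."""
    event = [w for w in omega if pred(w)]
    return Fraction(len(event), len(omega))

red_six = pr(lambda w: w[0] == 6)                 # (a)
some_six = pr(lambda w: 6 in w)                   # (b)
sum_seven = pr(lambda w: w[0] + w[1] == 7)        # (c)
sum_ten = pr(lambda w: w[0] + w[1] == 10)         # (d)
print(red_six, some_six, sum_seven, sum_ten)      # 1/6 11/36 1/6 1/12

# Inclusion-exclusion: Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] - Pr[E1 ∩ E2]
blue_six = pr(lambda w: w[1] == 6)
both_six = pr(lambda w: w == (6, 6))
assert some_six == red_six + blue_six - both_six
```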
    EX: Suppose I roll two blue dice.
        Ω = {(a, b) : 1 <= a <= b <= 6}
        |Ω| = 21
        Pr[(a, b)] = 1/36 if a = b, 1/18 if a != b. Note that this is
        not a uniform distribution.
        Let's make sure this sums to 1. There are 6 outcomes in which
        a = b and 15 where a != b, so the sum is 6*1/36 + 15*1/18 =
        1/6 + 5/6 = 1.
        Now what is the probability of
        (a) at least one die showing 6?
            E = {(a, 6) : 1 <= a <= 6}
            Here, Pr[E] is not |E|/|Ω| = 6/21 = 2/7, since we don't
            have a uniform distribution. So we have to do more work.
            There is one outcome in E whose probability is 1/36 and 5
            whose probability is 1/18. So we have
              Pr[E] = 1/36 + 5*1/18 = 1/36 + 10/36 = 11/36.
            Note that we got the same answer as in the distinguishable
            dice case, as expected.
        (b) the dice sum to 7?
            E = {(1, 6), (2, 5), (3, 4)}
            Each outcome has probability 1/18, so Pr[E] = 3/18 = 1/6.
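    In the non-uniform model we can no longer count outcomes. Instead,
    we must add up their individual probabilities. The Python sketch
    below encodes the 21 unordered outcomes with the weights given above
    and reproduces both answers. The helper `pr` is illustrative.

```python
from fractions import Fraction

# Indistinguishable dice: outcomes are unordered pairs (a, b) with a <= b
omega = [(a, b) for a in range(1, 7) for b in range(a, 7)]
dist = {(a, b): Fraction(1, 36) if a == b else Fraction(1, 18) for (a, b) in omega}

assert len(omega) == 21
assert sum(dist.values()) == 1

def pr(pred):
    """Non-uniform case: add up Pr[ω] instead of counting outcomes."""
    return sum(p for w, p in dist.items() if pred(w))

print(pr(lambda w: 6 in w))                 # 11/36
print(pr(lambda w: w[0] + w[1] == 7))       # 1/6
```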

  As you can see, the choice of probability space affects the set of
  outcomes and the probability distribution, as well as the difficulty
  of the computations. However, the choice does not affect the end
  results when we consider events of interest. The lesson here is to
  always try to define the probability space to make probability
  computations as simple as possible. In the two blue dice case, it
  would have made things simpler if we had assumed that the two dice
  were distinguishable and modeled the experiment accordingly.

  Sometimes, however, we don't have the choice of a uniform
  probability space.
  EX: Suppose I toss a biased coin with 1/3 probability of heads.
      Ω = {H, T}
      Pr[H] = 1/3, Pr[T] = 2/3
      Now suppose I toss it three times:
      Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
      Pr[ω] = ?
      Intuitively, we'd expect that Pr[HHH] = (1/3)^3, Pr[HHT] =
      (1/3)^2 * 2/3, and so on. Later on, we will see why this is the
      case.
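    Here is a sketch of the distribution we intuitively expect. It
    *assumes* that Pr[ω] is the product of the per-toss probabilities,
    which is exactly the claim the notes promise to justify later. Under
    that assumption, the eight probabilities sum to 1.

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

# Assumption (justified later in the course): Pr[ω] is the product of
# the per-toss probabilities.
dist = {}
for seq in product("HT", repeat=3):
    p_seq = Fraction(1)
    for c in seq:
        p_seq *= p[c]
    dist["".join(seq)] = p_seq

print(dist["HHH"], dist["HHT"])   # 1/27 2/27
assert sum(dist.values()) == 1
```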

Birthday Paradox
  There are 64 students enrolled in CS70. What is the probability that
  some pair of students has the same birthday?

  Let's model this as a random experiment that assigns birthdays to n
  students, with equal probability for any outcome. We further assume
  that there are exactly 365 days in a year. How many outcomes are
  there? We have |Ω| = 365^n. Now in how many of these outcomes
  do two people share the same birthday?

  Let E be the event that at least two students share the same
  birthday. It's not clear how to compute Pr[E] directly: we don't
  want to overcount the cases in which three students share a
  birthday, or four, and so on.

  Sometimes, it is easier to compute the probability of an event E's
  complement Ω\E, which is the set of all outcomes that are not
  in E. For simplicity, the complement of an event E is written as
  Ē. In this case, Ē is the set of outcomes in which no
  two students share the same birthday. What is Pr[Ē]? Well,
  |Ē| = 365!/(365-n)! (by the product rule: this counts the
  n-permutations of the 365 days), so Pr[Ē] = 365!/[(365-n)! 365^n].

  Now how do we get Pr[E]? Well, E happens exactly when Ē does
  not, so E and Ē are disjoint and E ∪ Ē = Ω. Hence
  Pr[E] + Pr[Ē] = 1, and Pr[E] = 1 - 365!/[(365-n)! 365^n].

  Plugging in values for n, we see that Pr[E] ≈ 0.51 when n =
  23. So there is a greater than even chance that two people have the
  same birthday out of a group of 23. For n = 64, we get Pr[E] ≈
  0.997. So it's almost guaranteed that there are two CS70 students
  who have the same birthday.
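  A minimal sketch of how to plug in values of n, assuming 365 equally
  likely birthdays as above. Rather than forming the huge factorials,
  it uses the equivalent product (365/365)(364/365)...((365-n+1)/365)
  for Pr[Ē]. The function name `pr_shared_birthday` is hypothetical.

```python
from fractions import Fraction

def pr_shared_birthday(n, days=365):
    """Pr[E] = 1 - Pr[Ē], assuming all days^n birthday assignments are equally likely."""
    if n > days:
        return Fraction(1)          # pigeonhole: a shared birthday is certain
    # Pr[Ē] = 365!/((365-n)! 365^n) = product over i < n of (365 - i)/365
    p_distinct = Fraction(1)
    for i in range(n):
        p_distinct *= Fraction(days - i, days)
    return 1 - p_distinct

print(round(float(pr_shared_birthday(23)), 3))  # 0.507
print(round(float(pr_shared_birthday(64)), 3))  # 0.997
```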

Monty Hall
  Consider the following situation, based on a 1970s game show hosted
  by Monty Hall. A contestant is shown three doors, one of which opens
  to a prize and the remaining two of which open to goats. The
  contestant must pick a door but does not open it. Hall's assistant
  Carol opens one of the remaining doors, revealing a goat. The
  contestant then can stick with his original door or switch to the
  remaining unopened door. His final door choice is opened, and he
  wins the prize only if it is behind his chosen door. What is the
  contestant's best strategy?

  Intuitively, it seems that since there are two remaining doors after
  one with a goat has been opened, it makes no difference whether or
  not the contestant switches. We will see that this is not actually
  the case.

  Let's model this problem. Define an outcome ω as a triple (i,
  j, k), where i is the door where the prize is, j is the initial door
  picked by the contestant, and k is the result of a coin flip by
  Carol. If Carol has two doors to choose from (i.e. i = j), then she
  opens the lower-numbered door if k = Heads and the higher-numbered
  door if k = Tails. If she has no choice, then she ignores the coin
  flip.

  (Why did we add the coin flip? There is randomness in Carol's choice
  of doors, and we need to make sure we model all the randomness in
  the experiment. We keep the coin flip even if Carol has no choice in
  order to preserve a uniform probability distribution.)

  How many outcomes are there? There are 3 choices for each of i and j
  and 2 for k, so there are 18 outcomes. It is reasonable to assume
  that each outcome is equally likely, so we have a uniform
  probability distribution where each outcome has probability 1/18.

  Note that for each particular outcome ω, the contestant wins
  for exactly one choice of strategy, staying or switching. If he
  would lose in ω by staying, he would win by switching, and
  vice versa.

  Let's define A as the event that the contestant wins by staying with
  his initial choice. What outcomes are in A? Clearly, it is the
  sample points (i, j, k) for which i = j, so there are 6 outcomes in
  A, and Pr[A] = 6/18 = 1/3. Then Ā is the event that he wins by
  switching, so it must be that Pr[Ā] = 1 - 1/3 = 2/3. (We can also
  note that Ā contains all the sample points (i, j, k) where i != j,
  of which there are 12, so Pr[Ā] = 12/18 = 2/3.)
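  The model above is small enough to enumerate completely. The Python
  sketch below builds all 18 triples (i, j, k) and applies Carol's
  rule, including the coin flip. It then counts wins for each strategy
  and confirms that every outcome is won by exactly one of them. The
  helpers `carol_opens` and `wins` are illustrative names, not part of
  the original notes.

```python
from fractions import Fraction
from itertools import product

doors = (1, 2, 3)
omega = list(product(doors, doors, ("H", "T")))   # (i, j, k): prize, pick, coin
assert len(omega) == 18

def carol_opens(i, j, k):
    """Door Carol opens: never the prize door i or the contestant's door j."""
    choices = sorted(d for d in doors if d != i and d != j)
    if len(choices) == 1:       # i != j: no choice, coin flip ignored
        return choices[0]
    return choices[0] if k == "H" else choices[1]   # i == j: lower door on Heads

def wins(i, j, k, switch):
    opened = carol_opens(i, j, k)
    # Switching moves to the one door that is neither picked nor opened
    final = next(d for d in doors if d not in (j, opened)) if switch else j
    return final == i

pr_stay = Fraction(sum(wins(*w, switch=False) for w in omega), len(omega))
pr_switch = Fraction(sum(wins(*w, switch=True) for w in omega), len(omega))
print(pr_stay, pr_switch)   # 1/3 2/3
```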

  So the probability of winning by switching is 2/3, not 1/2!

  Here's an intuitive way to see why the result makes sense. Suppose
  without loss of generality that the contestant picks door 1. Now
  suppose Hall offers him the option to trade door 1 for doors 2 and 3
  together. After he decides whether to stay or switch, Carol opens
  one of doors 2 and 3 to reveal a goat. If he switched, he wins the
  prize if the door among 2 and 3 that remains closed has the prize.
  What do you think the probability of winning is in this case? Can
  you see how this is the same as the original problem? (In fact, the
  probability space is exactly the same!)

  This example illustrates why it is important to rigorously and
  systematically compute probabilities. When this problem was
  presented in Parade magazine in 1990, around 10,000 readers wrote in
  claiming that the result was wrong, including almost 1000 with PhDs.
  This demonstrates the danger of relying on intuition.