Administrative info
  PA2 due Monday
  HW5 due Tuesday

Review
  Recall that a probability space consists of the following:
  (1) A random experiment.
  (2) The sample space (set of possible outcomes).
  (3) The probability of each possible outcome of the experiment.
  Further recall that the probabilities must satisfy the following:
  (1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
  (2) ∑_{ω∈Ω} Pr[ω] = 1
  An event is a subset of the sample space, i.e. a set of outcomes
  from the sample space. The probability of an event E is the sum of
  the probabilities of the outcomes in E:
    Pr[E] = ∑_{ω∈E} Pr[ω].
  In the case of a uniform distribution, this simplifies to
    Pr[E] = |E|/|Ω|.
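
  The uniform case can be checked by enumeration. Here is a small
  Python sketch (not part of the notes): we list the sample space of
  a red and a blue die and compute Pr[E] = |E|/|Ω| exactly.

```python
from fractions import Fraction

# Uniform sample space: all 36 equally likely outcomes (r, b) of
# rolling a red and a blue die.
omega = [(r, b) for r in range(1, 7) for b in range(1, 7)]

def pr(event):
    """Pr[E] = |E|/|Omega| under the uniform distribution."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

print(pr(lambda w: w[0] + w[1] == 7))  # 1/6
```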

Probability Identities
  Before we move on, let's note some facts about probability that can
  make it easier to compute probabilities.

  We defined the complement of an event E as E̅ = Ω\E. Then,
  Pr[E̅] = 1 - Pr[E]. Proof:
    1 = ∑_{ω∈Ω} Pr[ω]
      = ∑_{ω∈E} Pr[ω] +
        ∑_{ω∈Ω\E} Pr[ω]
      = Pr[E] + Pr[E̅],
  so Pr[E̅] = 1 - Pr[E].
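
  As a quick sanity check of the complement rule (a sketch, not from
  the notes), take a single fair die with E = "the roll is even":

```python
from fractions import Fraction

# One fair die; E = "the roll is even".
omega = range(1, 7)
E = {2, 4, 6}
pr_E = Fraction(len(E), 6)
pr_comp = Fraction(sum(1 for w in omega if w not in E), 6)
print(pr_comp == 1 - pr_E)  # True
```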

  Let A and B be events in Ω. Then Pr[A ∪ B] = Pr[A] + Pr[B]
  - Pr[A ∩ B]. Writing this out in terms of sums, we get
    ∑_{ω∈A ∪ B} Pr[ω]
      = ∑_{ω∈A} Pr[ω] +
        ∑_{ω∈B} Pr[ω] -
        ∑_{ω∈A ∩ B} Pr[ω]
  As in inclusion/exclusion for sets, the first two terms double count
  the probabilities of those outcomes in A ∩ B, so we have to
  subtract the probability of A ∩ B.

  EX: What is the probability that a random integer n between 1 and
      100 is divisible by 5 or 7?
  ANS: Let A be the event that n is divisible by 5, B be the event
       that it is divisible by 7. Pr[A] = 1/5, Pr[B] = 14/100 = 7/50,
       and Pr[A ∩ B] = 2/100 = 1/50. So Pr[A ∪ B] = 1/5 + 7/50
       - 1/50 = 16/50 = 8/25.
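
  We can confirm this answer by brute force (a Python sketch, not
  part of the notes): just count the integers in 1..100 divisible by
  5 or by 7.

```python
from fractions import Fraction

# Count n in 1..100 with n divisible by 5 or 7: 20 + 14 - 2 = 32.
count = sum(1 for n in range(1, 101) if n % 5 == 0 or n % 7 == 0)
print(Fraction(count, 100))  # 8/25
```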

  Let A_1, ..., A_n be n mutually disjoint events in Ω. Then
  Pr[A_1 ∪ ... ∪ A_n] = Pr[A_1] + ... + Pr[A_n]. This follows
  from the above, generalized to n events using induction, and then
  removing the intersection terms which are all 0.

  EX: Suppose I roll a red and a blue die. What is the probability that
      the red die is less than 4?
  ANS: Let A_i be the event that the red die is i. Then Pr[A_i] = 1/6
       for 1 <= i <= 6, and the A_i are mutually disjoint. Thus,
       Pr[A_1 ∪ A_2 ∪ A_3] = Pr[A_1] + Pr[A_2] + Pr[A_3] =
       1/2.
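
  The disjoint-union rule can be checked the same way (a sketch, not
  from the notes): summing Pr[A_i] for the three disjoint events
  agrees with computing the union directly.

```python
from fractions import Fraction

# All 36 outcomes of a red and a blue die; A_i = "red die shows i".
omega = [(r, b) for r in range(1, 7) for b in range(1, 7)]

def pr(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

total = sum(pr(lambda w, i=i: w[0] == i) for i in (1, 2, 3))
print(total == pr(lambda w: w[0] < 4))  # True
print(total)  # 1/2
```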

Conditional Probability
  A pharmaceutical company is marketing a new test for HIV that it
  claims is 99% effective, meaning that it will report positive for
  99% of people who have HIV and negative for 99% of those who don't
  have HIV. Suppose a random person takes the test and gets a positive
  test result. What is the probability that the person has HIV?

  This is an example of conditional probability. Given some
  information about a particular event in the sample space, we want to
  compute new probabilities for other events.

  Let's start off with simpler examples before coming back to the
  above.

  EX: Suppose I flip a fair coin twice. The result of the first flip
      is heads. What is the probability that I got two heads?
  ANS: Let's start by drawing the sample space Ω. There are 4
       equally likely outcomes HH, HT, TH, and TT. We are now told
       that event A = "the first flip is H" has occurred. Which
       outcomes are now possible? There are only 2 outcomes in A, HH
       and HT, each of which is equally likely. So we have a new
       sample space Ω' that consists of just the outcomes HH and
       HT, each with probability 1/2. Let event B = "both flips are
       heads." What is the probability of B in this new sample space?
       Only one of the two outcomes in Ω' is in B, so Pr[B] =
       1/2 in the new sample space. We write this as Pr[B|A], "the
       probability of B given A," which is the probability of B
       occurring in a new sample space consisting of just those
       outcomes in A.

  Generalizing the above procedure, suppose we are told an event A
  occurs. Then what is the new conditional probability of each outcome
  ω, i.e. Pr[ω|A]? For ω ∉ A, this is clearly
  0. For ω ∈ A, the relative likelihood of any two outcomes
  in A should remain the same, but we need to renormalize so that we
  satisfy the requirement that all probabilities add to 1. By
  definition, we had ∑_{ω ∈ A} Pr[ω] = Pr[A], so
  if we normalize by dividing by Pr[A], i.e. Pr[ω|A] =
  Pr[ω]/Pr[A], we get ∑_{ω ∈ A} Pr[ω|A] =
  ∑_{ω ∈ A} Pr[ω]/Pr[A] = Pr[A]/Pr[A] = 1.

  Now suppose we have another event B. What is Pr[B|A]? The outcomes
  in B that are not in A contribute nothing, since their new
  conditional probabilities are 0. So only the outcomes in both B and
  A contribute any probability, and we get Pr[B|A] = ∑_{ω
  ∈ B ∩ A} Pr[ω|A] = ∑_{ω ∈ B ∩ A}
  Pr[ω]/Pr[A] = Pr[B ∩ A]/Pr[A].

  To summarize, when conditioning on an event A, we cross out any
  possibilities that are incompatible with A and then renormalize by
  1/Pr[A] so that the probabilities of the remaining outcomes add to
  1. We can compute the probabilities of events directly in this new
  sample space or use the identities above to get the same result.
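
  This two-step recipe translates directly into code. The sketch
  below (not part of the notes) represents a distribution as a dict
  from outcomes to probabilities, so it also works for non-uniform
  distributions.

```python
from fractions import Fraction

def condition(dist, A):
    """Cross out outcomes not in A, then renormalize by 1/Pr[A]."""
    pr_A = sum(p for w, p in dist.items() if w in A)
    return {w: p / pr_A for w, p in dist.items() if w in A}

# Two fair coin flips, conditioned on A = "first flip is heads".
dist = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
new = condition(dist, {"HH", "HT"})
print(new["HH"])  # 1/2
```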

  EX: Suppose I toss a red and a blue die, and I tell you that the
      resulting sum is 4. What is the probability that the red die is
      1?
  ANS: Let A be the event that the sum is 4, B be the event that the
       red die is 1. The outcomes (1, 3), (2, 2), and (3, 1) are in A,
       so Pr[A] = 1/12. What is Pr[B ∩ A]? Only the outcome (1, 3)
       is in B ∩ A, so Pr[B ∩ A] = 1/36. Then Pr[B|A] = Pr[B
       ∩ A]/Pr[A] = 1/3.
       We could also have redefined the sample space to come up with
       the same result. Given A, we have a new sample space Ω'
       consisting of the outcomes (1, 3), (2, 2), and (3, 1), each
       with probability 1/3. Then B has probability 1/3 in this new
       sample space. So Pr[B|A] = 1/3.
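
  Both routes to the answer can be verified by enumeration (a sketch,
  not from the notes):

```python
from fractions import Fraction

# A = "the sum is 4", B = "the red die is 1".
omega = [(r, b) for r in range(1, 7) for b in range(1, 7)]
A = [w for w in omega if w[0] + w[1] == 4]
B_and_A = [w for w in A if w[0] == 1]
print(Fraction(len(B_and_A), len(A)))  # Pr[B|A] = 1/3
```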

  EX: Suppose I toss a red and a blue die, and I tell you that the
      resulting sum is 7. What is the probability that the red die is
      1?
  ANS: Let A be the event that the sum is 7, B be the event that the
       red die is 1. A contains the 6 outcomes (1, 6), (2, 5), ...,
       (6, 1), so Pr[A] = 6/36 = 1/6. What is Pr[B ∩ A]? Only the
       outcome (1, 6) is in B ∩ A, so Pr[B ∩ A] = 1/36. Then
       Pr[B|A] = Pr[B ∩ A]/Pr[A] = 1/6.

  EX: Suppose I toss 3 balls into 3 bins (with replacement). Let
      A = "1st bin empty," B = "2nd bin empty." What is Pr[A|B]?
  ANS: Pr[B] = 2^3/3^3 = 8/27, Pr[A ∩ B] = 1/3^3 = 1/27, so
       Pr[A|B] = (1/27)/(8/27) = 1/8.
       Thus, the fact that the 2nd bin is empty makes it much less
       likely that the 1st one is as well.
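
  With only 3^3 = 27 placements, this example is easy to check
  exhaustively (a Python sketch, not part of the notes):

```python
from fractions import Fraction
from itertools import product

# Each outcome assigns one of 3 bins to each of 3 balls.
omega = list(product(range(3), repeat=3))
B = [w for w in omega if 1 not in w]       # 2nd bin empty
A_and_B = [w for w in B if 0 not in w]     # 1st bin empty as well
print(Fraction(len(B), len(omega)))        # Pr[B] = 8/27
print(Fraction(len(A_and_B), len(B)))      # Pr[A|B] = 1/8
```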

  EX: Suppose I flip a fair coin 51 times. If the first 50 flips are
      heads, what is the probability that the 51st is heads?
  ANS: Let A be the event that the first 50 flips are heads, B be the
       event that the 51st is heads. There are only 2 outcomes in A
       out of 2^51, so Pr[A] = 1/2^50. There are 2^50 outcomes in B,
       so Pr[B] = 1/2. Only one outcome is in both A and B, so Pr[A
       ∩ B] = 1/2^51. Then Pr[B|A] = (1/2^51)/(1/2^50) = 1/2.
       So the first 50 flips tell us nothing about the 51st; the
       probability of heads is still 1/2.
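
  The sample space of 2^51 outcomes is too big to draw, but the
  arithmetic goes through with exact fractions (a sketch, not from
  the notes):

```python
from fractions import Fraction

pr_A = Fraction(2, 2**51)        # 2 outcomes: first 50 flips all heads
pr_A_and_B = Fraction(1, 2**51)  # 1 outcome: all 51 flips heads
print(pr_A_and_B / pr_A)         # Pr[B|A] = 1/2
```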

  We have seen multiple examples where Pr[B|A] = Pr[B]. We say that A
  and B are "independent" if this is the case. Intuitively, two events
  A and B are independent if knowing that one happens does not change
  the likelihood of the other happening. So the 51st flip of a fair
  coin is independent of what came up before.

  If A and B are independent, we get Pr[B|A] = Pr[B ∩ A]/Pr[A] =
  Pr[B], so Pr[B ∩ A] = Pr[A] Pr[B]. This is a very useful
  identity.
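
  We can check this identity on two events that should be independent
  (a sketch, not from the notes): two fair coin flips, with A = "the
  first flip is H" and B = "the second flip is H".

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))

def pr(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"
B = lambda w: w[1] == "H"
print(pr(lambda w: A(w) and B(w)) == pr(A) * pr(B))  # True
```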

  EX: Suppose I flip a coin with probability p of heads n times. What
      is the probability of a particular outcome with k heads?
  ANS: Each flip is independent, with probability p for heads. The k
       heads flips have probability p, and the n-k tails flips have
       probability (1-p). So we get p^k (1-p)^(n-k) for an outcome
       with k heads.
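
  As a function (a sketch, not from the notes): with a fair coin, any
  particular sequence of 3 flips containing 2 heads, such as HHT, has
  probability (1/2)^2 (1/2)^1 = 1/8.

```python
from fractions import Fraction

def outcome_prob(p, n, k):
    """Probability of one particular n-flip sequence with k heads."""
    return p**k * (1 - p)**(n - k)

print(outcome_prob(Fraction(1, 2), 3, 2))  # 1/8
```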

  EX: Suppose a casino advertises the following game. You pick a
      number from 1 to 6. The casino rolls three dice, and if your
      number comes up, you win. What is your probability of winning?
  ANS: It's not 1/2! Let A_i be the event that your number comes up
       on the ith die. We want to know
       Pr[A_1 ∪ A_2 ∪ A_3]
          = 1 - Pr[A̅_1 ∩ A̅_2 ∩ A̅_3]
          = 1 - Pr[A̅_1] Pr[A̅_2] Pr[A̅_3]
         = 1 - (5/6)^3 ≈ 1 - 0.58 = 0.42.
        In the third line above, we used the fact that the results of
        the three dice are mutually independent. We will come back to the
       concept of mutual independence later.
       So your probability of winning is less than 1/2.
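
  Enumerating all 6^3 = 216 rolls confirms the answer (a Python
  sketch, not part of the notes): say you pick 6; you win if a 6
  shows on at least one die.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=3))
wins = sum(1 for w in omega if 6 in w)
pr_win = Fraction(wins, len(omega))
print(pr_win)                           # 91/216
print(pr_win == 1 - Fraction(5, 6)**3)  # True
```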

  Suppose you are flying to Las Vegas (in order to play the game
  above). Your friend, fearing for your safety, gives you the
  following advice: "You know, you should always carry a bomb on an
  airplane. The chance of there being one bomb on the plane is pretty
  small, but the chance of two bombs is minuscule. So by carrying a
  bomb on the airplane, your chances of being blown up are
  astronomically reduced." What do you think of his advice?

  Let A be the event that you carry a bomb on board, B be the event
  that someone else carries a bomb on board. How are A and B related?
  They are independent, so Pr[B|A] = Pr[B], and the likelihood that
  someone else has a bomb doesn't change one bit if you bring one
  aboard.