Administrative info
  PA2 due today
  HW5 due tomorrow

Review
  Recall that the conditional probability of event B given event A is
  Pr[B|A] = Pr[A ∩ B]/Pr[A]. Also recall that events A and B are
  independent if Pr[B|A] = Pr[B], or equivalently Pr[A ∩ B] = Pr[A] Pr[B].

  Further recall the general product rule:
    Pr[A_1 ∩ ... ∩ A_n]
      = Pr[A_1] * Pr[A_2|A_1] * Pr[A_3|A_1 ∩ A_2] * ...
        * Pr[A_n | A_1 ∩ ... ∩ A_{n-1}].

  Events A_1, ..., A_n are mutually independent if for any i in [1, n]
  and any subset I ⊆ {1, ..., n}\{i} (i.e. any subset I that does
  not contain i), we have
    Pr[A_i|∩_{j∈I} A_j] = Pr[A_i].
  Then it follows from the product rule that
    Pr[A_1 ∩ ... ∩ A_n] = Pr[A_1] Pr[A_2] ... Pr[A_n].

  Recall Bayes' Rule:
    Pr[A|B] = Pr[B|A] * Pr[A] / Pr[B].

  Recall the variations of the Total Probability Rule, where Ā denotes
  the complement of A:
    Pr[B] = Pr[A ∩ B] + Pr[Ā ∩ B]
      = Pr[B|A] Pr[A] + Pr[B|Ā] Pr[Ā]
      = Pr[B|A] Pr[A] + Pr[B|Ā] (1 - Pr[A]).

  Combining Bayes' Rule and the Total Probability Rule, we get
    Pr[A|B] = Pr[B|A]Pr[A] / (Pr[B|A]Pr[A] + Pr[B|Ā](1-Pr[A])).

Base Rates
  Recall the HIV test from last time. We defined A to be the event
  that a random person has HIV and B to be the event that the person
  tests positive. We computed Pr[B|A] = 0.99, Pr[B̄|Ā] = 0.99, and
  Pr[B|Ā] = 0.01, where an overbar denotes the complement of an event.

  We then computed that if we have a base rate of Pr[A] = 0.00025 in
  the entire population, then
    Pr[A|B] = 0.99 * 0.00025 / (0.99 * 0.00025 + 0.01 * 0.99975)
      ≈ 0.024.
  This tells us that blanket testing the entire population is not a
  good idea, since the test will produce far more false positives
  than actual positives.

  What if we only tested a subpopulation with a higher risk factor for
  HIV, say in which 1 in 5 people are infected? That changes the base
  rate to Pr[A] = 0.2, and we get
    Pr[A|B] = 0.99 * 0.2 / (0.99 * 0.2 + 0.01 * 0.8)
      ≈ 0.96.
  So if the base rate is much higher, the test is far more effective
  at detecting HIV. And if you have a high risk factor, this is a test
  you'd want to take.

  The takeaway here is that we can't ignore the base rate when
  evaluating the effectiveness of a test. While it doesn't make sense
  to blanket test the entire population, since its base rate is quite
  low, it does make sense to test subpopulations with much higher base
  rates.
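
  To make the dependence on the base rate concrete, here is a minimal
  Python sketch of the combined Bayes' Rule and Total Probability Rule
  computation. The function name posterior and its parameter names are
  illustrative, not from the lecture; the defaults are the 0.99 and
  0.01 figures used for the HIV test above.

```python
def posterior(base_rate, true_positive_rate=0.99, false_positive_rate=0.01):
    """Pr[A|B]: probability of having the condition given a positive test.

    base_rate           -- Pr[A], fraction of the tested group that is infected
    true_positive_rate  -- Pr[B|A]
    false_positive_rate -- Pr[B|not A]
    """
    true_positives = true_positive_rate * base_rate
    false_positives = false_positive_rate * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Entire population versus the higher-risk subpopulation from the text.
    for base_rate in (0.00025, 0.2):
        print(f"Pr[A] = {base_rate}: Pr[A|B] = {posterior(base_rate):.3f}")
```

  Only base_rate changes between the two calls; the test's accuracy is
  the same, yet the posterior moves from about 0.024 to about 0.96.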

Inclusion/Exclusion
  Recall the inclusion/exclusion principle for events A and B:
    Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B].
  We count outcomes in A and in B, but that double counts outcomes in
  both, so we adjust by subtracting them off.
  What if we have three events? We get
    Pr[A ∪ B ∪ C]
      = Pr[A] + Pr[B] + Pr[C]
        - Pr[A ∩ B] - Pr[A ∩ C] - Pr[B ∩ C]
        + Pr[A ∩ B ∩ C].
  By counting outcomes in A, B, and C, we double count those that
  appear in any pair of A, B, C, so we subtract those off. However, if
  an outcome appears in all three of A, B, C, then we've added three
  copies in the first line and subtracted three copies in the second
  line, so we have to add one copy in the third line to include those
  outcomes.
  This generalizes to larger numbers of events, with alternating
  additions and subtractions. (Can you see why it is called
  inclusion/exclusion?) See the reader for the general formula.

  EX: Recall the dice game from before. You pick a number from 1 to 6.
      The casino rolls three dice, and if your number comes up, you
      win. What is your probability of winning?
  ANS: Let A be the event that your number comes up on the first die,
       B on the second, and C on the third. Then you win for outcomes
       that are in A ∪ B ∪ C. So by inclusion/exclusion,
         Pr[A ∪ B ∪ C]
           = Pr[A] + Pr[B] + Pr[C]
             - Pr[A ∩ B] - Pr[A ∩ C] - Pr[B ∩ C]
             + Pr[A ∩ B ∩ C].
       What is Pr[A]? Well, the probability that the first die has
       your number is 1/6, so Pr[A] = 1/6, and similarly, Pr[B] =
       Pr[C] = 1/6. What is Pr[A ∩ B]? The results on different
       dice are independent, so Pr[A ∩ B] = Pr[A] Pr[B] = 1/36,
       and similarly for Pr[A ∩ C] and Pr[B ∩ C]. By a similar
       argument, Pr[A ∩ B ∩ C] = 1/216. Then
         Pr[A ∪ B ∪ C]
           = 1/6 + 1/6 + 1/6 - 1/36 - 1/36 - 1/36 + 1/216
           = 1/2 - 1/12 + 1/216
           = 108/216 - 18/216 + 1/216
           = 91/216
           ≈ 0.42.
       This is the same answer as before, but it took a lot more work
       to get it.
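
  As a sanity check on the arithmetic, the following sketch computes
  the winning probability two ways: by listing all 216 equally likely
  rolls, and by the three-event inclusion/exclusion formula. The
  function names are made up for this example. Exact fractions are
  used so that the two results can be compared without rounding.

```python
from fractions import Fraction
from itertools import product


def win_by_enumeration(pick=1, sides=6, num_dice=3):
    """Count the equally likely rolls in which `pick` shows on some die."""
    rolls = list(product(range(1, sides + 1), repeat=num_dice))
    wins = sum(1 for roll in rolls if pick in roll)
    return Fraction(wins, len(rolls))


def win_by_inclusion_exclusion(sides=6):
    """Three-event inclusion/exclusion with Pr[A] = Pr[B] = Pr[C] = 1/sides."""
    p = Fraction(1, sides)
    # Three single events, three pairwise intersections, one triple.
    return 3 * p - 3 * p**2 + p**3


if __name__ == "__main__":
    print(win_by_enumeration(), win_by_inclusion_exclusion())
```

  Both functions return 91/216, matching the hand computation.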

Union Bound
  From our reasoning for the inclusion/exclusion principle, we see
  that Pr[A_1] + ... + Pr[A_n] can overstate, but never understate,
  Pr[A_1 ∪ ... ∪ A_n]. We can formalize this as the union bound:
    Pr[A_1 ∪ ... ∪ A_n] <= Pr[A_1] + ... + Pr[A_n].

  EX: Suppose for MT2, to prevent students from cheating, we place on
      each desk in the lecture hall a random number from 1 to 1000. We
      give one question that is parameterized by that number. If two
      people sitting next to each other have the same number, then
      they can copy off each other. What is the probability that any
      of the 62 students will cheat?
  ANS: Computing this exactly seems hard, so let's just compute an
       upper bound. There are at most 61 pairs of students sitting
       next to each other (think of them all sitting in one long row).
       Let A_i be the event that the ith pair has the same number.
       Then Pr[A_i] = 1/1000. Let B be the event that some pair has
       the same number, B = A_1 ∪ ... ∪ A_61. Then
         Pr[B] <= Pr[A_1] + ... + Pr[A_61]
           = 61/1000.
       So the probability of any pair sharing the same number is at
       most 6.1%.
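
  To see how tight the bound is, here is a small sketch under one
  extra assumption: the students really do sit in a single row, as in
  the parenthetical above. In that special case, reading the row left
  to right, each student matches the previous one with probability
  1/1000 no matter what came before, so the exact probability can be
  written down. The function names are illustrative only.

```python
def union_bound(num_pairs, num_values):
    """Upper bound on Pr[some adjacent pair has the same number]."""
    return min(1.0, num_pairs / num_values)


def exact_one_row(num_students, num_values):
    """Exact Pr[some adjacent pair matches], ASSUMING one single row.

    A row of num_students has num_students - 1 adjacent pairs, and each
    new student avoids the previous student's number with probability
    1 - 1/num_values independently of the earlier seats.
    """
    no_match_anywhere = (1 - 1 / num_values) ** (num_students - 1)
    return 1 - no_match_anywhere


if __name__ == "__main__":
    print("union bound:", union_bound(61, 1000))
    print("exact, one row:", exact_one_row(62, 1000))
```

  For other seating layouts the exact value would need a different
  argument, while the union bound applies unchanged.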

Hashing
  Now that we've seen many techniques for computing probabilities, let
  us apply them to two problems of interest: hashing and coupon
  collecting.

  Recall the birthday paradox. We computed the probability that two
  people share the same birthday given 365 days and m people. We found
  that when m = 23, we have a slightly higher than even chance of two
  people sharing a birthday.

  Last week was Neptune's birthday! It was exactly one Neptunian year
  after it was first discovered in 1846. A year on Neptune is 89,666
  Neptunian days. Now how many Neptunians do we need so that we have a
  better than even chance of two of them sharing the same birthday?

  Let's redo the analysis in the general case, where we have n days
  and m individuals. How many sample points are there? There are
  |Ω| = n^m, since each individual has n days to choose from and
  there are m individuals. Each of these is assumed to be equally
  likely. Now let E be the event that no two individuals share the same
  birthday. How many outcomes are in E?

  Well, the first person has n choices of days, the second person has
  n-1 choices that are different from the first, the third person has
  n-2 choices that are different from the first two, and so on, until
  the mth person has n-(m-1) = n-m+1 choices. Thus,
    |E| = n * (n-1) * ... * (n-m+1),
  and
    Pr[E] = |E|/|Ω|
      = n * (n-1) * ... * (n-m+1) / n^m
      = n/n * (n-1)/n * ... * (n-m+1)/n.

  We can compute Pr[E] another way using the product rule. Let E_i be
  the event that the ith person's birthday is different than those of
  persons 1, ..., i-1. Then
    Pr[E] = Pr[E_1 ∩ E_2 ∩ ... ∩ E_m]
      = Pr[E_1] *
        Pr[E_2|E_1] *
        Pr[E_3|E_1 ∩ E_2] *
        ... *
        Pr[E_m|E_1 ∩ E_2 ∩ ... ∩ E_{m-1}].
  Now we need to compute the probability
    Pr[E_i|E_1 ∩ ... ∩ E_{i-1}],
  the probability that the ith person's birthday is not the same as
  persons 1, ..., i-1 given that all those people have different
  birthdays. The ith person is left with n-(i-1) = n-i+1 choices of
  distinct days out of n days total, so
    Pr[E_i|E_1 ∩ ... ∩ E_{i-1}] = (n-i+1)/n.
  Plugging into the product rule, we get
    Pr[E] = (n-1+1)/n * (n-2+1)/n * ... * (n-m+1)/n
      = n/n * (n-1)/n * ... * (n-m+1)/n,
  as before.

  Let us rewrite (n-i)/n as (1 - i/n) to get
    Pr[E] = 1 * (1 - 1/n) * (1 - 2/n) * ... * (1 - (m-1)/n).

  Before we continue, let's look at the Taylor series for e^{-x}:
    e^{-x} = 1 - x + x^2/2! - x^3/3! + ...
  If x is small, then x^2/2! is really small, x^3/3! is ridiculously
  small, x^4/4! is ludicrously small, and so on. So we get
    e^{-x} >= 1 - x
  and if x is small, then they are very nearly equal.
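
  A quick numerical look at this inequality, as a sketch: the helper
  name gap is made up here, and the sample values of x are arbitrary.

```python
import math


def gap(x):
    """How much e^{-x} exceeds the linear approximation 1 - x."""
    return math.exp(-x) - (1 - x)


if __name__ == "__main__":
    for x in (0.5, 0.1, 0.01, 0.001):
        print(f"x = {x}: 1 - x = {1 - x:.6f}, "
              f"e^-x = {math.exp(-x):.6f}, gap = {gap(x):.2e}")
```

  The gap is never negative, and for small x it is close to x^2/2, the
  first term dropped from the Taylor series.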

  Using this approximation, we get
    Pr[E] = (1 - 1/n) * (1 - 2/n) * ... * (1 - (m-1)/n)
      <= e^{-1/n} * e^{-2/n} * ... * e^{-(m-1)/n}
      = exp(-(1/n + 2/n + ... + (m-1)/n))
      = exp(-(1 + 2 + ... + (m-1))/n)
      = exp(-(m-1)m/2n)
      ≈ exp(-m^2/2n).

  Suppose we want to know when this probability is about 1/2. Then
    Pr[E] ≈ exp(-m^2/2n) ≈ 1/2
      -m^2/2n = -ln(2)
      m^2 = 2n ln(2)
      m = sqrt(2n ln(2)) ≈ 1.18 sqrt(n).

  So when we have 1.18 sqrt(n) individuals, we have about an even
  chance that two individuals share the same birthday.

  In the case of Neptune, we plug in n = 89666 to get
    m = 1.18 sqrt(89666)
      ≈ 353.
  So we only need 353 Neptunians to make it likely that two of them
  share a birthday!
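
  The approximation can be compared against the exact product. The
  sketch below multiplies the factors (n-i)/n directly and finds the
  smallest m for which Pr[E] drops below 1/2; the function names are
  illustrative, not from the lecture.

```python
import math


def prob_all_distinct(n, m):
    """Exact Pr[E]: m individuals all have different birthdays among n days."""
    p = 1.0
    for i in range(m):
        p *= (n - i) / n  # person i+1 must avoid the i days already taken
    return p


def smallest_group_for_even_chance(n):
    """Smallest m with Pr[E] < 1/2, i.e. a shared birthday is more likely."""
    p = 1.0
    m = 0
    while p >= 0.5:
        p *= (n - m) / n
        m += 1
    return m


if __name__ == "__main__":
    for n in (365, 89666):
        exact = smallest_group_for_even_chance(n)
        approx = math.sqrt(2 * n * math.log(2))
        print(f"n = {n}: exact m = {exact}, sqrt(2n ln 2) = {approx:.1f}")
```

  For n = 365 the exact search gives the familiar m = 23, and in both
  cases the exact answer lies within one of sqrt(2n ln(2)).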

  This should make intuitive sense. When we have m people, there are
  C(m, 2) ≈ m^2/2 pairs of people, each pair of which has a 1/n
  chance of yielding a common birthday.

  What does this have to do with hashing? A hash table is a data
  structure for storing items. If it has n locations, then we use a
  hash function h(x) to map an item x to a location 0 <= h(x) < n. At
  each location, there is a linked list that stores all items that are
  mapped to that location. The longer the list, the slower basic
  operations on the hash table will be. Ideally, we want no two items
  to be mapped to the same location, i.e. no "collisions." Then the
  operations will take constant time.

  Suppose we store m items into the hash table. How large can m be so
  that the probability of a collision is less than 1/2?

  Before we calculate, let's outline some assumptions we are making:
  (1) For each item x, h(x) is uniformly random over [0, n-1], i.e.
      all n locations are equally likely.
  (2) The hash values for each item are mutually independent.

  Then this is just the birthday paradox! The n locations are our n
  days, and the m items are our m individuals, so we get
    m ≈ 1.18 sqrt(n).

  Another way to express this problem is in terms of balls and bins,
  where each location is a bin and each item is a ball. Then we are
  randomly throwing balls into bins. This abstraction is very useful
  in Computer Science.
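
  The balls-and-bins view is easy to simulate. The sketch below throws
  m balls into n bins uniformly and independently, matching
  assumptions (1) and (2), and estimates the collision probability.
  The table size n = 1000, the trial count, and the seed are arbitrary
  choices for this example, and Python's random module stands in for
  the hash function.

```python
import random


def has_collision(n, m, rng):
    """Throw m balls into n bins; report whether any bin gets two balls."""
    occupied = set()
    for _ in range(m):
        slot = rng.randrange(n)  # uniform over 0, ..., n-1
        if slot in occupied:
            return True
        occupied.add(slot)
    return False


def estimate_collision_probability(n, m, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[at least one collision]."""
    rng = random.Random(seed)
    hits = sum(has_collision(n, m, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = 1000
    m = round(1.18 * n ** 0.5)  # 37 items
    print(f"n = {n}, m = {m}:", estimate_collision_probability(n, m))
```

  With m near 1.18 sqrt(n), the estimate should come out close to 1/2,
  up to sampling noise.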

  Finally, note that we made some approximations in the above
  analysis. In the reader, you can see a table that demonstrates that
  these approximations are very good even for small n.

Coupon Collector's Problem
  Let's analyze a somewhat different problem. Suppose a local cereal
  manufacturer places a baseball card with a random Giants player in
  each box of cereal. There are n players who appear on a card, and
  each box contains a card chosen uniformly at random and
  independently from all other boxes.

  Now I am a big fan of the Kung Fu Panda, i.e. Pablo Sandoval. I
  really want his baseball card. How many boxes of cereal do I have to
  buy to make it more likely than not that I get his card?

  Suppose I buy m boxes of cereal. Let E be the event that I don't get
  a Panda card, E_i be the event that the ith box doesn't have his
  card. What is Pr[E_i]? Well, there are n cards, and n-1 don't have
  the Panda, so Pr[E_i] = (n-1)/n = (1 - 1/n). Then
    Pr[E] = Pr[E_1 ∩ ... ∩ E_m]
      = Pr[E_1] ... Pr[E_m]                   (mutual independence)
      = (1 - 1/n)^m.
  Using the Taylor expansion from before,
    Pr[E] <= (exp(-1/n))^m
      = exp(-m/n).
  Setting this equal to 1/2 for an even chance of getting a Panda
  card, we get
    1/2 = exp(-m/n)
    -ln(2) = -m/n
    m = n ln(2) ≈ 0.69n.
  So if I buy 0.69n boxes, I have about an even chance of getting the
  Panda.
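
  As a check on the n ln(2) estimate, this sketch evaluates
  (1 - 1/n)^m directly and searches for the smallest m that brings it
  down to 1/2 or below. The value n = 25 is a hypothetical number of
  players chosen only for illustration, and the function names are
  made up.

```python
import math


def prob_no_card(n, m):
    """Exact Pr[E]: none of m boxes holds the one card we want."""
    return (1 - 1 / n) ** m


def boxes_for_even_chance(n):
    """Smallest m with Pr[E] <= 1/2."""
    m = 0
    while prob_no_card(n, m) > 0.5:
        m += 1
    return m


if __name__ == "__main__":
    n = 25  # hypothetical number of players
    print("exact m:", boxes_for_even_chance(n))
    print("n ln(2):", n * math.log(2))
```

  Because exp(-m/n) is an upper bound on Pr[E], the exact answer can
  never exceed n ln(2) rounded up.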

  Suppose I want all n players. (I like The Beard (Brian Wilson),
  Buster Posey, and the rest of the Giants as well.) Now how many
  boxes do I have to buy to have an even chance of getting all the
  players?

  Let F_j be the event that I don't get the jth player, F be the event
  that I am missing some player. Then F = F_1 ∪ ... ∪ F_n.
  Note that the F_j are not independent! Knowing that I didn't get a
  Panda card makes it more likely that I got someone else's.
  We already saw that
    Pr[F_j] <= exp(-m/n).
  Then we have
    Pr[F] = Pr[F_1 ∪ ... ∪ F_n]
      <= Pr[F_1] + ... + Pr[F_n]                     (union bound)
      <= n exp(-m/n).
  Setting this to 1/2, we get
    1/2 = n exp(-m/n)
    1/(2n) = exp(-m/n)
    -ln(2n) = -m/n
    m = n ln(2n)
  So m = n ln(2n) boxes are sufficient to guarantee at least an even
  chance of getting all players.
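
  Since the union bound only gives an upper bound on Pr[F], it is
  worth seeing how far it is from the truth. The sketch below computes
  Pr[F] exactly using the general inclusion/exclusion formula, relying
  on the fact that a particular set of k players is missing from all m
  boxes with probability (1 - k/n)^m. The value n = 25 is a
  hypothetical number of players, and the function names are
  illustrative.

```python
from fractions import Fraction
from math import ceil, comb, exp, log


def prob_missing_some_player(n, m):
    """Exact Pr[F] by inclusion/exclusion over sets of missing players."""
    total = Fraction(0)
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1
        # comb(n, k) sets of k players, each missing w.p. ((n-k)/n)^m
        total += sign * comb(n, k) * Fraction(n - k, n) ** m
    return total


def union_bound(n, m):
    """The bound n * exp(-m/n) derived in the text."""
    return n * exp(-m / n)


if __name__ == "__main__":
    n = 25  # hypothetical number of players
    m = ceil(n * log(2 * n))
    print("m =", m)
    print("union bound on Pr[F]:", union_bound(n, m))
    print("exact Pr[F]:", float(prob_missing_some_player(n, m)))
```

  The exact value is below the bound, so n ln(2n) boxes give somewhat
  better than an even chance of a full collection.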

  As you can see, we need many boxes to make it likely that we find
  the player we like or assemble a full collection of all players. So
  this is a great marketing ploy for the cereal manufacturer.

  Why did we do these examples above? They illustrate how the
  probability techniques we learned can be applied to solve real-world
  problems.