Administrative info
  Review session Sunday 7/10 5pm in 310 Soda
  MT1 policies
  - 1 cheat sheet (8.5x11, double sided)
  - no calculators
  Old exams online; see email

Review
  So far, we have seen six counting principles:
    (1) Enumeration
    (2) Product rule
    (3) Sum rule
    (4) Isomorphism principle
    (5) Pigeonhole principle
    (6) Permutations
  Recall that an r-permutation of a set of n elements S is an ordered
  list of r distinct items from S. There are n!/(n-r)! such lists.
  EX: How many anagrams are there of the word "eraser"?
  ANS: There are 6 letters in "eraser":
         2 each of 'e' and 'r'
         1 each of 'a' and 's'
       If we pretend that repeated letters are distinguishable, then
       there are 6! permutations of "eraser". But there is a 2-to-1
       correspondence between the set of such anagrams and the set of
       anagrams with identical e's and distinguishable r's. There is a
       further 2-to-1 correspondence between this set and the set of
       anagrams with identical e's and r's. So the size of the latter
       set is 6!/(2^2).
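
  As a sanity check on the "eraser" count, here is a minimal Python
  sketch. The function names are made up for illustration. The formula
  version assumes the natural generalization of the argument above: a
  letter that occurs k times gives a k!-to-1 correspondence (the text
  only uses the k = 2 case).

```python
from itertools import permutations
from math import factorial

def count_anagrams(word):
    """Count distinct orderings of the letters of word by brute force."""
    # permutations() treats repeated letters as distinguishable;
    # putting the results in a set merges the duplicates.
    return len(set(permutations(word)))

def anagram_formula(word):
    """n! divided by k! for each letter that occurs k times."""
    result = factorial(len(word))
    for letter in set(word):
        result //= factorial(word.count(letter))
    return result

print(count_anagrams("eraser"), anagram_formula("eraser"))  # 180 180
```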

More Counting
  (7) Combinations
      EX: How many 5-card poker hands are there?
      ANS: We know that there are 52!/47! 5-permutations of the set of
           cards. However, as in the "eraser" case above, we've
           overcounted, since the hand "10 J Q K A" (all spades) is
           the same as the hand "A K Q J 10". In fact, there are 5!
           orderings of this hand (permutations of the set {10, J, Q,
           K, A}). So there is a 5!-to-one correspondence between the
           set of ordered poker hands and the set of unordered poker
           hands, and the number of different hands is actually
           52!/(47!5!).
      What we are doing here is "choosing" 5 cards out of the 52, i.e.
      constructing a subset of size 5 from a set of 52. Choosing an
      r-combination of items from a set of n is so common that it has
      its own notation:
        (n)
        (r)
      This is pronounced as "n choose r." We may write C(n, r) since
      we can't really use the proper notation in ASCII text. The
      formula for C(n, r) is n!/[r!(n-r)!]. (Proof: use r-permutations
      and an r!-to-1 correspondence, as above.)
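
      The poker hand count can be checked numerically. This is a
      small sketch using math.perm and math.comb from the Python
      standard library (Python 3.8 or later); the variable names are
      illustrative only.

```python
from math import comb, factorial, perm

ordered = perm(52, 5)                # 52!/47! ordered lists of 5 cards
unordered = ordered // factorial(5)  # each hand was counted 5! times

print(ordered, unordered, comb(52, 5))  # 311875200 2598960 2598960
```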
      EX: There are 70 people in this class. If I ask for 5 volunteers
          to play Set, how many sets of volunteers are there?
      ANS: C(70, 5) = 70!/(5!65!)
      EX: If I flip a fair coin 100 times, how many sequences of flips
          contain exactly 50 heads?
      ANS: Choose 50 out of the 100 flips to be heads, the rest tails. So
           C(100, 50).
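
      Enumerating all 2^100 sequences is not feasible, but the same
      argument can be checked on a smaller case. The sketch below
      assumes 10 flips and 5 heads purely for illustration; the
      function name is hypothetical.

```python
from itertools import product
from math import comb

def count_sequences(n, heads):
    """Count length-n H/T sequences with exactly `heads` heads, by brute force."""
    return sum(1 for flips in product("HT", repeat=n)
               if flips.count("H") == heads)

print(count_sequences(10, 5), comb(10, 5))  # 252 252
```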

      There are many useful identities between combinations. The
      simplest is
        C(n, r) = C(n, n-r).
      As for some others, perhaps you have seen Pascal's triangle:

                               (0)
                               (0)
                           (1)     (1)
                           (0)     (1)
                       (2)     (2)     (2)
                       (0)     (1)     (2)
                   (3)     (3)     (3)     (3)
                   (0)     (1)     (2)     (3)
               (4)     (4)     (4)     (4)     (4)
               (0)     (1)     (2)     (3)     (4)
           (5)     (5)     (5)     (5)     (5)     (5)
           (0)     (1)     (2)     (3)     (4)     (5)
       (6)     (6)     (6)     (6)     (6)     (6)     (6)
       (0)     (1)     (2)     (3)     (4)     (5)     (6)
                                     ...
                                                              row sum
                                1                                  1
                            1       1                              2
                        1       2       1                          4
                    1       3       3       1                      8
                1       4       6       4       1                  16
            1       5      10      10       5       1              32
        1       6      15      20      15       6       1          64

      (By convention, 0! = 1).

      Note that to get any entry, you sum its neighbors to the top
      left and top right. This is due to the identity
        C(n+1, m+1) = C(n, m) + C(n, m+1).
      Also note that
        C(n, 0) + C(n, 1) + ... + C(n, n) = 2^n
      We will prove some of these identities later.
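
      The triangle above can be rebuilt from the neighbor-sum rule
      alone. The following sketch (function name made up, num_rows
      assumed to be at least 1) builds the rows that way and compares
      each one against math.comb and against the row sum 2^n.

```python
from math import comb

def pascal_rows(num_rows):
    """Return the first num_rows rows of Pascal's triangle (num_rows >= 1)."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Each interior entry is the sum of its top-left and top-right
        # neighbors: C(n+1, m+1) = C(n, m) + C(n, m+1).
        row = [1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1]
        rows.append(row)
    return rows

for n, row in enumerate(pascal_rows(7)):
    assert row == [comb(n, r) for r in range(n + 1)]
    assert sum(row) == 2 ** n
    print(row, sum(row))
```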

  (8) Stars and bars
      EX: A band of 2 pirates (say Johnny Depp and Orlando Bloom) have
          4 indistinguishable gold coins to divide among them. How
          many different ways are there to split up the booty?
          (They're pirates, so it doesn't have to be split equally.)
      ANS: If Johnny Depp gets i, 0 <= i <= 4, Orlando Bloom gets 4-i.
           There are 5 possible values of i.
      EX: A band of 3 pirates (add Keira Knightley) have 4
          indistinguishable gold coins to divide among them. How many
          different ways are there to split up the booty?
      ANS: This seems harder. Let's line up the coins and partition
           them into sets, the first one going to Johnny Depp, the
           second to Orlando Bloom, and the third to Keira Knightley.
           We'll draw a line in the sand to separate each pirate's
           share from the others. Here are the possibilities:
             OOOO||
             OOO|O|  OOO||O
             OO|OO|  OO|O|O  OO||OO
             O|OOO|  O|OO|O  O|O|OO  O||OOO
             |OOOO|  |OOO|O  |OO|OO  |O|OOO  ||OOOO
           So there are 15.
           Note that there is a 1-to-1 correspondence between
           splitting up the booty and 6-bit strings with exactly 2
           ones. The number of the latter is C(6, 2) = 15 (i.e. choose
           2 positions out of the string to be ones). So there are
           C(6, 2) = 15 ways to split up the booty.
      In general, if we want to split up k identical items (e.g.
      coins) into n (distinguishable) sets (e.g. one for each pirate),
      there are C(n+k-1, k) = C(n+k-1, n-1) ways to do so. This
      procedure is called "stars and bars." Maybe someone stole the
      coins and replaced them with starfish?
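
      The correspondence between bar placements and splits can be
      spelled out in code. In this sketch (names are illustrative),
      each choice of n-1 bar positions among n+k-1 slots is converted
      into the tuple of shares it represents.

```python
from itertools import combinations
from math import comb

def splits_from_bars(k, n):
    """Turn each placement of n-1 bars among n+k-1 slots into a tuple of shares."""
    total_slots = n + k - 1
    result = []
    for bars in combinations(range(total_slots), n - 1):
        shares = []
        prev = -1
        # A sentinel bar just past the last slot closes the final share.
        for b in bars + (total_slots,):
            shares.append(b - prev - 1)  # coins strictly between two bars
            prev = b
        result.append(tuple(shares))
    return result

pirate_splits = splits_from_bars(4, 3)
print(len(pirate_splits), comb(6, 2))  # 15 15
```

      Every tuple sums to k, and no tuple appears twice, which is
      the 1-to-1 correspondence used in the argument above.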

Balls and Bins Framework
  The course reader uses a "balls and bins" framework to introduce
  counting. Here, we will see how to apply our counting principles to
  the various balls and bins examples in the course reader.

  The basic idea in this framework is that we are placing k balls into
  n bins, under various constraints. We want to know how many ways
  there are to do this, depending on whether:
    (a) the balls are distinguishable or identical
    (b) a bin can contain only one ball or more than one
  In terms of (b), we use the term "sampling with replacement" if a
  bin can contain more than one ball (i.e. once we pick a bin for the
  first ball, we "replace" that bin in the set of allowed bins for the
  remaining balls). Otherwise, we are "sampling without replacement."

  (1) Distinguishable balls, with replacement
      We want to throw k balls into n bins, such that multiple balls
      can go into the same bin.
      This is just like the example of 3-digit area codes; there, we
      had k=3 balls (digits) to place into n=10 bins (numbers [0-9])
      such that multiple digits can have the same number. By the
      product rule, this is just 10^3, or n^k in the general case.

  (2) Distinguishable balls, without replacement
      We want to throw k balls into n bins, such that no bin contains
      multiple balls.
      This is just like the example of 3-digit area codes with no
      repeated numbers; there, we had k=3 balls (digits) to place into
      n=10 bins (numbers [0-9]) such that no two digits can have the
      same number. By the product rule, this is just 10*9*8 = 10!/7!, or
      n!/(n-k)! in the general case. Another way to think about this
      is in terms of a k-permutation of the n bins, which gives us the
      same result.

  (3) Identical balls, without replacement
      We want to throw k identical balls into n bins, such that no bin
      contains multiple balls.
      This is just like the poker hand example; there, we had k=5
      identical balls (cards in a hand) that we wanted to throw into
      52 bins (cards in a deck) such that no bin (card) is repeated.
      This was just C(52, 5), or C(n, k) in the general case.

  (4) Identical balls, with replacement
      We want to throw k identical balls into n bins, such that
      multiple balls can go into the same bin.
      This is just like the pirate treasure example; there we had k=4
      identical balls (coins) that we wanted to divide among n bins
      (pirates). This was just C(6, 2), or C(n+k-1, n-1) = C(n+k-1, k)
      in the general case.
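
  The four cases line up with four iterators in Python's itertools
  module. The sketch below is one way to check the formulas by brute
  force, recording each outcome as the tuple of bins chosen; the
  function name and dictionary labels are made up for illustration.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

def count_placements(n, k):
    """Count ways to place k balls into n bins under each of the four rules."""
    bins = range(n)
    return {
        "distinguishable, with replacement":
            sum(1 for _ in product(bins, repeat=k)),
        "distinguishable, without replacement":
            sum(1 for _ in permutations(bins, k)),
        "identical, without replacement":
            sum(1 for _ in combinations(bins, k)),
        "identical, with replacement":
            sum(1 for _ in combinations_with_replacement(bins, k)),
    }

counts = count_placements(10, 3)
formulas = [10 ** 3, perm(10, 3), comb(10, 3), comb(10 + 3 - 1, 3)]
print(list(counts.values()), formulas)
```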

Combinatorial Proofs
  We have seen various combinatorial identities. These can be proven
  by expanding out the terms and algebraic manipulation, but this can
  be quite tedious. Instead, we can prove them using counting
  arguments. We come up with a particular set of items and show that
  if you count the items one way, you end up with one expression, and
  if you count them a different way, you end up with a different
  expression. Since the set you counted is the same in both cases,
  those two expressions must be equal (assuming you didn't make a
  mistake!). This is called a "combinatorial proof."

  Let's do some examples.
  EX: Prove that C(n, r) = C(n, n-r).
  ANS: Suppose we have n items. We want to know how many ways we can
       choose r of them. (The set that we are counting here is the set
       of all r-combinations of the set of n items.) We can do so in
       the following ways:
       (a) Choose r items directly from the n. There are C(n, r) ways
           to do so.
       (b) Pick n-r items and throw them away. Keep the remaining r
           items. There are C(n, n-r) ways to do this.
       Since we are counting the same thing in either procedure, we
       must have C(n, r) = C(n, n-r).
  EX: Prove that C(n+1, m+1) = C(n, m) + C(n, m+1).
  ANS: Suppose we have n+1 items and want to choose m+1 of them. We
       can
       (a) Choose the m+1 directly out of the n+1. (C(n+1, m+1)).
       (b) Decide whether or not to pick the first item. There are
           two cases:
           (1) Pick the first item. Then we have to choose m remaining
               items out of the remaining n items. (C(n, m))
           (2) Don't pick the first item. Then we have to choose m+1
               items out of the remaining n items. (C(n, m+1))
           By the sum rule, there are C(n, m) + C(n, m+1) total ways.
       Again, these two procedures are counting the same set, so
       C(n+1, m+1) = C(n, m) + C(n, m+1).
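
  The case split in the second proof can be observed directly on a
  small example. This sketch (function name hypothetical, with item 0
  playing the role of "the first item") lists every (m+1)-subset of
  n+1 items and sorts them by whether they contain the first item.

```python
from itertools import combinations
from math import comb

def split_by_first(n, m):
    """Count (m+1)-subsets of n+1 items: all, with item 0, without item 0."""
    chosen = list(combinations(range(n + 1), m + 1))
    with_first = [c for c in chosen if 0 in c]
    without_first = [c for c in chosen if 0 not in c]
    return len(chosen), len(with_first), len(without_first)

total, picked, skipped = split_by_first(5, 2)
print(total, picked, skipped)  # 20 10 10
```

  This only checks one instance, of course; the counting argument is
  what proves the identity for all n and m.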

===========

Introduction to Probability Theory
  Now that we've learned to count, we turn our attention back to
  probability theory. Here are some statements we'd like to be able to
  understand:
  (1) The chance of getting a flush (i.e. all the cards have the same
      suit) in a 5-card poker hand is around 2 in 1000.
  (2) If you flip a fair coin 50 times, resulting in 50 heads, the
      probability that the 51st flip is heads is 1/2.
  (3) If quicksort picks a random pivot at each step, then it will
      sort any sequence of n numbers in O(n log n) time with high
      probability.
  (4) With this algorithm for balancing the workload among servers,
      the probability that a user has to wait more than 1 minute is
      2%.
  (5) There is a 60% chance of the "Big One" (large earthquake)
      hitting Northern California in the next 30 years.
  (6) The percentage of Californians who identify themselves as
      Democrats is 44.5%.
  In order to understand these statements, we have to know probability
  theory.

  Probability Spaces
    All of the above statements are made in the context of a specific
    "probability space." A probability space consists of the
    following:
    (1) A "random experiment," i.e. an experiment whose outcome is
        "random".
        EX: A single coin flip, drawing 5 cards from a deck of 52.
    (2) The set of possible outcomes or "sample points." This set is
        the "sample space."
        EX: Heads or tails, the C(52, 5) possible 5-card hands.
    (3) The probability of each possible outcome of the experiment.
        EX: 1/2 for each of heads and tails, 1/C(52, 5) for each
            5-card hand
        EX: Experiment: A sequence of 51 flips of a fair coin. Sample
            space: all 2^51 possible sequences of H and T.
            Probabilities: 1/(2^51) for each sample point.

    Formally, a probability space consists of a sample space (denoted
    by the capital Greek letter Ω) with a probability
    Pr[ω] for each sample point ω.
    The probability Pr is actually given by a function
      P: Ω -> [0, 1],
    though we use Pr[ω] to refer to ω's probability
    instead of P(ω).
    The probability assignment must satisfy the following
    constraints:
    (1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
        Of course, we specified the range of P to enforce this.
    (2) ∑_{ω∈Ω} Pr[ω] = 1
        In other words, the probabilities of all outcomes have to add
        to 1.

    The simplest probability space consists of a finite set Ω
    with a uniform probability assignment
      ∀ω∈Ω . Pr[ω] = 1/|Ω|.
    This is called the "uniform distribution." The examples we saw
    above all had a uniform distribution.
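
    One way to make these definitions concrete is to represent a
    finite probability space as a dictionary from sample points to
    probabilities. This is only a sketch under that assumption; the
    function names are made up, and Fraction is used so the sum is
    exact rather than subject to floating-point rounding.

```python
from fractions import Fraction
from itertools import product

def uniform_space(outcomes):
    """Assign probability 1/|Omega| to every sample point."""
    outcomes = list(outcomes)
    return {w: Fraction(1, len(outcomes)) for w in outcomes}

def is_valid_space(pr):
    """Check both constraints: each Pr[w] in [0, 1], and the total is 1."""
    return all(0 <= p <= 1 for p in pr.values()) and sum(pr.values()) == 1

coin = uniform_space("HT")
three_flips = uniform_space(product("HT", repeat=3))
print(is_valid_space(coin), coin["H"], three_flips[("H", "H", "T")])
```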

    Most of the time, we are not interested in specific outcomes. For
    example in scenario (1) above, we want to know the probability of
    getting any 5-card flush, not a particular 5-card flush (which we
    know is 1/C(52, 5)). So we want to know the probability of a set
    of outcomes, i.e. a subset of the sample space. We refer to a
    subset of the sample space as an "event." Naturally, the
    probability of an event E is the sum of the probabilities of the
    outcomes in E:
      Pr[E] = ∑_{ω∈E} Pr[ω].
    In the case of a uniform distribution, this simplifies to
      Pr[E] = |E|/|Ω|.
    EX: What is the probability of a flush in a 5-card poker hand?
    ANS: We know that this experiment has a uniform distribution with
         |Ω| = C(52, 5). Let E be the event that the hand is a
         flush. There are four suits, and we can pick 5 cards out of
         the 13 in a suit to be a flush, so the number of outcomes in
         E is |E| = 4 C(13, 5). Thus, Pr[E] = 4 C(13, 5) / C(52, 5)
         or about 0.002.
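
    The flush calculation is easy to reproduce. The sketch below
    follows the count in the answer above (which, as written, also
    counts straight flushes as flushes); the variable names are
    illustrative.

```python
from fractions import Fraction
from math import comb

flush_hands = 4 * comb(13, 5)   # |E|: pick a suit, then 5 of its 13 cards
all_hands = comb(52, 5)         # |Omega|: all 5-card hands
pr_flush = Fraction(flush_hands, all_hands)

print(flush_hands, all_hands, pr_flush, float(pr_flush))
```

    The exact value is 33/16660, or roughly 0.00198.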