Administrative info

Review session Sunday 7/10 5pm in 310 Soda

MT1 policies
- 1 cheat sheet (8.5x11, double sided)
- no calculators

Old exams online; see email

Review

So far, we have seen six counting principles:
(1) Enumeration
(2) Product rule
(3) Sum rule
(4) Isomorphism principle
(5) Pigeonhole principle
(6) Permutations

Recall that an r-permutation of a set S of n elements is an ordered list of r distinct items from S. There are n!/(n-r)! such lists.

EX: How many anagrams are there of the word "eraser"?

ANS: There are 6 letters in "eraser":
  2 each of 'e' and 'r'
  1 each of 'a' and 's'
If we pretend that repeated letters are distinguishable, then there are 6! permutations of "eraser". But there is a 2-to-1 correspondence between the set of such anagrams and the set of anagrams with identical e's and distinguishable r's. There is a further 2-to-1 correspondence between this set and the set of anagrams with identical e's and r's. So the size of the latter set is 6!/(2^2) = 180.

More Counting

(7) Combinations

EX: How many 5-card poker hands are there?

ANS: We know that there are 52!/47! 5-permutations of the set of cards. However, as in the "eraser" case above, we've overcounted, since the hand "10 J Q K A" (all spades) is the same as the hand "A K Q J 10". In fact, there are 5! orderings of this hand (permutations of the set {10, J, Q, K, A}). So there is a 5!-to-1 correspondence between the set of ordered poker hands and the set of unordered poker hands, and the number of different hands is actually 52!/(47!5!).

What we are doing here is "choosing" 5 cards out of the 52, i.e. constructing a subset of size 5 from a set of 52. Choosing an r-combination of items from a set of n is so common that it has its own notation:

  (n)
  (r)

This is pronounced as "n choose r." We may write C(n, r) since we can't really use the proper notation in ASCII text.

The formula for C(n, r) is n!/[r!(n-r)!]. (Proof by using permutations and an r!-to-1 correspondence, as above.)
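The two overcounting arguments above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names are made up for illustration. It counts the anagrams of "eraser" both by brute force and by dividing out the repeated letters, and compares 52!/(47!5!) with the library's built-in "choose" function.

```python
from itertools import permutations
from math import comb, factorial


def count_anagrams_brute_force(word):
    # Generate all len(word)! orderings (treating repeated letters as
    # distinguishable), then collapse identical strings with a set.
    return len(set(permutations(word)))


def count_anagrams_formula(word):
    # Start from n! and divide by k! for every letter that appears k times.
    total = factorial(len(word))
    for letter in set(word):
        total //= factorial(word.count(letter))
    return total


def choose_via_permutations(n, r):
    # n!/(n-r)! ordered lists, with an r!-to-1 correspondence to subsets.
    return factorial(n) // factorial(n - r) // factorial(r)


print(count_anagrams_brute_force("eraser"))  # 180
print(count_anagrams_formula("eraser"))      # 180, i.e. 6!/(2^2)
print(choose_via_permutations(52, 5))        # 2598960
print(comb(52, 5))                           # 2598960
```

The brute-force version is only practical for short words, since it enumerates all n! orderings; the formula version is what the correspondence argument justifies.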
EX: There are 70 people in this class. If I ask for 5 volunteers to play Set, how many sets of volunteers are there?

ANS: C(70, 5) = 70!/(5!65!)

EX: If I flip a fair coin 100 times, how many sequences of flips contain exactly 50 heads?

ANS: Choose 50 out of the 100 flips to be heads, the rest tails. So C(100, 50).

There are many useful identities between combinations. The simplest is C(n, r) = C(n, n-r). As for some others, perhaps you have seen Pascal's triangle:

                        C(0,0)
                    C(1,0)  C(1,1)
                C(2,0)  C(2,1)  C(2,2)
            C(3,0)  C(3,1)  C(3,2)  C(3,3)
        C(4,0)  C(4,1)  C(4,2)  C(4,3)  C(4,4)
    C(5,0)  C(5,1)  C(5,2)  C(5,3)  C(5,4)  C(5,5)
C(6,0)  C(6,1)  C(6,2)  C(6,3)  C(6,4)  C(6,5)  C(6,6)
                         ...

                                  row sum
             1                          1
           1   1                        2
         1   2   1                      4
       1   3   3   1                    8
     1   4   6   4   1                 16
   1   5  10  10   5   1               32
 1   6  15  20  15   6   1             64

(By convention, 0! = 1.)

Note that to get any entry, you sum its neighbors to the top left and top right. This is due to the identity
  C(n+1, m+1) = C(n, m) + C(n, m+1).
Also note that
  C(n, 0) + C(n, 1) + ... + C(n, n) = 2^n
We will prove some of these identities later.

(8) Stars and bars

EX: A band of 2 pirates (say Johnny Depp and Orlando Bloom) have 4 indistinguishable gold coins to divide among them. How many different ways are there to split up the booty? (They're pirates, so it doesn't have to be split equally.)

ANS: If Johnny Depp gets i coins, 0 <= i <= 4, then Orlando Bloom gets 4-i. There are 5 possible values of i, so there are 5 ways.

EX: A band of 3 pirates (add Keira Knightley) have 4 indistinguishable gold coins to divide among them. How many different ways are there to split up the booty?

ANS: This seems harder. Let's line up the coins and partition them into sets, the first one going to Johnny Depp, the second to Orlando Bloom, and the third to Keira Knightley. We'll draw a line in the sand to separate each pirate's share from the others. Here are the possibilities:

  OOOO||
  OOO|O|  OOO||O
  OO|OO|  OO|O|O  OO||OO
  O|OOO|  O|OO|O  O|O|OO  O||OOO
  |OOOO|  |OOO|O  |OO|OO  |O|OOO  ||OOOO

So there are 15.
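Both the Pascal's triangle observations and the pirate counts above are small enough to check by brute force. The following Python sketch is an illustration only, not part of the original notes; pascal_row and count_splits are hypothetical helper names.

```python
from itertools import product
from math import comb


def pascal_row(n):
    # Row n of Pascal's triangle: C(n, 0), C(n, 1), ..., C(n, n).
    return [comb(n, r) for r in range(n + 1)]


def count_splits(coins, pirates):
    # Try every tuple of shares (one share per pirate, each from 0 to coins)
    # and keep only the tuples that hand out exactly `coins` coins.
    return sum(
        1
        for shares in product(range(coins + 1), repeat=pirates)
        if sum(shares) == coins
    )


# Each entry is the sum of its top-left and top-right neighbors.
for n in range(6):
    for m in range(n):
        assert comb(n + 1, m + 1) == comb(n, m) + comb(n, m + 1)

print(pascal_row(6))       # [1, 6, 15, 20, 15, 6, 1]
print(sum(pascal_row(6)))  # 64
print(count_splits(4, 2))  # 5
print(count_splits(4, 3))  # 15
```

The enumeration in count_splits examines (coins+1)^pirates tuples, so it is only a sanity check for small cases, not a substitute for a counting argument.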
Note that there is a 1-to-1 correspondence between ways of splitting up the booty and 6-bit strings with exactly 2 ones (read each coin O as a zero and each dividing line | as a one). The number of the latter is C(6, 2) = 15 (i.e. choose 2 positions out of the string to be ones). So there are C(6, 2) = 15 ways to split up the booty.

In general, if we want to split up k identical items (e.g. coins) into n (distinguishable) sets (e.g. one for each pirate), there are C(n+k-1, k) = C(n+k-1, n-1) ways to do so.

This procedure is called "stars and bars." Maybe someone stole the coins and replaced them with starfish?

Balls and Bins Framework

The course reader uses a "balls and bins" framework to introduce counting. Here, we will see how to apply our counting principles to the various balls and bins examples in the course reader.

The basic idea in this framework is that we are placing k balls into n bins, under various constraints. We want to know how many ways there are to do this if:
(a) the balls are distinguishable or identical
(b) a bin can contain only one ball or more than one

In terms of (b), we use the term "sampling with replacement" if a bin can contain more than one ball (i.e. once we pick a bin for the first ball, we "replace" that bin in the set of allowed bins for the remaining balls). Otherwise, we are "sampling without replacement."

(1) Distinguishable balls, with replacement

We want to throw k balls into n bins, such that multiple balls can go into the same bin. This is just like the example of 3-digit area codes; there, we had k=3 balls (digits) to place into n=10 bins (numbers [0-9]) such that multiple digits can have the same number. By the product rule, this is just 10^3, or n^k in the general case.

(2) Distinguishable balls, without replacement

We want to throw k balls into n bins, such that no bin contains multiple balls. This is just like the example of 3-digit area codes with no repeated numbers; there, we had k=3 balls (digits) to place into n=10 bins (numbers [0-9]) such that no two digits have the same number.
By the product rule, this is just 10*9*8 = 10!/7!, or n!/(n-k)! in the general case. Another way to think about this is in terms of a k-permutation of the n bins, which gives us the same result.

(3) Identical balls, without replacement

We want to throw k identical balls into n bins, such that no bin contains multiple balls. This is just like the poker hand example; there, we had k=5 identical balls (cards in a hand) that we wanted to throw into n=52 bins (cards in a deck) such that no bin (card) is repeated. This was just C(52, 5), or C(n, k) in the general case.

(4) Identical balls, with replacement

We want to throw k identical balls into n bins, such that multiple balls can go into the same bin. This is just like the pirate treasure example; there we had k=4 identical balls (coins) that we wanted to divide among n=3 bins (pirates). This was just C(6, 2), or C(n+k-1, n-1) = C(n+k-1, k) in the general case.

Combinatorial Proofs

We have seen various combinatorial identities. These can be proven by expanding out the terms and manipulating them algebraically, but this can be quite tedious. Instead, we can prove them using counting arguments. We come up with a particular set of items and show that if you count the items one way, you end up with one expression, and if you count them a different way, you end up with a different expression. Since the set you counted is the same in both cases, those two expressions must be equal (assuming you didn't make a mistake!). This is called a "combinatorial proof." Let's do some examples.

EX: Prove that C(n, r) = C(n, n-r).

ANS: Suppose we have n items. We want to know how many ways we can choose r of them. (The set that we are counting here is the set of all r-combinations of the set of n items.) We can do so in the following ways:
(a) Choose r items directly from the n. There are C(n, r) ways to do so.
(b) Pick n-r items and throw them away. Keep the remaining r items. There are C(n, n-r) ways to do this.
Since we are counting the same thing in either procedure, we must have C(n, r) = C(n, n-r).

EX: Prove that C(n+1, m+1) = C(n, m) + C(n, m+1).

ANS: Suppose we have n+1 items and want to choose m+1 of them. We can
(a) Choose the m+1 directly out of the n+1. (C(n+1, m+1))
(b) Decide whether or not to pick the first item. There are two cases:
  (1) Pick the first item. Then we have to choose the m remaining items out of the remaining n items. (C(n, m))
  (2) Don't pick the first item. Then we have to choose m+1 items out of the remaining n items. (C(n, m+1))
  By the sum rule, there are C(n, m) + C(n, m+1) total ways.
Again, these two procedures are counting the same set, so C(n+1, m+1) = C(n, m) + C(n, m+1).

===========

Introduction to Probability Theory

Now that we've learned to count, we turn our attention back to probability theory. Here are some statements we'd like to be able to understand:
(1) The chance of getting a flush (i.e. all the cards have the same suit) in a 5-card poker hand is around 2 in 1000.
(2) If you flip a fair coin 50 times, resulting in 50 heads, the probability that the 51st flip is heads is 1/2.
(3) If quicksort picks a random pivot at each step, then it will sort any sequence of n numbers in O(n log n) time with high probability.
(4) With this algorithm for balancing the workload among servers, the probability that a user has to wait more than 1 minute is 2%.
(5) There is a 60% chance of the "Big One" (large earthquake) hitting Northern California in the next 30 years.
(6) The percentage of Californians who identify themselves as Democrats is 44.5%.

In order to understand these statements, we have to know probability theory.

Probability Spaces

All of the above statements are made in the context of a specific "probability space." A probability space consists of the following:
(1) A "random experiment," i.e. an experiment whose outcome is "random". EX: A single coin flip; drawing 5 cards from a deck of 52.
(2) The set of possible outcomes or "sample points." This set is the "sample space." EX: Heads or tails; the C(52, 5) possible 5-card hands.
(3) The probability of each possible outcome of the experiment. EX: 1/2 for each of heads and tails; 1/C(52, 5) for each 5-card hand.

EX: Experiment: A sequence of 51 flips of a fair coin.
    Sample space: All 2^51 possible sequences of H and T.
    Probabilities: 1/(2^51) for each sample point.

Formally, a probability space consists of a sample space (denoted by the capital Greek letter Ω) with a probability Pr[ω] for each sample point ω. The probability Pr is actually given by a function P: Ω -> [0, 1], though we use Pr[ω] to refer to ω's probability instead of P(ω).

The probability assignment must satisfy the following constraints:
(1) ∀ω∈Ω . 0 <= Pr[ω] <= 1
    Of course, we specified the range of P to enforce this.
(2) ∑_{ω∈Ω} Pr[ω] = 1
    In other words, the probabilities of all outcomes have to add to 1.

The simplest probability space consists of a finite set Ω with a uniform probability assignment
  ∀ω∈Ω . Pr[ω] = 1/|Ω|.
This is called the "uniform distribution." The examples we saw above all had a uniform distribution.

Most of the time, we are not interested in specific outcomes. For example, in statement (1) above, we want to know the probability of getting any 5-card flush, not a particular 5-card flush (which we know is 1/C(52, 5)). So we want to know the probability of a set of outcomes, i.e. a subset of the sample space.

We refer to a subset of the sample space as an "event." Naturally, the probability of an event E is the sum of the probabilities of the outcomes in E:
  Pr[E] = ∑_{ω∈E} Pr[ω].
In the case of a uniform distribution, this simplifies to
  Pr[E] = |E|/|Ω|.

EX: What is the probability of a flush in a 5-card poker hand?

ANS: We know that this experiment has a uniform distribution with |Ω| = C(52, 5). Let E be the event that the hand is a flush.
There are four suits, and we can pick 5 cards out of the 13 in a suit to be a flush, so the number of outcomes in E is |E| = 4 C(13, 5). Thus, Pr[E] = 4 C(13, 5) / C(52, 5) or about 0.002.
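
The flush calculation is an instance of Pr[E] = |E|/|Ω| for a uniform distribution, and it can be reproduced in a few lines. This is a minimal Python sketch, not part of the original notes; uniform_probability is a hypothetical helper name, and exact fractions are used so that no rounding happens until the final conversion.

```python
from fractions import Fraction
from math import comb


def uniform_probability(event_size, sample_space_size):
    # Pr[E] = |E| / |Omega| when every sample point is equally likely.
    return Fraction(event_size, sample_space_size)


flush_hands = 4 * comb(13, 5)  # pick a suit, then 5 of its 13 cards
all_hands = comb(52, 5)        # size of the sample space

flush_probability = uniform_probability(flush_hands, all_hands)

print(flush_hands)               # 5148
print(all_hands)                 # 2598960
print(flush_probability)         # 33/16660
print(float(flush_probability))  # roughly 0.00198, i.e. about 2 in 1000
```

As in the notes, this count treats any five cards of one suit as a flush, so straight flushes are included.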