Administrative info

PA1 due Friday!

Question

What is the last digit of the 2011th term in this sequence?

    2, 4, 8, 16, 32, 64, ...

Review

Last time, we saw that x has an inverse mod m iff gcd(x, m) = 1. We then saw Euclid's algorithm for computing the GCD. Now that we know how to efficiently determine whether an inverse exists, we turn our attention to computing inverses.

Extended GCD

Let us consider a more general problem than computing an inverse. Let d = gcd(x, y). Suppose we wanted to determine integers a, b such that

    d = ax + by.

(Note that a, b are integers, so they can be negative.) If we could do so, could we determine the inverse of x mod y? Of course, for an inverse to exist, it must be that d = 1, so we have 1 = ax + by. Since by is a multiple of y, this means that ax ≡ 1 (mod y), so by definition a is x's inverse modulo y. (Remember, though, that a can be negative, so we may need to convert it into the proper range [0, y-1].) Thus, if we can solve the above problem in general, we can immediately determine the inverse of x modulo y.

Recall Euclid's algorithm:

    gcd(x, y):
        if y = 0 then:
            return x
        else:
            return gcd(y, mod(x, y))

It turns out that we can modify Euclid's algorithm to compute a and b as well:

    egcd(x, y):
        if y = 0 then:
            return (x, 1, 0)
        else:
            (d, a, b) := egcd(y, mod(x, y))
            return (d, b, a - floor(x/y) * b)

It should be clear that the modified algorithm correctly computes d = gcd(x, y), since the computation of d is the same as in the vanilla gcd algorithm. As for whether it computes a, b correctly, see the course reader for a proof.

Let's run through an example:

    egcd(35, 12)
      egcd(12, 11)
        egcd(11, 1)
          egcd(1, 0)
          return (1, 1, 0)
        return (1, 0, 1 - floor(11/1) * 0) = (1, 0, 1)
      return (1, 1, 0 - floor(12/11) * 1) = (1, 1, -1)
    return (1, -1, 1 - floor(35/12) * -1) = (1, -1, 3)

Indeed, (-1) * 35 + 3 * 12 = 1. So we get a = -1 ≡ 11 (mod 12), so the inverse of 35 is 11 (mod 12). (Of course, we already knew this: 35 ≡ 11 (mod 12), and m-1 is always its own inverse mod m, since (m-1)^2 = m^2 - 2m + 1 ≡ 1 (mod m).)
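For readers who want to experiment, here is a minimal Python sketch of the egcd pseudocode above, plus a helper (`mod_inverse`, a hypothetical name of our own) that turns its output into an inverse in the range [0, m-1]. It assumes nonnegative inputs, for which Python's `//` matches floor(x/y) and `%` matches mod(x, y).

```python
def egcd(x: int, y: int) -> tuple[int, int, int]:
    """Return (d, a, b) with d = gcd(x, y) and d = a*x + b*y.

    Direct transcription of the recursive egcd pseudocode; assumes x, y >= 0.
    """
    if y == 0:
        return (x, 1, 0)
    d, a, b = egcd(y, x % y)
    # For nonnegative operands, x // y equals floor(x/y).
    return (d, b, a - (x // y) * b)


def mod_inverse(x: int, m: int) -> int:
    """Return the inverse of x mod m in [0, m-1]; raise if gcd(x, m) != 1."""
    d, a, _ = egcd(x, m)
    if d != 1:
        raise ValueError(f"{x} has no inverse mod {m} (gcd = {d})")
    # a may be negative, so reduce it into the range [0, m-1].
    return a % m


print(egcd(35, 12))         # (1, -1, 3)
print(mod_inverse(35, 12))  # 11
```

The single call to `egcd` both answers "does an inverse exist?" (is d equal to 1?) and, if so, produces it, which is exactly the point made in the notes.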
So in one execution of the algorithm, we not only determine whether x has an inverse mod y (i.e. whether d = 1), we compute the inverse as well.

Now that we have inverses, we can divide in modular arithmetic, by computing the inverse of the divisor (if it exists) and multiplying by that instead.

Ex: Solve 4x ≡ 3 (mod 7).
We can compute 4^{-1} ≡ 2 (mod 7), so we multiply both sides by 2 to get x ≡ 6 (mod 7). Note that we can't divide 3 by 4 in the integers, but we can here.

Ex: Solve 4x ≡ 3 (mod 8).
There is no inverse of 4 mod 8, so we can't solve this by computing an inverse. In fact, there is no solution at all: 4x is always congruent to 0 or 4 mod 8, never 3.

Ex: Solve 4x ≡ 4 (mod 8).
It is tempting to say there is no solution, since 4 has no inverse, but clearly x ≡ 1 (mod 8) is a solution. In fact, so are x ≡ 3, 5, 7 (mod 8). So in this case, there is more than one solution.

So we see that if gcd(a, m) = 1, then ax ≡ b (mod m) has a unique solution mod m, but if not, then it may have no solutions or multiple solutions.

Encryption

We have completed our specification for modular arithmetic, but what is it good for? Absolutely nothing? No! We will now see one of the most beautiful applications of modular arithmetic.

Suppose two people, say Alice and Bob, want to communicate securely (e.g. you want to buy something from amazon.com). They want to be able to send messages to each other so that they can understand them, but no one else can (e.g. you don't want anyone else to see your credit card information when you send it to amazon). How can they do so?

Here is one possible scheme. Alice and Bob go out for coffee one day, agree on a large number K, and keep it secret. Thereafter, if Alice wants to send Bob a secure message M (we assume it's just some number, since anything can be represented by a number), she encodes it as follows:

    C = M ⊕ K,

where ⊕ is the bitwise xor operation. Once Bob receives C, he decodes it using the same procedure:

    M = C ⊕ K.

Does this work?
Is it always the case that M = (M ⊕ K) ⊕ K? Since xor is a bitwise operation, let's look at one bit. We require that

    (0 ⊕ 0) ⊕ 0 = 0 ⊕ 0 = 0
    (0 ⊕ 1) ⊕ 1 = 1 ⊕ 1 = 0
    (1 ⊕ 0) ⊕ 0 = 1 ⊕ 0 = 1
    (1 ⊕ 1) ⊕ 1 = 0 ⊕ 1 = 1

So the scheme works. And if someone else, say Eve, intercepts the encoded message C, she cannot decode it without knowing K.

What's the problem with this scheme?
- Alice and Bob had to meet to agree on the secret key K. How would you meet with amazon.com?
- If n people all want to communicate with each other, we require a lot of keys, on the order of n^2 [n(n-1)/2].

Any scheme that requires a shared secret between Alice and Bob will have the above flaws. We want a better scheme. Let's see if modular arithmetic can help.

Answer to Question

The last digits of the sequence are 2, 4, 8, 6, 2, 4, 8, 6, ..., which repeat with period 4. So the last digit of the 2011th term is the same as the last digit of the mod(2011, 4) = 3rd term, which is 8.

We just determined the value of mod(2^2011, 10). We did so by arguing that the period of powers of 2 mod 10 is 4. Let's consider a more general question: what is the period of powers of 2 mod an arbitrary n? Let's look at what happens when n is relatively prime to 2, i.e. gcd(n, 2) = 1. Let f(n) be the period of the powers of 2 mod n. Then we have:

    n    |  3   5   7   9  11  13  15
    f(n) |  2   4   3   6  10  12   4

We see that f(n) | n-1 whenever n is prime (n = 3, 5, 7, 11, 13 in the table). Since 2^{f(n)} ≡ 1 (mod n), this suggests that 2^{p-1} ≡ 1 (mod p) for any odd prime p. Can we generalize this?

Fermat's Little Theorem

We start by proving the following claim: If p is prime and gcd(a, p) = 1, then a^{p-1} ≡ 1 (mod p).

Recall that last time we proved that 0a, 1a, ..., (p-1)a have distinct values mod p if gcd(a, p) = 1. We know that 0a ≡ 0 (mod p), so 1a, ..., (p-1)a take on the values 1, ..., p-1, each exactly once. This implies that

    1a * 2a * ... * (p-1)a ≡ 1 * 2 * ... * (p-1) (mod p),

since every value in the RHS occurs exactly once in the LHS.
Now, since p is prime, each of 1, 2, ..., p-1 has GCD 1 with p, so they all have inverses mod p. So let's multiply each side by 1^{-1} * 2^{-1} * ... * (p-1)^{-1} to get

    a * a * ... * a ≡ 1 * 1 * ... * 1 (mod p)
            a^{p-1} ≡ 1 (mod p),

since there are p-1 terms in the LHS. This completes our proof.

Corollary: For any prime p, any integer a, and any integer b ≥ 0, we have a^{1+b(p-1)} ≡ a (mod p).

Proof:
Case 1: a ≡ 0 (mod p). Then both sides are ≡ 0 (mod p), so the claim is obviously true.
Case 2: a ≢ 0 (mod p). Then gcd(a, p) = 1 since p is prime. We have

    a^{1+b(p-1)} ≡ a * (a^{p-1})^b (mod p)
                 ≡ a * 1^b (mod p)    by FLT
                 ≡ a (mod p).

Corollary: For any two different primes p, q, any integer a, and any integer k ≥ 0, we have a^{1+k(p-1)(q-1)} ≡ a (mod pq).

Proof: We know from the above that

    a^{1+k(p-1)(q-1)} ≡ a (mod p)    [by taking b = k(q-1)] and
    a^{1+k(p-1)(q-1)} ≡ a (mod q)    [by taking b = k(p-1)].

Thus, a^{1+k(p-1)(q-1)} - a is a multiple of both p and q. But since p and q are distinct primes, it must also be a multiple of pq. Then by definition, a^{1+k(p-1)(q-1)} ≡ a (mod pq).

RSA

We have now seen a lot of math. Let's put it to use to develop an encryption scheme. Suppose Alice again wants to send a secure message to Bob. Now, instead of meeting up beforehand, Bob picks two large primes p, q. He then picks some positive integer e such that gcd(e, (p-1)(q-1)) = 1, and computes d ≡ e^{-1} (mod (p-1)(q-1)). He publishes the values N = pq and e (but not p and q!) for all the world to see. He keeps d to himself.

Ex: Let p = 5, q = 11, e = 3. What is d?
Here (p-1)(q-1) = 4 * 10 = 40, so d ≡ e^{-1} (mod 40) ≡ 27 (mod 40), since 3 * 27 = 81 ≡ 1 (mod 40).

Now Alice looks up Bob's website and finds that his public key is (e, N). To send him a message x (a number with 0 ≤ x < N), she computes C ≡ x^e (mod N) and sends him C. When Bob receives C, he computes M ≡ C^d ≡ x^{ed} (mod N). Is it guaranteed that M ≡ x (mod N)? Let's see. We know that de ≡ 1 (mod (p-1)(q-1)), so de = 1 + k(p-1)(q-1) for some integer k ≥ 0. We proved above that x^{1+k(p-1)(q-1)} ≡ x (mod pq = N) for all x, k, which implies that M ≡ x^{ed} ≡ x (mod N).
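A toy run of RSA with the example parameters p = 5, q = 11, e = 3 may help make the round trip concrete. This is only a sketch: the numbers are far too small to be secure. `mod_pow`, `encrypt`, and `decrypt` are hypothetical helper names; `mod_pow` implements repeated squaring (Python's built-in three-argument `pow` does the same job), and `d` is computed with `pow(e, -1, phi)`, which Python 3.8+ provides for modular inverses.

```python
def mod_pow(base: int, exp: int, mod: int) -> int:
    """Compute base^exp mod `mod` by repeated squaring (assumes exp >= 0)."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:                      # lowest remaining bit of exp is 1
            result = (result * base) % mod
        base = (base * base) % mod       # base^(2^i) for the next bit
        exp >>= 1
    return result


# Toy parameters from the example (tiny, hence insecure).
p, q, e = 5, 11, 3
N = p * q                 # 55, published along with e
phi = (p - 1) * (q - 1)   # 40, kept secret
d = pow(e, -1, phi)       # 27, Bob's private exponent


def encrypt(x: int) -> int:
    """Alice: C = x^e mod N, for 0 <= x < N."""
    return mod_pow(x, e, N)


def decrypt(c: int) -> int:
    """Bob: M = C^d mod N."""
    return mod_pow(c, d, N)


# Every message 0 <= x < N round-trips, including multiples of 5 or 11.
assert all(decrypt(encrypt(x)) == x for x in range(N))
print(d, encrypt(2), decrypt(encrypt(2)))  # 27 8 2
print(mod_pow(2, 2011, 10))                # 8, the answer to the opening question
```

Checking every x in range(N), rather than only x with gcd(x, N) = 1, exercises the second corollary in full: it holds for all integers a, not just those relatively prime to pq.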
So we know that our encryption scheme works, in the sense that Bob can recover the original message. Of course, we also want our scheme to be secure, so that a third party Eve cannot recover the message. Is this the case?

Let's see what Eve knows. She knows (e, N), since it is public. She might also know C ≡ x^e (mod N), if she was eavesdropping on the communication. Can she recover x? Well, she could try guessing x and then computing x^e (mod N) to see if it equals C. But this is unrealistic if N is a large number, such as the 1024- or 2048-bit numbers common in RSA. Or she could try to replicate Bob's decryption procedure. But then she would need to determine d from just (e, N). If she knew p and q, she could use the extended GCD algorithm to compute e^{-1} mod (p-1)(q-1). But p and q aren't public, so she would need to factor N in order to determine them. It turns out that no one knows how to do so efficiently. [Except with quantum computers, which don't exist on the scale necessary to factor a large number.] So that's not an option either. So, as far as anyone knows, Eve is out of luck.

But can Alice and Bob perform their computations efficiently? Alice performs exponentiation mod N. You saw how to do so using repeated squaring in discussion section. (See the course reader if you didn't.) The number of powers that need to be computed is on the order of log N, and each requires a single multiplication of two (log N)-bit numbers followed by a reduction mod N. Such a multiplication can be done in O((log N)^2) time, so the total running time is O((log N)^3).

What about Bob? Besides exponentiation, he must find two large primes p and q, a third number e that is relatively prime to (p-1)(q-1), and an inverse. We already know that the last step can be done efficiently. Plenty of values of e work, so Bob can simply try candidates until gcd(e, (p-1)(q-1)) = 1; the second step isn't a problem either. What about finding two large primes? It turns out that not only are there infinitely many primes, but their density is actually quite high.
For 1024-bit numbers, around one in every 710 of them is prime. (The density of primes near n is roughly 1/ln n, and ln(2^1024) = 1024 ln 2 ≈ 710.) So Bob can just choose a random 1024-bit number, test whether it is prime (which can be done efficiently), and try again if it isn't.
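The notes leave the primality test unspecified. As an assumption, the sketch below swaps in the Miller-Rabin randomized test, one standard choice, and then follows Bob's procedure: draw random candidates of the desired bit length until one passes. The function names are hypothetical. Candidates come from Python's `secrets` module, since key material should come from a cryptographically secure source; the test's random bases use `random`, which we assume is acceptable for this illustration.

```python
import random
import secrets


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin: False means n is definitely composite; True means n is
    prime except with probability at most 4**(-rounds)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % sp == 0:
            return n == sp
    # Write n - 1 = 2^s * t with t odd.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, t, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits: int = 1024) -> tuple[int, int]:
    """Draw random odd `bits`-bit numbers until one passes.
    Returns (prime, number_of_candidates_tried)."""
    tries = 0
    while True:
        tries += 1
        # Force the top bit (exact bit length) and the low bit (odd).
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate, tries
```

Usage: `p, tries = random_prime(1024)`. Forcing candidates to be odd roughly halves the expected number of tries compared with drawing arbitrary 1024-bit numbers, since no even number above 2 is prime.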