8/1/02 Amir Kamil Topic: Quick Select, Bucket Sort, Radix Sort, Skip Lists Announcement: Quick Select: - Given an unsorted sequence of elements, find the kth value in the sorted sequence of those elements. - The naive way to do it is to sort the sequence, then choose the value at the kth position in the sorted sequence. However, given the sorts we know so far, this takes at least O(n log n) time. - A better way to do it is to use a modification of quick sort, but only recurse on the partition in which you know the element is in. Algorithm: Choose a pivot p, use quick sort algorithm to partition the elements into sets of less than p, greater than p, and equal to p. Let l = # of elements in less than set, e = # of elements in equal set g = # of elements in greater than set. If l >= k, recurse on less than set. If l + e >= k, return the (k - l)th element in the equal set. Otherwise recurse on greater than set, with k = k - l - e. Ex: Find the 3rd smallest element in the following sequence: (16 51 24 87 85 78 35 26) Choose 26 as the pivot, partition: (16 24) 26 (51 87 85 78 35) l = 2 e = 1 g = 5 Since l + e = k, return k - l = 1st element in e, or 26. Ex: Find the 5th smallest element in the same sequence. Choose 26 as the pivot, partition: (16 24) 26 (51 87 85 78 35) l = 2 e = 1 g = 5 Since l + e < k, recurse on greater than set with k = k - l - e = 2. (51 87 85 78 35) Choose 35 as pivot, partition: () 35 (51 87 85 78) l = 0 e = 1 g = 4 Since l + e < k, recurse on greater than set with k = k - l - e = 1. (51 87 85 78) Choose 78 as pivot, partition: (51) 78 (87 85) l = 1 e = 1 g = 2 Since l >= k, recurse on less than set. (51) Choose 51 as pivot, partition: () 51 () l = 0 e = 1 g = 0 Since l + e >= k, return k - l = 1st element in equal set, or 51. Bucket Sort: - Linear time sorting on some inputs. - Given n elements in the range a to b, create b - a + 1 buckets (one for each value between a and b). Place each element in the bucket corresponding to its value. Scan all buckets in order for the elements in order. Ex: Sort (9 5a 7 0 3 2 4 5b 8 5c). The elements range from 0 to 9, so create 10 buckets. ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ | 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 | |_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_| | | | | | | | | | | V V V V V V V V V V Put all elements in their corresponding buckets. ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ | 0 || 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 | |_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_||_|_| | | | | | | | | | | V V V V V V V V V V 0 2 3 4 5a 7 8 9 5b 5c Enumerate through the buckets to get the sorted sequence. (0 2 3 4 5a 5b 5c 7 8 9) - Analysis: Placing all elements in their corresponding buckets takes n time, and scanning the buckets takes (a - b + 1) time. If (a - b + 1) is O(n), then total time for the sort is O(n). As you can see from the above example, bucket sort is stable. Radix Sort: - Bucket sort is not very good on inputs that have a wide range, since the number of buckets required will be very large and the running time will be dominated by the time to scan those buckets. - Instead, we bucket sort on each digit, starting from the least significant digit. Ex: Sort (8567 4486 7534 456 7675 134 8546 1370 4108 5743 56 54). We bucket sort on the last digit to get the following: (1370 5743 7534 134 54 7675 4486 456 8546 56 8567 4108) Then we bucket sort on the next digit: (4108 7534 134 5743 8546 54 456 56 8567 1370 7675 4486) Again on the next digit: (54 56 4108 134 1370 456 4486 7534 8546 8567 7675 5743) Again on the final digit: (54 56 134 456 1370 4108 4486 5743 7534 7675 8546 8567) And we're done. - The reason radix sort works is that bucket sort is stable, so when we sort a given digit, all elements with the same value for that digit retain the ordering they had before. - Analysis: Bucket sorting each digit takes n time, and if d is the number of digits, radix sort takes O(d*n) time. Skip Lists: - Represented by a two dimensional linked list, with each node having up and down pointers as well as next and prev. - The bottom level of the structure has all the elements, and each subsequent level probabilistically has half the elements of the previous level. - In all levels, negative infinity is all the way on the left and positive infinity is all the way on the right. - A pointer to the skip list points to the top left element. Ex: ---> -I<----->8<----------->I ^ ^ ^ | | | v v v -I<----->8<------>16<->I ^ ^ ^ ^ | | | | v v v v -I<->4<->8<->13<->16<->I - Searching: Start at top left. If the next element to the right is larger than the element you are looking for and you're at the bottom level, the element is not in the list. Otherwise if the next element is larger, go down and recurse. If it is equal to the element you're looking for, return the element there. Otherwise go right and recurse. Ex: Find 13 in the above list. * * ---> -I<----->8<----------->I ^ ^ ^ | | | v v* v -I<----->8<------>16<->I ^ ^ ^ ^ | | | | v v* * v v -I<->4<->8<->13<->16<->I - Insertion: Find the spot for the element in the list, insert the element at that spot. Now with a probability of 50%, insert the element at the correct spot in the next level, and recurse. To find the correct spot in the next level, move backwards at the current level until you can go up. Go up, and the correct spot is just to the right of that element. Ex: Insert 11 in the above list. First we find the correct spot: * * ---> -I<----->8<----------->I ^ ^ ^ | | | v v* v -I<----->8<------>16<->I ^ ^ ^ ^ | | | | v v* v v -I<->4<->8<->13<->16<->I ^ | here Insert it there. ---> -I<----->8<---------------->I ^ ^ ^ | | | v v v -I<----->8<----------->16<->I ^ ^ ^ ^ | | | | v v v v -I<->4<->8<->11<->13<->16<->I Flip a coin, if it's heads, insert 11 at the next level. OK, we got heads, so we insert 11 at the next level. We backtrack until we can go up, go up, and insert at the right. ---> -I<----->8<---------------->I ^ ^ here ^ | | | | v v* v v -I<----->8<----------->16<->I ^ ^ ^ ^ | | | | v v* * v v -I<->4<->8<->11<->13<->16<->I Insert it there. ---> -I<----->8<---------------->I ^ ^ ^ | | | v v v -I<----->8<->11<------>16<->I ^ ^ ^ ^ ^ | | | | | v v v v v -I<->4<->8<->11<->13<->16<->I Now we flip a coin again. This time we got tails, so we stop. - Removal: Find the element in the list, if it is there. Remove the tower for that element. Ex: Remove 16 from the above list. First we find 16. * * ---> -I<----->8<---------------->I ^ ^ ^ | | | v v* * * v -I<----->8<->11<------>16<->I ^ ^ ^ ^ ^ | | | | | v v v v v -I<->4<->8<->11<->13<->16<->I Now we remove the tower for 16. So we remove 16 from the current level, then go down and recurse, until we get to the bottom. Remove 16 from the current level. ---> -I<----->8<---------------->I ^ ^ ^ | | | v v v -I<----->8<->11<----------->I ^ ^ ^ ^ | | | | | v v v v v -I<->4<->8<->11<->13<->16<->I Go down and remove 16 again. ---> -I<----->8<---------------->I ^ ^ ^ | | | v v v -I<----->8<->11<----------->I ^ ^ ^ ^ | | | | v v v v -I<->4<->8<->11<->13<------>I Since we're at the bottom we're done. - We can store keyed objects in a skip list by storing the key in multiple levels but the data only at the bottom. Then when we search, we have to go all the way down to the bottom to retrieve the data. This isn't too bad, since we won't have many levels. - The number of levels is about log n since each level has about half as many elements as the previous. Searching, insertion, and removal are all log n operations.