7/16/02
Amir Kamil

Topic: BSTs, Heaps, Algorithm Analysis, Huffman Encoding

Binary Trees and BSTs:

- We'll look at BSTs that hold integers as values, and use their values as
  keys. We will also ignore duplicate keys.

- Every key to the left of a node must be less than that node's key, and
  every key to the right of a node must be greater than that node's key.

- Insertion into a BST: The algorithm is very simple. Start at the root. If
  the key to insert is less than the current node's key, recurse on the left
  subtree; otherwise recurse on the right subtree. When you reach null, add
  the new node there.

  Ex: Insert 6 into the following tree:

        4
       / \
      3   8
     /   / \
    1   5   11

  Run the algorithm, with the current node denoted by *.

        4*
       / \
      3   8
     /   / \
    1   5   11

  Since 6 > 4, recurse on the right subtree.

        4
       / \
      3   8*
     /   / \
    1   5   11

  Since 6 < 8, recurse on the left subtree.

        4
       / \
      3   8
     /   / \
    1   5*  11

  Since 6 > 5, recurse on the right subtree.

        4
       / \
      3   8
     /   / \
    1   5   11
         \
          *

  Now we have reached null, so we add 6 at the current spot.

        4
       / \
      3   8
     /   / \
    1   5   11
         \
          6

- Searching in a BST is the same as insertion, except you stop when you find
  the node you're looking for (it has a key equal to the one you want). If
  you reach null, then the key is not in the BST.

- Removal from a BST: There are three cases: the node to be removed has no
  children, it has one child, or it has two children.

  Case 1: the node has no children. Just replace the node with null.

  Ex: Removing 11 from the above tree results in:

        4
       / \
      3   8
     /   /
    1   5
         \
          6

  Case 2: the node has one child. Replace the node with its child.

  Ex: Removing 8 from the above tree results in:

        4
       / \
      3   5
     /     \
    1       6

  Case 3: the node has two children. Replace the node with the leftmost
  descendant of its right child.

  Ex: Remove 4 from:

        4
       / \
      3   8
     /   / \
    1   5   11
         \
          6

  The leftmost descendant of 4's right child is 5.
  So we replace 4 with 5, and move 6 up to where 5 was:

        5
       / \
      3   8
     /   / \
    1   6   11

  Now if we remove 8, we get:

        5
       / \
      3   11
     /   /
    1   6

- Balance: A binary tree is balanced if the difference in height between its
  subtrees is at most 1, and its subtrees are themselves balanced.

        4
       / \
      3   5
     /     \
    1       6

    balanced

          4
         / \
        3   5
       /     \
      2       6
     /         \
    1           7

    not balanced

          6
         / \
        3   7
       / \   \
      2   5   8
     /
    1

    balanced

  Algorithm:

    isBalanced(node) =
      |height(node.left) - height(node.right)| <= 1
        and isBalanced(node.left)
        and isBalanced(node.right)

- Completeness: A binary tree is complete if all levels except the last are
  full, and the nodes in the last level are all the way to the left.

        4
       / \
      3   5
     /     \
    1       6

    not complete

        5
       / \
      2   6
     / \  /
    1   3 7

    complete

          6
         / \
        3   7
       / \   \
      2   5   8
     /
    1

    not complete

  Algorithms for computing completeness involve a breadth-first search on
  the tree. The one I wrote for HW3 used the fact that a complete binary
  tree can be represented compactly using an array, with no empty cells
  between elements:

    Convert the tree into an array.
    If there are gaps between elements, the tree is not complete.
    Else the tree is complete.

Heaps:

- Heaps are of type min-heap or max-heap. In a min-heap, each node in the
  tree has priority less than each of its children. In a max-heap, each
  node in the tree has priority greater than each of its children.

- We will use a max-heap of ints.

- A heap is implemented using a complete binary tree, so we can represent
  it using an array. For a node located at position n, its children are at
  positions 2n+1 and 2n+2.

  Ex:

        7
       / \
      3   6
     / \  /
    1   2 5

  The root is at position 0.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 7 || 3 || 6 || 1 || 2 || 5 ||   ||   |
     ---  ---  ---  ---  ---  ---  ---  ---

- Insertion into a heap: We insert at the bottom of the heap, then reheapify
  up. We reheapify up by comparing the new item with its parent; if the new
  item is larger, we swap it with its parent and recurse, otherwise we stop.

  Ex: Add 8 to the above heap. We stick it at the bottom.

        7
       / \
      3   6
     / \  / \
    1   2 5  8

  Since 8 > 6, we swap them.
        7
       / \
      3   8
     / \  / \
    1   2 5  6

  Since 8 > 7, we swap them.

        8
       / \
      3   7
     / \  / \
    1   2 5  6

  Now we're done. The array representation:

     ---  ---  ---  ---  ---  ---  ---  ---
    | 8 || 3 || 7 || 1 || 2 || 5 || 6 ||   |
     ---  ---  ---  ---  ---  ---  ---  ---

  Now let's add 4 to the heap. We'll work directly on the array this time.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 8 || 3 || 7 || 1 || 2 || 5 || 6 || 4 |
     ---  ---  ---  ---  ---  ---  ---  ---

  The parent of a node at position k is given by floor((k-1)/2). Since 4 is
  at position 7, its parent is at floor((7-1)/2) = 3. Since 4 > 1, we swap
  them.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 8 || 3 || 7 || 4 || 2 || 5 || 6 || 1 |
     ---  ---  ---  ---  ---  ---  ---  ---

  Now 4 is at position 3, so its parent is at floor((3-1)/2) = 1. Since
  4 > 3, we swap them.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 8 || 4 || 7 || 3 || 2 || 5 || 6 || 1 |
     ---  ---  ---  ---  ---  ---  ---  ---

  Now 4 is at position 1, so its parent is at floor((1-1)/2) = 0. Since
  4 < 8, we're done.

- Removing from the heap: We only need to bother with removing the top of
  the heap, since that's why we're using a heap in the first place. So we
  remove the top item in the heap, replacing it with the bottom-most item
  in the heap (the one all the way at the end in the array representation).
  Then we reheapify down: we compare the new top item with its children,
  swap it with its largest child if that child is larger than the item, and
  recurse. Otherwise we stop.

  Ex: Remove 8 from the above heap. We remove 8 and replace it by 1.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 1 || 4 || 7 || 3 || 2 || 5 || 6 ||   |
     ---  ---  ---  ---  ---  ---  ---  ---

  Now we compare 1 to its children, 4 and 7. 7 is the largest, so we swap
  1 and 7.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 7 || 4 || 1 || 3 || 2 || 5 || 6 ||   |
     ---  ---  ---  ---  ---  ---  ---  ---

  Again we compare 1 to its children, 5 and 6. 6 is the largest, so we swap
  1 and 6.

     ---  ---  ---  ---  ---  ---  ---  ---
    | 7 || 4 || 6 || 3 || 2 || 5 || 1 ||   |
     ---  ---  ---  ---  ---  ---  ---  ---

  We're done.
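The insertion and removal steps above can be sketched as an array-backed max-heap. This is a rough sketch, not code from the course; the class and method names (MaxHeap, insert, removeMax) are my own.

```java
// A max-heap of ints backed by an array, using the layout from the notes:
// the children of position k are at 2k+1 and 2k+2, and the parent of
// position k is at floor((k-1)/2).
public class MaxHeap {
    private int[] items = new int[8];
    private int size = 0;

    // Insert at the bottom of the heap, then reheapify up.
    public void insert(int x) {
        if (size == items.length) {              // grow the array when full
            int[] bigger = new int[2 * items.length];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size] = x;
        int k = size++;
        while (k > 0 && items[k] > items[(k - 1) / 2]) {
            swap(k, (k - 1) / 2);                // larger than parent: swap up
            k = (k - 1) / 2;
        }
    }

    // Remove the top item (assumes the heap is non-empty): replace it with
    // the bottom-most item, then reheapify down.
    public int removeMax() {
        int top = items[0];
        items[0] = items[--size];
        int k = 0;
        while (2 * k + 1 < size) {               // while k has a child
            int child = 2 * k + 1;               // left child
            if (child + 1 < size && items[child + 1] > items[child]) {
                child++;                         // right child is larger
            }
            if (items[k] >= items[child]) {
                break;                           // heap property restored
            }
            swap(k, child);
            k = child;
        }
        return top;
    }

    private void swap(int i, int j) {
        int t = items[i]; items[i] = items[j]; items[j] = t;
    }
}
```

Inserting 7, 3, 6, 1, 2, 5, then 8 and 4, reproduces the arrays worked out above.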
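Looking back at the BST section, the insertion and search algorithms can be sketched the same way. Again a sketch under the notes' assumptions (int keys, duplicates ignored); the names (BST, insert, contains) are mine, not the course's.

```java
// A BST of ints that uses the values as keys and ignores duplicates.
public class BST {
    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Walk down from the root, going left on smaller keys and right on
    // larger ones, and attach the new node where we reach null.
    public void insert(int key) {
        root = insert(root, key);
    }

    private Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;                     // equal key: duplicate, ignore it
    }

    // Search is the same walk, stopping when we find the key or hit null.
    public boolean contains(int key) {
        Node node = root;
        while (node != null) {
            if (key == node.key) return true;
            node = (key < node.key) ? node.left : node.right;
        }
        return false;
    }
}
```

Inserting 4, 3, 8, 1, 5, 11 and then 6 builds exactly the tree from the insertion walkthrough.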
Algorithm Analysis:

- General algorithm for algorithm analysis: Let A be the expression you want
  to analyze. Then A is composed of subexpressions A1, A2, A3, ... that
  execute c1, c2, c3, ... times each. Then

    O[A] = c1 * O[A1] + c2 * O[A2] + c3 * O[A3] + ...

  where O[X] represents the running time of X.

  Ex:

    int increment(int n) {
      return n + 1;
    }

  This function has one subexpression, which takes constant time to execute
  and executes only once. So c1 = 1, O[A1] = c, and O[increment] = 1 * c = c,
  so increment is constant in n.

  Ex:

    int factorial(int n) {
      int result = 1;
      for (int i = n; i > 0; i--) {
        result *= i;
      }
      return result;
    }

  This function has a subexpression (result *= i) that takes constant time
  to execute, and this subexpression is executed n times. So c1 = n,
  O[A1] = c, and O[factorial] = n * c = cn. So factorial is linear in n.

  Ex:

    int foo(int n) {
      int x = 0;
      for (int i = 0; i < n; i++) {
        x += i;
      }
      for (int i = 1; i < n/2; i++) {
        x *= i;
      }
      return x;
    }

  In this case, c1 = n, O[A1] = c, c2 = n/2, O[A2] = c. Then
  O[foo] = n * c + n/2 * c = 1.5cn. So foo is linear in n. Remember that
  constant factors don't count in determining asymptotic behavior.

  Ex:

    int bar(int n) {
      int x = 0;
      for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
          x += 1;
        }
      }
      return x;
    }

  This function has one subexpression, the inner loop, that executes n
  times. What is the running time of the subexpression? The inner loop has
  one subexpression of its own that is constant and executes n - i times,
  so the running time of the inner loop is c * (n - i). Now the problem
  with determining the running time of the function is that i varies. But
  we can make estimates, as long as the estimates are greater than the
  actual value, so let's assume that the running time of the inner loop is
  cn. The inner loop executes n times, so the total running time is at most
  n * cn = cn^2. So the function is quadratic in n.
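We can sanity-check the estimate for bar by counting how many times its inner statement actually runs: the exact count is n + (n-1) + ... + 1 = n(n+1)/2, which is below our estimate of n^2 but still quadratic in n. (countBar is my own instrumented copy of bar, not from the notes.)

```java
// Counts executions of bar's inner statement for a given n.
public class BarCount {
    static int countBar(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                count += 1;              // stands in for x += 1
            }
        }
        return count;                    // equals n * (n + 1) / 2
    }
}
```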
- Running time of recursive algorithms: It is much harder to use the above
  algorithm on recursive functions, since you don't know the running time of
  the recursive call. So instead, start at the base case and work your way
  up until you can figure out the running time.

  Ex:

    int factorial2(int n) {
      if (n == 0) {
        return 1;
      } else {
        return n * factorial2(n - 1);
      }
    }

  The base case, n == 0, is constant. Each recursion of this function is
  also constant except for the recursive call, since there is only a
  multiplication and a subtraction. So if you recurse n times, you take
  n * c = cn time to compute the function.

  General method:
    a) Figure out how much work is done in each recursion, excepting the
       recursive call.
    b) Figure out how many recursions are needed.
    c) Running time = # recursions * time/recursion

Huffman Trees:

- Good explanation at http://www.crs4.it/~luigi/MPEG/huffman_tutorial.html
  The link is on my website.
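To check step (b) of the recursive-analysis method above, factorial2 can be instrumented with a call counter: for input n there are n recursive calls plus one base-case call, and each does constant work, giving linear time. (The counter is added here for illustration and is not part of the notes.)

```java
// factorial2 from the notes, with a counter tracking the number of calls.
public class Factorial2Count {
    static int calls = 0;

    static int factorial2(int n) {
        calls++;                         // one unit of constant work per call
        if (n == 0) {
            return 1;
        } else {
            return n * factorial2(n - 1);
        }
    }
}
```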