8/8/02                                                          Amir Kamil

Topic: Minimax, B+ Trees, Threaded Trees

Announcement:

Minimax:
- You've seen the algorithm, since you did the lab on it yesterday.
  Forget the algorithm for now; let's try to come up with an intuitive
  way of choosing our next move.
- Let's try all possible moves, then recursively try all possible
  subsequent moves, until the game is over. We will keep track of who
  made each move.
      X = computer,  C = computer move
      O = human,     H = human move

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX            XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
        XOX      XOX              XO       XOO
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO                                  XXO
       XOX                                  XOO
       XOO                                  XOX

  Now we score all the final boards: HUMAN_WIN, COMP_WIN, or DRAW.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX     CW >>  XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
        XOX      XOX << HW  HW >> XO       XOO
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Of course, in this game, there can be no draw, so no boards are
  scored as such.

  Now, at each move, we assume that the person who is moving is
  brilliant and will pick the best move for him. So if it is the
  human's turn, he will pick a move that will result in a win for him;
  if that is impossible, then a draw; and if that is impossible, then
  he has no choice but to pick a losing move. The premise is that each
  player plays perfectly, so if a player can force a win/draw from a
  subboard of the current board, the current board might as well be the
  subboard. So we give the current board the best value of any of its
  subboards.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX     CW >>  XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     1) XOX      XOX << HW  HW >> XO       XOO (2
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  For boards 1 and 2, there is only one possible move for each, so the
  score becomes the same as the score of the next board.
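The scoring of finished boards used above can be sketched in code. This is a minimal sketch, not the lab's actual code: the 9-character-string board encoding and the name score_final are my own.

```python
# A board is a string of 9 cells, rows first; each cell is 'X', 'O', or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def score_final(board):
    """Score a finished board as COMP_WIN, HUMAN_WIN, or DRAW."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return 'COMP_WIN' if board[a] == 'X' else 'HUMAN_WIN'
    return 'DRAW'

# The bottom-left final board from the tree above: X fills the left column.
print(score_final('XXO'
                  'XOX'
                  'XOO'))   # -> COMP_WIN
```

The same function scores the human-win leaves (the O diagonal) and would return DRAW for a full board with no three-in-a-row, which this particular game never reaches.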
    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
        3)  XOX     CW >>  XO           XO  (4
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     CW XOX      XOX << HW  HW >> XO       XOO CW
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Now for boards 3 and 4, since it is the human's turn, we choose the
  move that is best for the human, and score the current boards the
  same as the best next board for him.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
        HW  XOX     CW >>  XO           XO  HW
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     CW XOX      XOX << HW  HW >> XO       XOO CW
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Now we are at the top level, so we choose the move that results in
  the best score for the computer player. That would be the second
  move, so we execute that move.
- The minimax algorithm does exactly this, but it only recurses on one
  subtree at a time. So it will recursively score the left subtree,
  then the middle one, then the right one, and return the best of them.

  Minimax(player)
    Let Move be an object corresponding to a move, and ScoredMove be an
    object corresponding to a Move and its score.
    Then:

      ScoredMove bestSoFar = new ScoredMove();   // default
      ScoredMove result;
      // If the game is over, return a fake move and the score
      If the state is a draw, return new ScoredMove(null, 0);
      If the state is a win for the computer, return new ScoredMove(null, 1);
      If the state is a win for the human, return new ScoredMove(null, -1);
      // We set scores initially out of range so as to ensure we will
      // get a move
      If player is computer, bestSoFar.score = -2;
      Else bestSoFar.score = 2;
      For each move m do:
        perform m;
        result = Minimax(next player);
        undo m;
        If it is the computer's turn and the result is better (higher)
        than bestSoFar:
          bestSoFar.move = m;                    // new best move
          bestSoFar.score = result.score;
        Else if it is the human's turn and the result is worse (lower)
        than bestSoFar:
          bestSoFar.move = m;
          bestSoFar.score = result.score;
      Return bestSoFar;

- Evaluation Function: The minimax game tree grows exponentially with
  depth: with branching factor b, the algorithm runs in roughly O(b^d)
  time to search to depth d. As you saw in the tic-tac-toe lab, running
  minimax on larger boards is very slow. In practice, we stop
  evaluating after a certain depth. However, not all boards at that
  depth will be game-over boards, so we have to come up with some way
  of assigning a value to them. We can use continuous instead of
  discrete values: when we think we are winning, we can assign the
  board a value between 0 and 1, and when we are losing, we can assign
  it a value between -1 and 0. The minimax algorithm will still work
  fine.
- Pruning: Since the minimax algorithm only examines one subtree at a
  time, once we find a subtree for which we attain the best possible
  score, we no longer have to check the other subtrees.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO       Don't have to check,
        HW  XOX     CW >>  XO        since we found a
             O             XO        best move.
         H /    \ H
          v      v
        XXO      XXO
     CW XOX      XOX << HW
         OO      OO
      C |
        v
       XXO << CW
       XOX
       XOO

- Alpha-Beta Pruning: We can do even better than the above pruning
  method if we keep track of the best score that each player can
  achieve so far. We call alpha the best score the computer can
  accomplish, and beta the best score the human can accomplish. Then if
  alpha and beta ever converge, we know that continuing to search the
  current subtree is useless.

  ABMinimax(player, alpha, beta)
    Let Move be an object corresponding to a move, and ScoredMove be an
    object corresponding to a Move and its score. Then:

      ScoredMove bestSoFar = new ScoredMove();   // default
      ScoredMove result;
      // If the game is over, return a fake move and the score
      If the state is a draw, return new ScoredMove(null, 0);
      If the state is a win for the computer, return new ScoredMove(null, 1);
      If the state is a win for the human, return new ScoredMove(null, -1);
      // We start from the best scores achieved so far, so we never
      // accept a move worse than one already available elsewhere
      If player is computer, bestSoFar.score = alpha;
      Else bestSoFar.score = beta;
      For each move m do:
        perform m;
        result = ABMinimax(next player, alpha, beta);
        undo m;
        If it is the computer's turn and the result is better (higher)
        than bestSoFar:
          bestSoFar.move = m;                    // new best move
          alpha = bestSoFar.score = result.score;
        Else if it is the human's turn and the result is worse (lower)
        than bestSoFar:
          bestSoFar.move = m;
          beta = bestSoFar.score = result.score;
        If alpha >= beta, return bestSoFar;
      Return bestSoFar;

  Why can we stop searching when alpha >= beta? Let's look at an
  example:

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = -2
             |   |    beta  =  2
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |   |       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |   |    down here
                 -----

  This could be a game tree for any game (tic-tac-toe, chess, etc.). We
  start by evaluating board C. Say that it scores 0.3. Since it was the
  human's move, we update beta to be 0.3.
                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = -2
             |   |    beta  = 0.3
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |   |    down here
                 -----

  Now we evaluate board D. First we evaluate its left subtree, E. Say
  it scores 0.7. Since it was the computer's turn, we update alpha to
  0.7.

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = 0.7
             |   |    beta  = 0.3
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |0.7|    down here
                 -----

  Now alpha and beta have converged, so we immediately set D to 0.7 and
  B to 0.3. Why? When the human is looking at board D, he knows that he
  can achieve a score of 0.3 with board C (min is better for him).
  After he evaluates board E, he knows that D is at best 0.7 for him,
  since if the [more stuff] evaluates to less than 0.7, the computer
  would choose E. So he sets the value of D to be 0.7. Then when the
  computer evaluates B, he knows that the human will choose C rather
  than D, since C is better for him. So he sets the value of B to 0.3.

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |
             |0.3|
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |0.7|
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |0.7|    down here
                 -----

  Final tree before evaluation of A's other children.
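The two pseudocode procedures above can be sketched as runnable Python for tic-tac-toe. This is my own translation, not the lab's code: boards are 9-character strings (rows first, 'X' for the computer, 'O' for the human, ' ' for empty), and the function names winner, minimax, and ab_minimax are assumptions. Since strings are immutable, building the next board replaces the pseudocode's "perform m / undo m" pair.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move): +1 computer (X) win, -1 human (O) win, 0 draw."""
    w = winner(board)
    if w is not None:
        return (1, None) if w == 'X' else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return (0, None)                       # board full, no winner: draw
    # Scores start out of range (-2 or 2) so the first move is always taken.
    best_score, best_move = (-2, None) if player == 'X' else (2, None)
    for m in moves:
        score, _ = minimax(board[:m] + player + board[m+1:],
                           'O' if player == 'X' else 'X')
        if (player == 'X' and score > best_score) or \
           (player == 'O' and score < best_score):
            best_score, best_move = score, m
    return (best_score, best_move)

def ab_minimax(board, player, alpha=-2, beta=2):
    """Alpha-beta version. Pruned interior nodes may return a bound rather
    than an exact score, but at the root it agrees with minimax."""
    w = winner(board)
    if w is not None:
        return (1, None) if w == 'X' else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return (0, None)
    best_move = None
    for m in moves:
        score, _ = ab_minimax(board[:m] + player + board[m+1:],
                              'O' if player == 'X' else 'X', alpha, beta)
        if player == 'X' and score > alpha:
            alpha, best_move = score, m
        elif player == 'O' and score < beta:
            beta, best_move = score, m
        if alpha >= beta:
            break                              # the two bounds converged
    return (alpha if player == 'X' else beta, best_move)

# The starting board from the minimax example: XXO / XO. / .O.
start = 'XXO' + 'XO ' + ' O '
print(minimax(start, 'X'))     # (1, 6): play the bottom-left square and win
print(ab_minimax(start, 'X'))  # same answer, with fewer boards examined
```

Index 6 is the bottom-left square, the winning second move found in the walkthrough above.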