8/8/02                                                          Amir Kamil

Topic: Minimax, B+ Trees, Threaded Trees

Announcement:

Minimax:
- You've seen the algorithm, since you did the lab on it yesterday.
  Forget the algorithm for now; let's try to come up with an intuitive
  way of choosing our next move.
- Let's try all possible moves, then recursively try all possible
  subsequent moves, until the game is over. We will keep track of who
  made each move.
      X = computer,  C = computer move
      O = human,     H = human move

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX            XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
        XOX      XOX              XO       XOO
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO                                  XXO
       XOX                                  XOO
       XOO                                  XOX

  Now we score all the final boards: HUMAN_WIN, COMP_WIN, or DRAW.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX     CW >>  XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
        XOX      XOX << HW  HW >> XO       XOO
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Of course, in this game, there can be no draw, so no boards are
  scored as such.

  Now, at each move, we assume that the person who is moving is
  brilliant and will pick the best move for him. So if it is the
  human's turn, he will pick a move that will result in a win for him;
  if that is impossible, then a draw; and if that is impossible, then
  he has no choice but to pick a losing move. The premise is that each
  player plays perfectly, so if a player can force a win/draw from a
  subboard of the current board, the current board might as well be the
  subboard. So we give the current board the best value of any of its
  subboards.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
            XOX     CW >>  XO           XO
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     1) XOX      XOX << HW  HW >> XO       XOO (2
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  For boards 1 and 2, there is only one possible move for each, so the
  score becomes the same as the score of the next board.
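The scoring of finished boards used above can be sketched in code. This is a minimal sketch, not the lab's actual code: the 9-character-string board encoding and the name score_final are my own.

```python
# A board is a string of 9 cells, rows first; each cell is 'X', 'O', or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def score_final(board):
    """Score a finished board as COMP_WIN, HUMAN_WIN, or DRAW."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return 'COMP_WIN' if board[a] == 'X' else 'HUMAN_WIN'
    return 'DRAW'

# The bottom-left final board from the tree above: X fills the left column.
print(score_final('XXO'
                  'XOX'
                  'XOO'))   # -> COMP_WIN
```

The same function scores the human-win leaves (the O diagonal) and would return DRAW for a full board with no three-in-a-row, which this particular game never reaches.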
    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
        3)  XOX     CW >>  XO           XO  (4
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     CW XOX      XOX << HW  HW >> XO       XOO CW
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Now for boards 3 and 4, since it is the human's turn, we choose the
  move that is best for the human, and score the current boards the
  same as the best next board for him.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO          XXO
        HW  XOX     CW >>  XO           XO  HW
             O             XO            OX
         H /    \ H                H /    \ H
          v      v                  v      v
        XXO      XXO              XXO      XXO
     CW XOX      XOX << HW  HW >> XO       XOO CW
         OO      OO               OOX       OX
      C |                                    | C
        v                                    v
       XXO << CW                      CW >> XXO
       XOX                                  XOO
       XOO                                  XOX

  Now we are at the top level, so we choose the move that results in
  the best score for the computer player. That would be the second
  move, so we execute that move.
- The minimax algorithm does exactly this, but it only recurses on one
  subtree at a time. So it will recursively score the left subtree,
  then the middle one, then the right one, and return the best of them.

  Minimax(player)
    Let Move be an object corresponding to a move, and ScoredMove be an
    object corresponding to a Move and its score.
    Then:

      ScoredMove bestSoFar = new ScoredMove();   // default
      ScoredMove result;
      // If the game is over, return a fake move and the score
      If the state is a draw, return new ScoredMove(null, 0);
      If the state is a win for the computer, return new ScoredMove(null, 1);
      If the state is a win for the human, return new ScoredMove(null, -1);
      // We set scores initially out of range so as to ensure we will
      // get a move
      If player is computer, bestSoFar.score = -2;
      Else bestSoFar.score = 2;
      For each move m do:
        perform m;
        result = Minimax(next player);
        undo m;
        If it is the computer's turn and the result is better (higher)
        than bestSoFar:
          bestSoFar.move = m;                    // new best move
          bestSoFar.score = result.score;
        Else if it is the human's turn and the result is worse (lower)
        than bestSoFar:
          bestSoFar.move = m;
          bestSoFar.score = result.score;
      Return bestSoFar;

- Evaluation Function: The minimax game tree grows exponentially with
  depth: with branching factor b, the algorithm runs in roughly O(b^d)
  time to search to depth d. As you saw in the tic-tac-toe lab, running
  minimax on larger boards is very slow. In practice, we stop
  evaluating after a certain depth. However, not all boards at that
  depth will be game-over boards, so we have to come up with some way
  of assigning a value to them. We can use continuous instead of
  discrete values: when we think we are winning, we can assign the
  board a value between 0 and 1, and when we are losing, we can assign
  it a value between -1 and 0. The minimax algorithm will still work
  fine.
- Pruning: Since the minimax algorithm only examines one subtree at a
  time, once we find a subtree for which we attain the best possible
  score, we no longer have to check the other subtrees.

    starting board:  XXO
                     -------
                     XO
                     -------
                      O
              C /         C |         \ C
               v            v          v
            XXO            XXO       Don't have to check,
        HW  XOX     CW >>  XO        since we found a
             O             XO        best move.
         H /    \ H
          v      v
        XXO      XXO
     CW XOX      XOX << HW
         OO      OO
      C |
        v
       XXO << CW
       XOX
       XOO

- Alpha-Beta Pruning: We can do even better than the above pruning
  method if we keep track of the best score that each player can
  achieve so far. We call alpha the best score the computer can
  accomplish, and beta the best score the human can accomplish. Then if
  alpha and beta ever converge, we know that continuing to search the
  current subtree is useless.

  ABMinimax(player, alpha, beta)
    Let Move be an object corresponding to a move, and ScoredMove be an
    object corresponding to a Move and its score. Then:

      ScoredMove bestSoFar = new ScoredMove();   // default
      ScoredMove result;
      // If the game is over, return a fake move and the score
      If the state is a draw, return new ScoredMove(null, 0);
      If the state is a win for the computer, return new ScoredMove(null, 1);
      If the state is a win for the human, return new ScoredMove(null, -1);
      // We start from the best scores achieved so far, so we never
      // accept a move worse than one already available elsewhere
      If player is computer, bestSoFar.score = alpha;
      Else bestSoFar.score = beta;
      For each move m do:
        perform m;
        result = ABMinimax(next player, alpha, beta);
        undo m;
        If it is the computer's turn and the result is better (higher)
        than bestSoFar:
          bestSoFar.move = m;                    // new best move
          alpha = bestSoFar.score = result.score;
        Else if it is the human's turn and the result is worse (lower)
        than bestSoFar:
          bestSoFar.move = m;
          beta = bestSoFar.score = result.score;
        If alpha >= beta, return bestSoFar;
      Return bestSoFar;

  Why can we stop searching when alpha >= beta? Let's look at an
  example:

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = -2
             |   |    beta  =  2
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |   |       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |   |    down here
                 -----

  This could be a game tree for any game (tic-tac-toe, chess, etc.). We
  start by evaluating board C. Say that it scores 0.3. Since it was the
  human's move, we update beta to be 0.3.
                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = -2
             |   |    beta  = 0.3
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |   |    down here
                 -----

  Now we evaluate board D. First we evaluate its left subtree, E. Say
  it scores 0.7. Since it was the computer's turn, we update alpha to
  0.7.

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |    alpha = 0.7
             |   |    beta  = 0.3
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |   |
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |0.7|    down here
                 -----

  Now alpha and beta have converged, so we immediately set D to 0.7 and
  B to 0.3. Why? When the human is looking at board D, he knows that he
  can achieve a score of 0.3 with board C (min is better for him).
  After he evaluates board E, he knows that D is at best 0.7 for him,
  since if the [more stuff] evaluates to less than 0.7, the computer
  would choose E. So he sets the value of D to be 0.7. Then when the
  computer evaluates B, he knows that the human will choose C rather
  than D, since C is better for him. So he sets the value of B to 0.3.

                 -----
              A  |   |    start
                 |   |
                 -----
                /
         comp  /
              /
             -----
          B  |   |
             |0.3|
             -----
            /     \
     human /       \ human
          /         \
         -----       -----
      C  |   |    D  |   |
         |0.3|       |0.7|
         -----       -----
                    /     \
             comp  /       \ comp
                  /         \
                 -----
              E  |   |    more stuff
                 |0.7|    down here
                 -----

  Final tree before evaluation of A's other children.
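The two pseudocode procedures above can be sketched as runnable Python for tic-tac-toe. This is my own translation, not the lab's code: boards are 9-character strings (rows first, 'X' for the computer, 'O' for the human, ' ' for empty), and the function names winner, minimax, and ab_minimax are assumptions. Since strings are immutable, building the next board replaces the pseudocode's "perform m / undo m" pair.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move): +1 computer (X) win, -1 human (O) win, 0 draw."""
    w = winner(board)
    if w is not None:
        return (1, None) if w == 'X' else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return (0, None)                       # board full, no winner: draw
    # Scores start out of range (-2 or 2) so the first move is always taken.
    best_score, best_move = (-2, None) if player == 'X' else (2, None)
    for m in moves:
        score, _ = minimax(board[:m] + player + board[m+1:],
                           'O' if player == 'X' else 'X')
        if (player == 'X' and score > best_score) or \
           (player == 'O' and score < best_score):
            best_score, best_move = score, m
    return (best_score, best_move)

def ab_minimax(board, player, alpha=-2, beta=2):
    """Alpha-beta version. Pruned interior nodes may return a bound rather
    than an exact score, but at the root it agrees with minimax."""
    w = winner(board)
    if w is not None:
        return (1, None) if w == 'X' else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return (0, None)
    best_move = None
    for m in moves:
        score, _ = ab_minimax(board[:m] + player + board[m+1:],
                              'O' if player == 'X' else 'X', alpha, beta)
        if player == 'X' and score > alpha:
            alpha, best_move = score, m
        elif player == 'O' and score < beta:
            beta, best_move = score, m
        if alpha >= beta:
            break                              # the two bounds converged
    return (alpha if player == 'X' else beta, best_move)

# The starting board from the minimax example: XXO / XO. / .O.
start = 'XXO' + 'XO ' + ' O '
print(minimax(start, 'X'))     # (1, 6): play the bottom-left square and win
print(ab_minimax(start, 'X'))  # same answer, with fewer boards examined
```

Index 6 is the bottom-left square, the winning second move found in the walkthrough above.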