20 Prerequisites to MiniMax and other
Bhushan Trivedi
Introduction
One of the oldest algorithms for game playing is MiniMax. This algorithm works on game trees described in the previous module and help the player to choose the best possible move considering as many plies it can. We have already learned about Max and Min moves as alternate moves in a two person complete information game tree. We would like to see how we can search using a search algorithm with the ability to realize the Min and the Max levels. The idea is to extend what we have learned during earlier modules. We will apply a simple trick, we will negate the heuristic value every alternate (Min) level. We will always choose the maximum value of heuristic function and thus making sure that the move taken takes highest value state at Max level and lowest value at Min level.
In previous module, we looked at three different types of nodes, W, L and D indicating win, loss and draw. Until the final decision is made, we can continue calling them as N or normal. We will elaborate what does these types of node mean for a game, what are prerequisite to applying any two player algorithm for choosing a right move here and learn enough to pick up the algorithm in the next module.
We will also look at how one can decide static evaluation function for the game and what type of components can be considered to be part of the function at the end of this module.
The process of MiniMax
Let us continue our discussion from the previous section. Let us again take the case of tic-tac-toe. The final state can be of three different types, win for us, win for the opponent, and a draw. Figure 19.1 illustrates one example each for these three cases.
Now onwards we will only show the states by W or L or D indicating so. All other states are incomplete and possible to be explored further, which we indicate as N. Let us try to learn the process of MiniMax by looking at an example. In this example we take four different types of node which we represent using four symbols. N nodes are normal nodes, W nodes are states which represent win for us, L represents state which indicates loss for us while D indicates draw for us1. Looking at this, one example game tree may be drawn in figure 19.2. Looking at which, can you suggest out of four possible moves, which move is better for the player?
- 1 Sometimes, the player-opponent is used to indicate two players, sometimes we-opponent is written to indicate the pair, sometimes player1-player2 are used.
There is a simple method by which one can determine a move which is better or not. It is called backward propagation. We begin from the leaves and determine if it is Min or Max. If Min, we pick up worst case and pass it up, otherwise pick up the best case and pass it up. Let us look at each of the figures 19.2. It shows a game tree which is again drawn partially. Most game playing programs have to deal with partial game trees as they do not have enough time to draw the game tree completely. Another point about that game tree is that not all children are generated for most nodes. It is due to the fact that we avoid some moves considering them not important. How have we decided that? For the time being assume that we have some plausible move generator which can decide that for us.
We proceed in two steps. First, we explore maximum levels of the game tree (using plausible move generator and thus avoid exploring some nodes). You can see in the figure 20.2 that we have chosen to explore 6 plies. How have we decided 6 plies? There are many factors, major one is available time to take this move, second is branching factor which is very high generally even with plausible move generators, and third criticality of the game. For a more critical part of the game, we need to explore more.
Once the game tree is constructed, we will examine the states of the final level in the second step. In our case depicted in 19.3, some leaf nodes are W, some are L, some are D while the rest are N (we cannot decide which will be result of playing that move). If we can explore game tree till the end, we will not have any node with N as every node is one of L, W or D. That is one of the ironies of the time requirement.
Closely observe how the node information is backed up. Look at figure 19.3. When there are two children, one of them is D and another is L, we will back up L. Why? Because it is Min level and opponent is playing it. He will prefer L (we losing) than D (draw). If we have L and W the opponent will invariably choose L. If we have D and N, the opponent will choose N as that might lead to L at later stages.
Figure 19.4 depicts the case where the values are backed up to one more level. What is the difference in previous backing up process and this? This is a Max level where we are going to play and thus we choose the best move for us. Thus if we have a choice between W and L or W and D, we will choose W, or otherwise if we have choice between L and D, we will choose D and so on.
Figure 19.5 depicts the case where all nodes are backed up. Look at some cases. In Min levels the worst state is chosen while in Max levels best state is chosen. Though there are some states which lead to win for us, exploring this game tree land us in limbo. All four moves that we can go for look the same. You can see that our exploration of the game tree has revealed a few things for us. For example if we pick up the second move from the left, there is one node which leads to win for us. Is not that a good idea to choose that move? Unfortunately the next move will be in the hands of the opponent and he will not choose that move. He might choose N or D (the other two options available to him). If he chooses the node with W, he will have one more chance next time, but at that time both possible moves lead to win for us and thus whatever he chooses, we are going to win.
However fruitless it seems (looking at the backed up node values being same), the backing up has an important advantage. We can see that no move is leading to lose for us. Another point is, we can still say that second move is little better than others as there is a chance of winning if opponent makes a mistake. How can we differentiate? We can differentiate it by using some static evaluation function.
That function might differentiate all N nodes by indicating the possibility of winning from that move based on domain information.
Thus, passing just the information about the node is leading us to a win, a loss or a draw is not enough. We must provide additional information about which node (which state and thus which move) has higher probability to win so we can choose so. For that we need the Static Evaluation Function, which is like heuristic function returns a value between -10 t0 10. -10 is defeat for us and 10 is win. Every other value indicates how good or a bad move it is.
Static Evaluation Function
After learning about the Min and Max moves and the game trees, let us throw some light on the static evaluation functions.
For the tic-tac-toe problem, we can have some rules for defining a good static evaluation function.
1. It should provide highest value for state where we win and least value for a state where we lose, so for the case that we take up here, -10 for loss and +10 for win.
2. It should give higher value when we have two consecutive cells occupied and third cell is empty
3. It should give lower value when the opponent has two consecutive cells occupied and third cell is empty
4. It should provide more value to a case where we have one cell occupied and other two cells are empty
5. It should provide higher value to a case where we have one cell occupied and it is possible to have more than one option to have consecutive 3 cells marked.
Let us try to understand these rules. The first rule is self-explanatory but also important for us to have. Now when we have occupied two consecutive cells and third one is empty, we are actually one move away from it, so it is a very good situation. Any move leading to such a state is definitely considered better than others. If opponent has similar structure, we are one move away from defeat so we should avoid such moves and thus we rate them low. A little less important is the case where we have one cell occupied and there is a possibility of others to be occupied.
Look at three contrived cases to emphasis on what we are trying to convey. Case 1 indicates three choices a, b and c. Here in a and c we take the same move, placing X in top center position; but in a, the top right is empty while in b, it is not. Thus move is better in condition a and not in b. Similarly c is not good as X is placed where we cannot have two X in sequence. This is indicated by first heuristic.
In the second case, we can see that opponent has two 0’s in place. The b is the only move that blocks it. Both a and c, whatever they do, do not stop opponent in winning. This explains the second heuristic. We should not move towards a case where opponent has chance to have two cells in sequence and third in line is empty.The last case C is the best option as we have X placed where we have multiple options for choosing a complete sequence to win. The other possible moves, a and b also provides option but only one possible sequence. The advantage of having multiple open sequence is that even if opponent blocks one, we can still go for another.
Figure 20.6 Three different cases.
If you are a smart player of tic-tac-toe, you probably have learned that the best move in the beginning is to occupy the central cell. Why? Because of rule number 5, we can have multiple (4) options to complete the sequence unlike any other case. If not central position, the next best position is a corner as it provides us two options compared to any other place which provides us only one.
Let us reiterate the understanding that the tic-tac-toe is still a very simple game where it is possible for one to draw a complete game tree, which is impossible in most other real world games like Chess, Go and Checkers. Thus, the strategy that we have following earlier (finding out if the move leads to win or lose or draw and back it up) does not really work there. We can only look at those states and determine which has more possibility of leading to win. We ought to have a static evaluation function to determine which move to choose next.
The move which appears to be better than other from the perspective of the player playing Max moves, get a better value from static evaluation function. That means the opponent chooses a move which reduces the value of static evaluation function. Unlike the node value which can be either of the W, L and D, the static evaluation function output is numeric and usually between some negative and positive values. We will use -10 and 10 as we did earlier, some authors use -1 and 1 but the meaning is same. In case the range is [-1,1], it is a real value between -1 and 1 including both (but usually only one digit after the decimal, i.e. 0.1,0.2,0.3 ). -1 is a lose for us while 1 is win. 0 indicates the equal chance for both of the players. Sounds great! Isn’t it? You may ask me, how do I get the value between -10 to 10 (or -1 to 1 or -1000 to 1000 or whatever suits you)? The real answer is, it is the value derived from expert’s intuition and judgment of the component of the states of the game and their relation to winning or losing the game. In most cases, the value of static evaluation function is derived from a weighted summation of those important components’ values. Components and their weights are decided by the experts. For example control of the center is one important component. An expert might decided that the weight of that component is 20 so we multiply the value of control of the center by 20. Another expert might suggest that threat of fork’s value is 35 so we multiply it by 35 and so on. In fact some components might even have negative weights if they help the opponent and not us. Thus, the static evaluation function is of the following form.
These components can be of two types. One determines the position of pieces on the board. If they occupy important positions, they are rated higher, if they occupy not so important positions and with limited accessibility (most pieces except Knight behind the pawn has this problem) have less points. Another type is about raw material value of the piece. Pawns are rated as lowest while queen at the highest. King is usually more than values of all the pieces, thus making sure that king should be saved even when all other pieces are to be sacrificed.
The positional components add dynamism to the game. For example a pawn might only be valued at 10 when it is at its usual position, much lesser to 1000 which the queen valued at its usual position. It is possible for the pawn to help check the king of the opponent in certain cases. (For example a case when the opponent king can only move to two positions, both of which can be the next killing position for the pawn under consideration. In this case the positional value of the pawn may be more than 1000). Closely look at the figure 20.7. The opponent king is under attack and check is provided by the white rook. The situation is such that the king cannot move anywhere (it is the end of the game). One can see the use of two white pawns. One of them, (with gray sides on left and right), guards two important positions which leaves the king with no option but to surrender. Here the positional power of this pawn is well beyond a queen which is not posing any threat to the king.
One may think that more components lead to better results. That means if you count more things for your heuristic function, considering smallest of issues far and wide, it is the best way of designing a heuristic function. It is not usually so. More components mean more computation and more time. If we can keep only those components which makes sense, it becomes easier and faster to compute and thus we can explore more states. The Deep Blue could only explore 6 to 7 plies deep despite the best possible hardware and software (it was running AIX OS and the program was written in C, making it a very good choice to code in those days. It was also a supercomputer3), and only sometimes to 20 plies or so. The better the heuristic value maps to the real value in shortest possible time it is better. That is why the Deep Blue team also had a grand master in their team to guide them.
2 Deep Blue had nearly 8000 components in its Static Evaluation Function
3 In June 1997, Deep Blue was the 259th most powerful supercomputer according to the TOP500 list, achieving
11.38 GFLOPS on the High-Performance LINPACK benchmark. (Source Wikipedia)
It is also interesting to note that not all experts agree on weights of a given component and not all experts even agree on choice of components themselves. The Deep Blue was specifically designed by IBM to beat Gary and had many moves Gary played in past completely embedded into it. The components, in short, designed to be specific for playing against Gary. That might not work against some other player4
If one needs to design a computer which can generally play chess and win against any world champion, it would be a much more difficult thing as it would need more general components and much different set of weights. It should also have a capability to decide about the mental model of the opponent and decide about the strategy and plan that he generally plays his game.
For example if we are playing against a player who is very good at playing fork, we must assign higher weight to threat of fork. Look at figure 19.8. You can see that the white knight can attack both the king and the queen together. We can only move one piece in the next move (assuming we are playing black). So the knight will always be in a position to capture one of these two in the next move. We will have no choice but to move the king and let the queen be captured. Even if the knight is captured in the next move, the opponent will have a huge piece advantage after that move. If our opponent is good at finding out ways to reach to such board positions, it is always better if we consider this as a much more weighted component in our play.
There are quite a few other components, one important one is degree of freedom we have for a given piece. Consider the case depicted in figure 19.9. The black king is under attack but queen cannot help as it is not able to capture the opponent’s knight is attacking. What is the reason? The black pawn which stands in the middle hinders the movement. It is important to choose the moves where more important pieces have more mobility and freedom to move around. Thus if the player is planning to have this move, he should probably initiate some other move earlier to remove its own pawn from the path.
4 It is quite possible that a novice may start playing with deep blue and confuse it by playing some real stupid moves which, when deep blue try to compare with Gary’s moves, does not have a clue!
There are many similar components one may consider for deciding about the components and their weights based on positions and mobility and so on and thus generating a good static evaluation function for a complicated game like chess is not a trivial job.
Summary
The process of Minimax involves deciding if a move is leading to win or loss or a draw. When a game tree is derived, the leaves are evaluated and the values are backed up based on Min or Max layers. It is not enough to just learn that a node is either leading to win or lose, as it lands us in no man’s land. It is important to assign some real value to each move to indicate the chances of win for us if we take that move. Static evaluation function helps us decide the goodness associated with a move. We have seen some important considerations for finding out a good static evaluation function for a simple game of tic tac toe and a complex game of chess. Center control, threat of fork, degree of freedom and piece advantage are a few important components for a static evaluation function. Usually a static evaluation function is a weighted summation of such components with weights assigned by experts. Having a good static evaluation function for a complex game is a task in itself.
you can view video on Prerequisites to MiniMax and other |