21 MiniMax algorithm

Bhushan Trivedi

epgp books

 

Introduction

 

Now we are ready to see how the MiniMax algorithm solves the game playing problem. The MiniMax algorithm works as we discussed in the previous module: it recursively calls itself with a ply value one more than the previous call, passing the opponent of the current player as the player value. The algorithm is essentially a depth-limited depth first search. It starts from the initial position, generates children using a plausible move generator and moves to the next level. The player is then reversed (as the opponent takes the next move) and the process repeats until a signal to stop arrives. When that signal is received, the static evaluation function is applied to the leaves, the values are backed up, and the best move at the moment is returned. The MiniMax algorithm can be improved by a method known as alpha-beta cutoff. This cutoff is based on our study of branch and bound in module 12: any branch not leading to an optimized solution, i.e. where the cost of the partial solution exceeds the known path cost, is pruned from the game tree.

 

Functioning of MiniMax Algorithm

Let us try to understand how the algorithm works with a few examples.

Figure 20.3 two ply search -2 backing the value up

 

Look at figures 20.1, 20.2 and 20.3. The player starts from state A. A has four children generated by the plausible move generator: B, C, D and E. When the player tries a one ply lookahead, as depicted in figure 20.1, state C looks most promising and the player might be tempted to choose it. The two ply deep search is depicted in figures 20.2 and 20.3. In a two ply search, two plies are generated first (again using the plausible move generator). The static evaluation function is applied to the nodes F to N. Now from F we can achieve 5, but from G the possibility is only -3. Similarly, H is the best move, yielding 7, if we choose to play C. The last two subtrees do not look promising at first sight, as all their values are negative, favoring the opponent. Interestingly, the second move is taken by the opponent, and he will choose a node which minimizes our chances; so instead of F he would choose G if we play B. If we play C, he would choose I and not H, and similarly for D and E. That means if we play B, we end up at -3, and if we play C, we may nearly lose, ending up at -9. Now the other two moves sound more promising. In fact, the move which would look worst under a one ply search seems best under a two ply search. This kind of upheaval is quite possible, particularly in the middle of a piece exchange. For example, when we have captured an important piece of the opponent (say a rook or a queen), we seem to have achieved a big material advantage. If the opponent is able to capture our important piece in return, the honors are even, and the move which looked so great no longer does.

 

MiniMax Algorithm

It is time now to look at the formal algorithm.

 

We will assume two entities which we have discussed in the previous module: the plausible move generator and the static evaluation function. We assume players W and B (W plays White while B plays Black, two common names in Chess parlance). We also assume the static evaluation function SEF returns an evaluation based on the winning chances of W. Player B therefore chooses the minimum valued state while player W chooses the maximum valued state. We assume the function PMG (Plausible Move Generator) returns the moves that can be made from the current state. The current state is passed to the algorithm (which is recursive). The first call to the algorithm takes the state the player is in at the moment he has to decide the next move. As we are deploying a recursive algorithm, let us have a brief introduction to recursion before we present it.

 

Every recursive procedure has three phases:

a) Winding,

b) Stopping criteria and recursive call

c) Unwinding

 

The winding part happens before the recursive call; it sets the tone for the procedure to do its job. Technically, this part happens earlier in the calling function and later in the called function. Next come the stopping criteria and the recursive call. The final part is unwinding, which is the most critical; it happens earlier in the called function and later in the calling function.

 

Our concern here is to have some terminating criterion for our algorithm. The terminating criterion depends on many things: one of the players has won, we have explored enough plies to get a fairly accurate measure of the quality of the next move, time is running out, or we are no longer progressing (exploring further does not change the situation much). If the situation is not changing, for example the best path remains the same, we can stop exploring. We will assume the function Over() returns true when any one of these terminating conditions is satisfied.
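The conditions above can be combined into a single predicate. The following is a hedged Python sketch; the function and parameter names (`over`, `won`, `drawn`, `max_depth`, `time_limit`) are illustrative assumptions, not part of the text's algorithm.

```python
import time

def over(state, depth, start_time, max_depth=6, time_limit=5.0,
         won=lambda s: False, drawn=lambda s: False):
    """Return True when any terminating condition from the text holds."""
    if won(state) or drawn(state):               # one of the players has won, or a draw
        return True
    if depth >= max_depth:                       # explored enough plies
        return True
    if time.time() - start_time > time_limit:    # time is getting over
        return True
    return False
```

In practice a real game program would also compare successive best paths to detect the "not progressing" case, which is omitted here for brevity.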

 

Another point is what the algorithm should return when it terminates. Our algorithm returns two things: first, the complete path to the leaf node under consideration; second, the value backed up from that leaf. Both are part of a single object of type NodeInfo which the algorithm returns.

 

We have a variable Path which stores the path; the unwinding part of the algorithm accumulates it. The returning function adds the current state to the path, so the path is accumulated from the leaf back to the first state. We also have SEF (Static Evaluation Function), which we have been referring to so far.

 

The algorithm describes neither the plausible move generator nor the static evaluation function. As mentioned in earlier modules, both depend on the particular game being programmed. The approach chosen here helps us build a game-independent solution which we can tailor to a given game by providing the SEF and PMG for that particular game.

 

After this brief introduction to recursive procedures, let us introduce the formal algorithm.

Minimax (CurrentState, CurrentDepth, CurrentPlayer)

 

1. If Over (CurrentState, CurrentPlayer) // reached a leaf

{

NodeInfo NF;

NF.Path = NULL;

NF.Value = SEF(CurrentState, CurrentPlayer);

return NF;

}

2. Else

ChildrenStateList = PMG(CurrentState, CurrentPlayer) // generate children

3. If ChildrenStateList == [] // if no children, it is the same as a leaf

{

NodeInfo NF;

NF.Path = NULL;

NF.Value = SEF(CurrentState, CurrentPlayer);

return NF;

}

4. Else

Best = MinSEFValue // initialize Best to the minimum (-10 in our case)

for each Child in ChildrenStateList do

{

NodeInfo ChildValues = MiniMax(Child, CurrentDepth + 1, 1 - CurrentPlayer);

ChildSEF = (-1) * ChildValues.Value;

If Best < ChildSEF

{

Best = ChildSEF; // set Best to the better value

Path = Child + ChildValues.Path;

}

}

5. NodeInfo NF;

NF.Path = Path;

NF.Value = Best;

return NF;
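The pseudocode can be rendered as a short, runnable Python sketch. The tree, scores and helper names below are illustrative assumptions (a tiny hand-built game tree standing in for a real game's PMG and SEF); the control flow mirrors the five steps above, including the negation of the backed-up value at each level.

```python
from typing import NamedTuple

class NodeInfo(NamedTuple):
    path: tuple   # states from this node down to the chosen leaf
    value: int    # backed-up score from the current player's viewpoint

# Hypothetical game tree and leaf scores (from player 0's point of view)
TREE = {"A": ["B", "C"], "B": ["F", "G"], "C": ["H", "I"]}
SCORES = {"F": 5, "G": -3, "H": 7, "I": -9}

def pmg(state, player):
    """Plausible move generator: children of a state."""
    return TREE.get(state, [])

def sef(state, player):
    """Static evaluation function: negate for the opponent's viewpoint."""
    score = SCORES.get(state, 0)
    return score if player == 0 else -score

def minimax(state, depth, player, max_depth=10):
    children = pmg(state, player)
    if depth >= max_depth or not children:     # Over(): leaf or depth limit
        return NodeInfo((), sef(state, player))
    best = NodeInfo((), -10_000)               # Best = MinSEFValue
    for child in children:
        child_info = minimax(child, depth + 1, 1 - player)
        child_sef = -child_info.value          # opponent's best is our worst
        if child_sef > best.value:
            best = NodeInfo((child,) + child_info.path, child_sef)
    return best

result = minimax("A", 0, 0)
# On this tree the best move is B, reaching leaf G with value -3,
# matching the two-ply analysis in the text.
```

Running it on the example tree from figures 20.2 and 20.3 (restricted to B and C) backs up -3 along the path B, G, just as the earlier discussion predicts.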

 

The process

Let us try to understand what we did in the above algorithm, step by step. This is a recursive algorithm, and the first step describes the terminating criteria. If the state we examine is terminal (based on the ideas we learned earlier, including somebody having won, time being over, etc.), we return a structure. The structure looks like this:

 

Struct NodeInfo

{

String Path;

int Value;

}

 

Thus, there are two things: a path, in string form, and a value, which is an integer. The NodeInfo structure thus combines two things into one: the path from this node to the leaf under consideration (from where the value is backed up) and the associated heuristic value. We build the path from the leaf node: we attach the parent to the leaf that appears to lie on the best path and continue that way until the state from which we started. Thus we get the path from the originating state to the leaf under consideration. Every time the current node is added, the algorithm checks whether the current node belongs to the best path or not; the current best value is also updated accordingly.
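The C-style struct above might be written in Python as a small dataclass. This is a sketch; the string encoding of the path (e.g. "B-G") is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    path: str    # path from this node down to the chosen leaf, e.g. "B-G"
    value: int   # backed-up SEF value along that path

# A leaf returns an empty path and its own SEF value:
leaf = NodeInfo(path="", value=5)
```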

 

The first step checks the current state to see whether we have reached a termination state. If so, that is our leaf node and we need to return from here. The path terminates here, so we return the Path value as NULL. The SEF is applied at this node and the value is returned. This is the leaf node from where we start our journey of backing the values up. Why is the path NULL? The path ends at the leaf, and so its value is null. When this path is returned to the parent, the parent prepends its own state, and the path grows by one node at each level until the start node is reached. Similarly, the SEF calculated at the leaf is passed up, so the value is provided to the parent.

 

The second step explores the node if it is not a leaf. The plausible move generator (PMG) is applied to the current state and all children are generated. When does a node have no children? This might be the case when one of the parties has won or we have reached a draw state. If there are no children, the node is anyway a leaf, so the same process is applied: we set the Path variable to NULL and the SEF accordingly. If one of the parties has won, the extreme SEF values (-10 if the opponent has won, 10 if we have) are backed up. It is quite similar to reaching a leaf node, and the third step handles it.

 

The fourth step begins with the list of nodes generated as successors of the current node. These children are processed one after another in this step. For each child, we call Minimax recursively, increasing the depth value by one (exploring a child takes us one level down) and changing the player.

 

We assume CurrentPlayer is a variable identifying the player. The CurrentPlayer value is 0 when we play and 1 when the opponent plays (the values can be interchanged without any trouble here). With every level change, we flip the current player using the simple formula 1 - CurrentPlayer, which converts 1 to 0 and 0 to 1.
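The toggle is one subtraction, as this tiny sketch shows (0 standing for us and 1 for the opponent, an arbitrary labelling as the text notes):

```python
player = 0             # 0: us, 1: opponent
player = 1 - player    # descend one ply: the opponent moves
player = 1 - player    # descend again: back to us
```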

 

We receive the return values from the MiniMax call in a structure ChildValues. The SEF value backed up from the child is negated to solve a problem: at one level we want to maximize the values, and at the next level we want to minimize them. If we always just check for the maximum, we can negate the values to achieve the same result. For example, if we have -1, 2 and 3 as values at a Max level, we back up 3. Suppose the other children at the upper level (which is Min) are 4 and -2; we need the minimum, so we negate everything first, getting -3 (negating the backed-up 3), -4 (negating 4) and 2 (negating -2). Now when we choose the best (2), we are actually backing up the worst original value. To provide the negation, we multiply the value by -1.
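The numeric example in the paragraph above can be checked directly:

```python
# At a Max level the children yield -1, 2 and 3; Max backs up 3.
max_children = [-1, 2, 3]
backed_up = max(max_children)           # 3

# At the Min level above, the candidates are the backed-up 3 plus 4 and -2.
# Instead of taking min(...), negate everything and take max(...):
min_level = [backed_up, 4, -2]
negated = [-v for v in min_level]       # [-3, -4, 2]
chosen = max(negated)                   # 2, i.e. the original -2: the minimum
assert chosen == -min(min_level)        # negation turns min into max
```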

 

If the backed-up value is better than the current best (Best < ChildSEF), we set the new best value to ChildSEF. The best path now also travels through this node, so we attach the child (the node under consideration) to the (best) path returned by MiniMax.

 

Thus each child is processed and its best value is returned in the form of a NodeInfo structure.

 

As we explore the complete list of successors, the best value may change a few times, and so may the Path value. Eventually, when we exhaust the list of nodes, we have both the best value and the path, and we return them in the final step.

 

One interesting observation is about the recursive procedure: it simplifies the logical understanding. Suppose MiniMax is called for a search 6 plies deep. It works from the first level and, before calculating Best, calls MiniMax for each of its children; those children in turn invoke their own children before calculating Best, and so on down to ply 6. There Over() returns true, so the static evaluation function values are computed and backed up to level 5. Best is calculated at level 5 and returned to level 4, and so on. Eventually the process reaches level 1 and provides the best value overall, which we can then return.

 

One last point before we discuss how we can improve this algorithm: what should the first call to this recursive algorithm look like? We need to pass the current position as CurrentState, the player's identity, and a depth value of 0. The following is one way of calling MiniMax.

 

CurrentPlayer = 0 (we; 1 if the opponent starts)

MiniMax(CurrentState, 0, CurrentPlayer);

 

Need for improvement

 

MiniMax is a depth-first, depth-limited process. When our terminating criterion is met, the recursive process halts, the control unwinds, and the best child is taken from the successors connected to each parent. The process is done for every successor, and the best successor is returned up as the next node on the best path. There are a few issues with this approach. Suppose we are exploring a branch where we have received the SEF value 5. If a new node at a maximizing level has a value less than 5, there is no point in exploring it further (no need to explore more of its children), as at the maximizing level we already have 5, which will be selected anyway. Similarly, at a minimizing level, if another node has 7, there is no point in going for another child of that node, as we already have a node with 5 and the opponent will always choose that. Let us try to understand this point through an example.

 

Closely observe figure 20.4. A has to decide the next move out of two choices, B and C. B is explored and has two children, F and G, with values 5 and -3. So if A plays B, he can get -3 (or more, if the opponent incorrectly chooses F). Now A starts exploring the other child, C. Two of its children, H with 7 and I with -9, are already explored. The point is: should A explore the remaining children of C, even ones with values larger than -3? The answer is no. Why? Let us see.

 

This is a level where the opponent is going to choose the move. When the opponent already has the chance to play I with -9, exploring the next child J of C has two possible outcomes: its value is either less than or equal to -9, or more.

 

If it is more than -9, the opponent will choose I rather than J, resulting in -9. If it is less, the opponent will choose J, yielding a value below -9. Either way we are guaranteed a value less than or equal to -9; when this value is backed up to C, C is guaranteed to be worth -9 or less.

 

That means if A plays C, he will get -9 or less. But this move is to be played by player A, who has the choice of playing B for -3, which is better, and so he will play that. Thus, exploring the next child of C makes no difference to the selection of the move. In fact, it is worthless to explore C further once we find a child with any value less than -3, as the player will never choose that move. When we avoid exploring a node on behalf of the maximizing player like this, it is called an alpha cutoff.

 

 

Another case is depicted in figure 20.5. When one of the children of C, namely H, is explored and assigned the value -9, we know the opponent will play it unless the other child yields less than -9. Exploring the other child, I, we come across one more case: one of I's children yields 5 as the SEF value, ensuring that we get 5 or more at that level. As this is a maximizing move, even if another child yields -10, the player will choose the one with value 5. The opponent, knowing this, will not play I, as H already guarantees him -9. A should therefore not explore any other children of I, as the opponent, given the choice of H, is guaranteed never to play it. Notice that this cutoff happens at the minimizing level; a cutoff happening at the minimizing level is known as a beta cutoff.

 

The beta cutoff seems counterintuitive, as we are pruning the branch with higher Static Evaluation Function values. It makes sense only when you see that the player making that move is our opponent, who will never play a move that is better for us when a worse one is available.

 

Both alpha and beta cutoffs are quite commonly used in most game playing programs. One may wonder why we should bother with this process just to save a single node. It is not a single node. In many cases the game tree is 15 to 20 plies deep; when we cut a node at level 3 of a 15-ply game tree, we save a subtree 12 plies deep. Given the large branching factor of most real games, this is a huge saving.
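The rough arithmetic behind that claim: with an assumed branching factor b (the value 10 below is purely illustrative), pruning a single node at ply 3 of a 15-ply tree discards a subtree 12 plies deep, i.e. on the order of b**12 leaf positions.

```python
b = 10                   # assumed branching factor, for illustration only
saved = b ** (15 - 3)    # leaf positions never generated by one cut at ply 3
```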

 

Now we can see that the algorithm discussed earlier needs improvement. We need to maintain two threshold values: one that the maximizing player is already guaranteed to achieve (used for the alpha cutoff) and one that the minimizing player is guaranteed to achieve (used for the beta cutoff). Any backed-up value worth pursuing must lie between these two values.
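As a preview (the full treatment belongs to the next module), the following sketch shows how the two thresholds prune the tree of figure 20.4, extended with a hypothetical third child J under C; the tree, scores and ordering are assumptions for illustration.

```python
# alpha: value the maximizing player is already guaranteed;
# beta:  value the minimizing player is already guaranteed.
TREE = {"A": ["B", "C"], "B": ["F", "G"], "C": ["H", "I", "J"]}
SCORES = {"F": 5, "G": -3, "H": 7, "I": -9, "J": 0}
visited = []

def alphabeta(state, alpha, beta, maximizing):
    visited.append(state)
    children = TREE.get(state)
    if not children:                 # leaf: apply the SEF
        return SCORES[state]
    if maximizing:
        best = -10
        for child in children:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:        # the opponent will never allow this branch
                break
        return best
    else:
        best = 10
        for child in children:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:        # A already has a better move elsewhere
                break
        return best

root_value = alphabeta("A", -10, 10, True)
# After I yields -9 under C, the cut fires and J is never visited:
# A prefers B's guaranteed -3, exactly as the text argues.
```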

 

How to use these two values in implementing the MiniMax algorithm is covered in the next module.

 

Summary

 

The Minimax algorithm functions in the straightforward manner we saw in the previous module. It explores the game tree, applies the static evaluation function to the leaves and backs the values up. We use a two-field structure to represent the path and the best value along that path for every child. The value and path for every child are calculated recursively: the final child (the leaf) calculates the SEF, and the rest of the nodes along the path only back up the best value. The backed-up values are negated, and the player is flipped, at each level to match the requirements of the Min and Max levels. The algorithm is invoked with depth zero, the current player and the current state described by the game. Though this algorithm works, it can be improved by two simple tricks applied at the maximizing and minimizing levels: the alpha cutoff and the beta cutoff. Both cutoffs have the potential to reduce the search space to a large extent.
