23 Other Refinements

Bhushan Trivedi

epgp books

 

Introduction

 

We have seen how the game playing domain differs from other domains and why one needs to change the method of search. We have also seen how the MiniMax search algorithm is applied and how alpha-beta pruning improves its efficiency. In fact, it is possible to improve the algorithm further by adding a few refinements to the search process. In this module, we describe some of these refinements.

 

There are a few simple refinements one can deploy in a game playing program. The first is waiting for stable states, where the heuristic values do not change drastically across layers. The second is using a secondary search to avoid the horizon effect. The third is using book moves for the parts of the game that are typically not searched. The fourth is using an algorithm other than MiniMax. We will look at SSS* and briefly at B*, two algorithms that improve on MiniMax under certain conditions.

 

Waiting for stability

 

We have already stated that it is important to learn when to stop, as it is impossible to reach a leaf node of a real-life game tree. We would prefer to stop when the estimate of a node based on the static evaluation function (SEF) is as accurate as possible. One good measure of stability is that the heuristic estimates do not change much as the next few levels are explored.

 

Let us take the example depicted in figure 23.1. When we apply the SEF at the first or second level, the values change drastically and the best node keeps changing. After exploring the 4th level you can see that the values settle down and almost all nodes show similar SEF values. Though MiniMax does not exploit this, some other algorithms do, including B*, which we describe briefly at the end of this chapter. B* probes the tree to decide the best move and stops only when it finds one move significantly better than the others. A minimal sketch of this stopping idea follows.
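To make the idea concrete, here is a minimal Python sketch, assuming a toy tree encoded as nested dictionaries with invented "sef" values and an illustrative stability threshold; it deepens the search one ply at a time and stops once successive estimates agree.

def leaf(v):
    return {"sef": v}

def minimax(node, depth, maximizing):
    # Depth-limited MiniMax; at the frontier we fall back on the SEF.
    if depth == 0 or "children" not in node:
        return node["sef"]
    values = [minimax(c, depth - 1, not maximizing) for c in node["children"]]
    return max(values) if maximizing else min(values)

def search_until_stable(root, max_depth, epsilon=0.5):
    # Deepen ply by ply; stop when consecutive estimates differ by < epsilon.
    previous = None
    for depth in range(1, max_depth + 1):
        value = minimax(root, depth, maximizing=True)
        if previous is not None and abs(value - previous) < epsilon:
            return value, depth          # the estimate has stabilised
        previous = value
    return previous, max_depth           # never stabilised; use the deepest pass

tree = {"sef": 0, "children": [
    {"sef": 9, "children": [leaf(4), leaf(5)]},   # looks great at ply 1...
    {"sef": 1, "children": [leaf(4), leaf(6)]},   # ...but settles near 4
]}
print(search_until_stable(tree, max_depth=3))     # -> (4, 3): the value settled at 4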

 

Look Beyond the Horizon

 

Game playing algorithms, however good, are plagued by the horizon effect. When the search stops at some ply, say the 5th, it is quite possible that the 6th ply would produce exactly the opposite result of what the 5th ply produced. This is popularly known as the horizon effect. Exploring one more complete ply is usually out of the question, as it demands a huge amount of additional memory and time.

 

One solution to this problem is to explore only the node we have chosen to play, for a few plies further, instead of expanding the entire next ply of the game tree. For example, look at figure 23.1: if A decides to play C, the result is decided on the basis of N, so that node is critical for us. In this case, N is explored for a few more plies and we test whether it still has an SEF value near 7. If so, we accept the move; otherwise we pick the next best node and explore it similarly. Such a process, as mentioned earlier, is known as secondary search. A minimal sketch follows.
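This sketch reuses the minimax helper and tree format from the sketch above; the extra ply count and the allowed drift are illustrative assumptions, not fixed parts of the technique.

def choose_with_secondary_search(root, depth, extra_plies=2, drift=1.0):
    # Rank the maximizer's moves by their primary, depth-limited values.
    scored = [(minimax(c, depth - 1, False), c) for c in root["children"]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    for value, move in scored:
        # Probe only this candidate's subtree a few plies past the horizon.
        deeper = minimax(move, depth - 1 + extra_plies, False)
        if abs(deeper - value) <= drift:
            return move, deeper          # the estimate survived the deeper probe
    return scored[0][1], scored[0][0]    # every candidate drifted; keep the primary pick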

 

Using predetermined moves

 

It is also possible to have a database of moves, so that for a given state the next move is not produced by a search algorithm like MiniMax but simply picked up from the database. Such a move is known as a book move, and book moves are pretty useful in the opening and end games of Chess.[1] A minimal sketch of the lookup is given below.
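A minimal sketch of book-move lookup; the position keys and moves in OPENING_BOOK are invented placeholders, not actual chess theory.

OPENING_BOOK = {
    "start":           "e2e4",   # hypothetical entry
    "start e2e4 e7e5": "g1f3",   # hypothetical entry
}

def next_move(position_key, search_fn):
    # Play from the book when the position is known; otherwise fall back to search.
    if position_key in OPENING_BOOK:
        return OPENING_BOOK[position_key]
    return search_fn(position_key)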

Figure 23.1 Waiting for stability

 

[1] Recognized sequences of starting moves are referred to as openings and have been given names such as the Ruy Lopez or the Sicilian Defense; they are used in the opening part of the game. Many endgame strategies involve moving the king; they are so stylized that they are indexed and used in end games.

 

Use other algorithms

 

MiniMax is not the only algorithm one can use here. One useful alternative is IDA*, which we have discussed earlier. The advantage of using IDA* is clearer now: we have seen the impact of ordering the best branches before the others. IDA*, which explores the game one ply at a time and backs the values up, has a better idea of which moves are promising when it tries the next, deeper iteration. A sketch of this ordering effect follows.
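A sketch of this effect, using the same toy tree format as before: a standard textbook alpha-beta routine plus an iterative-deepening driver that searches the previous iteration's best move first so that its value tightens alpha early.

def alphabeta(node, depth, alpha, beta, maximizing):
    # Standard depth-limited alpha-beta over the same dictionary trees.
    if depth == 0 or "children" not in node:
        return node["sef"]
    if maximizing:
        value = float("-inf")
        for c in node["children"]:
            value = max(value, alphabeta(c, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # cutoff: Min will never allow this line
        return value
    value = float("inf")
    for c in node["children"]:
        value = min(value, alphabeta(c, depth - 1, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # cutoff: Max already has something better
    return value

def best_move(root, depth, first=None):
    # Searching "first" before its siblings lets its value tighten alpha early.
    children = sorted(root["children"], key=lambda c: c is not first)
    alpha, best = float("-inf"), None
    for c in children:
        value = alphabeta(c, depth - 1, alpha, float("inf"), False)
        if value > alpha:
            alpha, best = value, c
    return best, alpha

def iterative_deepening(root, max_depth):
    best = None
    for depth in range(1, max_depth + 1):
        best, value = best_move(root, depth, first=best)   # reuse last pass's choice
    return best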

 

Another reason for choosing an algorithm other than MiniMax is that MiniMax assumes the opponent always plays the best move. Look at the situation shown in figure 23.2. MiniMax would choose move B (-5) for the player, as it is better than C (-7). In fact, one can argue that it is better to choose C here, since there is a chance the opponent chooses I rather than H. If he makes that mistake, we come quite close to winning the game. If we play B, both replies are bad, and even when the opponent makes a mistake the outcome is not very encouraging. Taking a chance is a good ploy in this case, but MiniMax never does so. The sketch after figure 23.2 makes this concrete.

Figure 23.2 Taking a chance is sometimes better
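A small sketch in the spirit of figure 23.2; the reply values and the 20% error probability are invented assumptions, chosen so the numbers come out cleanly.

P_MISTAKE = 0.2                          # assumed chance the opponent errs

moves = {
    "B": [-5, -4],                       # both replies are bad for us
    "C": [-7, +6],                       # correct reply is awful, a mistake nearly wins
}

def minimax_value(replies):
    return min(replies)                  # opponent always picks our worst reply

def expected_value(replies):
    best = min(replies)                  # the opponent's correct reply
    others = [r for r in replies if r != best] or [best]
    # With probability P_MISTAKE the opponent plays some other reply instead.
    return (1 - P_MISTAKE) * best + P_MISTAKE * sum(others) / len(others)

for name, replies in moves.items():
    print(name, minimax_value(replies), round(expected_value(replies), 2))
# MiniMax prefers B (-5 beats -7); the expectation view prefers C
# (0.8 * -7 + 0.2 * 6 = -4.4, which beats B's -4.8).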

 

We will now see two algorithms which are a little different from MiniMax and can produce better results. We study them in the next two sections.

 

State Space Search* (SSS*)

 

State Space Search*, or SSS* (the * indicates some similarity with A*), was proposed by Stockman in 1979. SSS* concentrates on complete solutions, unlike MiniMax, which constructs a solution part by part. The algorithm looks at the solution space and constructs complete solutions from partial ones, like a few methods we have seen earlier. The researchers proved that this algorithm indeed works better than alpha-beta MiniMax, but at a price: it requires more memory and processing for the same problem.

 

The algorithm starts with an estimate covering all possible solutions, and aborts the exploration of any solution that falls below the best solution already known. This is quite similar to branch and bound. Another important difference is that while MiniMax is basically a depth-first, depth-bounded search that always begins from the leftmost child, SSS* behaves more like a breadth-first search.

 

Each solution in SSS* is known as a strategy. A strategy captures the decisions of the first, maximizing player. Remember that the player plays at the Max levels, so a strategy is about choosing the Max player's moves. The idea is to find all possible strategies and pick the best. A strategy is constructed by choosing one child (of all possible) at each Max level and keeping all children at each Min level. Why? Because we decide what we are going to play but must allow the opponent to play any move; the minimizing player remains free to choose. Consider the 4-ply subtree depicted in figure 23.3.[2] This game tree is quite trivial: it has a branching factor of only 2 and is a complete tree. Even for such a small subtree, with branching factor fixed at 2 and height just 4, it is difficult to draw (that is why we keep it this small). Still, the idea can be conveyed with this small subtree, so we proceed.

Figure 23.3 One strategy for a maximizing player

 

One strategy for that subtree is depicted in figure 23.3, while another is depicted in figure 23.4. The thick black lines indicate the moves chosen by that particular strategy. For example, in 23.3 the node A belongs to the maximizing player, who chooses to play B; when the opponent plays D, he chooses I, and if the opponent chooses E, he chooses J. The strategy depicted in 23.4 is a little different: it chooses the same moves in every case but E, where the player chooses K instead. Thus the two strategies have a few things in common and a few things different. Interestingly, R and S happen to be part of both strategies.

 

[2] The layer of nodes is 5 but the number of plies is 4. We are counting plies, not node layers.

Figure 23.4 Another strategy for a maximizing player

 

Let us take the strategy depicted in figure 23.3. What is the outcome of this strategy? Looking at the final moves, the minimizing player (the opponent), choosing the best possible reply, will always pick the minimum, so we are guaranteed a value of 4.

 

Now let us take the strategy depicted in figure 23.4. The outcome here is again 4, as the leaf with the minimum value is the same. The sketch below computes this guaranteed value for every strategy of a small tree.
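A minimal sketch of the strategy idea on a small invented tree (not the exact tree of figures 23.3 and 23.4): Max commits to one child at his levels, Min keeps all children, and a strategy's guaranteed value is the minimum over the leaves it can reach.

from itertools import product

def leaf(v):
    return {"sef": v}

def strategy_leaves(node, maximizing, choice):
    # Leaves reachable when Max plays per choice (Max-node name -> child index)
    # while Min remains free to play anything.
    if "children" not in node:
        return [node["sef"]]
    if maximizing:
        picked = node["children"][choice[node["name"]]]
        return strategy_leaves(picked, False, choice)
    leaves = []
    for child in node["children"]:
        leaves.extend(strategy_leaves(child, True, choice))
    return leaves

tree = {"name": "A", "children": [                        # Max to move at A
    {"children": [{"name": "D", "children": [leaf(4), leaf(8)]},
                  {"name": "E", "children": [leaf(9), leaf(5)]}]},
    {"children": [{"name": "F", "children": [leaf(3), leaf(7)]},
                  {"name": "G", "children": [leaf(2), leaf(6)]}]},
]}

best = None
for picks in product(range(2), repeat=5):                 # one pick per Max node
    choice = dict(zip(["A", "D", "E", "F", "G"], picks))
    value = min(strategy_leaves(tree, True, choice))      # the guaranteed outcome
    best = value if best is None else max(best, value)
print("best guaranteed value:", best)                     # -> 8 on this tree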

Figure 23.5 Choosing and refining a strategy

 

The node S is part of both strategies; in other words, it is part of a cluster of two strategies. The node S represents a partial solution to this problem. We may start with S as a partial solution and arrive at a better solution later, when we refine it by choosing a better strategy. This plays the same role as backing values up in MiniMax. Let us try to understand how.

 

The idea in the cases depicted in 23.3 and 23.4 is to find a strategy that guarantees at least 4. This value 4 gives us the alpha cutoff. As we explore the strategy further, we can also establish the beta cutoff. The strategy is said to evolve over time, and once we have a complete strategy the algorithm returns.

 

Figure 23.5 describes one such way to solve the problem. We (the maximizing player) keep all options open for ourselves (the maximizing moves) and tentatively fix the minimizing moves, which may change later. The idea is to see which of all possible strategies is good for us.

 

Looking at the figure, can we decide the upper bound, the maximum value we can possibly get? Yes: it is the best value among the four nodes U, W, c and e, i.e. 8, which we get if the opponent plays c. One can ask: how have we decided the Min player's move? In fact we do not know what he is going to play; for the time being we took the rightmost child. We will refine this soon.

 

The best node is c, which belongs to a cluster consisting of the children of N. Since we are interested in a strategy involving c, we would like to extend it further and see what we get at the parent, N. For that, we must examine the other siblings of c. Let us explore b (the left sibling of c, another node in the cluster). Suppose its value is 9; should the upper bound change? No: this is a minimizing ply, the opponent is going to play here, and he will not choose that move.

Figure 23.6 Choosing and refining a strategy - 2

Figure 23.7 Choosing and refining a strategy - 3

 

Let us continue refining the strategy (backing the value up). The value of N becomes 8. We do not need to explore d; why? This is an alpha cutoff. The opponent gets a chance to reach 7 if we play O, while we are guaranteed 8 if we play N, irrespective of the value of node d. So there is no need to explore d.

 

Figure 23.6 depicts the situation after this process is carried out. G is refined and gets the value 8. You can see that so far the algorithm seems to be on the right path. Can it back the value up? Not before we know what F offers, so we need to explore F's children. Let us begin with node X; suppose its value is 6. Should we explore Y? No. Why? Alpha cutoff. Now we explore Z and get the value 7. Should we explore a? No. Why? Again an alpha cutoff. Now F guarantees 6 or less, and thus C gets the value 8.

 

Before we back the value up further, we must decide what B offers. In fact, we can immediately establish alpha cutoffs for T and V, so B is done. It is O's turn now.

 

Should we explore O? No. Why? Beta cutoff. Remember this is a minimizing move and the opponent is going to play it. If E guarantees him anything less than 6, he will always play that rather than any better option O might offer. When we already have the better option of playing C for 8, why give the opponent a chance to hold us to something less than 6? So O is cut off by beta.

 

SSS* is better in the sense that it picks the most promising branch before the others and thus has a higher chance of choosing the right order, which results in maximum pruning. In the above example, we are able to prune 12 nodes. The sketch below illustrates, on a toy tree, how ordering affects the amount of pruning.
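A small experiment on an invented tree showing the effect: with the strongest branch searched first, alpha tightens early and more siblings are cut off.

def leaf(v):
    return {"sef": v}

def visits(node, depth, alpha, beta, maximizing, counter):
    counter[0] += 1                      # count every node we actually touch
    if depth == 0 or "children" not in node:
        return node["sef"]
    if maximizing:
        value = float("-inf")
        for c in node["children"]:
            value = max(value, visits(c, depth - 1, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break
        return value
    value = float("inf")
    for c in node["children"]:
        value = min(value, visits(c, depth - 1, alpha, beta, True, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

strong = {"sef": 0, "children": [leaf(8), leaf(9)]}       # the good branch
weak = {"sef": 0, "children": [leaf(3), leaf(2)]}         # the poor branch
for order, root in (("best first", {"sef": 0, "children": [strong, weak]}),
                    ("worst first", {"sef": 0, "children": [weak, strong]})):
    counter = [0]
    value = visits(root, 2, float("-inf"), float("inf"), True, counter)
    print(order, "value:", value, "nodes visited:", counter[0])
# Both orders back up 8, but best-first visits 6 nodes and worst-first visits 7.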

 

The SSS* algorithm is definitely better in terms of choosing the right move. We can see that the move-ordering issue of the MiniMax algorithm is handled nicely by SSS*. The improvement, however, comes at the cost of the memory and processing required for bookkeeping, which is quite large in SSS*.

B* search

 

A* was extended into another search algorithm called B* by Hans Berliner, a computer scientist who was also a world champion in correspondence chess. We will not explore this algorithm in detail here, but a few remarks about it are in order.

 

First, unlike other game tree representations, B*'s has a third dimension, which represents the goodness of a node. Another difference is that instead of holding a single estimate, the nodes in B* use intervals to represent their probable values. Further, B* does not have a predetermined depth for the chain of nodes to be explored: it makes a preliminary probe to decide which moves are better and explores them before the others, and it stops only when one alternative is clearly better than the rest. Like SSS*, B* does selective search and omits unpromising paths. B* starts the selective search from the very beginning and explores only those nodes which look promising from the word go, concentrating on that portion of the tree irrespective of other nodes in the first phase. In the second phase it works more like SSS* and other algorithms, carefully checking whether the chosen move is really good. The sketch below shows the interval-based stopping rule.
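A minimal sketch of that stopping rule, with invented interval bounds: search may stop once some move's pessimistic bound is at least every rival's optimistic bound.

def clearly_best(intervals):
    # intervals: move -> (pessimistic, optimistic) bound pair.
    for move, (pess, _) in intervals.items():
        rivals = [opt for m, (_, opt) in intervals.items() if m != move]
        if all(pess >= opt for opt in rivals):
            return move                  # proven at least as good as every rival
    return None                          # not proven yet: B* would keep searching

print(clearly_best({"b": (3, 9), "c": (2, 5)}))   # None: b might still be worse than c
print(clearly_best({"b": (6, 9), "c": (2, 5)}))   # 'b': 6 >= 5, so the search can stop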

 

Summary

 

There are a few things we can do to improve the MiniMax process. First, we should wait until the situation calms down before trusting the static evaluation. Second, after choosing a move, we may expand only the chosen leaf, not the entire tree, for a few more plies; that helps avoid the horizon effect. Another improvement is to use predetermined book moves in special cases rather than searching, to save time. Another important decision is to use an algorithm other than MiniMax. One alternative is the State Space Search* algorithm; SSS* uses a breadth-first approach rather than the depth-first approach of MiniMax and is shown to do better if a proper storage and reference mechanism is provided. Another alternative is the B* algorithm, which is derived from A* and is more complicated but promises to do better under some conditions.
