5 Heuristic search methods

Bhushan Trivedi

epgp books

 

Introduction

 

We have seen that all but trivial problems are plagued by combinatorial explosion. To combat combinatorial explosion the designers should use domain knowledge to restrict the search process. The unguided search methods that we saw in the previous module must be augmented with heuristics to solve AI problems. How this is done is the theme of this and the subsequent modules.

 

Heuristic function

 

We begin with the notion of a heuristic function. This function takes a problem state as input and generates a value between some extremes, usually between -10 and 10. The value -10 indicates that it is impossible to reach a solution (a goal state) from the given state, while the value 10 indicates that the state is a goal state. For example HFFCN(L(0,0,0,0) R(1,2,4,8)) = 10 for the farmer fox chicken grain problem and Hmg((0,0,0)) = -10 for the 8,5,4 gallon milk jug problem. Here HFFCN and Hmg are the respective heuristic functions for the farmer fox chicken grain problem and the 8,5,4 milk jug problem.

 

The more interesting case arises when the heuristic function generates some other value. That value measures the merit of the state: higher heuristic values indicate a higher probability of reaching the final state from that particular state. In other words, the value of the heuristic function (which is between -10 and 10) indicates how far the state is from the goal state. For example, a node with H value 9 is much nearer to the goal state than a node with H value 5. That means it is better to explore the node with H value 9 before the node with H value 5.

 

Suppose we apply all possible rules to the state (6,2,0) and find that (8,0,0), (6,0,2), (3,5,0) and (3,2,3) are the next possible states. We now apply our heuristic function to all four of them and obtain

 

H(8,0,0) = -5, H(6,0,2) = 5, H(3,5,0) = 3 and H(3,2,3) = 3. We can then clearly state that the move which yields (6,0,2) is better than the others, so we prefer to explore that node before the others.
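The ranking above can be sketched in a few lines. The lookup table below merely stands in for a real heuristic function (which the text assumes is given); only the H values from the example are used.

```python
def h(state):
    # Illustrative stand-in for a real heuristic function: the H values
    # quoted in the text for the four successor states of (6,2,0).
    values = {(8, 0, 0): -5, (6, 0, 2): 5, (3, 5, 0): 3, (3, 2, 3): 3}
    return values[state]

successors = [(8, 0, 0), (6, 0, 2), (3, 5, 0), (3, 2, 3)]
best = max(successors, key=h)   # the most promising state to explore next
print(best)                     # (6, 0, 2)
```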

 

Two questions commonly arise when such an argument is presented. The first is how to write such a function. For the time being, we may assume that such a function is defined and available. The second is, if such a function is available, how to put it to use. In this module we focus on the answer to the second question; we will see some examples answering the first question in later modules.

 

Hill climbing

 

One of the simplest search strategies using a heuristic is known as hill climbing. The discussion that follows refers to the description of the state space exploration process that we saw in the third module. It works like this:

  1. Pick up the start state and consider it the current node.
  2. If the current node is the goal state, quit.
  3. Generate a new state by applying the next applicable rule; if no rules are left, report failure and return.
  4. Apply the heuristic function to the new node.
  5. If this value is more than the parent node’s heuristic value, make the new node the current node and go to 2.
  6. If not, go to 3.
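The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the successor function, heuristic and goal test are assumed to be supplied by the problem, and the toy demo simply climbs toward the number 5 on a number line.

```python
def hill_climbing(start, successors, h, is_goal):
    """Simple hill climbing: accept the first successor that improves
    on the current node's heuristic value (steps 1-6 of the text)."""
    current = start
    while not is_goal(current):              # step 2
        for child in successors(current):    # step 3: try rules one by one
            if h(child) > h(current):        # step 5: strictly better child
                current = child              # make it the current node
                break
        else:
            return None                      # no better child: stuck (failure)
    return current

# Toy demo: states are integers, h is the negative distance to 5.
goal = hill_climbing(0, lambda s: [s - 1, s + 1],
                     lambda s: -abs(s - 5), lambda s: s == 5)
print(goal)  # 5
```

Note that, exactly as the text describes, the function returns None when every child is worse than the current node (a local maximum) and never backtracks.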

 

 

Carefully look at figures 5.1 to 5.3. These figures describe different scenarios of the hill climbing process. Each hill climbing process is depicted as a tree with nodes and arrows. The numeric value inside a node represents the heuristic value of the node. Arrows indicate children of the parent node generated by applying possible rules. Figure 5.1 indicates how a normal hill climbing process works and completes when it reaches the goal state. Figure 5.2 showcases a typical case where all children have a smaller heuristic value. The hill climbing process requires a node better than the parent; no such node can be found in this case, so the process comes to a grinding halt. Such a situation is known as a local maximum.

 

This search strategy is called hill climbing because we make sure every move is on the higher side. If the heuristic function’s output is height, we are climbing, hence the name. Guess why the problem mentioned in figure 5.2 is called a local maximum: all the surrounding nodes have a lesser heuristic value. Thus we are at a point higher than its surroundings, but certainly not at the peak. That is why it is called a local maximum1.

 

Now look at figure 5.3; it depicts a case where we reach a node where no rules are applicable. It is hard to imagine such a situation in the trivial problems we are working on, but consider chess again. We might be in a deadlock where we cannot move a piece. Such a situation is called a dead end. In most cases (though not in chess) one can go back to a previous move and try another path, but hill climbing does not allow that. Thus hill climbing is applied to problems which are monotonic (travelling in one direction only; once a forward step is taken, moving back is not allowed).

 

One typical variant of hill climbing chooses the best child of a given node. Unlike normal hill climbing, it explores all children and picks the one with the highest heuristic value. This process is, quite logically, known as steepest ascent hill climbing. It yields a faster move to the solution, but if the branching factor is high and the difference between the best and a reasonably good move is small, it wastes a lot of time examining all the children. That situation is depicted in figure 5.4.
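The steepest ascent variant can be sketched the same way; the only change from simple hill climbing is that all children are generated and the highest-valued one is taken. The successor function, heuristic and goal test are again assumed to come from the problem.

```python
def steepest_ascent(start, successors, h, is_goal):
    """Steepest ascent hill climbing: examine every child of the current
    node and move to the best one, stopping at a goal or a local maximum."""
    current = start
    while not is_goal(current):
        children = successors(current)
        if not children:
            return None                  # dead end: no applicable rules
        best = max(children, key=h)      # examine all children, keep the best
        if h(best) <= h(current):
            return None                  # local maximum: no improving child
        current = best
    return current

# Same toy number-line demo as before: climb from 0 toward the goal 5.
goal = steepest_ascent(0, lambda s: [s - 1, s + 1],
                       lambda s: -abs(s - 5), lambda s: s == 5)
print(goal)  # 5
```

The `max` over all children is exactly the extra work the text warns about: with a branching factor of 35, every step pays for 35 heuristic evaluations even when the first child would have been good enough.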

 

If one looks closely at how hill climbing works, one might feel it is better to check not only the children of the current node but other unexplored nodes as well; instead of just looking at the children of the current node, one may try to find the overall best node. That type of search strategy is known as best first search, and it is our next point of discussion.

 

Best first search

 

The best first search, as the name suggests, picks the best node based on heuristic value, irrespective of where that node is. It deals with three types of nodes: first, the nodes which are yet to be explored; second, those already explored; and third, the current best node(s).

  • 1 Sometimes the value of the heuristic function decreases for a better move; that process is known as valley descending. The local maxima problem becomes a local minima problem there, but there is no conceptual difference.

 

The unexplored nodes are in contention for exploration. As the name suggests, the best first algorithm explores the best node from the list, that is, the node with the highest heuristic value.

 

In the process, explored nodes are moved out of contention. They are all stored together in another array of explored nodes. Why are we keeping them if they are of no use? They are used to check whether a newly generated descendant belongs to that list. If a descendant belongs to the explored list, it is not added to the list of unexplored nodes; all other children are added to that list.

 

Let us see how the best first search works in principle.

  1. Pick up the start state and consider it the current node.
  2. If the current node refers to the goal state, quit.
  3. Generate all new child states by applying all applicable rules to the current state; if no rules apply, report failure.
  4. Remove all child nodes which are already part of the explored node list.
  5. Apply the heuristic function to each remaining node.
  6. Insert all nodes into an array sorted in descending order of heuristic value. This array may contain other unexplored nodes from previous explorations.
  7. Find the best node (the one with the best heuristic value, which may be lower than the parent’s) in the array. As the array is sorted, the top entry is picked.
  8. If the array is empty, report failure.
  9. Move the current node to the list of explored nodes, make the best node the current node, and go to 2.
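The steps above can be sketched with a priority queue standing in for the sorted array of step 6 (a heap gives the same "best node on top" behaviour without re-sorting). This is a minimal illustration under the same assumptions as before: the successor function, heuristic and goal test come from the problem.

```python
import heapq

def best_first(start, successors, h, is_goal):
    """Best first search sketch: always expand the unexplored node with
    the highest heuristic value, wherever it sits in the tree."""
    frontier = [(-h(start), start)]          # max-queue via negated h values
    explored = set()
    while frontier:                          # step 8: empty array = failure
        _, current = heapq.heappop(frontier)     # step 7: best node on top
        if is_goal(current):                     # step 2
            return current
        explored.add(current)                    # step 9: mark as explored
        for child in successors(current):        # step 3: all children
            if child not in explored:            # step 4: drop explored ones
                heapq.heappush(frontier, (-h(child), child))
    return None

# Toy number-line demo again: search from 0 toward the goal 5.
goal = best_first(0, lambda s: [s - 1, s + 1],
                  lambda s: -abs(s - 5), lambda s: s == 5)
print(goal)  # 5
```

Unlike hill climbing, the frontier here retains every unexplored node ever generated, which is precisely why best first search can jump back to a distant part of the tree when the current node's children all look poor.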

 

This search process has two important characteristics.

  1. The solution demands storing the unexplored nodes in sorted order of their heuristic values.
  2. We always pick the best node, irrespective of its parent’s value. We may proceed even when all children have lower values than the parent; in fact we may continue even when the best node is some other node and not one of the children.

 

Figures 5.5 to 5.9 depict the best first search process. The gray nodes are unexplored nodes. The best unexplored node is displayed as a dark gray node. Once it is explored, all the children of the best node become part of the set of unexplored nodes and the explored node leaves the list of unexplored nodes. The next best node is then selected from the remaining unexplored nodes, including the children of the current node.

 

If you look closely at the process, it is quite jittery: it jumps from one place to another and seems to take a lot of time converging on the right path. One may ask why it behaves this way. Let us try to understand.

 

A heuristic value is not an exact value; it is just a rough estimate. It is important to note that if the heuristic function were exact, we would not need any search at all: we would always explore the best node and never need to look back. The erratic behavior depicted in figures 5.5 to 5.9 indicates the inability of the chosen heuristic function to estimate the state correctly. Designing a heuristic function that estimates reasonably accurately is an empirical problem, and the designers sometimes have to experiment with multiple heuristic functions to zero in on the best possible one. In many cases (chess is one of them) researchers have obtained heuristic functions good enough for computer programs to exhibit a human level of expertise. Chess playing programs often beat human experts using the power of heuristics. Better programs have not only heuristics for the game in general but also special heuristics for the expert opponent. For example Deep Blue, the computer which beat Garry Kasparov, then the world champion, had many heuristics based on the moves Garry had played in previous games and a huge amount of knowledge about the patterns he follows.

 

We will throw more light on this issue when we will look at A* algorithm.

 

 

Branching factor

 

Heuristics help us thwart the problem created by the number of combinations and permutations that a solution provider must address due to the branching factor. The branching factor of a solution tree is the average number of children each node has. For example, the average branching factor of the chess problem is 35; in other words, on average, every node has 35 children.

 

What is the consequence of the branching factor’s value? Consider a complete binary tree, that is, a branching factor of 2. Look at how the number of nodes increases at each level. The first level has one node; the second level has 1 + 1 = 2 nodes, one more than all the previous levels combined. The third level has 4 nodes, again one more than the total of all previous levels. This rule holds at any level: the nth level has one more node than all the nodes encountered in all the levels before it, i.e. all the nodes after exploring level n-1. Thus, at every level, the number of nodes in the complete binary tree roughly doubles.

 

What about trees with a higher branching factor, for example 35? After the first expansion the new level has 35 nodes. The next level adds 35 × 35 nodes, which is 35 × 35 − 36 more than the 36 nodes (35 + 1) already explored in the previous levels; the level after that adds 35 × 35 × 35 nodes, which exceeds the 35 × 35 + 36 seen so far by the same pattern, and so on. It is clear from this discussion that the number of nodes at each level increases exponentially.
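The arithmetic above follows from two standard facts about a complete tree with branching factor b: level n holds b**n nodes, and the levels 0 through n together hold (b**(n+1) − 1)/(b − 1) nodes. A short sketch:

```python
def nodes_at_level(b, n):
    """Number of nodes at level n of a complete tree with branching factor b
    (level 0 is the root)."""
    return b ** n

def total_through_level(b, n):
    """Total nodes in levels 0..n: the geometric series 1 + b + ... + b**n."""
    return (b ** (n + 1) - 1) // (b - 1)

# Branching factor 2: each level holds one more node than all previous
# levels combined, so the total roughly doubles every level.
print(nodes_at_level(2, 3), total_through_level(2, 2))   # 8 7

# Branching factor 35 (chess): the third level alone adds 35**3 nodes.
print(nodes_at_level(35, 3))                              # 42875
```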

 

One can easily see that exploring even one more full level from the current position is too much for a real world problem with a high branching factor. Sometimes a method called secondary search is applied in problems like chess. Suppose we explore the search tree for n plies (levels), subject to the time available (in a game like chess each player has limited time), and find a node which looks better than the others (like our simple case shown above). We cannot afford to explore the entire next level, but we can always explore the next level for that single node. Doing so helps us determine whether the chosen node is really good.

 

Solution space search

 

The state space is sometimes called a solution space, where each node represents a solution. Let us try to understand.

 

While searching for a solution by traversing the search space, there are two usual approaches. The first method is known as perturbation search. It treats each node of the solution space as a candidate solution: if the candidate is an actual solution we quit, otherwise we try the next one. If we try the next one without any feedback or knowledge of the domain, we are using generate and test. If we take feedback and use domain knowledge to decide the next candidate, it is hill climbing or best first.

 

Another method is to build the solution one step at a time. The missionary cannibal problem solution that we have seen is of that type: we move from one state to another, making sure we get nearer to the final solution. In that case we know exactly which state we want to reach, but we cannot reach it directly, so we build the solution gradually. Such search methods are known as constructive search methods.

 

In fact the solution search can be done either way. Take the case of the missionary cannibal problem. We may assume the solution space contains random sequences of moves and start with one such random sequence of states. If those states are not correctly sequenced (the (n+1)th state cannot be reached from the nth state) or are invalid (the preconditions are not satisfied), we pick another random sequence (based on our observation and domain knowledge); we are then using perturbation search. If we start from the initial state, go to the next best state and so on, we are again hill climbing, but using a constructive search.

 

Both perturbation and constructive methods are used in practice for solution search; indeed, many times both are used together to solve bigger problems. The Hierarchical Generate and Test (HGT) example shown in the 3rd module is one such case.
