15 Iterative deepening A*, Recursive Best First, Agents

Bhushan Trivedi


 

Introduction

 

A* has proven itself to the extent that other algorithms have adopted its good characteristics. We studied Iterative Deepening in module 4. When A* is combined with Iterative Deepening, the result is called IDA*, or Iterative Deepening A*. The algorithm works as before but chooses better paths based on the heuristic values. We stated in module 4 that iterative deepening's power becomes obvious when it is used with a heuristic, and IDA* is the way to do that. IDA* is to A* what a depth-bounded search is to DFS. The property of A*, that it finds an optimal solution if the heuristic used is admissible, also holds for IDA*. Another useful algorithm is recursive best first search (RBFS). Both IDA* and RBFS are memory-bounded searches; they are designed to keep memory utilization minimal. In RBFS, one alternative node, other than the children, is kept and chosen for exploration if the children turn out to be worse. Recently, AI has been driven extensively by programs which are more or less automated, can interact with the real world using connectors called sensors and actuators, and learn how to solve a particular problem on their own with the help of their human counterparts and their own built-in knowledge. These programs are known as agents. We will also look at some basic information about agents in this module.

 

IDA*

IDA* was proposed by Prof. Richard Korf, then at Columbia University, in 1985, and RBFS came out of the same line of research. Korf was interested in reducing the space complexity of search algorithms while also keeping the cost of the solution path low. Before that, iterative deepening had already been proposed and used in many two-person game programs, including chess. IDA* and RBFS were part of a project in which the researchers looked at optimizing three things about the algorithms they studied and proposed: first, the amount of memory they need; second, the cost of the solution; and third, the cost of the path to the solution. We have already seen that not all problems demand path optimization; for some problems (like TSP) the final state by itself does not matter much, it is the path that is important.

 

Korf was working on solving problems related to computational biology. Those problems have a typical grid-like structure, which complicates matters in a different sense than other search problems. With such high connectivity, the number of options along every path is very large, so when we search and keep a list of the paths already explored, that list tends to become extremely large. Korf reasoned that if the explored list is not kept, a huge amount of memory can be saved. The price to pay is that the same nodes may be explored multiple times. Korf proposed two algorithms, IDA* and RBFS. Both are linear-space search strategies, meaning the amount of memory needed grows only linearly with the depth of the search. Both provide linear-space search at the cost of additional time spent exploring the same nodes multiple times.

 

Let us see how it works. IDA* is closer to depth-first depth-limited search than to iterative deepening. Unlike both of them, IDA* does not explore children level by level; it explores them by the logical level defined by g + h’. It explores the search space in a depth-first way and calculates the g + h’ value of each path being explored. Unlike conventional iterative deepening, where each iteration extends the search by one arc, here each iteration extends the search to paths whose length, measured as g + h’, stays within the current limit.

 

For example, we may decide to explore up to a maximum g + h’ value of 3. We then apply depth-first search, applying the heuristic to each of the children, and stop along a branch as soon as its value exceeds 3, picking up another branch just as in depth-first depth-limited search. After exploring all children and their associated branches, we check whether a solution has been found somewhere. If we get the solution, we may quit. If not, we can either increase the limit or stop if we cannot proceed further. In the above example we might now increase the g + h’ limit to 6 and restart the exploration from the root node. As mentioned in module 4, this seemingly wasteful exercise can be quite useful here. Once we have explored and revised our estimates, we have far more accurate values the next time, and it is more likely that the search moves in a better direction. This process demands more time but produces an optimal path. One can view it as an iterative deepening search in which the algorithm proceeds by pre-decided g + h’ contours each time rather than by physical levels. Look closely at figures 15.1 and 15.2. The white node is the start node, the red node is the goal node, and the green nodes are the nodes within the current g + h’ limit.
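As a toy illustration of these g + h’ contours (the nodes, g values and h’ values below are invented purely for this sketch), the following Python snippet lists which nodes fall inside a given limit:

```python
# Hypothetical node -> (g, h') values, invented for illustration only.
nodes = {
    "A": (0, 2),   # start node
    "B": (1, 1),
    "C": (1, 4),
    "D": (2, 1),
    "E": (3, 5),
}

def inside_contour(limit):
    """Return the nodes whose f = g + h' does not exceed the current limit."""
    return [name for name, (g, h) in nodes.items() if g + h <= limit]

print(inside_contour(3))  # first iteration, limit 3  -> ['A', 'B', 'D']
print(inside_contour(6))  # second iteration, limit 6 -> also admits 'C' (f = 5)
```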

 

Figure 15.1 First iteration of a typical IDA* search

 

Figure 15.1 depicts the first iteration while figure 15.2 depicts the second. In both cases more green nodes appear towards the red (goal) node, and more so in the second iteration than in the first. It is clearly visible that nodes lying away from the goal direction exceed the g + h’ limit before nodes lying in the right direction, and this keeps the search moving towards the goal, unlike the other search algorithms we have seen before.

Figure 15.2 Second iteration of a typical case of IDA*

 

The biggest advantage of IDA* is that it does not need the amount of space A* needs. Being depth-bounded, it needs to store only the current branch at any time. The other important advantage is that the paths explored are those near the solution, since the search is restricted by g + h’ values and not by the depth of the tree. The third advantage, as mentioned already, is its optimized storage. The process is costly in terms of time, as it explores branches afresh during every new iteration. For example, all the green nodes explored in figure 15.1 during the first iteration are explored again in the second iteration shown in figure 15.2.

 

The algorithm’s initial cut-off value can be decided based on the estimate at the root node1, i.e. its h’ value. If h’ is admissible, i.e. it always underestimates h, then any solution we obtain must be optimal. If we do not get a solution, we increment the cut-off to the value of the best node so far among the unexplored (pruned) nodes. It is quite possible that this best node is the root node itself, but now with a much tighter estimate. When we search again, we are bound to find any solution that lies within the new limit. As long as h’ is admissible, a solution might not be found during the first few iterations, but the one eventually found will never be worse than optimal.

 

IDA* algorithm

The formal algorithm can be described as follows.

 

1. limit = h’(root node); current node = root node

2. path = root node; best node = root node

3. Pick up the leftmost child of the current node (in a graph, one can pick up any directly connected node at random) and make it the current node

4. If it is the goal node, quit

5. Find the g value of this child and apply h’ to it

6. If g + h’ > limit

a. update the parent node’s estimate based on the best path, and propagate the update up to the root node

b. pick up another node as if a dead end had been reached (usually the sibling to the right, i.e. the parent’s next child)

c. change the path variable accordingly

7. Otherwise, if no node is left in the tree within the limit

a. change the limit value to limit = g(best node) + h’(best node)

b. go to 2

8. Otherwise

a. add this node to the path

b. if this node is better than the best node, set best node = current node

c. explore this node; go to 3

1 We will continue calling this node a root node rather than a start or initial node, to emphasise that the search process executes like exploring a tree and not a graph: even where more than one parent points to a single node and parent nodes appear again, the process still progresses like a tree.

 

There are a few things you have probably noticed. This is basically a DFS algorithm. Only when g + h’ exceeds the limit does it stop and start searching another branch, like depth-limited depth-first search. We have named that variable limit. So we explore the tree depth first until a child node’s estimate goes beyond the limit. This clearly helps us travel in the direction of the goal: nodes away from the goal node have higher h’ values than nodes nearer to it, and thus the farther nodes cross the limit earlier than the closer ones.
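To make the control flow concrete, here is a minimal Python sketch of IDA* in a recursive formulation. The names neighbours and h, and the representation of the state space as a function returning (child, edge cost) pairs, are assumptions made for this illustration, not part of the formal algorithm above.

```python
import math

def ida_star(start, goal, neighbours, h):
    """Minimal IDA* sketch: neighbours(n) yields (child, edge_cost); h(n) is the
    heuristic estimate h'. Returns the solution path, or None if there is none."""
    limit = h(start)                       # step 1: initial cut-off is h'(root)
    path = [start]                         # only the current branch is stored

    def dfs(g, limit):
        node = path[-1]
        f = g + h(node)
        if f > limit:                      # g + h' exceeds the limit: treat as a dead end
            return f                       # report how far this branch overshot
        if node == goal:
            return "FOUND"
        smallest_overshoot = math.inf      # best g + h' value among pruned branches
        for child, cost in neighbours(node):
            if child in path:              # avoid walking back along the current branch
                continue
            path.append(child)
            result = dfs(g + cost, limit)
            if result == "FOUND":
                return "FOUND"
            smallest_overshoot = min(smallest_overshoot, result)
            path.pop()
        return smallest_overshoot

    while True:                            # one pass per iteration, restarting at the root
        result = dfs(0, limit)
        if result == "FOUND":
            return path
        if result == math.inf:
            return None                    # nothing left within any limit: no solution
        limit = result                     # raise the limit to the best pruned g + h'
```

As in the description above, only the current branch is kept in memory, and each new iteration restarts from the root with the limit raised to the smallest g + h’ value that exceeded the previous limit.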

 

There are two different situations emerging from the exploration. First, the g + h’ value goes above the limit, which is similar to reaching a dead end; so we pick up the parent’s next child and explore further. If the parent has no child left to the right, we pick up the parent of the parent, and so on. The other situation is when the limit has been reached along all branches of the root node. We then increase the limit to the best value seen so far. This best-so-far value must be higher than the earlier one, since we could not find the goal node within the old limit. We restart from the root node and explore the tree (or graph) yet again. The new limit may admit only one or two additional nodes each time, depending on the best node’s value.

 

Limitations of IDA*

 

There are basically three limitations of IDA*:

 

First, it explores the entire tree again whenever a complete pass within the current limit fails to reach the goal node. Though this process eliminates the storage requirement, it increases the time taken.

 

Second, as there is no real sense of direction, especially while exploring a graph, many nodes are explored multiple times, because the exploration process is tree-like: all paths to all nodes from every other node that stay within the limit have to be explored. This takes an exponential amount of time if nodes are densely connected, as in city route maps. Thus IDA* is better suited to sparsely connected graphs. An interesting parallel can be drawn with the broadcasting problem in computer networks. Broadcasting is done when the destination’s whereabouts are unknown to the sender, so the message is broadcast across the entire network. Every node receives the broadcast from all of its neighbours and sends it back to all of them each time it arrives, so every node receives the same message multiple times. To make sure the broadcast does not overwhelm the network, every node applies a simple trick: it only forwards the message arriving from the neighbour nearest to the sender. This simple trick saves a lot of unnecessary transmission. Unfortunately that is not possible here, because we are not working with a distributed algorithm (in the computer network case each node can decide on its own, in a distributed fashion, whether to forward; that is not possible here). This is an example of how real-world differences make the same idea work better or worse.

 

Third, we face the same problem that we face in A*. Choosing an admissible heuristic function does not mean we have the best possible heuristic function. We can actually choose a heuristic that overestimates by a small amount and be sure that the path found is worse than the optimum by at most that small difference.

 

Recursive best first search

 

The other algorithm proposed by the research group at Columbia University is recursive best first search. Its backtracking is based on the depth-first method rather than on conventional best first search. The process remembers only the single best alternative node seen so far. When a node is expanded and all its children are found to have worse heuristic values, the remembered best-so-far node is picked up for exploration; otherwise the best child is explored.

 

The best first search that we saw earlier had an important advantage: it was able to move to the best node under any circumstances. It kept lists of explored and unexplored nodes, could avoid re-exploring already explored nodes by checking the explored list before expanding any child, and maintained the list of unexplored nodes in order of merit so that at any point of time the best node could be explored.

 

Recursive best first search cannot afford to keep lists of explored and unexplored nodes. To limit the space requirement to linear, it modifies the original algorithm so that only one alternative node is kept for backtracking.

 

The algorithm works like a normal depth-first search that picks the best child based on heuristic value. When a node is expanded, the best child is explored but the second best is preserved in memory. If the best child is explored and all of its children are found to be worse than the remembered alternative, the search backtracks to that second-best node (which is now the best among the remaining candidates, including those children). At the same time, the best of the abandoned children is kept as the new second-best node. At any given point of time, this search method explores only one node and remembers one more. Like depth-first search, it also falls back to the remembered node if it reaches a dead end. Korf found this algorithm quite expensive in terms of time, though it saves on storage space.
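Below is a minimal Python sketch of recursive best first search in its usual tree-search formulation; the names neighbours and h and the state-space representation are assumptions made for illustration. At each level only the current branch and the f value of the best alternative are kept, which is what keeps the space linear.

```python
import math

def recursive_best_first_search(start, goal, neighbours, h):
    """RBFS sketch: neighbours(n) yields (child, edge_cost); h(n) is the heuristic h'.
    Keeps only the current branch plus one backed-up alternative value per level."""

    def rbfs(node, g, f_node, bound):
        # bound is the f value of the best alternative path remembered higher up;
        # if every child exceeds it, back up the smallest child f and backtrack.
        if node == goal:
            return [node], f_node
        children = []
        for child, cost in neighbours(node):
            g_child = g + cost
            # a child's backed-up f can never look better than its parent's
            f_child = max(g_child + h(child), f_node)
            children.append([f_child, g_child, child])
        if not children:
            return None, math.inf
        while True:
            children.sort(key=lambda c: c[0])             # best child first
            best = children[0]
            if best[0] > bound:
                return None, best[0]                       # backtrack to the alternative
            # the second-best child is the alternative the recursive call must beat
            alternative = children[1][0] if len(children) > 1 else math.inf
            result, best[0] = rbfs(best[2], best[1], best[0], min(bound, alternative))
            if result is not None:
                return [node] + result, best[0]

    solution, _ = rbfs(start, 0, h(start), math.inf)
    return solution
```

As the description above suggests, the price of this frugal storage is time: whenever the search switches back and forth between the current best branch and the remembered alternative, nodes are re-expanded.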

 

Agents

 

We have looked at many search algorithms in due course. Where shall we apply these algorithms? Do we design a GUI interface? Or should we provide an API? In recent times, the emphasis is on developing agents and providing the search algorithm as one of the components of the agent.

 

An agent is somebody who works on behalf of a user. The agent is a well-defined entity in the manual world. A railway ticket booking agent, for example, books a ticket on the traveller's behalf. A real estate agent looks for properties and checks prices on the buyer's or seller's behalf. Similarly, when we discuss a computer-based agent, it is a program or process working or acting on the user's behalf. A rational agent is one that takes decisions based on rational logic and the user's preferences. For example, you may find a program called a trip advisor or trip planner which can give you options based on your preferences and choices. If you say that you would like to travel to a hill station with a budget of about 20,000 per person, the program might look at ticket, hotel and cab booking options based on the budget you provided and also your preferences; for example, it might be aware of the fact that you always prefer 2nd AC over 3rd AC, or flight over train, or business class over economy class, and so on. It might also look at other points and, when it cannot reason due to insufficient information, come back and ask you "Are you planning to travel alone or with family?", "How many of you are travelling together?", "Would you like to travel by car instead of train?", and so on. Once all the information is collected, the agent, quite like its human counterpart, will decide things on its own and may come back for confirmation.

 

Another example of an agent is an intrusion detection agent. This agent is given the primary logic of looking at network logs to determine whether there is a likelihood of intrusion. When it senses one, it might report to a central agent, which might deploy specific agents capable of checking for particular types of attack and equipped with sufficient arsenal to combat, or at least mitigate, the effect of those attacks. For example, if the initial observation reveals attempts at a denial of service attack, a specially designed agent might start inserting rules in the firewall to drop the attack packets or block the user, and so on.

 

Another example of an agent is a parking sensor in a parking lot. Such agents can beep when a user is not parking his car properly, or even take a photograph to prove it in court. In this case, the agent acts on behalf of a traffic policeman. In the era of IoT (Internet of Things) devices, a self-driving car, a house monitoring system, or a smart agricultural system that can work out the amount of water needed for irrigation and supply just that much can all act as agents.

 

Agents can be of two different types. One type is a physical agent, a robot of some sort that actually does something physically. The other type is all software: a process acting on the user's behalf. An agent with a body also has some software in order to act. The hardware part of the robot is not important to us right now, so we will not discuss it further.

 

Agent Environment

 

An agent is characterized by its ability to take inputs from the real world in various ways, process them, store them, reason with them and generate output. The part of the agent which takes input from the real world is called a sensor, while the part which responds back is known as an actuator. Usually an agent is connected with the world by multiple sensors and actuators. The context in which the agent operates is called the environment. For a software agent, both sensors and actuators are software as well. The simplest example of a sensor is a text box which takes input from the user, and an example of an actuator is a print statement which prints something on the screen. Look at figure 15.3. The agent communicates with the environment using sensors and actuators. In fact, the agent is part of the environment, or immersed in the environment; both these components are connected to the other parts of the environment and receive and send signals through them.

 

How do these agents actually work? For example, an intrusion detection agent might look at input patterns observed over a period of time and then decide that there is an attack. If it finds that the attacker is probing what is happening on port 30, then on port 31, and so on up to port 65535, the agent can conclude that the attacker is attempting port scanning. The agent in this case looks at what the attacker is doing right now, what it has done in the past minute, the past 10 minutes, the past hour, the previous day, and so on, in order to decide. Each input to the agent is known as a percept, and the combination of such inputs (stored in the same sequence in which they arrived) is known as a percept sequence. The agent's job is to look at the current percept and/or the percept sequence so far and decide an action based on that input.

 

Conventional intrusion detection systems were like virus scanners: they look at a specific signature and determine whether the signature indicates an intrusion. For example, if a packet contains the same source and destination address, the system understands that the attacker is attempting a "Land" attack. In that case the agent is simple and acts only on the current percept. Another type of agent, known as a behavioural agent, looks at what a process or user has been trying to do in the recent past and how far that deviates from normal user behaviour. For example, a normal user might have a failed login about once a day; if a user fails to log in 50 times in the last hour, it may be a case of intrusion. In this case the agent has to look at the complete percept sequence of the last hour. In fact, if the agent is in "learning" mode, it might look at a complete percept sequence for the last month to determine the behaviour of a genuine user.
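As a hypothetical sketch of the two styles just described (the percept format, field names and thresholds below are invented for illustration), a signature-style agent acts on the current percept alone, while a behavioural agent acts on the stored percept sequence:

```python
from collections import deque
import time

def signature_agent(percept):
    """Acts on the current percept only, like a virus-scanner signature check."""
    if percept.get("src_ip") == percept.get("dst_ip"):
        return "alert: possible Land attack"
    return "no action"

class BehaviouralAgent:
    """Acts on the percept sequence: counts failed logins in the last hour."""
    def __init__(self, threshold=50, window_seconds=3600):
        self.threshold = threshold
        self.window = window_seconds
        self.failed_logins = deque()          # stored percept sequence (timestamps)

    def act(self, percept):
        now = percept.get("time", time.time())
        if percept.get("event") == "failed_login":
            self.failed_logins.append(now)
        # forget percepts that fall outside the one-hour window
        while self.failed_logins and now - self.failed_logins[0] > self.window:
            self.failed_logins.popleft()
        if len(self.failed_logins) >= self.threshold:
            return "alert: abnormal number of failed logins"
        return "no action"
```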

 

The agent environment can be depicted in a very simplified form using its basic parts. Look at figure 15.3. The agent connects with the environment using two different types of connectors: sensors are used for collecting information from the environment while actuators are used for reacting back. For example, the sensor at the front of a self-driving car might detect a human figure ahead; the agent processes that and presses the brake pedal (the actuator) in response. In fact, the sensors and actuators are themselves part of the environment. The agent is logically connected to both of them and does not interact with the environment directly. In a computer program, no statements other than input/output statements communicate with the outside world; in the same sense, agents communicate with the outside world using sensors and actuators.

 

Figure 15.3 How an agent connects with the environment. Sensors gather percepts for the agent; the agent reacts to them using actuators.

 

Rationality

 

The agent looks at the percept or percept sequence and decides the next action to take. How can we say that the agent has taken the right decision? One seemingly good answer is "based on the output". For example, you can measure the goodness of a chess playing agent by counting how many games it has won out of the total played so far. That might not be a very good measure. Let us take an example. The agent makes a very silly move and the opponent tops it by making an even sillier move. The agent wins. Now if the agent learns that the silly move it made is the best move in that situation, and considers its decision making good, do you agree with it? The outcome is good but the move that led to it is not. Similarly, the agent may make the best possible move in a given position and yet the opponent is smart enough to win despite that good move. Should the agent consider that move bad? A self-driving car may decide to take a sharp turn without bothering about the cars coming from behind and still reach the destination in time. Was that a good move?

 

It is not really easy to assign credit to a particular move in such cases. This is sometimes called the credit assignment problem. A similar case arises when the agent makes the move which is best in the situation but still loses because the opponent makes a better one. Do you agree if the agent considers this a bad move? In fact, the success of a chess playing agent depends largely on the ability of the opponent. If the agent is playing with novices and winning, it might lose when it encounters an expert while playing similar moves. If the agent keeps assigning good values to moves that lead to wins against novices, those very moves will lead to losses against experts.

 

You have probably gathered by now that in a real-world environment the output does not depend only on the actions of the agent but on other factors as well. For example, if a self-driving car takes the legitimate decision of blinking the indicator, takes the rear and front camera images as inputs and makes a right turn, and a rash driver of some other car bumps into it, should the agent consider that a bad move? Conversely, if a self-driving car races fast and nothing objectionable happens because the other drivers take precautionary actions, and it reaches the destination in record time, should it consider that fast driving is good? In an average situation the first move does not lead to an accident while the second one does, whatever the actual outcome was this time.

 

The bottom line is: we cannot decide the rationality of the decision the agent has taken based on the outcome of the action; we may end up assigning credit incorrectly. How can we decide about the rationality of the decision then?

 

This is where the idea of the agent's objectives comes into the picture. Agents are given objectives to meet by the designer. How far the agent is able to meet those objectives determines its rationality, irrespective of the output.

 

For example, a self-driving car taking a sudden plunge into a street with heavy traffic is not rational and should be against the objectives of the car. If the agent is capable of taking decisions in line with the objectives of the car's designers, it is fine. It might happen that a rash truck driver, coming from the wrong side, crushes the self-driving car while the agent has instructed the car to drive on the correct side. The agent should not conclude that driving on the correct side was a wrong decision2. Conversely, if the car chooses the wrong side and reaches the destination faster, should it consider driving on the wrong side a good decision? Most would agree that it should not. That means rationality, in the true sense, does not depend on the result or outcome of the experiment. If the agent decided on the basis of the percept sequence so far, and the decision happens to result in a bad outcome, that is still fine: an agent cannot foresee the outcome before deciding. The only issue is to make sure the decision-making process has a strong logical base. In turn, that logical base depends on the objectives of the problem-solving process the agent is designed to implement. That means the agent must decide and act in such a way that it meets its objectives as far as it can.

 

Learning

 

When we play chess for the first time or drive for the first time, we do not do well. After multiple attempts we get better. We expect agents to exhibit this behaviour too. It is important to design agents with minimal information and let them add to it with experience. The agent architecture must have three things built into it. The first part is what the original designers had in mind as to what the agent has set out to do; for example, a chess playing agent might have only the basic rules of chess, what winning and losing are, and so on. The second part is about how to infer from those basic rules and derive more rules from them. The agent should also have objectives and check whether a rule, or a sequence of rules, results in something in line with those objectives; accordingly, it might need to modify its current behaviour. For that, the third part is to learn from experience, correlate moves and actions with situations, and find out what is good and what is bad in a given context. In module 20 we will refer to game playing programs which can learn from their experience. One of the earlier programs, the Checkers player, had the ability to learn on its own to the extent that it evolved into an expert player good enough to beat its designer (Arthur Samuel) himself. In fact, the agent requires two parts: one under the control of the designer, and another based on the agent's own choices. For example, a self-driving car should not open the door of a fast-moving car on its own; if the user insists, it might allow it while displaying a warning. Another example: the route may be prefixed by the designer, while fine-tuning the route if a road is blocked may be under the control of the agent. So there are two types of logic the agent works with: logic prefixed by the designer and logic decided automatically by the agent. In most cases, the agent's own logic eventually overrides the logic built in by the designers; the more the experience, the better the logic, quite like a human expert.

2 If the agent often encounters vehicles coming from the wrong side, it might conclude that this particular road is dangerous and take another path as a better alternative. In that case, it might need to look at the percept sequence of the last week or so and backtrack on the decision of choosing this road, taking the next best route. Looks like depth first search? It is; the only difference is that we take the next best option based on heuristic information.

 

The agent relies on two things: the built-in logic (provided by the designer) and its understanding of the percept sequence it has received so far. A good agent might start with some built-in logic but relies more and more on its own percept sequence and the rules it has generated over a period of time, exactly like its human counterpart.

 

In fact, one can discuss many more things about agents. One interesting topic is how the problems we have discussed so far can be represented using agents. A good reference is "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig. We will not discuss it further here.

 

Summary

 

Iterative Deepening A* and Recursive Best First Search are two algorithms which are useful in space-constrained environments. Both save on memory at the cost of exploring the same node multiple times if it is part of multiple routes. IDA* acts like iterative deepening, with the change that g + h' now defines the depth of each iteration rather than a plain increment of 1. This favours the exploration of nodes in the direction of the goal over other directions. The problem with IDA* is that it spends a lot of time re-exploring the same node, as many times as the number of different paths along which it falls. Recursive best first search is another such algorithm. Though it is called best first, it is quite similar to depth first search, with the exception that it always remembers the next best path and explores it in case the current best path no longer remains best. Current AI research focuses on using agents for problem solving. Agents work on the user's behalf and can be hardware plus software, or software alone. Agents work in a specific context known as the environment. An agent senses the environment using sensors and affects the environment using actuators. A rational agent takes decisions in line with the objectives of the designers of the agent. Every agent is designed with specific objectives in mind, which it must observe while choosing between alternatives. The ability of an agent is determined not only by its ability to do its job but also by how it learns to improve its own functioning; for that, an agent has a learning component. Whenever an agent receives a percept or percept sequence it has seen before, it must produce a response based on what it has learned from earlier experience. An agent, armed with the basic knowledge provided by the designer, is expected to learn and then act like its human counterpart.
