16 Forward and backward state space planning

Bhushan Trivedi


 

Introduction

 

There are many ways to categorize the problems we need to solve. Some problems can be abandoned in the middle of the solution process without any adverse effect. We may start solving them, take a few steps in one direction, realize midway that we have chosen a wrong path, drop whatever we have done so far, and start working on whatever seems correct at the moment. For example, while proving a theorem we may explore in a particular direction, find a dead end, simply ignore whatever was done so far, pick another path, and try again.

 

Another type of problem occurs when we cannot simply ignore what we have done so far. This type is a little more difficult than the previous one. We are allowed to undo our work by carefully retracing our steps back to the point from which we would like to take a different direction, but we cannot just discard it. For example, while editing a word document, we may format a paragraph and realize that we have formatted it incorrectly. We cannot just ignore the mistake and go ahead; we can only undo each incorrect move, come back to a point where the unwanted formatting is removed, and progress from there. The 8 puzzle problem is another example of the same type.

 

The third type of problem is one where we cannot come back at all. For example, while playing a game of chess we might realize after the 8th move that we made a mistake in the 2nd move. Can we ignore the mistake and restart from a correct 2nd move? No. Can we backtrack and take another option at move 2? We cannot do that either. We have to move on from the very state we are in. This is the hardest problem type.

 

Is there a trick which can give problems that cannot be backtracked something similar to backtracking? Yes. The trick is called planning. Planning is about projecting into the future to assess whether the move we are choosing lands us in trouble or brings us closer to the solution. Planning can be done from the current node towards the final node, or from the final node back towards the current node. The choice depends on many factors, including the branching factor in either direction. We will study both types of planning, and the deciding factors, during the course of this module.

 

Objectives and planning

 

Objectives, and the reasoning about how to achieve them, are central to planning. That means two things are needed for planning: 1. what we want to do, the objectives, and 2. how we are going to do it. The plan is designed based on the list of objectives; a list of what we want to achieve and how we are going to achieve it. So when we have our start state (where we currently are) and the goal state (where we want to be), the plan gives us the sequence of moves to reach from the start state to the goal state. The sequence that the planning process generates depends on our objectives. For example, when we want the shortest path, we get the sequence of states that minimizes the path length. When we want the path with the least traffic, the states are chosen accordingly (and might be completely different for the same problem). The plan applies actions to the current state and subsequent states based on two important criteria. First, whether the prerequisites for applying an action match the current state, and second, how near that action brings the state to the goal state. Thus, to apply an action (an action represents the same idea as a rule in earlier modules), the prerequisites must match. Another point is that when a sequence of actions is proposed as a plan to convert a start state into a goal state, each action must generate a state that satisfies the prerequisites of the next action.
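To make this concrete, here is a minimal sketch in Python of an action with prerequisites and effects, and a check that a proposed action sequence is a valid plan, i.e. that each action's prerequisites hold in the state produced by the previous action. The names (Action, applicable, validate_plan) and the set-of-facts encoding are illustrative assumptions, not from the text.

```python
# A minimal sketch of plan validation: states are sets of facts,
# actions have preconditions (prerequisites) and effects.

class Action:
    def __init__(self, name, preconditions, add_effects, del_effects):
        self.name = name
        self.preconditions = set(preconditions)  # facts that must hold
        self.add_effects = set(add_effects)      # facts the action makes true
        self.del_effects = set(del_effects)      # facts the action makes false

    def applicable(self, state):
        # The prerequisites must match the current state.
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.del_effects) | self.add_effects

def validate_plan(start, goal, plan):
    """Check that each action's prerequisites hold in the state produced
    by the previous action, and that the goal is finally reached."""
    state = set(start)
    for action in plan:
        if not action.applicable(state):
            return False  # broken plan: a prerequisite was never generated
        state = action.apply(state)
    return goal <= state
```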

 

Let us take an example to understand.

 

We have looked at agents in the previous chapter. Let us take the example of an agent with the capability of detecting intrusion. When the agent derives something, for example “there is a strong possibility of a denial of service attack”, the agent is in a state where an attack is quite possible. One of the objectives says that the agent must confirm this. To achieve confirmation, the intrusion detection agent might communicate with IDS agents of other networks, or look at logs of packets sent by the same sender or packets generated by the same user, and so on. After confirmation, the agent must plan some action, for example blocking the attacker’s IP address at the firewall. The reasoning is that blocking the attacker’s IP address at the firewall stops his packets from reaching inside the network, and thus thwarts the denial of service attack. This action is therefore essential to the plan for meeting the objective of thwarting such an attack. The action sequence is: check for an attack, confirm the attack if one is suspected, and add an entry to the firewall if it is confirmed. You can see that an action is applied if its precondition is true and not otherwise. The plan is a sequence in which one action generates the precondition of the next.
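Using the sketch above, this intrusion detection plan can be written so that each action generates the precondition of the next. The fact names here are, of course, made up for illustration.

```python
# Hypothetical IDS plan: each action generates the precondition of the next.
detect  = Action("detect_attack",  {"traffic_observed"}, {"attack_suspected"}, set())
confirm = Action("confirm_attack", {"attack_suspected"},
                 {"attack_confirmed", "attacker_ip_known"}, set())
block   = Action("block_attacker", {"attack_confirmed", "attacker_ip_known"},
                 {"attacker_blocked"}, set())

start = {"traffic_observed"}
goal  = {"attacker_blocked"}
print(validate_plan(start, goal, [detect, confirm, block]))  # True
# Reordering breaks the chain: block's prerequisites are never generated.
print(validate_plan(start, goal, [block, detect, confirm]))  # False
```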

 

Another example is a chess playing plan based on the possibility of a fork threat. If the agent can sense such a threat, it might introduce another agent which is designed especially for looking at the board position, the opponent’s history, information about the games he has played in the past and so on, to decide evasive actions. One of the objectives here is to check for the threat of a fork and, if found, thwart it by avoiding moves which lead to it. One more objective may be to make sure that the pieces are not organized in a way that makes a fork possible1. One can clearly see that the objectives help decide the sequence of moves and, obviously, each action generates a state with the preconditions under which the next action in the sequence is applicable.

 

Planning is done to achieve something; in the above example, to achieve freedom from the said attack. The twist in the tale comes from the fact that the attacker is also planning. The denial of service does not come as the attacker’s first step. He might do a few other things before embarking on that attack. The defender’s planning is (and should be) based on some rational thinking about what the attacker is up to and what he will be doing next; that is, what the attacker’s plan is2. In the case of chess, the same logic applies. All our attempts to thwart the possibility of a fork might result in some other attack if the opponent is able to outthink us. That means the planning must also consider what the opponent can plan, and make sure that we can outthink him.

 

An administrator can anticipate what the attacker is planning to do next and plan their own defence in a way that makes the attacker fail to do what he wants. For example, the administrator might move some critical system files to an inaccessible place, restrict access to sensitive disk partitions, provide a more stringent database access control, and so on. The objectives here can be

 

1. Restrict the attacker’s access to a minimum

2. Disallow any system disruption

3. Maintain the integrity, confidentiality and correctness of data

4. Make sure only authenticated users access data

5. No legitimate user is denied access to data he is authorized to use

6. …

 

1 One may think that intrusion detection and chess attacks are similar. They aren’t. In chess, players take turns; in intrusion, the attacker might reach the solution (succeed in launching the attack) without the administrator playing a single move.

2 Some of us think that an interesting solution to thwart such an attack is to remove the machine from the network so the attacker cannot continue. But then we are helping the attacker achieve the denial of service, as our own genuine users will no longer be able to access the server.

 

You can see that the list can be very long if we are serious about securing a real network. What we have seen are just a few basic objectives. Though our discussion below is based on the small subset described above, it is good enough to stress the point we are trying to make. We can achieve our objectives by taking actions one after another. The actions that we take to achieve our objectives are steps in the plan, or states in the state space on the way to a final state. For example, we may check the IP addresses of the sending and receiving machines and learn which malicious user’s IP addresses are to be blocked. Now we can execute the action “block that IP address”. Please note that it is compulsory for us to know the IP address the attacker is using in order to block it3. It is an important prerequisite. Thus when we plan to block the attacker, we must plan in a way that we get the IP address first. Hence, to take certain actions, we must also satisfy a few prerequisites.

 

Some of the objectives are achieved before we reach the final state. For example, we can make sure only authenticated users can access the data by providing some authentication system using a username and password. Only when the user is authenticated can he proceed further. Thus authentication is the first objective that we achieve. Protecting users from viruses and other attacks may be among our other objectives, which are not yet met, so we are not at the final state yet.

 

Some objectives, especially the one stating that no legitimate user can be denied the right to access the data he is authorized to access, cannot be achieved without some additional measures. For example, even when somebody is authenticated, and thus also authorized, he or she cannot be allowed to demand data at a speed that swamps the server. The agent must take care of that as well and plan accordingly.

 

The other problem is that complete satisfaction of all the objectives is hardly ever possible. In most cases the designers have to decide on some acceptable measure of performance. Sometimes this is expressed as soft constraints (which can be partially met) as opposed to hard constraints (which must be met). An interesting parallel can be drawn from the academic problem of constructing a timetable. If an expert system is given the problem of generating a timetable, it might be given some hard constraints like “a teacher cannot take two classes simultaneously”, which cannot be broken if the classes are to be planned successfully. On the other hand, “a teacher should not be given two consecutive classes” is an example of a soft constraint; if the designer produces a timetable where some teacher occasionally gets two consecutive classes, that is fine. Soft constraints are sometimes called preferences rather than constraints. We prefer a timetable with no consecutive classes for any teacher, but this might not be met in some cases.
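A minimal sketch of how a planner might treat the two kinds of constraints: hard constraints act as filters that reject a candidate timetable outright, while soft constraints (preferences) only add a penalty. The encoding of a timetable as (teacher, slot, course) triples, and all names below, are assumptions for illustration.

```python
# Sketch: hard constraints reject a timetable; soft constraints only penalize it.

def no_parallel_classes(timetable):
    """Hard: a teacher cannot teach two classes in the same slot."""
    seen = set()
    for teacher, slot, course in timetable:
        if (teacher, slot) in seen:
            return False
        seen.add((teacher, slot))
    return True

def consecutive_penalty(timetable):
    """Soft: prefer that no teacher has two consecutive slots."""
    slots = {(teacher, slot) for teacher, slot, _ in timetable}
    return sum(1 for teacher, slot in slots if (teacher, slot + 1) in slots)

def evaluate(timetable):
    if not no_parallel_classes(timetable):
        return None                         # hard constraint violated: rejected
    return consecutive_penalty(timetable)   # lower is preferred

print(evaluate([("rao", 1, "AI"), ("rao", 1, "DB")]))                    # None
print(evaluate([("rao", 1, "AI"), ("rao", 2, "DB"), ("sen", 5, "OS")]))  # 1
```

The second timetable is acceptable but penalized; a search over candidate timetables would keep only those that pass the hard filters and then prefer the one with the lowest penalty.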

 

Finding the soft and hard constraints, as well as the measures of performance, is an important part of such dynamic planning problems.

 

3 It is not the attacker’s own IP address but the IP address the attacker ‘is using’. It is important to note that in most cases the attacker bounces attacks off other, usually innocent but compromised, machines.

 

Types of planning

 

Planning can be done in roughly two ways. Simple planning omits details and outlines the process. It can be done in a short duration but can lead to problems due to oversimplification: some processes, when the details are worked out, are found to be impossible to implement or to contradict others. Detailed planning may be a better option. Though it is better, as it explores the region far more deeply, it sometimes takes an inordinate amount of time. Our discussion of the neighbourhood function in module 6 becomes relevant here: a sparse neighbourhood function involves simple planning, while a dense neighbourhood involves detailed planning.

 

Another categorization is based on whether the universe is predictable. For example, if a planner is solving an 8 puzzle problem, where the agent slides a few tiles one after another, it can clearly predict the correct position of each tile after those moves. The agent can continue planning till it finds the goal state. If it follows the same plan during actual play, it is guaranteed to achieve the result. Unlike that, if an agent decides to take a set of moves in the case of an automated driving car, the situation may turn out very different from what the driver has predicted. Thus planning when the universe is predictable is easier than otherwise. Another example of an unpredictable environment is chess. If the chess playing agent decides to take a specific action (a move), the result may be quite different from what the agent predicted, as the result also depends on the opponent’s moves.

 

Another categorizing point is whether the world changes while the agent is deliberating over its decision. In the case of chess, if the agent takes a few minutes to decide, the world (the chess board) is not going to change. Unlike that, in the automated car case, the situation may change. For example, the agent starts deliberating about taking a right turn when no vehicle is overtaking. By the time the action is taken a little later, it is quite possible that some other vehicle is actually overtaking, and taking a right turn might lead to an accident.

 

Another important point is whether the world is completely visible while planning. For example, in the case of chess, nothing is hidden and the world is completely visible. For automated driving too, when there is clear visibility, the driver can see everything in the surroundings and the world is thus visible. The agent can take decisions based on the information it has. This is not always possible. For example, in the case of playing cards, we generally have no idea what cards the others hold and have to assume a few things before making a move. In such a case, where the world is not completely visible, one has to make assumptions and plan accordingly4.

 

Agent Based Planning

 

A diligent reader might be thinking: we have been talking about graph based search methods till the previous module, and now suddenly we are discussing similar things using an agent based representation of the problem. We are discussing an agent finding its way through the state space by taking actions one after another. This process is basically a superset of our graph search approach so far. Till this module, we assumed that moves are taken without really introducing the action. While introducing agents, we have also introduced actions, which result in moves; we have become more explicit. Thus we no longer say that applying rule X has changed state S1 into S2. We say that applying action A, which implements rule X, moved the state from S1 to S2. Also, when we decide on actions, we associate time with them. We no longer assume the move is instantaneous; we say that the move takes some time. In some cases we also expect some changes in the environment while the move is being taken (for example, a self driving car). In fact, one can also associate an agent program which executes when an action is taken. Thus whatever the designer has decided to do when the action is taken (for example, actually moving the piece when the chess playing program decides to move), the agent program actually performs it. Agent based reasoning is therefore more detailed and explicit. Take again the example of the changing world: whatever changes occur during the time the decision is deliberated over must be considered before taking the final call; for example, the agent might have a final look at the right mirror before taking the right turn.

 

4 We are not discussing this here, but this is a case of dynamic planning, where the plan changes as we go along. The plan is not fixed; it is flexible enough to act as per the outcome of the previous move.

 

Agent based planning can be done in two different ways: first, based on previous experience, and second, by taking a fresh look. Whenever humans encounter a situation which demands planning, they look for similar situations encountered in the past. For example, when a teacher is given the job of organizing a conference, he would take inspiration from his earlier attempts to do similar things. If he has organized a conference before, he would use that experience and start planning accordingly. If a fellow teacher has organized one and he helped, he takes that experience into account. If he has not organized a conference but has organized a small symposium before, he might use that experience, but with more caution. The point is, a human planner has many such cases stored in the repository of his experience. Whenever he encounters a situation where he has to solve a problem, he will try to match the problem situation with any of the cases he has seen before. If it matches, he will simply apply that case. If it matches partially, he will try to apply the case as far as possible. All in all, he will map whatever he learned from past experience, and the model he formed during that experience, onto his new assignment. This process is called case based reasoning.

 

One more similar example can be cited for intrusion detection systems: when an Intrusion Detection System finds a new attack not seen before, it tries to match the symptoms of the new attack with the attacks it has seen so far, and tries to thwart it using methods known to work for those attacks. Thus case based reasoning is applicable in any situation where there is a complete or partial match between the new and an old situation.
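A minimal sketch of the retrieval step of case based reasoning: represent each stored case as a set of symptoms with a remembered remedy, and pick the case that best matches the new situation. The Jaccard overlap used here is just one plausible similarity measure, and all the symptom and remedy names are invented.

```python
# Sketch of case retrieval: match a new situation against stored cases
# by symptom overlap and reuse the remedy of the best match.

def similarity(a, b):
    """Jaccard overlap between two sets of symptoms."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_case(case_base, new_symptoms):
    return max(case_base, key=lambda c: similarity(c["symptoms"], new_symptoms))

case_base = [
    {"symptoms": {"syn_flood", "spoofed_ips", "high_traffic"},
     "remedy": "rate_limit_syn"},
    {"symptoms": {"many_logins", "single_source"},
     "remedy": "lock_account"},
]

new_attack = {"syn_flood", "spoofed_ips", "single_source"}
print(best_case(case_base, new_attack)["remedy"])  # "rate_limit_syn"
# A partial match is applied as far as possible; a fresh case with no
# overlap would instead trigger planning from scratch, as discussed next.
```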

 

A different approach to reasoning is needed for a fresh case. If the planner faces a situation he has no experience of, he has to plan afresh. For example, when the planner receives a request from a foreign visitor to attend the conference, he might need to plan and act based on a classical planning method. He has to look at all possible alternatives and decide the actions based on the objectives. For example, he might need to go and check whether some hotel can be booked for a foreign visitor; the earlier planning about the hotel might just not work. He might also need to inquire with, and take permission from, the ministry to make sure the visa process goes smoothly. This is an entirely new thing he has never encountered before. He might need to work on many other aspects which might turn out to be worthless later, but he has to explore them. Thus he has to follow a conventional search method for such problems.

 

In fact, the intelligent agent will always learn from its experience and accumulate more and more cases to reason with. It can also continue refining its learning over a period of time. As the cases are stored in memory and the matching is also performed in memory, this approach is also known as memory based planning.

 

Forward planning

 

As mentioned earlier, it is possible for an agent to start from the start state and plan to reach the end state. This process is known as forward planning or forward state space planning. It is quite similar to the graph search methods we have discussed so far. The important difference is that even when the problem is irreversible, planning helps us try alternate solutions and pick the better ones. When we describe planning in the context of agents, we expect the agent to take specific percept sequences and generate action sequences based on them. Our aim is to get the complete action sequence which leads to the final state or, failing that, as near to the final state as possible. Based on our discussion in the previous module, we can easily understand that sometimes the actions must be in strict order, sometimes in partial order, and sometimes they can even be carried out in parallel. For example, an automated driver must press the clutch and only then change gear: an example of strict ordering. The automated driver might start the AC before or after changing the gear, but only after starting the car: an example of partial ordering. The driver might start the AC and start playing the music player at the same point of time: an example of acting in parallel.

 

When the agent picks specific action sequences and applies them, it changes the world from the state it is in to another state. For example, applying the accelerator changes the speed of the car; not only that, it also changes the location of the car with respect to the rest of the world. If this makes the vehicle reach the destination, the resultant state is a goal state, and the action sequences that the agent chose are correct.

 

The search algorithms that we have described in the earlier modules can still be applied in planning. In agent based reasoning, each state can be thought of as a combination of various components, and each move is an operator changing some of those components. For example, an automated car can be represented by many components, including its location and speed. Applying the accelerator operator (pressing the accelerator pedal) changes these two components without changing the others (for example, it does not change the temperature inside the car). Applying some other operator changes something else. For example, by applying the AC operator, the temperature of the passenger compartment can be controlled. This operator, obviously, has no influence over the speed of the vehicle. Thus different operators inflict different moves, which in turn change some components of the state, in effect generating a new state from the current state.
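The component view can be sketched directly: a state is a dictionary of components, and each operator rewrites only the components it affects. All the component names and values below are invented for illustration.

```python
# Sketch: a state as a dictionary of components; each operator
# changes some components and leaves the others untouched.

car = {"speed": 40, "location": 12.5, "cabin_temp": 30, "gear": 3}

def accelerate(state):
    # Changes speed and location; temperature and gear are untouched.
    return {**state, "speed": state["speed"] + 5,
                     "location": state["location"] + 0.1}

def switch_on_ac(state):
    # Changes only the passenger compartment temperature.
    return {**state, "cabin_temp": 24}

print(accelerate(car))     # speed and location differ, the rest is identical
print(switch_on_ac(car))   # only cabin_temp differs
```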

 

The operators are applicable in different situations based on preconditions, like production rules. For example, changing the seat position is not allowed while the car is moving, and changing gear is not possible when the clutch is not pressed. Thus for adjusting the seat, the precondition is that the car is stationary; for changing gear, the precondition is that the clutch is in the pressed state. The move in our state space search is an operator applied to a specific state; here we call it an action.

 

When an agent encounters a specific state, it decides which actions can be applied to that state based on their preconditions. It can also determine the successor states resulting from the application of those actions. Let us reiterate the search process in terms of agents. Forward state space planning begins from the start state, determines the various actions applicable to it, generates the successor nodes, searches through them, generating additional successor nodes in the process, and proceeds till it reaches the final state.
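That loop is essentially breadth first search driven by action preconditions. A minimal sketch, reusing the hypothetical Action representation from earlier:

```python
from collections import deque

def forward_search(start, goal, actions):
    """Forward state space planning: expand applicable actions
    breadth-first from the start state until the goal is satisfied."""
    frontier = deque([(frozenset(start), [])])
    visited = {frozenset(start)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan                     # sequence of action names
        for action in actions:
            if action.applicable(state):
                nxt = frozenset(action.apply(state))
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [action.name]))
    return None  # no plan found

# e.g. forward_search({"traffic_observed"}, {"attacker_blocked"},
#                     [detect, confirm, block]) yields the three-step IDS plan.
```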

 

Let us take an example of a blocks world problem once again. We are picking a specifically contrived problem now, to address the need at hand. It is a little different from the one we saw in module 7. The state space and rules are the same as we defined them there:

∀X ∈ {A..G, −} : (∃Y ∈ {A..G} : On(X,Y)) ∨ Free(X)

That is, every block (and the table, denoted −) either has some block Y resting on top of it5, or is free.

 

We have the initial and final states as shown in figure 16.1. If you explore the initial state, you will get some of the states depicted in figure 16.2. There are many possible states which are not shown in figure 16.2. In one specific move, you can remove the block on top of a stack and place it on the table; in another, you can remove the block on top and place it on any other stack of blocks. You can see that if we do not use any heuristic, the search proceeds like this: many output states from each move, and a huge number of moves to be processed before reaching the goal state.

 

We can clearly see that the search is operating without much sense of direction, and the branching factor in this case is quite overwhelming.
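The branching factor can be seen directly by generating successors: from any state we may move any free (top) block onto the table or onto any other stack. A small sketch, encoding a state as a set of stacks with each stack a tuple listed bottom first (the particular start state below is invented, since figure 16.1 is not reproduced here):

```python
# Sketch: successor generation in the blocks world.
# A state is a frozenset of stacks; each stack is a tuple, bottom block first.

def successors(state):
    result = set()
    stacks = list(state)
    for i, src in enumerate(stacks):
        block = src[-1]                     # only a free (top) block can move
        rest = [s for j, s in enumerate(stacks) if j != i]
        remainder = [src[:-1]] if len(src) > 1 else []
        if len(src) > 1:
            # Put the block on the table (a new one-block stack).
            result.add(frozenset(rest + remainder + [(block,)]))
        # Or put it on top of any other stack.
        for k, dst in enumerate(rest):
            others = rest[:k] + rest[k + 1:]
            result.add(frozenset(others + remainder + [dst + (block,)]))
    return result

start = frozenset({("A", "B"), ("C", "D"), ("E",), ("F", "G")})
print(len(successors(start)))   # 15 successors even for four small stacks
```

Fifteen successors after a single move, each of which branches again, shows how quickly an undirected forward search blows up.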

 

Backward planning

 

Backward planning, or backward state space planning, is exactly opposite to forward planning. It begins from the goal state and works back to the initial state. The problem with forward planning is that it is unaware of the direction of the goal and tries all possible directions to reach it, which results in combinatorial explosion.

5 On(X,Y) in our case indicates that Y is resting on top of X. Later we will use On(X,Y) to indicate that X is on top of Y. This is done purposefully, to show that the interpretation depends on the context and that both forms are valid. One can choose whichever form one likes, but must remain consistent within an example.

 

Backward planning is sometimes better because we start with the goal, and the visibility of the goal helps us choose the moves. Backward planning is also known as backward reasoning or goal directed reasoning. It is preferred when the branching factor in the reverse direction is smaller than in the forward direction.

 

If you look carefully at the problem we have described, it is clear that it is easier to start from the goal state and obtain the four components of the initial state. There is only one state from which the goal state can be reached: block A on top of the table and the rest the same as the goal state (thus only one branch). Moving backwards therefore does not encounter the number of choices we have in forward reasoning. The example is carefully contrived to emphasize a case where backward planning is better. The goal state is so designed that when we start from it and unstack the blocks from the only structure present, the process automatically forms the initial four structures. Also, during the unstacking process it is easy to build the components of the start state, as the blocks come off in the order in which they are to be stacked.

 

All in all, our example is ideal for goal directed reasoning.
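Backward planning can be sketched as goal regression: instead of applying an action to a state, we regress the goal through an action, replacing the facts the action achieves with the facts it requires. A minimal sketch in the same style as the forward search, again reusing the hypothetical Action representation:

```python
from collections import deque

def backward_search(start, goal, actions):
    """Backward state space planning by goal regression: work from the
    goal towards the start, replacing what an action achieves with
    what it requires."""
    frontier = deque([(frozenset(goal), [])])
    visited = {frozenset(goal)}
    while frontier:
        subgoal, plan = frontier.popleft()
        if subgoal <= start:
            return plan                    # already in execution order
        for action in actions:
            # Relevant: achieves part of the subgoal, destroys none of it.
            if (action.add_effects & subgoal
                    and not action.del_effects & subgoal):
                regressed = frozenset((subgoal - action.add_effects)
                                      | action.preconditions)
                if regressed not in visited:
                    visited.add(regressed)
                    frontier.append((regressed, [action.name] + plan))
    return None
```

When the reverse branching factor is small, as in our contrived blocks example, far fewer subgoals are generated here than states in the forward search.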

 

One more reason to use backward planning is when there are many known states reachable from the goal state. When we have a choice of either moving from a known place to an unknown place, or moving from an unknown place to a known place, which one would you prefer? Moving from the unknown place to the known place is the obvious choice. It is easier because we can always find an intermediate place from which we know the path to the known place. If instead we want to reach the unknown place from the known place, we have no sense of direction and need to try all possible directions. The same thing happens in the case of backward reasoning. If we start from the goal state, we only have to reach some intermediate state reachable from the start state. Once we find any such state, going from the start state to that state (or moving back from that intermediate state to the start state) is known to us, and thus it is equivalent to reaching the start node. So we start moving back from the goal node, and when we reach any one of the intermediate nodes, we are done.

 

Forward planning has one advantage though: backward planning may not work in all cases. We usually know exactly what the start state looks like, and we can begin from it and see if we are moving nearer to the goal state after each step. We cannot always say the same about the final node. There may be multiple final nodes, and we may not know which of the many final states is the one to target. Consider chess. We know the initial state of any chess game: black pieces on one side and white on the other, with the position of every piece fixed. It is easier to start from the start state. Goal states can be many; unless we know the exact goal state, it is impossible to work back from it. In fact, even for a smaller domain, a case with only a few pieces on the board (the typical chess challenge that frequently appears in newspapers), the goal states can be many, and it is always easier to start from the start state provided in the puzzle. Another example is when we are looking for a particular kind of place, a restaurant or a movie theatre or something similar; there too we should begin from the start state, as it is easier.

Figure 16.3: Individual goal components are sometimes interdependent.

 

Testing whether we are moving in the right direction

 

One way to test whether we are nearer to the goal is to assess how much of the next node resembles or matches the description of the goal node. One can use that as a measure for choosing the path ahead. Please note that this is not a heuristic in the true sense, as it is not based on domain knowledge. It just checks whether the current state has significantly increased in similarity to the goal state.
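A minimal sketch of such a progress test: count the fraction of the goal description that already holds in a candidate state, and prefer the successor that raises this fraction. The states and goal facts below are invented for illustration.

```python
# Sketch: similarity of a state to the goal = fraction of goal facts satisfied.
def goal_similarity(state, goal):
    return len(state & goal) / len(goal)

goal = {"On(A,B)", "On(B,C)", "OnTable(C)"}
s1 = {"On(A,B)", "OnTable(C)", "OnTable(B)"}
s2 = {"On(B,C)", "Free(A)"}

print(goal_similarity(s1, goal))  # about 0.67: two of three goal facts hold
print(goal_similarity(s2, goal))  # about 0.33: only one goal fact holds
```

As the next paragraph shows, this raw count can mislead: a goal fact achieved early (such as On(A,B) in s1) may have to be undone because of dependencies among the goal components.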

 

Sometimes the goal is divided into parts. Consider the problem of finding a path from our home in Ahmedabad to Mumbai: we have seen that we need to pick the most critical part and solve it first (book the train tickets from Ahmedabad to Mumbai). Other parts of the goal depend on that choice. Planning which is unaware of such interactions, and of the criticality of the different components of the goal, cannot solve the problem to the user’s satisfaction. The blocks world is a similar domain. For example, suppose we want to achieve On(A,B) and On(B,C) and OnTable(C). Achieving On(A,B) first will not solve the problem. On(B,C) will not solve the problem if C is resting elsewhere and not on the table. That means just putting A on top of B when B is not on top of C won’t work; in the same way, B on top of C will not work if C is not on the table. The most critical part is to get C onto the table and then achieve the other parts of the goal. Figure 16.3 depicts the case. Thus whichever form of planning is used, one must take care of the dependencies between goal components. We will explore this point further in the next module.
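The dependency can be made explicit: if we record, for each goal component, which other components must hold first, any topological order of the components gives a workable achievement sequence. A small sketch, with the dependency table hand-written for this example:

```python
# Sketch: order goal components so that prerequisites are achieved first.
# Dependencies for the example: On(B,C) needs OnTable(C); On(A,B) needs On(B,C).
needs = {
    "OnTable(C)": [],
    "On(B,C)":    ["OnTable(C)"],
    "On(A,B)":    ["On(B,C)"],
}

def achievement_order(needs):
    order, done = [], set()
    pending = set(needs)
    while pending:
        ready = {g for g in pending if all(d in done for d in needs[g])}
        if not ready:
            raise ValueError("circular dependency between goal components")
        order.extend(sorted(ready))
        done |= ready
        pending -= ready
    return order

print(achievement_order(needs))
# ['OnTable(C)', 'On(B,C)', 'On(A,B)']: C goes on the table first, as argued.
```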

 

Choosing between forward and backward reasoning

 

We have already seen some reasons for choosing backward reasoning over conventional forward reasoning. Here is a summary.

 

1. It is always better to move from the smaller number of alternative start or goal states towards the larger number of alternative goal or start states. We have already seen that it is easier to start from an unknown goal state and reach a known start state, as there are many other states from which we know the path to the start state.

 

2. What type of problem is being solved? A database or big data inquiry requires us to reason back from the query to a solution. For a game playing problem, where the input from the opponent decides the next outcome, only forward reasoning makes sense.

 

3. If there is a need for an explanation facility, backward reasoning is a must. For example, if the expert system suggests that the engine of the car is to be replaced, the mechanic would like to learn why. The program can only provide the explanation if it can reason back from the answer it gave. Sometimes the explanation needs to be provided in a forward reasoning manner, where forward reasoning is preferred, but generally queries about the output can only be handled by providing the explanation backwards. Another example: when a patient is given the verdict that he has a fatal disease (for example leukaemia, the blood cancer), if the medical diagnosis system cannot explain how it came to this conclusion, it would be impossible for the patient to accept the verdict.

 

4. If the branching factor is different in the two directions of movement, moving in the direction with the lower branching factor yields the answer faster and is thus preferred. Our example above is one such case. Some theorem proving programs have also been found to use this approach while producing proofs. Many mathematical proofs begin by negating what we want to prove, moving forward, and showing that the result contradicts some known fact. This, too, is similar to it being better to go from an unknown place to a known place.

 

5. How is the solution to be implemented? Some programming environments, for example those involving the programming language PROLOG, do backward reasoning. This is because predicate logic proofs are produced using backward reasoning, and PROLOG is based on predicate logic. Unlike that, agent based solutions coded in Java can be naturally coded in a forward reasoning manner. Event driven programming, which provides reactions to various events based on when and how they occur, is also the norm in most GUI based and object oriented systems. When such systems are used, forward reasoning, proceeding with the sequence of events and changing the state accordingly, is better.

 

Summary

 

Planning helps convert non-reversible problems into reversible ones; that is, it helps us undo what we are doing if the current move is found to be worse than what we expect it to be. Planning involves two things: what our objectives are, and how we are going to achieve them. The plan is a sequence of actions which results in a final state while keeping in view that the objectives are fulfilled, for example choosing the shortest route. There are many objectives, and in most cases the agent is allowed to fulfil most of them in the most acceptable manner rather than satisfying all of them. There are a few ways the planning process can be categorized: simple or detailed, for a changing world or not, for a completely visible world or not, and so on. Some of these involve more difficult types of programming.

 

The agent program implements an action, which results in the application of a move, which in turn changes one state into another. Agents reason based on two things: one, looking at cases from past experience, and two, conventional methods using search. As the agent continues to learn, it has more and more cases to match with the current situation. Both forward and backward planning are used in practice. Judicious deliberation over the problem and other factors is a must to choose the right type of planning for a given problem.

 

___________________

 

A diligent reader has probably realized by now that there is a need to find out whether we are moving towards the solution or not. Even without using a heuristic, by using goal directed reasoning and coming back from the goal node, one can easily place each block either on the floor or on top of some other block in such a way that the start state is reached. In our case the start state consists of multiple sub-states which are easier to achieve from the goal.

 

One good alternative is to combine both forward and backward reasoning. Some systems have adopted this approach. One example is to start from both ends together and have the two searches meet somewhere in the middle.
