8 Genetic algorithm & Travelling salesman problem
Bhushan Trivedi
Genetic Algorithms
Genetic algorithms have not traditionally been part of the standard AI curriculum, and quite a few AI syllabi even today do not include them. Soft computing, a newer emerging discipline, is often considered a more natural home for them. We will not discuss the difference between soft computing and AI further. GA (genetic algorithms are popularly known by this abbreviation) is included here because it is an excellent method for solving some very interesting AI problems. One of the most important problems AI algorithms face, combinatorial explosion, that is, an insurmountable number of options to choose from, is handled nicely by GA. Those who want to solve complex problems should not overlook GA1.
GA is quite different from the algorithms we have learned so far. It is not based on heuristics, nor on a search algorithm moving in a particular direction. The algorithm begins with some random solution states (like Generate and Test) and chooses the better solution states based on some criterion of 'goodness'. The function which determines the goodness of a solution state here is not called a heuristic function but a fitness function. Unlike other algorithms, GA does not merely choose from the available solutions; it combines 'good' solution states to produce a next generation of solution states which are more likely to be better than the first generation. It then applies the fitness criterion to the new generation of solution states to retain only the best. GA continues this process until a solution state of the required quality is achieved. Thus we will not have a search graph where we start from a start state and reach a final state by traversing states through valid rules. We start with a solution set which may not be the best, modify that solution set somehow to get a better one, and continue until we get the set we want. For example, we may start with a random sequence of moves for the missionaries and cannibals problem and keep modifying it until we get a sequence that works. If you think we are describing something quite similar to variable neighborhood descent, it is not; wait a while to see the difference.
GA is based on the idea of how life forms persist. All life forms use some strategy to survive, and every species has a clear-cut policy for survival. For example, deer, even while busy grazing, keep their noses continuously smelling and their ears continuously listening to the surroundings. At the smallest hint of danger they start running away. Obviously, the deer which smell and hear better and run faster have better chances of survival. When these better deer mate, their children have more chances of inheriting the same better abilities to survive. "Survival of the fittest" is the mantra of life given by Darwin long back, in 1859. Those who protect themselves better from the hazards of the environment have more chances of survival. For example, faster carnivores can catch their prey while slower ones miss out and die of starvation; faster prey survive attacks with higher probability than the rest. In fact, not only faster but smarter animals (which can smell and hear better and also decide better the direction to run or the strategy to defend) have higher chances of survival.
The offspring of animals which are better survivors are more likely to inherit those good traits and to be better than their parents. Over the years, fitter and fitter generations are produced.
Another important point is that the resources available are always limited, and species compete with each other to secure as many resources for themselves as possible. There is competition even within a species (tiger and lion groups have their allocated areas and do not allow other tigers or lions within them, for example). Darwin's rule is all the more important in the context of limited resources: when resources are limited, only a small population can survive, and naturally only the best can do so. Another important principle is that nature allows mating between two individuals to produce a new individual who inherits from both parents. The process of inheritance (unlike in our object-oriented languages) involves a lot of randomness, in the sense that two individuals with the same parents inherit differently, sometimes in extremely opposite ways. This randomness allows nature to escape from monotony and to get offspring better than the parents themselves (those who inherit the better features from both) or otherwise (those who inherit the not-so-good features). As the fittest have more chances to survive, offspring of the first type are more likely to survive and thus be ready for a better next generation.
Thus two important principles of life are as follows:
- Better offspring have more chances of survival
- When better offspring mate with each other, they are more likely to produce genetically better offspring
Genetic algorithms are based on this philosophy of life. GA is about writing code to make sure only the fittest survive from the set of possible solutions and then mixing those fittest elements to produce newer elements randomly. The newly generated elements also need to pass the fitness test; only those which pass exist in the next generation. Again these fit elements combine to produce fitter elements, and so on. Eventually the process generates the set of elements most suited for solving the given problem.
The genetic algorithm works like this.
- Decide the initial solution set CurrentSS (usually a random set)
- Check if CurrentSS is the required solution set; if so, quit
- Continue until a set meeting the qualifying criteria is obtained:
  - Apply the fitness function to all elements of CurrentSS
  - Choose elements from CurrentSS to produce RestrictedSS based on their fitness values (choose the best)
  - Alter RestrictedSS into NewSS by
    - generating offspring from random parents of RestrictedSS
    - replacing the old elements with these offspring
    - calling the result NewSS
  - Make CurrentSS = NewSS
The process of altering may happen in more than one way, by choosing different elements to be combined and also by combining attributes of the parents in different ways.
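As a rough sketch, the loop above can be written in Python as follows. The helper names random_solution, fitness and crossover are hypothetical placeholders for problem-specific functions (they are not part of any standard library), crossover is assumed to return a single child per pair of parents, and the loop simply runs for a fixed number of generations instead of testing a qualifying criterion.

```python
import random

def genetic_algorithm(random_solution, fitness, crossover,
                      pop_size=20, keep=5, generations=100):
    # CurrentSS: start with a random population of candidate solutions
    current = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        # Apply the fitness function and keep only the best (RestrictedSS)
        current.sort(key=fitness, reverse=True)
        restricted = current[:keep]
        # Produce NewSS by combining random parents from RestrictedSS
        new_population = []
        while len(new_population) < pop_size:
            p1, p2 = random.sample(restricted, 2)
            new_population.append(crossover(p1, p2))
        current = new_population          # CurrentSS = NewSS
    return max(current, key=fitness)      # best solution found so far
```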
As GA took its inspiration from genetics, it also borrowed many words from genetics. Let us look at the jargon first.
The collection of individuals of a species is called a population, and the collection of solutions that we start with is called the same; that means our CurrentSS at the beginning is the population. The population is replaced by successive generations until we get a population of what we really want. This is depicted in the algorithm by generating NewSS and starting all over again, considering it as CurrentSS, until the qualifying criteria are met.
The genetic map of an individual is called a chromosome. Chromosomes contain DNA. The genetic footprint is carried by the genome, which is encoded in the DNA. The DNA in turn contains genes, which are the basic building blocks of the characteristics of the life form, from the color of skin and eyes to the ability to climb trees or swim. The reproduction process helps the genes to be carried from parent to offspring as the DNA is replicated in the offspring. In fact, the offspring's genome is a combination of the genomes of its parents.
That wasn't so bad, was it? As far as possible we will not be using this jargon in the subsequent discussion, except for the word chromosome.
GAs are a set of algorithms which generate solutions by starting with a random set and evolving and combining those solutions. The chromosome in GA is basically a string, and thus the population in GA is a collection of strings. The genetic operations for altering chromosomes are various operations on strings, like cutting a part of one string and pasting it into another. The algorithm also has a function which takes a string as input, evaluates it and returns its fitness value.
How are chromosomes combined to produce the next generation? By applying three basic operations on pairs of them, described in the following.
Basic operations
There are three operations which help in alteration. The first selects the best members of the population, the second recombines them to produce the next generation, and the third, called mutation, generates children in an abnormal way by combining genes using some irregular pattern. Let us describe them in a little more detail. They are called operators in genetics parlance.
Selection
In the real world each individual (called a phenotype in genetics) is sent out to survive on its own: it has to compete with others and reproduce to survive. If a GA is designed like that, the candidate solutions have to be sent into an artificial problem world to see if they survive. Such an approach is used by some practitioners but not by most, as it takes an inordinate amount of time.
A more suitable approach for the computer domain is to have a fitness function which can test the ability of a solution to survive. It is also known as an optimization function when people from the optimization domain use GA. The fitness function is applied to every candidate solution to determine its fitness2.
The selection operator chooses only the better candidates based on their fitness values, and also takes more copies of the better candidates into the final selection. Candidates which are not very good might have only a few copies, while the worst candidates may not be copied into the final selection at all.
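As a minimal sketch, one common way to implement such a selection operator is fitness-proportional ("roulette wheel") selection; the population and the fitness function below are assumed to be supplied by the problem, and fitness values are assumed to be non-negative.

```python
import random

def select(population, fitness, how_many):
    # Each candidate gets a slice of the wheel proportional to its fitness,
    # so better candidates tend to appear more than once in the selection
    # while the worst may not be copied at all.
    weights = [fitness(candidate) for candidate in population]
    return random.choices(population, weights=weights, k=how_many)
```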
Recombination
The recombination operator does exactly what its name indicates. It takes pairs of parents and combines them to generate offspring. These offspring inherit genes, and thus attributes, of both parents. The current generation is the input to, and the next generation is the output of, the recombination process. Once the recombination process is over, the population has an entirely new set of members. This is in contrast to the real world, where new offspring are added to the existing population; here the next generation completely replaces the old generation. Sometimes, though, some of the best old-generation elements are cloned and kept in the new generation.
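For chromosomes encoded as plain strings (bit strings, for instance), recombination can be as simple as the single point crossover sketched below; as discussed later in this chapter, this naive form does not directly work for TSP tours. The example strings are arbitrary illustrative values.

```python
import random

def single_point_crossover(parent1, parent2):
    # Cut both parents at the same random point and exchange the tails,
    # producing two children that inherit parts of both parents.
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

print(single_point_crossover("1100110", "0011001"))
```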
Mutation
Mutation is a process of adding randomness to the process. In genetics, mutation changes the properties of a gene on a permanent basis; external influences like radiation are sometimes responsible for the mutation of normal genes. The genes are distorted to produce offspring with very different properties, for example people with no hair, or with three eyes, or with the heart on the right side, or something similar. Usually such drastic changes make life very difficult for that individual, though the mutation process can sometimes result in better genes. GA applies the mutation process in the same sense in which simulated annealing allows bad moves. Thus the mutation process changes genes drastically but provides an escape route as well. Like simulated annealing, if mutation is applied in a controlled manner, it provides sufficient randomness to get a better solution.
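A minimal sketch of a controlled mutation for chromosomes represented as sequences is shown below; it swaps two randomly chosen positions with a small probability (the probability used here is an arbitrary illustrative value).

```python
import random

def mutate(chromosome, probability=0.05):
    # With a small probability, swap two randomly chosen positions.
    # Most children pass through unchanged; an occasional child is
    # distorted, which keeps the population from becoming monotonous.
    chromosome = list(chromosome)
    if random.random() < probability:
        i, j = random.sample(range(len(chromosome)), 2)
        chromosome[i], chromosome[j] = chromosome[j], chromosome[i]
    return chromosome
```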
Travelling salesman problem
To learn how these operations are actually applied, let us take a very well-known problem called the travelling salesman problem. The problem is about sending a travelling salesman to visit a number of cities, each exactly once, in such a way that all cities are visited using the shortest possible route. However simple the problem looks, it is really very challenging even for a reasonable number of cities; the same combinatorial explosion problem comes back to haunt us here. The problem is so interesting that many researchers have tried different strategies to solve it. Many heuristics have been proposed, one of them being the nearest neighbor algorithm, which is quite simple.
2 In some sense, this is a heuristic function telling how good this state is compared to others.
1. Pick up any city at random. Solution_List = that city.
2. Pick up the nearest neighbor of the current city that is not part of the Solution_List.
3. If no such neighbor exists, quit with the Solution_List.
4. Otherwise Solution_List = Solution_List -> nearest neighbor (add the new city to the list).
5. Current city = nearest neighbor.
6. Go to 2.
This heuristic is quite useful and produces very good answers (i.e. quite close to optimal). Unfortunately, like other heuristics, it does not guarantee an optimal solution. Many other methods have been proposed which we will not discuss here, but some good books have been written solely on this problem and its various solutions, which the reader may consult to learn more.
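A straightforward rendering of the nearest neighbor heuristic is sketched below; the distance table is assumed to be a dictionary keyed by (city, city) pairs and symmetric.

```python
import random

def nearest_neighbor_tour(cities, distance):
    # `distance` is assumed to map a (city_a, city_b) pair to the road
    # length between those cities (and to be symmetric).
    current = random.choice(list(cities))        # pick any city at random
    tour = [current]
    unvisited = set(cities) - {current}
    while unvisited:                             # repeat while unvisited cities remain
        nearest = min(unvisited, key=lambda c: distance[(current, c)])
        tour.append(nearest)                     # add the nearest unvisited city
        unvisited.remove(nearest)
        current = nearest
    return tour
```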
One method to solve this problem is, of course, GA. Let us see how GA can be used to solve TSP. Before we proceed further, let us understand that TSP is actually a problem type, and many similar problems can be and are described as TSPs. For example, if we are interested in connecting different points on a circuit board using the shortest path, that is also a TSP. Figure 8.1 shows the travelling salesman problem for 7 cities. You can see that each city has 6 outgoing roads. Though we have assumed only one direct road from one city to another, in reality there might be more. Also, sometimes the outgoing and incoming roads may not be the same due to strangely laid one-way lanes, but we are ignoring that as well.
Look at figure 8.1, which shows how 7 cities are connected to each other by direct roads. Suppose the salesman has to travel to all these cities; can you suggest the best tour if the distances of all the roads are given? Each distinct path which covers all cities is a tour. For example A,B,C,D,E,F,G is one tour and A,C,B,D,E,F,G is another. The graph is not a directed graph, so a reverse path is the same path with no difference in the total distance covered; thus A,B,C,D,E,F,G is the same tour as G,F,E,D,C,B,A. The total number of tours for n cities is (n-1)!, but since we do not count a tour and its reverse as different, effectively there are (n-1)!/2 tours. In our case that is 6!/2 = 720/2 = 360 possible tours for the salesman. If we write a program to generate the tours one after another and pick the best from the list, it is manageable but not easy. If you take a fairly normal case of 26 cities, the poor salesman has 25!/2 = 7,755,605,021,665,492,992,000,000 alternatives, impossible to manage in one's lifetime. Would you like to take an optimized world tour?
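The arithmetic above is easy to verify with a few lines of Python:

```python
from math import factorial

def number_of_tours(n):
    # (n-1)! distinct orderings from a fixed start city, halved because a
    # tour and its reverse cover the same distance.
    return factorial(n - 1) // 2

print(number_of_tours(7))    # 360
print(number_of_tours(26))   # 7755605021665492992000000
```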
Solving the TSP
How can one solve this problem? Let us probe deeper. In the above case, we may begin with one solution, shown below. Note that the last city is the same as the first, as the salesman is expected to come back.
A->B->C->D->E->F->G->A
There is very little chance that this solution is the optimal one. If it is not, we would like to move to another solution which is better. How can we judge? If we are using the hill climbing process, we may move from this sequence to other sequences that can be generated using some rules. The total distance to be covered can be used as a heuristic function, and we would move along the path where this value is smaller3. The rules to generate the other sequences may be based on altering the route in some way; for example, we may decide to exchange the positions of two adjacent cities.
For example
A->B->C->D->E->F->G->A
Has four children.
A->C->B->D->E->F->G->A, A->B->D->C->E->F->G->A, A->B->C->E->D->F->G->A, A->B->C->D->E->G->F->A
3 This is basically valley descending. If we take the negative of the total path length to be covered as the heuristic function value, it becomes hill climbing.
Now we can find the total distance of each such tour and pick up the best. After that, we can again exchange two cities of that tour and pick up the best.
This discussion also relates to something we discussed earlier: dense and sparse neighborhood functions. The above is an example of a sparse neighborhood function. If we take 4 cities to exchange instead of 2, each solution has more neighbors and the function becomes denser. If we allow all 7 cities to be rearranged, all permutations are possible in the first move itself and it is the densest neighborhood function.
We have chosen to exchange two adjacent cities only. If we relax the constraint and exchange any two cities, the function again becomes denser as each solution now has more neighbors.
One question: if we allow all 7 cities to be exchanged, how many neighbors does a solution have? The same number we discussed earlier, 360. The sparse neighborhood function above has 4 neighbors, the densest neighborhood function has 360, and all other neighborhood functions lie somewhere in between. A variable neighborhood descent decides how dense the neighborhood function should be along the path and changes the neighborhood one after another depending on the requirement.
GA to solve TSP
However, our interest is in discussing how GA is applicable in this case. Let us try to understand.
First, instead of applying a heuristic to each state and choosing the best, we will keep a set of better solutions. In the next iteration we combine these parents randomly and generate the next generation of solutions. We will try to combine the good elements of the parents in search of a good solution.
One may wonder how to start. If there is no other clue, the best way to start is to use random parents. If we already have some parents, we may apply the fitness function to determine the best parents out of that lot. In this case, one can start with a few random parents (say 20), find their fitness values, and choose the better ones (say the 5 best of those 20). Now combine these 5 parents to generate 20 offspring (how? the 5 parents form 10 distinct pairs, and each pair generates two children), apply the fitness function to find the 5 fittest from that list of 20, and so on. One may continue until the fitness function returns the best possible value (the least possible distance).
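The fitness test for a tour is simply its total length, with shorter tours treated as fitter. A minimal sketch, again assuming a symmetric distance dictionary keyed by city pairs, is shown below.

```python
def tour_length(tour, distance):
    # Total distance covered, including the return leg to the first city.
    total = 0
    for i in range(len(tour)):
        a = tour[i]
        b = tour[(i + 1) % len(tour)]   # wraps around back to the start
        total += distance[(a, b)]
    return total

def fitness(tour, distance):
    # Shorter tours are better, so fitness is the negative of the length.
    return -tour_length(tour, distance)
```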
The process of generating new solutions from old ones is done using a method known as crossover. Obviously the crossover must not change the cities themselves but only the order in which they are visited. One simple method of crossover is to cut a portion from one parent, replace the corresponding portion of the second parent with it to generate the first child, and exchange the remaining parts to produce the second.
The above method, which picks a point and exchanges all the data beyond (or before) that point with the other parent, is known as single point (simple) crossover.4 Unfortunately, though this works in many other cases, it is not going to work here.
4. It is also possible to have a two-point crossover as well as multi-point crossover methods. A multi-point crossover cuts and joins at more than one place. There is also a cut-and-splice crossover; cut and splice differs from point crossover in that the lengths of the children do not remain the same as the parents. None of these are directly applicable to TSP.
For example, suppose we have chosen the two best solutions as follows (they may be considered good parents on the basis that they require a smaller distance to cover compared to other tours):
A->C->B->D->E->F->G and C->D->B->G->A->E->F
(We are no longer showing the last city being the same as the first; it can always be assumed, and in any case the crossover cannot alter it.)
Now if we combine these two parents by a simple single point crossover after the fourth city, keeping the first four cities of each parent and exchanging the remaining three, we get the following.
A->C->B->D->A->E->F and C->D->B->G->E->F->G
Neither of these is legal: some cities are repeated and some are not visited at all, so neither child represents a valid tour.
We will have to define a crossover function which preserves the constraint that each city must be visited once and only once. In fact, even with this constraint in place, we are still in a position to apply crossover in many ways. Let us discuss some common crossover methods5.
PMX (Partially Mapped Crossover)
The first crossover method is called PMX or partially mapped crossover. In this method two points, known as crossover points, are selected. The part between the first and the second crossover point of one parent is copied to the child; the other elements of the child are picked up from the other parent using a mapping derived from this copy process.
Thus parts of the parents are partially copied to the children, hence the name. We cannot blindly fill the rest (i.e. we cannot apply single point crossover) as we know that is not going to work. So, in PMX, the rest is filled in based on the mapping implicitly generated by that copy. Let us take an example to understand.
Let us take the same parents: parent 1 = A->C->B->D->E->F->G and parent 2 = C->D->B->G->A->E->F. Take the three middle elements (the 3rd to the 5th city) as the part between the crossover points. Partial mapping over this segment generates two children: Child 1 = *->*->B->G->A->*->* and Child 2 = *->*->B->D->E->*->*, where each * is to be replaced by a valid city.
The first child is *->*->B->G->A->*->*, with the middle three values picked up from the second parent, i.e. B, G, A. The other values are to be picked up from the first parent. The first parent has B, D and E in the same positions
5 We are discussing crossover methods for the representation that we use here. The crossover methods depend heavily on the representation.
where B, G and A appear in Child 1. Thus the partial mapping is B↔B, D↔G and E↔A. The first item in the first parent is A; as per this mapping we can replace it by E in Child 1. The last item is G, which we can replace by D, looking at the mapping. That makes it E->*->B->G->A->*->D. The other two cities (C and F) do not produce any conflict, so we can copy them directly from the first parent, and the first child becomes E->C->B->G->A->F->D. Similarly, the second child becomes *->G->B->D->E->A->* once the mapping-related replacements are done, and finally C->G->B->D->E->A->F after taking C and F as they are.
Our problem here is in fact simple; otherwise a longer chain of mapping may be required. For example, we have replaced D by G, and G might in turn have been mapped to some other city, and so on; we have to follow that chain until a city not yet used in the child is reached.
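Putting the above steps together, a sketch of PMX for this path representation could look like the following; the crossover points are passed in explicitly, and the example reproduces the two children derived above.

```python
def pmx(parent1, parent2, point1, point2):
    # One child per call: the slice [point1:point2] is copied from parent2,
    # the rest comes from parent1, resolved through the partial mapping.
    size = len(parent1)
    child = [None] * size
    child[point1:point2] = parent2[point1:point2]          # copied segment
    # mapping: city in parent2's segment -> city in parent1 at the same position
    mapping = {parent2[i]: parent1[i] for i in range(point1, point2)}
    for i in list(range(0, point1)) + list(range(point2, size)):
        city = parent1[i]
        # follow the mapping chain until a city not already placed is reached
        while city in child[point1:point2]:
            city = mapping[city]
        child[i] = city
    return child

p1 = list("ACBDEFG")
p2 = list("CDBGAEF")
print(pmx(p1, p2, 2, 5))   # ['E', 'C', 'B', 'G', 'A', 'F', 'D']
print(pmx(p2, p1, 2, 5))   # ['C', 'G', 'B', 'D', 'E', 'A', 'F']
```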
OX (Order Crossover)
Another method is called order crossover. In this method the child is first filled with a substring from one parent, exactly as in PMX, and the rest of the child is then filled so that the order in which the cities appear in the other parent is maintained.
Let us see how it is done. Consider the same set of parents and the same crossover points. The first step is the same as in PMX: a substring from one parent is copied into the child. Thus both children initially look as they did in PMX: Child 1 = *->*->B->G->A->*->* and Child 2 = *->*->B->D->E->*->*, where each * is to be replaced by a valid city.
Now child 1 has a substring from parent 2. The rest of the elements are to be copied from parent 1 in the same order in which they appear there. The first parent, after cancelling the cities already part of Child 1, reduces to C, D, E, F. Thus the first * is replaced by C, the second by D, the third by E and the fourth by F, and Child 1 becomes C->D->B->G->A->E->F.
Similarly, for the second child the three values B, D and E are copied from the first parent; after cancelling those cities, the second parent reduces to C, G, A, F, so we replace the *s with C, G, A and F in that order, which makes Child 2 = C->G->B->D->E->A->F. Interestingly, Child 1 turns out to be the same as parent 2; that happens sometimes.
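A sketch of this order crossover variant (copy a slice from one parent, then fill the remaining positions with the other parent's cities in their original order) is given below; it reproduces the two children above.

```python
def order_crossover(parent1, parent2, point1, point2):
    # Copy the slice from parent2, then fill the remaining positions left to
    # right with parent1's cities in their original order, skipping any city
    # already present in the child.
    size = len(parent1)
    child = [None] * size
    child[point1:point2] = parent2[point1:point2]
    remaining = [c for c in parent1 if c not in child]
    fill = iter(remaining)
    for i in range(size):
        if child[i] is None:
            child[i] = next(fill)
    return child

p1 = list("ACBDEFG")
p2 = list("CDBGAEF")
print(order_crossover(p1, p2, 2, 5))   # ['C', 'D', 'B', 'G', 'A', 'E', 'F']
print(order_crossover(p2, p1, 2, 5))   # ['C', 'G', 'B', 'D', 'E', 'A', 'F']
```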
CC (Cyclic Crossover)
The third method that we are going to study is called Cyclic Crossover or CC. To understand CC we will take a different set of parents (you may try it with the earlier set of parents to see why we chose a new one; the short answer is that they yield the same parents as the output!).
Parent 1 as A->B->C->D->E->F->G->H->I
Parent 2 as D->A->B->H->G->F->I->C->E
Child 1 takes the first value from parent 1, which is A:
Child 1: A->*->*->*->*->*->*->*->*
The value corresponding to A in parent 2 is D (at the first position); D occurs at position 4 in parent 1, so picking it up makes Child 1:
Child 1: A->*->*->D->*->*->*->*->*
The value corresponding to D in parent 2 is H (at the fourth position); H occurs at position 8 in parent 1, so picking it up makes Child 1:
Child 1: A->*->*->D->*->*->*->H->*
The value corresponding to H in parent 2 is C (at the 8th position); C occurs at position 3 in parent 1, so picking it up makes Child 1:
Child 1: A->*->C->D->*->*->*->H->*
The value corresponding to C in parent 2 is B (at the 3rd position); B occurs at position 2 in parent 1, so picking it up makes Child 1:
Child 1: A->B->C->D->*->*->*->H->*
The value corresponding to B in parent 2 is A, which we have already covered, so the cycle is complete and the input from the first parent is over. All remaining positions are filled by copying from the second parent. Thus we get
Child 1: A->B->C->D->G->F->I->H->E
Child 2, built by tracing the same cycle in parent 2, becomes
D->A->B->H->*->*->*->C->*
and D->A->B->H->E->F->G->C->I after placing the remaining cities from parent 1.
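A sketch of cyclic crossover following exactly the procedure traced above is given below; it reproduces both children for these parents.

```python
def cyclic_crossover(parent1, parent2):
    # Trace the cycle starting at position 0: those positions keep parent1's
    # cities; every other position is filled from parent2.
    size = len(parent1)
    child = [None] * size
    position = 0
    while child[position] is None:
        child[position] = parent1[position]
        # move to where parent2's city at this position sits in parent1
        position = parent1.index(parent2[position])
    for i in range(size):
        if child[i] is None:
            child[i] = parent2[i]
    return child

p1 = list("ABCDEFGHI")
p2 = list("DABHGFICE")
print(cyclic_crossover(p1, p2))   # ['A','B','C','D','G','F','I','H','E']
print(cyclic_crossover(p2, p1))   # ['D','A','B','H','E','F','G','C','I']
```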
These three types of crossover try to preserve order or position or both in the result. That means that if a parent has a good ordering or positioning of cities, these are more likely to be preserved in the children, and the new tours generated as children are more likely to be better tours.
Other Representations
We have described tours in a form known as Path Representation. A few other representations are also possible; one of them is known as Ordinal Representation and another as Adjacency Representation. Ordinal Representation is a little difficult to understand, but it allows a simple single point crossover to be used. This makes the crossover operation faster and hence the overall method faster, which is better for large problems.
It is not possible to use a single point crossover with Adjacency Representation, but it has other advantages. One important crossover method used with the adjacency representation is called heuristic crossover, which is useful if the designers have some domain knowledge and can draw a good heuristic from it. Moreover, Adjacency Representation makes it easier to inherit from parents. We will not look at these two representations in more detail here.
There are many other things one can say about genetic algorithms, but this is not the place. For details, the book "Genetic Algorithms + Data Structures = Evolution Programs", third edition, by Zbigniew Michalewicz, Springer International Edition, is recommended. Another good reference is "Genetic Algorithms in Search, Optimization and Machine Learning" by David E. Goldberg, Addison-Wesley Longman (now Pearson).
Summary
Genetic algorithms are not generally considered part of AI, but they are an excellent method for solving complex AI problems. They are derived from the philosophy of how life survives and progresses. In the simplest of forms, GA combines parents to form better children. One can start with a solution set and get a better solution set after each iteration: after every iteration some better solutions from the set are picked up and the rest are thrown away. After a few iterations the solution set is more likely to contain better solutions. One of the most talked about problems addressed using GA is the travelling salesman problem. It is solved using GA with a few variants of representation. For Path Representation, the Partially Mapped, Order or Cyclic crossover may be used for combination. Other representations can also solve TSP using GA.
MCQs
1. GA is different from the other search algorithms studied because
a. It does not find the best state and use it as the next
b. It does not have a start state
c. It does not have an end state
d. It does not have children
2. GA is based on the idea of
a. How people live
b. How genes decide the attributes of the children
c. How parents mate to produce children
d. How life forms persist
3. In Selection
a. The children are selected
b. The parents are selected
c. The genes are selected
d. The crossover methods are selected
4. Recombination process
a. Takes parents as input
b. Generates parents as output
c. Combines again
d. Works on the existing population to generate another population
5. Mutation process
a. Generates genetically different children
b. Generates children inheriting from parents
c. Adds steadiness to the process
d. Is applied in a haphazard manner
6. TSP
a. Is a simple problem
b. Is a very structured problem
c. Is a very difficult problem
d. Has many permutations and combinations to consider
7. How many tours are possible for 5 cities?
a. 60
b. 80
c. 120
d. 240
8. If we represent TSP as set of solutions, one can have
a. Dense neighborhood function
b. Sparse neighborhood function
c. Variable neighborhood descent
d. Many useful heuristics
9. When a string is cut from one parent and replaces a string of similar size in the other, it is known as
a. Single point crossover
b. Order crossover
c. Partial Mapped Crossover
d. Cyclic Crossover
10. When a part of one parent is copied into a child and the rest is filled from the other parent using the mapping derived from that copy, it is known as
a. Single point crossover
b. Order crossover
c. Partial Mapped Crossover
d. Cyclic Crossover
11. When a part of one parent is copied into a child and the rest is filled from the second parent in the order in which the cities appear there, it is known as
a. Single point crossover
b. Order crossover
c. Partially Mapped Crossover
d. Cyclic Crossover
12. When a part of one parent is copied into a child in the same positions in which it appears in that parent until a cycle is found, and the rest is then filled from the other parent, it is known as
a. Single point crossover
b. Order crossover
c. Partially Mapped Crossover
d. Cyclic Crossover
13. The crossovers help in
a. Retaining the good qualities of the parent
b. Retaining the order of cities in the parent
c. Retaining the position of cities in the parent
d. Retaining the sequence of cities in the parent
14. Which representation allows single point crossover in the TSP problem?
a. Path
b. Ordinal
c. Heuristic
d. Adjacency
1-a, 2-a, 3-d, 4-a, 5-a, 6-a, 7-d, 8-a, 9-c, 10-a, 11-c, 12-b, 13-d, 14-a