36 Minimum Spanning Trees –Prim’s algorithm

T.V. Geetha

Welcome to the e-PG Pathshala Lecture Series on Data Structures. In this module we will discuss about Prim’s algorithm for finding out Minimum Spanning Trees.

Learning Objectives

The learning objectives of the module are as follows:

• To understand in detail the Prim’s Algorithm to find Minimum Spanning Trees

• To understand Prim’s Algorithm through a walkthrough example

36.1 Recap Spanning Trees

When a graph G is connected, a depth-first or breadth first search starting at any vertex will visit all vertices in G. A spanning tree is any tree that consists solely of the edges in G and that includes all the vertices.

E(G): T(tree edges) + N (non-tree edges)

Where T is the set of edges used during search and N is the set of remaining edges. The Minimum Spanning Tree or MST for a given graph is the spanning tree of minimum cost for that graph.

36.2 Prim’s Algorithm

Prim’s algorithm was initially discovered in 1930 by Vojtěch Jarník, then rediscovered in 1957 by Robert C. Prim. The algorithm starts off by picking any node within the graph and growing from there. Initially we label the starting node A, with a 0 and all others with infinity. Starting from A, we update all the connected nodes’ labels to A with their weighted edges if it is less than the labeled value . Now we find the next smallest label and update the corresponding connecting nodes. We repeat until all the nodes have been visited

36.2 Steps of Prim’s Algorithm

1. The new graph is constructed – with one node from the old graph.

2. While new graph has fewer than n nodes,

– Find the node from the old graph with the smallest connecting edge to the new graph

– Add it to the new graph

Every step would have joined one additional node, so that at the end we will have one graph with all the nodes and it will be a minimum spanning tree of the original graph. This method is quite similar to Kruskal’s with one big difference. In Prim’s algorithm the tree that we are “growing” always stays connected. However in Kruskal’s algorithm we could add an edge to the growing tree that was not connected to the rest of the tree, if that edge had the minimum cost. In other words we could have a forest. Only at the conclusion of the Kruskal’s algorithm would there be the connected spanning tree.

36.3 Prim’s algorithm:

Set S = Æ.

2) Add the minimum edge incident to that vertex to S.

3) Continue to add edges into S (n-2 more times) using the following rule:

Add the minimum edge weight to S that is incident to S which doesn’t form a cycle when added to S.

Consider the following graph (Figure 36.1).

Select any vertex, say A. Select the shortest edge connected to that vertex. The shortest edge connected to A is B. So, AB is one of the edges of the Minimum Spanning Tree (Figure 36.2 (a)).

Now, select the shortest edge connected to any vertex (that is A or B) already connected. This vertex, as can be found from the graph is E and the edge is AE (Figure 36.2 (b))

Now, again select the shortest edge connected to any vertex (A,B,E) already connected. The shortest edge is ED (2). This is given in the Figure 36.2 (c))

Now, again select the shortest edge connected to any vertex (A,B,E,D) already connected. The shortest edge connected to any vertex already connected is DC(4) (Figure 36.2(d)).

Now, again select the shortest edge connected to any vertex (A,B,E,D,C) already connected. The shortest edge connected to any vertex already connected is EF (5). (Figure 36.2(e)).

Now, all the vertices have been connected. The solution is: AB 3, AE 4, ED 2, DC 4 and EF 5. Thus the total weight of the tree is 18.

36.2.2. Walk-Through

Similar to the walk through we carried out for Kruskal algorithm, we will do the same for Prim’s Algorithm. We consider the graph and the initialization of the array as shown in Figure 36.3(a)). We initialize the array with rows corresponding to nodes of the graph. Associated with each node are three attributes. K indicates whether we have visited the node or not. It can take one of the values T or F corresponding to True or False respectively. Initially K is set to False (F) for all nodes. dv is the distance of the node from previous node and is initially set to ¥ for all nodes. pv is the previous node from which the edge to this node originated and is not initialized (Figure 36.3 (a).

Now we start with any vertex, say D. Now the K value of D is set to T and the dv is set to 0, since this is the start node and in the graph the selected node is marked in red (Figure 36.3 (b)).

Figure 36.3 (b)

Now we need to update the distances of adjacent, unselected nodes. For the node D, the adjacent nodes are C, F, G and E. The previous vertex pv of all these nodesare set to D and the distance dv of each node to D is noted (Figure 36.3 (c )).

Now we need to select node with minimum distance. Of all the edges, DC, DF, DG and DE, the edge with the smallest weight is that of DG. Hence the K value for G is updated to T as shown in Figure 36.3 (d).

From G, the adjacent vertices are H and E. The distances of these adjacent, unselected nodes are updated in the table. The value of the weight 3 for the edge GH can be directly entered with the pv value as G in the row for H. But when we consider node E, we find that there is already an entry for the previous vertex as D. Now we consider both the distances from D to E which is 25 and G to E which is 7. The least of these two is 7 and hence we update the table for row E with pv as G and dv as 7. After the updates, the array is shown in Figure 36.3 (e).

Now we need to select node with minimum distance. Of all the edges, the edge with the smallest weight is that of edges DC and GH. Here we select node C. Hence the K value for C is updated to T as shown in Figure 36.3 (f).

From C, the adjacent vertices are B and F. The distances of these adjacent, unselected nodes are updated in the table. The value of the weight 4 for the edge BC can be directly entered with the pv value as C in the row for B. But when we consider node F, we find that there is already an entry for the previous vertex as D. Now we consider both the distances from D to F which is 18 and C to F which is 3. The least of these two is 3 and hence we update the table for row F with pv as C and dv as 3. After the updates, the array is shown in Figure 36.3 (g).

Now we need to select node with minimum distance. Of all the edges, the edge with the smallest weight is FC. Hence the K value for F is updated to T as shown in Figure 36.3 (h).

From F, the adjacent vertices that are still unselected are B and A and E. The distances of these adjacent, unselected nodes are updated in the table. The value of the weight 10 for the edge AF can be directly entered with the pv value as F in the row for A. But when we consider node E, we find that there is already an entry for the previous vertex as G. Now we consider both the distances from G to E which is 7 and F to E which is 2. The least of these two is 2 and hence we update the table for row E with pv as F and dv as 2. When we consider the node B, we find that there is already an entry for the previous vertex as C. Now we consider both the distances from C to B which is 4 and F to B which is 7. The least of these two remains as 4 and hence we do not update the table for row B which still remains as C. After the updates, the table is shown in Figure 36.3 (i).

Now we need to select node with minimum distance. Of all the edges, the edge with the smallest weight is FE. Hence the K value for E is updated to T as shown in Figure 36.3 (j).

Figure 36.3 (j)

From E, we need to update distances of adjacent, unselected nodes but at this point there are is only one unselected node B. But when we consider node B, we find that there is already an entry for the previous vertex as C. Now we consider both the distances from C to B which is 4 and E to B which is 10. The least of these two remains as 4 and hence we do not update the table for row B which still remains as C. Hence the table remains unchanged as shown in Figure 36.3 (k).

Now we need to select node with minimum distance. Of all the edges, the edge with the smallest weight is GH. Hence the K value for H is updated to T as shown in Figure 36.3 (l).

Figure 36.3 (l)

From H, we need to update distances of adjacent, unselected nodes and at this point they are A and B. But when we consider node A, we find that there is already an entry for the previous vertex as F. Now we consider both the distances from F to A which is 10 and H to A which is 4. The least of these two is 4 and hence we update the table for row A with pv as H and dv as 4. When we consider node B, we find that there is already an entry for the previous vertex as C. Now we consider both the distances from C to B which is 4 and H to B which is 9. The least of these two remains as 4 and hence we do not update the table for row B which still remains as C. . After the updates, the table is shown in Figure 36.3 (m).

Figure 36.3 (n)

Now we need to select node with minimum distance. Of all the edges, the edge with the smallest weight is AH. Hence the K value for A is updated to T as shown in Figure 36.3 (n).

From A, we need to update distances of adjacent, unselected nodes but at this point there are is only one unselected node B. But when we consider node B, we find that there is already an entry for the previous vertex as C. Now we consider both the distances from C to B which is 4 and A to B which is 8. The least of these two remains as 4 and hence we do not update the table for row B which still remains as C. Hence the table remains unchanged as shown in Figure 36.3 (o).

Now we need to select node with minimum distance . The only edge which is unselected is B , hence we select it and its the K value updated to T as shown in Figure 36.3 (p).

Figure 36.3 (p)

Now all the nodes of the graph has been selected and the spanning tree obtained is shown in Figure 36.3 (p). Now, we can compute the cost of minimum spanning tree dv= 21

36.3. Analysis of the Prim’s Algorithm

When there are m edges and n nodes then, the running time for the algorithm is given by O(m + n log n) if a heap is used for sorting the edges. If a heap is not used, the run time will be O(n²) instead of O(m + n log n). However, using a heap complicates the code since you’re complicating the data structure. A Fibonacci heap is the best kind of heap to use, but again, it complicates the code. Unlike Kruskal’s, it does not need to see all of the graph at once. It can deal with it one piece at a time. It also does not have to worry if adding an edge will create a cycle since this algorithm deals primarily with the nodes, and not the edges. For this algorithm the number of nodes needs to be kept to a minimum in addition to the number of edges. For small graphs, the edges matter more, while for large graphs the number of nodes matters more.

36.4 Comparing the algorithms

Both algorithms will always give solutions with the same length. However the two algorithms will usually select edges in a different order. Occasionally they will use different edges – this may happen when you have to choose between edges with the same length. In this case there is more than one minimum connector for the network.

Summary

Explained in detail the Prim’s Algorithm to find Minimum Spanning Trees
Illustrated Prim’s Algorithm through a walkthrough example