35 Minimum Spanning Trees- Kruskal AIgorithm
T.V. Geetha
Welcome to the e-PG Pathshala Lecture Series on Data Structures. We have understood the Minimum Spanning Trees. In this module we will discuss about Kruskal’s Algorithm in detail.
Learning Objectives
The learning objectives of the module are as follows:
• To understand in detail the Kruskal’s Algorithm to find Minimum Spanning Trees
• To discuss implementation issues associated with Kruskal’s Algorithm
• To understand Kruskal’s Algorithm through a walkthrough example
35.1 Spanning Trees
This is a recap of Spanning trees and minimum spanning trees. When an undirected graph G is connected, a depth first or breadth first search starting at any vertex will visit all vertices in G. A spanning tree is any tree that consists solely of edges in G and that includes all the vertices. The edges of the graph E(G) is as below:
E(G): T (tree edges) + N (nontree edges) where
T: set of edges used during search
N: set of remaining edges
The Minimum Spanning Tree (MST) for a given graph is the Spanning Tree of minimum cost for that graph.
35.2 Greedy Algorithms
Now let us make a simplifying assumption that all edge costs ce are distinct. With this assumption let us now define two properties namely Cut Property and Cycle Property (Figure 35.1).
Cut property – Let S be any subset of nodes, and let e be the minimum cost edge with exactly one endpoint in S. Then the MST contains e.
Cycle property – Let C be any cycle, and let f be the maximum cost edge belonging to C. Then the MST does not contain f.
Figure 35.2 Kruskal’s Algorithm
35.2.1 Generic Algorithm for growing MST
First we will examine the generic MST algorithm.
GENERIC_MST(G,w)
1. A:={} /* (can be a forest or a single tree – see section 35.2.2 )
2. while A does not form a spanning tree do
3. find an edge (u,v) that is safe for A
4. A:=A∪{(u,v)}
5. return A
Set A is always a subset of some minimum spanning tree. This property is called the invariant Property. An edge (u,v) is a safe edge for A if adding the edge to A does not destroy the invariant property. A safe edge is just the CORRECT edge to choose to add.
35.2.2 Kruskal and Prim’s Algorithms
Both Kruskal and Prim’s algorithms are based on the generic algorithm. A specific rule to determine a safe edge is what diffrentiates the two algorithms. Set A is always a subset of some minimum spanning tree
In Kruskal’s algorithm, the set A is a forest. In this algorithm the safe edge added is always a least-weight edge in the graph that connects two distinct components.
In Prim’s algorithm, the set A forms a single tree . The safe edge added is always a least-weight edge connecting the tree to a vertex not in the tree.
35.3 Kruskal Agorithm in detail
Now we will discuss the Kruskal algorithm in detail. Created in 1957 by Joseph Kruskal, it finds the MST by taking the smallest weight in the graph and connecting the two nodes and repeating until all nodes are connected to just one tree. This is done by creating a priority queue using the weights as keys . It creates a forest of trees. Initially the forest consists of n single node trees (and no edges). At each step, we add one edge (the cheapest one) so that it joins two trees together. If the edge would result in a cycle, it would simply link two nodes that were already part of a single connected tree, so that this edge would not be needed.
35.3.1 Concept of Kruskal’s Algorithm
Add the cheapest edge that joins disjoint components as in Fig 35.2. We we select (uv) since that is the cheapest edge, then we select (sb) since that is the next cheapest edge and so on. We do not select edges if they form a cycle a nd stop when all nodes are covered.
1.The forest is constructed with each node forming an independent tree. The edges are placed in a priority queue using the weights as keys.
2.Until we have added n-1 edges,
(a) Extract the cheapest edge from the queue,
(b) If it forms a cycle, reject it,
(c) Else add it to the forest. Adding it to the forest will join two trees together.
Every step Joins – two trees in the forest together, so that at the end, there will only be one tree in T.
The reason the algorithm works is that each added edge is a connection between two sets of vertices, and since we select the edges in order of weight, we are always selecting the minimum edge weight that connects the two sets of vertices.
35.3.2 Cycle detection
One important part of Kruskal’s algorithm is the cycle detection. Here the steps are as follows:
Keep track of disjoint sets.
- Initially, each vertex is in its own disjoint set.
- When you add an edge you are unionizing two sets.
- A union cannot happen if the two vertices are already in the same set.
• This would create a cycle.
35.3.3 Basics of Kruskal’s Algorithm
- Sort the edges E in an non-decreasing order (please note two edges can have the same weight)
- A:={}
- while E is not empty do {
take an edge (u, v) that is shortest in E and delete it from E
if u and v are in different components then
add (u, v) to A }
Here each time a shortest edge in E is considered. We work with edges, rather than nodes. In other words there are two fundamental steps. We need to sort edges by increasing edge weight and then select the first |V| – 1 edges that do not generate a cycle. Each step of the Kruskal Algorithm is explained below with an illustartive example..
Step1
List the edges of graph where the edges are sorted by increasing weight as shown in Fig 35.3a as below.
(ED-2, AB-3, AE-4,CD-4, BC-5, EF-5, CF-6,AF-7,BF-8,DF-8)
Step 2
Select the shortest edge in the list (ie ED-2) as in Fig 35.3b.
Step 3
Select the next shortest edge (AB-3) which does not create a cycle as in Fig 35.3c
Step 4
Select the next shortest edge CD-4 (either CD or AE could have been selected first since both have same weight) which does not create a cycle (Fig 35.3d).
Step 5
Select the next shortest edge (AE-4) which does not create a cycle as in Fig 35.3e.
Step 6
Select the next shortest edge which does not create a cycle (EF-5). BC-5 is not selected because it forms a cycle (Fig35.3f).
Finally all vertices have been connected. The solution is (ED-2, AB-3, CD-4,AE-4,EF-5). Total weight if tree is 18.
1.3 Walk Through
The steps of ‘walk through’ operation is explained below in Figures 35.4a to 35.4k.
The list consists of a sorted list of edges along with the cost of each edge.
Step 1
Consider an undirected, weighted graph and sort the edges by non-decreasing edge weight as in Fig 35.4 a.
Step 2
Select the first edge in the sorted list which does not generate a cycle (DE). This task of finding a cycle is carried out by using sets and is explained in section 35.3.2. Here this is not relevant since this is the first edge selected (Figure 35.4 (b))
Step 3
Select the next cheapest edge (DG) that has does not generate a cycle (Figure 35.4 (c).
Step 4
Select the next cheapest edge in the list which is EG but that results in a cycle and hence is not considered (Figure 35 d).
Step 5
Select the next cheapest edge (CD) that has does not generate a cycle (Figure 35.4 (e).
Step 6
Select the next cheapest edge (GH) that has does not generate a cycle (Figure 35.4 (f).
Step 6
Select the next cheapest edge (CF) that has does not generate a cycle (Figure 35.4 (g).
Step 7
Select the next cheapest edge (BC) that has does not generate a cycle (Figure 35.4 (h).
Step 8, Step9 & Step 10
Selection of the next three edges, BE,BF,BH but that result in a cycle and hence they are not considered (Figure 35 i.).
Step 11
Select the next cheapest edge (AH) that has does not generate a cycle (Figure 35.4 (j).
Now the remaining edges are not considered since all the vertices have been accounted for. The final MST is shown in Figure 35.4 (k). Total Cost of the MST= S dv = 21
35.4 Kruskal’s algorithm ( Details)
The details of the algorithm is shown below.
MST_KRUSKAL(G,w)
1 A:={}
2 for each vertex v in V[G]
3 do MAKE_SET(v)
4 sort the edges of E by nondecreasing weight w
5 for each edge (u,v) in E, in order by nondecreasing weight
6 do if FIND_SET(u) != FIND_SET(v)
7 then A:=A∪{(u,v)}
8 UNION(u,v)
9 return A
To implement this algorithm we must find a way to represent sets. We need to find which set a vertex is in . This is because when an edge is being considered for the MST, we must first find which sets the edge vertices belong to. Hence a findSet() operation is required.
We need to get the union of two sets since if the vertex sets are disjoint, the edge is added to the MST and the union of the two sets is obtained. So a union() operation is also needed.
Testing if an edge creates a cycle can be slow, and hence we need a data structure which supports these operations- Union-Find data structure. There are two standard implementations : Disjoint-set linked lists & Disjoint-set trees.
35.4.1 Disjoint Sets
Now let us discuss this disjoint set. Here we keep a collection of sets S1, S2, .., Sk where each Si is a set of vertices, e,g, S1={v1, v2, v8}. We associate three operations with this Disjoint set. They are
- Make-Set(x)- This operation creates a new set whose only member is x.
- Union(x, y) –This operation unites the sets that contain x and y, say, Sx and Sy, into a new set that is the union of the two sets.
- Find-Set(x)-This operation returns a pointer to the representative of the set containing x.
Each of the operations discussed above takes O(log n) time.
The union-find data structure is used to check if adding an edge to a set would create a cycle. Here we maintain a set for each connected component. ! If vertices x and y are in same component, then adding edge x-y creates a cycle. To add x-y to a set we create a new set by merging the sets containing x and y.
35.4.2 Analysis of Kruskal Algorithm
The Running Time of the Kruskal’s algorithm = O(E log V)
where (E = edges, V= vertices) of the graph.
It usually only has to check a small fraction of the edges, but in some cases (like if there was a vertex connected to the graph by only one edge and it was the longest edge) it would have to check all the edges. This algorithm works best, of course, if the number of edges is kept to a minimum.
Summary
- Explained in detail the Kruskal’s Algorithm to find Minimum Spanning Trees
- Discussed the implementation issues associated with Kruskal’s Algorithm
- Illustrated Kruskal’s Algorithm through a walkthrough example
you can view video on Minimum Spanning Trees- Kruskal AIgorithm |