16 Deadlocks – Detection and Recovery
Mary Anitha Rajam
19.1 Introduction
In a multi-programming environment, many processes compete for a finite number of resources. If the resources requested by a process are not available, the process waits. If the resources requested by this process are held by other waiting processes, the situation may lead to a deadlock. The different ways to handle deadlocks are deadlock prevention, deadlock avoidance, deadlock detection and recovery.
In this module we will learn how to detect deadlocks. We will also learn how to recover from deadlocks, when deadlocks occur.
19.2. Deadlock Detection
If there is no deadlock prevention or deadlock avoidance algorithm running in the system, deadlock situation may occur. In this case, it is necessary to detect deadlocks and recover from them. Two methods are explained in this section to detect deadlocks. The first method detects deadlocks when there is only one instance of each resource type. This method uses a variant of the resource-allocation graph. The second method detects deadlocks even when there are multiple instances of each resource type. The second method uses a variant of the banker’s algorithm.
19.2.1 Deadlock Detection – Single Instance of Each Resource Type
The system maintains a wait-for graph for detecting deadlocks. The wait-for graph is a variant of the resource-allocation graph. In the wait-for graph, the nodes are processes. If there is an edge Pi ® Pj, then it means that process Pi is waiting for process Pj. The corresponding resource-allocation graph would have had edges Pi ® Rq and Rq ® Pj which means that Pi is waiting for resource Rq and Rq is held by resource Pj. In the wait-for graph, the resource node is removed and there is an edge from Pi ® Pj. Figure 19.1 shows a resource-allocation graph and the corresponding wait-for graph.
To detect deadlocks from the wait-for graph, it is required to periodically invoke an algorithm that searches for a cycle in the wait-for graph. If there is a cycle in the wait-for graph, then it means that there is a deadlock in the system. If there are no cycles, then it means that there is no deadlock in the system. In the wait-for graph shown in Figure 19.1, there is a cycle P1®P2®P3®P4®P1. Hence, the system is in a deadlocked state. An algorithm to detect a cycle in a graph requires an order of n2 operations, where n is the number of vertices in the graph.
Fig. 19.1 Resource-Allocation Graph and Corresponding wait-for graph
The disadvantage of this method is that it is not suitable for a resource-allocation system with multiple instances of each resource type.
19.2.2 Deadlock Detection – Several Instances of a Resource Type
In this subsection we learn a deadlock detection algorithm that will detect deadlocks when there are multiple instances of each resource type.
Let n denote the number of processes and m denote the number of resource types present in the system.
Data Structures Used in the Algorithm:
- Available: A vector of length m indicates the number of available resources of each type
A | B | C |
2 | 3 | 0 |
In the example shown above, there are three resource types A, B and C. The number of available instances of resource types A, B and C are 2, 3 and 0 respectively.
• Allocation: An n x m matrix defines the number of resources of each type currently allocated to each process. The n rows correspond to n processes and the m columns correspond to the m resource types.
A B C | |
P0 | 0 1 0 |
P1 | 2 0 0 |
P2 | 3 0 2 |
P3 | 2 1 1 |
In the example shown above, there are three resource types A, B and C and four processes P0, P1, P2 and P3. The number of instances of each resource type currently allocated to each process is shown. Process P0 is currently allocated 0 instances of resource type A, 1 instance of resource type B and 0 instances of resource type C. Similarly, the current allocation of the other processes P1, P2 and P3 are also shown.
• Request: An n x m matrix indicates the current request of each process. If Request [i,j] = k, then process Pi is requesting k more instances of resource type Rj
A B C
P0 | 7 5 3 |
P1 | 3 2 2 |
P2 | 9 0 2 |
P3 | 2 2 2 |
In the example shown above, there are three resource types A, B and C and four processes P0, P1, P2 and P3. The number of instances of each resource type requested by each process is shown. Process P0 is requesting 7 instances of resource type A, 5 instances of resource type B and 3 instances of resource type C. Similarly, the request of the other processes P1, P2 and P3 are also shown.
Notations Used
If X and Y are vectors of length n,
– X ≤ Y iff X[i] ≤ Y[i] for all i = 1,2,…,n
That is, the ith element of vector X is less than or equal to the ith element of vector Y, for all i.
Consider the following vectors X and Y
X = (1,7,3,2), Y = (0,3,2,1)
In the above example, Y ≤ X because each element of vector Y is less than or equal to the corresponding element of vector X. That is, the first element of Y is less than or equal to the first element of X and so on. Also, note that, Y < X, if Y ≤ X and Y≠ X
Each row in the matrices Allocation and Request are treated as vectors and referred to as Allocationi and Requesti respectively.
The deadlock algorithm which is a variant of the Banker’s algorithm is given below:
Detection Algorithm
1. Let Work and Finish be vectors of length m and n, respectively. Initialize:
(a) Work = Available
(b) For i = 1,2, …, n, if Allocationi ¹ 0, then Finish[i] = false; otherwise,
Finish[i] = true.
2. Find an index i such that both:
(a) Finish[i] = false
(b) Requesti £ Work
If no such i exists, go to step 4.
3. Work = Work + Allocationi
Finish[i] = true
go to step 2.
4. If Finish[i] == false, for some i, 1 £ i £ n, then the system is in deadlock state.
Moreover, if Finish[i] == false, then Pi is deadlocked.
This algorithm requires an order of O(m x n2) operations to detect whether the system is in deadlocked state. The working of this algorithm can be understood by an example.
Example of Detection Algorithm
Consider 5 processes P0 through P4; 3 resource types A, B and C. There are 7 instances of A, 2 instances of B and 6 instances of C.
The snapshot at time T0 is given below:
Allocation A B C | Request A B C | Available A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
We now simulate the algorithm for the above example.
Initially, Available = (0,0,0); Finish[i] = false for i = 0,1,2,3,4
i = 0
We check if Request0 ≤ Available? Yes
Therefore, Work = Work + Allocation0 =(0,0,0) + (0,1,0) = (0,1,0)
Finish[0] = true , P0 added to safe sequence < P0>
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work = (0,1,0);
Is Request1 ≤ Available? No
Since Request1 is not less than Available, check the next process.
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work = (0,1,0);
Is Request2 ≤ Available? Yes
Work = Work + Allocation2 =(0,1,0) + (3,0,3) = (3,1,3) Finish[2] = true , P2 added to safe sequence < P0, P2>
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work = (3,1,3);
Is Request3 ≤ Available? Yes
Work = Work + Allocation3 =(3,1,3) + (2,1,1) = (5,2,4) Finish[3] = true,
P3 added to safe sequence and the safe sequence is now < P0, P2, P3 >
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work = (5,2,4);
Is Request4 ≤ Available? Yes
Work = Work + Allocation4 =(5,2,4) + (0,0,2) = (5,2,6) Finish[4] = true,
P4 added to safe sequence and the safe sequence is < P0, P2, P3, P4 >
Now, we check again from the beginning all the other processes that were not added to the safe sequence.
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work = (5,2,6);
Is Request1 ≤ Available? Yes
Work = Work + Allocation1 =(5,2,6) + (2,0,0) = (7,2,6) Finish[1] = true,
P1 added to safe sequence and the safe sequence now is < P0, P2, P3, P4, P1 >
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 0 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Sequence <P0, P2, P3, P4, P1> now results in Finish[i] = true for all i.
There can be more than one safe sequence, that is there can be correct safe sequences other than <P0, P2, P3, P1, P4>. We have found one safe sequence. Since there is at least one safe sequence, the system is in a safe state. There is no deadlock in the system.
Let process P2 now make an additional request for an instance of resource type C. The Request matrix is changed as shown below, after including the request of an instance of resource type C by process P2.
Request A B C
P0 | 0 0 0 |
P1 | 2 0 2 |
P2 | 0 0 1 |
P3 | 1 0 0 |
P4 | 0 0 2 |
Now, let us check if the system will be in a safe state. The deadlock detection algorithm is run again.
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 1 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Initially, Work = Available = (0,0,0); Finish[i] = false for i = 0,1,2,3,4 When i = 0,
Check if Request0 ≤ Available? Yes
Work = Work + Allocation0 =(0,0,0) + (0,1,0) = (0,1,0) Finish[0] = true, P0 is added to safe sequence < P0>
Allocation Request Available
A B C | A B C | A B C | |
P0 | 0 1 0 | 0 0 0 | 0 0 0 |
P1 | 2 0 0 | 2 0 2 | |
P2 | 3 0 3 | 0 0 1 | |
P3 | 2 1 1 | 1 0 0 | |
P4 | 0 0 2 | 0 0 2 |
Work is now (0,1,0);
Is Request1 ≤ Available? No
Is Request2 ≤ Available? No
Is Request3 ≤ Available? No
Is Request4 ≤ Available? No
Since the request of all the processes cannot be allocated, the system is not in a safe state. Though it possible to reclaim the resources held by process P0, there are insufficient resources to fulfill other processes’ requests. Thus, a deadlock exists, consisting of processes P1, P2, P3, and P4.
Deadlock Detection Algorithm Usage
When should we invoke the detection algorithm?
This depends on the answers to the following questions.
– How often is a deadlock likely to occur?
– How many processes will be affected by the deadlock when it happens?
If deadlocks occur frequently, then it is necessary to invoke the detection algorithm frequently. If detection is not done, the resources allocated to deadlocked processes will be idle. The number of processes involved in the deadlock cycle may also grow.
Deadlocks occur when a process makes a request that cannot be granted immediately. Therefore, the detection algorithm can be invoked every time a request for allocation cannot be granted immediately. During the detection process, not only the set of processes that are deadlocked are identified, but also the specific process that caused the deadlock is identified. If the cycle is completed by the most recent request, the process that requested the most recent request is identified as responsible for the deadlock. But invoking the deadlock detection algorithm each and every time a request is made results in considerable overhead in computation time.
Therefore, the other possibility is to invoke the detection algorithm at less frequent intervals. The detection algorithm may be invoked once per hour or when the CPU utilization falls below 40 percent. In this case there may be many cycles in the graph and it may not be possible to tell which process caused the deadlock.
19.3 Recovery from Deadlocks
Once deadlocks are detected, it is necessary to recover from deadlocks. There are different ways in which the system can recover from deadlocks. One method is to inform the operator that a deadlock has occurred. The operator deals with the deadlock manually. The second method is to let the system recover from the deadlock automatically. The third method is to break the deadlock. To break the deadlock, there are two ways. One is to abort one or more processes to break the circular wait (process termination). The second is to preempt some resources from one or more of the deadlocked processes (resource preemption).
19.3.1 Recovery from Deadlock: Process Termination
Termination of processes can be done in two ways. One is to abort all the deadlocked processes. This method will break the deadlock, but the results of all partial computations done by the aborted processes must be discarded and recomputed later. The second way is to abort one process at a time until the deadlock cycle is eliminated. This method involves a lot of overhead, because, after each process is aborted, deadlock-detection algorithm must be invoked to check if the deadlock still exists.
When there are many processes involved in the deadlock, it is necessary to choose a process to terminate first. In which order should we choose a process to abort? There are many factors based on which the process to be aborted can be chosen:
– Priority of the process (process with the least priority is chosen)
– How long process has computed, and how much longer to completion (the process that has done the least computations is chosen)
– Resources the process has used (the process that has used less number of resources is chosen)
– Resources process needs to complete (the process that needs many resources is chosen)
– How many processes will need to be terminated
– Is process interactive or batch? (batch processes are chosen to be terminated earlier than interactive processes)
19.3.2 Recovery from Deadlock: Resource Preemption
To recover from deadlocks, resources can be preempted from the deadlocked processes. When resources are preempted, it is necessary to consider certain issues. The issues are discussed below:
- Selecting a victim
It is necessary to select the victim process from which the resources are to be preempted. The victim process has to be selected such that the cost is minimized. Cost factors may include the number of resources a deadlock process is holding and the amount of time a deadlocked process has thus far consumed during its execution.
- Rollback
If a resource is preempted from a process, what to do with the process? The process cannot continue, because it is missing some needed resource. Therefore, the process must be rolled back to some safe state and the process must be restarted from that state. But, the problem is that it is difficult to determine a safe state. Therefore a total rollback might have to be done or the process must be aborted and restarted again.
- Starvation
While selecting a process for preempting resources, the same process may always be picked as victim. In that case, that process may be starved of resources and may not be able to complete its work. To reduce starvation, the number of rollbacks may also be included in the cost factor.
19.4 Summary
This module discussed deadlock detection algorithms when there is a single instance for each resource type (wait-for graph) and when there are multiple instances of resources (Banker’s algorithm). This module also discussed how the system can recover from deadlocks.
References
- Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, Ninth Edition, John Wiley & Sons Inc., 2012.