5 Finite Automata –NFA
We have seen conversion of Regular Expression to ϵ-NFA using the Thompson’s Construction algorithm. In this module, we shall continue our discussion on the conversion from NFA to DFA since DFA is faster for string matching. The primary objective of this module is to learn how the subset construction algorithm converts a given NFA or ε-NFA to DFA.
5.1 Reasons for Conversion to DFA
As we have already discussed in the earlier modules, the language of a DFA is define as follows:
L = {w | δ(q, w) = r, where q is the start state and r is in F} | (5.1) |
Thus from equation (5.1), it is clear that the set of all strings “w” such that there is a path from the start state to the final state are those that belong to the DFA. The language of NFA is given by
L = {w | δ(q, w) = R, where ‘q’ is the start state and R is a set of states and at least
a ‘r’ in R is in F} | (5.2) |
As, it is obvious that from the multiple states that can be reached from the start state on the string “w”, even if one of them is a final state the string belongs to the NFA. For an ϵ-NFA, if the ε-closure( R ) contains final state, we say the string “w” belongs to the ϵ-NFA. Thus, it is clear that the DFA will have more number of states than a NFA or a ε-NFA for the same language L. However, a DFA will avoid the ambiguity that is present in the NFA in pattern matching for a string.
5.2. Conversion of NFA to DFA
The subset construction algorithm is used to convert the NFA to DFA. The fundamental behind the subset construction algorithm is that the state set of a state in a NFA is thought of as a following “state” of the state in the converted DFA. The following are the input and output of the subset construction algorithm
• Input. An NFA N=(S,S,move,S0,Z)
• Output. A DFA D= (Q,S,d,I0,F),
Essentially, we are trying to map “S” to “Q”. Q is defined as the subset states of “S”. Hence, Q = 2S . The input symbols of the input and output are same. I0 , the start state of the DFA is defined as the ϵ-Closure (S0). Hence, the start state of the DFA corresponds to a set of states of the ε-NFA. The final state F of the DFA, corresponds to all the states that contain the final state Z of the NFA.
The subset construction algorithm is a two step process.
- Pre-processing step: Determination of E-Closure
- DFA construction
5.2.1 E-Closure determination
e-closure(T) is defined as the set of NFA states reachable from NFA state “s” in T on e-transitions alone. Consider a state T such that T ÍS. e-closure is defined for all the states “s” in T of the NFA. Algorithm 5.1 is the pre-processing step to compute E-Closure
e-closure (T)
- push all states in T onto stack;
- initialize e-closure(T) to T;
- while stack is not empty do
{
- pop the top element of the stack into t;
- for each state u with an edge from t to u labeled e do
{if u is not in e-closure (T) { add u to e-closure (T)
push u into stack}
}
}
Algorithm 5.1 e-closure (T)
Line 1 is an indication to compute e-closure for all the states of the NFA. Hence, a stack is used to remember all the states in the input NFA. In Line 2, e-closure is computed for a state that belongs to T. Consider one state ‘q’ from the stack. This state is added to the e-closure (q). Hence, this is an indication that e-closure is not a null set. In line number 5, a for loop is initialized which checks all edges labeled ε from the state considered and added to e-closure (q). This state is pushed back to the stack. Since, this new state, for example ‘r’ which is reachable from ‘q’ on ε, will have edges labeled ε to some other states “S”. These “S” will also belong to the e-closure (q). Hence, this current state is pushed back into the stack.
———————————
Example 5.1 Consider the NFA given in Figure 5.1
In figure 5.1, let us consider computation of ε-Closure of all the states. According to the algorithm, ε-Closure (0) will include the state”0”. In addition, there is a direct edge from state “0” on ϵ to states “1” and “7”. Hence,
ε-Closure (0) = {0} U ε-Closure (1) U ε-Closure (7)
Let us consider ε-Closure(1), which will include state “1” and the ε-Closure(2) and ε-Closure(4), since there is a edge labeled ε from state “1” to “2” and “4”
ε-Closure (1) = {1} U ε-Closure (2) U ε-Closure (4)
On, the other hand, ε-Closure(2) will include just state “2”, since there is no edge labeled ε from state “2” and so is the case for ε-Closure(4).
ε-Closure (2) = {2}, ε-Closure (4) = {4}
Hence, ϵ-Closure (1) can be determined as
ε-Closure (1) = {1, 2, 4}
If we consider determination of ε-Closure (7), it will include “7” alone as there are no edges on ε from “7”
ε-Closure (7) = {7}
In a similar manner,
ε-Closure (8) = {8}, ε-Closure (9) = {9}, ε-Closure (10) = {10}
Consider, ε-Closure (3) which will include “3” and there is an edge labeled ε from “3” to “6”, we need to find ε-Closure (6).
ε-Closure (6) = {6} U ε-Closure (7) U ε-Closure (1)
= {6, 7, 1, 2, 4}
Hence, ε-Closure (3) = {3, 6, 7, 1, 2, 4} and so is ε-Closure (5) = {5, 6, 7, 1, 2, 4}.
———————–
After determining the ε-Closure ( ) of all the states, this is fed as input to the subset construction algorithm to construct the DFA.
5.2.2 Subset Construction Algorithm
The input to the algorithm is the ε-Closure (q) of all the states and the NFA to know the transition function. The output will be the DFA. This algorithm constructs a transition table Dtran for the DFA so that D simulate in parallel all possible moves the NFA can do on an input string. Dstates refer to the states of the DFA. Subset construction algorithm is given in Algorithm 5.2 to construct the DFA from the NFA and the pre-processed input
SubsetConstruction (ε-Closure (S0), NFA)
{
Initially, ε-Closure(S0) is the state in Dstates and it is unmarked
While there is an unmarked state T in Dstates do
{
Mark T;
for each input symbol ‘a’
{
U = ϵ-Closure(move(T,a))
If U is not in Dstates then
add U as an unmarked state to Dstates
Dtran[T, a] = U
}
}
}
Algorithm 5.2 Subset construction algorithm
The operation move (T,a), in addition to the ϵ-Closure is used to convert the NFA to DFA. A move(T,a) is defined as the set of NFA states to which there is a transition on input symbol “a” from some NFA state “s” in T. ε-Closure(S0) is computed and this is assigned as the start state I0 of the DFA. For all input symbols, ‘a’, the ε-Closure (S0) is referred and the states that are reachable from every state in ϵ-Closure (S0) on the input symbol ‘a’ is defined as move (ε-Closure (S0), a). The resultant will be another set of states, which is indicated as the next state in the DFA. This procedure is carried out, till there are no more new states in the resultant DFA The last step is to indicate the final states of the DFA. The final state F of the DFA is defined as
F= {I | I ∈ Q, such that I ∩ Z < >F}
Line 1 of the algorithm 5.2 initializes the start state of the DFA as the ϵ-Closure (start state of NFA). Line 2, is a while loop, which iterates till there are no more states being created. Line 3, accepts the start state as one of the DFA’s states and marks it. Line 4 runs a “for” loop which defines the transition from the initial state of the DFA on all input symbols. A single state on the DFA consists of the set of states of the NFA. The transition on an input symbol ‘a’ in the DFA is defined as the union of the states given in the DFA’s state, by referring to their individual transitions in the NFA. After identifying the transition, union of the ϵ-Closure of all the states that are reached is determined and this set of states is one state of the DFA. The new state U is labeled as unmarked, indicating that transition from this state has not been defined in line 6 of the algorithm. The transition table of the DFA, DTran is updated with transition information.
———
Consider example 5.1, which is given in figure 5.1. The start state of the DFA is ϵ-Closure (0). We already know that ϵ-Closure (0) = {0, 1, 2, 4, 7}. This is the start state of the DFA and is indicated by state “A”. From this state, move (0, a), (1, a), (2,a), (4, a), (7, a) is determined. The input NFA is referred and the transitions are only from 2, and 4 which is to state 3 and 8 respectively. Then we compute ϵ-Closure ((move (0, a)). This is nothing but ϵ-Closure (3,8). This is computed as union of ϵ-Closure (3) and ϵ-Closure (8) = {3, 6, 7, 1, 2, 4} U {8} = {3, 6, 7, 1, 2, 4, 8}. This is referred to as state B in the DFA. Similarly, from state A on input ‘b’ is computed as move(0, b), (1, b), (2,b), (4, b), (7, b). There is a transition only from state 4 on “b” which is 5. Hence, we compute ϵ-Closure(5) = {1,2,4,5,6,7} which is given as state C. Thus Dtran [A,a] = B and Dtran[A, b] = C and we have two new states B, C. From these two states transitions are defined on the input symbols, “a” and “b”. The entire, computation is shown in Table 5.1 and the automaton is represented in figure 5.2 and where “E” is the final state.
Summary
The Thompson’s subset construction algorithm, converts the regular expression to ε-NFA and then to DFA. The algorithm results in a non-minimized version of the DFA which needs to be minimized and will be discussed in the next module.