23 TCP Timers and the Delayed Duplicates Problem in TCP
Prof. Bhushan Trivedi
Introduction
TCP uses timers in many situations. In module 15 we saw that it needs a retransmission timer, and in the previous module we saw how the timeout value for that timer is calculated from RTT measurements. In this module, we will look at a few other timers used by TCP: the connection establishment timer, the delayed ACK timer, the persist timer, the keep alive timer, the Fin-Wait-2 timer and the time-wait timer. TCP, unlike other layers, is also plagued by a problem known as the delayed duplicate problem. In this problem, TCP segments roam around in the network and are delivered to the receiver after their retransmitted copies have already reached it. We will look at that problem in this module and at how it is solved by a method proposed by a researcher called Tomlinson. We will also see how delayed duplicates hamper the process of connection establishment and how the three-way handshake solves that problem.
Timers used by TCP
TCP uses many timers; this module introduces the most commonly used ones. The first timer we will mention is the connection establishment timer, which is used while TCP establishes a connection. TCP follows a method known as the three-way handshake for connection establishment. We will learn more about that process later in this module, but here is a short description. The sender sends a connection request to the receiver; on receiving it, the receiver responds with its own request together with the ack of the sender's request. The sender completes the handshake by acknowledging the receiver's response. In this process, the connection request may never reach the receiver, or the receiver may send its response but the response is lost; in either case the sender might wait forever for a response. This timer eliminates that possibility. As soon as the sender sends the connection request, it starts the timer; if it does not get the response before the timer goes off, it resends the connection request. Another timer, which we have already seen in module 15, is the delayed ack timer. When acks are piggybacked, the receiver waits for reverse traffic to carry them. It starts this timer when it receives a segment from the other side. If the timer goes off before the receiver generates any reverse traffic, it sends a separate ack segment; otherwise, the ack is piggybacked on the reverse traffic.
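The two timers just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TCP's actual code: `connect`, `ack_decision` and all timeout values are hypothetical, and the retry interval is kept fixed here even though real TCP stacks usually lengthen it between retries.

```python
from typing import Callable, Optional, Tuple


def connect(reply_delay: Callable[[int], Optional[float]],
            timeout: float, max_tries: int) -> Tuple[str, float]:
    """Connection establishment timer (hypothetical helper).

    reply_delay(attempt) gives the seconds until the response to that
    attempt arrives, or None if the request or its response is lost.
    """
    t = 0.0
    for attempt in range(max_tries):
        delay = reply_delay(attempt)
        if delay is not None and delay < timeout:
            return "connected", t + delay   # response beat the timer
        t += timeout                        # timer went off: resend the CR
    return "failed", t                      # give up after max_tries


def ack_decision(data_at: float, reverse_at: Optional[float],
                 ack_delay: float) -> Tuple[str, float]:
    """Delayed ack timer: piggyback only if reverse traffic comes in time."""
    if reverse_at is not None and reverse_at - data_at <= ack_delay:
        return "piggyback", reverse_at
    return "separate ack", data_at + ack_delay   # timer went off first


# The first CR is lost; the retry is answered after 0.5 s.
print(connect(lambda attempt: None if attempt == 0 else 0.5, 3.0, 3))  # ('connected', 3.5)
print(ack_decision(10.0, 10.1, 0.5))   # ('piggyback', 10.1)
print(ack_decision(10.0, None, 0.5))   # ('separate ack', 10.5)
```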
We will discuss other timers one by one now. We have already described retransmission timer process in detail in the previous module so we will not discuss that further here.
Persist Timer
The persist timer's function is depicted in figure 25.1. We have already seen in the previous module that TCP implements flow control using window advertisements. If the receiver advertises a zero window, the sender stops sending altogether. After a while, the receiver application may read the data, and the receiver's transport layer then sends a nonzero window advertisement. What if this segment is lost? We run into a deadlock. The sender believes that the receiver's window is still zero, so it keeps waiting. The receiver believes that it has already told the sender it may send again, so it keeps waiting for segments on that connection. Both are waiting for the other to do something, and the connection remains on hold forever. The persist timer is used to avoid this problem. It works as shown in figure 25.1.
Closely observe how the sender starts the persist timer when it receives a zero window advertisement. The subsequent nonzero window advertisement is lost, and the receiver keeps waiting for the sender to start transmitting. When the persist timer goes off, the sender sends a probe, a special segment that asks whether the window is still zero. The receiver must answer this probe, however busy it is. In the figure, the receiver's reply to the probe repeats the nonzero window advertisement, so the deadlock is broken and communication resumes. What if the persist timer goes off while the receiver still has a zero window? The receiver again responds with a zero window, and the persist timer starts running afresh.
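A minimal sketch of the persist timer logic, assuming the receiver's window update at t = 7 s is lost and the sender probes every 5 s. The function `persist_until_open` and the timing values are hypothetical; real implementations also lengthen the interval between successive probes, whereas this sketch keeps it fixed as in the description above.

```python
from typing import Callable, List, Tuple


def persist_until_open(window_at: Callable[[float], int],
                       persist_timeout: float,
                       horizon: float) -> Tuple[float, List[float]]:
    """Sender behaviour after a zero-window advertisement at t = 0.

    The receiver's own nonzero window update is assumed lost, so the
    sender learns the window only from the replies to its probes.
    Returns (time the window is learned to be open, probe times).
    """
    t = 0.0
    probes: List[float] = []
    while t + persist_timeout <= horizon:
        t += persist_timeout          # persist timer goes off
        probes.append(t)              # send a window probe
        if window_at(t) > 0:          # receiver answers with its window
            return t, probes          # deadlock broken
        # reply still advertises zero: restart the persist timer
    return float("inf"), probes


# Receiver frees buffer space at t = 7 s, but that update is lost.
opened, probes = persist_until_open(lambda t: 4096 if t >= 7.0 else 0,
                                    persist_timeout=5.0, horizon=60.0)
print(opened, probes)   # 10.0 [5.0, 10.0]
```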
Keepalive timer
The keep alive timer handles a common case in which the normal connection close does not take place: users shut off their machines, close the application abruptly, or their device batteries drain. For example, when we open our browser, it instructs the TCP process running underneath to establish a connection with the web server, and when we log out, it asks the same TCP process to terminate that connection. If we close the browser without logging out, shut the machine off, or the battery drains, the log out is never executed, so our TCP never sends a connection close request to the other end before it goes down. The problem this creates is that the server keeps waiting for us to send something: it keeps a process open for us, waiting for our commands, and reserves memory to store our segments. If the server never learns that we are done and keeps waiting on such dead connections, it will soon run out of memory and processes for handling new incoming clients. To solve this problem, another timer, called the keep alive timer, is used. The process is depicted in figure 25.2.
As soon as the server (the sender here) receives a query from a client (the receiver here) and responds, it starts this timer. If there is no activity from the client for a substantial period, the keep alive timer goes off, and the server sends a message commonly known as an “are you alive” message. If the client is still running, it responds, as shown in the figure, and the server restarts the keep alive timer. Every later query from the client is answered and also restarts the timer. If another long period of inactivity makes the keep alive timer go off, the server again sends the “are you alive” message. If this time, as shown in the figure, the client is down and cannot respond, the server retries a few more times before concluding that the client is down and releasing its resources.
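The keep alive logic can be sketched as follows. The function `keepalive` and its parameter values are illustrative assumptions, not taken from any particular TCP implementation. For comparison, Linux's defaults are 7200 s of idle time, 75 s between probes and 9 probes (the `tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` settings).

```python
from typing import Callable, Tuple


def keepalive(client_alive: Callable[[float], bool], last_activity: float,
              idle: float, probe_interval: float,
              max_probes: int) -> Tuple[str, float]:
    """Server-side keep alive check after the client goes quiet."""
    t = last_activity + idle           # keep alive timer goes off
    for _ in range(max_probes):
        if client_alive(t):            # "are you alive?" gets an answer
            return "alive", t          # server restarts the timer
        t += probe_interval            # no answer: try again later
    return "dead", t                   # give up and release resources


def alive(t: float) -> bool:
    """Hypothetical client that powers off at t = 100 s."""
    return t < 100


print(keepalive(alive, last_activity=0, idle=50, probe_interval=10, max_probes=3))   # ('alive', 50)
print(keepalive(alive, last_activity=90, idle=50, probe_interval=10, max_probes=3))  # ('dead', 170)
```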
Fin-Wait-2 Timer
TCP closes a connection using a process known as the four-way handshake. The sender TCP sends a disconnection request (DR) to the receiver when it has sent everything it wanted to. The receiver acknowledges the DR and keeps sending whatever data it still has. Once the receiver is done, it sends its own DR, and the sender acknowledges it in the final move. A problem occurs when the sender has sent its DR and received the ACK, but the receiver's DR never reaches the sender. The sender should not wait forever. The solution is the Fin-Wait-2 timer (see footnote 1). This timer is started when the sender receives the ack for its DR. If the timer goes off before the receiver's DR arrives, the sender closes the connection on its own.
Figure 25.3 depicts the connection close process we just described. The receiver first acknowledges the sender's DR and, after a while, sends its own DR. Such a closing process is sometimes called an asynchronous close, meaning that the sender and the receiver do not close at the same point in time. The process depicted in figure 25.3 is a clean close without any trouble.
Now look at figure 25.4. Here the connection close runs into trouble because the receiver fails to send its DR, perhaps because the receiver process has crashed, or the user has shut that process off inadvertently, or the receiver has sent its DR but it was lost in transit. In any of these cases, the sender does not receive the receiver's DR in time. The sender does not wait forever, though: the Fin-Wait-2 timer goes off and the sender closes on its own, without waiting any longer for the receiver's DR.

Footnote 1: The name comes from the connection state at that point in time. A DR is carried in a segment with the FIN (finish) flag set, so sending the DR means sending a FIN segment. The state in which the sender waits for the ACK of its FIN is called Fin-Wait-1. Once the ack is received, the sender waits further for the DR from the other side, and this state is called Fin-Wait-2.
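The close sequence with a Fin-Wait-2 timer can be sketched as a small state machine. State names follow the footnote above; the function `active_close` and the 60 s timeout are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple


def active_close(events: List[Tuple[float, str]],
                 fin_wait2_timeout: float) -> Tuple[str, Optional[float]]:
    """Sender side after it sends its DR (FIN) at t = 0.

    events: (time, kind) pairs received from the peer, kind "ACK" or "DR".
    """
    state = "FIN_WAIT_1"              # waiting for the ack of our DR
    deadline: Optional[float] = None
    for t, kind in sorted(events):
        if deadline is not None and t > deadline:
            break                     # Fin-Wait-2 timer already went off
        if state == "FIN_WAIT_1" and kind == "ACK":
            state = "FIN_WAIT_2"
            deadline = t + fin_wait2_timeout   # start the Fin-Wait-2 timer
        elif state == "FIN_WAIT_2" and kind == "DR":
            return "TIME_WAIT", t     # ack the peer's DR, then time-wait
    if state == "FIN_WAIT_2":
        return "CLOSED", deadline     # peer's DR never came: close anyway
    return state, None


print(active_close([(1, "ACK"), (5, "DR")], 60))   # ('TIME_WAIT', 5)
print(active_close([(1, "ACK")], 60))              # ('CLOSED', 61)
```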
Time-Wait timer
The final timer we are going to learn about is the time-wait timer. TCP works over the Internet, where packets may take a much longer path due to congestion and take an inordinate amount of time, forcing the sender to retransmit.
In such a case, the original packets may be delayed so much that they are received after the retransmitted packets. Such packets are known as delayed duplicates, and they are a serious problem. When a connection is closed, delayed duplicates may still be in the network and may reach the communicating party after the connection is closed. On the Internet, a segment is allowed to live for at most two minutes, so whenever a connection is closed, a segment belonging to that connection might still reach the communicating party for two more minutes. To catch these delayed duplicates and discard them, we need a timer. This timer is known as the time-wait timer.
Observe figure 25.5. As soon as the connection close process is completed, the time-wait timer starts and runs for the default two minutes, or for a smaller value provided at the time of installation. (The TCP specification, RFC 793, actually asks for twice the maximum segment lifetime.) The receiver waits for any delayed duplicates and discards them if they ever turn up.
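Why the time-wait period must last at least as long as the segment lifetime can be checked with a short sketch. It assumes every leftover copy was sent before the close and is dropped by the network once it has been in transit longer than `LIFETIME` (120 s, the two-minute figure used above); the function `time_wait` is hypothetical.

```python
from typing import List, Tuple

LIFETIME = 120.0   # assumed maximum segment lifetime in seconds


def time_wait(close_time: float, wait: float,
              duplicates: List[Tuple[float, float]]
              ) -> Tuple[List[float], List[float], List[float]]:
    """Classify old copies still in the network when the connection closes.

    duplicates: (sent_at, delay) pairs, all sent before close_time.
    Returns (discarded, died, escaped) as would-be arrival times.
    """
    expires = close_time + wait
    discarded, died, escaped = [], [], []
    for sent_at, delay in duplicates:
        arrives = sent_at + delay
        if delay > LIFETIME:
            died.append(arrives)       # lifetime over: network drops it
        elif arrives <= expires:
            discarded.append(arrives)  # caught during time-wait
        else:
            escaped.append(arrives)    # arrives after the timer ended
    return discarded, died, escaped


dups = [(90.0, 50.0), (95.0, 130.0), (99.0, 119.0)]
print(time_wait(100.0, LIFETIME, dups))   # ([140.0, 218.0], [225.0], [])
print(time_wait(100.0, 30.0, dups))       # ([], [225.0], [140.0, 218.0])
```

With a full lifetime of waiting nothing escapes; with a 30 s wait, two old copies arrive after the connection state is gone.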
These timers help TCP recover from situations where the other party fails to respond in time or where delayed duplicates arrive after the connection is closed. We will now look at delayed duplicates more closely.
Delayed duplicates
Delayed duplicates, as mentioned earlier, are packets that were sent earlier but, having taken a much longer route, reach the destination later than their retransmitted copies. Let us take an example to understand this. The complete process is depicted in four figures, 25.6 to 25.9.
Closely observe figures 25.6 and 25.7. Figure 25.6 shows the normal (shorter) path, through which a packet quickly reaches the other end. Once in a while, when the usual path is congested, packets take a longer route; figure 25.7 shows that longer route, on which packets obviously take more time. Because some packets take short paths and some take long ones, the time to reach the receiver varies, and there is no guarantee that packets reach the destination in the order they were sent. The other point to note is that our retransmission timer, which is set for normal traffic, times out and forces the sender to retransmit on the assumption that the delayed packets are lost, when in fact they are not. The serious problem arises when these supposedly lost packets reach the receiver later.

Figures 25.8 and 25.9 depict such a case. An employee's paycheck is calculated at a location in the network we call O, and the printer is at a location called N. For each employee, the paycheck calculation system (O) establishes a connection to the printer (N), sends the paycheck details, and closes the connection. In the case shown in the figures, all three packets are assumed lost while travelling over the longer path and are retransmitted. The retransmitted copies arrive first and the printer prints that fortunate employee's paycheck; when the delayed duplicates arrive, it prints it again, not knowing that they are duplicates rather than new data. The lucky employee gets two paychecks!
Once the problem of delayed duplicates is understood, it is clear that the transport layer must find a solution to it. TCP designers have provided a solution based on a few important principles, described here.
The first principle is to provide a unique sequence number for each segment. The idea is simple: because every segment carries its own sequence number, a retransmitted copy and its delayed duplicate carry the same number, so the receiver can easily recognise the later arrival as a duplicate. Problem solved!
Not quite! A small problem still remains. How long should the receiver remember these sequence numbers? A node might be receiving many connections from many senders across the world, and if it tried to remember all sequence numbers for each sender forever, it simply could not work: the memory needed would soon grow beyond any reasonable level. The solution to this second problem boils down to another restriction: no segment is allowed to roam around in the network forever. Every segment has a lifetime, after which it is considered dead and discarded.
With both these measures in place, a unique sequence number and a restricted lifetime for each segment, the delayed duplicate problem can be solved. After accepting a segment, the receiver should not accept another segment from the same sender with the same sequence number for the lifetime of that segment. For example, suppose a segment with sequence number 12 is received at 12:15:10 with a lifetime of two minutes, which ends at 12:17:10. If a segment with sequence number 12 from the same sender arrives before 12:17:10, it is considered a duplicate and discarded. A delayed duplicate can reach the receiver only while its lifetime is not over, and during that period the receiver still remembers the sequence number, so it will surely catch the duplicate.
Let us take another example. Suppose a receiver receives a segment with sequence number 12345 at 12:30:15, with a lifetime of 60 seconds (instead of the two minutes used above). If the receiver again receives a segment with sequence number 12345 at 12:30:25, only 10 seconds have elapsed, well within the 60-second lifetime, so the receiver understands it to be a duplicate and discards it.
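The rule above (remember each sequence number for one lifetime, then forget it) can be sketched as a small Python class. `DuplicateFilter` and `hms` are hypothetical names, and lifetimes are measured from the moment of acceptance, as in the examples; a real receiver would not rebuild its table on every segment.

```python
from typing import Dict, Tuple


class DuplicateFilter:
    """Receiver that remembers (sender, sequence number) for one lifetime."""

    def __init__(self, lifetime: float):
        self.lifetime = lifetime
        self.seen: Dict[Tuple[str, int], float] = {}   # key -> expiry time

    def accept(self, sender: str, seq: int, now: float) -> bool:
        # Forget entries whose lifetime is over: no copy of them can arrive.
        self.seen = {k: exp for k, exp in self.seen.items() if exp > now}
        key = (sender, seq)
        if key in self.seen:
            return False                               # duplicate: discard
        self.seen[key] = now + self.lifetime
        return True


def hms(h: int, m: int, s: int) -> int:
    """Clock time as seconds since midnight."""
    return h * 3600 + m * 60 + s


f = DuplicateFilter(lifetime=120)
print(f.accept("O", 12, hms(12, 15, 10)))   # True  (first copy)
print(f.accept("O", 12, hms(12, 16, 40)))   # False (within two minutes)
print(f.accept("O", 12, hms(12, 17, 20)))   # True  (lifetime over)
```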
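TCP uses a 32-bit sequence number for this purpose, and it numbers bytes rather than whole segments. The default Internet lifetime is 2 minutes, and it can be reduced at the time of installation. The window scaling option lets TCP increase the window size beyond 65,535 bytes, up to about 2^30 bytes (1 GiB); it does not change the 32-bit size of the sequence number.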
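A quick calculation shows how the 32-bit sequence space and the two-minute lifetime interact. Because TCP numbers every byte, a sender that pushes more than 2^32 bytes within one lifetime would reuse numbers while old copies might still be alive. The helper names below are illustrative.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits, one per byte
LIFETIME = 120        # seconds, the default lifetime quoted above


def wrap_time(bits_per_second: float) -> float:
    """Seconds needed to use up the whole sequence space at this rate."""
    return SEQ_SPACE / (bits_per_second / 8)


def max_safe_rate() -> float:
    """Highest rate (bit/s) at which numbers cannot repeat within a lifetime."""
    return SEQ_SPACE * 8 / LIFETIME


for rate in (10e6, 100e6, 1e9):
    print(f"{rate / 1e6:>6.0f} Mbit/s: wraps in {wrap_time(rate):8.1f} s")
print(f"limit: {max_safe_rate() / 1e6:.0f} Mbit/s")   # limit: 286 Mbit/s
```

At 1 Gbit/s the sequence space is used up in about 34 seconds, well inside a two-minute lifetime.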
The problem is solved, but again with a minor glitch: what if a machine goes down and forgets which sequence numbers it has used?
When the machine goes down
Suppose an application is sending data and TCP obliges by sending segments one after another. Say TCP has sent the segments numbered 1 to 5000. It should not give any new segment a number between 1 and 5000 for the next two minutes.
As long as the machine runs without any problem, TCP remembers the sequence numbers of the segments sent during the last two minutes and does not repeat them. However, what if the machine goes down and TCP goes down with it? If the machine restarts immediately, the user may restart the application without any delay, and TCP starts sending at full speed again. The new TCP connection may accidentally use the same sequence numbers it was using a few seconds earlier. In that case, an innocuous new segment carrying the same sequence number as an old one is falsely considered a duplicate and discarded.
A researcher called Tomlinson found the solution to this problem. Every computer has an RTC (real-time clock) as part of its hardware, and the RTC is designed to keep running even when the computer goes down. Tomlinson's idea was to use the low-order 32 bits of the clock (TCP sequence numbers are 32 bits wide) as the initial sequence number of a TCP connection. Even when TCP goes down and forgets the older sequence numbers, the clock keeps ticking, so when a new connection is established and a new initial sequence number is chosen, it is larger than the previous values.
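Tomlinson's idea can be sketched as follows. The 4-microsecond tick is an assumption borrowed from RFC 793, which describes a 32-bit ISN clock incremented roughly every 4 microseconds; `initial_seq` is a hypothetical helper that takes the clock reading in microseconds. With that tick, the 32-bit value wraps around roughly every 4.8 hours.

```python
TICK_US = 4      # assumed: the clock ticks once every 4 microseconds
SEQ_BITS = 32    # width of a TCP sequence number


def initial_seq(clock_us: int) -> int:
    """ISN = low-order 32 bits of a clock that keeps running across crashes."""
    ticks = clock_us // TICK_US
    return ticks & ((1 << SEQ_BITS) - 1)


before_crash = initial_seq(1_000_000)     # connection opened at t = 1 s
after_reboot = initial_seq(1_500_000)     # new connection 0.5 s later
print(before_crash, after_reboot)         # 250000 375000
print(initial_seq(TICK_US << SEQ_BITS))   # 0: the 32-bit value wraps around
```

The reboot does not matter: the second connection still starts above the first because the clock, not TCP's memory, supplies the number.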
Great! But the problem is not yet fully solved. The initial sequence number differs from connection to connection and is larger than that of the previous connection, but the sequence numbers of the following segments depend on the amount of data each segment carries. If TCP sends faster than the clock ticks, its sequence numbers can run ahead of the clock and reach a value that the clock would later choose as the initial sequence number of a new connection. On the other hand, when the clock wraps around, it starts again from zero and repeats numbers, which might match the sequence numbers of a running connection that is sending data much more slowly.
Two measures are taken to handle this. First, the clock runs much faster than TCP consumes sequence numbers. Second, the clock uses a much larger number of digits, so it does not wrap around quickly.
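The two measures can be expressed as a simplified sanity check. This is an illustrative assumption rather than a complete analysis: `numbering_is_safe` ignores, for example, how long a slow connection stays open, and the rates used are made up.

```python
SEQ_SPACE = 2 ** 32


def clock_wrap_seconds(ticks_per_second: float) -> float:
    """How long the 32-bit clock value takes to wrap around."""
    return SEQ_SPACE / ticks_per_second


def numbering_is_safe(bytes_per_second: float, ticks_per_second: float,
                      lifetime: float) -> bool:
    """Simplified check of the two measures described above."""
    # Measure 1: data must not consume sequence numbers faster than the
    # clock advances, or the connection runs ahead of future ISNs.
    clock_stays_ahead = bytes_per_second <= ticks_per_second
    # Measure 2: the clock must not wrap around within a segment lifetime.
    slow_wrap = clock_wrap_seconds(ticks_per_second) > lifetime
    return clock_stays_ahead and slow_wrap


print(numbering_is_safe(100_000, 250_000, 120))    # True
print(numbering_is_safe(1_000_000, 250_000, 120))  # False: data outruns clock
print(numbering_is_safe(10, 2 ** 32, 120))         # False: clock wraps every second
```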
Finally, the connection establishment process is also plagued by the delayed duplicate problem, and Tomlinson provided a solution to that problem too. When a connection is being established, both sides begin with new sequence numbers, so we must be careful that delayed duplicates of old connection requests and acknowledgments do not come in and establish a connection that neither of the communicating parties wants.
Connection establishment problem and solution
Let us understand this problem with an example. Consider the two cases depicted in figure 25.10. The figures are self-explanatory. In both cases, a connection is established without the knowledge of either party. If you observe figure 25.10 closely, you can see that the process used there is basically a 2-way handshake, in which the sender sends a request and the receiver acknowledges it. Tomlinson suggested a solution to this problem, known as the 3-way handshake, which adds one more step.
Here is the solution in the form of a three-way handshake. The sender sends a CR with an initial sequence number (based on its RTC value), and the receiver responds with its own initial sequence number (from its own RTC) along with the ack of the CR. Finally, the sender responds with the ack of the receiver's response.
Figure 25.11(a) depicts a normal 3-way handshake in which the sender uses 45 as its initial sequence number and the receiver uses 99. Figure 25.11(b) shows a delayed duplicate CR reaching the receiver. The receiver, unaware of this, responds with its own initial sequence number 99 and an ack of the CR. The receiver cannot complete the connection unless the sender sends back an ack. When the sender receives the receiver's Ack 46, it recognises it as an answer to a request it never sent, and so rejects the connection.
Figure 25.11(c) depicts a case in which the receiver gets not only a duplicate CR but also a duplicate ack. The duplicate ack refers to an old sequence number of the receiver (99), whereas the receiver used 149 in its current response, so the two do not match and, once again, the connection is not established. Could the receiver's response (sequence number 149) and the duplicate ack (sequence number 99) ever refer to the same sequence number? No: the receiver cannot reuse a sequence number until every old segment carrying it has died, so it will not generate a segment with sequence number 99 again while duplicates referring to 99 are still around. Thus, all these problems are handled by the 3-way handshake.
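The three cases of figure 25.11 can be replayed with a toy model of the two endpoints. `Initiator` and `Responder` are hypothetical classes that track only sequence and acknowledgment numbers, following the convention in the text that the ack of sequence number x is x + 1; real TCP segments carry far more state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Initiator:
    """The side that sends the CR."""
    pending_isn: Optional[int] = None   # ISN of a CR we really sent

    def send_cr(self, isn: int) -> int:
        self.pending_isn = isn
        return isn

    def on_response(self, their_isn: int, ack: int) -> Tuple[str, int]:
        # Accept only a response that acknowledges our own outstanding CR.
        if self.pending_isn is not None and ack == self.pending_isn + 1:
            return "ACK", their_isn + 1     # third step of the handshake
        return "REJECT", ack                # answer to a CR we never sent


@dataclass
class Responder:
    """The side that receives the CR; isn comes from its own clock."""
    isn: int
    established: bool = False

    def on_cr(self, peer_isn: int) -> Tuple[int, int]:
        return self.isn, peer_isn + 1       # own ISN plus ack of the CR

    def on_ack(self, ack: int) -> bool:
        # Only an ack of *this* response completes the connection.
        self.established = ack == self.isn + 1
        return self.established


# (a) normal handshake with ISNs 45 and 99
a_init, a_resp = Initiator(), Responder(isn=99)
kind, ack = a_init.on_response(*a_resp.on_cr(a_init.send_cr(45)))
print(kind, a_resp.on_ack(ack))     # ACK True

# (b) delayed duplicate CR(45); the initiator has no CR outstanding
b_init, b_resp = Initiator(), Responder(isn=99)
kind_b, _ = b_init.on_response(*b_resp.on_cr(45))
print(kind_b, b_resp.established)   # REJECT False

# (c) duplicate CR plus a duplicate ack of the old ISN 99; current ISN is 149
c_resp = Responder(isn=149)
c_resp.on_cr(45)
print(c_resp.on_ack(100))           # False
```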
Summary
We have looked at different types of timers in this module, including the persist, keep alive, time-wait and Fin-Wait-2 timers. We have also seen how delayed duplicates pose a serious threat to TCP communication and how Tomlinson solved the problem with a simple method for choosing the initial sequence number of a new TCP connection. Finally, we have seen that the connection establishment process is also plagued by the same problem, and that the solution is to use a three-way handshake.