Showing posts with label TCP. Show all posts

Tuesday, March 19, 2013

TCP in Detail

Hi folks! In my previous post, we had an overview of the Transmission Control Protocol. In this post, I want to dive deeper into the details of TCP and explain what it does and what features it has. My posts on TCP will go on, but I think this post will help you grasp the fundamentals and the general operation of this fancy protocol. TCP is a rather complex protocol, so I suggest you settle into your seat and try to keep your focus while you are reading.

What does TCP do?

TCP/IP Protocol Suite is the set of communications protocols on which the Internet and similar networks are built. As its name suggests, Transmission Control Protocol (TCP) and Internet Protocol (IP) are the most important protocols within the suite.

IP in the TCP/IP Protocol Suite deals with classic network-layer tasks such as addressing, datagram packaging and routing, which provide basic internetworking capabilities. TCP, on the other hand, can be considered a nice user interface to the capabilities of IP. It basically fills in the capabilities that IP does not provide. IP is connectionless, unreliable and unacknowledged. Data is sent over an IP internetwork in a "best effort" manner: no connection is established, there is no guarantee that the data is delivered to the destination, and the sender doesn't know whether the data got there. Many applications, however, need to know that the data they send will get to its destination without loss or error. Besides, they want the connection between two devices to be managed, and problems such as congestion and flow control to be taken care of automatically by whatever mechanism manages the connection. Without a dedicated mechanism, every application would need to carry out these tasks individually. As you can imagine, that would be a serious waste of effort. Fortunately, the OSI Reference Model defines the Transport Layer to handle all these important issues, and TCP is a full-featured transport protocol.

In short, TCP handles connections and provides reliability and flow control. Reliability means ensuring that the data which is sent actually arrives at its destination and, if it does not, detecting this and resending the data. Flow control means managing the rate at which data is sent so that it does not overwhelm the device that is receiving it. TCP includes a number of sophisticated functions to ensure that applications function in the potentially difficult environment of a large internetwork.

Now let's hit the road and see some important functions of TCP:

TCP Addressing with Ports

TCP/IP is designed to allow different applications to send and receive data simultaneously by using the same Internet Protocol (IP) software on a host device. To achieve this, the data transmitted by many applications must be multiplexed as it is passed down to the IP Layer. Accordingly, as data is received, it is demultiplexed and the appropriate data is passed to each application on the receiving host.

The Transport Layer protocols TCP and UDP represent a transition point between the hardware-related layers of the OSI model (Layers 1, 2 and 3) and the software-related layers (Layers 5, 6 and 7). In regular internet usage, most of us run several different applications simultaneously. So, a typical TCP/IP host has many processes which want to send data to and receive data from remote hosts. All of that data must be sent through the same interface to the internetwork, the IP Layer. This means that the data from all applications that need TCP/IP communication is aggregated for the use of the IP (Network) Layer. The stream of data to be sent is packaged as segments in the Transport Layer, and the segments are then passed to the IP Layer, where they are packaged as IP datagrams and sent out to the internetwork to be delivered to different destinations. This process is called multiplexing.

A complementary mechanism is responsible for the receipt of datagrams. While the IP Layer multiplexes datagrams from several different application processes to be sent out, it also receives many datagrams that come from different remote hosts and are intended for different processes. Sorting these out is the reverse of multiplexing, and it is called demultiplexing.

Here comes the question: think of a series of IP datagrams received by a host. How are they demultiplexed to the different application processes on the receiving host? Does the Destination Internet Address field in the IP header help? Well, it is actually useless in this case, because all the datagrams received by the IP Layer are expected to carry the same Destination Internet Address (which is the receiving host's own IP address).

So, how can we manage that? Demultiplexing the received data to the application processes is actually achieved in 2 steps. In the IP Layer, the Protocol field in the IP header indicates where to send the decapsulated data from the received datagram. This is most probably TCP or UDP. TCP or UDP must then figure out which process to send the data to. To make this possible, an additional addressing element is necessary, one that identifies a more specific location, a software process, within a particular IP address. In TCP/IP, this transport layer address is called a port: TCP/IP Transport Layer addressing is accomplished using ports. Each port number within a particular IP device identifies a software process.

The TCP header (and the UDP header as well) has 2 addressing fields: Source Port and Destination Port. These are analogous to the Source Internet Address and Destination Internet Address fields in the IP header, but at a higher level of detail. They identify the originating process on the source machine and the destination process on the destination machine. They are filled in by the TCP software (or UDP software) before transmission. The ports are used to direct the data to the correct process on the destination device.
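To make the two port fields a bit more concrete, here is a minimal sketch that reads them out of the first four bytes of a transport header. Both TCP and UDP start with a 16-bit Source Port followed by a 16-bit Destination Port in network byte order. The function name parse_ports and the sample header bytes are my own illustration, not part of any library.

```python
import struct


def parse_ports(transport_header):
    """Return (source_port, destination_port) from a TCP or UDP header."""
    if len(transport_header) < 4:
        raise ValueError("need at least 4 bytes to read both port fields")
    # "!HH" means: network byte order, two unsigned 16-bit integers.
    source_port, destination_port = struct.unpack("!HH", transport_header[:4])
    return source_port, destination_port


# Hypothetical header: source port 49152, destination port 80,
# followed by 16 zero bytes standing in for the rest of a TCP header.
example_header = struct.pack("!HH", 49152, 80) + bytes(16)
print(parse_ports(example_header))  # (49152, 80)
```

Only the first four bytes matter here, so the same helper works for a UDP header too.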

The figure shows how TCP and UDP ports are used to achieve software multiplexing and demultiplexing.


To sum up, application process multiplexing and demultiplexing in TCP/IP is implemented using the IP Protocol field and the UDP/TCP Source Port and Destination Port fields. Upon transmission, the Protocol field is given a number to indicate whether TCP or UDP was used, and the port numbers are filled in to indicate the sending and receiving software process. The device receiving the datagram uses the Protocol field to determine whether TCP or UDP was used, and then passes the data to the software process indicated by the Destination Port number.
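The two-step lookup summarized above can be sketched in a few lines. This is a simplified model under my own assumptions: a datagram is just a dictionary, and the listeners table and process names are hypothetical. Real TCP stacks also look at the source address and source port when matching established connections, which this sketch leaves out.

```python
# IP protocol numbers as assigned by IANA.
PROTOCOL_TCP = 6
PROTOCOL_UDP = 17


def demultiplex(datagram, listeners):
    """Return the process registered for this datagram, or None.

    Step 1 uses the IP Protocol field, step 2 the Destination Port.
    """
    key = (datagram["protocol"], datagram["dst_port"])
    return listeners.get(key)


# Hypothetical table of processes listening on this host.
listeners = {
    (PROTOCOL_TCP, 80): "web server",
    (PROTOCOL_UDP, 53): "dns resolver",
    (PROTOCOL_TCP, 53): "dns zone transfer",
}

print(demultiplex({"protocol": PROTOCOL_TCP, "dst_port": 80}, listeners))
print(demultiplex({"protocol": PROTOCOL_UDP, "dst_port": 53}, listeners))
```

Note that TCP port 53 and UDP port 53 map to different processes, which is why the protocol has to be part of the lookup key.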

TCP Data Handling

Looking at the picture from the OSI Reference Model perspective, when an application process wants to send data out to the internetwork, the data is grouped into messages. A message can be regarded as a letter in an envelope, containing a piece of information. As the message is passed down to the lower layers, it is encapsulated in each lower layer's header until it is sent out by the Physical Layer.

TCP Segments
TCP, as a handy protocol, is capable of accepting application data of any size and is responsible for dividing big streams of data into segments that the Internet Protocol can handle. This is why we describe TCP as a stream-oriented protocol. IP, on the other hand, is a message-oriented protocol, and truth be told, it badly needs TCP to handle the large streams of data sent by an application process. TCP's stream orientation gives applications serious flexibility, since they don't need to worry about data packaging and can send files or messages of any size. TCP takes care of packaging these bytes into messages called segments.
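As a rough illustration of what "packaging bytes into segments" means, the sketch below cuts a byte stream into chunks of at most a given size. The function name segment_stream is made up for this post, and the limit of 1460 bytes is just a typical value for Ethernet paths (1500 bytes minus 20 bytes each for the IP and TCP headers), not something a real TCP would hard-code.

```python
def segment_stream(data, max_payload):
    """Split a byte stream into payloads of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


stream = b"x" * 3000  # stands in for data handed over by the application
segments = segment_stream(stream, 1460)
print([len(s) for s in segments])  # [1460, 1460, 80]
```

Joining the payloads back together gives the original stream, which is exactly what the receiving TCP has to do.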

Sequence Numbers and Message Identification
TCP is a reliable transport protocol. It means that TCP needs to keep track of all data it receives from an application in order to make sure that all of the data is received by the destination. Moreover, TCP must make sure the data is received in the order it was sent, and must retransmit any lost data (God bless you son, you are a hero!!).

Sequence numbers in TCP headers help TCP handle reliability. The data segments that TCP builds travel in IP datagrams, so they can be lost or delivered out of order during transmission. To prevent data loss, the sender assigns a sequence number to each byte it sends and fills the TCP header's Sequence Number field with the sequence number of the first data byte in the segment. The receiver replies with an acknowledgement number, which is the sequence number of the next byte it expects, thereby confirming every byte before it. If the sender does not get an acknowledgement covering all the bytes it sent within a given timeout, it assumes that some data was lost and retransmits the segments which contain the unacknowledged bytes. To put the arriving segments in order, the receiver collects the data from the arriving segments, looks at their Sequence Number fields and reconstructs an exact copy of the stream.
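Here is a small sketch of the receiver's side of that job: putting segments back in order by sequence number and working out which byte it expects next. It assumes each segment is a (sequence number of first byte, payload) pair, and the function name reassemble and the sample numbers are my own. Real TCP also has to deal with overlapping segments and sequence number wrap-around, which this sketch ignores.

```python
def reassemble(segments, initial_seq):
    """Rebuild the in-order byte stream from (seq, payload) pairs.

    Returns (data, next_expected_seq). Reassembly stops at the first gap,
    so next_expected_seq is what the receiver would acknowledge.
    """
    by_seq = dict(segments)  # exact duplicates collapse onto one key
    data = bytearray()
    expected = initial_seq
    while expected in by_seq:
        payload = by_seq[expected]
        if not payload:
            break  # an empty payload would not advance the stream
        data += payload
        expected += len(payload)
    return bytes(data), expected


# Segments arrive out of order; "hello " occupies bytes 1000..1005.
arrived = [(1006, b"world"), (1000, b"hello ")]
print(reassemble(arrived, 1000))  # (b'hello world', 1011)
```

If the segment starting at 1006 were missing, the function would return only b"hello " and 1006, which is the acknowledgement number that tells the sender where the gap begins.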

As mentioned above, each segment carries a message identifier, which is actually its sequence number, and the receiver refers to this identifier in its acknowledgement. Message identification is important in order to preserve data integrity and prevent data loss.

TCP Windowing

We already know that the sender has to wait for an acknowledgement for some time after it sends data to the receiver. Now, let us consider different cases. Suppose the sender has to wait for an acknowledgement after each byte it sends. That would be an awful waste of performance, since the communication would be interrupted so often. OK, let us suppose the sender has to wait for an acknowledgement after each segment it sends. Well, this seems better, but why the hell should I wait after each segment I send? This way, my data throughput would still not be as good as I want. Isn't there a better option? OK boy, what if we get acknowledged after we send many segments? That absolutely sounds better!! This way, our data sending would be interrupted less and data throughput would be better.

A TCP Window is the amount of unacknowledged data a sender can send on a particular connection before it gets an acknowledgement back from the receiver. In other words, it limits how much data the transmitting machine is allowed to have in flight without having received an acknowledgement for it. The window size is expressed in bytes, not in segments. It is advertised by the receiving device when the connection is established and can change later, since every segment the receiver sends carries its current window size.

The sending device can send all segments that fit within the TCP window size (as specified in the TCP header) without receiving an ACK, and should start a timer for each of them. The receiving device acknowledges the data it received, indicating how far the data has been received correctly. After receiving the ACK from the receiver, the sender slides the window to the right, which makes room for new data.


TCP basically places a memory buffer between the application and the network data flow. The buffer allows TCP to receive and process data independently of the upper application. The main purpose of the sliding window is to prevent the sender from sending so many segments that it overflows network resources or the receiver's buffer.

Window announcements are sent by the receiver to the sender whenever the receiver acknowledges data, and each announcement simply informs the sender of the current window size. If a window size of zero is reported, the sender must stop sending data until the receiver announces a non-zero window. If the receiver reports a window larger than a single segment, the sender knows that it can send multiple segments before waiting for an acknowledgement. Transmitting multiple segments between acknowledgements allows data to be transferred faster and more efficiently.
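The sender's bookkeeping behind the sliding window boils down to one subtraction. The sketch below is a simplified model with names of my own choosing (sendable_bytes, oldest_unacked, next_seq); it ignores congestion control, which in a real TCP further limits how much may be sent.

```python
def sendable_bytes(next_seq, oldest_unacked, window):
    """How many more bytes the sender may transmit right now.

    next_seq:       sequence number of the next new byte to send
    oldest_unacked: latest acknowledgement number from the receiver
    window:         window size announced by the receiver, in bytes
    """
    in_flight = next_seq - oldest_unacked  # sent but not yet acknowledged
    return max(0, window - in_flight)


# Nothing sent yet, receiver announced a 3000-byte window.
print(sendable_bytes(1000, 1000, 3000))  # 3000
# The whole window is in flight: the sender has to wait.
print(sendable_bytes(4000, 1000, 3000))  # 0
# An ACK for byte 2000 arrives: the window slides right by 1000 bytes.
print(sendable_bytes(4000, 2000, 3000))  # 1000
# The receiver announces a zero window: sending stops.
print(sendable_bytes(4000, 2000, 0))     # 0
```

The third call is the "slide": nothing about the window size changed, but the acknowledgement moved its left edge forward.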


TCP Window Size
One important concept to cover in detail is the Window Size field in the TCP header, which keeps the receiver from being overwhelmed by more data than it can handle at a time. Think of a web server which has to serve thousands of clients simultaneously. In this case, the server would want to tell each client that connects to it: "I am willing to handle this much data from you at a time". The client would use this limit to restrict the rate at which it sends data to the server. The server could adjust the window size depending on its current load and other factors to maximize performance in its communication session with the client. This enhanced system would thus provide reliability, efficiency and basic data flow control.

Please note that TCP Window Size and Maximum Segment Size (MSS) are different concepts. MSS is a parameter which specifies the maximum amount of data, in bytes, that a single TCP segment can carry. TCP Window Size specifies the size of the TCP window in bytes, and a window usually spans several segments.
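A quick bit of arithmetic shows how the two values relate. The numbers are only examples: 65535 is the largest value the 16-bit Window Size field can hold without window scaling, and 1460 is a common MSS on Ethernet paths. The helper name is my own.

```python
def full_segments_per_window(window_size, mss):
    """Number of full-sized segments that fit into one window."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return window_size // mss


print(full_segments_per_window(65535, 1460))  # 44
```

So with these example values the sender could have 44 full segments in flight before it has to stop and wait for an acknowledgement.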

Positive Acknowledgement with Retransmission
As mentioned earlier, IP is unreliable. It works in a "send and forget" manner. From another perspective, it is an open loop system. There is no feedback from the receiver, therefore the sender never knows if the transmitted datagram gets to the destination. TCP, as a complementary protocol to IP, provides a closed loop system through its acknowledgement feedback mechanism. Since IP is unreliable, the message may in fact never get to its destination, or the acknowledgement from the receiver may get lost on its way back to the sender. In such cases, the sender would wait for the acknowledgement forever. To prevent this from happening, when the sender first sends the message, it starts a timer. This timer allows sufficient time for the message to get to the receiver and the acknowledgement to travel back, plus some additional time to allow for possible delays. If the timer expires before the acknowledgement is received, the sender assumes there was a problem and retransmits its original message. This method is called "Positive Acknowledgement with Retransmission" (PAR).
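The PAR loop described above can be sketched without any real networking. In this toy model the "channel" is just a function that returns True if an acknowledgement came back before the timer expired, and False otherwise. All names (send_with_retransmission, make_lossy_channel, max_attempts) are hypothetical, and a real TCP would of course use actual timers and adapt the timeout to the measured round-trip time.

```python
def make_lossy_channel(losses):
    """Build a fake channel that loses the first `losses` transmissions."""
    remaining = {"count": losses}

    def channel(message):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False  # timer expired, no acknowledgement seen
        return True       # acknowledgement arrived in time

    return channel


def send_with_retransmission(message, channel, max_attempts=5):
    """Send until acknowledged; return the number of attempts it took."""
    for attempt in range(1, max_attempts + 1):
        if channel(message):
            return attempt
        # No acknowledgement before the timer ran out: retransmit.
    raise TimeoutError("no acknowledgement after %d attempts" % max_attempts)


print(send_with_retransmission(b"hello", make_lossy_channel(2)))  # 3
```

The sender cannot tell whether the message or the acknowledgement was lost. In both cases the timer expires and the reaction is the same: send it again.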

Acknowledgements as well as sequence numbers play an important role in order to achieve reliability in a TCP connection. Here, I want to show you a figure which depicts how Acknowledgement works:



Looking at the figure, we see that the window size is 3 data segments. Host B sends 3 data segments to Host A and they are received in perfect condition, so Host A sends an "ACK 4", acknowledging the 3 data segments and requesting the next 3, which will be 4, 5 and 6. As a result, Host B sends data segments 4, 5 and 6, but 5 gets lost somewhere along the way and Host A doesn't receive it. After a bit of waiting, Host A realises that 5 got lost and sends an "ACK 5" to Host B, indicating that it would like data segment 5 retransmitted. Now you see why this method is called "Positive Acknowledgement with Retransmission".

At this point Host B sends data segment 5 and waits for Host A to send an "ACK" so it can continue sending the rest of the data. Host A receives the 5th data segment and sends "ACK 7" which means 'I received the previous data segment, now please send me the next 3'. The next step is not shown on the diagram but it would be Host B sending data segments 7, 8 and 9.
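The ACK numbers in this walkthrough follow one simple rule: Host A always acknowledges the lowest segment number it has not received yet. Here is a tiny sketch of that rule, using the same simplified per-segment numbering as the figure (real TCP numbers bytes, not segments). The function name next_ack is my own.

```python
def next_ack(received):
    """Lowest segment number not yet received, counting from 1."""
    expected = 1
    while expected in received:
        expected += 1
    return expected


received = set()

received.update([1, 2, 3])   # first window arrives intact
print(next_ack(received))    # 4

received.update([4, 6])      # segment 5 is lost on the way
print(next_ack(received))    # 5

received.add(5)              # Host B retransmits segment 5
print(next_ack(received))    # 7
```

Notice that the last answer jumps straight to 7: segment 6 was already sitting in Host A's buffer, so once 5 fills the gap, both are acknowledged together.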

Monday, March 4, 2013

TCP Overview

TCP and OSI Model

Understanding how each network protocol fits into the OSI Model is important. Transmission Control Protocol (TCP) is placed at the Transport Layer of the OSI Model (Layer 4). In short, the Transport Layer provides transparent transfer of data between end users, providing reliable data transfer services to the upper layers.


TCP is a full-featured transport layer protocol. It provides all the functions needed by a typical application for the reliable transportation of data across an arbitrary internetwork. It provides transport-layer addressing for application processes in the form of TCP ports. TCP allows these ports to be used in establishing connections between machines (TCP is a connection-oriented protocol). Once connections have been created, data can be passed bidirectionally between two devices (full-duplex communication). Applications can send data to TCP as a simple stream of bytes. IP is a packet-based protocol while TCP is stream-based. TCP is built on top of IP, therefore TCP is responsible for breaking the streams of data that come from upper layer protocols into packets. TCP handles packaging and sending the data as segments. Please note that the TCP segments are packaged into IP datagrams by the Network Layer (OSI Layer 3). The receiving device's TCP implementation reverses the process and passes the stream of data originally sent up to the application. The more we dive into TCP, the better you will see how TCP fits the OSI Transport Layer.

The diagram below shows where TCP Header is located within a frame that has been generated by a host and sent to the network:


Where and When to use TCP

TCP, as a protocol, is not restricted to any type of network topology. Whether in a local area network (LAN) or a wide area network (WAN), TCP is able to transport data from one location to another.

You probably know that UDP (User Datagram Protocol) is another Transport Layer protocol (well, if you didn't know that before, you do now). So, when should we use TCP and when should we use UDP?

TCP acknowledges and retransmits data until it is delivered, and thus can be considered "reliable". UDP, on the other hand, is a best-effort service. UDP is "unreliable": it provides no guarantee of delivery and no protection from duplication. The simplicity of UDP, however, reduces the overhead. A computer may send UDP packets without establishing a connection to the recipient. The computer completes the appropriate fields in the UDP header (which is more lightweight than the TCP header) and forwards the data with the header to the IP Network Layer.

To sum up, TCP is for high-reliability data transmissions and UDP is for low-overhead transmissions. Well, simply, use UDP in applications where reliability is not critical but speed is (video streaming applications or networked gaming applications), and use TCP for the rest.
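To see the difference from an application's point of view, here is a minimal sketch using Python's standard socket module on the loopback interface. The helper names udp_round_trip and tcp_round_trip are my own, and binding to port 0 simply asks the operating system for any free port. It assumes loopback networking is available on the machine running it.

```python
import socket


def udp_round_trip(payload):
    """Send one datagram over loopback: no connection is set up first."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_round_trip(payload):
    """Send a byte stream over loopback: connect first, then send."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_side = None
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client.settimeout(2.0)
        client.connect(listener.getsockname())  # connection establishment
        server_side, _ = listener.accept()
        server_side.settimeout(2.0)
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal the end of the stream
        chunks = []
        while True:
            chunk = server_side.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        if server_side is not None:
            server_side.close()
        client.close()
        listener.close()


print(udp_round_trip(b"hello over udp"))
print(tcp_round_trip(b"hello over tcp"))
```

The UDP version just fires a datagram at an address, while the TCP version needs a listener, a connect and an accept before a single byte of data moves. On loopback both will normally succeed, so the extra reliability of TCP only shows its value on a real, lossy network.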

The Concept Of A Transport Protocol

Well, if you have managed to read this far, you should already have grasped the idea that TCP is a transport protocol, which means it is used to transfer the data of upper layer protocols.

Below, you will see a magically useful diagram which probably is the simplest way to show the concept of a transport protocol: