In this post, I provide a short but thorough explanation of the TCP sockets used in UNIX network programming. Let’s begin by assuming that out on the internet is a server that you would like to communicate with. The server is already running and listening for requests. The client (you) opens a web browser and types in the website you want to see. This is the connection request. The server receives the request and sends a reply back.
This exchange goes back and forth until the client closes the connection, which ends the session. The server then closes its end of the connection and listens for new client requests. Take a look at the flow chart below to see the entire process.
To perform network input and output (I/O), a process must first call the socket function. This call specifies the type of communication protocol desired, such as TCP over IPv4 or UDP over IPv6. Next, let’s look at the prototype of the socket function.
Socket Function Explanation
#include <sys/socket.h>
int socket(int family, int type, int protocol);
The function returns a non-negative descriptor on success, or -1 on error.
family defines the protocol family, type defines the socket type, and protocol defines…you guessed it, the protocol. See the figure below for further details on these arguments.
|Argument|Constant|Description|
|---|---|---|
|family|AF_LOCAL|Unix domain protocols|
|type|SOCK_SEQPACKET|Sequenced packet socket|
|protocol|IPPROTO_TCP|TCP transport protocol|
|protocol|IPPROTO_UDP|UDP transport protocol|
|protocol|IPPROTO_SCTP|SCTP transport protocol|
When successful, the socket function returns a small non-negative integer value. This is called the socket descriptor, or a sockfd.
The “AF_” prefix stands for “address family” and the “PF_” prefix stands for “protocol family.”
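As a minimal sketch of the call described above, the following helper creates a TCP socket over IPv4 and checks the return value. The helper name make_tcp_socket is hypothetical, and AF_INET and SOCK_STREAM are the standard IPv4 family and stream-socket type constants, which are not listed in the table above.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP socket over IPv4.
 * Returns the socket descriptor (sockfd), or -1 on error. */
int make_tcp_socket(void)
{
    /* AF_INET = IPv4 family, SOCK_STREAM = stream socket,
     * IPPROTO_TCP = TCP transport protocol. */
    int sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sockfd < 0) {
        perror("socket");   /* errno explains why the call failed */
        return -1;
    }
    return sockfd;
}
```

Passing 0 as the third argument would also work here, since it asks the system for the default protocol for the given family and type, which is TCP for an IPv4 stream socket.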
Connect Function Explanation
The connect function is used by a TCP client to establish a connection with a TCP server.
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
The connect function returns 0 on success, or -1 on error.
If you recall from the socket function example, sockfd is a socket descriptor. The const struct sockaddr *servaddr argument is a pointer to a socket address structure, while socklen_t addrlen is the size of that structure. The socket address structure must contain the IP address and port number of the server.
For a TCP socket, the connect function initiates the TCP three-way handshake (SYN, SYN/ACK, ACK). On a blocking socket, the function returns only when the connection is established or an error occurs. I will describe some of the error possibilities to watch for.
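To make the arguments concrete, here is a sketch of filling in an IPv4 socket address structure and calling connect. The helper name tcp_connect_ipv4 is hypothetical, and the example assumes the server address is given as a dotted-decimal string.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to an IPv4 server.
 * ip is a dotted-decimal string, port is in host byte order.
 * Returns a connected descriptor, or -1 on error (errno is preserved). */
int tcp_connect_ipv4(const char *ip, uint16_t port)
{
    struct sockaddr_in servaddr;
    int sockfd;
    int saved_errno;

    /* The address structure must hold the server's IP address and port. */
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);            /* network byte order */
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1)
        return -1;                              /* not a valid IPv4 string */

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    /* Blocks until the three-way handshake completes or fails. */
    if (connect(sockfd, (const struct sockaddr *)&servaddr,
                sizeof(servaddr)) < 0) {
        saved_errno = errno;
        close(sockfd);      /* a socket that failed to connect is closed */
        errno = saved_errno;
        return -1;
    }
    return sockfd;
}
```

Closing the descriptor after a failed connect is deliberate: once connect fails, the socket should not be reused for another attempt, so a retry should start with a fresh call to socket.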
Connect Function Errors
- If the client TCP receives no response to its SYN segment, ETIMEDOUT is returned. On 4.4BSD, for example, a SYN segment is sent when connect is called and is retransmitted if no response arrives. If no response has been received after a total of 75 seconds, the error is returned to the client. The exact timeout varies between systems.
- If the server’s response to the client’s SYN is a reset (RST), no process is waiting for connections on the server host at the specified port. The client receives ECONNREFUSED as soon as the RST arrives. This is considered a hard error.
  - RST is a type of TCP segment that is sent when something is wrong. Three conditions generate an RST:
    - A SYN segment arrives for a port that has no listening server.
    - TCP wants to abort an existing connection.
    - TCP receives a segment for a connection that does not exist.
- If the client’s SYN elicits an ICMP (Internet Control Message Protocol) “destination unreachable” message from an intermediate router, this is a soft error. The client kernel saves the message and keeps retransmitting the SYN. If no response arrives before the timeout expires, the saved ICMP error is returned as EHOSTUNREACH or ENETUNREACH.
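The error cases above can be sketched as a small classifier that a client might call with errno after connect returns -1. The enum and function names are hypothetical, chosen only for illustration.

```c
#include <errno.h>

/* Hypothetical categories for a failed connect call. */
enum connect_failure {
    CONNECT_TIMEOUT,      /* no response to the SYN */
    CONNECT_REFUSED,      /* server answered with RST (hard error) */
    CONNECT_UNREACHABLE,  /* ICMP destination unreachable (soft error) */
    CONNECT_OTHER         /* anything not covered above */
};

/* Map an errno value left by connect to one of the categories. */
enum connect_failure classify_connect_errno(int err)
{
    switch (err) {
    case ETIMEDOUT:
        return CONNECT_TIMEOUT;
    case ECONNREFUSED:
        return CONNECT_REFUSED;
    case EHOSTUNREACH:
    case ENETUNREACH:
        return CONNECT_UNREACHABLE;
    default:
        return CONNECT_OTHER;
    }
}
```

A caller would typically use it as classify_connect_errno(errno) immediately after the failed call, before any other library call has a chance to overwrite errno.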
Bind Function Explanation
The bind function assigns a local protocol address to a socket. With IP (Internet Protocol), the protocol address is the combination of either a 32-bit IPv4 address like 220.127.116.11 or a 128-bit IPv6 address like 2001:0db8:85a3::8a2e:0370:7334, along with a 16-bit TCP or UDP port number.
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
This function returns 0 on success, or -1 on error.
Again we see sockfd, which is the socket descriptor from the socket function. The const struct sockaddr *myaddr argument is a pointer to a protocol-specific address, and socklen_t addrlen is the size of that address structure. With TCP, calling the bind function lets a process specify a port number, an IP address, both, or neither.
A process can bind a specific IP address to its socket, and that IP address must belong to an interface on the host. For a TCP client, this assigns the source IP address that will be used for IP datagrams sent on the socket. For a TCP server, this restricts the socket to receiving only incoming client connections destined for that IP address.
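Here is a sketch of the “neither” case, where the process leaves both choices to the kernel by binding the wildcard address with port 0, then asks which port was assigned. The helper name bind_any_ipv4 is hypothetical, and getsockname is a standard call that this post does not otherwise cover.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind sockfd to the IPv4 wildcard address.
 * port is in host byte order; 0 lets the kernel pick an ephemeral port.
 * Returns the port actually bound (host byte order), or -1 on error. */
int bind_any_ipv4(int sockfd, uint16_t port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* kernel chooses the IP */
    addr.sin_port = htons(port);                /* 0 = kernel chooses */

    if (bind(sockfd, (const struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* Ask the kernel which local address and port were assigned. */
    if (getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A server would normally pass its well-known port instead of 0, and could replace INADDR_ANY with a specific interface address to accept connections only on that address.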
Listen Function Explanation
The listen function is called only by TCP servers, and it performs two actions:
- When the socket function creates a socket, it is assumed to be an active socket, that is, a client socket that will issue a connect. The listen function converts an unconnected socket into a passive socket. This tells the kernel to accept incoming connection requests directed to the socket.
- The function’s second argument specifies the maximum number of connections the kernel should queue for this socket.
#include <sys/socket.h>
int listen(int sockfd, int backlog);
To understand the backlog argument, note that the kernel maintains two queues for a given listening socket:
- An incomplete connection queue, which contains an entry for each SYN segment that has arrived from a client and for which the server is still waiting for the TCP three-way handshake to complete. These sockets are in the SYN_RCVD state.
- A completed connection queue, which contains an entry for each client that has completed the TCP three-way handshake. These sockets are in the ESTABLISHED state.
Consider the diagram below regarding the queue states for a given listening socket.
Here is a figure that shows the TCP three-way handshake and the two queue states for a listening socket.
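RTT stands for round-trip time, the time it takes a segment to travel from the client to the server and back. It depends on the network path: it can be well under a millisecond on a local network and on the order of 150 ms or more across the internet.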
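The listen step can be sketched as follows. The helper names make_loopback_listener and is_listening are hypothetical; the sketch binds to the loopback address with a kernel-chosen port so it can run without special privileges, and it uses the standard SO_ACCEPTCONN socket option to show that the socket has become passive. How the kernel interprets the backlog value differs between systems, so treat the number passed in as a hint rather than an exact limit.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on 127.0.0.1
 * with a kernel-chosen port. Returns the descriptor, or -1 on error. */
int make_loopback_listener(int backlog)
{
    struct sockaddr_in addr;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                   /* kernel picks the port */

    /* bind assigns the local address; listen makes the socket passive. */
    if (bind(sockfd, (const struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sockfd, backlog) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}

/* Returns 1 if sockfd is a passive (listening) socket, 0 if it is not,
 * or -1 if the option could not be read. */
int is_listening(int sockfd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(sockfd, SOL_SOCKET, SO_ACCEPTCONN, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

A socket fresh from the socket function reports 0 from is_listening, while the descriptor returned by make_loopback_listener reports 1, which mirrors the active-to-passive conversion described above.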
Accept Function Explanation
The accept function is called by a TCP server to return the next completed connection from the completed connection queue. The process is put into a sleep state if the completed connection queue is empty.
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);
The accept function returns a non-negative descriptor on success, or -1 on error.
The cliaddr and addrlen arguments are used to return the protocol address of the connected client. addrlen is a value-result argument: before the call it holds the size of the structure that cliaddr points to, and on return it holds the number of bytes the kernel actually stored there. If the accept call is successful, the return value is a new descriptor created by the kernel, called the connected socket. The original sockfd remains the listening socket.
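As a sketch of the accept call and its value-result argument, the following helper accepts one connection and reports the client’s address. The helper name accept_client and its parameters are hypothetical, and the example assumes an IPv4 listening socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept one connection on listenfd.
 * On success, writes the client's dotted-decimal IP into ip
 * (a buffer of iplen bytes, INET_ADDRSTRLEN is large enough) and the
 * client's port (host byte order) into *port, then returns the
 * connected descriptor. Returns -1 on error. */
int accept_client(int listenfd, char *ip, socklen_t iplen, uint16_t *port)
{
    struct sockaddr_in cliaddr;
    socklen_t addrlen = sizeof(cliaddr);    /* in: size of the buffer */
    int connfd;

    /* Sleeps until the completed connection queue has an entry. */
    connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &addrlen);
    if (connfd < 0)
        return -1;
    /* addrlen now holds the number of bytes stored in cliaddr. */

    if (inet_ntop(AF_INET, &cliaddr.sin_addr, ip, iplen) == NULL) {
        close(connfd);
        return -1;
    }
    *port = ntohs(cliaddr.sin_port);
    return connfd;
}
```

The server reads from and writes to the returned connected descriptor and closes it when the client is done, while the listening descriptor stays open for the next call to accept.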
All clients and servers using TCP begin with a socket function call which returns a socket descriptor. Clients then call the connect function while servers call bind, listen, and accept functions.
I hope you enjoyed this brief explanation of TCP sockets for UNIX network programming. If you want to put this information to use, please check out my Guide to Writing a Network Client Application in Python to Connect with a Server Via TELNET.
Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition) by W. Richard Stevens, Bill Fenner, and Andrew M. Rudoff.