IC221: Systems Programming (SP17)



Unit 8: Networking


1 What is the internet?

The Internet is, by definition, a network of networks composed of computers. In everyday use, the term Internet is a catchall for all connected computers, but in technical terms it is just one part of a larger ecosystem of networks and protocols that enable the sharing of information.

This is not a class on networking or the Internet, but the ability to communicate over a network is an integral part of modern operating systems. The programming interface provided by the OS is called socket programming, and it provides a unified model for interacting with networked components. It is implemented by the operating system, within the kernel, and there is a standard set of system calls used to ask the OS to complete tasks.

But to really understand network programming, you first need a decent understanding of the protocols that underlie the Internet. One thing you learn quickly about network programming is that the protocol is king. Understanding the protocols will make you a better programmer.

1.1 Packet Switching

The internet is a packet switched network. A packet is defined as follows:

lec-21-packet.png

Figure 1: Packet with Header and Payload

All packets have a header, which stores the address or destination of the packet, and a payload, which stores the data or message of the packet. The switching part of packet switching is that when a packet arrives at a network device, like a router or a switch, the device decides where to send the data next based solely on the header of the packet. There are no pre-defined routes for data, but the protocols ensure that the next hop in the path to the destination can be determined. As you might imagine, in such a model, addressing becomes very important.

1.2 The TCP/IP Protocol Stack

The Internet is modelled as a protocol stack where each protocol defines a different interaction layer. Information flows up and down the protocol stack, and at each layer, a different protocol comes to bear for forwarding the packet onward to the next hop.

lec-21-osi.png

Figure 2: TCP/IP Protocol Stack

Each layer has a different goal in mind. Starting with the physical layer, the main purpose is to actually transmit 1's and 0's over a medium, like a wire. The link layer adds protocols for how the medium is shared across many connected devices, as well as error correction. Examples of link layer protocols are ethernet and wifi.

The internet and transport layers are of particular interest in this class because you will interact directly with these protocols through their addressing schemes. The purpose of the internet layer is to inter-connect networks; for example, the USNA has a network, and if you want to send data to Google, your packets must traverse the USNA network and potentially many other federated networks before finally reaching a server at Google. The internet layer describes both the protocols for how networks inter-connect with each other and the way that computers are identified, via the internet protocol address or ip address.

At a certain point, though, two processes running on different computers are actually sending and receiving data across the vastness of the Internet. The transport layer provides an abstraction for those two processes to appear as if they are communicating directly with each other. Since many processes can be communicating on the network at the same time, the transport layer also provides a mechanism, called ports, to differentiate communication destined for one process versus another.

Finally, at the application layer, additional protocols are available depending on the task at hand. For example, SMTP is used to transmit email messages and HTTP is used to download web content and BitTorrent is used to pirate music and videos :) From a generic programmer's perspective, the application layer is the domain where we get to choose what data is sent and received and how that data is interpreted; the systems programming perspective also concerns itself with the system calls that enable that communication.

2 Internet Addressing

Each layer has its own addressing scheme and information needed to perform routing/switching. This information is traditionally encoded within the header of the packet. There are two key addressing systems that we will use in this class, ip addresses and ports. Additionally, we also refer to computers by name, a domain name, which must be translated into an address.

2.1 IP addresses

An ip address is a 4-byte/32-bit number in Version 4 of the TCP/IP protocol. It is usually represented in dotted quad notation:

     4-bytes
 _____________
/             \
192.168.128.101
\_/
 |
1 byte

A byte is 8 bits, and thus can represent numbers between 0 and 255, which is why ip addresses do not have numbers greater than 255. The ip address is hierarchical: bytes to the left are more general while bytes to the right are more specific. Based on a subset of the bytes, routers can determine where to send a packet next.

2.2 Domain Names and host

While IP addresses are somewhat usable, it is quite a burden to memorize the ip address of every computer we might want to visit on the web. For example, while I might know the ip address of a single computer offhand, e.g., "10.53.37.142" is the ip address of a lab machine, I can't recall the ip address of Google or Facebook or much of anything else.

Instead, we use domain names to identify a networked device. A hostname is a dotted string of names, usually ending in the canonical .com or .org or .edu or .gov, etc. For example, the domain name for Google is google.com, but the Internet does not function on domain names. It needs IP addresses. A separate protocol called the Domain Name System or DNS is tasked with converting domain names into IP addresses.

2.2.1 host

We can access the DNS system on Unix through the host command. For example, suppose we wanted to learn the IP address of google.com:

#> host google.com
google.com has address 74.125.228.110
google.com has address 74.125.228.103
google.com has address 74.125.228.96
google.com has address 74.125.228.99
google.com has address 74.125.228.104
google.com has address 74.125.228.101
google.com has address 74.125.228.102
google.com has address 74.125.228.97
google.com has address 74.125.228.100
google.com has address 74.125.228.105
google.com has address 74.125.228.98
google.com has IPv6 address 2607:f8b0:4004:803::1003
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.

The output may not be what you expect. There are many, many different IP addresses available to serve google.com, and this is by design, to balance the load of requests across multiple machines. In fact, every time you rerun host, you'll likely find that the addresses come back in a different order:

#> host google.com
google.com has address 74.125.228.104
google.com has address 74.125.228.101
google.com has address 74.125.228.102
google.com has address 74.125.228.97
google.com has address 74.125.228.100
google.com has address 74.125.228.105
google.com has address 74.125.228.98
google.com has address 74.125.228.110
google.com has address 74.125.228.103
google.com has address 74.125.228.96
google.com has address 74.125.228.99
google.com has IPv6 address 2607:f8b0:4004:803::1003
google.com mail is handled by 50 alt4.aspmx.l.google.com.
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.

If we were to query a less used domain name, one that isn't serving as much traffic as google, we get IP addresses that are a bit more sane. For example, let's see what the IP addresses are for www.usna.edu:

#> host www.usna.edu
www.usna.edu is an alias for webster-new.dmz.usna.edu.
webster-new.dmz.usna.edu has address 10.4.32.41

You can even use host to do a reverse DNS lookup, that is, lookup the domain name based on an IP address:

#> host 10.4.32.41
41.32.4.10.in-addr.arpa domain name pointer webster-new.dmz.usna.edu.

2.3 Ports and /etc/services

The last bit of addressing relevant to this class is the port address. While the IP address is used to deliver packets to a destination computer, the port address is used to deliver the packets on that computer to the right process. Consider that all the applications on a single computer share the same IP address, and many of them are using that connection at the same time. You might have multiple web pages open while checking email and playing games, etc. Each of those interactions is performed by a separate process, but all the data arrives at the computer through a single point.

The port address is a way for the Operating System to divide up the data arriving from the network based on the destination process. Additionally, ports tend to be tightly coupled with applications. For example, to initiate an HTTP connection for web browsing, you connect using port 80; to initiate a secure shell connection with ssh, you connect using port 22; and, to initiate a connection to send email, you connect using port 25, and so on. What makes ports important is that all those services, web server, ssh, and email, can be running on the same computer. The ports allow the operating system to differentiate traffic for each application.

2.3.1 /etc/services stores port application mappings

The mapping of ports to applications is deeply ingrained within Unix systems, and many programs wish to quickly map a port to an application. To assist in that process, most Unix systems ship with a list of the current standards in port mapping, which is stored in the /etc/services file.

# Network services, Internet style
#
# Note that it is presently the policy of IANA to assign a single well-known
# port number for both TCP and UDP; hence, officially ports have two entries
# even if the protocol doesn't support UDP operations.
#
# Updated from http://www.iana.org/assignments/port-numbers and other
# sources like http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/services .
# New ports will be added on request if they have been officially assigned
# by IANA and used in the real-world or are needed by a debian package.
# If you need a huge list of used numbers please install the nmap package.

tcpmux		1/tcp				# TCP port service multiplexer
echo		7/tcp
echo		7/udp
discard		9/tcp		sink null
discard		9/udp		sink null
systat		11/tcp		users
daytime		13/tcp
daytime		13/udp
netstat		15/tcp
qotd		17/tcp		quote
msp		18/tcp				# message send protocol
msp		18/udp
chargen		19/tcp		ttytst source
chargen		19/udp		ttytst source
ftp-data	20/tcp
ftp		21/tcp
fsp		21/udp		fspd
ssh		22/tcp				# SSH Remote Login Protocol
ssh		22/udp
telnet		23/tcp
smtp		25/tcp		mail
time		37/tcp		timserver
time		37/udp		timserver
rlp		39/udp		resource	# resource location
nameserver	42/tcp		name		# IEN 116
whois		43/tcp		nicname
tacacs		49/tcp				# Login Host Protocol (TACACS)
tacacs		49/udp
re-mail-ck	50/tcp				# Remote Mail Checking Protocol
re-mail-ck	50/udp
domain		53/tcp				# Domain Name Server
domain		53/udp
mtp		57/tcp				# deprecated
tacacs-ds	65/tcp				# TACACS-Database Service
tacacs-ds	65/udp
bootps		67/tcp				# BOOTP server
bootps		67/udp
bootpc		68/tcp				# BOOTP client
(...)

This file continues on for quite a while, but the takeaway regarding the sheer expanse and variety of network applications is clear. Further, the use of ports to differentiate one service from another is vital to providing that diversity of services.

3 The Client-Server Connection Model and Transport Protocols

Most interactions between applications are dictated by the client-server model. In this model, there are clients that request a service from a server.

lec-21-client-server.png

Figure 3: Client Server Model

In the model, we describe clients as connecting to servers and servers listening to incoming connections. When a connection is established, or data is received, the server replies to the client with data as required by the application protocol.

While this class will focus on the client server model, there are other models of network interaction. For example, in the peer-to-peer model each participant acts as both client and server. This is common for many distributed systems, such as BitTorrent or Skype.

3.1 Reliable Transport: TCP or SOCK_STREAM

The client server model fits into the protocol stack at the transport layer. There are typically two types of transport available for programmers, reliable and unreliable transport. Interestingly, none of the protocols in lower layers ensure any reliability: at any time packets can be dropped, misrouted, delayed, or generally deformed without notice. The fact that such things can happen on the network is actually a positive because the lower layers can be much more efficient without having to worry about reliable delivery.

lec-21-TCP.png

Figure 4: TCP Session

TCP, or Transmission Control Protocol, was developed to provide reliable transport on top of the inherently unreliable lower layer packet delivery system. The big idea behind TCP is that it establishes a stream or session between client and server where the expectation of packet delivery and acknowledgment forms the basis of a reliable transport mechanism. Essentially, when you are using TCP it is as if the client and server are communicating directly with each other, like via a pipeline, even though there may potentially be a huge network between them. In the parlance of socket programming, which we will discuss in the next lesson, TCP is described as a SOCK_STREAM because it provides a stream of information, much like a pipeline.

3.2 Unreliable Transport: UDP or DATAGRAM

Reliable transport has a cost, though. The cost is the retransmission of lost or deformed packets and acknowledgements of properly received packets. In order to have reliable transportation, all information must be properly acknowledged upon receipt and if a packet was not properly received, then it must be retransmitted. The result is that there exists a significant overhead, and this is worsened by the fact that not all communication needs to be reliable — dropping a few packets here and there never killed anyone, yet.

The complementary protocol to TCP is UDP or User Datagram Protocol, which is an unreliable transport mechanism. UDP, or the DATAGRAM protocol, does not make any guarantees about the delivery of a packet. It might get there … or it might not. Datagram protocols are not session driven either; without reliability, the client and the server need not stay in sync to ensure that all messages are acknowledged. Instead, a server just listens for incoming data from clients and that's that.

lec-21-UDP.png

Figure 5: UDP

You might be wondering when this would ever be useful. UDP is quite common for a number of applications; for example, live audio streams. There is no need for audio streams to be reliable. If you miss a packet, so what, you'll just get the next one and keep playing the music. However, if you were to do this reliably, you'd have to stop the music while missed data was retransmitted, and the result is you might keep getting further and further behind in the live stream.

4 Networking Command Line Tools

Once you start writing socket programs, it is useful to test them using the "swiss army knife" of networking command line tools, netcat. Also, you can use a tool called netstat to monitor your open socket connections. As a programmer, you will often find these tools indispensable for debugging and understanding the functionality of your program.

4.1 netcat : the network "swiss army knife"

The netcat program has often been described as the "swiss army knife" of networking because it can do anything and everything. It's pretty amazing once you get into the details, but its more basic functionality is fairly useful already for our purposes. We'll see how to use it as both a client and server using both UDP and TCP protocols.

4.1.1 netcat client

When working as a client, netcat takes two arguments:

netcat dest port

The dest is a destination address, which can either be an IP address as a dotted quad or a domain name. The port is a number representing the port address. This is all we need to make netcat act like a web client, so let's connect to a web server:

#> netcat www.cnn.com 80
GET /index.html 


<!DOCTYPE HTML>
<html lang="en-US">
<head>
<title>CNN.com - Breaking News, U.S., World, Weather, Entertainment &amp; Video News</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta http-equiv="last-modified" content="2014-04-03T13:48:56Z"/>
<meta http-equiv="refresh" content="1800;url=http://www.cnn.com/?refresh=1"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta name="robots" content="index,follow"/>
(...)

What we've just done is establish a TCP connection on port 80, the HTTP port for web traffic, and make a request to the HTTP server to send us the main page for cnn.com. And it works!

4.1.2 netcat server

netcat can also act as a server by listening for incoming connections on a given port. You do this with the -l option:

#> netcat -l 1845

There is now a service running on 1845, netcat, and we can connect to it using another netcat client.

In one terminal, the server:

#> netcat -l 1845
Hello
What's your name?
adam
me too, how strange.
strange ......
#>

In a second terminal, the client:

#> netcat localhost 1845
Hello
What's your name?
adam
me too, how strange.
strange ......
^C
#>

The domain name localhost refers to the current computer, so we don't always have to remember the IP address. In the above example, information is typed back and forth between the netcat server and client.

4.2 netstat : monitor current connections

The other very useful command for Unix users is netstat, which displays a list of information about current network usage on the computer. It's best to just see an example.

Suppose in one terminal, I've created a netcat server.

netcat -l 1845

We can see that this was properly established by viewing the netstat output with the -l flag, indicating we are interested in listening servers. We will also use the -n flag so all port numbers are displayed:

#> netstat -ln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:36688           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:1845            0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:44469           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:538             0.0.0.0:*               LISTEN     
tcp6       0      0 :::55662                :::*                    LISTEN     
tcp6       0      0 :::591                  :::*                    LISTEN     
tcp6       0      0 :::111                  :::*                    LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     
tcp6       0      0 ::1:631                 :::*                    LISTEN     
tcp6       0      0 :::25                   :::*                    LISTEN     
tcp6       0      0 :::39612                :::*                    LISTEN     
udp        0      0 0.0.0.0:57189           0.0.0.0:*                          
udp        0      0 0.0.0.0:68              0.0.0.0:*                          
udp        0      0 0.0.0.0:111             0.0.0.0:*                          
udp        0      0 10.53.33.232:123        0.0.0.0:*                          
udp        0      0 127.0.0.1:123           0.0.0.0:*                          
udp        0      0 0.0.0.0:123             0.0.0.0:*                          
udp        0      0 0.0.0.0:538             0.0.0.0:*                          
udp        0      0 0.0.0.0:601             0.0.0.0:*                          
udp        0      0 0.0.0.0:33370           0.0.0.0:*                          
udp        0      0 127.0.0.1:608           0.0.0.0:*                          
udp        0      0 0.0.0.0:49817           0.0.0.0:*                          
udp        0      0 0.0.0.0:5353            0.0.0.0:*                          
udp6       0      0 :::35113                :::*                               
udp6       0      0 :::47936                :::*                               
udp6       0      0 :::56801                :::*                               
udp6       0      0 :::111                  :::*                               
udp6       0      0 ::1:123                 :::*                               
udp6       0      0 fe80::d227:88ff:fed:123 :::*                               
udp6       0      0 :::123                  :::*                               
udp6       0      0 :::601                  :::*                               
udp6       0      0 :::5353                 :::*      
(...)

There is quite a lot of output, but at the top is the good stuff. What you can see is the combination of IP address and port currently being listened on by services. The address 0.0.0.0 means the service accepts connections on any of the machine's addresses, while 127.0.0.1 means it is reachable only from localhost. If you look closely, you can see that, yes, the netcat listening on port 1845 is present.

Let's look at the output when we connect the client netcat:

#> netcat localhost 1845

This time we run netstat without the -l flag because we are interested in established connections.

#> netstat -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0    160 10.53.33.232:22         10.53.33.254:57078      ESTABLISHED
tcp        1      0 10.53.33.232:47982      91.189.89.144:80        CLOSE_WAIT 
tcp        0      0 127.0.0.1:41217         127.0.0.1:1845          ESTABLISHED
tcp        0      0 10.53.33.232:22         10.53.33.254:54510      ESTABLISHED
tcp        0      0 127.0.0.1:1845          127.0.0.1:41217         ESTABLISHED
tcp        0      0 10.53.33.232:801        10.1.83.18:2049         ESTABLISHED
tcp        0      0 10.53.33.232:22         10.53.33.254:58882      ESTABLISHED
tcp        0      0 10.53.33.232:48491      10.1.68.11:445          ESTABLISHED
(...)

Looking through the list, we find 127.0.0.1:1845, which is our netcat server, connected to 127.0.0.1:41217, which is our netcat client. The same entry is also found in reverse, client->server.

Now, something might seem a bit off here: the client asked for port 1845, yet its own end of the connection shows a different port. Port 1845 matters for establishing the connection, since a server must be listening on a known port so that it can be reached. The client's side needs no well-known port, so the operating system picks a random one for it, in this case 41217, and that is enough for client and server to talk to each other. We'll see this in action when we discuss socket programming in detail in the following lessons.

5 Sockets

In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.

The system call to open a socket is socket():

int socket(int domain, int type, int protocol);

The arguments can be described as follows:

  • domain is the addressing domain of socket, which for our purposes is an internet socket or AF_INET
  • type is the type of socket describing which transport protocol we are using. HTTP traffic is over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the keyword would be SOCK_DGRAM
  • protocol is additional information about the protocol of the socket, but we will not need to use this option, so we'll set it to 0.

The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, connect(), read(), write(), bind(), accept(), etc., require that integer file descriptor.

6 Addressing

Before we can dive into socket programming, we have to deal with network addressing. Unfortunately, in C, addressing of sockets can be frustrating because, depending on the protocol, an address can be of different lengths. On the surface, this might not seem like an issue, but it is HUGE for C programs because we deal with fixed size buffers. The difference between a 4-byte and 8-byte or 16-byte address is substantial.

At the same time, we also want our programs, even with different address lengths, to look the same, use roughly the same structure types, and function in the same way. The result of these needs is that addressing of sockets in C can be complicated, requiring many forms of casting until, finally, you reach the type you're looking for. In between "… these are not the types you're looking for."

To simplify our discussion, we'll only be covering addressing for IPv4, which uses 4-byte address space. The shorthand for IPv4 is AF_INET, which you will see used throughout.

6.1 Glossary of Structures and Functions

There will be many new structures and functions we'll use in this lesson. Below is a quick overview of each of the structures you need with a brief description; more detailed discussion follows.

  • AF_INET : The address family value for IPv4 addresses.
  • struct in_addr : type for storing an internet address; it has a single member s_addr, which is an unsigned integer storing the 4 bytes of an IPv4 address.
  • struct sockaddr : a generic socket address structure that is used for all addressing types. Often, we'll cast these to a struct sockaddr_in since we are only concerned with IPv4 addresses.
  • struct sockaddr_in : a specific socket address to store IP and port information. It has 3 members:
    • short sin_family : the address family, which should be AF_INET.
    • unsigned short sin_port : the port number in network byte order, converted using htons()
    • struct in_addr sin_addr : a struct in_addr to store the IP address itself.
  • struct addrinfo : a structure used by getaddrinfo() to store address information and hints. It has the following relevant members:
    • int ai_family : the address family, which should be AF_INET for us
    • struct sockaddr *ai_addr : a pointer to the socket address returned after a query

Here is a glossary of library/system calls we will use and their purpose:

  • inet_ntoa() convert a struct in_addr to a dotted quad string: "Network-to-Address"
  • inet_aton() convert a string of an IP address as a dotted quad into a struct in_addr: "Address-to-Network"
  • getaddrinfo() convert a domain name into an IP address, stored in the ai_addr member as a struct sockaddr in the struct addrinfo result
  • htons() convert a short stored in host byte order into network byte order: "Host-to-Network"
  • ntohs() convert a short stored in network byte order to host byte order: "Network-to-Host"

7 Storing an IP address in struct in_addr

The structure that stores 32-bit IPv4 address is struct in_addr:

//uint32_t is the same as unsigned int
typedef uint32_t in_addr_t;
struct in_addr
{
  in_addr_t s_addr;
};

The in_addr structure has one member, s_addr, whose type is uint32_t, which is just a fancy way of saying unsigned int. Let's look at an example of using the in_addr structure to store an IP address:

/* hello_rawip.c*/
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h> //for struct in_addr
#include <arpa/inet.h>  //for inet_ntoa()

int main(){

  //in_addr struct has a single member s_addr
  struct in_addr addr;
  unsigned char * ip;

  //have ip point to s_addr
  ip = (unsigned char *) &(addr.s_addr);

  //set the bytes for "10.4.32.41"
  ip[0]=10;
  ip[1]=4;
  ip[2]=32;
  ip[3]=41;

  //print it out
  printf("Hello %s\n", inet_ntoa(addr));

}

Recall, from our conversations on types and casting, that this line:

ip = (unsigned char *) &(addr.s_addr);

sets the pointer ip to reference the address. Then we can set the bytes directly, byte-by-byte, since ip is an unsigned char pointer. At the end, we can print out the address in dotted quad notation using the function inet_ntoa(), which stands for network-to-address.

This is clearly a very cumbersome way to set addresses, so instead we can just use the dotted quad notation and have the library do the conversion for us. To do this, we use inet_aton(), which stands for address-to-network and converts a dotted quad IPv4 string into a struct in_addr.

/* hello_aton.c*/
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>


int main(){

  //in_addr struct has a single member s_addr
  struct in_addr addr;

  //Convert the IP dotted quad into struct in_addr
  inet_aton("10.4.32.41", &(addr));

  printf("Hello %s\n", inet_ntoa(addr));

}

7.1 Storing IP Addresses in struct sockaddr and struct sockaddr_in

The next level of addressing is to combine an IP address with port information. Since we are concerned with internet addressing, we will use the struct sockaddr_in:

struct sockaddr_in {
    short            sin_family;   //address family, set to AF_INET
    unsigned short   sin_port;     //the port in network byte order
    struct in_addr   sin_addr;     //the inet address
};

This is just one kind of socket address, but a socket can be used for a variety of things, not just internet communication. As a result, there is also a generic socket address type called struct sockaddr, without the _in suffix. Know that whenever you see IPv4 data typed as struct sockaddr you can convert it into struct sockaddr_in and vice versa. The two types are the same size, that is, they occupy the same number of bytes in memory. The casting just changes how those bytes are interpreted.

In the following code, for example:

/*hello_sockaddr.c*/
#include <stdio.h>
#include <stdlib.h>

#include <netinet/in.h>
#include <arpa/inet.h>


int main(){


  //use a generic socket address to store everything
  struct sockaddr saddr;

  //cast generic socket to an inet socket
  struct sockaddr_in * saddr_in = (struct sockaddr_in *) &saddr;

  //Convert IP address into inet address stored in sockaddr
  inet_aton("10.4.32.41", &(saddr_in->sin_addr));

  //print out IP address
  printf("Hello %s\n", inet_ntoa(saddr_in->sin_addr));

}

We declare a generic socket address on the stack, but it is cast to an inet socket address to access the sin_addr member.

8 Resolving Domain Names

The next part of addressing is the conversion of a domain name into an IP address. IP addresses are not completely unusable for humans, but they are not the preferred way to reference a remote host. Instead, we use the domain name. For example, when we go to www.usna.edu, that domain name must be resolved into an IP address.

8.1 Converting a Domain Name into an IP address

The resolving protocol is called DNS, or Domain Name System, and it is made available to us through the getaddrinfo() library function. Here is the function declaration:

int getaddrinfo(const char *node, const char *service,
                      const struct addrinfo *hints,
                      struct addrinfo **res);

Here is a description of the arguments:

  • node : a string of the address/domain name you wish to be resolved
  • service : name of the service for the domain you're interested in, set to NULL for our usage
  • hints : an addrinfo of "hints" describing the kinds of address information we are interested in for the domain
  • res : a new addrinfo structure will be allocated, and res will be set to point to it

On error, getaddrinfo() returns a non-zero error code. This code is specific to getaddrinfo() rather than an errno value, so instead of perror() you convert it into a message with gai_strerror(). To catch errors you will use the following style:

if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
  fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));  // <---- converts the error into a message
  exit(1);
}

The next part of using getaddrinfo() properly is understanding the struct addrinfo, which has the following members:

struct addrinfo {
          int              ai_flags;  
          int              ai_family;
          int              ai_socktype; 
          int              ai_protocol;
          size_t           ai_addrlen;
          struct sockaddr *ai_addr;
          char            *ai_canonname;
          struct addrinfo *ai_next;
      };

For our purposes, since we are only concerned with IPv4 internet addressing, we only need to focus on two fields: ai_family and ai_addr.

  • ai_family : indicates the address family of this address, should always be set to AF_INET
  • ai_addr : a socket address storing the resolved IP address

One peculiar aspect of a getaddrinfo() call beyond retrieving the results is that you must also specify hints to the kinds of addresses you are interested in. We are only interested in AF_INET address types, so we can always declare and use the hints option like so:

struct addrinfo hints;     

//zero out the hints structure (look up memset for details)
memset(&hints,0,sizeof(struct addrinfo));  

//set the ai_family field per our needs, AF_INET
hints.ai_family = AF_INET;

Now we have enough to write a hello-world program for resolving a domain name.

/* hello_getaddrinfo.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){

  char hostname[]="www.usna.edu";  //the hostname we are looking up

  struct addrinfo *result;    //to store results
  struct addrinfo hints;      //to indicate information we want

  struct sockaddr_in *saddr;  //to reference address

  int s; //for error checking

  memset(&hints,0,sizeof(struct addrinfo));  //zero out hints
  hints.ai_family = AF_INET; //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr = (struct sockaddr_in *) result->ai_addr;

  //print the address
  printf("Hello %s\n", inet_ntoa(saddr->sin_addr));

  //free the addrinfo struct
  freeaddrinfo(result);
}

One thing to be careful about when dealing with getaddrinfo() is that the resulting address is stored in struct sockaddr but we are only using struct sockaddr_in. Fortunately, since we only hinted towards AF_INET, we know that the sockaddr is actually a sockaddr_in and so we can cast:

struct sockaddr_in *saddr;  //to reference address

//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;

Once we've cast the data to the right type, we can treat it like a sockaddr_in and get to the underlying in_addr:

//print the address
printf("Hello %s\n", inet_ntoa(saddr->sin_addr));

Last, but not least: getaddrinfo() allocates new memory to store the resulting addrinfo. It must be freed with freeaddrinfo():

//free the addrinfo struct
freeaddrinfo(result);
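
The ai_next member suggests that getaddrinfo() can hand back more than one result: res really points to the head of a linked list. As a sketch of walking that list, consider the helper below. The name count_ipv4_addresses() is hypothetical, and setting ai_socktype to SOCK_STREAM is an assumption made here so that we do not get one entry per socket type.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

//Hypothetical helper: resolve node, count the IPv4 results, and copy
//the text of the first address into first (a buffer of len bytes).
//Returns the number of results, or -1 if the lookup failed.
int count_ipv4_addresses(const char *node, char *first, size_t len){
  struct addrinfo hints, *result, *cur;
  int count = 0;
  int s;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;        //we only want IPv4 addresses
  hints.ai_socktype = SOCK_STREAM;  //assumption: only TCP entries

  if((s = getaddrinfo(node, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
    return -1;
  }

  //follow ai_next until the end of the list
  for(cur = result; cur != NULL; cur = cur->ai_next){
    struct sockaddr_in *saddr = (struct sockaddr_in *) cur->ai_addr;

    if(count == 0 && len > 0){
      strncpy(first, inet_ntoa(saddr->sin_addr), len - 1);
      first[len - 1] = '\0'; //strncpy may not terminate the string
    }
    count++;
  }

  //one call frees the whole list
  freeaddrinfo(result);
  return count;
}
```

Note that a single freeaddrinfo() on the head of the list releases every entry, so there is no need to free the nodes one by one.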

8.2 Resolving IP addresses to IP addresses?

One nice thing about getaddrinfo() is that it can take a domain name or an IP address. If it finds that you've provided an IP address, it does not need to resolve anything and simply returns that address already set in the ai_addr field of the result. For example:

/* hello_getaddrinfo_ip.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){

  char hostname[]="10.4.32.41";  //<--- Not a hostname, but an IP address

  struct addrinfo *result;    //to store results
  struct addrinfo hints;      //to indicate information we want

  struct sockaddr_in *saddr;  //to reference address

  int s; //for error checking

  memset(&hints,0,sizeof(struct addrinfo));  //zero out hints
  hints.ai_family = AF_INET; //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr = (struct sockaddr_in *) result->ai_addr;

  //print the address
  printf("Hello %s\n", inet_ntoa(saddr->sin_addr));  //<---- will print that IP address here

  //free the addrinfo struct
  freeaddrinfo(result);
}
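
If you ever need to know whether a string is already an IP address, one option is the AI_NUMERICHOST flag, which tells getaddrinfo() to accept only numeric addresses and never perform a name lookup. The sketch below is an illustration only; the helper name is_ipv4_literal() is made up for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <netdb.h>

//Hypothetical helper: returns 1 if node is a numeric IPv4 address,
//0 otherwise. No name resolution is attempted.
int is_ipv4_literal(const char *node){
  struct addrinfo hints, *result;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;        //IPv4 only
  hints.ai_flags = AI_NUMERICHOST;  //refuse anything that needs a lookup

  if(getaddrinfo(node, NULL, &hints, &result) != 0){
    return 0; //not a numeric address
  }

  freeaddrinfo(result);
  return 1;
}
```

With this flag set, a name such as www.usna.edu fails immediately instead of being resolved, while 10.4.32.41 succeeds.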

8.3 Network Byte Order and Ports

The last part of the puzzle for addressing is the port number. Recall the sockaddr_in structure:

struct sockaddr_in {
    short            sin_family;   //address family, set to AF_INET
    unsigned short   sin_port;     //the port in network byte order
    struct in_addr   sin_addr;     //the inet address
};

You see that there is a member, sin_port and your instinct would be to set that directly. For example, if we wanted to contact www.usna.edu on port 80, after doing getaddrinfo(), we'd do something like this:

//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;

saddr->sin_port = 80; //<-- setting port in host order!

It turns out that this DOES NOT work, and it doesn't because of a fundamental problem with data representation. Let's consider the number 80 as it is represented in bits:

pow: 128  64  32  16   8   4   2   1
     ----------------------------------      
bits:  0   1   0   1   0   0   0   0

The above makes sense: 80 in base 10 is 01010000 in base 2. If we look more closely, we see that the bit furthest to the left is most significant, or has the highest value, 128. If you think a bit more about it, this seems like an arbitrary choice: couldn't the bit furthest to the right be most significant instead? Then we would have the following representation:

pow:   1   2    4   8  16  32  64  128
     ----------------------------------
bits:  0   0    0   0   1   0   1   0

Now we find that 80 in base 10 is written 00001010 in base 2. While this might seem a bit awkward, it really is no better nor worse than the other choice. In fact, in the early days of computing, there was a holy war about which order the bits should be written in.

The term used for this ordering is endian or endianness. In practice it describes the order of the bytes within a multi-byte value, and there are two camps: big endian and little endian.

  • Little Endian : The least significant byte is stored at the smallest address.
  • Big Endian : The most significant byte is stored at the smallest address.

In most modern computing systems, little endian is king, but back when the internet was designed and the equipment put in place, it was not clear what data representation should be preferred. As such, the network is big endian, or more precisely routing information is stored in network byte order while data on the host is stored in host byte order.

To facilitate this use, all Unix systems implement a set of conversion functions:

  • htons() : convert data from host order to network order for a short (or two-byte) type
  • ntohs() : convert data from network order to host order for a short (or two-byte) type

There are also htonl() and ntohl() for long (four-byte) types. On big endian machines host and network order are the same, so these functions do nothing, but it is good practice to always convert anyway so that your code is portable.
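
To see the byte orders for yourself, here is a small sketch that looks at the individual bytes of a two-byte value in memory. Both helper names, host_is_little_endian() and network_bytes(), are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

//Hypothetical helper: returns 1 if this host stores the least
//significant byte of a value at the smallest address.
int host_is_little_endian(void){
  uint16_t value = 0x0102;
  unsigned char bytes[2];

  memcpy(bytes, &value, sizeof(value)); //copy the bytes in memory order
  return bytes[0] == 0x02;              //low byte first means little endian
}

//Hypothetical helper: convert value to network byte order and copy its
//two bytes, in memory order, into out.
void network_bytes(uint16_t value, unsigned char out[2]){
  uint16_t net = htons(value);

  memcpy(out, &net, sizeof(net));
}
```

Whatever the host order is, network_bytes() always yields the most significant byte first, which is exactly what the network expects to find in sin_port.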

Now, finally, returning to the setting of the port address, we see that we actually must do so like such:

//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;

saddr->sin_port = htons(80); //<-- setting port in network byte order!!!

9 Socket System Call

In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.

The system call to open a socket is socket():

int socket(int domain, int type, int protocol);

The arguments can be described as follows:

  • domain is the addressing domain of socket, which for our purposes is an internet socket or AF_INET
  • type is the type of socket describing which transport protocol we are using. HTTP traffic is over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the keyword would be SOCK_DGRAM
  • protocol is additional information about the protocol of the socket, but we will not need to use this option, so we'll set it to 0.

The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, such as connect(), read(), and write(), require that integer file descriptor. Today we will focus just on the client side operations of a socket.
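
As a quick sketch of the type argument in action, the helpers below open a TCP and a UDP socket and then ask the OS which type a descriptor has, using getsockopt() with the SO_TYPE option. The helper names are hypothetical and chosen just for this example.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

//Hypothetical helper: open a TCP (stream) socket, -1 on failure
int open_tcp_socket(void){
  int sock = socket(AF_INET, SOCK_STREAM, 0);

  if(sock < 0){
    perror("socket");
  }
  return sock;
}

//Hypothetical helper: open a UDP (datagram) socket, -1 on failure
int open_udp_socket(void){
  int sock = socket(AF_INET, SOCK_DGRAM, 0);

  if(sock < 0){
    perror("socket");
  }
  return sock;
}

//Hypothetical helper: report the type of an open socket
//(SOCK_STREAM, SOCK_DGRAM, ...), or -1 if fd is not a usable socket
int socket_type(int fd){
  int type;
  socklen_t len = sizeof(type);

  if(getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0){
    return -1;
  }
  return type;
}
```

Like any other file descriptor, a socket opened this way should eventually be released with close().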

9.1 Client Sockets

A client socket's goal is to connect to a foreign address where a server socket is listening. Visually, we think of this process like so:

lec23-client-socket.png

Figure 6: Client Socket Life Cycle

A new socket is opened using the socket() system call, but the socket being open doesn't mean it is connected to anything. To do that, you use the connect() system call that takes as input a given address, IP-port pair. Once connected with the remote server, the client can read and write to that socket to participate using an application layer protocol. The connect system call takes the following arguments:

int connect(int socket, const struct sockaddr *address, socklen_t address_len);

Generally, given a socket socket and a socket address address, connect() tries to connect the socket to that foreign address. Note that the address parameter is a generic socket address (struct sockaddr), which is not necessarily an IP socket address (struct sockaddr_in), so you'll need to cast:

int sock;
struct sockaddr_in saddr_in;

//fill in the address for usna.edu
saddr_in.sin_family = AF_INET;
inet_aton("10.4.32.41", &(saddr_in.sin_addr));
saddr_in.sin_port = htons(80);

//open a socket
if( (sock = socket(AF_INET, SOCK_STREAM, 0))  < 0){
  perror("socket");
  exit(1);
}

//connect socket to the server
if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(struct sockaddr_in)) < 0){
  perror("connect");
  exit(1);
 }
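
The same steps can be bundled into one function that reports failure to the caller instead of calling exit(). This is only a sketch; connect_ipv4() is a hypothetical name, and it takes a dotted-quad string rather than a domain name.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

//Hypothetical helper: connect a new TCP socket to ip:port.
//Returns the connected socket, or -1 on any failure.
int connect_ipv4(const char *ip, unsigned short port){
  struct sockaddr_in saddr_in;
  int sock;

  //fill in the address, port in network byte order
  memset(&saddr_in, 0, sizeof(saddr_in));
  saddr_in.sin_family = AF_INET;
  saddr_in.sin_port = htons(port);

  if(inet_aton(ip, &saddr_in.sin_addr) == 0){
    fprintf(stderr, "invalid IP address: %s\n", ip);
    return -1;
  }

  //open a socket
  if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    perror("socket");
    return -1;
  }

  //connect socket to the server
  if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
    perror("connect");
    close(sock); //do not leak the descriptor on failure
    return -1;
  }

  return sock;
}
```

Zeroing the structure with memset() before filling it in avoids passing uninitialized bytes to the OS, and closing the socket on a failed connect() keeps the descriptor from leaking.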

9.2 Socket I/O

Sockets are file descriptors, and so we use the same interface to read and write from them as we did for other kinds of file descriptors, like open files and pipes and etc. That interface is read() and write(), and should be very familiar to you now.

//read from socket and write to stdout
while( (n = read(sock, buf, BUF_SIZE)) > 0){
   if( write(1, buf, n) < 0){
       perror("write");
       exit(1);
   }
}

if( n < 0 ){
  perror("read");
  exit(1);
}
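
One detail worth keeping in mind is that write() returns the number of bytes it actually wrote, which may be fewer than you asked for. A common way to deal with that is a loop that keeps writing until everything is sent. The sketch below shows one such loop; write_all() is a hypothetical helper name, not a system call.

```c
#include <errno.h>
#include <unistd.h>

//Hypothetical helper: keep calling write() until all len bytes of buf
//have been written to fd. Returns len on success, -1 on error.
ssize_t write_all(int fd, const char *buf, size_t len){
  size_t sent = 0;

  while(sent < len){
    ssize_t n = write(fd, buf + sent, len - sent);

    if(n < 0){
      if(errno == EINTR){
        continue; //interrupted by a signal, try again
      }
      return -1;  //a real error
    }

    sent += (size_t) n; //advance past the bytes already written
  }

  return (ssize_t) sent;
}
```

The function works on any file descriptor, so it can be tried out on a pipe before being used on a socket.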

9.3 Putting it all together

A typical client program we might want to write is one that can connect to a web server and download the web page, i.e., the HTML. Here is such a program:

/*get_usna.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){

  char hostname[]="usna.edu";    //the hostname we are looking up
  short port=80;                 //the port we are connecting on

  struct addrinfo *result;       //to store results
  struct addrinfo hints;         //to indicate information we want

  struct sockaddr_in *saddr_in;  //socket internet address

  int s,n;                       //for error checking

  int sock;                      //socket file descriptor

  char request[]="GET /index.html\r\n"; //the GET request, ended with CR LF

  char response[4096];           //read in 4096 byte chunks


  //setup our hints
  memset(&hints,0,sizeof(struct addrinfo));  //zero out hints
  hints.ai_family = AF_INET; //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr_in = (struct sockaddr_in *) result->ai_addr;

  //set the port in network byte order
  saddr_in->sin_port = htons(port);

  //open a socket
  if( (sock = socket(AF_INET, SOCK_STREAM, 0))  < 0){
    perror("socket");
    exit(1);
  }

  //connect to the server
  if(connect(sock, (struct sockaddr *) saddr_in, sizeof(*saddr_in)) < 0){
    perror("connect");
    exit(1);
  }

  //send the request
  if(write(sock,request,strlen(request)) < 0){
    perror("send");
  }

  //read the response until EOF
  while( (n = read(sock, response, 4096)) > 0){

    //write response to stdout
    if(write(1, response, n) < 0){
      perror("write");
      exit(1);
    }
  }

  if (n<0){
    perror("read");
  }

  //close the socket
  close(sock);

  //free the addrinfo struct
  freeaddrinfo(result);

  return 0; //success
}

Following the client program, we can see that first we resolve the hostname into an IP address, storing that information in the socket address saddr_in. Next we open a new TCP socket, connect to the server, send the request, and then read all the data from the server, printing the output to the terminal.

10 Server Sockets

Server sockets are much like client sockets except instead of connecting they accept incoming connections. Accepting an incoming connection generates a new socket for that client, which is used for all further communication. The server socket remains and can accept more incoming connections on the listening port.

Before a connection can be accepted, there is some setup, which we can trace through the server socket life cycle below:

lec23-server-socket.png

Figure 7: Server Socket Life Cycle

10.1 Binding a Socket: bind()

The first step in establishing a server socket is to bind the socket to a given IP address and port. This way the O.S. knows what port the socket is listening on. The bind() system call has the following description:

int bind(int socket, const struct sockaddr *address, socklen_t address_len);

The arguments can be interpreted as follows:

  • socket : an open socket from socket()
  • address : a reference to a socket address to be bound, in our case, this will reference a sockaddr_in with an IP address and port in the AF_INET family.
  • address_len : the size of the socket address, which will be sizeof(struct sockaddr_in)

It returns 0 on success and a negative value on error. We can see how this functions in code with the following example:

char hostname[]="127.0.0.1";   //localhost ip address to bind to or
                               //you can use INADDR_ANY if you don't
                               //care which IP is bound to

short port=1845;               //the port we are to bind to
int server_sock;               //file descriptor for the server socket

struct sockaddr_in saddr_in;  //socket internet address of server
//set up the address information
saddr_in.sin_family = AF_INET;
inet_aton(hostname, &saddr_in.sin_addr); // OR
                                         // saddr_in.sin_addr.s_addr=INADDR_ANY;
                                         // (if you don't care about
                                         // binding to a particular
                                         // address)
saddr_in.sin_port = htons(port);

//open a socket
if( (server_sock = socket(AF_INET, SOCK_STREAM, 0))  < 0){
  perror("socket");
  exit(1);
}

//bind the socket
if(bind(server_sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
  perror("bind");
  exit(1);
}

Note that we are binding to the IP address 127.0.0.1, which is a special IP address referring to the local machine. You could also use the domain name localhost and perform a lookup with getaddrinfo().

However, if you don't care about which IP address to bind to, there is a special value, INADDR_ANY, that you can use which instructs the OS to accept connections on any of the machine's IP addresses. The addressing simplifies to something like below:

saddr_in.sin_family = AF_INET;
saddr_in.sin_addr.s_addr = INADDR_ANY;
saddr_in.sin_port = htons(1845);

10.2 Queuing incoming connections: listen()

Once you've bound the socket to an IP address and port, you must still indicate to the operating system that this socket is a server socket. The system call that does this is called listen() and has the following function description:

int listen(int socket, int backlog);

The argument socket is the server socket file descriptor, but the argument backlog requires a bit more explanation. As you will see below, to establish a connection with a client, accept() must be called, but this doesn't happen immediately. There is a period of limbo between when the incoming connection is recognized and when accept() is called to establish the connection. Further, many incoming connections can occur at the same time, and the operating system has limited resources to queue up client connections prior to accept(). The backlog argument indicates to the OS how many incoming connections should be allowed to queue prior to accept() before starting to reject connections. A typical value for backlog is 5, but higher and lower values are acceptable.

Here is an example of the listen() in the running example:

//ready to listen, queue up to 5 pending connections
if(listen(server_sock, 5)  < 0){
  perror("listen");
  exit(1);
}
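
Putting socket(), bind(), and listen() together, a listening socket can be set up by one small function. The sketch below is only an illustration: make_listener() and is_listening() are hypothetical names, the address is fixed to INADDR_ANY, and is_listening() relies on the SO_ACCEPTCONN socket option to ask the OS whether listen() has been called on a socket.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

//Hypothetical helper: open a TCP socket, bind it to the given port on
//any local IP address, and mark it as a listening socket.
//Returns the server socket, or -1 on failure.
int make_listener(unsigned short port, int backlog){
  struct sockaddr_in saddr_in;
  int sock;

  memset(&saddr_in, 0, sizeof(saddr_in));
  saddr_in.sin_family = AF_INET;
  saddr_in.sin_addr.s_addr = htonl(INADDR_ANY);
  saddr_in.sin_port = htons(port);

  if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    perror("socket");
    return -1;
  }

  if(bind(sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
    perror("bind");
    close(sock);
    return -1;
  }

  if(listen(sock, backlog) < 0){
    perror("listen");
    close(sock);
    return -1;
  }

  return sock;
}

//Hypothetical helper: 1 if listen() has been called on sock,
//0 if not, -1 on error
int is_listening(int sock){
  int val = 0;
  socklen_t len = sizeof(val);

  if(getsockopt(sock, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0){
    return -1;
  }
  return val != 0;
}
```

A call such as make_listener(1845, 5) matches the running example. Passing port 0 instead asks the OS to pick any free port, which is handy when experimenting.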

10.3 Accepting Incoming Connections: accept()

Finally, everything is in place to accept a connection, and to do that we use the accept() system call. It has the following function description:

int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);

The arguments to accept are as follows:

  • socket is the server socket that you have bound and established as a listener.
  • address is a reference to a socket address structure that will be filled in with the address of the client. Since we are using AF_INET sockets, you can pass a reference to a struct sockaddr_in and cast appropriately.
  • address_len is a pointer to the size of the address. This is necessary because, as we learned already, not all socket addresses are the same size, but since we are using AF_INET we know that this will be sizeof(struct sockaddr_in)

When a client connection has been accepted properly, the return value of accept is a file descriptor for a new socket that we can use to communicate with the client. The server socket remains because we might want to use that to accept other incoming connections.

Here is an example in code:

int client_sock;

//accept incoming connections
if((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) < 0){
  perror("accept");
  exit(1);
}

printf("Connection From: %s:%d (%d)\n", 
       inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
       ntohs(client_saddr_in.sin_port),     //the port in host order
       client_sock);                        //the file descriptor number

You'll note that the address printed here is different from the address of the server socket. This makes sense: it is the address of the client's end of the connection, that is, the client's IP address and the port that the client's OS picked for its socket. The new socket on the server side still uses the server's port; the OS tells connections apart by the combination of both ends' addresses and ports, so many clients can be connected through the same server port at once. Further, if you think about it more, the client's port doesn't really matter as long as both ends agree on it. The server port address, the address that accepts connections, is the only port that must truly be known and declared. We will see more examples of this later.
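
If you want to check which port each end of a socket is using, getsockname() reports the local end of a socket and getpeername() reports the remote end. Here is a minimal sketch; the helper names local_port() and peer_port() are hypothetical.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

//Hypothetical helper: the port this end of the socket is using,
//in host byte order, or -1 on error
int local_port(int sock){
  struct sockaddr_in addr;
  socklen_t len = sizeof(addr);

  if(getsockname(sock, (struct sockaddr *) &addr, &len) < 0){
    return -1;
  }
  return ntohs(addr.sin_port);
}

//Hypothetical helper: the port the other end of the connection is
//using, in host byte order, or -1 if the socket is not connected
int peer_port(int sock){
  struct sockaddr_in addr;
  socklen_t len = sizeof(addr);

  if(getpeername(sock, (struct sockaddr *) &addr, &len) < 0){
    return -1;
  }
  return ntohs(addr.sin_port);
}
```

Calling both helpers on the socket returned by accept(), and again on the client's socket, is an easy way to see how the two ends of one connection line up.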

Another important thing to note about the server socket is that accepting an incoming connection is a blocking operation. That means accept() will not return until a new connection is available. This fact becomes quite a challenge when developing server programs that wish to provide service to multiple clients, and we will explore different techniques for achieving multi-client services.

10.4 Communicating with the Client: read()/write()/close()

Once we have the connection established with the client, the new client socket, from accept(), is how we communicate. Recall that a socket is just a file descriptor, so we can use the standard read() and write() operations on the socket to send data to and receive data from the client. At this point, these procedures should be familiar to you:

 //read from client
if((n = read(client_sock,response, BUF_SIZE-1)) < 0){
  perror("read");
  exit(1);
}
response[n] = '\0'; //NULL terminate string

printf("Read from client: %s", response);

//construct response
snprintf(response, BUF_SIZE, "Hello %s:%d \nGo Navy! Beat Army\n", 
         inet_ntoa(client_saddr_in.sin_addr),    //address as dotted quad
         ntohs(client_saddr_in.sin_port));       //the port in host order

printf("Sending: %s",response);

//send response
if(write(client_sock, response, strlen(response)) < 0){
  perror("write");
  exit(1);
}

printf("Closing socket\n\n");

//close client socket
close(client_sock);

//close the server socket
close(server_sock);

Once all the operations are over, the act of closing the socket will bring down the connection with the client. Closing the server socket stops the listening process.

10.5 Putting it all together

With all the pieces in place, we can connect the client and server socket procedures and see how the two interfaces interact:

lec23-client-server-socket.png

Figure 8: Client and Server Socket Life Cycle

And the entirety of the hello server program:

/*hello_server.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define BUF_SIZE 4096

int main(){

  char hostname[]="127.0.0.1";   //localhost ip address to bind to
  short port=1845;               //the port we are to bind to


  struct sockaddr_in saddr_in;  //socket internet address of server
  struct sockaddr_in client_saddr_in;  //socket internet address of client

  socklen_t saddr_len = sizeof(struct sockaddr_in); //length of address

  int server_sock, client_sock;         //socket file descriptor


  char response[BUF_SIZE];           //what to send to the client
  int n;                             //length measure

  //set up the address information
  saddr_in.sin_family = AF_INET;
  inet_aton(hostname, &saddr_in.sin_addr);
  saddr_in.sin_port = htons(port);

  //open a socket
  if( (server_sock = socket(AF_INET, SOCK_STREAM, 0))  < 0){
    perror("socket");
    exit(1);
  }

  //bind the socket
  if(bind(server_sock, (struct sockaddr *) &saddr_in, saddr_len) < 0){
    perror("bind");
    exit(1);
  }

  //ready to listen, queue up to 5 pending connections
  if(listen(server_sock, 5)  < 0){
    perror("listen");
    exit(1);
  }


  saddr_len = sizeof(struct sockaddr_in); //length of address

  printf("Listening On: %s:%d\n", inet_ntoa(saddr_in.sin_addr), ntohs(saddr_in.sin_port));

  //accept incoming connections
  if((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) < 0){
    perror("accept");
    exit(1);
  }


  printf("Connection From: %s:%d (%d)\n", 
         inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
         ntohs(client_saddr_in.sin_port),     //the port in host order
         client_sock);                        //the file descriptor number

  //read from client
  if((n = read(client_sock,response, BUF_SIZE-1)) < 0){
    perror("read");
    exit(1);
  }
  response[n] = '\0'; //NULL terminate string

  printf("Read from client: %s", response);

  //construct response
  snprintf(response, BUF_SIZE, "Hello %s:%d \nGo Navy! Beat Army\n", 
           inet_ntoa(client_saddr_in.sin_addr),    //address as dotted quad
           ntohs(client_saddr_in.sin_port));       //the port in host order

  printf("Sending: %s",response);

  //send response
  if(write(client_sock, response, strlen(response)) < 0){
    perror("write");
    exit(1);
  }

  printf("Closing socket\n\n");

  //close client_sock
  close(client_sock);

  //close the socket
  close(server_sock);

  return 0; //success
}

And we can see a few runs of the program. In one terminal we have our small server program running and in the other we have a netcat client running.

Server terminal:

#> ./hello_server
Listening On: 127.0.0.1:1845
Connection From: 127.0.0.1:58740 (4)
Read from client: hello
Sending: Hello 127.0.0.1:58740 
Go Navy! Beat Army
Closing socket

Client terminal:

#> netcat localhost 1845
hello
Hello 127.0.0.1:58740 
Go Navy! Beat Army

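Note that the client's end of the connection is operating on port 58740, which is not the same as port 1845, the port the server socket was listening on. That port was chosen by the client's OS when netcat connected; the server's end of the connection still uses port 1845.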

11 Handling Multiple Incoming Connections of Server Sockets

Now that we understand how to set up a server, let's consider how we might handle multiple incoming connections. This is a common task for servers: since every client has its own client socket, it is only natural to serve multiple clients at the same time. However, this is more complicated than it might seem at first because of blocking operations. Let's first explore the perils of blocking operations and how to overcome this challenge. In a later lesson, we'll see another method for handling multiple clients that uses threads, which is simpler but comes with its own challenges.

11.1 Challenge of blocking

Let's consider an improvement to our server: instead of just responding with a token phrase, it echoes back whatever is sent to it until the client closes the connection. Further, we'd like to be able to serve multiple clients. To start, we could simply change the program logic to look like this:

//accept incoming connections in a loop
 while((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) > 0){

   printf("Connection From: %s:%d (%d)\n", 
          inet_ntoa(client_saddr_in.sin_addr), 
          ntohs(client_saddr_in.sin_port), 
          client_sock);


   //echo loop, break when read 0 or error
   while((n = read(client_sock, response, BUF_SIZE-1)) > 0){
     response[n] = '\0' ; //NULL terminate

     printf("Received From: %s:%d (%d): %s\n",  //LOGGING
            inet_ntoa(client_saddr_in.sin_addr), 
            ntohs(client_saddr_in.sin_port), 
            client_sock, 
            response);


     if(write(client_sock, response, n) < 0){
       perror("write");
       break;
     }
   }

   if( n < 0){
     perror("read");
   }

   printf("Client Closed: %s:%d (%d)\n",    //LOGGING
          inet_ntoa(client_saddr_in.sin_addr), 
          ntohs(client_saddr_in.sin_port), 
          client_sock);

   //close client socket
   close(client_sock);


   //reset socket len just in case
   saddr_len = sizeof(struct sockaddr_in); //length of address

 }

Essentially, we've placed the server's accepting of incoming connections in a loop. When there is a connection, anything that is written by the client is echoed back. If the client closes the connection, so does the server. We can see that happening with a simple example:

Server terminal:

#> ./echo_server
serer sock listening: (3)
Connection From: 127.0.0.1:59088 (4)
Received From: 127.0.0.1:59088 (4): testing client #1

Received From: 127.0.0.1:59088 (4): who are you?

Client terminal:

#>netcat localhost 1845
testing client #1
testing client #1
who are you?
who are you?

However, if the first client keeps the connection open when the next client connects, what happens?

#>nc localhost 1845
testing
why am I not getting an echo?

There is no echo response. That's because the server is blocked in read() on the first client, so the new connection has not yet been accepted with accept(). The connection is queued, so when the first client closes its socket, the second client gets the expected response, but we'd like all of this to occur simultaneously so that a server can provide service to multiple clients at once.

11.2 Identifying Readable File Descriptors with select()

There are a few ways to solve this problem. One is to set all the socket file descriptors to non-blocking, which is a possibility, and we've seen how to do this with fcntl() and pipes. But it can overly complicate the code.
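
As a reminder of what the non-blocking route looks like, here is a minimal sketch. The helper names set_nonblocking() and try_read() are hypothetical, and the return value -2 for "no data yet" is a convention invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

//Hypothetical helper: turn on O_NONBLOCK for fd, keeping its other flags.
//Returns 0 on success, -1 on error.
int set_nonblocking(int fd){
  int flags = fcntl(fd, F_GETFL, 0);

  if(flags < 0){
    return -1;
  }
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

//Hypothetical helper: read without waiting. Returns the number of
//bytes read, 0 on end of file, -1 on error, and -2 if no data is
//available right now.
ssize_t try_read(int fd, char *buf, size_t len){
  ssize_t n = read(fd, buf, len);

  if(n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)){
    return -2; //the read would have blocked
  }
  return n;
}
```

Every read now has an extra case to handle, and the program has to decide what to do while nothing is ready, which is where the added complexity comes from.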

Instead, we need a way to check whether a file descriptor is ready to be used (e.g., read, written, or accepted) so that the operation will return immediately rather than block. The simplest method for testing the readiness of a file descriptor is to use select().

select() is a system call that, given a set of file descriptors, tells you which of them are ready for reading (or writing) so that you can iterate over just those. The interface for select() has a few helper macros, but it's easy to see their use with an example.

/*echo_server.c*/
fd_set select_set; //stores interested file descriptors

FD_ZERO(&select_set); //clear the set
FD_SET(fd, &select_set); //add fd to the set
//add other file descriptors

//check file descriptors 0 through FD_SETSIZE-1; on return the set
//holds only the file descriptors that are ready for an action
if( select(FD_SETSIZE, &select_set, NULL, NULL, NULL) < 0){
  perror("select");
  exit(1);
}

//check for activity on all file descriptors
for(i=0; i < FD_SETSIZE; i++){

  //was the file descriptor i set?
  if(FD_ISSET(i, &select_set)){

    //i is the file descriptor number
    read( i, buf, BUF_SIZE); 
    //etc.

    FD_CLR(i, &select_set); //remove file descriptor i from the set
  }
}

First, a set of file descriptors must be declared; this is of type fd_set. After initialization with FD_ZERO(), interested file descriptors can be added to the set using FD_SET(). Once all the file descriptors are provided, the select() system call will check all the file descriptors in the set to see if they are ready for an action, like a read(). When select() returns, it has modified the set so that only the ready file descriptors remain in it. Finally, you can iterate through all the file descriptor numbers, checking if each file descriptor was selected with FD_ISSET(), and if so, do some action. A file descriptor can be removed from the set with FD_CLR().
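
The same calls can be wrapped up for the simple case of a single file descriptor. The sketch below also passes a timeout as the last argument of select(), which the example above leaves as NULL, so that the call gives up after a while instead of waiting forever. The helper names and the -2 return value for a timeout are assumptions made for this illustration.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

//Hypothetical helper: wait up to timeout_ms milliseconds for fd to
//become readable. Returns 1 if readable, 0 on timeout, -1 on error.
int wait_readable(int fd, int timeout_ms){
  fd_set readfds;
  struct timeval tv;

  FD_ZERO(&readfds);     //clear the set
  FD_SET(fd, &readfds);  //add fd to the set

  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  //only descriptors below fd+1 need to be checked
  if(select(fd + 1, &readfds, NULL, NULL, &tv) < 0){
    return -1;
  }

  //select() leaves fd in the set only if it is ready
  return FD_ISSET(fd, &readfds) ? 1 : 0;
}

//Hypothetical helper: read from fd, but give up after timeout_ms.
//Returns what read() returns, or -2 if nothing arrived in time.
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms){
  int ready = wait_readable(fd, timeout_ms);

  if(ready < 0){
    return -1;
  }
  if(ready == 0){
    return -2; //timed out
  }
  return read(fd, buf, len); //will not block, data is waiting
}
```

Because select() works on any file descriptor, these helpers can be tried out on a pipe or on standard input before using them with sockets.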

The specifics of how this works are not important here. The key takeaway is that we can now check if a file descriptor needs an action before taking said action: we are avoiding blocking! Let's see how we would use select() with our echo server:

/* echo_server_select.c*/
  fd_set activefds, readfds;


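  //server setup and etc.

  //start with only the server socket in the active set
  FD_ZERO(&activefds);
  FD_SET(server_sock, &activefds);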


  while(1){ //loop


    //update the set of selectable file descriptors
    readfds = activefds;

    //Perform a select
    if( select(FD_SETSIZE, &readfds, NULL, NULL, NULL) < 0){
      perror("select");
      exit(1);
    }

    //check for activity on all file descriptors
    for(i=0; i < FD_SETSIZE; i++){

      //was the file descriptor i set?
      if(FD_ISSET(i, &readfds)){

        if(i == server_sock){ //activity on server socket, incoming connection

          //accept incoming connections = NON BLOCKING
          client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len);

          printf("Connection From: %s:%d (%d)\n", inet_ntoa(client_saddr_in.sin_addr), 
                 ntohs(client_saddr_in.sin_port), client_sock);

          //add socket file descriptor to set
          FD_SET(client_sock, &activefds);

        }else{

          //otherwise client socket sent something to us
          client_sock = i;

          //get the address of the socket
          getpeername(client_sock, (struct sockaddr *) &client_saddr_in, &saddr_len);

          //read from client and echo back
          n = read(client_sock, response, BUF_SIZE-1);   

          if(n <= 0){ //closed or error on socket

            //close client socket
            close(client_sock);

            //remove file descriptor from set
            FD_CLR(client_sock, &activefds);

            printf("Client Closed: %s:%d (%d)\n",           //LOG
                   inet_ntoa(client_saddr_in.sin_addr), 
                   ntohs(client_saddr_in.sin_port), 
                   client_sock);

          }else{ //client sent a message

            response[n] = '\0'; //NULL terminate

            //echo message to client
            write(client_sock, response, n);

            printf("Received From: %s:%d (%d): %s",         //LOG
                   inet_ntoa(client_saddr_in.sin_addr), 
                   ntohs(client_saddr_in.sin_port), 
                   client_sock, response);
          }

        }

      }

    }

  }

That's a lot to take in, but here are a few key points. First off, the accept() call is also a blocking operation, so we can use select() to determine if we have an incoming connection. We still need to check for clients closing on every read, which increases the complexity of the code. In the end, though, it works.

But, it's not clean. It lacks a certain something. What's not quite right about this code block is that we like to think of the process of handling a client as a lot more like the while loop from echo_server and not the while loop from echo_server_select. What we want is a way to parallelize the process so we can write a simple bit of code that handles every client connection the same way and let that code run in parallel with the code accepting connections. That's exactly what we'll look at next when we investigate threading and how threading can be used for socket server programming.