Unit 8: Networking
Table of Contents
- 1. What is the internet?
- 2. Internet Addressing
- 3. The Client-Server Connection Model and Transport Protocols
- 4. Networking Command Line Tools
- 5. Sockets
- 6. Addressing
- 7. Storing an IP address in struct in_addr
- 8. Resolving Domain Names
- 9. Socket System Call
- 10. Server Sockets
- 11. Handling Multiple Incoming Connections of Server Sockets
1 What is the internet?
The Internet by definition is a network of networks composed of computers. As a non-technical term, we use the term Internet as a catchall for all connected computers, but in technical terms, it is just one part of a larger ecosystem of networks and protocols that enable the sharing of information.
This is not a class on networking or the internet, but the ability to communicate over a network is an integral part of modern operating systems. The programming interface provided by the OS is called socket programming, and it provides a unified model for interacting with networked components. It is implemented by the operating system, within the kernel, and there is a standard set of system calls used to request that the OS complete tasks.
But to really understand network programming, you first need a decent understanding of the protocols that underlie the Internet. One thing you learn quickly about network programming is that the protocol is king. Understanding the protocols will make you a better programmer.
1.1 Packet Switching
The internet is a packet switched network. A packet is defined as follows:
All packets have a header, which stores the address or destination of a packet, and a payload, which stores the data or message of the packet. The switching part of packet switching is that when a packet arrives at a network device, like a router or switch, the device decides where to send the data next based solely on the header of the packet. There are no pre-defined routes for data, but the protocols ensure that the next hop in the path to the destination can be determined. As you might imagine, in such a model, addressing becomes very important.
1.2 The TCP/IP Protocol Stack
The Internet is modelled as a protocol stack where each protocol defines a different interaction layer. Information flows up and down the protocol stack, and at each layer, a different protocol comes to bear for forwarding the packet onward to the next hop.
Each layer has a different goal in mind. Starting with the physical layer, the main purpose is to actually transmit 1's and 0's over a medium, like a wire. The link layer adds protocols for how the medium is shared across many connected devices, as well as error correction. Examples of protocols on the link layer are ethernet and wifi.
The internet and transport layers are of particular interest in this class because you will interact directly with these protocols through their addressing schemes. The purpose of the internet layer is to interconnect networks; for example, the USNA has a network, and if you want to send data to Google, your packets must traverse the USNA network and potentially many other federated networks before finally reaching a server at Google. The internet layer describes both the protocols for how networks interconnect with each other and the way that computers are identified, via the internet protocol address or ip address.
At a certain point, though, two processes running on different computers are actually sending and receiving data across the vastness of the Internet. The transport layer provides an abstraction for those two processes to appear as if they are communicating directly with each other. Since many processes can be communicating on the network at the same time, the transport layer also provides a mechanism, called ports, to differentiate communication destined for one process versus another.
Finally, at the application layer, additional protocols are available depending on the task at hand. For example, SMTP is used to transmit email messages, HTTP is used to download web content, and BitTorrent is used to pirate music and videos :) From a generic programmer's perspective, the application layer is the domain where we get to choose what data is sent and received and how that data is interpreted; the systems programming perspective also concerns itself with the system calls that enable that communication.
2 Internet Addressing
Each layer has its own addressing scheme and information needed to perform routing/switching. This information is traditionally encoded within the header of the packet. There are two key addressing systems that we will use in this class, ip addresses and ports. Additionally, we also refer to computers by name, a domain name, which must be translated into an address.
2.1 IP addresses
An ip address is a 4-byte/32-bit number in Version 4 of the TCP/IP protocol. It is usually represented in dotted quad notation:
     4 bytes
 _______________
/               \
 192.168.128.101
 \_/
  |
1 byte
A byte is 8 bits, and thus can represent numbers between 0 and 255, which is why ip addresses do not have numbers greater than 255. The ip address is hierarchical: bytes to the left are more general, while bytes to the right are more specific. Based on a subset of the bytes, routers can determine where to send a packet next.
2.2 Domain Names and host
While IP addresses are somewhat usable, it is quite a burden to memorize the ip address of all the computers we might want to visit on the web. For example, while I might know the ip address of a single computer off hand, e.g., "10.53.37.142" is the ip address of a lab machine, I can't recall the ip address of Google or Facebook or much of anything else.
Instead, we use domain names to identify a networked device. A hostname is a dotted string of names, usually ending in the canonical .com or .org or .edu or .gov, etc. For example, the domain name for Google is google.com, but the Internet does not function on domain names. It needs IP addresses. A separate protocol called the Domain Name System, or DNS, is tasked with converting domain names into IP addresses.
2.2.1 host
We can access the DNS system on Unix through the host command. For example, suppose we wanted to learn the IP address of google.com:
#> host google.com
google.com has address 74.125.228.110
google.com has address 74.125.228.103
google.com has address 74.125.228.96
google.com has address 74.125.228.99
google.com has address 74.125.228.104
google.com has address 74.125.228.101
google.com has address 74.125.228.102
google.com has address 74.125.228.97
google.com has address 74.125.228.100
google.com has address 74.125.228.105
google.com has address 74.125.228.98
google.com has IPv6 address 2607:f8b0:4004:803::1003
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.
The output may not be what you expect. There are many, many different IP addresses available to serve google.com, and this is intentional: it balances the load of requests across multiple machines. In fact, every time you rerun host, you may find that the IP addresses come back in a different order:
#> host google.com
google.com has address 74.125.228.104
google.com has address 74.125.228.101
google.com has address 74.125.228.102
google.com has address 74.125.228.97
google.com has address 74.125.228.100
google.com has address 74.125.228.105
google.com has address 74.125.228.98
google.com has address 74.125.228.110
google.com has address 74.125.228.103
google.com has address 74.125.228.96
google.com has address 74.125.228.99
google.com has IPv6 address 2607:f8b0:4004:803::1003
google.com mail is handled by 50 alt4.aspmx.l.google.com.
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
If we were to query a less used domain name, one that isn't serving as much traffic as google, we get IP addresses that are a bit more sane. For example, let's see what the IP addresses are for www.usna.edu:
#> host www.usna.edu
www.usna.edu is an alias for webster-new.dmz.usna.edu.
webster-new.dmz.usna.edu has address 10.4.32.41
You can even use host to do a reverse DNS lookup, that is, look up the domain name based on an IP address:
#> host 10.4.32.41
41.32.4.10.in-addr.arpa domain name pointer webster-new.dmz.usna.edu.
2.3 Ports and /etc/services
The last piece of addressing relevant to this class is the port address. While the IP address is used to deliver packets to a destination computer, the port address is used to deliver the packets on that computer to the right process. Consider that a single computer typically has one IP address, yet many different applications use that connection at the same time. You might have multiple web pages open while also checking email and playing games; each of those interactions is performed by a separate process, but all the data arrives at the computer through a single point.
The port address is a way for the Operating System to divide up the data arriving from the network based on the destination process. Additionally, ports tend to be tightly coupled with applications. For example, to initiate an HTTP connection for web browsing, you connect using port 80; to initiate a secure shell connection with ssh, you connect using port 22; and to initiate a connection to send email, you connect using port 25, and so on. What makes ports important is that all those services, web server, ssh, and email, can be running on the same computer. The ports allow the operating system to differentiate traffic for each application.
2.3.1 /etc/services stores port application mappings
The mapping of ports to applications is deeply ingrained within Unix systems, and many programs need to quickly map a port to an application. To assist in that process, most Unix systems ship with a list of the current standards in port mapping, which is stored in the /etc/services file.
# Network services, Internet style
#
# Note that it is presently the policy of IANA to assign a single well-known
# port number for both TCP and UDP; hence, officially ports have two entries
# even if the protocol doesn't support UDP operations.
#
# Updated from http://www.iana.org/assignments/port-numbers and other
# sources like http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/services .
# New ports will be added on request if they have been officially assigned
# by IANA and used in the real-world or are needed by a debian package.
# If you need a huge list of used numbers please install the nmap package.

tcpmux          1/tcp                           # TCP port service multiplexer
echo            7/tcp
echo            7/udp
discard         9/tcp           sink null
discard         9/udp           sink null
systat          11/tcp          users
daytime         13/tcp
daytime         13/udp
netstat         15/tcp
qotd            17/tcp          quote
msp             18/tcp                          # message send protocol
msp             18/udp
chargen         19/tcp          ttytst source
chargen         19/udp          ttytst source
ftp-data        20/tcp
ftp             21/tcp
fsp             21/udp          fspd
ssh             22/tcp                          # SSH Remote Login Protocol
ssh             22/udp
telnet          23/tcp
smtp            25/tcp          mail
time            37/tcp          timserver
time            37/udp          timserver
rlp             39/udp          resource        # resource location
nameserver      42/tcp          name            # IEN 116
whois           43/tcp          nicname
tacacs          49/tcp                          # Login Host Protocol (TACACS)
tacacs          49/udp
re-mail-ck      50/tcp                          # Remote Mail Checking Protocol
re-mail-ck      50/udp
domain          53/tcp                          # Domain Name Server
domain          53/udp
mtp             57/tcp                          # deprecated
tacacs-ds       65/tcp                          # TACACS-Database Service
tacacs-ds       65/udp
bootps          67/tcp                          # BOOTP server
bootps          67/udp
bootpc          68/tcp                          # BOOTP client
(...)
This file continues on for some while, but the takeaway regarding the sheer expanse and variety of network applications is clear. Further, the use of ports to differentiate one service from another is vital to providing that diversity of services.
3 The Client-Server Connection Model and Transport Protocols
Most interactions between applications are dictated by the client-server model. In this model there exist clients who request services from a server.
In the model, we describe clients as connecting to servers and servers as listening for incoming connections. When a connection is established, or data is received, the server replies to the client with data as required by the application protocol.
While this class will focus on the client-server model, there are other models of network interaction. For example, in the peer-to-peer model each participant acts as both a client and a server. This is common for many distributed systems, such as BitTorrent or Skype.
3.1 Reliable Transport: TCP or SOCK_STREAM
The client-server model fits into the protocol stack at the transport layer. There are typically two types of transport available for programmers, reliable and unreliable transport. Interestingly, none of the protocols in the lower layers ensure any reliability: at any time packets can be dropped, misrouted, delayed, or generally deformed without notice. The fact that such things can happen on the network is actually a positive because the lower layers can be much more efficient without having to worry about reliable delivery.
TCP, or Transmission Control Protocol, was developed to provide reliable transport on top of the inherently unreliable lower layer packet delivery system. The big idea behind TCP is that it establishes a stream or session between client and server, where the expectation of packet delivery and acknowledgment forms the basis of the reliable transport mechanism. Essentially, when you are using TCP it is as if the client and server are communicating directly with each other, like via a pipeline, even though there may potentially be a huge network between them. In the parlance of socket programming, which we will discuss in the next lesson, the TCP protocol is described as a SOCK_STREAM because it provides a stream of information, much like a pipeline.
3.2 Unreliable Transport: UDP or DATAGRAM
Reliable transport has a cost, though. The cost is the retransmission of lost or deformed packets and acknowledgements of properly received packets. In order to have reliable transportation, all information must be properly acknowledged upon receipt and if a packet was not properly received, then it must be retransmitted. The result is that there exists a significant overhead, and this is worsened by the fact that not all communication needs to be reliable — dropping a few packets here and there never killed anyone, yet.
The complementary protocol to TCP is UDP, or User Datagram Protocol, which is an unreliable transport mechanism. The UDP protocol, or DATAGRAM protocol, does not make any guarantees about the delivery of a packet. It might get there … or it might not. Datagram protocols are not session driven either; without reliability, the client and the server need not stay in sync to ensure that all messages are acknowledged. Instead, a server just listens for incoming data from clients and that's that.
You might be wondering when this would ever be useful. UDP is quite common for a number of applications; for example, live audio streams. There is no need for audio streams to be reliable. If you miss a packet, so what? You'll just get the next one and keep playing the music. However, if you were to do this reliably, you'd have to stop the music while missed data was retransmitted, and the result is that you might keep getting further and further behind in the live stream.
4 Networking Command Line Tools
When you write socket programs, it is useful to test them using the "swiss army knife" of networking command line tools, netcat. You can also use a tool called netstat to monitor your open socket connections. As a programmer, these tools are often indispensable for debugging and understanding the functionality of your program.
4.1 netcat: the network "swiss army knife"
The netcat program has often been described as the "swiss army knife" of networking because it can do anything and everything. It's pretty amazing once you get into the details, but its more basic functionality is fairly useful already for our purposes. We'll see how to use it as both a client and server using both UDP and TCP protocols.
4.1.1 netcat client
When working as a client, netcat takes two arguments:
netcat dest port
The dest is a destination address, which can either be an IP address as a dotted quad or a domain name. The port is a number representing the port address. This is all we need to make netcat act like a web client, so let's connect to a web server:
#> netcat www.cnn.com 80
GET /index.html
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<title>CNN.com - Breaking News, U.S., World, Weather, Entertainment & Video News</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta http-equiv="last-modified" content="2014-04-03T13:48:56Z"/>
<meta http-equiv="refresh" content="1800;url=http://www.cnn.com/?refresh=1"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta name="robots" content="index,follow"/>
(...)
What we've just done is establish a TCP connection on port 80, the HTTP port for web traffic, and make a request to the HTTP server to send us the main page for cnn.com. And it works!
4.1.2 netcat server
netcat can also act as a server by listening for incoming connections on a given port. You do this with the -l flag:
#> netcat -l 1845
There is now a service running on port 1845, netcat, and we can connect to it using another netcat client.
Server terminal:
#> netcat -l 1845
Hello
What's your name?
adam
me too, how strange.
strange ......
#>
Client terminal:
#> netcat localhost 1845
Hello
What's your name?
adam
me too, how strange.
strange ......
^C
#>
The domain name localhost refers to the current computer, so we don't always have to remember the IP address. In the above example, information is typed back and forth between the netcat server and client.
4.2 netstat: monitor current connections
The other very useful command for Unix users is netstat, which displays a list of information about current network usage on the computer. It's best to just see an example.
Suppose in one terminal, I've created a netcat server.
netcat -l 1845
We can see that this was properly established by viewing the netstat output with the -l flag, indicating we are interested in listening servers. We will also use the -n flag so that all addresses and ports are displayed as numbers:
#> netstat -ln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:36688           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:1845            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:44469           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:538             0.0.0.0:*               LISTEN
tcp6       0      0 :::55662                :::*                    LISTEN
tcp6       0      0 :::591                  :::*                    LISTEN
tcp6       0      0 :::111                  :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 ::1:631                 :::*                    LISTEN
tcp6       0      0 :::25                   :::*                    LISTEN
tcp6       0      0 :::39612                :::*                    LISTEN
udp        0      0 0.0.0.0:57189           0.0.0.0:*
udp        0      0 0.0.0.0:68              0.0.0.0:*
udp        0      0 0.0.0.0:111             0.0.0.0:*
udp        0      0 10.53.33.232:123        0.0.0.0:*
udp        0      0 127.0.0.1:123           0.0.0.0:*
udp        0      0 0.0.0.0:123             0.0.0.0:*
udp        0      0 0.0.0.0:538             0.0.0.0:*
udp        0      0 0.0.0.0:601             0.0.0.0:*
udp        0      0 0.0.0.0:33370           0.0.0.0:*
udp        0      0 127.0.0.1:608           0.0.0.0:*
udp        0      0 0.0.0.0:49817           0.0.0.0:*
udp        0      0 0.0.0.0:5353            0.0.0.0:*
udp6       0      0 :::35113                :::*
udp6       0      0 :::47936                :::*
udp6       0      0 :::56801                :::*
udp6       0      0 :::111                  :::*
udp6       0      0 ::1:123                 :::*
udp6       0      0 fe80::d227:88ff:fed:123 :::*
udp6       0      0 :::123                  :::*
udp6       0      0 :::601                  :::*
udp6       0      0 :::5353                 :::*
(...)
There is quite a lot of output, but at the top is the good stuff. What you can see is the combination of IP address (0.0.0.0 meaning any local address, and 127.0.0.1 indicating localhost) and port that are currently being listened on by services. If you look closely, you can see that, yes, the netcat listening on port 1845 is present.
Let's look at the output when we connect the client netcat
#> netcat localhost 1845
This time we run netstat without the -l flag because we are interested in established connections.
#> netstat -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0    160 10.53.33.232:22         10.53.33.254:57078      ESTABLISHED
tcp        1      0 10.53.33.232:47982      91.189.89.144:80        CLOSE_WAIT
tcp        0      0 127.0.0.1:41217         127.0.0.1:1845          ESTABLISHED
tcp        0      0 10.53.33.232:22         10.53.33.254:54510      ESTABLISHED
tcp        0      0 127.0.0.1:1845          127.0.0.1:41217         ESTABLISHED
tcp        0      0 10.53.33.232:801        10.1.83.18:2049         ESTABLISHED
tcp        0      0 10.53.33.232:22         10.53.33.254:58882      ESTABLISHED
tcp        0      0 10.53.33.232:48491      10.1.68.11:445          ESTABLISHED
(...)
Looking through the list, we find 127.0.0.1:1845, which is our netcat server, connected to 127.0.0.1:41217, which is our netcat client. The same connection also appears in reverse, client->server.
Something might seem a bit off here: the client connected to port 1845, yet its own end of the connection uses a different port. The well-known port matters for establishing the connection. A server must be listening on a given port so that it can be reached, but the client's end does not need a fixed port as long as client and server can talk to each other. Instead, the operating system chooses a random port for the client, in this case 41217, and we'll see this in action when we discuss socket programming in detail in the following lessons.
5 Sockets
In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.
The system call to open a socket is socket():
int socket(int domain, int type, int protocol);
The arguments can be described as follows:
- domain is the addressing domain of the socket, which for our purposes is an internet socket or AF_INET
- type is the type of socket, describing which transport protocol we are using. HTTP traffic is over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the keyword would be SOCK_DGRAM
- protocol is additional information about the protocol of the socket, but we will not need to use this option, so we'll set it to 0.
The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, connect(), read(), write(), bind(), accept(), etc., require that integer file descriptor.
6 Addressing
Before we can dive into socket programming, we have to deal with network addressing. Unfortunately, in C, addressing of sockets can be frustrating because depending on the protocol, an address can be of different lengths. On the surface, this might not seem like an issue, but it is HUGE for C programs because we deal with fixed size buffers. The difference between a 4-byte and 8-byte or 16-byte address is substantial.
At the same time, we also want our programs, even with different address lengths, to look the same, use roughly the same structure types, and function in the same way. The result of these needs is that addressing of sockets in C can be complicated, requiring many forms of casting until, finally, you reach the type you're looking for. In between "… these are not the types you're looking for."
To simplify our discussion, we'll only be covering addressing for IPv4, which uses a 4-byte address space. The shorthand for IPv4 is AF_INET, which you will see used throughout.
6.1 Glossary of Structures and Functions
There will be many new structures and functions we'll use in this lesson. Below is a quick overview of each of the structures you need with a brief description; more detailed discussion follows.
- AF_INET: the address family value for IPv4 addresses.
- struct in_addr: type for storing an internet address. It has a single member, s_addr, which is an unsigned integer that stores the 4 bytes of an IPv4 address.
- struct sockaddr: a generic socket address structure that is used for all addressing types. Often, we'll cast these to a struct sockaddr_in since we are only concerned with IPv4 addresses.
- struct sockaddr_in: a specific socket address to store IP and port information. It has 3 members:
  - short sin_family: the address family, which should be AF_INET.
  - unsigned short sin_port: the port number in network byte order, converted using htons()
  - struct in_addr sin_addr: a struct in_addr to store the IP address itself.
- struct addrinfo: a structure used by getaddrinfo() to store address information and hints. It has the following relevant members:
  - int ai_family: the address family, which should be AF_INET for us
  - struct sockaddr *ai_addr: the socket address returned after a query
Here is a glossary of library/system calls we will use and their purpose:
- inet_ntoa(): convert a struct in_addr to a dotted quad string: "Network-to-Address"
- inet_aton(): convert a string of an IP address as a dotted quad into a struct in_addr: "Address-to-Network"
- getaddrinfo(): convert a domain name into an IP address, stored in the ai_addr member as a struct sockaddr in the struct addrinfo result
- htons(): convert a short stored in host byte order into network byte order: "Host-to-Network-Short"
- ntohs(): convert a short stored in network byte order into host byte order: "Network-to-Host-Short"
7 Storing an IP address in struct in_addr
The structure that stores a 32-bit IPv4 address is struct in_addr:
//uint32_t is the same as unsigned int
typedef uint32_t in_addr_t;
struct in_addr
{
    in_addr_t s_addr;
};
The in_addr structure has one member, s_addr, whose type is uint32_t, which is just a fancy way of saying unsigned int. Let's look at an example of using the in_addr structure to store an IP address:
/* hello_rawip.c*/
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h> //for struct in_addr
#include <arpa/inet.h>  //for inet_ntoa()
int main(){
    //in_addr struct has a single member s_addr
    struct in_addr addr;
    unsigned char * ip;
    //have ip point to s_addr
    ip = (unsigned char *) &(addr.s_addr);
    //set the bytes for "10.4.32.41"
    ip[0]=10;
    ip[1]=4;
    ip[2]=32;
    ip[3]=41;
    //print it out
    printf("Hello %s\n", inet_ntoa(addr));
    return 0;
}
Recall, from our conversations on types and casting, that this line:
ip = (unsigned char *) &(addr.s_addr);
sets the pointer ip to reference the address. Then we can set the bytes directly, byte-by-byte, since ip is an unsigned char pointer. At the end, we can print out the address in dotted quad notation using the function inet_ntoa(), which stands for network-to-address.
This is clearly a very cumbersome way to set addresses, so instead we can just use the dotted quad notation and have the library do the conversion for us. To do this, we use inet_aton(), which stands for address-to-network and converts an IPv4 address string into a struct in_addr.
/* hello_aton.c*/
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(){
    //in_addr struct has a single member s_addr
    struct in_addr addr;
    //Convert the IP dotted quad into struct in_addr
    inet_aton("10.4.32.41", &(addr));
    printf("Hello %s\n", inet_ntoa(addr));
    return 0;
}
7.1 Storing IP Addresses in struct sockaddr and struct sockaddr_in
The next level of addressing is to combine an IP address with port information. Since we are concerned with internet addressing, we will use the struct sockaddr_in:
struct sockaddr_in {
    short sin_family;         //address family, set to AF_INET
    unsigned short sin_port;  //the port in network byte order
    struct in_addr sin_addr;  //the inet address
};
This is just one kind of socket address, but a socket can be used for a variety of things, not just internet communication. As a result, there is also a generic socket address type called struct sockaddr, without the _in suffix. Whenever you see an IPv4 address typed as struct sockaddr, you can cast it to struct sockaddr_in and vice versa. The two types are the same size, that is, they occupy the same number of bytes in memory. The cast just changes how those bytes are interpreted.
In the following code, for example:
/*hello_sockaddr.c*/
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(){
    //use a generic socket address to store everything
    struct sockaddr saddr;
    //cast generic socket to an inet socket
    struct sockaddr_in * saddr_in = (struct sockaddr_in *) &saddr;
    //Convert IP address into inet address stored in sockaddr
    inet_aton("10.4.32.41", &(saddr_in->sin_addr));
    //print out IP address
    printf("Hello %s\n", inet_ntoa(saddr_in->sin_addr));
    return 0;
}
We declare a generic socket address on the stack, but it is cast to an inet socket address to access the sin_addr member.
8 Resolving Domain Names
The next part of addressing is the conversion of a domain name into an IP address. IP addresses are not completely unusable for humans, but they are not the preferred way to reference a remote host. Instead, we use the domain name. For example, when we go to www.usna.edu, that domain must be resolved into an ip address.
8.1 Converting a Domain Name into an IP address
The resolving protocol is called DNS, or Domain Name System, and it is implemented for us through the getaddrinfo() library function. Here is the function declaration:
int getaddrinfo(const char *node, const char *service,
const struct addrinfo *hints,
struct addrinfo **res);
Here is a description of the arguments:
- node: a string of the address/domain name you wish to be resolved
- service: name of the service for the domain you're interested in, set to NULL for our usage
- hints: an addrinfo of "hints" describing the kinds of address information we are interested in related to the domain
- res: a new addrinfo structure will be allocated and the pointer will be referenced by res
On error, getaddrinfo() returns a non-zero error code, which gai_strerror() converts into a readable message. To catch errors you will use the following format style:
if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s)); // <---- converts the error into a message
    exit(1);
}
The next part of using getaddrinfo() properly is understanding the struct addrinfo, which has the following members:
struct addrinfo {
    int              ai_flags;
    int              ai_family;
    int              ai_socktype;
    int              ai_protocol;
    size_t           ai_addrlen;
    struct sockaddr *ai_addr;
    char            *ai_canonname;
    struct addrinfo *ai_next;
};
For our purposes, since we are only concerned with IPv4 internet addressing, we only need to focus on two fields: ai_family and ai_addr.
- ai_family: indicates the address family of this address, should always be set to AF_INET
- ai_addr: a socket address storing the resolved IP address
One peculiar aspect of a getaddrinfo() call beyond retrieving the results is that you must also specify hints about the kinds of addresses you are interested in. We are only interested in AF_INET address types, so we can always declare and use the hints option like so:
struct addrinfo hints;
//zero out the hints structure (look up memset for details)
memset(&hints,0,sizeof(struct addrinfo));
//set the ai_family field per our needs, AF_INET
hints.ai_family = AF_INET;
Now we have enough to write a hello-world program for resolving a domain name.
/* hello_getaddrinfo.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
int main(){
    char hostname[]="www.usna.edu"; //the hostname we are looking up
    struct addrinfo *result;        //to store results
    struct addrinfo hints;          //to indicate information we want
    struct sockaddr_in *saddr;      //to reference address
    int s;                          //for error checking
    memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
    hints.ai_family = AF_INET;      //we only want IPv4 addresses
    //Convert the hostname to an address
    if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
        fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
        exit(1);
    }
    //convert generic socket address to inet socket address
    saddr = (struct sockaddr_in *) result->ai_addr;
    //print the address
    printf("Hello %s\n", inet_ntoa(saddr->sin_addr));
    //free the addrinfo struct
    freeaddrinfo(result);
    return 0;
}
One thing to be careful about when dealing with getaddrinfo() is that the resulting address is stored in a struct sockaddr but we are only using struct sockaddr_in. Fortunately, since we only hinted towards AF_INET, we know that the sockaddr is actually a sockaddr_in and so we can cast:
struct sockaddr_in *saddr; //to reference address
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
Once we've cast the data to the right type, we can treat it like a sockaddr_in and get to the underlying in_addr:
//print the address
printf("Hello %s\n", inet_ntoa(saddr->sin_addr));
Last, but not least: getaddrinfo() allocates new memory to store the resulting addrinfo list. It must be freed with freeaddrinfo():
//free the addrinfo struct
freeaddrinfo(result);
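The result of getaddrinfo() is really a linked list, chained through the ai_next member, and a name may resolve to more than one address. The sketch below walks the whole list rather than looking only at the first entry. The helper name list_ipv4() is hypothetical, and setting ai_socktype in the hints is an added assumption to avoid getting one entry per socket type; it uses inet_ntop() in place of inet_ntoa() so that it writes into a caller-supplied buffer.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

//Hypothetical helper: print every IPv4 address found for host and copy
//the first one into out. Returns the number of addresses, or -1 on error.
int list_ipv4(const char *host, char *out, size_t outlen){
  struct addrinfo hints, *result, *p;
  int count = 0;

  memset(&hints, 0, sizeof(struct addrinfo)); //zero out hints
  hints.ai_family = AF_INET;                  //we only want IPv4 addresses
  hints.ai_socktype = SOCK_STREAM;            //assumption: one entry per address

  if(getaddrinfo(host, NULL, &hints, &result) != 0){
    return -1;
  }

  //walk the linked list of results
  for(p = result; p != NULL; p = p->ai_next){
    struct sockaddr_in *saddr = (struct sockaddr_in *) p->ai_addr;
    char text[INET_ADDRSTRLEN];

    if(inet_ntop(AF_INET, &saddr->sin_addr, text, sizeof(text)) == NULL){
      continue; //could not convert this entry, skip it
    }
    if(count == 0){
      snprintf(out, outlen, "%s", text); //remember the first address
    }
    printf("%s\n", text);
    count++;
  }

  freeaddrinfo(result); //frees the entire list
  return count;
}
```

A caller might declare char first[INET_ADDRSTRLEN] and call list_ipv4("www.usna.edu", first, sizeof(first)); a single freeaddrinfo() call releases every node in the list.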
8.2 Resolving IP addresses to IP addresses?
One nice thing about getaddrinfo() is that it can take a domain name or an IP address. If it finds that you've provided an IP address, it will not perform a lookup; it simply returns that address already set in the ai_addr of the results. For example:
/* hello_getaddrinfo_ip.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
int main(){
char hostname[]="10.4.32.41"; //<--- Not a hostname, but an IP address
struct addrinfo *result; //to store results
struct addrinfo hints; //to indicate information we want
struct sockaddr_in *saddr; //to reference address
int s; //for error checking
memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
hints.ai_family = AF_INET; //we only want IPv4 addresses
//Convert the hostname to an address
if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
exit(1);
}
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
//print the address
printf("Hello %s\n", inet_ntoa(saddr->sin_addr)); //<---- will print that IP address here
//free the addrinfo struct
freeaddrinfo(result);
}
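If you want to know whether a string is an IP address without ever triggering a name lookup, getaddrinfo() accepts the AI_NUMERICHOST flag in the hints. The sketch below is one possible use of that flag; the helper name is_ipv4_literal() is hypothetical and not part of any library.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

//Hypothetical helper: returns 1 if s is accepted as a numeric IPv4
//address, 0 otherwise. No name lookup is ever performed.
int is_ipv4_literal(const char *s){
  struct addrinfo hints, *result;

  memset(&hints, 0, sizeof(struct addrinfo)); //zero out hints
  hints.ai_family = AF_INET;                  //we only want IPv4 addresses
  hints.ai_flags = AI_NUMERICHOST;            //fail instead of resolving a name

  if(getaddrinfo(s, NULL, &hints, &result) != 0){
    return 0; //not a numeric address
  }
  freeaddrinfo(result);
  return 1;
}
```

With this flag set, a hostname such as www.usna.edu produces an error from getaddrinfo() rather than a lookup, which is what lets the helper distinguish the two cases.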
8.3 Network Byte Order and Ports
The last part of the puzzle for addressing is the port number. Recall the sockaddr_in structure:
struct sockaddr_in {
short sin_family; //address family, set to AF_INET
unsigned short sin_port; //the port in network byte order
struct in_addr sin_addr; //the inet address
};
You see that there is a member, sin_port, and your instinct would be to set that directly. For example, if we wanted to contact www.usna.edu on port 80, after doing getaddrinfo(), we'd do something like this:
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
saddr->sin_port = 80; //<-- setting port in host order!
It turns out that this DOES NOT work, and it doesn't because of a fundamental problem with data representation. Let's consider the number 80 as it is represented in bits:
pow:  128  64  32  16   8   4   2   1
----------------------------------
bits:   0   1   0   1   0   0   0   0
The above makes sense: 80 in base 10 is 01010000 in base 2. If we look more closely, we see that the bit furthest to the left is most significant, or has the highest value, 128. If you think a bit more about it, that seems like an arbitrary choice: couldn't the bit furthest to the right be most significant instead? Then we would have the following representation:
pow:    1   2   4   8  16  32  64 128
----------------------------------
bits:   0   0   0   0   1   0   1   0
Now we find that 80 in base 10 is written 00001010. While this might seem a bit awkward, it really is no better nor worse than the other choice. The same question comes up one level higher, for the order in which the bytes of a multi-byte value (such as a two-byte port number) are stored in memory, and in the early days of computing there was a holy war about which order should be used.
The term used for the ordering of bytes is endianness, and there are two camps: big endian and little endian.
- Little Endian : The least significant byte is stored at the smallest address.
- Big Endian : The most significant byte is stored at the smallest address.
In most modern computing systems, little endian is king, but back when the internet was designed and the equipment put in place, it was not clear what data representation should be preferred. As such, the network is big endian, or more precisely routing information is stored in network byte order while data on the host is stored in host byte order.
To facilitate this use, all Unix systems implement a set of conversion functions:
- htons() : convert data from host order to network order for a short (or two-byte) type
- ntohs() : convert data from network order to host order for a short (or two-byte) type
There are also htonl() and ntohl() for four-byte (32-bit) values. On big endian machines host and network order are the same, so these functions do nothing, but it is good practice to always convert accordingly.
Now, finally, returning to the setting of the port, we see that we must actually do it like so:
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
saddr->sin_port = htons(80); //<-- setting port in network byte order!!!
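To see what htons() actually does, it can help to look at the individual bytes in memory. The sketch below is illustrative only: the helper names port_wire_bytes() and host_is_little_endian() are made up for this example. It copies the two bytes of a converted port into an array and checks which byte the host stores first.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

//Hypothetical helper: store the two bytes of port, as laid out in
//memory after htons(), into out[0] and out[1]
void port_wire_bytes(uint16_t port, unsigned char out[2]){
  uint16_t net = htons(port); //host order -> network order
  memcpy(out, &net, sizeof(net));
}

//Hypothetical helper: returns 1 if this host stores the least
//significant byte at the smallest address (little endian), else 0
int host_is_little_endian(void){
  uint16_t one = 1;
  unsigned char first;
  memcpy(&first, &one, 1); //look at the byte at the smallest address
  return first == 1;
}
```

For port 80 (0x0050), the converted value is laid out as the bytes 0x00 then 0x50 on any host, because network byte order puts the most significant byte first. On a little endian host the unconverted value would be laid out as 0x50 then 0x00, which the network would read as port 20480.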
9 Socket System Call
In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.
The system call to open a socket is socket():
int socket(int domain, int type, int protocol);
The arguments can be described as follows:
- domain : the addressing domain of the socket, which for our purposes is an internet socket, or AF_INET
- type : the type of socket, describing which transport protocol we are using. HTTP traffic is over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the keyword would be SOCK_DGRAM
- protocol : additional information about the protocol of the socket, but we will not need to use this option, so we'll set it to 0.
The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, such as connect(), read(), and write(), require that integer file descriptor. Today we will focus just on the client side operations of a socket.
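As a small illustration of the type argument, the sketch below opens a socket of a requested type and then asks the OS what type it actually is, using getsockopt() with SO_TYPE. The helper names open_inet_socket() and socket_type_of() are hypothetical, and querying SO_TYPE is an addition for demonstration, not something a client normally needs to do.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

//Hypothetical helper: open an AF_INET socket of the given type
//(SOCK_STREAM for TCP, SOCK_DGRAM for UDP); returns -1 on failure
int open_inet_socket(int type){
  return socket(AF_INET, type, 0);
}

//Hypothetical helper: ask the OS which type a socket was opened with;
//returns the type, or -1 if fd is not a valid socket
int socket_type_of(int fd){
  int type;
  socklen_t len = sizeof(type);

  if(getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0){
    return -1;
  }
  return type;
}
```

Note that neither socket is connected to anything yet; opening a socket only reserves the file descriptor and records the domain and type.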
9.1 Client Sockets
A client socket's goal is to connect to a foreign address where a server socket is listening. Visually, we think of this process like so.
A new socket is opened using the socket() system call, but the socket being open doesn't mean it is connected to anything. To do that, you use the connect() system call, which takes as input a given address, an IP-port pair. Once connected with the remote server, the client can read and write to that socket to participate in an application layer protocol. The connect system call takes the following arguments:
int connect(int socket, const struct sockaddr *address, socklen_t address_len);
Generally, given a socket socket and a socket address address, connect() tries to connect the socket to that foreign address. Note that the socket address is a generic socket address (struct sockaddr), which is not necessarily an IP socket address (struct sockaddr_in), so you'll need to cast:
int sock;
struct sockaddr_in saddr_in;
//fill in the address for usna.edu
saddr_in.sin_family = AF_INET;
inet_aton("10.4.32.41", &(saddr_in.sin_addr));
saddr_in.sin_port = htons(80);
//open a socket
if( (sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
perror("socket");
exit(1);
}
//connect socket to the server
if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(struct sockaddr_in)) < 0){
perror("connect");
exit(1);
}
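The steps above can be bundled into one function that either returns a connected socket or cleans up after itself. This is only a sketch: the name connect_ipv4() is hypothetical, and returning -1 instead of calling exit() is a design choice made here so the caller can decide how to handle the failure.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

//Hypothetical helper: open a TCP socket and connect it to ip:port.
//Returns the connected socket, or -1 on any failure.
int connect_ipv4(const char *ip, unsigned short port){
  struct sockaddr_in saddr_in;
  int sock;

  //fill in the address
  memset(&saddr_in, 0, sizeof(saddr_in));
  saddr_in.sin_family = AF_INET;
  saddr_in.sin_port = htons(port); //network byte order
  if(inet_aton(ip, &saddr_in.sin_addr) == 0){
    return -1; //ip was not a valid dotted quad
  }

  //open a socket
  if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    return -1;
  }

  //connect socket to the server
  if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
    close(sock); //don't leak the file descriptor
    return -1;
  }
  return sock;
}
```

A caller would write something like sock = connect_ipv4("10.4.32.41", 80) and check for a negative result before reading or writing; errno is still set by whichever system call failed.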
9.2 Socket I/O
Sockets are file descriptors, and so we use the same interface to read and write from them as we did for other kinds of file descriptors, like open files and pipes. That interface is read() and write(), and should be very familiar to you now.
//read from socket and write to stdout
while( (n = read(sock, buf, BUF_SIZE)) > 0){
if( write(1, buf, n) < 0){ //1 is the file descriptor for stdout
perror("write");
exit(1);
}
}
if( n < 0 ){
perror("read");
exit(1);
}
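One detail worth keeping in mind is that write() is allowed to write fewer bytes than requested, which is more likely on a socket than on a regular file. A common way to deal with this is a loop that keeps writing until everything is sent. The sketch below shows one such loop; the name write_all() is hypothetical.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

//Hypothetical helper: keep calling write() until all len bytes of buf
//have been written to fd. Returns 0 on success, -1 on error.
int write_all(int fd, const char *buf, size_t len){
  size_t sent = 0;

  while(sent < len){
    ssize_t n = write(fd, buf + sent, len - sent);
    if(n < 0){
      if(errno == EINTR){
        continue; //interrupted by a signal before writing, try again
      }
      return -1;  //a real error
    }
    sent += (size_t) n; //n may be smaller than what we asked for
  }
  return 0;
}
```

Because it only relies on write(), the same helper works on sockets, pipes, and files alike.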
9.3 Putting it all together
A typical client program we might want to write is one that can connect to a web server and download the web page, i.e., the HTML. Here is such a program:
/*get_usna.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
int main(){
char hostname[]="usna.edu"; //the hostname we are looking up
short port=80; //the port we are connecting on
struct addrinfo *result; //to store results
struct addrinfo hints; //to indicate information we want
struct sockaddr_in *saddr_in; //socket internet address
int s,n; //for error checking
int sock; //socket file descriptor
char request[]="GET /index.html\r\n"; //the GET request, terminated by CR LF
char response[4096]; //read in 4096 byte chunks
//setup our hints
memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
hints.ai_family = AF_INET; //we only want IPv4 addresses
//Convert the hostname to an address
if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
exit(1);
}
//convert generic socket address to inet socket address
saddr_in = (struct sockaddr_in *) result->ai_addr;
//set the port in network byte order
saddr_in->sin_port = htons(port);
//open a socket
if( (sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
perror("socket");
exit(1);
}
//connect to the server
if(connect(sock, (struct sockaddr *) saddr_in, sizeof(*saddr_in)) < 0){
perror("connect");
exit(1);
}
//send the request
if(write(sock,request,strlen(request)) < 0){
perror("send");
}
//read the response until EOF
while( (n = read(sock, response, 4096)) > 0){
//write response to stdout
if(write(1, response, n) < 0){
perror("write");
exit(1);
}
}
if (n<0){
perror("read");
}
//free the resolved address and close the socket
freeaddrinfo(result);
close(sock);
return 0; //success
}
Following the client program, we can see that first we resolve the hostname into an IP address, storing that information in the socket address saddr_in. Next we open a new TCP socket, connect to the server, send the request, and then read all the data from the server, printing the output to the terminal.
10 Server Sockets
Server sockets are much like client sockets, except instead of connecting they accept incoming connections. Accepting an incoming connection generates a new socket for that client, which is used for all further communication with it. The server socket remains and can accept more incoming connections on the listening port.
Before a connection can be accepted, there is some setup, which we can trace through the server socket life cycle below:
10.1 Binding a Socket: bind()
The first step in establishing a server socket is to bind the socket to a given IP address and port. This way the O.S. knows what port the socket is listening on. The bind() system call has the following description:
int bind(int socket, const struct sockaddr *address, socklen_t address_len);
The arguments can be interpreted as follows:
- socket : an open socket from socket()
- address : a reference to a socket address to be bound; in our case, this will reference a sockaddr_in with an IP address and port in the AF_INET family.
- address_len : the size of the socket address, which will be sizeof(struct sockaddr_in)
It returns 0 on success and -1 on error. We can see how this functions in code with the following example:
char hostname[]="127.0.0.1"; //localhost ip address to bind to or
//you can use INADDR_ANY if you don't
//care which IP is bound to
short port=1845; //the port we are to bind to
int server_sock; //file descriptor for the server socket
struct sockaddr_in saddr_in; //socket internet address of server
//set up the address information
saddr_in.sin_family = AF_INET;
inet_aton(hostname, &saddr_in.sin_addr); // OR
// saddr_in.sin_addr.s_addr=INADDR_ANY;
// (if you don't care about
// binding to a particular
// address)
saddr_in.sin_port = htons(port);
//open a socket
if( (server_sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
perror("socket");
exit(1);
}
//bind the socket
if(bind(server_sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
perror("bind");
exit(1);
}
Note that we are binding to the IP address 127.0.0.1, which is a special IP address referring to the local machine. You could also use the domain name localhost and perform a lookup with getaddrinfo().
However, if you don't care about which IP address to bind to, there is a special value, INADDR_ANY, that you can use, which instructs the OS to accept connections on any of the machine's IP addresses. The addressing simplifies to something like below:
saddr_in.sin_family = AF_INET;
saddr_in.sin_addr.s_addr = INADDR_ANY;
saddr_in.sin_port = htons(1845);
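After a successful bind(), you can ask the OS which address a socket ended up with using getsockname(). The sketch below binds with INADDR_ANY and reports the bound port. Two things here are assumptions beyond the text above: the helper name bind_any() is hypothetical, and it allows passing port 0, which asks the OS to pick any free port and is handy for experiments where a fixed port such as 1845 might already be taken.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

//Hypothetical helper: open a TCP socket and bind it to port on any
//local address (port 0 lets the OS choose). On success, stores the
//port actually bound in *bound_port and returns the socket; else -1.
int bind_any(unsigned short port, unsigned short *bound_port){
  struct sockaddr_in saddr_in;
  socklen_t len = sizeof(saddr_in);
  int sock;

  if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    return -1;
  }

  //set up the address information
  memset(&saddr_in, 0, sizeof(saddr_in));
  saddr_in.sin_family = AF_INET;
  saddr_in.sin_addr.s_addr = htonl(INADDR_ANY);
  saddr_in.sin_port = htons(port);

  //bind, then read back the address the OS recorded for the socket
  if(bind(sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0 ||
     getsockname(sock, (struct sockaddr *) &saddr_in, &len) < 0){
    close(sock);
    return -1;
  }

  *bound_port = ntohs(saddr_in.sin_port); //back to host order
  return sock;
}
```

Calling bind_any(1845, &port) mirrors the example above; calling bind_any(0, &port) leaves the choice to the OS and reports it back in port.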
10.2 Queuing incoming connections: listen()
Once you've bound the socket to an IP address and port, you must still indicate to the operating system that this socket is a server socket. The system call that does this is called listen() and has the following function description:
int listen(int socket, int backlog);
The argument socket is the server socket file descriptor, but the argument backlog requires a bit more explanation. As you will see below, to establish a connection with a client, accept() must be called, but this doesn't happen immediately. There is a period of limbo between when the incoming connection is recognized and when accept() is called to establish the connection. Further, many incoming connections can occur at the same time, and the operating system has limited resources to queue up client connections prior to accept(). The backlog argument indicates to the OS how many incoming connections should be allowed to queue prior to accept() before it starts rejecting connections. A typical value for backlog is 5, but higher and lower values are acceptable.
Here is an example of listen() in the running example:
//ready to listen, queue up to 5 pending connections
if(listen(server_sock, 5) < 0){
perror("listen");
exit(1);
}
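One way to convince yourself that listen() really changes the state of the socket is to query the SO_ACCEPTCONN socket option before and after the call. The sketch below does that; the helper names open_bound_loopback() and is_listening() are hypothetical, and binding to port 0 (let the OS pick a free port) is an assumption made so the example does not depend on port 1845 being free.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

//Hypothetical helper: open a TCP socket bound to 127.0.0.1 on a port
//chosen by the OS; returns the socket, or -1 on error
int open_bound_loopback(void){
  struct sockaddr_in saddr_in;
  int sock;

  if((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    return -1;
  }
  memset(&saddr_in, 0, sizeof(saddr_in));
  saddr_in.sin_family = AF_INET;
  saddr_in.sin_addr.s_addr = htonl(INADDR_LOOPBACK); //127.0.0.1
  saddr_in.sin_port = htons(0);                      //OS picks the port
  if(bind(sock, (struct sockaddr *) &saddr_in, sizeof(saddr_in)) < 0){
    close(sock);
    return -1;
  }
  return sock;
}

//Hypothetical helper: returns 1 if listen() has been called on sock,
//0 if it has not, and -1 on error
int is_listening(int sock){
  int val = 0;
  socklen_t len = sizeof(val);

  if(getsockopt(sock, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0){
    return -1;
  }
  return val != 0;
}
```

On a freshly bound socket is_listening() reports 0; after listen(sock, 5) succeeds it reports 1, and only then will the OS queue incoming connections for it.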
10.3 Accepting Incoming Connections: accept()
Finally, everything is in place to accept a connection, and to do that we use the accept() system call. It has the following function description:
int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);
The arguments to accept are as follows:
- socket : the server socket that you have bound and established as a listener.
- address : a reference to a socket address structure that is filled in with the address of the client. Since we are using AF_INET sockets, you can pass a reference to a struct sockaddr_in and cast appropriately.
- address_len : a pointer to the size of the address. This is necessary because, as we learned already, not all socket addresses are the same size, but since we are using AF_INET we know that this will be sizeof(struct sockaddr_in)
When a client connection has been accepted properly, the return value of accept() is a file descriptor for a new socket that we can use to communicate with the client. The server socket remains because we might want to use that to accept other incoming connections.
Here is an example in code:
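int client_sock;
struct sockaddr_in client_saddr_in; //filled in with the client's address
socklen_t saddr_len = sizeof(struct sockaddr_in); //size of the address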
//accept incoming connections
if((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) < 0){
perror("accept");
exit(1);
}
printf("Connection From: %s:%d (%d)\n",
inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
ntohs(client_saddr_in.sin_port), //the port in host order
client_sock); //the file descriptor number
You'll note that the address printed for the new connection is different from the address of the server socket. This makes sense: accept() fills in the address of the client, and the port in that address was chosen by the client's operating system when the client connected. On the server's side, the new socket still uses the server's IP address and listening port. The two do not collide because the OS identifies each TCP connection by the combination of both endpoints (client address, client port, server address, server port), so many clients can be connected to the same server port at once. The server port, the one that accepts connections, is the only port that must be known and declared in advance. We will see more examples of this later.
Another important thing to note about the server socket is that accepting an incoming connection is a blocking operation. That means accept() will not return until a new connection arrives. This fact becomes quite a challenge when developing server programs that wish to provide service to multiple clients, and we will explore different techniques for achieving multi-client services.
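A quick experiment can show which ports are involved once a connection has been accepted. The sketch below connects a client socket to a listening socket inside a single process over loopback, then uses getsockname() and getpeername() to report the ports at each end. All helper names are hypothetical, and binding to port 0 (OS-chosen port) is an assumption made to keep the example independent of any fixed port.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

//local port of a socket in host order, or 0 on error
unsigned short local_port(int sock){
  struct sockaddr_in a;
  socklen_t len = sizeof(a);
  if(getsockname(sock, (struct sockaddr *) &a, &len) < 0){ return 0; }
  return ntohs(a.sin_port);
}

//port of the remote end of a connected socket, or 0 on error
unsigned short peer_port(int sock){
  struct sockaddr_in a;
  socklen_t len = sizeof(a);
  if(getpeername(sock, (struct sockaddr *) &a, &len) < 0){ return 0; }
  return ntohs(a.sin_port);
}

//Connect to ourselves: fds[0] is the listening socket, fds[1] the
//client socket, fds[2] the socket returned by accept(). Returns 0 on
//success, -1 on error (a sketch: no cleanup on the error paths).
int loopback_connection(int fds[3]){
  struct sockaddr_in a;
  socklen_t len = sizeof(a);

  memset(&a, 0, sizeof(a));
  a.sin_family = AF_INET;
  a.sin_addr.s_addr = htonl(INADDR_LOOPBACK); //127.0.0.1
  a.sin_port = htons(0);                      //let the OS pick a port

  if((fds[0] = socket(AF_INET, SOCK_STREAM, 0)) < 0){ return -1; }
  if(bind(fds[0], (struct sockaddr *) &a, sizeof(a)) < 0){ return -1; }
  if(listen(fds[0], 5) < 0){ return -1; }
  //learn the chosen port; a now holds the full server address
  if(getsockname(fds[0], (struct sockaddr *) &a, &len) < 0){ return -1; }

  if((fds[1] = socket(AF_INET, SOCK_STREAM, 0)) < 0){ return -1; }
  //succeeds while the connection waits in the listen() backlog
  if(connect(fds[1], (struct sockaddr *) &a, sizeof(a)) < 0){ return -1; }
  if((fds[2] = accept(fds[0], NULL, NULL)) < 0){ return -1; }
  return 0;
}
```

Comparing local_port(fds[2]) with local_port(fds[0]) shows that the accepted socket keeps the listening port on the server side, while peer_port(fds[2]) matches local_port(fds[1]), the port the OS assigned to the client.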
10.4 Communicating with the Client: read()/write()/close()
Once we have the connection established with the client, the new client socket, from accept(), is how we communicate. Recall that a socket is just a file descriptor, so we can use the standard read() and write() operations on the socket to send data to and receive data from the client. At this point, these procedures should be familiar to you:
//read from client
if((n = read(client_sock,response, BUF_SIZE-1)) < 0){
perror("read");
exit(1);
}
response[n] = '\0'; //NULL terminate string
printf("Read from client: %s", response);
//construct response
snprintf(response, BUF_SIZE, "Hello %s:%d \nGo Navy! Beat Army\n",
inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
ntohs(client_saddr_in.sin_port)); //the port in host order
printf("Sending: %s",response);
//send response
if(write(client_sock, response, strlen(response)) < 0){
perror("write");
exit(1);
}
printf("Closing socket\n\n");
//close client socket
close(client_sock);
//close the server socket
close(server_sock);
Once all the operations are over, the act of closing the client socket will bring down the connection with the client. Closing the server socket stops the listening process.
10.5 Putting it all together
With all the pieces in place, we can connect the client and server socket procedures and see how the two interfaces interact:
And the entirety of the hello server program:
/*hello_server.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#define BUF_SIZE 4096
int main(){
char hostname[]="127.0.0.1"; //localhost ip address to bind to
short port=1845; //the port we are to bind to
struct sockaddr_in saddr_in; //socket internet address of server
struct sockaddr_in client_saddr_in; //socket internet address of client
socklen_t saddr_len = sizeof(struct sockaddr_in); //length of address
int server_sock, client_sock; //socket file descriptor
char response[BUF_SIZE]; //what to send to the client
int n; //length measure
//set up the address information
saddr_in.sin_family = AF_INET;
inet_aton(hostname, &saddr_in.sin_addr);
saddr_in.sin_port = htons(port);
//open a socket
if( (server_sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
perror("socket");
exit(1);
}
//bind the socket
if(bind(server_sock, (struct sockaddr *) &saddr_in, saddr_len) < 0){
perror("bind");
exit(1);
}
//ready to listen, queue up to 5 pending connections
if(listen(server_sock, 5) < 0){
perror("listen");
exit(1);
}
saddr_len = sizeof(struct sockaddr_in); //length of address
printf("Listening On: %s:%d\n", inet_ntoa(saddr_in.sin_addr), ntohs(saddr_in.sin_port));
//accept incoming connections
if((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) < 0){
perror("accept");
exit(1);
}
printf("Connection From: %s:%d (%d)\n",
inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
ntohs(client_saddr_in.sin_port), //the port in host order
client_sock); //the file descriptor number
//read from client
if((n = read(client_sock,response, BUF_SIZE-1)) < 0){
perror("read");
exit(1);
}
response[n] = '\0'; //NULL terminate string
printf("Read from client: %s", response);
//construct response
snprintf(response, BUF_SIZE, "Hello %s:%d \nGo Navy! Beat Army\n",
inet_ntoa(client_saddr_in.sin_addr), //address as dotted quad
ntohs(client_saddr_in.sin_port)); //the port in host order
printf("Sending: %s",response);
//send response
if(write(client_sock, response, strlen(response)) < 0){
perror("write");
exit(1);
}
printf("Closing socket\n\n");
//close client_sock
close(client_sock);
//close the socket
close(server_sock);
return 0; //success
}
And we can see a few runs of the program. In one terminal we have our small server program running and in the other we have a netcat client running.
Server:
#> ./hello_server
Listening On: 127.0.0.1:1845
Connection From: 127.0.0.1:58740 (4)
Read from client: hello
Sending: Hello 127.0.0.1:58740
Go Navy! Beat Army
Closing socket
Client:
#> netcat localhost 1845
hello
Hello 127.0.0.1:58740
Go Navy! Beat Army
Note that the client's end of the connection is using port 58740, a port chosen by the client's OS, which is not the same as the port the server socket is listening on, 1845.
11 Handling Multiple Incoming Connections of Server Sockets
Now that we understand how to set up a server, let's consider how we might be able to handle multiple incoming connections. This is a common task for servers: since every client has its own client socket, it is only natural to serve multiple clients at the same time. However, this is more complicated than it might seem at first because of blocking operations. Let's first explore the perils of blocking operations and how to overcome this challenge. In a later lesson, we'll see another method for handling multiple clients that uses threads, which is simpler but comes with its own challenges.
11.1 Challenge of blocking
Let's consider an improvement to our server: instead of just responding with a token phrase, it echoes back whatever is sent to it until the client closes the connection. Further, we'd like to be able to serve multiple clients. To start, we could simply change the program logic to look like this:
//accept incoming connections in a loop
while((client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len)) > 0){
printf("Connection From: %s:%d (%d)\n",
inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port),
client_sock);
//echo loop, break when read 0 or error
while((n = read(client_sock, response, BUF_SIZE-1)) > 0){
response[n] = '\0' ; //NULL terminate
printf("Received From: %s:%d (%d): %s\n", //LOGGING
inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port),
client_sock,
response);
if(write(client_sock, response, n) < 0){
perror("write");
break;
}
}
if( n < 0){
perror("read");
}
printf("Client Closed: %s:%d (%d)\n", //LOGGING
inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port),
client_sock);
//close client socket
close(client_sock);
//reset socket len just in case
saddr_len = sizeof(struct sockaddr_in); //length of address
}
Essentially, we've placed the server's accepting of incoming connections in a loop. When there is a connection, anything that is written from the client is echoed back. If the client closes the connection, so does the server. We can see that happening with a simple example:
Server:
#> ./echo_server
serer sock listening: (3)
Connection From: 127.0.0.1:59088 (4)
Received From: 127.0.0.1:59088 (4): testing client #1
Received From: 127.0.0.1:59088 (4): who are you?
Client:
#>netcat localhost 1845
testing client #1
testing client #1
who are you?
who are you?
However, if the first client keeps the connection open when the next client connects, what happens?
#>nc localhost 1845
testing
why am I not getting an echo?
There is no echo response. That's because the server is blocking while attempting to read() from the first client, and the second connection has not yet been accepted with accept(). The connection is queued, so when the first client closes the socket, the expected response is provided, but we'd like all of this to occur simultaneously. In this way, a server can provide services to multiple clients.
11.2 Identifying Readable File Descriptors with select()
There are a few ways to solve this problem. One is to set all the socket file descriptors to non-blocking, which is a possibility, and we've seen how to do this with fcntl() and pipes. But it can overly complicate the code.
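As a reminder of what the non-blocking route looks like, here is a minimal sketch using fcntl(). The helper name set_nonblocking() is hypothetical. Once the flag is set, a read() with nothing to read returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of waiting.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

//Hypothetical helper: add O_NONBLOCK to the flags of fd.
//Returns 0 on success, -1 on error.
int set_nonblocking(int fd){
  int flags = fcntl(fd, F_GETFL, 0); //fetch the current flags
  if(flags < 0){
    return -1;
  }
  //keep the existing flags and add O_NONBLOCK
  if(fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0){
    return -1;
  }
  return 0;
}
```

The complication mentioned above comes from what follows: every read(), write(), and accept() on such a descriptor must be prepared to see EAGAIN and try again later, which tends to turn simple code into a polling loop.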
Instead, we need a way to check whether a file descriptor is ready to be used (e.g., read, written, or accepted) so that the operation will not block and will return immediately. The simplest method for testing the readiness of a file descriptor is to use select().
select() is a system call that, given a set of file descriptors, will allow you to iterate over the file descriptors that are ready for reading (or writing). The interface for select has a few functions, but it's easy to see their use with an example.
/*echo_server.c*/
fd_set select_set; //stores interested file descriptors
FD_ZERO(&select_set); //clear the set
FD_SET(fd, &select_set); //add fd to the set
//add other file descriptors
//block until at least one file descriptor in the set is ready for reading;
//on return, select_set holds only the file descriptors that are ready
if( select(FD_SETSIZE, &select_set, NULL, NULL, NULL) < 0){
perror("select");
exit(1);
}
//check for activity on all file descriptors
for(i=0; i < FD_SETSIZE; i++){
//was the file descriptor i set?
if(FD_ISSET(i, &select_set)){
//i is the file descriptor number
read( i, buf, BUF_SIZE);
//etc.
FD_CLR(i, &select_set); //remove file descriptor i from the set
}
}
First, a set of file descriptors must be declared; this is of type fd_set. After initialization with FD_ZERO(), interested file descriptors can be added to the set using FD_SET(). Once all the file descriptors are provided, the select() system call will check all the file descriptors in the set to see if they are ready for an action, like a read(). Finally, you can iterate through all the file descriptor numbers, checking if the file descriptor was selected with FD_ISSET() and, if so, doing some action. A file descriptor can be removed from the set with FD_CLR().
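The same macros can be wrapped into a small helper that answers a single question: is this one file descriptor readable yet? The sketch below is an illustration under a few assumptions not covered above: the name wait_readable() is hypothetical, it passes a timeout as the last argument of select() so the call gives up after a while instead of waiting forever, and it passes fd + 1 as the first argument so that only descriptors up to fd are examined.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

//Hypothetical helper: wait up to timeout_ms milliseconds for fd to
//become readable. Returns 1 if readable, 0 on timeout, -1 on error.
//fd must be smaller than FD_SETSIZE.
int wait_readable(int fd, int timeout_ms){
  fd_set readfds;
  struct timeval tv;

  FD_ZERO(&readfds);    //clear the set
  FD_SET(fd, &readfds); //add fd to the set

  tv.tv_sec = timeout_ms / 1000;           //whole seconds
  tv.tv_usec = (timeout_ms % 1000) * 1000; //remainder in microseconds

  //the first argument is one more than the highest descriptor to check
  if(select(fd + 1, &readfds, NULL, NULL, &tv) < 0){
    return -1;
  }
  //select() leaves fd in the set only if it is ready
  return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Because select() works on any file descriptor, the helper can be tried out on a pipe: it reports 0 while the pipe is empty and 1 as soon as something has been written to the other end.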
The specifics of how this works are not important here. The key takeaway is that we can now check if a file descriptor needs an action before taking said action: we are avoiding blocking! Let's see how we would use select() with our echo server:
/* echo_server_select.c*/
fd_set activefds, readfds;
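//server setup and etc.
FD_ZERO(&activefds); //start with an empty set
FD_SET(server_sock, &activefds); //watch the server socket for incoming connections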
while(1){ //loop
//update the set of selectable file descriptors
readfds = activefds;
//Perform a select
if( select(FD_SETSIZE, &readfds, NULL, NULL, NULL) < 0){
perror("select");
exit(1);
}
//check for activity on all file descriptors
for(i=0; i < FD_SETSIZE; i++){
//was the file descriptor i set?
if(FD_ISSET(i, &readfds)){
if(i == server_sock){ //activity on server socket, incoming connection
//accept incoming connections = NON BLOCKING
client_sock = accept(server_sock, (struct sockaddr *) &client_saddr_in, &saddr_len);
printf("Connection From: %s:%d (%d)\n", inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port), client_sock);
//add socket file descriptor to set
FD_SET(client_sock, &activefds);
}else{
//otherwise client socket sent something to us
client_sock = i;
//get the address of the socket
getpeername(client_sock, (struct sockaddr *) &client_saddr_in, &saddr_len);
//read from client and echo back
n = read(client_sock, response, BUF_SIZE-1);
if(n <= 0){ //closed or error on socket
//close client socket
close(client_sock);
//remove file descriptor from set
FD_CLR(client_sock, &activefds);
printf("Client Closed: %s:%d (%d)\n", //LOG
inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port),
client_sock);
}else{ //client sent a message
response[n] = '\0'; //NULL terminate
//echo message to client
write(client_sock, response, n);
printf("Received From: %s:%d (%d): %s", //LOG
inet_ntoa(client_saddr_in.sin_addr),
ntohs(client_saddr_in.sin_port),
client_sock, response);
}
}
}
}
}
That's a lot to take in, but here are a few key points. First off, the accept() call is also a blocking operation, so we use select() to determine if we have an incoming connection before calling it. We still need to check for clients closing on every read, which increases the complexity of the code. In the end, though, it works.
But, it's not clean. It lacks a certain something. What's not quite right about this code block is that we like to think of the process of handling a client a lot more like the while loop from echo_server and not the while loop from echo_server_select. What we want is a way to parallelize the process so we can write a simple bit of code that handles every client connection the same way and let that code run in parallel with the code accepting connections. That's exactly what we'll look at next when we investigate threading and how threading can be used for socket server programming.