Lec 26: Socket Addressing and Client Socket Programming
1 Sockets
In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.
The system call to open a socket is socket():
int socket(int domain, int type, int protocol);
The arguments can be described as follows:
- domain: the addressing domain of the socket, which for our purposes is an internet socket, AF_INET.
- type: the type of socket, describing which transport protocol we are using. HTTP traffic is carried over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the type would be SOCK_DGRAM.
- protocol: additional information about the protocol of the socket. We will not need this option, so we'll set it to 0.
The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, such as connect(), read(), write(), bind(), and accept(), require that integer file descriptor.
2 Addressing
Before we can dive into socket programming, we have to deal with network addressing. Unfortunately, in C, addressing of sockets can be frustrating because, depending on the protocol, an address can be of different lengths. On the surface, this might not seem like an issue, but it is HUGE for C programs because we deal with fixed-size buffers. The difference between a 4-byte and an 8-byte or 16-byte address is substantial.
At the same time, we also want our programs, even with different address lengths, to look the same, use roughly the same structure types, and function in the same way. The result of these needs is that addressing of sockets in C can be complicated, requiring many forms of casting until, finally, you reach the type you're looking for. In between, you may find yourself thinking, "… these are not the types you're looking for."
To simplify our discussion, we'll only cover addressing for IPv4, which uses a 4-byte address space. The address family constant for IPv4 is AF_INET, which you will see used throughout.
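To make the point about address lengths concrete, a short sketch can report the sizes involved. IPv6 (struct in6_addr) is not otherwise covered in these notes; it appears here only for contrast, and the helper names are hypothetical. INET_ADDRSTRLEN and INET6_ADDRSTRLEN are the standard buffer sizes, including the terminating '\0', for an address printed as text.

```c
#include <netinet/in.h>  /* struct in_addr, struct in6_addr, INET_ADDRSTRLEN, INET6_ADDRSTRLEN */
#include <stddef.h>      /* size_t */

/* Bytes needed to hold one raw IPv4 address. */
size_t ipv4_addr_len(void) { return sizeof(struct in_addr); }

/* Bytes needed to hold one raw IPv6 address. */
size_t ipv6_addr_len(void) { return sizeof(struct in6_addr); }

/* Buffer size for an IPv4 address as text, e.g. "255.255.255.255" plus '\0'. */
size_t ipv4_text_len(void) { return INET_ADDRSTRLEN; }

/* Buffer size for the longest IPv6 address as text, plus '\0'. */
size_t ipv6_text_len(void) { return INET6_ADDRSTRLEN; }
```

These return 4, 16, 16, and 46 respectively, so a fixed-size buffer chosen for IPv4 cannot simply be reused for IPv6.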
2.1 Glossary of Structures and Functions
There will be many new structures and functions we'll use in this lesson. Below is a quick overview of each of the structures you need, with a brief description; more detailed discussion follows.
- AF_INET: the address family value for IPv4 addresses.
- struct in_addr: type for storing an internet address. It has a single member, s_addr, which is an unsigned integer that stores the 4 bytes of an IPv4 address.
- struct sockaddr: a generic socket address structure that is used for all addressing types. Often, we'll cast these to a struct sockaddr_in, since we are only concerned with IPv4 addresses.
- struct sockaddr_in: a specific socket address that stores IP and port information. It has 3 relevant members:
  - short sin_family: the address family, which should be AF_INET.
  - unsigned short sin_port: the port number in network byte order, converted using htons().
  - struct in_addr sin_addr: a struct in_addr that stores the IP address itself.
- struct addrinfo: a structure used by getaddrinfo() to store address information and hints. Its relevant members are:
  - int ai_family: the address family, which should be AF_INET for us.
  - struct sockaddr *ai_addr: the socket address returned after a query.
Here is a glossary of library/system calls we will use and their purpose:
- inet_ntoa(): convert a struct in_addr to a dotted quad string: "Network-to-ASCII"
- inet_aton(): convert a string holding an IP address in dotted quad notation into a struct in_addr: "ASCII-to-Network"
- getaddrinfo(): convert a domain name into an IP address, stored in the ai_addr member (a struct sockaddr) of the resulting struct addrinfo
- htons(): convert a short stored in host byte order into network byte order: "Host-to-Network"
- ntohs(): convert a short stored in network byte order into host byte order: "Network-to-Host"
3 Storing an IP address in struct in_addr
The structure that stores a 32-bit IPv4 address is struct in_addr:
//uint32_t is an unsigned 32-bit integer (an unsigned int on most platforms)
typedef uint32_t in_addr_t;
struct in_addr {
  in_addr_t s_addr;
};
The in_addr structure has one member, s_addr, whose type is uint32_t, which is just a fancy way of saying an unsigned 32-bit integer (an unsigned int on most platforms). Let's look at an example of using the in_addr structure to store an IP address:
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h> //for struct in_addr
#include <arpa/inet.h>  //for inet_ntoa()

int main(){
  //in_addr struct has a single member s_addr
  struct in_addr addr;
  unsigned char * ip;

  //have ip point to s_addr
  ip = (unsigned char *) &(addr.s_addr);

  //set the bytes for "10.4.32.41"
  ip[0]=10;
  ip[1]=4;
  ip[2]=32;
  ip[3]=41;

  //print it out
  printf("Hello %s\n", inet_ntoa(addr));
}
Recall, from our conversations on types and casting, that this line:
ip = (unsigned char *) &(addr.s_addr);
sets the pointer ip to reference the address stored in s_addr. Then we can set the bytes directly, one at a time, since ip is an unsigned char pointer. At the end, we print out the address in dotted quad notation using the function inet_ntoa(), which stands for network-to-ASCII: it turns the network representation into a printable string.
This is clearly a very cumbersome way to set addresses, so instead we can just use the dotted quad notation and have the library do the conversion for us. To do this, we use inet_aton(), which stands for ASCII-to-network and converts a dotted quad string into a struct in_addr.
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(){
  //in_addr struct has a single member s_addr
  struct in_addr addr;

  //Convert the IP dotted quad into struct in_addr
  inet_aton("10.4.32.41", &(addr));

  printf("Hello %s\n", inet_ntoa(addr));
}
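inet_aton() returns 0 when the string is not a valid address, and the example above ignores that return value. Here is a minimal sketch, with a hypothetical parse_ipv4() wrapper, that checks the result; a second hypothetical helper, ipv4_byte(), views s_addr byte by byte to confirm that inet_aton() lays out the bytes exactly as the byte-by-byte program did.

```c
#include <arpa/inet.h>   /* inet_aton() */
#include <netinet/in.h>  /* struct in_addr */
#include <stdio.h>       /* fprintf() */

/* Hypothetical wrapper: convert a dotted quad into *out.
 * Returns 1 on success, 0 if the string is not a valid IPv4 address. */
int parse_ipv4(const char *text, struct in_addr *out) {
    if (inet_aton(text, out) == 0) {
        fprintf(stderr, "invalid address: %s\n", text);
        return 0;
    }
    return 1;
}

/* Hypothetical helper: return byte i (0..3) of the stored address,
 * viewing s_addr as raw bytes like the byte-by-byte example. */
unsigned char ipv4_byte(const struct in_addr *addr, int i) {
    const unsigned char *bytes = (const unsigned char *) &(addr->s_addr);
    return bytes[i];
}
```

For "10.4.32.41", the four bytes come out as 10, 4, 32, 41, while a string such as "10.4.32.999" is rejected because 999 does not fit in a byte.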
3.1 Storing IP Addresses in struct sockaddr and struct sockaddr_in
The next level of addressing is to combine an IP address with port information. Since we are concerned with internet addressing, we will use the struct sockaddr_in:
struct sockaddr_in {
  short          sin_family; //address family, set to AF_INET
  unsigned short sin_port;   //the port in network byte order
  struct in_addr sin_addr;   //the inet address
};
This is just one kind of socket address, but a socket can be used for a variety of things, not just internet communication. As a result, there is also a generic socket type called struct sockaddr, without the _in suffix. Whenever you see data typed as struct sockaddr that actually holds an internet (AF_INET) address, you can convert it into a struct sockaddr_in and vice versa. The two types are the same size, that is, they occupy the same number of bytes in memory; the cast just changes how those bytes are interpreted.
In the following code, for example:
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(){
  //use a generic socket address to store everything
  struct sockaddr saddr;

  //cast generic socket to an inet socket
  struct sockaddr_in * saddr_in = (struct sockaddr_in *) &saddr;

  //Convert IP address into inet address stored in sockaddr
  inet_aton("10.4.32.41", &(saddr_in->sin_addr));

  //print out IP address
  printf("Hello %s\n", inet_ntoa(saddr_in->sin_addr));
}
We declare a generic socket address on the stack, but cast it to an inet socket address to access the sin_addr member.
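Code that receives a generic struct sockaddr usually checks its sa_family field before casting, since the cast is only safe when the bytes really hold an AF_INET address. The sketch below uses two hypothetical helpers: make_ipv4_sockaddr() fills in a struct sockaddr_in (using htons(), covered in Section 4.3, for the port), and format_sockaddr() takes the generic type, checks the family, casts, and prints the address as ip:port.

```c
#include <arpa/inet.h>   /* inet_aton(), inet_ntoa(), htons(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* struct sockaddr, AF_INET */

/* Hypothetical helper: fill in an IPv4 socket address from a dotted quad
 * and a port given in host order. Returns 0 on success, -1 on a bad IP. */
int make_ipv4_sockaddr(struct sockaddr_in *out, const char *ip, unsigned short port) {
    memset(out, 0, sizeof(*out));      /* clear padding and unused fields */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);       /* port must be in network byte order */
    return inet_aton(ip, &(out->sin_addr)) ? 0 : -1;
}

/* Hypothetical helper: write "a.b.c.d:port" for a generic socket address.
 * Returns 0 on success, -1 if the address is not IPv4. */
int format_sockaddr(const struct sockaddr *sa, char *buf, size_t len) {
    if (sa->sa_family != AF_INET)
        return -1;                     /* not an inet address: unsafe to cast */
    /* the family field tells us these bytes really are a sockaddr_in */
    const struct sockaddr_in *sin = (const struct sockaddr_in *) sa;
    snprintf(buf, len, "%s:%u", inet_ntoa(sin->sin_addr),
             (unsigned) ntohs(sin->sin_port));
    return 0;
}
```

Calling format_sockaddr((struct sockaddr *) &sin, buf, sizeof(buf)) on an address built from "10.4.32.41" and port 80 produces the string "10.4.32.41:80".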
4 Resolving Domain Names
The next part of addressing is the conversion of a domain name into an IP address. IP addresses are not completely unusable by humans, but they are not the preferred way to refer to a remote host. Instead, we use domain names. For example, when we go to www.usna.edu, that domain must be resolved into an IP address.
4.1 Converting a Domain Name into an IP address
The resolving protocol is called DNS, or Domain Name System, and it is implemented for us through the getaddrinfo() library function. Here is the function declaration:
int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **res);
Here is a description of the arguments:
- node: a string holding the address or domain name you wish to resolve
- service: the name of the service (or a port number as a string) you're interested in; set to NULL for our usage
- hints: a struct addrinfo of "hints" describing the kinds of address information we are interested in
- res: a new addrinfo structure will be allocated, and res will be set to point to it
On error, getaddrinfo() returns a non-zero error code. That code, rather than errno, describes what went wrong, so instead of perror() you pass it to gai_strerror() to get a message. To catch errors, use the following style:
if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
  fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s)); // <---- converts the error into a message
  exit(1);
}
The next part of using getaddrinfo() properly is understanding the struct addrinfo, which has the following members:
struct addrinfo {
  int              ai_flags;
  int              ai_family;
  int              ai_socktype;
  int              ai_protocol;
  socklen_t        ai_addrlen;
  struct sockaddr *ai_addr;
  char            *ai_canonname;
  struct addrinfo *ai_next;
};
For our purposes, since we are only concerned with IPv4 internet addressing, we only need to focus on two fields: ai_family and ai_addr.
- ai_family: the address family of this address; with our hints it will always be AF_INET
- ai_addr: a pointer to a socket address storing the resolved IP address
One peculiar aspect of a getaddrinfo() call, beyond retrieving the results, is that you can also pass hints about the kinds of addresses you are interested in. We are only interested in AF_INET address types, so we can always declare and use the hints option like so:
struct addrinfo hints;

//zero out the hints structure (look up memset for details)
memset(&hints,0,sizeof(struct addrinfo));

//set the ai_family field per our needs, AF_INET
hints.ai_family = AF_INET;
Now we have enough to write a hello-world program for resolving a domain name.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){
  char hostname[]="www.usna.edu"; //the hostname we are looking up
  struct addrinfo *result;        //to store results
  struct addrinfo hints;          //to indicate information we want
  struct sockaddr_in *saddr;      //to reference address
  int s;                          //for error checking

  memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
  hints.ai_family = AF_INET;                //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr = (struct sockaddr_in *) result->ai_addr;

  //print the address
  printf("Hello %s\n", inet_ntoa(saddr->sin_addr));

  //free the addrinfo struct
  freeaddrinfo(result);
}
One thing to be careful about when dealing with getaddrinfo() is that the resulting address is stored in a struct sockaddr, but we are only using struct sockaddr_in. Fortunately, since we only hinted towards AF_INET, we know that the sockaddr is actually a sockaddr_in, and so we can cast:
struct sockaddr_in *saddr; //to reference address

//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
Once we've cast the data to the right type, we can treat it like a sockaddr_in and get to the underlying in_addr:
//print the address
printf("Hello %s\n", inet_ntoa(saddr->sin_addr));
Last, but not least: getaddrinfo() allocates new memory to store the resulting addrinfo. It must be freed with freeaddrinfo():
//free the addrinfo struct
freeaddrinfo(result);
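The programs above only look at the first result, but getaddrinfo() actually returns a linked list chained through the ai_next member, since one name may map to several addresses. A sketch of walking the whole list, under the hypothetical name print_all_ipv4(), is below. It also sets hints.ai_socktype to SOCK_STREAM; without a socket type hint, the same address may be listed more than once, once per socket type.

```c
#include <arpa/inet.h>   /* inet_ntoa() */
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo(), gai_strerror() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* printf(), fprintf() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* AF_INET, SOCK_STREAM */

/* Hypothetical helper: print every IPv4 address that host resolves to.
 * Returns the number of addresses printed, or -1 on error. */
int print_all_ipv4(const char *host) {
    struct addrinfo hints, *result, *p;
    int s, count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address, for TCP */

    if ((s = getaddrinfo(host, NULL, &hints, &result)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        return -1;
    }
    /* follow the ai_next links until the end of the list */
    for (p = result; p != NULL; p = p->ai_next) {
        struct sockaddr_in *saddr = (struct sockaddr_in *) p->ai_addr;
        printf("%s\n", inet_ntoa(saddr->sin_addr));
        count++;
    }
    freeaddrinfo(result);              /* frees the entire list at once */
    return count;
}
```

A single freeaddrinfo() call on the head of the list releases every node, so there is no need to free the entries one by one.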
4.2 Resolving IP addresses to IP addresses?
One nice thing about getaddrinfo() is that it can take either a domain name or an IP address. If it finds that you've provided an IP address, it skips the DNS lookup and simply returns that address, already set in the ai_addr member of the result. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){
  char hostname[]="10.4.32.41";   //<--- Not a hostname, but an IP address
  struct addrinfo *result;        //to store results
  struct addrinfo hints;          //to indicate information we want
  struct sockaddr_in *saddr;      //to reference address
  int s;                          //for error checking

  memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
  hints.ai_family = AF_INET;                //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr = (struct sockaddr_in *) result->ai_addr;

  //print the address
  printf("Hello %s\n", inet_ntoa(saddr->sin_addr)); //<---- will print that IP address here

  //free the addrinfo struct
  freeaddrinfo(result);
}
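If a program should only ever accept IP addresses and never trigger a DNS lookup, getaddrinfo() supports the AI_NUMERICHOST flag in hints.ai_flags. With it, a dotted quad is converted just as above, while a domain name fails immediately with an error instead of being resolved. A small sketch, with the hypothetical name is_numeric_ipv4():

```c
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo(), gai_strerror(), AI_NUMERICHOST */
#include <stdio.h>       /* fprintf() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: return 1 if text is a numeric IPv4 address,
 * 0 otherwise. No DNS query is ever made. */
int is_numeric_ipv4(const char *text) {
    struct addrinfo hints, *result;
    int s;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_flags = AI_NUMERICHOST;   /* never resolve names */

    if ((s = getaddrinfo(text, NULL, &hints, &result)) != 0) {
        fprintf(stderr, "%s: %s\n", text, gai_strerror(s));
        return 0;
    }
    freeaddrinfo(result);              /* success still allocates a result */
    return 1;
}
```

With this flag, "10.4.32.41" is accepted while "www.usna.edu" is rejected without any network traffic.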
4.3 Network Byte Order and Ports
The last part of the puzzle for addressing is the port number. Recall the sockaddr_in structure:
struct sockaddr_in {
  short          sin_family; //address family, set to AF_INET
  unsigned short sin_port;   //the port in network byte order
  struct in_addr sin_addr;   //the inet address
};
You see that there is a member, sin_port, and your instinct would be to set that directly. For example, if we wanted to contact www.usna.edu on port 80, after calling getaddrinfo() we'd do something like this:
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
saddr->sin_port = 80; //<-- setting port in host order!
It turns out that this DOES NOT work, and it doesn't because of a fundamental problem with data representation. Let's consider the number 80 as it is represented in bits:
pow:   128  64  32  16   8   4   2   1
      --------------------------------
bits:    0   1   0   1   0   0   0   0
The above makes sense: $80_{10} = 01010000_2$. If we look more closely, we see that the bit furthest to the left is the most significant, that is, it has the highest value, 128. If you think a bit more about it, this seems like an arbitrary choice: couldn't the bit furthest to the right be the most significant instead? Then we would have the following representation:
pow:     1   2   4   8  16  32  64 128
      --------------------------------
bits:    0   0   0   0   1   0   1   0
Now we find that $80_{10} = 00001010_2$. While this might seem a bit awkward, it really is no better or worse than the other choice. Computers face exactly this kind of choice for multi-byte values, only at the level of whole bytes rather than bits: a port is a 2-byte value, and 80 is 0x0050, so a machine must decide whether the byte 0x00 or the byte 0x50 goes at the lower memory address. In the early days of computing, there was a holy war over which order to use.
The ordering of bytes within a multi-byte value is called endianness, and there are two camps: big endian and little endian.
- Little Endian : The least significant byte is stored at the smallest address.
- Big Endian : The most significant byte is stored at the smallest address.
Most modern processors, including x86, are little endian, but back when the internet was designed and the equipment put in place, it was not clear which data representation should be preferred. The network protocols settled on big endian: multi-byte values in packet headers, such as addresses and ports, are stored in network byte order (big endian), while data on the host is stored in host byte order. This is why saddr->sin_port = 80 fails: a little endian host stores 80 as the bytes 0x50 0x00, which the network reads as 0x5000, that is, port 20480.
To make these conversions easy, Unix systems provide a set of functions:
- htons() : convert data from host order to network order for a short (or two-byte) type
- ntohs() : convert data from network order to host order for a short (or two-byte) type
There are also htonl() and ntohl() for 4-byte (32-bit) values. On big endian machines, host and network order are the same, so these functions do nothing, but it is good practice to always call them so that your code works correctly on either kind of machine.
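A short sketch makes the byte-level difference visible. The hypothetical network_bytes() stores a port after htons() and copies out its two bytes in memory order; because htons() always produces network (big endian) order, the result is the same on every host. The hypothetical unconverted_port() shows which port a receiver would see if the host-order value were sent without htons(): 20480 for port 80 on a little endian host, and 80 on a big endian one.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>     /* uint16_t */
#include <string.h>     /* memcpy() */

/* Hypothetical helper: store port in network byte order and copy out
 * the two bytes in memory order (lowest address first). */
void network_bytes(uint16_t port, unsigned char out[2]) {
    uint16_t net = htons(port);
    memcpy(out, &net, 2);
}

/* Hypothetical helper: the port a receiver would read if the host-order
 * value were sent as-is, as in saddr->sin_port = port. The receiver
 * interprets the bytes as big endian, which is what ntohs() models. */
uint16_t unconverted_port(uint16_t port) {
    return ntohs(port);
}
```

For port 80, network_bytes() yields the bytes 0x00 then 0x50 on any machine, and converting back with ntohs(htons(80)) always returns 80.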
Now, finally, returning to setting the port, we see that we must do so like this:
//convert generic socket address to inet socket address
saddr = (struct sockaddr_in *) result->ai_addr;
saddr->sin_port = htons(80); //<-- setting port in network byte order!!!
5 Socket
In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.
The system call to open a socket is socket():
int socket(int domain, int type, int protocol);
The arguments can be described as follows:
- domain: the addressing domain of the socket, which for our purposes is an internet socket, AF_INET.
- type: the type of socket, describing which transport protocol we are using. HTTP traffic is carried over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the type would be SOCK_DGRAM.
- protocol: additional information about the protocol of the socket. We will not need this option, so we'll set it to 0.
The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, such as connect(), read(), and write(), require that integer file descriptor. Today we will focus just on the client side operations of a socket.
5.1 Client Sockets
A client socket's goal is to connect to a foreign address where a server socket is listening. Visually, we think of this process like so.
A new socket is opened using the socket() system call, but an open socket is not yet connected to anything. To connect it, you use the connect() system call, which takes as input a given address, an IP-port pair. Once connected with the remote server, the client can read from and write to that socket to communicate using an application layer protocol. The connect system call takes the following arguments:
int connect(int socket, const struct sockaddr *address, socklen_t address_len);
Given a socket socket and a socket address address, connect() tries to connect the socket to that foreign address. The third argument, address_len, is the size of the address structure you actually pass. Note that the socket address is a generic socket address (struct sockaddr), which is not necessarily an IP socket address (struct sockaddr_in), so you'll need to cast:
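#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(){
  int sock;
  struct sockaddr_in saddr_in;

  //fill in the address for usna.edu
  memset(&saddr_in, 0, sizeof(saddr_in)); //zero out unused fields
  saddr_in.sin_family = AF_INET;
  inet_aton("10.4.32.41", &(saddr_in.sin_addr));
  saddr_in.sin_port = htons(80);

  //open a socket
  if( (sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    perror("socket");
    exit(1);
  }

  //connect socket to the server
  if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(struct sockaddr_in)) < 0){
    perror("connect");
    exit(1);
  }

  close(sock);
  return 0;
}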
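The example above hard-codes an IP address. The pieces from Section 4 can be combined into one reusable function; the sketch below (connect_to() is a hypothetical name, not a library call) resolves a host, sets the port with htons(), tries each returned address in turn, and returns a connected TCP socket or -1.

```c
#include <arpa/inet.h>   /* htons() */
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo(), gai_strerror() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* perror(), fprintf() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), connect() */
#include <unistd.h>      /* close() */

/* Hypothetical helper: return a TCP socket connected to host:port,
 * or -1 if no resolved address could be reached. */
int connect_to(const char *host, unsigned short port) {
    struct addrinfo hints, *result, *p;
    int s, sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    if ((s = getaddrinfo(host, NULL, &hints, &result)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        return -1;
    }
    for (p = result; p != NULL; p = p->ai_next) {
        struct sockaddr_in *saddr_in = (struct sockaddr_in *) p->ai_addr;
        saddr_in->sin_port = htons(port);          /* network byte order */
        if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
            perror("socket");
            continue;                              /* sock is already -1 */
        }
        if (connect(sock, p->ai_addr, p->ai_addrlen) == 0)
            break;                                 /* connected: stop looking */
        perror("connect");
        close(sock);                               /* try the next address */
        sock = -1;
    }
    freeaddrinfo(result);
    return sock;
}
```

A call such as connect_to("usna.edu", 80) would then replace the resolve, socket, and connect steps of a client with a single line.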
5.2 Socket I/O
Sockets are file descriptors, so we use the same interface to read from and write to them as we did for other kinds of file descriptors, such as open files and pipes. That interface is read() and write(), and it should be very familiar to you by now.
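#define BUF_SIZE 4096
char buf[BUF_SIZE];   //buffer for data read from the socket
ssize_t n;            //number of bytes read

//read from socket and write to stdout (file descriptor 1)
while( (n = read(sock, buf, BUF_SIZE)) > 0){
  if( write(1, buf, n) < 0){
    perror("write");
    exit(1);
  }
}
if( n < 0 ){
  perror("read");
  exit(1);
}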
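Because this loop only relies on read() and write(), it works for any pair of file descriptors. The sketch below packages it as a hypothetical copy_fd(in, out) function that returns the number of bytes copied; it treats a short write as an error to stay simple.

```c
#include <stdio.h>      /* perror() */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read(), write() */

#define BUF_SIZE 4096

/* Hypothetical helper: copy everything from in to out until EOF.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out) {
    char buf[BUF_SIZE];
    ssize_t n;
    long total = 0;

    while ((n = read(in, buf, BUF_SIZE)) > 0) {
        if (write(out, buf, n) != n) {  /* error or short write */
            perror("write");
            return -1;
        }
        total += n;
    }
    if (n < 0) {                        /* read() failed */
        perror("read");
        return -1;
    }
    return total;                       /* n == 0: end of file */
}
```

In a client, copy_fd(sock, 1) prints everything the server sends. The same call works unchanged on a pipe or a file, which is the point of sockets being file descriptors.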
5.3 Putting it all together
A typical client program we might want to write is one that can connect to a web server and download the web page, i.e., the HTML. Here is such a program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(){
  char hostname[]="usna.edu";     //the hostname we are looking up
  unsigned short port=80;         //the port we are connecting on
  struct addrinfo *result;        //to store results
  struct addrinfo hints;          //to indicate information we want
  struct sockaddr_in *saddr_in;   //socket internet address
  int s;                          //for error checking
  ssize_t n;                      //bytes read
  int sock;                       //socket file descriptor
  //the GET request: HTTP lines end in "\r\n" and a blank line ends the request;
  //HTTP/1.0 asks the server to close the connection after responding,
  //so reading until EOF terminates
  char request[]="GET /index.html HTTP/1.0\r\nHost: usna.edu\r\n\r\n";
  char response[4096];            //read in 4096 byte chunks

  //setup our hints
  memset(&hints,0,sizeof(struct addrinfo)); //zero out hints
  hints.ai_family = AF_INET;                //we only want IPv4 addresses

  //Convert the hostname to an address
  if( (s = getaddrinfo(hostname, NULL, &hints, &result)) != 0){
    fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
    exit(1);
  }

  //convert generic socket address to inet socket address
  saddr_in = (struct sockaddr_in *) result->ai_addr;

  //set the port in network byte order
  saddr_in->sin_port = htons(port);

  //open a socket
  if( (sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    perror("socket");
    exit(1);
  }

  //connect to the server
  if(connect(sock, (struct sockaddr *) saddr_in, sizeof(*saddr_in)) < 0){
    perror("connect");
    exit(1);
  }

  //saddr_in points into result, so free it only once we are done with it
  freeaddrinfo(result);

  //send the request
  if(write(sock,request,strlen(request)) < 0){
    perror("write");
    exit(1);
  }

  //read the response until EOF
  while( (n = read(sock, response, 4096)) > 0){
    //write response to stdout
    if(write(1, response, n) < 0){
      perror("write");
      exit(1);
    }
  }
  if (n<0){
    perror("read");
  }

  //close the socket
  close(sock);
  return 0; //success
}
Following the client program, we can see that first we resolve the hostname into an IP address, storing that information in the socket address saddr_in. Next we open a new TCP socket, connect to the server, send the GET request, and then read all the data the server sends back, printing the output to the terminal.
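One detail the program glosses over: on a socket, write() may send fewer bytes than requested and return the count it actually wrote. For a short request like ours this rarely matters, but a common pattern is to loop until everything has been sent. A sketch of such a loop, under the hypothetical name write_all():

```c
#include <errno.h>      /* errno, EINTR */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* write() */

/* Hypothetical helper: write all len bytes of buf to fd.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;                  /* skip past what was already sent */
        len -= (size_t) n;
    }
    return 0;
}
```

In the client above, write(sock, request, strlen(request)) could then become write_all(sock, request, strlen(request)).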