Lab 11: Client Socket Programming
1 Preliminaries
In this lab you will complete a set of C programs to expose you to client socket programming and addressing.
1.1 Lab Learning Goals
In this lab, you will learn the following topics and practice C programming skills.
- Using and converting between different address structures
- Converting data in and from network byte order
- Opening and connecting a client socket
- Reading from and writing to a client socket
1.2 Lab Setup
Run the following command
~aviv/bin/ic221-up
Change into the lab directory
cd ~/ic221/labs/11
All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.
1.3 Submission Folder
For this lab, all submissions should be placed in the following folder:
~/ic221/labs/11/
This directory contains 4 sub-directories: examples, timer, term-status, and mini-sh. In the examples directory you will find any source code in this lab document. All lab work should be done in the remaining directories.
- Only source files found in the folder will be graded.
- Do not change the names of any source files
Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.
1.4 Compiling your programs with clang and make
You are not required to provide your own Makefiles for this lab.
1.5 README
In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.
1.6 Test Script
You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.
./test.sh
You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.
2 Part 1: myhost
In this part of the lab you will implement your own version of the host command. Recall that the Unix tool host performs a DNS lookup of a domain name to resolve it to an IP address, but as we observed in the previous lesson, a domain name may resolve to multiple IP addresses in both version 4 and version 6 of the IP address specification.
The library function to resolve a domain name to an IP address is getaddrinfo() and the relevant result structure is a struct addrinfo.
struct addrinfo {
  int ai_flags;
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  size_t ai_addrlen;
  struct sockaddr *ai_addr;
  char *ai_canonname;
  struct addrinfo *ai_next;
};
So far, we've discussed ai_addr, which stores the socket address that we cast to an IP socket address (struct sockaddr_in for IPv4) to eventually get the IP address. An interesting aspect of this structure is that it is also a node within a linked list. The ai_next field stores a pointer to the next addrinfo, which stores another IP address resolved for the domain. Eventually, ai_next is NULL, which indicates the end of the list. We can iterate over the linked list of addrinfo structures like so:
struct addrinfo *cur_result, *result, hints;
int s;

//... (zero out and fill in hints here)

//Convert the hostname to an address
if( (s = getaddrinfo(argv[1], NULL, &hints, &result)) != 0){
  fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
  exit(1);
}

//Walk the linked list of results until ai_next is NULL
for(cur_result = result; cur_result != NULL; cur_result = cur_result->ai_next){
  //do something with the current result
}
Another important aspect of the addrinfo is the ai_protocol field, which identifies the transport protocol (for example TCP or UDP) that the result can be used with. Without restrictive hints, getaddrinfo() may return the same IP address more than once, one entry per socket type and protocol combination. Since we are only interested in the primary resolution of the domain, we can select a single entry per address by comparing against the constant IPPROTO_TCP, since we want IP addresses accepting TCP connections:
cur_result->ai_protocol == IPPROTO_TCP
The last aspect of the addrinfo structure to consider is ai_family, which describes the kind of address that was resolved. This could either be IPv4 (AF_INET) or IPv6 (AF_INET6). We are primarily concerned with IPv4, so you can compare ai_family to AF_INET to ensure you only access resolutions of the right kind of IP address:
cur_result->ai_family == AF_INET
For both of these options, you could instead specify that choice in the hints addrinfo when calling getaddrinfo(). Then, getaddrinfo() will only return results that fit those requirements.
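The hints approach can be sketched roughly as follows. This is a minimal illustration, not the lab solution: the helper name print_ipv4_tcp is hypothetical, and the numeric host 127.0.0.1 is used only so that the example runs without a DNS lookup. It zeroes the hints structure, restricts results to IPv4 and TCP, and walks the resulting list.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: resolve host with hints restricted to IPv4/TCP,
 * print each address, and return the number printed (-1 on error). */
int print_ipv4_tcp(const char *host) {
    struct addrinfo hints, *result, *cur;
    char ip[INET_ADDRSTRLEN];
    int s, count = 0;

    memset(&hints, 0, sizeof(hints));  /* unused hint fields must be zero */
    hints.ai_family = AF_INET;         /* IPv4 results only */
    hints.ai_socktype = SOCK_STREAM;   /* stream sockets ... */
    hints.ai_protocol = IPPROTO_TCP;   /* ... carried over TCP */

    if ((s = getaddrinfo(host, NULL, &hints, &result)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
        return -1;
    }

    for (cur = result; cur != NULL; cur = cur->ai_next) {
        /* ai_family is AF_INET here, so ai_addr is a sockaddr_in */
        struct sockaddr_in *sin = (struct sockaddr_in *) cur->ai_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)) != NULL) {
            printf("%s has address %s\n", host, ip);
            count++;
        }
    }

    freeaddrinfo(result);  /* release the list allocated by getaddrinfo() */
    return count;
}

int main(void) {
    return print_ipv4_tcp("127.0.0.1") < 0 ? 1 : 0;
}
```

With the hints in place, the loop body no longer needs to test ai_family or ai_protocol itself, which is the trade-off between the two styles described above.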
2.1 Task 1: myhost
For this task, change into the myhost directory, where you will find two files:
- myhost.c: myhost source file which you will complete
- Makefile: to compile your program
Your goal is to complete the myhost program so that it meets the following specification.
- For a given domain name, it must resolve all the IP addresses associated with protocol 0 for that domain.
- It should print out the results as similarly to the Unix host command as possible.
- It should only consider IPv4 addresses.
Below is some sample output:
#> ./myhost google.com
google.com has address 74.125.228.0
google.com has address 74.125.228.3
google.com has address 74.125.228.8
google.com has address 74.125.228.2
google.com has address 74.125.228.1
google.com has address 74.125.228.9
google.com has address 74.125.228.4
google.com has address 74.125.228.14
google.com has address 74.125.228.6
google.com has address 74.125.228.7
google.com has address 74.125.228.5
#> ./myhost yahoo.com
yahoo.com has address 98.139.183.24
yahoo.com has address 98.138.253.109
yahoo.com has address 206.190.36.45
#> ./myhost microsoft.com
microsoft.com has address 64.4.11.37
microsoft.com has address 65.55.58.201
#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
#> ./myhost www.cs.stanford.edu
www.cs.stanford.edu has address 171.64.64.64
#> ./myhost badbabbadbad.bad.bcom
getaddrinfo: nodename nor servname provided, or not known
#> ./myhost
getaddrinfo: nodename nor servname provided, or not known
EXTRA CREDIT (10 points): Add the ability for your myhost program to resolve the IPv6 address with the same format as host. For example:
#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
www.seas.upenn.edu has IPv6 address 2607:f470:8:64:5ea5::9
#> ./myhost www.yahoo.com
www.yahoo.com has address 98.138.252.30
www.yahoo.com has address 98.138.253.109
www.yahoo.com has address 98.139.180.149
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3000
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3000
Check out the man pages for inet_ntop() and inet_pton() for some useful details.
3 Part 2: mywget
In this part of the lab you will implement your own version of the wget Unix command line tool, which downloads content from web pages. For example, if you want to download the webpage for this class, you might do something like this:
wget http://www.usna.edu/Users/cs/aviv/classes/ic221/s14/index.html
This will connect to the domain www.usna.edu, retrieve the HTML document at the path /Users/cs/aviv/classes/ic221/s14/index.html, and save it under the file name index.html.
To do all of that, you need to use sockets and know a bit about HTTP. Let's start with the socket part, since that should be familiar from your experience with Java.
3.1 Sockets
In systems programming a socket is much like a file; in fact, it is a file descriptor that happens to write to the network device rather than to a file on disc. As long as you think of it that way, you're in pretty good shape.
The system call to open a socket is socket():
int socket(int domain, int type, int protocol);
The arguments can be described as follows:
- domain is the addressing domain of the socket, which for our purposes is an internet socket, or AF_INET.
- type is the type of socket, describing which transport protocol we are using. HTTP traffic is over TCP, so that would be SOCK_STREAM. If we wanted to open a UDP socket, the keyword would be SOCK_DGRAM.
- protocol is additional information about the protocol of the socket, but we will not need to use this option, so we'll set it to 0.
The return value of socket() is an integer file descriptor on success, and -1 on failure. All further socket operations, such as connect(), read(), and write(), require that integer file descriptor.
3.2 Connecting a Socket
Opening a socket isn't the same as connecting the socket to a foreign address. To act as a client and connect to a server using the socket, you use the connect() system call:
int connect(int socket, const struct sockaddr *address, socklen_t address_len);
Given a socket socket and a socket address address, connect() tries to connect the socket to that foreign address. Note that the socket address parameter is a generic socket address (struct sockaddr), which is not necessarily an IP socket address (struct sockaddr_in), so you'll need to cast:
struct sockaddr_in saddr_in;

//fill in the address for usna.edu
saddr_in.sin_family = AF_INET;
inet_aton("10.4.32.41", &(saddr_in.sin_addr));
saddr_in.sin_port = htons(80); //port must be in network byte order

//connect to the server
if(connect(sock, (struct sockaddr *) &saddr_in, sizeof(struct sockaddr_in)) < 0){
  perror("connect");
  exit(1);
}
3.3 Socket I/O
Sockets are file descriptors, and so we use the same interface to read from and write to them as we did for other kinds of file descriptors, like open files and pipes. That interface is read() and write(), and should be very familiar to you now.
//read from socket and write to stdout
while( (n = read(sock, buf, BUF_SIZE)) > 0){
  if( write(1, buf, n) < 0){ //1 is the file descriptor for stdout
    perror("write");
    exit(1);
  }
}

//read() returns a negative value on error
if( n < 0 ){
  perror("read");
  exit(1);
}
3.4 HTTP GET interface
The only part of the protocol for HTTP you will need to implement is the GET request. A GET request is simply:
GET path HTTP/1.0
which asks the web server to return the item at the path using version 1.0 of HTTP. If the file at the path exists, the server will respond with:
HTTP/1.1 200 OK
Where the "HTTP/1.1" indicates the protocol and "200" is the code, which in this case is success.
Other codes exist for errors, such as:
const char ERROR_300[]="HTTP/1.1 300 Multiple Choices\n";
const char ERROR_301[]="HTTP/1.1 301 Moved Permanently\n";
const char ERROR_400[]="HTTP/1.1 400 Bad Request\n";
const char ERROR_403[]="HTTP/1.1 403 Forbidden\n";
const char ERROR_404[]="HTTP/1.1 404 Not Found\n";
const char ERROR_500[]="HTTP/1.1 500 Internal Server Error\n";
Given a response, it is easy to check the status code because the code number begins 9 bytes into the response.
if(! strncmp(response+9,"300",3)){
  fprintf(stderr, "%s", ERROR_300);
  exit(1);
}
If the code is a success (200), then the requested document follows the HTTP headers. In your lab, you'll write the response out to the file.
3.5 Task 1: mywget
For this task, change into the mywget directory, where you will find two files:
- mywget.c: mywget source file which you will complete
- Makefile: to compile your program
Your goal is to complete the mywget program so that it meets the following specification.
- Given a domain name or IP address, perform a GET request for the specified path by connecting to the server on port 80 (or on another port, if one is provided).
- If the file exists at the path, save a copy of the file under the basename of the path.
- If the server reports an error code, report the appropriate error code.
- Report other errors from system calls and library functions, such as a domain that does not exist.
For example, requesting my home page:
#> ./mywget
ERROR: Require domain and path
mywget domain path [port]
connect to the web server at domain and port, if provided, and request
the file at path. If the file exist, save the file based on the
filename of value in the path
If domain is not reachable, report error
#> ./mywget www.usna.edu /Users/cs/aviv/index.html
HTTP/1.1 200 OK
#> head -30 index.html
HTTP/1.1 200 OK
Date: Tue, 08 Apr 2014 20:31:17 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Connection: close
Content-Type: text/html
<html>
<head>
<style type="text/css">
p {color:black; font-family:arial; font-size:14px; text-align:justify; width:800px; }
ul {font-family:arial; font-size:14px; text-align:justify; width:800px; margin:10px; padding: 10px;}
li {padding:10px}
table {font-family:arial; text-align:justify; width:800px; margin:10px; padding: 10px;}
a:link {color: #003366;}
a:visited {color: #003366;}
a:hover {text-decoration: underline;}
</style>
<title> Adam J. Aviv</title>
</head>
<body>
<p>
<table>
<tr>
<td width="45%"></td>
(...)
The reason the data gets saved to index.html is that it's the basename of the path, or the last entry in the path /Users/cs/aviv/index.html. For example, if we were retrieving a different file:
#> ./mywget www.usna.edu /Users/cs/aviv/classes/ic221/s14/cal.html
HTTP/1.1 200 OK
#> head cal.html
HTTP/1.1 200 OK
Date: Tue, 08 Apr 2014 20:34:25 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<link rel="stylesheet" type="text/css" href="class.css" />
You can retrieve the basename of a path with the library function basename() (see section 3 of the man page).
Finally, you should be able to detect a variety of error codes and error conditions:
#> ./mywget badbabdabda /index.html
ERROR: getaddrinfo: nodename nor servname provided, or not known: badbabdabda
#> ./mywget cnn.com /doesnotexist.html
HTTP/1.1 301 Moved Permanently
#> ./mywget www.usna.edu /doesnotexist.html
HTTP/1.1 404 Not Found
#> ./mywget www.usna.edu /i
HTTP/1.1 300 Multiple Choices
Extra Credit (10 points): Add functionality so that you can use your mywget to connect to services on other ports and save to a given file. For example:
#> ./mywget 10.53.33.232 batman.txt 6666
#> cat batman.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
#> ./mywget 10.53.33.232 anyfilename.txt 6666
#> cat anyfilename.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
You should take a look at atoi() to convert a string to a number, and also think about how you might know when the data has finished being sent (perhaps less data is sent than you expected?).