Lab 12: Client Socket Programming
1 Preliminaries
In this lab you will complete a set of C programs to expose you to client socket programming and addressing.
1.1 Lab Learning Goals
In this lab, you will learn the following topics and practice C programming skills.
- Using and converting between different address structures
- Converting data to and from network byte order
- Opening and connecting a client socket
- Reading from and writing to a client socket
1.2 Lab Setup
Run the following command
~aviv/bin/ic221-up
Change into the lab directory
cd ~/ic221/labs/12
All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.
1.3 Submission Folder
For this lab, all submissions should be placed in the following folder:
~/ic221/labs/12/
This directory contains the sub-directories examples, myhost, and mywget. In the examples directory you will find any source code in this lab document. All lab work should be done in the remaining directories.
- Only source files found in the folder will be graded.
- Do not change the names of any source files
Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.
1.4 Compiling your programs with clang and make
You are not required to provide your own Makefiles for this lab.
1.5 README
In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.
1.6 Test Script
You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.
./test.sh
You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.
2 Part 1: myhost
In this part of the lab you will implement your own version of the host command. Recall that the Unix tool host performs a DNS lookup of a domain name to resolve it to an IP address, but as we observed in the previous lesson, a domain name may resolve to multiple IP addresses in both version 4 and version 6 of the IP address specification.
The library function to resolve a domain name to an IP address is getaddrinfo() and the relevant result structure is a struct addrinfo.
struct addrinfo {
  int ai_flags;
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  size_t ai_addrlen;
  struct sockaddr *ai_addr;
  char *ai_canonname;
  struct addrinfo *ai_next;
};
So far, we've discussed the ai_addr field, which stores the socket address that we cast to an IPv4 socket address to eventually get the IP address. An interesting aspect of this structure is that it is also a node within a linked list. The ai_next field stores a pointer to the next addrinfo, which stores another IP address the domain resolved to. Eventually, ai_next is NULL, which indicates the end of the list. We can now iterate over the linked list of addrinfo's like so:
struct addrinfo *cur_result, *result, hints;
int s;
//...
//Convert the hostname to an address
if( (s = getaddrinfo(argv[1], NULL, &hints, &result)) != 0){
  fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
  exit(1);
}
//Walk the linked list of results until ai_next is NULL
for(cur_result = result; cur_result != NULL; cur_result = cur_result->ai_next){
  //do something with the current result
}
Another important aspect of the addrinfo is the ai_protocol field, which identifies the transport protocol a result applies to. When the hints place no restriction on it, getaddrinfo() can return the same IP address several times, once for each socket type and protocol combination (for example, TCP and UDP). Since we are only interested in the primary resolution of the domain, and we want IP addresses accepting TCP connections, we can check for that by comparing against the constant IPPROTO_TCP:
cur_result->ai_protocol == IPPROTO_TCP
The last aspect of the addrinfo structure to consider is ai_family, which describes the kind of address that was resolved. This could be either IPv4 (AF_INET) or IPv6 (AF_INET6). We are primarily concerned with IPv4, so you can compare ai_family to AF_INET to ensure you only access results of the right address family:
cur_result->ai_family == AF_INET
For both of these options, you could instead specify your choice in the hints addrinfo when calling getaddrinfo(). Then, getaddrinfo() will only return results that fit those requirements.
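As a sketch of the hints approach, the function below asks getaddrinfo() for IPv4, TCP results only. The name resolve_ipv4_tcp() is hypothetical and not part of the lab code; the one firm requirement it illustrates is that the hints structure should be zeroed before the fields of interest are set.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical helper: resolve name, requesting only IPv4 results for TCP.
 * Returns 0 on success (the caller must freeaddrinfo(*results)), or a
 * getaddrinfo() error code that can be passed to gai_strerror(). */
int resolve_ipv4_tcp(const char *name, struct addrinfo **results)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));   /* unused hint fields must be zero */
    hints.ai_family   = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;    /* stream sockets */
    hints.ai_protocol = IPPROTO_TCP;    /* TCP only */

    return getaddrinfo(name, NULL, &hints, results);
}
```

With hints like these, the checks on ai_family and ai_protocol inside the loop become a safety net rather than the main filter.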
2.1 Task 1: myhost
For this task, change into the myhost directory, where you will find two files:
- myhost.c: the myhost source file, which you will complete
- Makefile: to compile your program
Your goal is to complete the myhost program so that it meets the following specification.
- For a given domain name, it must resolve all the IP addresses associated with protocol 0 for that domain.
- It should print out the results as similarly to the Unix host command as possible.
- It should only consider IPv4 addresses.
Below is some sample output:
#> ./myhost google.com
google.com has address 74.125.228.0
google.com has address 74.125.228.3
google.com has address 74.125.228.8
google.com has address 74.125.228.2
google.com has address 74.125.228.1
google.com has address 74.125.228.9
google.com has address 74.125.228.4
google.com has address 74.125.228.14
google.com has address 74.125.228.6
google.com has address 74.125.228.7
google.com has address 74.125.228.5
#> ./myhost yahoo.com
yahoo.com has address 98.139.183.24
yahoo.com has address 98.138.253.109
yahoo.com has address 206.190.36.45
#> ./myhost microsoft.com
microsoft.com has address 64.4.11.37
microsoft.com has address 65.55.58.201
#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
#> ./myhost www.cs.stanford.edu
www.cs.stanford.edu has address 171.64.64.64
#> ./myhost badbabbadbad.bad.bcom
getaddrinfo: nodename nor servname provided, or not known
#> ./myhost
getaddrinfo: nodename nor servname provided, or not known
EXTRA CREDIT (5 points): Add the ability for your myhost program to resolve IPv6 addresses, using the same format as host. For example:
#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
www.seas.upenn.edu has IPv6 address 2607:f470:8:64:5ea5::9
#> ./myhost www.yahoo.com
www.yahoo.com has address 98.138.252.30
www.yahoo.com has address 98.138.253.109
www.yahoo.com has address 98.139.180.149
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3000
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3000
Check out the man pages for inet_ntop() and inet_pton() for some useful details.
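As a starting point, here is a minimal sketch of how inet_ntop() can turn either kind of socket address into printable text. The helper name sockaddr_to_string() is hypothetical and not part of the lab code; the caller is assumed to supply a buffer of at least INET6_ADDRSTRLEN bytes so that both families fit.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: write the printable form of sa into buf.
 * Returns buf on success, or NULL for an unsupported family or a
 * buffer that is too small. */
const char *sockaddr_to_string(const struct sockaddr *sa, char *buf, socklen_t len)
{
    switch (sa->sa_family) {
    case AF_INET:
        return inet_ntop(AF_INET,
                         &((const struct sockaddr_in *)sa)->sin_addr, buf, len);
    case AF_INET6:
        return inet_ntop(AF_INET6,
                         &((const struct sockaddr_in6 *)sa)->sin6_addr, buf, len);
    default:
        return NULL;
    }
}
```

The inverse direction, from text to a binary address, is what inet_pton() provides.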
3 Part 2: mywget
In this part of the lab you will implement your own version of the wget Unix command line tool, which downloads content from web pages. For example, if you want to download the webpage for this class, you might do something like this:
wget http://www.usna.edu/Users/cs/aviv/classes/ic221/s14/index.html
This will connect to the web server at the domain www.usna.edu, retrieve the document at the path /Users/cs/aviv/classes/ic221/s14/index.html, and save it under the file name index.html.
To do all of that, you need to use sockets and know a bit about HTTP. Let's start with the socket part, since that should be familiar from your experience with Java.
3.1 HTTP GET interface
The only part of the protocol for HTTP you will need to implement is the GET request. A GET request is simply:
GET path HTTP/1.0
This asks the web server to return the item at the path using version 1.0 of HTTP. If the file at the path exists, the server will respond with:
HTTP/1.1 200 OK
Here "HTTP/1.1" indicates the protocol version and "200" is the status code, which in this case means success.
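On the wire, the request line of the GET request is terminated by a carriage return and line feed, and an empty line marks the end of the request headers. A minimal sketch of formatting such a request follows; build_get_request() is a hypothetical helper name, not part of the lab code, and it sends no headers beyond the request line.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format an HTTP/1.0 GET request for path into buf.
 * Returns the length of the request, or -1 if buf is too small. */
int build_get_request(char *buf, size_t len, const char *path)
{
    /* Request line, followed by the empty line that ends the headers. */
    int n = snprintf(buf, len, "GET %s HTTP/1.0\r\n\r\n", path);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The returned length is the number of bytes that would then be written to the connected socket.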
Other codes exist for errors, such as:
const char ERROR_300[]="HTTP/1.1 300 Multiple Choices\n";
const char ERROR_301[]="HTTP/1.1 301 Moved Permanently\n";
const char ERROR_400[]="HTTP/1.1 400 Bad Request\n";
const char ERROR_403[]="HTTP/1.1 403 Forbidden\n";
const char ERROR_404[]="HTTP/1.1 404 Not Found\n";
const char ERROR_500[]="HTTP/1.1 500 Internal Server Error\n";
Because the status line begins with "HTTP/1.1 ", which is 9 bytes long, the code number starts 9 bytes into the response. That makes it easy to check for a particular code:
if(! strncmp(response+9,"300",3)){
  fprintf(stderr, "%s", ERROR_300);
  exit(1);
}
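Rather than comparing against each code string in turn, the three digits can also be converted to a number once. The sketch below is one hedged way to do that; http_status_code() is a hypothetical helper, and it assumes the response has been stored as a NUL-terminated string that starts with the status line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return the numeric status code of a response that
 * starts with a status line such as "HTTP/1.1 200 OK", or -1 if the
 * start of the response does not look like a status line. */
int http_status_code(const char *response)
{
    const char *code;

    if (strlen(response) < 12 || strncmp(response, "HTTP/", 5) != 0)
        return -1;

    code = response + 9;                /* skip "HTTP/1.1 " (9 bytes) */
    if (!isdigit((unsigned char)code[0]) ||
        !isdigit((unsigned char)code[1]) ||
        !isdigit((unsigned char)code[2]))
        return -1;

    return (code[0] - '0') * 100 + (code[1] - '0') * 10 + (code[2] - '0');
}
```

A caller could then switch on the returned value to pick which message to report.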
If the code is a success (200), then the requested document follows the HTTP headers. In your lab, you'll write this data out to the file.
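Writing the data out amounts to a loop that reads from the socket and writes to the file. The following is a minimal sketch under the assumption that the server closes the connection once it has sent everything (the sample responses in this lab carry a "Connection: close" header), so that read() eventually returns 0. The name copy_until_eof() is hypothetical, and a service that keeps the connection open would need a different stopping rule.

```c
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: copy everything readable from in_fd to out_fd.
 * Returns the total number of bytes copied, or -1 on a read/write error. */
ssize_t copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n, total = 0;

    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;

        while (off < n) {               /* write() may accept fewer bytes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return (n < 0) ? -1 : total;        /* n == 0: the peer closed the stream */
}
```

Here in_fd would be the connected socket and out_fd a file opened for writing.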
3.2 Task 1: mywget
For this task, change into the mywget directory, where you will find two files:
- mywget.c: the mywget source file, which you will complete
- Makefile: to compile your program
Your goal is to complete the mywget program so that it meets the following specification.
- Given a domain name or IP address, perform a GET request for the specified path by connecting to the server on port 80 (or on another port, if one is provided).
- If the file exists at the path, save a copy of the file under the basename of the path.
- If the server reports an error code, report the appropriate error code.
- Report other errors from system calls and library functions, such as a domain that does not exist.
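The connection step in the first item can be sketched as follows. This is an illustration under stated assumptions rather than the required design: connect_ipv4() is a hypothetical helper, it only tries IPv4 results, and it fills in the port itself with htons() because getaddrinfo() is called without a service name.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper: connect to host (a name or dotted IPv4 address)
 * on the given port. Returns a connected socket, or -1 on failure. */
int connect_ipv4(const char *host, unsigned short port)
{
    struct addrinfo hints, *results, *cur;
    int sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &results) != 0)
        return -1;

    for (cur = results; cur != NULL; cur = cur->ai_next) {
        struct sockaddr_in addr;

        memcpy(&addr, cur->ai_addr, sizeof(addr));
        addr.sin_port = htons(port);    /* port goes in network byte order */

        sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0)
            continue;
        if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            break;                      /* connected */
        close(sock);
        sock = -1;
    }
    freeaddrinfo(results);
    return sock;
}
```

A caller would use the returned descriptor with read() and write(), and close() it when done.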
For example, requesting my home page:
#> ./mywget
ERROR: Require domain and path
mywget domain path [port]
connect to the web server at domain and port, if provided, and request
the file at path. If the file exist, save the file based on the
filename of value in the path
If domain is not reachable, report error
#> ./mywget www.usna.edu /Users/cs/aviv/index.html
HTTP/1.1 200 OK
#>head -30 index.html
HTTP/1.1 200 OK
Date: Tue, 08 Apr 2014 20:31:17 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Connection: close
Content-Type: text/html
<html>
<head>
<style type="text/css">
p {color:black; font-family:arial; font-size:14px; text-align:justify; width:800px; }
ul {font-family:arial; font-size:14px; text-align:justify; width:800px; margin:10px; padding: 10px;}
li {padding:10px}
table {font-family:arial; text-align:justify; width:800px; margin:10px; padding: 10px;}
a:link {color: #003366;}
a:visited {color: #003366;}
a:hover {text-decoration: underline;}
</style>
<title> Adam J. Aviv</title>
</head>
<body>
<p>
<table>
<tr>
<td width="45%"></td>
(...)
The reason the data gets saved to index.html is that it's the basename of the path, or the last entry in the path /Users/cs/aviv/index.html. For example, if we were retrieving a different file:
aviv@saddleback: mywget $ ./mywget www.usna.edu /Users/cs/aviv/classes/ic221/s15/cal.html
HTTP/1.1 200 OK
aviv@saddleback: mywget $ head cal.html
HTTP/1.1 200 OK
Date: Mon, 13 Apr 2015 21:07:30 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Strict-Transport-Security: max-age=15768000;includeSubDomains
Connection: close
Content-Type: text/html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
You can retrieve the basename of a path with the library function basename() (see section 3 of the man page).
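One detail worth knowing is that the basename() declared in libgen.h is allowed to modify the string it is given, so it is safest to call it on a copy. A minimal sketch follows; output_filename() is a hypothetical helper, and the 1024-byte limit on the path is an arbitrary assumption made for the example.

```c
#include <libgen.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the last component of path in out.
 * Returns 0 on success, or -1 if the path or the result does not fit. */
int output_filename(const char *path, char *out, size_t len)
{
    char copy[1024];                    /* assumed maximum path length */
    const char *base;

    if (strlen(path) >= sizeof(copy))
        return -1;
    strcpy(copy, path);                 /* basename() may modify its argument */

    base = basename(copy);
    if (strlen(base) >= len)
        return -1;
    strcpy(out, base);
    return 0;
}
```

Copying the result into the caller's buffer also avoids keeping a pointer into the temporary copy.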
Finally, you should be able to detect a variety of error codes and error conditions:
aviv@saddleback: mywget $ ./mywget badbabdabda /index.html
ERROR: getaddrinfo: Name or service not known: badbabdabda
aviv@saddleback: mywget $ ./mywget www.usna.edu /doesnotexist.html
HTTP/1.1 404 Not Found
aviv@saddleback: mywget $ ./mywget www.usna.edu /i
HTTP/1.1 300 Multiple Choices
Extra Credit (5 points): Add functionality so that you can use your mywget to connect to services on other ports and save to a given file. For example:
#> ./mywget 10.53.33.232 batman.txt 6666
#> cat batman.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
#> ./mywget 10.53.33.232 anyfilename.txt 6666
#> cat anyfilename.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
Think about how you might know when data has completed being sent (perhaps less data is sent than you expected?). To get full credit, both the extra credit and the other version should work. You must take a different action depending on the command line arguments.