IC221: Systems Programming (SP15)


Lab 12: Client Socket Programming

1 Preliminaries

In this lab you will complete a set of C programs to expose you to client socket programming and addressing.

1.1 Lab Learning Goals

In this lab, you will learn the following topics and practice C programming skills.

  1. Using and converting between different address structures
  2. Converting data to and from network byte order
  3. Opening and connecting a client socket
  4. Reading from and writing to a client socket

1.2 Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/labs/12

All the material you need to complete the lab can be found in the lab directory. Place all material you will submit within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.

1.3 Submission Folder

For this lab, all submissions should be placed in the following folder:

~/ic221/labs/12/

This directory contains several sub-directories, including examples, myhost, and mywget. In the examples directory you will find any source code in this lab document. All lab work should be done in the remaining directories.

  • Only source files found in the folder will be graded.
  • Do not change the names of any source files

Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.

1.4 Compiling your programs with clang and make

You are not required to provide your own Makefiles for this lab.

1.5 README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

1.6 Test Script

You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.

./test.sh

You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.

2 Part 1: myhost

In this part of the lab you will implement your own version of the host command. Recall that the Unix tool host performs a DNS lookup of a domain name to resolve it to an IP address. As we observed in the previous lesson, a domain name may resolve to multiple IP addresses, in both version 4 and version 6 of the IP address specification.

The library function to resolve a domain name to an IP address is getaddrinfo(), and the relevant result structure is a struct addrinfo.

struct addrinfo {
          int              ai_flags;  
          int              ai_family;
          int              ai_socktype; 
          int              ai_protocol;
          size_t           ai_addrlen;
          struct sockaddr *ai_addr;
          char            *ai_canonname;
          struct addrinfo *ai_next;
      };

So far, we've discussed ai_addr, which stores the socket address that we cast to an IPv4 socket address to eventually get the IP address. An interesting aspect of this structure is that it is also a node within a linked list. The ai_next field stores a pointer to the next addrinfo, which stores another IP address that the domain resolved to. Eventually, ai_next is NULL, which indicates the end of the list. We can now iterate over the linked list of addrinfo's like so:

struct addrinfo hints, *results, *cur_result;
int s;

//Zero the hints so that unset fields do not restrict the lookup
memset(&hints, 0, sizeof(hints));
//...

//Convert the hostname to an address
if( (s = getaddrinfo(argv[1], NULL, &hints, &results)) != 0){
  fprintf(stderr, "getaddrinfo: %s\n",gai_strerror(s));
  exit(1);
 }

//Walk the linked list until ai_next is NULL
for(cur_result = results; cur_result != NULL; cur_result = cur_result->ai_next){
  //do something with the current result
 }
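
As a minimal sketch of the loop above, the helper below walks the list and counts its entries. The name count_results() is hypothetical and not part of the lab's starter code. The sketch also calls freeaddrinfo(), which releases the list that getaddrinfo() allocated once you are done with it.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: returns the number of addrinfo nodes that
 * getaddrinfo() produces for host, or -1 if the resolution fails. */
int count_results(const char *host) {
  struct addrinfo hints, *results, *cur;
  int s, n = 0;

  /* Zeroed hints place no restriction on family or socket type */
  memset(&hints, 0, sizeof(hints));

  if ((s = getaddrinfo(host, NULL, &hints, &results)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
    return -1;
  }

  /* Follow ai_next until it is NULL, the end of the list */
  for (cur = results; cur != NULL; cur = cur->ai_next) {
    n++;
  }

  freeaddrinfo(results); /* release the whole list */
  return n;
}
```

With unrestricted hints, the count for a single address is often greater than one, because the same address can be listed once per socket type.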

Another important field of addrinfo is ai_protocol, which identifies the transport protocol (for example, TCP or UDP) of the socket that the address is meant to be used with. When the hints do not restrict the socket type, getaddrinfo() may return the same IP address several times, once for each socket type and protocol combination. Since we only want the addresses that accept TCP connections, we can check for them by comparing against the constant IPPROTO_TCP.

cur_result->ai_protocol == IPPROTO_TCP

The last aspect of the addrinfo structure to consider is ai_family, which describes the kind of address that was resolved. This could be either IPv4 (AF_INET) or IPv6 (AF_INET6). We are primarily concerned with IPv4, so you can compare ai_family to AF_INET to ensure you only access resolutions of the right kind of IP address:

cur_result->ai_family == AF_INET

For both of these options, you could instead specify the choice in the hints addrinfo when calling getaddrinfo(). Then, getaddrinfo() will only return results that fit those requirements.
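
Here is a rough sketch of the hints approach, assuming you only want the first IPv4/TCP result. The helper name first_ipv4() is made up for illustration. It sets the family, socket type, and protocol in the hints, casts ai_addr to a struct sockaddr_in, and converts the address to text with inet_ntop().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: writes the first IPv4/TCP address of host into
 * buf as dotted-decimal text. Returns 0 on success, -1 on failure. */
int first_ipv4(const char *host, char *buf, size_t len) {
  struct addrinfo hints, *results;
  struct sockaddr_in *sin;
  int s, rc = -1;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;       /* IPv4 results only */
  hints.ai_socktype = SOCK_STREAM; /* stream sockets only */
  hints.ai_protocol = IPPROTO_TCP; /* TCP only */

  if ((s = getaddrinfo(host, NULL, &hints, &results)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s));
    return -1;
  }

  /* The hints guarantee AF_INET, so ai_addr is a sockaddr_in */
  sin = (struct sockaddr_in *) results->ai_addr;
  if (inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t) len) != NULL) {
    rc = 0;
  }

  freeaddrinfo(results);
  return rc;
}
```

Because the filtering happens inside getaddrinfo(), no per-node checks of ai_family or ai_protocol are needed in this version.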

2.1 Task 1: myhost

For this task, change into the myhost directory, where you will find two files:

  • myhost.c : myhost source file which you will complete
  • Makefile : to compile your program

Your goal is to complete the myhost program so that it meets the following specification.

  1. For a given domain name, it must resolve all the IP addresses associated with protocol 0 for that domain.
  2. It should print out the results as similarly to the Unix host command as possible.
  3. It should only consider IPv4 addresses

Below is some sample output:

#> ./myhost google.com
google.com has address 74.125.228.0
google.com has address 74.125.228.3
google.com has address 74.125.228.8
google.com has address 74.125.228.2
google.com has address 74.125.228.1
google.com has address 74.125.228.9
google.com has address 74.125.228.4
google.com has address 74.125.228.14
google.com has address 74.125.228.6
google.com has address 74.125.228.7
google.com has address 74.125.228.5
#> ./myhost yahoo.com
yahoo.com has address 98.139.183.24
yahoo.com has address 98.138.253.109
yahoo.com has address 206.190.36.45
#> ./myhost microsoft.com
microsoft.com has address 64.4.11.37
microsoft.com has address 65.55.58.201
#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
#> ./myhost www.cs.stanford.edu
www.cs.stanford.edu has address 171.64.64.64
#> ./myhost badbabbadbad.bad.bcom
getaddrinfo: nodename nor servname provided, or not known
#> ./myhost 
getaddrinfo: nodename nor servname provided, or not known

EXTRA CREDIT (5 points): Add the ability for your myhost program to resolve the IPv6 address with the same format as host. For example:

#> ./myhost www.seas.upenn.edu
www.seas.upenn.edu has address 158.130.68.91
www.seas.upenn.edu has IPv6 address 2607:f470:8:64:5ea5::9
#> ./myhost www.yahoo.com
www.yahoo.com has address 98.138.252.30
www.yahoo.com has address 98.138.253.109
www.yahoo.com has address 98.139.180.149
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3000
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00b:1fe::3001
www.yahoo.com has IPv6 address 2001:4998:f00d:1fe::3000

Check out the man pages for inet_ntop() and inet_pton() for some useful details.
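
To get a feel for these two functions, consider the small sketch below. The helper name roundtrip() is an assumption for illustration only. It parses a textual address with inet_pton() and formats it back with inet_ntop(), and the same two calls handle both AF_INET and AF_INET6.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: parses text as an address of the given family
 * (AF_INET or AF_INET6) and formats it back into out.
 * Returns 0 on success, -1 if either conversion fails. */
int roundtrip(int family, const char *text, char *out, size_t len) {
  struct in6_addr raw; /* 16 bytes, large enough for either family */

  /* inet_pton() returns 1 only when text is a valid address */
  if (inet_pton(family, text, &raw) != 1) {
    return -1;
  }
  /* inet_ntop() returns NULL on failure, e.g. when out is too small */
  if (inet_ntop(family, &raw, out, (socklen_t) len) == NULL) {
    return -1;
  }
  return 0;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family. For IPv6 results from getaddrinfo(), the address to pass to inet_ntop() is the sin6_addr field of a struct sockaddr_in6.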

3 Part 2: mywget

In this part of the lab you will implement your own version of the wget Unix command line tool, which will download content from web pages. For example, if you want to download the webpage for this class, you might do something like this:

wget http://www.usna.edu/Users/cs/aviv/classes/ic221/s14/index.html

This will connect to the web server at the domain www.usna.edu, retrieve the document at the path /Users/cs/aviv/classes/ic221/s14/index.html, and save it under the file name index.html.

To do all of that, you need to use sockets and know a bit about HTTP. Let's start with the socket part, since that should be familiar from your experience with Java.

3.1 HTTP GET interface

The only part of the protocol for HTTP you will need to implement is the GET request. A GET request is simply:

GET path HTTP/1.0

which requests that the web server return the item at the path using version 1.0 of HTTP. If the file at the path exists, the server will respond with:

HTTP/1.1 200 OK

Here, "HTTP/1.1" indicates the protocol version and "200" is the status code, which in this case indicates success.
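
The GET request described above could be assembled as in the sketch below. The helper name build_get() is hypothetical. Note that on the wire each HTTP line ends with a carriage return and newline ("\r\n"), and an empty line marks the end of the request, so the string ends in "\r\n\r\n".

```c
#include <stdio.h>

/* Hypothetical helper: formats an HTTP/1.0 GET request for path into
 * buf. Returns the request length, or -1 if buf is too small. */
int build_get(char *buf, size_t len, const char *path) {
  /* Request line, then an empty line that ends the request */
  int n = snprintf(buf, len, "GET %s HTTP/1.0\r\n\r\n", path);

  /* snprintf() reports the length it needed; reject truncation */
  if (n < 0 || (size_t) n >= len) {
    return -1;
  }
  return n;
}
```

The returned length is the number of bytes you would then write to the connected socket.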

Other codes exist for errors, such as:

const char ERROR_300[]="HTTP/1.1 300 Multiple Choices\n";
const char ERROR_301[]="HTTP/1.1 301 Moved Permanently\n";
const char ERROR_400[]="HTTP/1.1 400 Bad Request\n";
const char ERROR_403[]="HTTP/1.1 403 Forbidden\n";
const char ERROR_404[]="HTTP/1.1 404 Not Found\n";
const char ERROR_500[]="HTTP/1.1 500 Internal Server Error\n";

Given a response, it is easy to check the status code, knowing that the code number starts 9 bytes into the response.

if(! strncmp(response+9,"300",3)){
  fprintf(stderr, "%s", ERROR_300);
  exit(1);
}
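
Rather than comparing against each code string in turn, one alternative is to extract the number once and then branch on it. The sketch below assumes a hypothetical helper named status_code() and relies on the same 9 byte offset.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the numeric status code of an HTTP
 * response (e.g. 200 or 404), or -1 if the status line is malformed. */
int status_code(const char *response) {
  const char *p;

  /* "HTTP/1.x " fills bytes 0-8; the three digit code follows */
  if (strlen(response) < 12 || strncmp(response, "HTTP/1.", 7) != 0) {
    return -1;
  }

  p = response + 9;
  if (!isdigit((unsigned char) p[0]) ||
      !isdigit((unsigned char) p[1]) ||
      !isdigit((unsigned char) p[2])) {
    return -1;
  }

  return (p[0] - '0') * 100 + (p[1] - '0') * 10 + (p[2] - '0');
}
```

With this in place, a switch statement on the returned value could select which error string to print.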

If the code indicates success (200), then the requested document follows the HTTP headers. In your lab, you'll write the response out to a file.

3.2 Task 1: mywget

For this task, change into the mywget directory, where you will find two files:

  • mywget.c : mywget source file which you will complete
  • Makefile : to compile your program

Your goal is to complete the mywget program so that it meets the following specification.

  1. Given a domain name or IP address, perform a GET request for the specified path by connecting to the server on port 80 (or on another port, if one is provided).
  2. If the file exists at the path, save a copy of the file under the basename of the path.
  3. If the server reports an error code, report the appropriate error code.
  4. Report other errors from system calls and library functions, such as a domain that does not exist.
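
For the connection step in the first requirement, one possible shape is sketched below. The helper name connect_to() is an assumption, not part of the starter code. It passes the port to getaddrinfo() as the service string, so the port number in each returned ai_addr is already in network byte order, and it tries each result until connect() succeeds.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: connects a TCP socket to host on port (a decimal
 * string such as "80"; NULL selects "80"). Returns the connected socket
 * descriptor, or -1 on failure. */
int connect_to(const char *host, const char *port) {
  struct addrinfo hints, *results, *cur;
  int s, sock = -1;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;       /* IPv4 only */
  hints.ai_socktype = SOCK_STREAM; /* TCP stream socket */

  if ((s = getaddrinfo(host, port ? port : "80", &hints, &results)) != 0) {
    fprintf(stderr, "ERROR: getaddrinfo: %s: %s\n", gai_strerror(s), host);
    return -1;
  }

  /* Try each resolved address until one accepts the connection */
  for (cur = results; cur != NULL; cur = cur->ai_next) {
    sock = socket(cur->ai_family, cur->ai_socktype, cur->ai_protocol);
    if (sock < 0) {
      continue;
    }
    if (connect(sock, cur->ai_addr, cur->ai_addrlen) == 0) {
      break; /* connected */
    }
    close(sock);
    sock = -1;
  }

  freeaddrinfo(results);
  return sock;
}
```

The returned descriptor can be used with read() and write() like any other file descriptor, and should be closed with close() when you are finished.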

For example, requesting my home page:

#> ./mywget 
ERROR: Require domain and path
mywget domain path [port]

connect to the web server at domain and port, if provided, and request
the file at path. If the file exist, save the file based on the
filename of value in the path

If domain is not reachable, report error
#> ./mywget www.usna.edu /Users/cs/aviv/index.html
HTTP/1.1 200 OK
#> head -30 index.html
HTTP/1.1 200 OK
Date: Tue, 08 Apr 2014 20:31:17 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Connection: close
Content-Type: text/html

<html>
<head> 
<style type="text/css">
p {color:black; font-family:arial; font-size:14px; text-align:justify; width:800px; }
ul {font-family:arial; font-size:14px; text-align:justify; width:800px; margin:10px; padding: 10px;}
li {padding:10px}
table {font-family:arial; text-align:justify; width:800px; margin:10px; padding: 10px;}
a:link    {color: #003366;}
a:visited {color: #003366;}
a:hover   {text-decoration: underline;}
</style>

<title> Adam J. Aviv</title>


</head>

<body>
<p>
<table>

<tr>
<td width="45%"></td>
(...)

The reason the data gets saved to index.html is that it's the basename of the path, that is, the last entry in the path /Users/cs/aviv/index.html. For example, if we were retrieving a different file:

aviv@saddleback: mywget $  ./mywget www.usna.edu /Users/cs/aviv/classes/ic221/s15/cal.html
HTTP/1.1 200 OK
aviv@saddleback: mywget $ head cal.html
HTTP/1.1 200 OK
Date: Mon, 13 Apr 2015 21:07:30 GMT
Server: Apache
X-Powered-By: PHP/5.3.24
Strict-Transport-Security: max-age=15768000;includeSubDomains
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

You can retrieve the basename of a path with the library function basename() (see section 3 of the man page).
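
One detail to watch for is that the POSIX version of basename() is allowed to modify the string you pass it, so it is safest to call it on a copy. The sketch below shows one way to do that. The helper name output_name() and the 1024 byte limit are assumptions for illustration.

```c
#include <libgen.h>
#include <string.h>

/* Hypothetical helper: copies the basename of path into out.
 * Returns 0 on success, -1 if a buffer is too small. */
int output_name(const char *path, char *out, size_t len) {
  char copy[1024]; /* assumed upper bound on the path length */
  char *base;

  if (strlen(path) >= sizeof(copy)) {
    return -1;
  }
  strcpy(copy, path); /* basename() may modify its argument */
  base = basename(copy);

  if (strlen(base) >= len) {
    return -1;
  }
  strcpy(out, base);
  return 0;
}
```

For the path /Users/cs/aviv/index.html, this leaves index.html in out.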

Finally, you should be able to detect a variety of error codes and error conditions:

aviv@saddleback: mywget $ ./mywget badbabdabda /index.html
ERROR: getaddrinfo: Name or service not known: badbabdabda
aviv@saddleback: mywget $ ./mywget www.usna.edu /doesnotexist.html
HTTP/1.1 404 Not Found
aviv@saddleback: mywget $ ./mywget www.usna.edu /i
HTTP/1.1 300 Multiple Choices

Extra Credit (5 points): Add functionality so that you can use your mywget to connect to services on other ports and save to a given file. For example:

#> ./mywget 10.53.33.232 batman.txt 6666
#> cat batman.txt

MMMMMMMMMMMMMMMMMMMMM.                             MMMMMMMMMMMMMMMMMMMMM
 `MMMMMMMMMMMMMMMMMMMM           M\  /M           MMMMMMMMMMMMMMMMMMMM'
   `MMMMMMMMMMMMMMMMMMM          MMMMMM          MMMMMMMMMMMMMMMMMMM'  
     MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
     .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.    
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  
                   `MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'                
                          `MMMMMMMMMMMMMMMMMM'                    
                              `MMMMMMMMMM'                              
                                 MMMMMM                              
                                  MMMM                                  
                                   MM                           
#> ./mywget 10.53.33.232 anyfilename.txt 6666
#> cat anyfilename.txt

MMMMMMMMMMMMMMMMMMMMM.                             MMMMMMMMMMMMMMMMMMMMM
 `MMMMMMMMMMMMMMMMMMMM           M\  /M           MMMMMMMMMMMMMMMMMMMM'
   `MMMMMMMMMMMMMMMMMMM          MMMMMM          MMMMMMMMMMMMMMMMMMM'  
     MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
     .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.    
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  
                   `MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'                
                          `MMMMMMMMMMMMMMMMMM'                    
                              `MMMMMMMMMM'                              
                                 MMMMMM                              
                                  MMMM                                  
                                   MM

Think about how you might know when the data has finished being sent (perhaps less data is sent than you expected?). To get full credit, both the extra credit version and the standard version should work. You must take a different action depending on the command line arguments.
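
For the standard HTTP case, one possible way to detect the end of the data is sketched below, assuming the server closes the connection when it is done (as the "Connection: close" header in the sample output suggests). In that case read() eventually returns 0. The helper name copy_all() is hypothetical, and a service that keeps the connection open would need a different stopping rule.

```c
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: copies everything readable from in_fd to out_fd.
 * Returns the total number of bytes copied, or -1 on an I/O error. */
long copy_all(int in_fd, int out_fd) {
  char buf[4096];
  ssize_t n, w, off;
  long total = 0;

  /* read() returns 0 once the other side has closed the connection */
  while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
    /* write() may accept fewer bytes than requested, so loop */
    for (off = 0; off < n; off += w) {
      w = write(out_fd, buf + off, (size_t) (n - off));
      if (w < 0) {
        return -1;
      }
    }
    total += n;
  }

  return (n < 0) ? -1 : total;
}
```

Here in_fd would be the connected socket and out_fd the file opened for writing.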