IC221: Systems Programming (SP18)


Lab 12: Socket Programming

Preliminaries

Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/lab/12

All the material you need to complete the lab can be found in the lab directory, and everything you submit should be placed there as well. Throughout this lab, "the lab directory" refers to the path above.

Submission Folder

For this lab, all submissions should be placed in the following folder:

~/ic221/lab/12

README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

Test Script

You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.

./test.sh

You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.

mywget

In this part of the lab you will implement your own version of the wget Unix command line tool, which will download content from web pages. For example, if you want to download the webpage for this class, you might do something like this:

wget http://www.usna.edu/Users/cs/aviv/classes/ic221/s17/index.html

This contacts the server at the domain www.usna.edu, retrieves the document at the path /Users/cs/aviv/classes/ic221/s17/index.html, and saves it under the file name index.html.

To do all of that, you need to use sockets and know a bit about HTTP. Let's start with the socket part, since that should be familiar from your experience with Java.
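As a starting point, here is a minimal sketch of opening a TCP connection in C with getaddrinfo(), socket(), and connect(). The helper name connect_to_host, its return convention, and the exact error message format are assumptions made for illustration, not part of the provided mywget.c; your own code may be organized differently.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected TCP socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;
    int rc;

    assert(host != NULL && port != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        /* Message format assumed from the lab's sample output. */
        fprintf(stderr, "ERROR: getaddrinfo: %s: %s\n", gai_strerror(rc), host);
        return -1;
    }

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break; /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

Passing the port as a string (for example "80") lets getaddrinfo() fill in the port number for you.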

HTTP GET interface

The only part of the protocol for HTTP you will need to implement is the GET request. A GET request is simply:

GET path HTTP/1.1\r\n
Host: hostname\r\n
\r\n

The \r\n sequences that end each line are part of the protocol and must be included in the request as escape sequences. The request tells the web server which host we are interested in and which file we wish to retrieve.
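One way to assemble this request in C is with snprintf(), which keeps the \r\n sequences explicit and guards against overflowing the buffer. The function name build_get_request is hypothetical, chosen for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a GET request for path on host into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    /* Request line, Host header, then an empty line to end the request. */
    n = snprintf(buf, size, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t)n >= size)
        return -1; /* error or truncated output */
    return n;
}
```

The returned length is the number of bytes to send; the terminating '\0' is not part of the request.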

The server's response begins with a status line such as:

HTTP/1.1 200 OK

Here "HTTP/1.1" indicates the protocol version and "200" is the status code, which in this case means success.

For example, we can simulate such a request using netcat like the following:

aviv@csfaculty: mywget $ echo -e  "GET /~aviv/index.html HTTP/1.1\r\nHost: csmidn.academy.usna.edu\r\n\r\n"   | netcat csmidn.academy.usna.edu 80
HTTP/1.1 200 OK
Date: Wed, 18 Apr 2018 13:22:48 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sun, 15 Apr 2018 21:56:16 GMT
ETag: "28-569ea2cb36e5d"
Accept-Ranges: bytes
Content-Length: 40
Content-Type: text/html

<html>
  <h1> You did it! </h1>
</html>
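In C, the echo side of this pipeline corresponds to writing the request bytes to the connected socket. Because write() may send fewer bytes than requested, a small loop is a common precaution. The helper write_all below is a hypothetical name used for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are sent.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(buf != NULL);

    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```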

If you send a bad request, such as a malformed one, you'll get an error code. Here the request is missing the Host header and the terminating blank line:

aviv@csfaculty: mywget $ echo -e  "GET /~aviv/index.html HTTP/1.1"   | netcat csmidn.academy.usna.edu 80
HTTP/1.1 400 Bad Request
Date: Wed, 18 Apr 2018 13:24:59 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 315
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at csmidn.academy.usna.edu Port 80</address>
</body></html>

Other status codes exist for redirections and errors, such as:

const char ERROR_300[]="HTTP/1.1 300 Multiple Choices\n";
const char ERROR_301[]="HTTP/1.1 301 Moved Permanently\n";
const char ERROR_400[]="HTTP/1.1 400 Bad Request\n";
const char ERROR_403[]="HTTP/1.1 403 Forbidden\n";
const char ERROR_404[]="HTTP/1.1 404 Not Found\n";
const char ERROR_500[]="HTTP/1.1 500 Internal Server Error\n";

Given a response, it is easy to check the status code, since the code begins 9 bytes into the response (right after "HTTP/1.1 ").

/* "HTTP/1.1 " is 9 bytes long, so the 3-digit code starts at response+9 */
if (strncmp(response + 9, "300", 3) == 0) {
  fprintf(stderr, "%s", ERROR_300);
  exit(1);
}
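Rather than comparing against each code string in turn, you could also convert the three digits to an integer once. The sketch below assumes the response starts with "HTTP/1.1 " exactly as in the examples above; http_status_code is a hypothetical helper name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the status code (e.g. 200 or 404),
 * or -1 if the response does not start with "HTTP/1.1 " and 3 digits. */
int http_status_code(const char *response)
{
    const char *p;

    assert(response != NULL);

    if (strncmp(response, "HTTP/1.1 ", 9) != 0)
        return -1;

    p = response + 9; /* first digit of the status code */
    if (!isdigit((unsigned char)p[0]) ||
        !isdigit((unsigned char)p[1]) ||
        !isdigit((unsigned char)p[2]))
        return -1;

    return (p[0] - '0') * 100 + (p[1] - '0') * 10 + (p[2] - '0');
}
```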

If the code indicates success (200), the requested document follows the HTTP headers. In your lab, you'll write it out to a file.
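The headers end at the first blank line, that is, the first occurrence of \r\n\r\n in the response. If you need to know where the document itself begins, a sketch like the following can locate that boundary. The name body_offset is hypothetical, and whether your saved file should include the headers is decided by the lab's sample output and tests, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first byte after the
 * header-terminating "\r\n\r\n", or -1 if it is not in the first len bytes. */
long body_offset(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1; /* headers not complete yet */
}
```

A return value of -1 usually just means more data still has to be read from the socket.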

Task : mywget

For this task, change into the mywget directory, where you will find two files:

  • mywget.c : mywget source file which you will complete
  • Makefile : to compile your program

Your goal is to complete the mywget program so that it meets the following specification.

  1. Given a domain name or IP address, perform a GET request for the specified path by connecting to the server on port 80 (or on another port, if one is provided).
  2. If the file exists at the path, save a copy of the file under the basename of the path.
  3. If the server reports an error code, report the appropriate error code.
  4. Report other errors from system calls, such as a domain that does not exist.
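The optional port in item 1 can be handled by checking the argument count. The sketch below assumes the argument order domain, path, then optional port, and uses a hypothetical helper port_from_args that falls back to 80 and rejects values that are not valid port numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns the port to use, or -1 if argv[3] is
 * present but is not a number in the range 1..65535. */
int port_from_args(int argc, char *argv[])
{
    char *end;
    long v;

    assert(argv != NULL);

    if (argc < 4)
        return 80; /* no port given: default HTTP port */

    errno = 0;
    v = strtol(argv[3], &end, 10);
    if (errno != 0 || end == argv[3] || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}
```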

An example of requesting files from a web server:

aviv@saddleback: mywget $ ./mywget 
ERROR: Require domain and path
mywget domain path [port]

connect to the web server at domain and port, if provided, and request
the file at path. If the file exist, save the file based on the
filename of value in the path

If domain is not reachable, report error

aviv@saddleback: mywget $ ./mywget csmidn.academy.usna.edu  /~aviv/index.html
HTTP/1.1 200 OK
aviv@saddleback: mywget $ cat index.html 
HTTP/1.1 200 OK
Date: Tue, 14 Mar 2017 20:26:53 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Tue, 14 Mar 2017 20:26:12 GMT
ETag: "29-54ab6a390412c"
Accept-Ranges: bytes
Content-Length: 41
Connection: close
Content-Type: text/html

<html>
  <h1> You did it! </h1>

</html>

The data gets saved to index.html because that is the basename of the path, that is, the last entry in the path /~aviv/index.html. If we were retrieving a different file, e.g., /~aviv/saturn.txt, it would be saved to saturn.txt:

aviv@saddleback: mywget $ ./mywget csmidn.academy.usna.edu  /~aviv/saturn.txt
HTTP/1.1 200 OK
aviv@saddleback: mywget $ cat saturn.txt 
HTTP/1.1 200 OK
Date: Tue, 14 Mar 2017 20:29:08 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Tue, 14 Mar 2017 20:28:56 GMT
ETag: "1e2-54ab6ad5b3211"
Accept-Ranges: bytes
Content-Length: 482
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain

                                        _.oo.
                 _.u[[/;:,.         .odMMMMMM'
              .o888UU[[[/;:-.  .o@P^    MMM^
             oN88888UU[[[/;::-.        dP^
            dNMMNN888UU[[[/;:--.   .o@P^
           ,MMMMMMN888UU[[/;::-. o@^
           NNMMMNN888UU[[[/~.o@P^
           888888888UU[[[/o@^-..
          oI8888UU[[[/o@P^:--..
       .@^  YUU[[[/o@^;::---..
     oMP     ^/o@P^;:::---..
  .dMMM    .o@^ ^;::---...
 dMMMMMMM@^`       `^^^^
YMMMUP^
 ^^

You can retrieve the basename of a path with the library function basename() (see section 3 of the man page).
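Note that the POSIX basename() declared in <libgen.h> is allowed to modify the string you pass it, so it is safest to hand it a copy. A minimal sketch, with output_filename as a hypothetical helper name and a fixed 1024-byte scratch buffer chosen arbitrarily:

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the basename of path into out.
 * Returns 0 on success, -1 if path or the result does not fit. */
int output_filename(const char *path, char *out, size_t size)
{
    char copy[1024]; /* scratch copy, since basename() may modify its input */
    int n;

    assert(path != NULL && out != NULL);

    if (strlen(path) >= sizeof(copy))
        return -1;
    strcpy(copy, path);

    n = snprintf(out, size, "%s", basename(copy));
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```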

Finally, you should be able to detect a variety of error codes and error conditions:

aviv@saddleback: mywget $ ./mywget badbabdabda /index.html
ERROR: getaddrinfo: Name or service not known: badbabdabda
aviv@saddleback: mywget $ ./mywget yog.cs.usna.edu /doesnotexist.html
HTTP/1.1 404 Not Found
aviv@saddleback: mywget $ ./mywget yog.cs.usna.edu /i
HTTP/1.1 300 Multiple Choices
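To print the matching status line for a server-reported code, as in the last two commands, a small lookup table can replace a chain of if statements. This is only a sketch: the table reuses the status strings listed earlier in this lab, and the names status_entry, STATUS_TABLE, and status_line_for are hypothetical.

```c
#include <stddef.h>

/* Hypothetical table pairing each status code with its status line. */
struct status_entry {
    int code;
    const char *line;
};

static const struct status_entry STATUS_TABLE[] = {
    {300, "HTTP/1.1 300 Multiple Choices\n"},
    {301, "HTTP/1.1 301 Moved Permanently\n"},
    {400, "HTTP/1.1 400 Bad Request\n"},
    {403, "HTTP/1.1 403 Forbidden\n"},
    {404, "HTTP/1.1 404 Not Found\n"},
    {500, "HTTP/1.1 500 Internal Server Error\n"},
};

/* Returns the status line for code, or NULL if code is not in the table. */
const char *status_line_for(int code)
{
    size_t i;

    for (i = 0; i < sizeof(STATUS_TABLE) / sizeof(STATUS_TABLE[0]); i++) {
        if (STATUS_TABLE[i].code == code)
            return STATUS_TABLE[i].line;
    }
    return NULL;
}
```

A NULL result covers both success (200) and any code the table does not list, so the caller still has to decide how to treat unknown codes.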

Extra Credit (5 points): Add functionality so that you can use your mywget to connect to services on other ports and save to a given file. For example:

aviv@saddleback: mywget $ ./mywget csfaculty.academy.usna.edu batman.txt 6666
aviv@saddleback: mywget $ cat batman.txt 

MMMMMMMMMMMMMMMMMMMMM.                             MMMMMMMMMMMMMMMMMMMMM
 `MMMMMMMMMMMMMMMMMMMM           M\  /M           MMMMMMMMMMMMMMMMMMMM'
   `MMMMMMMMMMMMMMMMMMM          MMMMMM          MMMMMMMMMMMMMMMMMMM'  
     MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
     .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.    
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  
                   `MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'                
                          `MMMMMMMMMMMMMMMMMM'                    
                              `MMMMMMMMMM'                              
                                 MMMMMM                              
                                  MMMM                                  
                                   MM               
aviv@saddleback: mywget $ ./mywget csfaculty.academy.usna.edu foobar.txt 6666
aviv@saddleback: mywget $ cat foobar.txt 

MMMMMMMMMMMMMMMMMMMMM.                             MMMMMMMMMMMMMMMMMMMMM
 `MMMMMMMMMMMMMMMMMMMM           M\  /M           MMMMMMMMMMMMMMMMMMMM'
   `MMMMMMMMMMMMMMMMMMM          MMMMMM          MMMMMMMMMMMMMMMMMMM'  
     MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
      MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM    
     .MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.    
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  
                   `MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'                
                          `MMMMMMMMMMMMMMMMMM'                    
                              `MMMMMMMMMM'                              
                                 MMMMMM                              
                                  MMMM                                  
                                   MM               

Think about how you might know when the data has finished being sent (perhaps less data is sent than you expected?). To qualify for extra credit, both the extra credit version and the standard version must work. You will need to take a different action depending on the command line arguments.
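One simple signal that the sender is finished is read() returning 0, which means the other side has closed the connection. The sketch below copies data until that happens. It assumes the server closes the connection when it is done, which may not hold for every server; copy_until_eof is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy bytes from in_fd to out_fd until read()
 * reports end-of-file. Returns the total bytes copied, or -1 on error. */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        ssize_t off = 0;
        ssize_t n = read(in_fd, buf, sizeof(buf));

        if (n == 0)
            break; /* peer closed the connection: no more data */
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may be partial, so loop until this chunk is written. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

If the server keeps the connection open, read() will block instead of returning 0, so another signal (such as a short read or the Content-Length header) would be needed.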