Lab 12: Socket Programming
Preliminaries
Lab Setup
Run the following command
~aviv/bin/ic221-up
Change into the lab directory
cd ~/ic221/lab/12
All the material you need to complete the lab can be found in the lab directory, and you should place all material you will submit within the lab directory. Throughout this lab, "the lab directory" refers to the path above.
Submission Folder
For this lab, all submissions should be placed in the following folder:
~/ic221/lab/12
README
In the top level of the lab directory, you will find a README
file. You must fill out the README file with your name and alpha.
Please include a short summary of each of the tasks and any other
information you want to provide to the instructor.
Test Script
You are provided a test script which prints pass/fail information
for a set of tests for your programs. Note that passing all the
tests does not mean you will receive a perfect score: other tests
will be performed on your submission. To run the test script,
execute test.sh
from the lab directory.
./test.sh
You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.
mywget
In this part of the lab you will implement your own version of the
wget Unix command line tool, which downloads content from web
pages. For example, if you want to download the webpage for this
class, you might do something like this:
wget http://www.usna.edu/Users/cs/aviv/classes/ic221/s17/index.html
This connects to the web server at the domain www.usna.edu,
retrieves the document at the path /Users/cs/aviv/classes/ic221/s17/index.html,
and saves it to the file index.html.
To do all of that, you need to use sockets and know a bit about HTTP. The socket part should be familiar from your experience with Java, so we start with HTTP.
HTTP GET interface
The only part of HTTP you will need to implement is the GET request. A GET request is simply:
GET path HTTP/1.1\r\n
Host: hostname\r\n
\r\n
The \r\n at the end of each line (a carriage return followed by a line
feed) is part of the protocol and must be included; the final empty line
marks the end of the request headers. The request tells the web server which
host we are interested in and which file we wish to retrieve.
The server replies with a status line, then headers, then the document. A successful response begins:
HTTP/1.1 200 OK
where "HTTP/1.1" indicates the protocol version and "200" is the status code, in this case success.
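As a sketch of how that request might be built in C, the hypothetical helper build_get_request() below formats it with snprintf(). The function name and the choice to reject paths containing spaces or line breaks are assumptions, not part of the provided mywget.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the provided mywget.c): format a GET
 * request for `path` on `host` into buf. Returns the request length, or -1
 * if the path contains a space/CR/LF (which would break the request line)
 * or if buf is too small. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    if (strpbrk(path, " \r\n") != NULL)
        return -1;

    /* Each line ends in a carriage return + line feed ("\r\n"); the final
     * empty line tells the server the request headers are complete. */
    int n = snprintf(buf, size, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t)n >= size)
        return -1; /* encoding error or truncated output */
    return n;
}
```

In mywget, host would come from the domain argument and path from the path argument, and the returned length tells you how many bytes to send on the socket.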
For example, we can simulate such a request using netcat like the following:
aviv@csfaculty: mywget $ echo -e "GET /~aviv/index.html HTTP/1.1\r\nHost: csmidn.academy.usna.edu\r\n\r\n" | netcat csmidn.academy.usna.edu 80
HTTP/1.1 200 OK
Date: Wed, 18 Apr 2018 13:22:48 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sun, 15 Apr 2018 21:56:16 GMT
ETag: "28-569ea2cb36e5d"
Accept-Ranges: bytes
Content-Length: 40
Content-Type: text/html

<html>
<h1> You did it! </h1>
</html>
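In the example above, netcat takes care of the socket I/O. In mywget you write the request to the socket yourself, and write() may accept fewer bytes than you asked for, so a common pattern is to loop until everything is sent. A minimal sketch, using a hypothetical helper named write_all():

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd (e.g., a connected
 * socket). write() may transfer fewer bytes than requested, so keep going
 * until everything is out. Returns 0 on success, -1 on error (errno set). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal before writing: retry */
            return -1;
        }
        buf += n;           /* skip past what was written */
        len -= (size_t)n;
    }
    return 0;
}
```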
If you send a bad request, such as a malformed one, you'll get an error code:
aviv@csfaculty: mywget $ echo -e "GET /~aviv/index.html HTTP/1.1" | netcat csmidn.academy.usna.edu 80
HTTP/1.1 400 Bad Request
Date: Wed, 18 Apr 2018 13:24:59 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 315
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at csmidn.academy.usna.edu Port 80</address>
</body></html>
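Other status codes exist, such as the following (300-level codes are redirections, 400-level codes are client errors, and 500-level codes are server errors):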
const char ERROR_300[]="HTTP/1.1 300 Multiple Choices\n";
const char ERROR_301[]="HTTP/1.1 301 Moved Permanently\n";
const char ERROR_400[]="HTTP/1.1 400 Bad Request\n";
const char ERROR_403[]="HTTP/1.1 403 Forbidden\n";
const char ERROR_404[]="HTTP/1.1 404 Not Found\n";
const char ERROR_500[]="HTTP/1.1 500 Internal Server Error\n";
Given a response, it is easy to check for a particular code, since the three-digit code begins 9 bytes into the response, right after "HTTP/1.1 ":
if(!strncmp(response+9, "300", 3)){  /* status code begins at byte 9 */
  fprintf(stderr, "%s", ERROR_300);
  exit(1);
}
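Rather than one strncmp() per code, you could also convert the three digits to an integer and switch on it. The sketch below assumes the response buffer is NUL-terminated (for example, you terminate it yourself after read()); parse_status_code() is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return the numeric status code of a NUL-terminated
 * HTTP response such as "HTTP/1.1 404 Not Found\r\n...", or -1 if the
 * status line is malformed. The code occupies bytes 9-11. */
int parse_status_code(const char *response)
{
    /* Expect "HTTP/1.x " followed by three digits. */
    if (strlen(response) < 12 || strncmp(response, "HTTP/1.", 7) != 0 ||
        response[8] != ' ')
        return -1;

    int code = 0;
    for (int i = 9; i < 12; i++) {
        if (!isdigit((unsigned char)response[i]))
            return -1;
        code = code * 10 + (response[i] - '0');
    }
    return code;
}
```

A caller might then write switch (parse_status_code(response)) { case 200: ... case 404: ... } and print the matching ERROR_ string for each error case.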
If the code indicates success (200), the requested document follows the HTTP headers. In this lab, you'll write the response out to the file.
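The headers end at the first empty line, that is, the first "\r\n\r\n" in the response. If you want only the document without its headers (the example runs in this lab save the whole response), here is a minimal sketch with a hypothetical find_body() helper. It assumes a NUL-terminated buffer, so it would stop early on binary content that contains zero bytes.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the first byte of the body in a
 * NUL-terminated HTTP response, or NULL if the empty line that ends the
 * headers ("\r\n\r\n") has not been received yet. */
const char *find_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");
    return sep ? sep + 4 : NULL; /* skip the 4-byte separator */
}
```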
Task: mywget
For this task, change into the mywget directory, where you will find two files:
- mywget.c: the mywget source file, which you will complete
- Makefile: to compile your program
Your goal is to complete the mywget
program so that it meets the
following specification.
- Given a domain name or IP address, perform a GET request for the specified path by connecting to the server on port 80 (or on another port, if provided).
- If the file exists at the path, save a copy of the file to the basename of the path.
- If the server reports an error code, report the appropriate error code.
- Report other errors, such as failed system calls or a domain that does not exist.
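For the connection step, one common approach is getaddrinfo(), which accepts either a domain name or an IP address string plus a port string. The sketch below is one way to do it, not the required implementation; connect_to() is a hypothetical name, and its getaddrinfo error format is chosen to match the example output later in this section.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to host:port. Tries each
 * address getaddrinfo() returns until one connects. Returns a connected
 * socket descriptor, or -1 after printing an error message. */
int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "ERROR: getaddrinfo: %s: %s\n", gai_strerror(err), host);
        return -1;
    }

    int sock = -1;
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        sock = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sock < 0)
            continue;
        if (connect(sock, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                   /* connected */
        close(sock);                 /* this address failed; try the next */
        sock = -1;
    }
    freeaddrinfo(res);

    if (sock < 0)
        fprintf(stderr, "ERROR: could not connect to %s:%s\n", host, port);
    return sock;
}
```

Usage might look like int sock = connect_to(argv[1], argc > 3 ? argv[3] : "80"); with "80" as the default HTTP port when no port argument is given.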
Here are some example runs:
aviv@saddleback: mywget $ ./mywget
ERROR: Require domain and path
mywget domain path [port]
connect to the web server at domain and port, if provided, and request
the file at path. If the file exist, save the file based on the
filename of value in the path
If domain is not reachable, report error
aviv@saddleback: mywget $ ./mywget csmidn.academy.usna.edu /~aviv/index.html
HTTP/1.1 200 OK
aviv@saddleback: mywget $ cat index.html
HTTP/1.1 200 OK
Date: Tue, 14 Mar 2017 20:26:53 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Tue, 14 Mar 2017 20:26:12 GMT
ETag: "29-54ab6a390412c"
Accept-Ranges: bytes
Content-Length: 41
Connection: close
Content-Type: text/html
<html>
<h1> You did it! </h1>
</html>
The data gets saved to index.html because that is the basename of the path,
that is, the last entry in the path /~aviv/index.html. For example, if we were
retrieving a different file, e.g., /~aviv/saturn.txt, it would be saved to saturn.txt:
aviv@saddleback: mywget $ ./mywget csmidn.academy.usna.edu /~aviv/saturn.txt
HTTP/1.1 200 OK
aviv@saddleback: mywget $ cat saturn.txt
HTTP/1.1 200 OK
Date: Tue, 14 Mar 2017 20:29:08 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Tue, 14 Mar 2017 20:28:56 GMT
ETag: "1e2-54ab6ad5b3211"
Accept-Ranges: bytes
Content-Length: 482
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain
_.oo.
_.u[[/;:,. .odMMMMMM'
.o888UU[[[/;:-. .o@P^ MMM^
oN88888UU[[[/;::-. dP^
dNMMNN888UU[[[/;:--. .o@P^
,MMMMMMN888UU[[/;::-. o@^
NNMMMNN888UU[[[/~.o@P^
888888888UU[[[/o@^-..
oI8888UU[[[/o@P^:--..
.@^ YUU[[[/o@^;::---..
oMP ^/o@P^;:::---..
.dMMM .o@^ ^;::---...
dMMMMMMM@^` `^^^^
YMMMUP^
^^
You can retrieve the basename of a path with the library function basename() (see section 3 of the man page).
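One caveat: the POSIX basename() may modify the string you pass it and may return a pointer into that string or to static storage, so passing argv[2] directly or holding on to the result can surprise you. Here is a sketch that works on a private copy, with a hypothetical save_name() helper and an assumed 4096-byte limit on the path:

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the basename of `path` into out
 * (e.g., "/~aviv/index.html" -> "index.html"). Works on a private copy
 * because POSIX basename() may modify its argument.
 * Returns 0 on success, -1 if the path or the result does not fit. */
int save_name(const char *path, char *out, size_t size)
{
    char copy[4096];
    size_t len = strlen(path);
    if (len >= sizeof copy)
        return -1;
    memcpy(copy, path, len + 1);   /* include the terminating NUL */

    const char *base = basename(copy);
    int n = snprintf(out, size, "%s", base);
    if (n < 0 || (size_t)n >= size)
        return -1;                 /* out is too small */
    return 0;
}
```

Note that a path such as "/" has the basename "/", which you cannot open as a regular file, so you may want to handle that case separately.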
Finally, you should be able to detect a variety of error codes and error conditions:
aviv@saddleback: mywget $ ./mywget badbabdabda /index.html
ERROR: getaddrinfo: Name or service not known: badbabdabda
aviv@saddleback: mywget $ ./mywget yog.cs.usna.edu /doesnotexist.html
HTTP/1.1 404 Not Found
aviv@saddleback: mywget $ ./mywget yog.cs.usna.edu /i
HTTP/1.1 300 Multiple Choices
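Instead of keeping one ERROR_ constant per code, another option is to print the status line exactly as the server sent it, which also covers codes you did not anticipate. A minimal sketch, assuming a NUL-terminated response; status_line() is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first line of an HTTP response (the status
 * line, e.g. "HTTP/1.1 404 Not Found") into out, without the trailing
 * "\r\n". Returns the line length, or -1 if no complete line is present
 * or the line does not fit in out. */
int status_line(const char *response, char *out, size_t size)
{
    const char *eol = strstr(response, "\r\n");
    if (eol == NULL)
        return -1;

    size_t len = (size_t)(eol - response);
    if (len + 1 > size)           /* room for the line plus its NUL */
        return -1;
    memcpy(out, response, len);
    out[len] = '\0';
    return (int)len;
}
```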
Extra Credit (5 points): Add functionality so that you can use your mywget to connect to services on other ports and save to a given file. For example:
aviv@saddleback: mywget $ ./mywget csfaculty.academy.usna.edu batman.txt 6666
aviv@saddleback: mywget $ cat batman.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
aviv@saddleback: mywget $ ./mywget csfaculty.academy.usna.edu foobar.txt 6666
aviv@saddleback: mywget $ cat foobar.txt
MMMMMMMMMMMMMMMMMMMMM. MMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMM M\ /M MMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMMM MMMMMM MMMMMMMMMMMMMMMMMMM'
MMMMMMMMMMMMMMMMMMM-_______MMMMMMMM_______-MMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
`MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM'
`MMMMMMMMMMMMMMMMMM'
`MMMMMMMMMM'
MMMMMM
MMMM
MM
Think about how you might know when the data has finished being sent (perhaps less data arrives than you expected?). To qualify for extra credit, both the extra-credit mode and the regular HTTP mode must work; your program will need to take a different action depending on the command-line arguments.
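A caution on the hint: TCP is a byte stream, so a single read() can return fewer bytes than the server sent even when more data is on the way. A more dependable signal is read() returning 0, which means the other side closed the connection. Here is a sketch of that loop, with a hypothetical copy_until_eof() helper and an arbitrary 4096-byte buffer:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd (e.g., a socket) to
 * out_fd (e.g., the output file) until read() returns 0, which on a TCP
 * socket means the peer closed its end of the connection.
 * Returns the number of bytes copied, or -1 on error. */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;              /* EOF: the sender is done */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: try again */
            return -1;
        }
        /* write() may also be partial, so loop until this chunk is out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

For the HTTP case, keep in mind that an HTTP/1.1 server may leave the connection open after responding unless it sends Connection: close (as in the saved responses above); in that case a read-until-EOF loop waits until the server times the connection out.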