Project 01: wc: The Word Count Program
Project Preliminaries
Project Learning Goals
The goals of this project are:
- Write C programs without a framework
- Work with command line arguments
- Read from files and stdin to complete tasks
- Write functions that handle structured data
Project Grading and Due Date
This project is graded out of 100 points and is due on Mon. 20 Feb at 2359. Late submissions will not be allowed.
We will apply the following grading rubric to this project:
- 60% Complete a wc such that it prints only the total number of words using fscanf(), reading from a file specified on the command line.
- 65% Complete a wc such that it prints the total number of words, characters, and lines using fscanf(), reading from a file specified on the command line.
- 75% Complete a wc such that it prints the total number of words, characters, and lines using fscanf(), reading from any number of files specified on the command line, reporting the total across all files at the end.
- 85% Complete a wc such that it prints the total number of words, characters, and lines using fgetc(), reading from any number of files specified on the command line, reporting the total across all files at the end.
- 95% Complete a wc such that it prints the total number of words, characters, and lines based on command line arguments using fgetc(), reading from any number of files specified on the command line, reporting the total across all files at the end.
- 100% Complete a wc such that it prints the total number of words, characters, and lines based on command line arguments using fgetc(), reading from any number of files or stdin on the command line, reporting the total across all files at the end.
Additional Requirements:
- You must provide a manual page-like entry for your program. Place your description in man.txt.
- You must provide a README file for your program describing tasks completed and processes used. This is also the place to provide additional details to your grader.
Project Setup:
- Run the following command in your terminal
~aviv/bin/ic221-up
- Then change into the following directory
cd ic221/proj/01
- You will find all the material you need to complete this project in that directory.
- During the course of this project, we will refer to ic221/proj/01 as the project directory.
Project Submission
To submit this project, place all relevant content into your project directory:
ic221/proj/01
Then run the submission script:
~aviv/bin/ic221-submit
Select the option for proj/01 and confirm. If you see SUCCESS at the end, your submission went through. You may submit multiple times up until the submission deadline. Only your final submission will be considered for grading.
Sample Solution
You can find a working version of the program on the lab machines here:
~aviv/ic221-proj/wc
You can compare your solution to this one.
Project Description
In this project, you will reimplement the command line wc utility, short for "word count." The wc utility, as its name implies, is used to count the number of words in a file, but it can also count the number of characters and lines.
As an example, here is some sample output from a 100% solution of wc:
$ ./wc dickens.txt
dickens.txt 19202 161009 936251
$ ./wc A.txt
A.txt 1 3960 201961
$ ./wc random.txt
random.txt 69765 295983 6795742
The first number is the number of lines, the second is the number of words, and the third is the number of characters. If multiple files are specified, the totals in each category are also reported along with individual file results:
$ ./wc dickens.txt A.txt
dickens.txt 19202 161009 936251
A.txt 1 3960 201961
total 19203 164969 1138212
$ ./wc A.txt random.txt
A.txt 1 3960 201961
random.txt 69765 295983 6795742
total 69766 299943 6997703
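The total row above is a per-category sum over the individual files. One way to keep that bookkeeping tidy is a small record type, sketched below. The names struct counts and add_counts are hypothetical and not part of the project materials.

```c
/* Hypothetical record holding the three tallies for one input. */
struct counts {
    long lines;
    long words;
    long chars;
};

/* Add the tallies of one file into the running total. */
void add_counts(struct counts *total, const struct counts *file)
{
    total->lines += file->lines;
    total->words += file->words;
    total->chars += file->chars;
}
```

Starting from a zeroed total and adding the dickens.txt and A.txt rows gives 19203 164969 1138212, which matches the sample total line.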
If no files are specified, then wc should read from stdin:
$ cat dickens.txt | ./wc
-stdin- 19202 161009 936251
Yes, you should indicate "-stdin-" for the filename in this case. A user can also indicate that they wish to read from stdin using the + symbol as a file name, like in the following:
$ cat dickens.txt | ./wc A.txt + A.txt
A.txt 1 3960 201961
-stdin- 19202 161009 936251
A.txt 1 3960 201961
total 19204 168929 1340173
Finally, prior to the list of files, optional command line arguments can be provided to limit the output to just reporting the number of lines, number of words, or number of characters:
-l print number of lines
-w print number of words
-c print number of characters
For example:
$ cat dickens.txt | ./wc -l A.txt + A.txt
A.txt 1
-stdin- 19202
A.txt 1
total 19204
$ cat dickens.txt | ./wc -w A.txt + A.txt
A.txt 3960
-stdin- 161009
A.txt 3960
total 168929
$ cat dickens.txt | ./wc -c A.txt + A.txt
A.txt 201961
-stdin- 936251
A.txt 201961
total 1340173
Command line arguments can be combined as well, like so:
$ cat dickens.txt | ./wc -c -l A.txt + A.txt
A.txt 1 201961
-stdin- 19202 936251
A.txt 1 201961
total 19204 1340173
The output is always reported in line, word, character order, regardless of the order in which the options appear on the command line.
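A minimal sketch of how the fixed column order could be enforced: test the flags in line, word, character order when printing, no matter how they were given. The function name print_row and its parameters are hypothetical, and the single-space separator is an assumption, so compare against the sample solution for the exact spacing.

```c
#include <stdio.h>

/* Write one result row to `out`. The show_* flags select columns; the
 * order is always lines, words, characters. The single-space separator
 * is an assumption, not taken from the sample solution. */
void print_row(FILE *out, const char *name, long lines, long words,
               long chars, int show_l, int show_w, int show_c)
{
    fprintf(out, "%s", name);
    if (show_l) fprintf(out, " %ld", lines);
    if (show_w) fprintf(out, " %ld", words);
    if (show_c) fprintf(out, " %ld", chars);
    fputc('\n', out);
}
```

For instance, print_row(stdout, "A.txt", 1, 3960, 201961, 1, 0, 1) writes the name followed by the line count and then the character count, whether the user typed -c -l or -l -c.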
How to count
There are two methods you can use to count; however, for full credit you must use fgetc(). Let's start with the simpler method of using fscanf().
You can use fscanf() to read a file multiple times with different format characters to determine line, word, and character counts. For words, you can use the "%s" format, which recognizes word boundaries, but you will need to specify a buffer large enough to store the resulting word, and this may fail for large words (yes, we will test with some odd files!). You could then read lines and chars by using the "%c" format to count characters and detect newline symbols.
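The two fscanf() passes described above might look like the following sketch. The function names and the 64-byte buffer are hypothetical choices. The field width in "%63s" prevents a buffer overflow, but a word longer than 63 characters is then read in several pieces and over-counted, which is the large-word problem mentioned above.

```c
#include <stdio.h>

/* First pass: count whitespace-separated words with the "%s" format.
 * The 63-character field width keeps fscanf() from overflowing buf,
 * but a longer word is read in pieces and counted more than once. */
long count_words_fscanf(FILE *fp)
{
    char buf[64];
    long words = 0;

    while (fscanf(fp, "%63s", buf) == 1)
        words++;
    return words;
}

/* Second pass: count characters and newlines with the "%c" format. */
void count_chars_lines_fscanf(FILE *fp, long *chars, long *lines)
{
    char c;

    *chars = 0;
    *lines = 0;
    while (fscanf(fp, "%c", &c) == 1) {
        (*chars)++;
        if (c == '\n')
            (*lines)++;
    }
}
```

Between the two passes the caller would need to return to the start of the file, for example with rewind(fp). That works for regular files but not for a pipe.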
A more efficient method is to use fgetc(), which reads from the specified file one character at a time. The challenge with this method is that you need a way to detect word boundaries. To do that, you should employ the ctype.h library and the isspace() function. Using this method, you should be able to make a single pass through a file and perform all your counts.
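As a rough sketch of the single-pass idea, the loop below tracks whether the previous character was inside a word and counts a new word on each transition from whitespace to non-whitespace. The names struct counts and count_stream are hypothetical; this is one possible arrangement, not the sample solution.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical record holding the three tallies for one input. */
struct counts {
    long lines;
    long words;
    long chars;
};

/* Read fp to end-of-file once, filling in all three tallies. */
void count_stream(FILE *fp, struct counts *out)
{
    int c;            /* int, not char, so EOF can be told apart */
    int in_word = 0;  /* 1 while the previous character was in a word */

    out->lines = out->words = out->chars = 0;
    while ((c = fgetc(fp)) != EOF) {
        out->chars++;
        if (c == '\n')
            out->lines++;
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            out->words++;
        }
    }
}
```

Because the function takes a FILE pointer, the same code can be handed either an opened file or stdin.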
Parsing Command Line Options
A full solution to this project must be able to handle command line options. For this, you could use the getopt.h library, or you can program a simpler parsing routine. This is your choice.
The parsing requirements for command line options are as follows:
- Command line options must come before the list of files. If a command line option appears within the list of files, it is treated like a file name.
- Command line options must begin with a - (tack/hyphen).
- You can report an error on unknown options.
- Once you reach the first command line argument without a -, you can assume the list of files has started.
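The rules above could be implemented with a short hand-written loop, sketched below. The names struct options and parse_options are hypothetical. Three choices in the sketch are assumptions: a lone - is treated as a file name, all three columns are enabled when no option is given (as the sample output without options suggests), and the error message wording is illustrative, so check it against the sample solution.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flags: nonzero means "print this column". */
struct options {
    int lines;
    int words;
    int chars;
};

/* Scan leading options in argv. Returns the index of the first file
 * name (argc if there are none), or -1 after reporting an unknown
 * option on stderr. A lone "-" is treated as a file name here, which
 * is an assumption of this sketch. */
int parse_options(int argc, char *argv[], struct options *opt)
{
    int i;

    opt->lines = opt->words = opt->chars = 0;
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-' || argv[i][1] == '\0')
            break;                      /* file list starts here */
        if (strcmp(argv[i], "-l") == 0)
            opt->lines = 1;
        else if (strcmp(argv[i], "-w") == 0)
            opt->words = 1;
        else if (strcmp(argv[i], "-c") == 0)
            opt->chars = 1;
        else {
            /* Message wording is illustrative only. */
            fprintf(stderr, "ERROR: unknown option '%s'\n", argv[i]);
            return -1;
        }
    }
    if (!opt->lines && !opt->words && !opt->chars)
        opt->lines = opt->words = opt->chars = 1;   /* default: all */
    return i;
}
```

In main, a return value of -1 would lead to an immediate exit, and any other value is where the loop over file names begins. If the returned index equals argc, no files were given and the program reads stdin.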
Error Conditions
You are required to detect errors in user-provided input. There are two main categories:
1. Unknown file name: You should report the error, but continue processing the remaining files.
2. Unknown command line argument: You should report the error and return without continuing.
ALL ERROR REPORTING MUST BE DONE to stderr.
Here are some examples of condition (1) errors:
$ ./wc doesnotexist.txt
ERROR: file 'doesnotexist.txt' cannot be opened
$ ./wc doesnotexist.txt 2> /dev/null #no output, since redirect /dev/null
$ ./wc dickens.txt doesnotexist.txt dickens.txt
dickens.txt 19202 161009 936251
ERROR: file 'doesnotexist.txt' cannot be opened
dickens.txt 19202 161009 936251
total 38404 322018 1872502
$ ./wc dickens.txt doesnotexist.txt dickens.txt 2>/dev/null #error doesn't appear in list
dickens.txt 19202 161009 936251
dickens.txt 19202 161009 936251
total 38404 322018 1872502
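The report-and-continue behavior in these examples could be sketched with a small helper that either returns an open stream or prints the error and returns NULL. The name open_input is hypothetical. The message text copies the sample output above, and the + convention for stdin is the one described earlier.

```c
#include <stdio.h>
#include <string.h>

/* Open one command line argument for reading. "+" selects stdin.
 * On failure, report on stderr and return NULL so the caller can
 * skip this file and move on to the next one. */
FILE *open_input(const char *name)
{
    FILE *fp;

    if (strcmp(name, "+") == 0)
        return stdin;
    fp = fopen(name, "r");
    if (fp == NULL)
        fprintf(stderr, "ERROR: file '%s' cannot be opened\n", name);
    return fp;
}
```

A caller would skip the counting step when NULL comes back, and should call fclose() only on streams that are not stdin.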
Here are some examples of condition (2) errors:
$ ./wc -p dickens.txt
ERROR: unkown option '-p'
$ ./wc -p -l dickens.txt
ERROR: unkown option '-p'
$ ./wc -l -p dickens.txt
ERROR: unkown option '-p'
Important: a - by itself could be treated like a file name, so you could consider it a condition (1) error.
$ ./wc - dickens.txt
ERROR: file '-' cannot be opened
dickens.txt 19202 161009 936251
total 19202 161009 936251
However, you could also consider it a condition (2) error and do a hard stop. You should choose what is natural for your program and stick with it.