Lab 06: Fork-Exec-Wait, Repeat!
Table of Contents
Preliminaries
In this lab you will complete a set of C programs that will expose you to termination status, fork-exec-wait loops, and tokenizing strings. There are 3 tasks, you will likely complete only task 1 in lab, and begin with task 2. You will need to finish the remaining taks outside of lab.
Lab Learning Goals
In this lab, you will learn the following topics and practice C programming skills.
fork()
,exec
,wait()
- termination status,
IFEXITED()
,IFSIGNALED()
- Timing execution
- String tokenization
- Constructing
argv
arrays
Lab Setup
Run the following command
~aviv/bin/ic221-up
Change into the lab directory
cd ~/ic221/labs/06
All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.
Submission Folder
For this lab, all ubmission should be placed in the following folder:
~/ic221/labs/06/
This directory contains 4 sub-directories; examples
, timer
,
term-status
, and mini-sh
. In the examples
directory you will
find any source code in this lab document. All lab work should be
done in the remaining directories.
- Only source files found in the folder will be graded.
- Do not change the names of any source files
Finally, in the top level of the lab directory, you will find a
README
file. You must complete the README
file, and include any
additional details that might be needed to complete this lab.
Compiling your programs with clang
and make
You are required to provide your own Makefiles for this lab. Each of
the source folders, timer
, mini-sh
, and XXXX, must have a
Makefile. We should be able to compile your programs by typing
make
in each source directory. The following executables should be
generated:
foursons
: executable forterm-status
directorytimer
: execute fortimer
directorymini-sh
: execute formini-sh
directory
When compiling mini-sh
, you will need to link against the
readline
library. We have provided you an example of this
compilation process for compiling token-sh
README
In the top level of the lab directory, you will find a README
file. You must fill out the README file with your name and alpha.
Please include a short summary of each of the tasks and any other
information you want to provide to the instructor.
Test Script
You are provided a test script which prints pass/fail information
for a set of tests for your programs. Note that passing all the
tests does not mean you will receive a perfect score: other tests
will be performed on your submission. To run the test script,
execute test.sh
from the lab directory.
./test.sh
You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate:
#**************************** # Comment out tests you aren't working on #*************************** test_term-status test_timer test_mini-sh test_makefile
Part 1: Children, Parents and Process ID
In the last lesson, we briefly discussed how a program loads into a
process. We use the exec
family of system calls, which will replace
the current program image with the one provided to exec
. What we
failed to discuss was how to create a new process. To do that, we must
fork()
.
fork()
With the exception of two O.S. processes, the kernel and init process,
all process are spawned from another process. The procedure is called
forking: An exact copy of the process, memory values and open
resources, is produced. The original process that forked, is called
the parent, while the newly created, duplicate process is called the
child. Let's look at an example of a process forking using the
fork()
system call:
#include <unistd.h> #include <stdio.h> #include <stdlib.h> int main(){ pid_t c_pid; c_pid = fork(); //duplicate if( c_pid == 0 ){ //child: The return of fork() is zero printf("Child: I'm the child: %d\n", c_pid); }else if (c_pid > 0){ //parent: The return of fork() is the process of id of the child printf("Parent: I'm the parent: %d\n", c_pid); }else{ //error: The return of fork() is negative perror("fork failed"); _exit(2); //exit failure, hard } return 0; //success }
The fork()
system call is unlike any other function call you've seen
so far. It returns twice, once in the parent and once in child, and it
returns different values in the parent and the child. To follow the
logic, you first need to realize that once fork()
is called, the
Operating System is creating a whole new process which is an exact
copy of the original process. At this point, fork()
still hasen't
returned because the O.S. is context switched in, and now it must
return twice, once in the child process and once in the parent
The parent and child are identified and separated logically using a
process id (or pid), a unique identifier assigned by the O.S. to
distinguish different processes. After a call to fork()
, the
parent's return value is the process id of the newly created child
process. The child, however, has a return value of 0. On error,
fork()
, returns -1. Then you should bail with _exit()
because
something terrible happened.
One nice way to see a visual of the parent process relationship is
using the bash command pstree
:
#> pstree -ah init ├─NetworkManager │ ├─dhclient -d -4 -sf /usr/lib/NetworkManager/nm-dhcp-client.action -pf /var/run/sendsigs.omit.d/network-manager.dhclient-eth0.pid -lf... │ └─2*[{NetworkManager}] ├─accounts-daemon │ └─{accounts-daemon} (...)
At the top is the init
process, which is the parent of all
proces. Somewhere down tree is my login shell
(...) ├─sshd -D │ └─sshd │ └─sshd │ └─bash │ ├─emacs get_exitstatus.c │ ├─emacs foursons.c │ ├─emacs Makefile │ ├─emacs get_exitstatus.c │ ├─emacs fork_exec_wait.c │ ├─emacs mail_reports.py │ └─pstree -ah (...)
And you can see that the process of getting to a bash
shell via ssh
requires a number of forks and child process.
getpid()
and getppid()
In the above example, with fork()
, the parent can learn the process
id of the child, but the child doesn't know its own process id (or
pid) after the fork nor does it know its parents process id. For
that matter, the parent doesn't know its own process id either. There
are two system calls to retrieve this information:
//retrieve the current process id pid_t getpid(void); //retrieve the parent's process id pid_t getppid(void);
There is no way for a process to directly retrieve its child pid
because any process may have multiple children. Instad, a process must
maintain that information directly since the child's pid is returned
from a fork()
. Here is a sample program that prints the current
process id and the parent's process id.
#include <unistd.h> #include <stdio.h> #include <stdlib.h> int main(){ pid_t pid, ppid; //get the process'es pid pid = getpid(); //get the parrent of this process'es pid ppid = getppid(); printf("My pid is: %d\n",pid); printf("My parent's pid is %d\n", ppid); return 0; }
If we run this program a bunch of times, we will see output like this:
#> ./get_pid_ppid My pid is: 14307 My parent's pid is 13790 #> ./get_pid_ppid My pid is: 14308 My parent's pid is 13790 #> ./get_pid_ppid My pid is: 14309 My parent's pid is 13790
Every time the program runs, it has a different process id (or
pid). Every process must have a unique pid, and the O.S. applies a
policy for reusing process id's as processes terminate. But, the
parent's pid is the same. If you think for a second, this makes sense:
What's the parent of the program? The shell! We can see this by
echo
'ing $$
, which is special bash variable that stores the pid of
the shell:
#> echo $$ 13790
Whenever you execute a program on the shell, what's really going on is
the shell is forking, and the new child is exec
'ing the new
program. One thing to consider, though, is that when a process forks,
the parent and the child continue executing in parallel: Why doesn't
the shell come back immediately and ask the user to enter a new
command? The shell instead waits for the child to finish process
before prompting again, and there is a system call called wait()
to
just do that.
Waiting on a child with wait()
The wait()
system call is used by a parent process to wait for the
status of the child to change. A status change can occur for a number
of reasons, the program stopped or continued, but we'll only concern
ourselves with the most common status change: the program terminated
or exited.
#include <sys/types.h> #include <sys/wait.h> pid_t wait(int *status);
wait()
requires you include the sys/types.h
and sys/wait.h
header files. It returns the pid of the child process that terminated
(or -1 if the process has no children), and wait()
takes an integer
pointer as an argument. At that memory address, it will set the
termination status of the child process. As mentioned in the
previous lesson, part of the termination status is the exit status,
but it also contains other information for how a program terminated,
like if it had a SEGFAULT
.
To learn about the exit status of a program we can use the macros from
sys/wait.h
which check the termination status and return the exit
status. From the main page:
WIFEXITED(status) returns true if the child terminated normally, that is, by calling exit(3) or _exit(2), or by returning from main(). WEXITSTATUS(status) returns the exit status of the child. This consists of the least significant 8 bits of the status argument that the child specified in a call to exit(3) or _exit(2) or as the argument for a return statement in main(). This macro should only be employed if WIFEXITED returned true.
There are other checks of the termination status, and refer to the manual page for more detail. Below is some example code for checking the exit status of forked child. You can see that the child delays its exit by 2 seconds with a call to sleep.
#include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/wait.h> int main(){ pid_t c_pid, pid; int status; c_pid = fork(); //duplicate if( c_pid == 0 ){ //child pid = getpid(); printf("Child: %d: I'm the child\n", pid, c_pid); printf("Child: sleeping for 2-seconds, then exiting with status 12\n"); //sleep for 2 seconds sleep(2); //exit with statys 12 exit(12); }else if (c_pid > 0){ //parent //waiting for child to terminate pid = wait(&status); if ( WIFEXITED(status) ){ printf("Parent: Child exited with status: %d\n", WEXITSTATUS(status)); } }else{ //error: The return of fork() is negative perror("fork failed"); _exit(2); //exit failure, hard } return 0; //success }
Task 1 foursons
For this task, change into the term-status
directory from the lab
directory. In there, you'll find a program named foursons.c
,
which is an analogy of The Four Sons of the Jewish festival of
Passover, in computer science terms. The program will fork four
child process, each terminating in a different way:
- The Wise Son: sleeps for 1 second and the exits with status 16
- The Simple Son: derferences NULL and causes a segfault
- The Wicked Son: sends itself a terminating
SIGABRT
signal - The Son Who Doesn't Know How To Ask Questions: Just exits with status 0
Your task is to write the wait()
routine for the parent, which
will wait for each child, print out the child number and the process
id, and also check the termination status of the child. The expected
output should look something like this:
Child 0: 15030: Sleeping 1 second, Exiting status 16
Child 1: 15031: I'm going to try and dereference a NULL pointer, I wonder what happens?!?
Child 2: 15032: I'm going to shoot myself with a SIGABRT because I can
Child 3: 15033: Ummm ....
Parent: Child 3: 15033 terminated and exited with status 0
Parent: Child 1: 15031 terminated and exited due to signal Segmentation fault and also core dumped
Parent: Child 2: 15032 terminated and exited due to signal Aborted and also core dumped
Parent: Child 0: 15030 terminated and exited with status 16
Hints: Open up the man page for wait()
and you'll find the
following macros, which you will need to use
WIFEXITED(status)
: returns true if child terminated due to exit or return from mainWEXITSTATUS(status)
: returns the exit status numberWIFSIGNALED(status)
: returns true if the child terminated due to termination signal, like SEGFAULT.WTERMSIG(stuats)
: returns the signal number that caused the termination.
To print the signal number, use the library function strsignal()
from string.h
. You can use it like so print the signal
information:
printf("Signal: %s", strsignal( WTERMSIG(status) ) );
Part 2: Timing the Execution of a Process
Now that you have a basic sense of how to fork()
and also wait()
for a process to complete, let's connect the dots and actually have
the child process do something interesting, like execute another
program.
execv()
and execvp()
Recall from that last lesson how we can use exec
family of system
calls to execute a running process. Here is that code to run ls
with some arguments as a refresher:
#include <unistd.h> #include <stdlib.h> #include <stdio.h> int main(int argc, char * argv[]){ //arguments for ls, will run: ls -l /bin char * ls_args[4] = { "/bin/ls", "-l", "/bin", NULL} ; //execute ls execv( ls_args[0], ls_args); //only get here if exec failed perror("execv failed"); return 2; //return error status }
In the example, we use the execv()
system call, which taks an
argv
array of argument strings, with a null terminator. We can
visualize this argument array like so:
.-----. ls_args -> | .--+--> "/bin/ls" |-----| | .--+--> "-l" |-----| | .--+--> "/bin" |-----| | .--+--> NULL '-----'
One thing you might notice is that you use the full path to the
ls
program, rather then just the name of the program. Recall from
our discussion of bash
that when we type a command, like ls
, we
actually search the bin
directories for that program. This is
described as searching the path. While execv()
does not search
the path, execvp()
does, and you can go ahead and use that
version of exec
from now on. The above program changes like such:
#include <unistd.h> #include <stdlib.h> #include <stdio.h> int main(int argc, char * argv[]){ //arguments for ls, will run: ls -l /bin char * ls_args[4] = { "ls", "-l", "/bin", NULL} ; // ^ // | //execute ls execvp( ls_args[0], ls_args); //only get here if exec failed perror("execvp failed"); return 2; //return error status }
Fork-Exec-Wait
While it is reasonable to call exec
from a program, it is usually
done after a fork. In this way, the child process then executes the
code of the exec
's process. The parent can then wait for the
child to finish. For example, here's how we might run ls
from the
child and wait for it to finish in the parent:
#include <unistd.h> #include <stdlib.h> #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> int main(int argc, char * argv[]){ //arguments for ls, will run: ls -l /bin char * ls_args[4] = { "ls", "-l", "/bin", NULL} ; pid_t c_pid, pid; int status; c_pid = fork(); if (c_pid == 0){ /* CHILD */ printf("Child: executing ls\n"); //execute ls execvp( ls_args[0], ls_args); //only get here if exec failed perror("execve failed"); }else if (c_pid > 0){ /* PARENT */ if( (pid = wait(&status)) < 0){ perror("wait"); _exit(1); } printf("Parent: finished\n"); }else{ perror("fork failed"); _exit(1); } return 0; //return success }
Task 2 timer
In this task, change into the timer
directory from the lab
directory. In there, you will find a source file timer.c
, and
your task is to complete the program. The timer
program will take
another program as command line arguments, it will then fork and
execute that program, record the amount of time it takes to
execute, and then print the result afterwards.
You should use the gettimeofday()
system call to retrieve the
current time since programs often execute at the milisecond
level. To make the subtraction of struct timeval
's more sensible,
I have provide you with a function (from the gnu
website) which,
given two time val's, will take the difference and store the result
in result
. You should use the following print format to output
the resulting timestamp:
printf("Run Time: %ld.%04ld (s)\n", diff.tv_sec, diff.tv_usec/1000);
Here is a sample run of the timer program, and, of course, runtimes
will vary based on your computer performance. Using sleep
as a
baseline is a good test.
#> ./timer ls
Makefile timer timer.c timer.c~
Run Time: 0.0001 (s)
#> ./timer sleep 2
Run Time: 2.0001 (s)
#> ./timer sleep 5
Run Time: 5.0002 (s)
#> ./timer find ..
..
../mini-sh
../mini-sh/mini-sh.c
../mini-sh/Makefile
../timer
../timer/Makefile
../timer/timer.c~
../timer/timer.c
../timer/timer
../term-status
../term-status/Makefile
../term-status/foursons.c~
../term-status/foursons.c
Run Time: 0.0010 (s)
Hint: You will need to construct an argv
array for exec
using
the command line arguments to timer
program. While this might
seem challenging at first, consider that the difference between the
two argv
's is just one index. For example, consider the argv
for the last run of timer
above:
.-----.
argv -> | .--+--> "./timer"
|-----|
| .--+--> "find"
|-----|
| .--+--> ".."
|-----|
| .--+--> NULL
'-----'
Why not just set the argv
to exec
to start one index down using
pointer arithmetic? Then you have exactly what you need.
.-----.
| .--+--> "./timer"
|-----|
argv+1 -> | .--+--> "find"
|-----|
| .--+--> ".."
|-----|
| .--+--> NULL
'-----'
Part 3: A mini-shell
Believe it or not, you now have all the pieces necessary to implement
a very basic shell. Think about it: A shell is just a fork-exec-wait
loop. It prompts the user for input, then forks, tries to execute the
input as a command, and then waits for that command to finish. The
challenging part is constructing the argv
array needed for exec
based on the input.
String Tokenizer and Constructing an argv[]
To construct an argv
array from an arbitrary string, we need to
first split the string up based on a separator, such as
whitespace. In C, this process is called string tokenization. The
string library has a function strtok()
to perform the
tokenization, but it can be a little cumbersome.
Here is some sample code to start with:
//retrieve first token from line, seperated using " " tok = strtok(line, " "); i = 0; printf("%d: %s\n", i, tok); //continue to retrieve tokens until NULL returned while( (tok = strtok(NULL, " ")) != NULL){ i++; printf("%d: %s\n", i, tok); }
Upon the first call to strtok()
, you provide the string to be
tokenized and the separator. In this case, that's the variable
line
and the separator is " ". This will return the first token
in line
. To continue to tokenize the same line, subsequent calls
to strtok()
take NULL
for the string but still take the
separator as the argument. You can keep retrieving tokens in this
way until no more are available, and then NULL
is returned.
In the mini-sh
directory, we've provided you with the token-sh
program that can help guide you through tokenization. The challenge
for you is to save each token in a argv
array that you can use
in exec
.
Task 3 mini-sh
For this task, you will write a mini-shell, mini-sh
, that will
continually prompt the user for a command, execute that command,
timing the length of execution, and including that length in the
next prompt. It is very much timer
program in a loop, but now you
have to build an argv
.
To get started, change into the mini-sh
directory and open the
mini-sh.c
program. You'll find that the looping and prompting has
been provided for you. Your task is to complete the following parts:
- Tokenize
line
to construct anargv
, stored incmd_argv
. Note thatcmd_argv
is declared with 256 slots. So you can only support commands with at most 256 arguments. - Fork-exec-wait result and record the time execution in
diff
. You should useexecvp
to execute in the child, and you shouldwait()
in the parent before continuing the loop.
Finally, be very careful to always call _exit()
after an exec
in the child. The exec
may fail, for example, because the command
doesn't exist. In thtose case, it's very important to for the child
to exit immediately, otherwise, you will now have 2 shells, one
for the parent and one for the child. Then the next time through
the loop, you'll have 3 shells, and so on.