IC221: Systems Programming (SP14)


Home Policy Calendar Syllabus Resources Piazza

Lab 06: Fork-Exec-Wait, Repeat!

Table of Contents

Preliminaries

In this lab you will complete a set of C programs that will expose you to termination status, fork-exec-wait loops, and tokenizing strings. There are 3 tasks, you will likely complete only task 1 in lab, and begin with task 2. You will need to finish the remaining taks outside of lab.

Lab Learning Goals

In this lab, you will learn the following topics and practice C programming skills.

  1. fork(), exec, wait()
  2. termination status, IFEXITED(), IFSIGNALED()
  3. Timing execution
  4. String tokenization
  5. Constructing argv arrays

Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/labs/06

All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.

Submission Folder

For this lab, all ubmission should be placed in the following folder:

~/ic221/labs/06/

This directory contains 4 sub-directories; examples, timer, term-status, and mini-sh. In the examples directory you will find any source code in this lab document. All lab work should be done in the remaining directories.

  • Only source files found in the folder will be graded.
  • Do not change the names of any source files

Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.

Compiling your programs with clang and make

You are required to provide your own Makefiles for this lab. Each of the source folders, timer, mini-sh, and XXXX, must have a Makefile. We should be able to compile your programs by typing make in each source directory. The following executables should be generated:

  • foursons : executable for term-status directory
  • timer : execute for timer directory
  • mini-sh : execute for mini-sh directory

When compiling mini-sh, you will need to link against the readline library. We have provided you an example of this compilation process for compiling token-sh

README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

Test Script

You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.

./test.sh

You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate:

#****************************
# Comment out tests you aren't working on
#***************************
test_term-status
test_timer
test_mini-sh
test_makefile

Part 1: Children, Parents and Process ID

In the last lesson, we briefly discussed how a program loads into a process. We use the exec family of system calls, which will replace the current program image with the one provided to exec. What we failed to discuss was how to create a new process. To do that, we must fork().

fork()

With the exception of two O.S. processes, the kernel and init process, all process are spawned from another process. The procedure is called forking: An exact copy of the process, memory values and open resources, is produced. The original process that forked, is called the parent, while the newly created, duplicate process is called the child. Let's look at an example of a process forking using the fork() system call:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(){

  pid_t c_pid;


  c_pid = fork(); //duplicate                                                                                                                                                


  if( c_pid == 0 ){
    //child: The return of fork() is zero                                                                                                                                    

    printf("Child: I'm the child: %d\n", c_pid);

  }else if (c_pid > 0){
    //parent: The return of fork() is the process of id of the child                                                                                                         

    printf("Parent: I'm the parent: %d\n", c_pid);

  }else{
    //error: The return of fork() is negative                                                                                                                                

    perror("fork failed");
    _exit(2); //exit failure, hard                                                                                                                                           

  }

  return 0; //success                                                                                                                                                        
}

The fork() system call is unlike any other function call you've seen so far. It returns twice, once in the parent and once in child, and it returns different values in the parent and the child. To follow the logic, you first need to realize that once fork() is called, the Operating System is creating a whole new process which is an exact copy of the original process. At this point, fork() still hasen't returned because the O.S. is context switched in, and now it must return twice, once in the child process and once in the parent

The parent and child are identified and separated logically using a process id (or pid), a unique identifier assigned by the O.S. to distinguish different processes. After a call to fork(), the parent's return value is the process id of the newly created child process. The child, however, has a return value of 0. On error, fork(), returns -1. Then you should bail with _exit() because something terrible happened.

One nice way to see a visual of the parent process relationship is using the bash command pstree:

#> pstree -ah
init
  ├─NetworkManager
  │   ├─dhclient -d -4 -sf /usr/lib/NetworkManager/nm-dhcp-client.action -pf /var/run/sendsigs.omit.d/network-manager.dhclient-eth0.pid -lf...
  │   └─2*[{NetworkManager}]
  ├─accounts-daemon
  │   └─{accounts-daemon}
(...)

At the top is the init process, which is the parent of all proces. Somewhere down tree is my login shell

(...)
├─sshd -D
  │   └─sshd   
  │       └─sshd    
  │           └─bash
  │               ├─emacs get_exitstatus.c
  │               ├─emacs foursons.c
  │               ├─emacs Makefile
  │               ├─emacs get_exitstatus.c
  │               ├─emacs fork_exec_wait.c
  │               ├─emacs mail_reports.py
  │               └─pstree -ah
(...)

And you can see that the process of getting to a bash shell via ssh requires a number of forks and child process.

getpid() and getppid()

In the above example, with fork(), the parent can learn the process id of the child, but the child doesn't know its own process id (or pid) after the fork nor does it know its parents process id. For that matter, the parent doesn't know its own process id either. There are two system calls to retrieve this information:

//retrieve the current process id
pid_t getpid(void);

//retrieve the parent's process id
pid_t getppid(void);

There is no way for a process to directly retrieve its child pid because any process may have multiple children. Instad, a process must maintain that information directly since the child's pid is returned from a fork(). Here is a sample program that prints the current process id and the parent's process id.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
  pid_t pid, ppid;

  //get the process'es pid
  pid = getpid();

  //get the parrent of this process'es pid
  ppid = getppid();


  printf("My pid is: %d\n",pid);
  printf("My parent's pid is %d\n", ppid);

  return 0;
}

If we run this program a bunch of times, we will see output like this:

#> ./get_pid_ppid 
My pid is: 14307
My parent's pid is 13790

#> ./get_pid_ppid 
My pid is: 14308
My parent's pid is 13790

#> ./get_pid_ppid 
My pid is: 14309
My parent's pid is 13790

Every time the program runs, it has a different process id (or pid). Every process must have a unique pid, and the O.S. applies a policy for reusing process id's as processes terminate. But, the parent's pid is the same. If you think for a second, this makes sense: What's the parent of the program? The shell! We can see this by echo'ing $$, which is special bash variable that stores the pid of the shell:

#> echo $$
13790

Whenever you execute a program on the shell, what's really going on is the shell is forking, and the new child is exec'ing the new program. One thing to consider, though, is that when a process forks, the parent and the child continue executing in parallel: Why doesn't the shell come back immediately and ask the user to enter a new command? The shell instead waits for the child to finish process before prompting again, and there is a system call called wait() to just do that.

Waiting on a child with wait()

The wait() system call is used by a parent process to wait for the status of the child to change. A status change can occur for a number of reasons, the program stopped or continued, but we'll only concern ourselves with the most common status change: the program terminated or exited.

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *status);

wait() requires you include the sys/types.h and sys/wait.h header files. It returns the pid of the child process that terminated (or -1 if the process has no children), and wait() takes an integer pointer as an argument. At that memory address, it will set the termination status of the child process. As mentioned in the previous lesson, part of the termination status is the exit status, but it also contains other information for how a program terminated, like if it had a SEGFAULT.

To learn about the exit status of a program we can use the macros from sys/wait.h which check the termination status and return the exit status. From the main page:

WIFEXITED(status)
       returns true if the child terminated normally, that is, 
       by calling exit(3) or _exit(2), or by returning from main().

WEXITSTATUS(status)
       returns  the  exit  status of the child.  This consists of the least significant 
       8 bits of the status argument that the child specified in a call to exit(3) or _exit(2) or as the
       argument for a return statement in main().  
       This macro should only be employed if WIFEXITED returned true.

There are other checks of the termination status, and refer to the manual page for more detail. Below is some example code for checking the exit status of forked child. You can see that the child delays its exit by 2 seconds with a call to sleep.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(){

  pid_t c_pid, pid;
  int status;

  c_pid = fork(); //duplicate                                                                                                                                                


  if( c_pid == 0 ){
    //child
    pid = getpid();

    printf("Child: %d: I'm the child\n", pid, c_pid);
    printf("Child: sleeping for 2-seconds, then exiting with status 12\n");

    //sleep for 2 seconds
    sleep(2);

    //exit with statys 12
    exit(12);

  }else if (c_pid > 0){
    //parent

    //waiting for child to terminate
    pid = wait(&status);

    if ( WIFEXITED(status) ){
      printf("Parent: Child exited with status: %d\n", WEXITSTATUS(status));
    }


  }else{
    //error: The return of fork() is negative                                                                                                                                

    perror("fork failed");
    _exit(2); //exit failure, hard                                                                                                                                           

  }

  return 0; //success                                                                                                                                                        
}

Task 1 foursons

For this task, change into the term-status directory from the lab directory. In there, you'll find a program named foursons.c, which is an analogy of The Four Sons of the Jewish festival of Passover, in computer science terms. The program will fork four child process, each terminating in a different way:

  1. The Wise Son: sleeps for 1 second and the exits with status 16
  2. The Simple Son: derferences NULL and causes a segfault
  3. The Wicked Son: sends itself a terminating SIGABRT signal
  4. The Son Who Doesn't Know How To Ask Questions: Just exits with status 0

Your task is to write the wait() routine for the parent, which will wait for each child, print out the child number and the process id, and also check the termination status of the child. The expected output should look something like this:

Child 0: 15030: Sleeping 1 second, Exiting status 16
Child 1: 15031: I'm going to try and dereference a NULL pointer, I wonder what happens?!? 
Child 2: 15032: I'm going to shoot myself with a SIGABRT because I can
Child 3: 15033: Ummm ....
Parent: Child 3: 15033 terminated and exited with status 0
Parent: Child 1: 15031 terminated and exited due to signal Segmentation fault and also core dumped
Parent: Child 2: 15032 terminated and exited due to signal Aborted and also core dumped
Parent: Child 0: 15030 terminated and exited with status 16

Hints: Open up the man page for wait() and you'll find the following macros, which you will need to use

  • WIFEXITED(status) : returns true if child terminated due to exit or return from main
  • WEXITSTATUS(status) : returns the exit status number
  • WIFSIGNALED(status) : returns true if the child terminated due to termination signal, like SEGFAULT.
  • WTERMSIG(stuats) : returns the signal number that caused the termination.

To print the signal number, use the library function strsignal() from string.h. You can use it like so print the signal information:

printf("Signal: %s", strsignal( WTERMSIG(status) ) );

Part 2: Timing the Execution of a Process

Now that you have a basic sense of how to fork() and also wait() for a process to complete, let's connect the dots and actually have the child process do something interesting, like execute another program.

execv() and execvp()

Recall from that last lesson how we can use exec family of system calls to execute a running process. Here is that code to run ls with some arguments as a refresher:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char * argv[]){

  //arguments for ls, will run: ls  -l /bin                                                                                                                                  
  char * ls_args[4] = { "/bin/ls", "-l", "/bin", NULL} ;

  //execute ls                                                                                                                                                               
  execv( ls_args[0], ls_args);

  //only get here if exec failed                                                                                                                                             
  perror("execv failed");

  return 2; //return error status                                                                                                                                            
}

In the example, we use the execv() system call, which taks an argv array of argument strings, with a null terminator. We can visualize this argument array like so:

            .-----.
ls_args ->  |  .--+--> "/bin/ls"
            |-----|
            |  .--+--> "-l"
            |-----|
            |  .--+--> "/bin"
            |-----|
            |  .--+--> NULL
            '-----'

One thing you might notice is that you use the full path to the ls program, rather then just the name of the program. Recall from our discussion of bash that when we type a command, like ls, we actually search the bin directories for that program. This is described as searching the path. While execv() does not search the path, execvp() does, and you can go ahead and use that version of exec from now on. The above program changes like such:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char * argv[]){

  //arguments for ls, will run: ls  -l /bin                                                                                                                                  
  char * ls_args[4] = { "ls", "-l", "/bin", NULL} ;
  //                     ^
  //                     |

  //execute ls                                                                                                                                                               
  execvp( ls_args[0], ls_args);

  //only get here if exec failed                                                                                                                                             
  perror("execvp failed");

  return 2; //return error status                                                                                                                                            
}

Fork-Exec-Wait

While it is reasonable to call exec from a program, it is usually done after a fork. In this way, the child process then executes the code of the exec's process. The parent can then wait for the child to finish. For example, here's how we might run ls from the child and wait for it to finish in the parent:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char * argv[]){
  //arguments for ls, will run: ls  -l /bin                                                                                                                                  
  char * ls_args[4] = { "ls", "-l", "/bin", NULL} ;
  pid_t c_pid, pid;
  int status;

  c_pid = fork();

  if (c_pid == 0){
    /* CHILD */

    printf("Child: executing ls\n");

    //execute ls                                                                                                                                                               
    execvp( ls_args[0], ls_args);
    //only get here if exec failed                                                                                                                                             
    perror("execve failed");
  }else if (c_pid > 0){
    /* PARENT */

    if( (pid = wait(&status)) < 0){
      perror("wait");
      _exit(1);
    }

    printf("Parent: finished\n");

  }else{
    perror("fork failed");
    _exit(1);
  }

  return 0; //return success
}

Task 2 timer

In this task, change into the timer directory from the lab directory. In there, you will find a source file timer.c, and your task is to complete the program. The timer program will take another program as command line arguments, it will then fork and execute that program, record the amount of time it takes to execute, and then print the result afterwards.

You should use the gettimeofday() system call to retrieve the current time since programs often execute at the milisecond level. To make the subtraction of struct timeval's more sensible, I have provide you with a function (from the gnu website) which, given two time val's, will take the difference and store the result in result. You should use the following print format to output the resulting timestamp:

printf("Run Time: %ld.%04ld (s)\n", diff.tv_sec, diff.tv_usec/1000);

Here is a sample run of the timer program, and, of course, runtimes will vary based on your computer performance. Using sleep as a baseline is a good test.

#> ./timer ls
Makefile  timer  timer.c  timer.c~
Run Time: 0.0001 (s)
#> ./timer sleep 2
Run Time: 2.0001 (s)
#> ./timer sleep 5
Run Time: 5.0002 (s)
#> ./timer find ..
..
../mini-sh
../mini-sh/mini-sh.c
../mini-sh/Makefile
../timer
../timer/Makefile
../timer/timer.c~
../timer/timer.c
../timer/timer
../term-status
../term-status/Makefile
../term-status/foursons.c~
../term-status/foursons.c
Run Time: 0.0010 (s)

Hint: You will need to construct an argv array for exec using the command line arguments to timer program. While this might seem challenging at first, consider that the difference between the two argv's is just one index. For example, consider the argv for the last run of timer above:

         .-----.
argv ->  |  .--+--> "./timer"
         |-----|
         |  .--+--> "find"
         |-----|
         |  .--+--> ".."
         |-----|
         |  .--+--> NULL
         '-----'

Why not just set the argv to exec to start one index down using pointer arithmetic? Then you have exactly what you need.

           .-----.
           |  .--+--> "./timer"
           |-----|
argv+1 ->  |  .--+--> "find"
           |-----|
           |  .--+--> ".."
           |-----|
           |  .--+--> NULL
           '-----'

Part 3: A mini-shell

Believe it or not, you now have all the pieces necessary to implement a very basic shell. Think about it: A shell is just a fork-exec-wait loop. It prompts the user for input, then forks, tries to execute the input as a command, and then waits for that command to finish. The challenging part is constructing the argv array needed for exec based on the input.

String Tokenizer and Constructing an argv[]

To construct an argv array from an arbitrary string, we need to first split the string up based on a separator, such as whitespace. In C, this process is called string tokenization. The string library has a function strtok() to perform the tokenization, but it can be a little cumbersome.

Here is some sample code to start with:

//retrieve first token from line, seperated using " " 
   tok = strtok(line, " ");

   i = 0;
   printf("%d: %s\n", i, tok);

   //continue to retrieve tokens until NULL returned
   while( (tok = strtok(NULL, " ")) != NULL){
     i++;
     printf("%d: %s\n", i, tok);
   }

Upon the first call to strtok(), you provide the string to be tokenized and the separator. In this case, that's the variable line and the separator is " ". This will return the first token in line. To continue to tokenize the same line, subsequent calls to strtok() take NULL for the string but still take the separator as the argument. You can keep retrieving tokens in this way until no more are available, and then NULL is returned.

In the mini-sh directory, we've provided you with the token-sh program that can help guide you through tokenization. The challenge for you is to save each token in a argv array that you can use in exec.

Task 3 mini-sh

For this task, you will write a mini-shell, mini-sh, that will continually prompt the user for a command, execute that command, timing the length of execution, and including that length in the next prompt. It is very much timer program in a loop, but now you have to build an argv.

To get started, change into the mini-sh directory and open the mini-sh.c program. You'll find that the looping and prompting has been provided for you. Your task is to complete the following parts:

  1. Tokenize line to construct an argv, stored in cmd_argv. Note that cmd_argv is declared with 256 slots. So you can only support commands with at most 256 arguments.
  2. Fork-exec-wait result and record the time execution in diff. You should use execvp to execute in the child, and you should wait() in the parent before continuing the loop.

Finally, be very careful to always call _exit() after an exec in the child. The exec may fail, for example, because the command doesn't exist. In thtose case, it's very important to for the child to exit immediately, otherwise, you will now have 2 shells, one for the parent and one for the child. Then the next time through the loop, you'll have 3 shells, and so on.