IC221: Systems Programming (SP18)


Home Policy Calendar Units Assignments Resources

Lab 8: Pipeline Unrolling

Table of Contents

Preliminaries

In this lab you will complete a set of C programs that will expose you to the set of system calls that are used in shell pipelines, such as process grouping, pipes, and file descriptor duplication. There are 2 tasks, and you will likely complete neither of the tasks in lab, You will need to finish the remaining tasks outside of the lab period.

Lab Learning Goals

In this lab, you will learn the following topics and practice C programming skills.

  1. Unroll a pipeline with all children in the same process group
  2. Use the setpgid() and getpgid() system calls
  3. Basic pipeline parsing with strtok()
  4. Inter-process communication via pipe() and dup()
  5. Setting up a pre-parsed pipeline with inter-process communcation
  6. Use of preprocessor and #if for compilation

Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/lab/08

All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.

Submission Folder

For this lab, all ubmission should be placed in the following folder:

~/ic221/labs/08

In the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.

Compiling your programs with gcc and make

You are not required to provide your own Makefiles for this lab.

README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

Test Script

You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.

./test.sh

You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.

PART 1: The Sleeping Pipeline

In the previous lessons, we discussed the following pipeline in detail:

sleep 10 | sleep 20 | sleep 30 | sleep 50

We identified that the entire pipeline runs as single job in the shell, and each part of the pipeline is a seperate process running in parallel within a process group. The process group is important for terminal signaling, and foreground process are identified based on process group.

In this part of the lab, you implement a standard pipeline unrolling, which is the process of parsing a pipeline based on | and forking each of the children in their own process group. The unrolling code will only concern itself with the individual forks, and not inter-process communication; that's the next part.

Task 1 sleep-unroll (40 points)

For this task, change into the unroll directory in the lab directory.In there, you will find the source file sleep-unroll.c, which you will complete. Your sleep-unroll program must meet the following specification:

  1. It must fork each part of the pipeline as its own process with proper arguments.
  2. All processes in the pipeline must be in their own process group identified by the pid of the first forked child process. This process group should be different than the parent's process group.
  3. Your program must be able to execute a set of sleep commands, at a mininum, but should be general enough to execute other commands.

A general outline of the algorithm you need to complete is provided in the source file. Parsing code for tokenizing a string based on "|" is provided, but you must complete the parsing of individual commands yourself. We recomend you use a do-while loop, which will simplify some process, but you may use other strategies if you like. Also, we recommend you take advantage of the 0 argument options to setpgid(), which can help simplify logic.

Below is some sample runs of the program to compare against. I use the command line tool time to demonstrate how long a pipeline should take; notice that the pipeline is specified as the first argument to sleep-unroll and must be quote designated:

#> ./sleep-unroll "sleep 1"
#> time ./sleep-unroll "sleep 1"

real	0m1.010s
user	0m0.003s
sys	0m0.005s
#> time ./sleep-unroll "sleep 1 | sleep 3"

real	0m3.011s
user	0m0.004s
sys	0m0.006s
#> time ./sleep-unroll "sleep 1 | sleep 3 | sleep 2"

real	0m3.012s
user	0m0.006s
sys	0m0.009s
#> time ./sleep-unroll "sleep 1 | BAD | sleep 2"
exec: No such file or directory

real	0m2.012s
user	0m0.004s
sys	0m0.008s
#> time ./sleep-unroll "head -c 5 /dev/zero"

real	0m0.013s
user	0m0.002s
sys	0m0.005s
#> ./sleep-unroll "sleep 200 | sleep 300 | sleep 400" &
[1] 17012
#> ps -o pid,pgid,args
  PID  PGID ARGS
17012 17012 ./sleep-unroll sleep 200 
17016 17016 sleep 200
17017 17016 sleep 300
17018 17016 sleep 400
(...)
#> fg
./sleep-unroll "sleep 200 | sleep 300 | sleep 400"
^C
#> ps -o pid,pgid,args
  PID  PGID ARGS
17016 17016 sleep 200
17017 17016 sleep 300
17018 17016 sleep 400
(...)
#> killall sleep
#> ps -o pid,pgid,args
  PID  PGID ARGS
(... NO sleep running ...)

PART 2: Baby Pipes

In this part of the lab, you will focus on the inter-process communication of the pipeline by unrolling different length pipelines of commands in a program called babypipe.

To complete this task, you will need to think about how to unroll a pipeline such that there exists a pipe between each of the processes. This requires a sequence of calls to pipe() to and dup2() for duplicating the write/read end of the pipes to stdin/stdout of each of the processes appropriately. This will require careful attention to the start, end and middle of the pipeline.

Here is some sample code demonstrating a one step pipeline, child to parent, with an exec. You should be able to adapt this code to complete the lab:

/*examples/one-step.c*/
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(){

  // cat sample-db | grep MD
  char * cmd1[] = {"cat", "sample-db.csv", NULL};
  char * cmd2[] = {"grep", "MD", NULL};

  int pipe_fd[2];
  int status;

  pid_t cpid1,cpid2;

  //open the pipe in parent
  if(pipe(pipe_fd) < 0){
    perror("pipe");
    _exit(1);
  }

  cpid1 = fork();
  if( cpid1 ==  0){
    /* CHILD 1*/

    //redirect output to the pipe
    close(pipe_fd[0]); //close read end of pipe    
    close(1);//close stdout
    dup2(pipe_fd[1], 1); //duplicate pipe to stdout

    //execute the comand
    execvp(cmd1[0], cmd1);
    perror("exec");
    _exit(1);
  }else if(cpid1 < 0){
    /* ERROR */
    perror("fork");
    _exit(1);
  }

  cpid2 = fork();
  if(cpid2 == 0){
    /* CHILD 2*/

    //redirect input from the pipe
    close(pipe_fd[1]); //close write end of pipe
    close(0); //close stdin
    dup2(pipe_fd[0],0); //duplicate pipe to stdin

    //execute the comand
    execvp(cmd2[0], cmd2);
    perror("exec");
    _exit(1);
  }else if(cpid2 < 0){
    /* ERROR */
    perror("fork");
    _exit(1);
  }

  /* PARENT */

  //widow pipe by closing write end of pipe
  close(pipe_fd[1]);

  //wait on all children
  while(wait(&status) > 0 );

  //sucess
  return 0;
}

The program above generates a one-step pipe line for the command:

cat sample-db | grep MD

And if you execute it, you get the desired result:

#>./one-step 
Kris,Marrier,King, Christopher A Esq,228 Runamuck Pl #2808,Baltimore,Baltimore City,MD,21224,410-655-8723,410-804-4694,kris@gmail.com,http://www.kingchristopheraesq.com
Ezekiel,Chui,Sider, Donald C Esq,2 Cedar Ave #84,Easton,Talbot,MD,21601,410-669-1642,410-235-8738,ezekiel@chui.com,http://www.siderdonaldcesq.com
Ilene,Eroman,Robinson, William J Esq,2853 S Central Expy,Glen Burnie,Anne Arundel,MD,21061,410-914-9018,410-937-4543,ilene.eroman@hotmail.com,http://www.robinsonwilliamjesq.com
Fernanda,Jillson,Shank, Edward L Esq,60480 Old Us Highway 51,Preston,Caroline,MD,21655,410-387-5260,410-724-6472,fjillson@aol.com,http://www.shankedwardlesq.com
Sylvia,Cousey,Berg, Charles E,287 Youngstown Warren Rd,Hampstead,Carroll,MD,21074,410-209-9545,410-863-8263,sylvia_cousey@cousey.org,http://www.bergcharlese.com
Loreta,Timenez,Kaminski, Katherine Andritsaki,47857 Coney Island Ave,Clinton,Prince Georges,MD,20735,301-696-6420,301-392-6698,loreta.timenez@hotmail.com,http://www.kaminskikatherineandritsaki.com
Kaitlyn,Ogg,Garrison, Paul E Esq,2 S Biscayne Blvd,Baltimore,Baltimore City,MD,21230,410-665-4903,410-773-3862,kaitlyn.ogg@gmail.com,http://www.garrisonpauleesq.com
Izetta,Dewar,Lisatoni, Jean Esq,2 W Scyene Rd #3,Baltimore,Baltimore City,MD,21217,410-473-1708,410-522-7621,idewar@dewar.com,http://www.lisatonijeanesq.com

The key to properly setting up a pipe between processes is widow'ing the pipe. This is where one end of the pipe is closed, normally the end that is not in use. For example:

//redirect input from the pipe
 close(pipe_fd[1]); //close write end of pipe
 close(0); //close stdin
 dup2(pipe_fd[0],0); //duplicate pipe to stdin

Here, the write end of the pipe is closed in the second child since it is not in use. This makes the read end of the pipe a widow in the second child, which means, once the write end of the pipe is closed in the first child, EOF is generated. All instances of the pipe must be widowed, this includes the copy that exists in the parent, and if the pipe isn't widowed, the entire pipeline will hang.

Unrolling arbitrary length pipelines

The above example demonstrates just the end of the pipes, but you will need to be able to handle arbitrary length pipelines. There are three main cases for arbitrary pipes you must consider:

  1. The process is the start of the pipeline: You should not duplicate the stdin of the pipeline, but should duplicate over the stdout.
  2. The process is the end of the pipeline: You should duplicate the stdin of the pipeline, but not duplicate the stdout of the pipeline.
  3. The process is in the middle of the pipeline: You shoudld duplicate both the stdin and stdout of the pipeline.

Consider the visual below:



       pipe_to_next      pipe_to_next      pipe_to_next
 proc_0            proc_1            proc_2                 proc_n
.--------.        .--------.        .--------.             .--------.
|        |    .---|        |    .---|        |    .--- ... |        |
| stdin  | ._/ .__| stdin  | ._/ .__| stdin  | ._/ .__     | stdin  |
|        |/ ._/   |        |/ ._/   |        |/ ._/        |        |
| stdout |_/      | stdout |_/      | stdout |_/           | stdout |
'--------'        '--------'        '--------'             '--------.
        pipe_to_prev      pipe_to_prev              pipe_to_prev

In the first and last process, proc_0 and proc_n, each only have one of there standard file descriptors duplicated. The middle processes, proc_1 and proc_2 have both stdin and stdout duplicated. In your code, you'll need to carefully test for these cases to make sure you duplicate right.

You should also note that the labels pipe_to_next and pipe_to_prev are relative to the process. For proc_1 the pipe_to_next is proc_2's pipe_to_prev. This is called shifting the pipes. In the parent, after forking a child, you'll need to set the previous pipe to the next pipe, so you can create a new next pipe.

pipe_to_prev[0] = pipe_to_next[0];
pipe_to_prev[1] = pipe_to_next[1];

Widowing Pipes everywhere

Finally, one last note about widowing pipes. You have to widow all instances of the pipe. You will be creating pipes in the parent before you fork a child, which means that you'll have instances of the pipe in both the parent and child. As a result, it could be the case that you widowed everything properly in the child, but the fact that the parent could still write to the pipe, will cause your program to hang. As a result, you need to make sure you always widow the write end of the pipe in the parent as well:

close(pipe_to_prev[1]);

Task 2 babypipe (60 points)

For this task, change into the babypipe directory in the lab directory. In there, you will find the source file babypipe.c, which you will complete. Your program must meet the following specification:

  1. It must use pipe() and dup2() to set up the pipeline for inter-process communication between the child process.
  2. All process must run in parallel, but you are not required to have all process run in a seperate group.
  3. The first process in the pipeline must leave the stdin file descriptor unaltered, and the last process in the pipeline must leave the stdout file descriptor unaltered.

There are six pre-parsed pipelines that can be used with different pre-compiler directives. To compile each of the pipeline choices you can use:

  • make 0 compile with the following pipeline:

    cat /etc/passwd | cut -d : -f 7 | sort | uniq 
    
  • make 1 compile with the following pipeline:

    cat sample-db.csv | cut -d , -f 8 | sort | uniq | wc -l
    
  • make 2 compile with the following pipeline:

    cat sample-db.csv | cut -d , -f 10 | cut -d - -f 1 | sort | uniq |wc -l
    
  • make 3 compile with the following pipeline:

    cat | cut -d , -f 10 | cut -d - -f 1 | sort | uniq |wc -l
    
  • make 4 compile with the following piepline:

    cat sample-db.csv | tail
    
  • make 5 compile with the following pipeline:

    head sample-db.csv
    

You are required to cover all the se case, be careful of edge cases. Here are sample compilation and executions of the program to compare against:

$ ./babypipe
/bin/sh
/usr/bin/false
/bin/nologin
/bin/bash
/bin/csh
$ make 1
gcc -g -Wall -DPIPE=1 babypipe.c -o babypipe
$ ./babypipe
      32
$ make 2
gcc -g -Wall -DPIPE=2 babypipe.c -o babypipe
$ ./babypipe
      87
$ make 3
gcc -g -Wall -DPIPE=3 babypipe.c -o babypipe
$ cat sample-db.csv | ./babypipe
      87
#> make clean