Lab 8: Pipeline Unrolling
Table of Contents
Preliminaries
In this lab you will complete a set of C programs that will expose you to the set of system calls that are used in shell pipelines, such as process grouping, pipes, and file descriptor duplication. There are 2 tasks, and you will likely complete neither of the tasks in lab, You will need to finish the remaining tasks outside of the lab period.
Lab Learning Goals
In this lab, you will learn the following topics and practice C programming skills.
- Unroll a pipeline with all children in the same process group
- Use the
setpgid()
andgetpgid()
system calls - Basic pipeline parsing with
strtok()
- Inter-process communication via
pipe()
anddup()
- Setting up a pre-parsed pipeline with inter-process communcation
- Use of preprocessor and
#if
for compilation
Lab Setup
Run the following command
~aviv/bin/ic221-up
Change into the lab directory
cd ~/ic221/lab/08
All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.
Submission Folder
For this lab, all ubmission should be placed in the following folder:
~/ic221/labs/08
In the top level of the lab directory, you will find a
README
file. You must complete the README
file, and include any
additional details that might be needed to complete this lab.
Compiling your programs with gcc
and make
You are not required to provide your own Makefiles for this lab.
README
In the top level of the lab directory, you will find a README
file. You must fill out the README file with your name and alpha.
Please include a short summary of each of the tasks and any other
information you want to provide to the instructor.
Test Script
You are provided a test script which prints pass/fail information
for a set of tests for your programs. Note that passing all the
tests does not mean you will receive a perfect score: other tests
will be performed on your submission. To run the test script,
execute test.sh
from the lab directory.
./test.sh
You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.
PART 1: The Sleeping Pipeline
In the previous lessons, we discussed the following pipeline in detail:
sleep 10 | sleep 20 | sleep 30 | sleep 50
We identified that the entire pipeline runs as single job in the shell, and each part of the pipeline is a seperate process running in parallel within a process group. The process group is important for terminal signaling, and foreground process are identified based on process group.
In this part of the lab, you implement a standard pipeline
unrolling, which is the process of parsing a pipeline based on |
and forking each of the children in their own process group. The
unrolling code will only concern itself with the individual forks,
and not inter-process communication; that's the next part.
Task 1 sleep-unroll
(40 points)
For this task, change into the unroll
directory in the lab
directory.In there, you will find the source file sleep-unroll.c
,
which you will complete. Your sleep-unroll
program must meet the
following specification:
- It must fork each part of the pipeline as its own process with proper arguments.
- All processes in the pipeline must be in their own process group identified by the pid of the first forked child process. This process group should be different than the parent's process group.
- Your program must be able to execute a set of sleep commands, at a mininum, but should be general enough to execute other commands.
A general outline of the algorithm you need to complete is provided
in the source file. Parsing code for tokenizing a string based on
"|" is provided, but you must complete the parsing of individual
commands yourself. We recomend you use a do-while
loop, which will
simplify some process, but you may use other strategies if you
like. Also, we recommend you take advantage of the 0
argument
options to setpgid()
, which can help simplify logic.
Below is some sample runs of the program to compare against. I use
the command line tool time
to demonstrate how long a pipeline
should take; notice that the pipeline is specified as the first
argument to sleep-unroll
and must be quote designated:
#> ./sleep-unroll "sleep 1"
#> time ./sleep-unroll "sleep 1"
real 0m1.010s
user 0m0.003s
sys 0m0.005s
#> time ./sleep-unroll "sleep 1 | sleep 3"
real 0m3.011s
user 0m0.004s
sys 0m0.006s
#> time ./sleep-unroll "sleep 1 | sleep 3 | sleep 2"
real 0m3.012s
user 0m0.006s
sys 0m0.009s
#> time ./sleep-unroll "sleep 1 | BAD | sleep 2"
exec: No such file or directory
real 0m2.012s
user 0m0.004s
sys 0m0.008s
#> time ./sleep-unroll "head -c 5 /dev/zero"
real 0m0.013s
user 0m0.002s
sys 0m0.005s
#> ./sleep-unroll "sleep 200 | sleep 300 | sleep 400" &
[1] 17012
#> ps -o pid,pgid,args
PID PGID ARGS
17012 17012 ./sleep-unroll sleep 200
17016 17016 sleep 200
17017 17016 sleep 300
17018 17016 sleep 400
(...)
#> fg
./sleep-unroll "sleep 200 | sleep 300 | sleep 400"
^C
#> ps -o pid,pgid,args
PID PGID ARGS
17016 17016 sleep 200
17017 17016 sleep 300
17018 17016 sleep 400
(...)
#> killall sleep
#> ps -o pid,pgid,args
PID PGID ARGS
(... NO sleep running ...)
PART 2: Baby Pipes
In this part of the lab, you will focus on the inter-process
communication of the pipeline by unrolling different length
pipelines of commands in a program called babypipe
.
To complete this task, you will need to think about how to unroll a
pipeline such that there exists a pipe between each of the
processes. This requires a sequence of calls to pipe()
to and
dup2()
for duplicating the write/read end of the pipes to
stdin/stdout of each of the processes appropriately. This will
require careful attention to the start, end and middle of the
pipeline.
Here is some sample code demonstrating a one step pipeline, child to parent, with an exec. You should be able to adapt this code to complete the lab:
/*examples/one-step.c*/
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
// cat sample-db | grep MD
char * cmd1[] = {"cat", "sample-db.csv", NULL};
char * cmd2[] = {"grep", "MD", NULL};
int pipe_fd[2];
int status;
pid_t cpid1,cpid2;
//open the pipe in parent
if(pipe(pipe_fd) < 0){
perror("pipe");
_exit(1);
}
cpid1 = fork();
if( cpid1 == 0){
/* CHILD 1*/
//redirect output to the pipe
close(pipe_fd[0]); //close read end of pipe
close(1);//close stdout
dup2(pipe_fd[1], 1); //duplicate pipe to stdout
//execute the comand
execvp(cmd1[0], cmd1);
perror("exec");
_exit(1);
}else if(cpid1 < 0){
/* ERROR */
perror("fork");
_exit(1);
}
cpid2 = fork();
if(cpid2 == 0){
/* CHILD 2*/
//redirect input from the pipe
close(pipe_fd[1]); //close write end of pipe
close(0); //close stdin
dup2(pipe_fd[0],0); //duplicate pipe to stdin
//execute the comand
execvp(cmd2[0], cmd2);
perror("exec");
_exit(1);
}else if(cpid2 < 0){
/* ERROR */
perror("fork");
_exit(1);
}
/* PARENT */
//widow pipe by closing write end of pipe
close(pipe_fd[1]);
//wait on all children
while(wait(&status) > 0 );
//sucess
return 0;
}
The program above generates a one-step pipe line for the command:
cat sample-db | grep MD
And if you execute it, you get the desired result:
#>./one-step Kris,Marrier,King, Christopher A Esq,228 Runamuck Pl #2808,Baltimore,Baltimore City,MD,21224,410-655-8723,410-804-4694,kris@gmail.com,http://www.kingchristopheraesq.com Ezekiel,Chui,Sider, Donald C Esq,2 Cedar Ave #84,Easton,Talbot,MD,21601,410-669-1642,410-235-8738,ezekiel@chui.com,http://www.siderdonaldcesq.com Ilene,Eroman,Robinson, William J Esq,2853 S Central Expy,Glen Burnie,Anne Arundel,MD,21061,410-914-9018,410-937-4543,ilene.eroman@hotmail.com,http://www.robinsonwilliamjesq.com Fernanda,Jillson,Shank, Edward L Esq,60480 Old Us Highway 51,Preston,Caroline,MD,21655,410-387-5260,410-724-6472,fjillson@aol.com,http://www.shankedwardlesq.com Sylvia,Cousey,Berg, Charles E,287 Youngstown Warren Rd,Hampstead,Carroll,MD,21074,410-209-9545,410-863-8263,sylvia_cousey@cousey.org,http://www.bergcharlese.com Loreta,Timenez,Kaminski, Katherine Andritsaki,47857 Coney Island Ave,Clinton,Prince Georges,MD,20735,301-696-6420,301-392-6698,loreta.timenez@hotmail.com,http://www.kaminskikatherineandritsaki.com Kaitlyn,Ogg,Garrison, Paul E Esq,2 S Biscayne Blvd,Baltimore,Baltimore City,MD,21230,410-665-4903,410-773-3862,kaitlyn.ogg@gmail.com,http://www.garrisonpauleesq.com Izetta,Dewar,Lisatoni, Jean Esq,2 W Scyene Rd #3,Baltimore,Baltimore City,MD,21217,410-473-1708,410-522-7621,idewar@dewar.com,http://www.lisatonijeanesq.com
The key to properly setting up a pipe between processes is widow'ing the pipe. This is where one end of the pipe is closed, normally the end that is not in use. For example:
//redirect input from the pipe
close(pipe_fd[1]); //close write end of pipe
close(0); //close stdin
dup2(pipe_fd[0],0); //duplicate pipe to stdin
Here, the write end of the pipe is closed in the second child since it is not in use. This makes the read end of the pipe a widow in the second child, which means, once the write end of the pipe is closed in the first child, EOF is generated. All instances of the pipe must be widowed, this includes the copy that exists in the parent, and if the pipe isn't widowed, the entire pipeline will hang.
Unrolling arbitrary length pipelines
The above example demonstrates just the end of the pipes, but you will need to be able to handle arbitrary length pipelines. There are three main cases for arbitrary pipes you must consider:
- The process is the start of the pipeline: You should not duplicate the stdin of the pipeline, but should duplicate over the stdout.
- The process is the end of the pipeline: You should duplicate the stdin of the pipeline, but not duplicate the stdout of the pipeline.
- The process is in the middle of the pipeline: You shoudld duplicate both the stdin and stdout of the pipeline.
Consider the visual below:
pipe_to_next pipe_to_next pipe_to_next proc_0 proc_1 proc_2 proc_n .--------. .--------. .--------. .--------. | | .---| | .---| | .--- ... | | | stdin | ._/ .__| stdin | ._/ .__| stdin | ._/ .__ | stdin | | |/ ._/ | |/ ._/ | |/ ._/ | | | stdout |_/ | stdout |_/ | stdout |_/ | stdout | '--------' '--------' '--------' '--------. pipe_to_prev pipe_to_prev pipe_to_prev
In the first and last process, proc_0
and proc_n
, each
only have one of there standard file descriptors duplicated. The
middle processes, proc_1
and proc_2
have both stdin and stdout
duplicated. In your code, you'll need to carefully test for these
cases to make sure you duplicate right.
You should also note that the labels pipe_to_next
and
pipe_to_prev
are relative to the process. For proc_1
the
pipe_to_next
is proc_2
's pipe_to_prev
. This is called
shifting the pipes. In the parent, after forking a child, you'll
need to set the previous pipe to the next pipe, so you can create a
new next pipe.
pipe_to_prev[0] = pipe_to_next[0];
pipe_to_prev[1] = pipe_to_next[1];
Widowing Pipes everywhere
Finally, one last note about widowing pipes. You have to widow all instances of the pipe. You will be creating pipes in the parent before you fork a child, which means that you'll have instances of the pipe in both the parent and child. As a result, it could be the case that you widowed everything properly in the child, but the fact that the parent could still write to the pipe, will cause your program to hang. As a result, you need to make sure you always widow the write end of the pipe in the parent as well:
close(pipe_to_prev[1]);
Task 2 babypipe
(60 points)
For this task, change into the babypipe
directory in the lab
directory. In there, you will find the source file babypipe.c
,
which you will complete. Your program must meet the following
specification:
- It must use
pipe()
anddup2()
to set up the pipeline for inter-process communication between the child process. - All process must run in parallel, but you are not required to have all process run in a seperate group.
- The first process in the pipeline must leave the stdin file descriptor unaltered, and the last process in the pipeline must leave the stdout file descriptor unaltered.
There are six pre-parsed pipelines that can be used with different pre-compiler directives. To compile each of the pipeline choices you can use:
make 0
compile with the following pipeline:cat /etc/passwd | cut -d : -f 7 | sort | uniq
make 1
compile with the following pipeline:cat sample-db.csv | cut -d , -f 8 | sort | uniq | wc -l
make 2
compile with the following pipeline:cat sample-db.csv | cut -d , -f 10 | cut -d - -f 1 | sort | uniq |wc -l
make 3
compile with the following pipeline:cat | cut -d , -f 10 | cut -d - -f 1 | sort | uniq |wc -l
make 4
compile with the following piepline:cat sample-db.csv | tail
make 5
compile with the following pipeline:head sample-db.csv
You are required to cover all the se case, be careful of edge cases. Here are sample compilation and executions of the program to compare against:
$ ./babypipe
/bin/sh
/usr/bin/false
/bin/nologin
/bin/bash
/bin/csh
$ make 1
gcc -g -Wall -DPIPE=1 babypipe.c -o babypipe
$ ./babypipe
32
$ make 2
gcc -g -Wall -DPIPE=2 babypipe.c -o babypipe
$ ./babypipe
87
$ make 3
gcc -g -Wall -DPIPE=3 babypipe.c -o babypipe
$ cat sample-db.csv | ./babypipe
87
#> make clean