IC221: Systems Programming (SP16)



Lab 08: Process State Monitoring

Preliminaries

In this lab you will complete a set of C programs that will expose you to termination status, fork-exec-wait loops, and tokenizing strings. There are 2 tasks. You will likely complete only task 1 in lab and begin task 2, so you will need to finish outside of lab.

Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/lab/08

All the material you need to complete the lab can be found in the lab directory. All material you will submit, you should place within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.

Submission Folder

For this lab, all submissions should be placed in the following folder:

~/ic221/lab/08

This directory contains 2 sub-directories: myps and mypstree. All lab work should be done in these directories.

  • Only source files found in the folder will be graded.
  • Do not change the names of any source files

Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.

Compiling your programs with gcc and make

You are required to provide your own Makefiles for this lab. Each of the source folders, myps, mypstree, and fg-shell, must have a Makefile. We should be able to compile your programs by typing make in each source directory. The following executables should be generated:

  • myps : executable for myps directory
  • mypstree : executable for mypstree directory

When compiling fg-shell, you will need to link against the readline library. Add the -lreadline to your compile command, and you can refer to the previous lab (Lab 6) for an example.

README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

Test Script

You are provided a test script which prints pass/fail information for a set of tests for your programs. Note that passing all the tests does not mean you will receive a perfect score: other tests will be performed on your submission. To run the test script, execute test.sh from the lab directory.

./test.sh

You can comment out individual tests while working on different parts of the lab. Open up the test script and place comments at the bottom where appropriate.


Process State Monitoring via /proc/[pid]

We survived the "nuclear" apocalypse, and now on safer ground, we can start our investigation into process state. In this part of the lab, you will write two small programs that, given a process id, will retrieve status information to display, a lot like ps and pstree.

The /proc File System

The /proc file system is special on Linux. It does not exist on disk, but rather is a pseudo-file system. It's the place where the O.S. allows users to take a peek at its internal workings. For example, you can check out information about memory usage:

cat /proc/meminfo 
MemTotal:        7862628 kB
MemFree:         6207280 kB
Buffers:          408632 kB
Cached:           865164 kB
SwapCached:            0 kB
Active:           634824 kB
Inactive:         724844 kB
Active(anon):      86628 kB
Inactive(anon):    38332 kB
Active(file):     548196 kB
Inactive(file):   686512 kB
Unevictable:           4 kB
Mlocked:               4 kB
SwapTotal:      12753916 kB
SwapFree:       12753916 kB
Dirty:                 0 kB
(...)

You can also see information about the processor:

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0xc
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 6783.84
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
(...)

If we take a look at the directory more closely, we can see that there is all sorts of information in there:

#> ls -F /proc/
1/      1257/  1375/   1637/  1841/  250/  36/   477/  674/   acpi/        ioports        sched_debug
10/     1267/  1393/   1640/  1871/  253/  37/   48/   675/   asound/      irq/           schedstat
1003/   1339/  14/     1656/  19/    254/  38/   499/  68/    buddyinfo    kallsyms       scsi/
1010/   1349/  15/     1662/  1924/  257/  39/   50/   69/    bus/         kcore          self@
1014/   1353/  1525/   1666/  2/     259/  4/    51/   7/     cgroups      key-users      slabinfo
1021/   1360/  1528/   1670/  20/    26/   406/  52/   7059/  cmdline      kmsg           softirqs
1024/   1361/  1545/   1695/  2010/  262/  41/   53/   7066/  consoles     kpagecount     stat
1027/   1362/  15609/  17/    2040/  268/  412/  54/   7067/  cpuinfo      kpageflags     swaps
1038/   1363/  16/     1700/  22/    27/   42/   549/  8/     crypto       latency_stats  sys/
1040/   1364/  16077/  1702/  23/    278/  428/  55/   88/    devices      loadavg        sysrq-trigger
1042/   1365/  16095/  1704/  2354/  28/   43/   558/  887/   diskstats    locks          sysvipc/
1052/   1366/  16119/  1709/  2361/  281/  44/   56/   888/   dma          mdstat         timer_list
1071/   1367/  16130/  1725/  2362/  282/  441/  566/  89/    dri/         meminfo        timer_stats
11/     1368/  16131/  1727/  24/    29/   448/  6/    897/   driver/      misc           tty/
1110/   1369/  1618/   1729/  245/   3/    45/   623/  9/     execdomains  modules        uptime
11209/  1370/  1623/   1735/  246/   30/   450/  639/  90/    fb           mounts@        version
11511/  1371/  1624/   1737/  247/   31/   455/  65/   907/   filesystems  mtrr           version_signature
12/     1372/  1626/   18/    248/   32/   46/   660/  923/   fs/          net@           vmallocinfo
1200/   1373/  1632/   1835/  249/   34/   469/  661/  953/   interrupts   pagetypeinfo   vmstat
1224/   1374/  1635/   1840/  25/    35/   47/   67/   957/   iomem        partitions     zoneinfo

Most of this is not of interest to us, but the numerical directories are. Every time a process is created, the kernel creates a new directory in the /proc file system. The process id is the same as the directory name. For example, I can run ps and see the process id's for the processes in my terminal.

#> ps
  PID TTY          TIME CMD
 7067 pts/1    00:00:00 bash
16133 pts/1    00:00:00 ps

I can also look in the directory for the bash process:

ls -F /proc/7067/
attr/       cmdline          environ  latency     mem         ns/            pagemap      sessionid  status
autogroup   comm             exe@     limits      mountinfo   numa_maps      personality  smaps      syscall
auxv        coredump_filter  fd/      loginuid    mounts      oom_adj        root@        stack      task/
cgroup      cpuset           fdinfo/  map_files/  mountstats  oom_score      sched        stat       wchan
clear_refs  cwd@             io       maps        net/        oom_score_adj  schedstat    statm

Each of these files provides some information about that process. For example, the comm file holds the name of the command, and the fd/ directory stores the open file descriptors.

#> cat /proc/7067/comm 
bash
#> ls -F /proc/7067/fd
0@  1@  2@  255@

Parsing /proc/[pid]/stat

Of relevance to this lab task is the /proc/[pid]/stat file. This file contains the current status of a running process. Here is the status for bash:

cat /proc/7067/stat
7067 (bash) S 7066 7067 7067 34817 16543 4202496 15913 163332 0 5 35 8 176 78 20 0 1 0 112555708 21057536 1440 18446744073709551615 4194304 5111460 140734366799328 140734366797904 139797077691534 0 65536 3686404 1266761467 18446744071579207412 0 0 17 0 0 0 0 0 0 7212552 7248528 7630848 140734366801521 140734366801527 140734366801527 140734366801902 0

While this may seem like just a bunch of numbers, to the trained eye, there is a ton of information in here. You can read about it using man 5 proc. The relevant fields are listed below, along with the fscanf() format for reading each one.

  • pid : "%d" : process id
  • comm : "%s" : command name, with parentheses
  • state : "%c" : state id, such as S for sleeping, R for running, T for stopped
  • ppid : "%d" : parent process id

In code, it's fairly straightforward to open the stat file and scan in this information:

FILE * stat_f = fopen("/proc/7067/stat","r");
if(stat_f != NULL){ /* fopen() fails if the process does not exist */
  fscanf(stat_f, "%d %s %c %d", &pid, comm, &state, &ppid);
  fclose(stat_f);
}

And, that is exactly what you are going to do in the next two tasks.
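As a slightly more defensive variant of the snippet above, here is a minimal sketch that takes the path as a parameter and checks every step. The helper name read_stat() and the COMM_MAX buffer size are assumptions made for illustration; they are not part of the lab's starter code. The sketch also assumes the command name contains no whitespace, since "%s" stops at the first space.

```c
#include <stdio.h>

#define COMM_MAX 256 /* assumed buffer size, generous for a command name */

/* Hypothetical helper: reads pid, comm, state and ppid from a stat file.
 * comm must point to at least COMM_MAX bytes.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_stat(const char *path, int *pid, char *comm, char *state, int *ppid)
{
    FILE *stat_f = fopen(path, "r");
    if (stat_f == NULL) {
        return -1; /* no such file, e.g. the process does not exist */
    }

    /* %255s caps the read at COMM_MAX - 1 characters plus the '\0' */
    int matched = fscanf(stat_f, "%d %255s %c %d", pid, comm, state, ppid);
    fclose(stat_f);

    /* fscanf() returns the number of fields it converted */
    return (matched == 4) ? 0 : -1;
}
```

A caller could pass "/proc/7067/stat" as the path and treat a return value of -1 as an invalid pid.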

String Format Printing

For the tasks below, you will need to be able to perform a format print into a string. So far, we've been format printing to the terminal or to an open file, but you can also format print into a string to save it for other uses. Here is the function declaration for snprintf(), the string version of printf():

int snprintf(char *str, size_t size, const char *format, ...);

The first argument, str, is a pointer to the string to store the result of the formatting. The second argument size is the size of str so we don't overflow the string buffer. The rest is the same as other format printing. Here's a sample program:

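#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){
 char hello[1024];

 /* argv[1] is NULL when no argument is given, so check first */
 if(argc < 2){
  fprintf(stderr, "Usage: %s message\n", argv[0]);
  return 1;
 }

 snprintf(hello, 1024, "You said, \"%s\"\n", argv[1]);

 printf("%s\n",hello); 

 return 0;
}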

You'll find snprintf() very useful because you'll receive a process id from the command line, and need to open the stat file /proc/[pid]/stat, which means you'll need to write a number into a string before using fopen(). This is exactly what snprintf() does best.
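To make that concrete, here is a minimal sketch of building the stat path from a numeric pid. The helper name stat_path() is hypothetical and only for illustration; the part worth noticing is the return value of snprintf(), which tells you whether the result fit in the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: writes "/proc/<pid>/stat" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting failed. */
int stat_path(char *buf, size_t size, int pid)
{
    int n = snprintf(buf, size, "/proc/%d/stat", pid);

    /* snprintf() returns the length it wanted to write, not counting
     * the '\0'; a value >= size means the output was truncated */
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return 0;
}
```

With a buffer such as char path[64], calling stat_path(path, sizeof(path), 7067) leaves "/proc/7067/stat" in path, ready to hand to fopen().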

Task 1 myps

Change into the myps directory in the lab directory, in which you'll find the source code myps.c. You should provide a Makefile to compile the source code to myps.

The myps command will accept process id's on the command line and will print information about those processes in a tab-separated format. You should use the format print below in order to pass the tests:

printf("%d\t%s\t%c\t%d\n", pid, clean_comm(comm), state, ppid);

The clean_comm() function is provided for you to remove the "(" and ")". You must read all information from the /proc/[pid]/stat file.

Here is some sample usage.

aviv@saddleback: myps $ ps
  PID TTY          TIME CMD
15201 pts/2    00:00:30 emacs
17724 pts/2    00:00:00 ps
27538 pts/2    00:00:09 bash
aviv@saddleback: myps $ ./myps 15201
PID	COMM	STATE	PPID
15201	emacs	T	27538
aviv@saddleback: myps $ ./myps 27538
PID	COMM	STATE	PPID
27538	bash	S	27537
aviv@saddleback: myps $ ./myps 27538 15201
PID	COMM	STATE	PPID
27538	bash	S	27537
15201	emacs	T	27538
aviv@saddleback: myps $ ./myps BAD_PID
PID	COMM	STATE	PPID
ERROR: Invalid pid BAD_PID
aviv@saddleback: myps $ ./myps 27538 BAD_PID 15201
PID	COMM	STATE	PPID
27538	bash	S	27537
ERROR: Invalid pid BAD_PID
15201	emacs	T	27538
aviv@saddleback: myps $ ./myps 11111
PID	COMM	STATE	PPID
ERROR: Invalid pid 11111

If a bad process id is provided, an error message should be printed to stderr.
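One possible way to catch an argument like BAD_PID before touching /proc is to validate it with strtol(). This is only a sketch: the helper name parse_pid() is an assumption, and it is not the only approach, since a failed fopen() on the stat file also reveals a bad pid. A numeric pid that names no running process (like 11111 above) passes this check and still has to be caught when the stat file fails to open.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts a command line argument to a pid.
 * Returns 0 and stores the value in *pid on success, -1 otherwise. */
int parse_pid(const char *arg, int *pid)
{
    char *end;

    errno = 0;
    long val = strtol(arg, &end, 10);

    /* end == arg: no digits at all; *end != '\0': trailing junk */
    if (end == arg || *end != '\0') {
        return -1;
    }
    /* reject overflow and values that cannot be a process id */
    if (errno == ERANGE || val <= 0 || val > INT_MAX) {
        return -1;
    }

    *pid = (int)val;
    return 0;
}
```

When parse_pid() returns -1, the caller can print the error with fprintf(stderr, ...) and move on to the next argument.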

The process family hierarchy

All processes start with the kernel, but we can also view the processes as a tree, from init down. The pstree command is really quite good at this. Here's an example just for the processes I'm using:

pstree -ua aviv
sshd    
  └─bash

sshd    
  └─bash
      ├─cat
      ├─emacs
      ├─emacs myps.c
      └─pstree -ua aviv

If you think about it, all this is doing is traversing from a given process to the parent process, all the way up to init. The /proc file system has all this information, and it's fairly simple to adapt the code above to recurse the parent-child tree.
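The upward walk can be sketched with a simple loop, shown here as an iterative illustration rather than the recursive function the task asks for. The function name print_ancestry() is hypothetical. The sketch assumes that no command name along the path contains whitespace, and that the walk ends when a process reports a parent pid of 0, as init does.

```c
#include <stdio.h>

/* Hypothetical helper: prints "pid comm" for the given process and each
 * of its ancestors, leaf first. Returns the number of processes visited,
 * or -1 if any stat file cannot be opened or parsed. */
int print_ancestry(int pid)
{
    int depth = 0;

    while (pid > 0) {
        char path[64];
        char comm[256];
        char state;
        int cur, ppid;

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        FILE *f = fopen(path, "r");
        if (f == NULL) {
            return -1;
        }
        int matched = fscanf(f, "%d %255s %c %d", &cur, comm, &state, &ppid);
        fclose(f);
        if (matched != 4) {
            return -1;
        }

        printf("%d %s\n", cur, comm);
        depth++;
        pid = ppid; /* init reports ppid 0, which ends the loop */
    }
    return depth;
}
```

Note that this prints the path from the child up to the root, which is the reverse of the order pstree displays.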

Double Linked Lists

To store and manage paths from a leaf to a root in the process tree, we will use a double linked list. A double linked list is just like the linked lists we used in previous labs except that instead of having only a forward pointer, from one node to the next node, it also has a previous pointer, from one node to the previous node in the list. Additionally, the list structure references both the head of the list and the tail of the list.

Visually, a double linked list looks like so:



        list     
     .--------.
  .--+- head  |
  |  |  tail -+-------------------------------.
  |  '--------'                               |
  '----.                                      |
       V                                      V
      .------.   .------.       .------.   .------.
      | next +-->| next +-> ... | next +-->| next +->NULL
NULL<-+ prev |<--+ prev |     <-+ prev |<--+ prev |
      '------'   '------'       '------'   '------'

This structure is particularly useful for the task of implementing pstree because you will traverse from the child up to the parent, and then to print out the path, you have to iterate backwards. A double linked list can easily iterate in both the forward and reverse direction because of the double pointer structure.
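To illustrate the structure in the diagram, here is a minimal sketch of a double linked list that stores pids. All type and function names (dl_node, dl_list, dl_append, dl_print_reverse, dl_free) are assumptions made for this example; the definitions in the lab's linkedlist.h may differ, so treat this as a picture of the pointer bookkeeping and not as the required interface.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node and list types */
struct dl_node {
    int pid;
    struct dl_node *next; /* toward the tail */
    struct dl_node *prev; /* toward the head */
};

struct dl_list {
    struct dl_node *head;
    struct dl_node *tail;
};

/* Adds a node at the tail. Returns 0 on success, -1 if malloc() fails. */
int dl_append(struct dl_list *list, int pid)
{
    struct dl_node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    n->pid = pid;
    n->next = NULL;
    n->prev = list->tail;

    if (list->tail != NULL) {
        list->tail->next = n; /* link the old tail forward */
    } else {
        list->head = n;       /* first node is also the head */
    }
    list->tail = n;
    return 0;
}

/* Walks from tail to head by following the prev pointers */
void dl_print_reverse(const struct dl_list *list)
{
    for (const struct dl_node *n = list->tail; n != NULL; n = n->prev) {
        printf("%d\n", n->pid);
    }
}

/* Frees every node and leaves the list empty */
void dl_free(struct dl_list *list)
{
    struct dl_node *n = list->head;
    while (n != NULL) {
        struct dl_node *next = n->next; /* save before freeing */
        free(n);
        n = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

If pids are appended while walking from a child up to init, the head holds the child and the tail holds init, so dl_print_reverse() visits them root first.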

Task 2 mypstree

Change into the mypstree directory in the lab directory, in which you'll find the source code mypstree.c as well as the header file linkedlist.h and the source file linkedlist.c. You should provide a Makefile to compile the source code for mypstree, which will depend on compiling both source files.

Your program must meet the following requirements:

  • You should complete the append() and del_list() functions for the double linked list in the linkedlist.c source file. You will use this structure to store process parent/child paths.
  • In mypstree you should complete the functions path_to_parent() and main()
    • The path_to_parent() function is a recursive function, as in it will call itself. The recursive step occurs for each parent process until the base case is reached, i.e., init is reached. For each recursive call, a process will be added to the list.
    • In the main function, you will need to create a new list, call path_to_parent(), print the list with the function provided, and deallocate the list. Use the linked list library functions where appropriate.
  • The pid is the command line argument, and multiple pid's can be provided, which will print multiple trees.
  • Errors should be reported to stderr using the format print below. An error can occur, for example, if you are unable to open the stat file.

    fprintf(stderr, "ERROR: Invalid pid %d\n", parent);
    

Some sample output is provided below:

aviv@saddleback: mypstree $ ps
  PID TTY          TIME CMD
15201 pts/2    00:00:30 emacs
20955 pts/2    00:00:00 cat
20956 pts/2    00:00:00 ps
27538 pts/2    00:00:09 bash

[2]+  Stopped                 cat
aviv@saddleback: mypstree $ ./mypstree 15201
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─emacs
            
aviv@saddleback: mypstree $ ./mypstree 20955
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─cat
            
aviv@saddleback: mypstree $ ./mypstree 15201 20955
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─emacs
            
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─cat
            
aviv@saddleback: mypstree $ ./mypstree BAD_PID
ERROR: Invalid pid BAD_PID
aviv@saddleback: mypstree $ ./mypstree 15201 BAD_PID 20955
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─emacs
            
ERROR: Invalid pid BAD_PID
init
  └─sshd
    └─sshd
      └─sshd
        └─bash
          └─cat