IC221: Systems Programming (SP14)


Home Policy Calendar Syllabus Resources Piazza

Lec 12: Process State and O.S. Scheduling

Table of Contents

1 Process State and Scheduling

The scheduler of the O.S. is the most fundamental and perhaps the most important component of the O.S. kernel. Its sole task is to decide which process will run next on the CPU, the heart of process management. This decisions cannot be made lightly. There are many contingent properties to consider when discussing the scheduler and process management; such as, ensuring that all process have a chance to run and that the cpu is being fully utilized.

To complicate the issue, not all process are ready to run. Process can be in many different states. Some are ready and can be run immediately, for sure, but others may be blocked or waiting for some other process to complete, like a shell, or for a hardware component to become available, like reading from disc. Some process are neither blocked nor waiting, but just sitting around wasting resources … zombies, and others loose their parents and are orphaned and adopted by other processes. It's a whole ecosystem.

In this lesson, we'll explore this ecosystem of varied process states and how this affect the O.S. resource of process management. We will look briefly at different ideas in scheduling algorithms and, in particular, how to interact with the Unix scheduler. We will also go over the various states of a process, both alive and undead.

2 The O.S. Scheduler

The scheduling of process or jobs is a large academic area. Depending upon the task at hand, different scheduling algorithms are used. For example, in real-time systems, like in a airplane auto-pilo, it is important to ensure that a computation occurs immediately so action can be taken. You may want to schedule tasks differently on a desktop computer where waiting a few milliseconds for a webpage to load isn't going to kill anyone.

We will begin the discussion by first overviewing a few different scheduling concepts. Next, we'll explore the Unix scheduling routine, and how you, the programmer, can interact with the schedule to change the priority of your processes.

2.1 O.S. Scheduling Ideas

A good scheduling algorithm must meet the following constraints:

  • CPU Utilization: The processor should be utilized at the highest possible level. If there exists a process that can run, it should be running.
  • Turnaround Time: How long does it take for a process to run once it is ready to run?
  • Fairness: All tasks, of equal priority, should have the same opportunity to run. Or, no process should be starved for resources, and unable to run.

Based on these concepts, we can come up with some fairly naive strategy for scheduling. Consider a situation where, you — the processor — are being tasked with completing some amount of work divided into jobs. Jobs arrive in some order, each takes some amount of time, and once you start a job, you don't stop until you finish it.

One strategy you could employ is just First-Come-First-Serve, like a queue. While this is effective at CPU Utilization and Fairness, it doesn't have good turnaround. A job at the end of the queue could end up waiting a long time before it actually gets to run, and that job could be the computation needed most of all.

Another strategy, we could employ is some kind of introspection scheduling routine. Suppose we knew how long it would take for a job to complete, then we could just always schedule the shortest job next. Or, perhaps we are facing a deadline, like we must compute a flight trajectory RIGHT NOW or this plane is going down, then we could schedule jobs based on a deadline, or deadline scheduling. These strategies work great if we know how long tasks take to complete; we do not have that luxury on most Unix systems — have you ever written an infinite loop?

2.2 Preemptive Scheduling

Unix systems, and most modern consumer Operatin Systems, use a form of preemptive scheduling, which means that a process gets scheduled for some amount of time, then it is preempted before it get's to finish and another process get's to run for some amount of time, which will get preempted, allowing another process to run, and so on. Returning to the scenario where you are the scheduler, now you don't have to fully complete the job before moving onto the next. We can again come up with a few reasonable strategies for how to schedule tasks.

One simple solution to scheduling is to just allow each task to run for a set amount of time, say 500 milliseconds, and then you swap it out with the next job, running for 500 milliseconds, and then the next, and etc. You move in this round robin fashion until all the jobs are complete, and, on its surface, this seems like a great strategy. The CPU is being utilized a lot, all jobs have the same turnaround, and it is the most fair way to divide time as possible.

But, there are other factors to consider. Maybe if you let a job run for a bit longer then your timeout, it would have completed, and would not have to rotate all the way around before finishing? Or, maybe a job is using the CPU but it is not doing any computation; instead, it is just waiting for the disc to return with data, which can take an eternity in CPU time. It doesn't seem like the CPU is being properly utilized in either of those situations.

2.3 Priority Queue Scheduling

Unix implements a hybrid strategy for scheduling that is a mix of round-robin scheduling and priority scheduling, where higher priority process go first. Essentially, there is a set of priority queues, from -20 to positive 19, and within each priority queue, process are run in a round robin fashion, but which queue executes is chosen based on priority preference. To avoid starvation, process can be moved from lower priority queues to higher priority queues when they have not run in awhile. This strategy is called multilevel queue scheduling and it is used in most Unix systems, and is very effective and highly efficient in the selection of next processes to run. It is also easily customized and adaptable for user needs, e.g., running tasks at the highest priority.

The gritty details of the Unix scheduler is a topic for your Operating Systems class, not systems programming, and instead we will concern ourselves with how we interact with the scheduler. There are two key questions to address:

  • Given a process, how do we understand its priority?
  • And, how do we change its priority?

2.3.1 How nice a program is its priority

In Unix, the concept of priority is described as how nice a process is to another process. The nicer a process is, the more willing it is for another process to run in its stead. Thus, nicer processes have lower priority and meaner processes have higher priority.

The nice-ness scale is between -20, very mean, to 19, very nice. By default, all programs are born with niceness 0: just plain ambivalent. Some programs must then become less nice, such as interactive process that must meet some deadline. For example, you want your computer to play sound when you click on something, or, for that matter, your mouse to move when you move it. Those process shouldn't be scheduled with a low priority because it would degrade the user experience.

There are also other process running where being nice is fine. For example, many daemons, programs constantly running the background, do not need a lot of priority. They are not interactive, and not being scheduled right away is not going to upset any users.

2.3.2 Viewing Process State with ps

One way to view the nice level of a process is using the shell command ps, short for process state. The ps command is an amazing tool for inspecting the state of process, and below, we use it to show the nice-ness of the process running on a typical Linux system in the lab.

#> ps axo nice,comm,pid --sort nice
 NI COMMAND           PID
-20 cpuset             36
-20 khelper            37
-20 netns              39
-20 kintegrityd        43
-20 kblockd            44
-20 ata_sff            45
-20 md                 47
-20 crypto             56
-20 kthrotld           65
-20 binder             69
-20 deferwq            88
-20 charger_manager    89
-20 devfreq_wq         90
-20 ext4-dio-unwrit   282
-20 rpciod            448
-20 nfsiod            455
-20 kpsmoused         549
-20 kvm-irqfd-clean   887
-20 iprt             1224
-20 hd-audio0        1353
-11 pulseaudio       1735
-10 krfcommd          639
  0 init                1
(...)

These commands, with negative nice values, are vital to the execution of the computer. For example, cpuset is a process that assists with loading of memory and process for execution, the pulseaudio systems is used for audio output, and the various k* commands are kernel process. Those really need to be run with high priority.

In the middle, you'll find many process with priority 0.

(...)
  0 ntpd             2040
  0 sshd            13782
  0 sshd            13789
  0 bash            13790
  0 emacs           14453
  0 emacs           14712
  0 emacs           15219
  0 emacs           15273
  0 emacs           15304
  0 emacs           18917
  0 kworker/7:0     21860
  0 emacs           27417
  0 crontab         28380
  0 sh              28381
  0 sensible-editor 28382
  0 emacs23         28390
  0 sshd            29814
  0 sshd            29821
  0 bash            29822
  0 crontab         29894
  0 sh              29895
(...)

These process are user-level process. I have a number of emacs sessions open, which all have equal priority. The shell, bash, is also running with priority 0. Finally, at the bottom, are the nicest of processes:

(...)
  1 rtkit-daemon     1737
  5 ksmd               52
 19 khugepaged         53

The rtkit-daemon is a process for realtime scheduling, but it doesn't need to run that often, in relation to the process it is scheduling. Thus, its just a bit nice. While khugepaged is a process that manages large memory pages, collapsing them whenever necessary, but this operation doesn't have to occur for the system to function properly, it's just nice when it does. Thus, the priority of this process is quite low, being very nice.

2.3.3 Programming Priority with getpriority() and nice()

When programming, we can inspect the current priority level of a process using the getpriority() system call, which has the following function declaration:

#include <sys/time.h>
#include <sys/resource.h>

int getpriority(int which, int who);

The which is for which kind of priority we are interested in. The Unix system has multiple priority indicators, but we are only concerned with the process level priority, or PRIO_PROCESS. The who is to query the priority of a given process (or other identity) by id. Using the value 0 for who refers to the current process.

Here's a program that will print the current nice value for the process:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/resource.h>

int main(){

  int prio;

  prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process

  printf("Priority: %d\n",prio);
  return 0;

}

Of course, since the default priority of any process is 0, the output of this program is:

#> ./getpriority
Priority: 0

We can change the priority of a process using the nice command, which exists as both a command line tool and system call. It's purpose is to make programs more nice or less nice. For example, on the command line, we can run the getpriority with a nice value of 10.

#> nice -n 10 ./getpriority
Priority: 10

The output now indicates that the priority of the program is 10; it is more nice, more willing to let other programs with lower priority values run. We can try and set a negative priority value with nice:

#>nice -n -10 ./getpriority
nice: setpriority: Permission denied
Priority: 0

This will fail because only the privileged user can set a negative nice value; however, sudo can make this sandwich.

#> sudo nice -n -10 ./getpriority
Password:
Priority: -10

The sudo command says, run this command as the privilege super user … you do not have sudo privilege on the lab machines.

In code, we can do the same thing using the nice() system call:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/resource.h>


int main(){

  int prio;

  prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process
  printf("Priority: %d\n",prio);


  if( nice(10) != 10){ //error checking is different
    perror("nice");
    return 1;
  }

  prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process
  printf("Priority: %d\n",prio);

  return 0;
}

One thing to note about nice() is that it returns the new priority value, which may be negative. To error check, you must compare the output of nice() with the value you expect.

3 Process States

Now that we have a decent understanding about how process are scheduled and how to interact with the O.S. to affect the priority of scheduling, let us turn our attention to the different states of a process. This is an important discussion because a scheduler must consider both the priority of a process and if it ready to run in order to maximize the utilization of the CPU. It turns out that not all processes are ready to run, and that a process can be in a number of different states.

3.1 Running, Waiting, Blocked, and Undead

There are three main states of a process. It may be running, which means it is currently running on the CPU; it may be ready and waiting, which menas that it is ready to run immediately if selected; and, it may be blocked and waiting, which means that it is not ready to run because some other information is needed by the process before it can proceed. A scheduler will consider both the priority of the process as well as it's current state

A process can be blocked for a number of reasons. Perhaps the most common cause of a process being blocked is I/O, such as reading and writing from a device. Consider that a read/write is a system call, which means that a context switch must occur where the process stops running and the kernel starts running to complete the action. Conceptually, we'd like to think that the context switch occurs immediately, the process swaps out, the kernel swaps in, performs the action, and the kernel swaps out, and the process swaps back in and continues on — but it doesn't work that way.

Instead, once the system call is initiated, the process is blocked and waiting for the system call to complete. At this point, the process is no longer eligible to be run, i.e., it is not ready, and the scheduler can schedule another process to run. The next process may be the kernel, which could complete the system call and return the result and unblock the process, or it could just be another user process, in which case the blocked process remains so.

Another way to think about how I/O blocks a process is to consider the shell. The shell prompts a user for input by reading from stdin, but it cannot continue until the user actually typed something into the terminal and hits "enter." That could take an eternity in compute cycles, and instead, the shell is blocking on I/O waiting for user input. In the meanwhile, other processes can run, and this is how the scheduler ensures high utilization.

Beyond I/O, a process can block for other reasons. For example, a call to wait() is a blocking system call. The parent process is designated to block and will not proceed until the kernel recognizes a state change in the child, which will unblock the parent.

3.2 Undead processes: Zombies

To understand the idea behind undead process, you must first think about how the system call wait() functions. When a parent is calling wait() it is specifically blocking until the child process has terminated (or changed state). That means, there is some period of time between when the process actually terminates and when the parent process becomes unblocked. The child process still exists during that short window, but is not alive … it is undead, or a zombie!

A zombie is a process that has terminated but has not been wait()'ed on by its parent. For example, consider the following program:

int main(){
  int i;
  for(i=0; i<10;i++){

   if(fork() == 0){ //create a child
      _exit(0); // and exit
   }

  }

  while(1); //run forever
}

The parent process will fork 10 children, who all just _exit() immediately, but the parent does not call wait() and instead spin's in a while loop. The child processes have terminated and are now in limbo, not running but waiting to be wait()'ed on by the parent. They are zombies! And we can see this state by looking at the ps output:

#> ps -o pid,comm,state
  PID COMMAND       STATE
 2362 bash            S
 2463 zombies         R
 2464 zombi <defunct> Z
 2465 zombi <defunct> Z
 2466 zombi <defunct> Z
 2467 zombi <defunct> Z
 2468 zombi <defunct> Z
 2469 zombi <defunct> Z
 2470 zombi <defunct> Z
 2471 zombi <defunct> Z
 2472 zombi <defunct> Z
 2473 zombi <defunct> Z
 2475 ps              R

"Z" is for Zombie, while "R" is for running. The additional "<defunct>" indicator is what the UNIX system uses to indicate that the process is now a zombie.

The big problem with zombie processes is in long running services. If you have zombie leak, these process can stick around for the entirety of the parent program. This wastes resources, and eventually, a zombie apocalypse will occur, and you're kernel will crash as it slowly runs out of resources.

3.3 Orphan Process, init, and daemons

At this point you may be wondering, what happens if the parent dies before the children? Won't the children become zombies forever if there is no parent to ever call wait()?

The answer is no. If a parent dies before the children, those children are now designated as orphans. The special process init inherits all orphans and will call wait() on them once the children complete.

It is sometimes desirable for a process to become an orphan and be inherited by init, such as long running process that always execute in the background. In Unix terms, such process are called daemons, and if you look at all the processes running on your unix computer, these are processes with names that end in "d." A daemon is usually formed by having a process fork a child, which executes the logic of the daemon, while the parent dies. The daemon is now an orphan, which is inherited by init, and the daemon continues running happily in the background.