Lec 15: Process State and O.S. Scheduling
Table of Contents
1 Process State and Scheduling
The scheduler of the O.S. is the most fundamental and perhaps the most important component of the O.S. kernel. Its sole task is to decide which process will run next on the CPU, the heart of process management. This decisions cannot be made lightly. There are many contingent properties to consider when discussing the scheduler and process management; such as, ensuring that all process have a chance to run and that the cpu is being fully utilized.
To complicate the issue, not all process are ready to run. Process can be in many different states. Some are ready and can be run immediately, for sure, but others may be blocked or waiting for some other process to complete, like a shell, or for a hardware component to become available, like reading from disc. Some process are neither blocked nor waiting, but just sitting around wasting resources … zombies, and others loose their parents and are orphaned and adopted by other processes. It's a whole ecosystem.
In this lesson, we'll explore this ecosystem of varied process states and how this affect the O.S. resource of process management. We will look briefly at different ideas in scheduling algorithms and, in particular, how to interact with the Unix scheduler. We will also go over the various states of a process, both alive and undead.
2 The O.S. Scheduler
The scheduling of process or jobs is a large academic area. Depending upon the task at hand, different scheduling algorithms are used. For example, in real-time systems, like in a airplane auto-pilo, it is important to ensure that a computation occurs immediately so action can be taken. You may want to schedule tasks differently on a desktop computer where waiting a few milliseconds for a webpage to load isn't going to kill anyone.
We will begin the discussion by first overviewing a few different scheduling concepts. Next, we'll explore the Unix scheduling routine, and how you, the programmer, can interact with the schedule to change the priority of your processes.
2.1 O.S. Scheduling Ideas
A good scheduling algorithm must meet the following constraints:
- CPU Utilization: The processor should be utilized at the highest possible level. If there exists a process that can run, it should be running.
- Turnaround Time: How long does it take for a process to run once it is ready to run?
- Fairness: All tasks, of equal priority, should have the same opportunity to run. Or, no process should be starved for resources, and unable to run.
Based on these concepts, we can come up with some fairly naive strategy for scheduling. Consider a situation where, you — the processor — are being tasked with completing some amount of work divided into jobs. Jobs arrive in some order, each takes some amount of time, and once you start a job, you don't stop until you finish it.
One strategy you could employ is just First-Come-First-Serve, like a queue. While this is effective at CPU Utilization and Fairness, it doesn't have good turnaround. A job at the end of the queue could end up waiting a long time before it actually gets to run, and that job could be the computation needed most of all.
Another strategy, we could employ is some kind of introspection scheduling routine. Suppose we knew how long it would take for a job to complete, then we could just always schedule the shortest job next. Or, perhaps we are facing a deadline, like we must compute a flight trajectory RIGHT NOW or this plane is going down, then we could schedule jobs based on a deadline, or deadline scheduling. These strategies work great if we know how long tasks take to complete; we do not have that luxury on most Unix systems — have you ever written an infinite loop?
2.2 Preemptive Scheduling
Unix systems, and most modern consumer Operatin Systems, use a form of preemptive scheduling, which means that a process gets scheduled for some amount of time, then it is preempted before it get's to finish and another process get's to run for some amount of time, which will get preempted, allowing another process to run, and so on. Returning to the scenario where you are the scheduler, now you don't have to fully complete the job before moving onto the next. We can again come up with a few reasonable strategies for how to schedule tasks.
One simple solution to scheduling is to just allow each task to run for a set amount of time, say 500 milliseconds, and then you swap it out with the next job, running for 500 milliseconds, and then the next, and etc. You move in this round robin fashion until all the jobs are complete, and, on its surface, this seems like a great strategy. The CPU is being utilized a lot, all jobs have the same turnaround, and it is the most fair way to divide time as possible.
But, there are other factors to consider. Maybe if you let a job run for a bit longer then your timeout, it would have completed, and would not have to rotate all the way around before finishing? Or, maybe a job is using the CPU but it is not doing any computation; instead, it is just waiting for the disc to return with data, which can take an eternity in CPU time. It doesn't seem like the CPU is being properly utilized in either of those situations.
2.3 Priority Queue Scheduling
Unix implements a hybrid strategy for scheduling that is a mix of round-robin scheduling and priority scheduling, where higher priority process go first. Essentially, there is a set of priority queues, from -20 to positive 19, and within each priority queue, process are run in a round robin fashion, but which queue executes is chosen based on priority preference. To avoid starvation, process can be moved from lower priority queues to higher priority queues when they have not run in awhile. This strategy is called multilevel queue scheduling and it is used in most Unix systems, and is very effective and highly efficient in the selection of next processes to run. It is also easily customized and adaptable for user needs, e.g., running tasks at the highest priority.
The gritty details of the Unix scheduler is a topic for your Operating Systems class, not systems programming, and instead we will concern ourselves with how we interact with the scheduler. There are two key questions to address:
- Given a process, how do we understand its priority?
- And, how do we change its priority?
2.3.1 How nice
a program is its priority
In Unix, the concept of priority is described as how nice a process is to another process. The nicer a process is, the more willing it is for another process to run in its stead. Thus, nicer processes have lower priority and meaner processes have higher priority.
The nice-ness scale is between -20, very mean, to 19, very nice. By default, all programs are born with niceness 0: just plain ambivalent. Some programs must then become less nice, such as interactive process that must meet some deadline. For example, you want your computer to play sound when you click on something, or, for that matter, your mouse to move when you move it. Those process shouldn't be scheduled with a low priority because it would degrade the user experience.
There are also other process running where being nice is fine. For example, many daemons, programs constantly running the background, do not need a lot of priority. They are not interactive, and not being scheduled right away is not going to upset any users.
2.3.2 Viewing Process State with ps
One way to view the nice level of a process is using the shell
command ps
, short for process state. The ps
command is an
amazing tool for inspecting the state of process, and below, we
use it to show the nice-ness of the process running on a typical
Linux system in the lab.
#> ps axo nice,comm,pid --sort nice NI COMMAND PID -20 cpuset 36 -20 khelper 37 -20 netns 39 -20 kintegrityd 43 -20 kblockd 44 -20 ata_sff 45 -20 md 47 -20 crypto 56 -20 kthrotld 65 -20 binder 69 -20 deferwq 88 -20 charger_manager 89 -20 devfreq_wq 90 -20 ext4-dio-unwrit 282 -20 rpciod 448 -20 nfsiod 455 -20 kpsmoused 549 -20 kvm-irqfd-clean 887 -20 iprt 1224 -20 hd-audio0 1353 -11 pulseaudio 1735 -10 krfcommd 639 0 init 1 (...)
These commands, with negative nice values, are vital to the
execution of the computer. For example, cpuset
is a process that
assists with loading of memory and process for execution, the
pulseaudio
systems is used for audio output, and the various
k*
commands are kernel process. Those really need to be run with
high priority.
In the middle, you'll find many process with priority 0.
(...) 0 ntpd 2040 0 sshd 13782 0 sshd 13789 0 bash 13790 0 emacs 14453 0 emacs 14712 0 emacs 15219 0 emacs 15273 0 emacs 15304 0 emacs 18917 0 kworker/7:0 21860 0 emacs 27417 0 crontab 28380 0 sh 28381 0 sensible-editor 28382 0 emacs23 28390 0 sshd 29814 0 sshd 29821 0 bash 29822 0 crontab 29894 0 sh 29895 (...)
These process are user-level process. I have a number of emacs
sessions open, which all have equal priority. The shell, bash
, is
also running with priority 0. Finally, at the bottom, are the nicest
of processes:
(...) 1 rtkit-daemon 1737 5 ksmd 52 19 khugepaged 53
The rtkit-daemon
is a process for realtime scheduling, but it
doesn't need to run that often, in relation to the process it is
scheduling. Thus, its just a bit nice. While khugepaged
is a process
that manages large memory pages, collapsing them whenever necessary,
but this operation doesn't have to occur for the system to function
properly, it's just nice when it does. Thus, the priority of this
process is quite low, being very nice.
2.3.3 Programming Priority with getpriority()
and nice()
When programming, we can inspect the current priority level of a
process using the getpriority()
system call, which has the
following function declaration:
#include <sys/time.h> #include <sys/resource.h> int getpriority(int which, int who);
The which
is for which kind of priority we are interested in. The
Unix system has multiple priority indicators, but we are only
concerned with the process level priority, or PRIO_PROCESS
. The
who
is to query the priority of a given process (or other identity)
by id. Using the value 0 for who
refers to the current process.
Here's a program that will print the current nice value for the process:
#include <stdlib.h> #include <stdio.h> #include <sys/types.h> #include <sys/resource.h> int main(){ int prio; prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process printf("Priority: %d\n",prio); return 0; }
Of course, since the default priority of any process is 0, the output of this program is:
#> ./getpriority Priority: 0
We can change the priority of a process using the nice
command,
which exists as both a command line tool and system call. It's
purpose is to make programs more nice or less nice. For example,
on the command line, we can run the getpriority
with a nice value
of 10.
#> nice -n 10 ./getpriority Priority: 10
The output now indicates that the priority of the program is 10; it
is more nice, more willing to let other programs with lower priority
values run. We can try and set a negative priority value with
nice
:
#>nice -n -10 ./getpriority nice: setpriority: Permission denied Priority: 0
This will fail because only the privileged user can set a
negative nice value; however, sudo
can make this sandwich.
#> sudo nice -n -10 ./getpriority Password: Priority: -10
The sudo
command says, run this command as the privilege super user
… you do not have sudo
privilege on the lab machines.
In code, we can do the same thing using the nice()
system call:
#include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/types.h> #include <sys/resource.h> int main(){ int prio; prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process printf("Priority: %d\n",prio); if( nice(10) != 10){ //error checking is different perror("nice"); return 1; } prio = getpriority(PRIO_PROCESS, 0); //get the priority of this process printf("Priority: %d\n",prio); return 0; }
One thing to note about nice()
is that it returns the new priority
value, which may be negative. To error check, you must compare the
output of nice()
with the value you expect.
3 Process States
Now that we have a decent understanding about how process are scheduled and how to interact with the O.S. to affect the priority of scheduling, let us turn our attention to the different states of a process. This is an important discussion because a scheduler must consider both the priority of a process and if it ready to run in order to maximize the utilization of the CPU. It turns out that not all processes are ready to run, and that a process can be in a number of different states.
3.1 Running, Waiting, Blocked, and Undead
There are three main states of a process. It may be running, which means it is currently running on the CPU; it may be ready and waiting, which menas that it is ready to run immediately if selected; and, it may be blocked and waiting, which means that it is not ready to run because some other information is needed by the process before it can proceed. A scheduler will consider both the priority of the process as well as it's current state
A process can be blocked for a number of reasons. Perhaps the most common cause of a process being blocked is I/O, such as reading and writing from a device. Consider that a read/write is a system call, which means that a context switch must occur where the process stops running and the kernel starts running to complete the action. Conceptually, we'd like to think that the context switch occurs immediately, the process swaps out, the kernel swaps in, performs the action, and the kernel swaps out, and the process swaps back in and continues on — but it doesn't work that way.
Instead, once the system call is initiated, the process is blocked and waiting for the system call to complete. At this point, the process is no longer eligible to be run, i.e., it is not ready, and the scheduler can schedule another process to run. The next process may be the kernel, which could complete the system call and return the result and unblock the process, or it could just be another user process, in which case the blocked process remains so.
Another way to think about how I/O blocks a process is to consider
the shell. The shell prompts a user for input by reading from
stdin
, but it cannot continue until the user actually typed
something into the terminal and hits "enter." That could take an
eternity in compute cycles, and instead, the shell is blocking
on I/O waiting for user input. In the meanwhile, other processes
can run, and this is how the scheduler ensures high utilization.
Beyond I/O, a process can block for other reasons. For example, a
call to wait()
is a blocking system call. The parent process is
designated to block and will not proceed until the kernel
recognizes a state change in the child, which will unblock the
parent.
3.2 Undead processes: Zombies
To understand the idea behind undead process, you must first think
about how the system call wait()
functions. When a parent is
calling wait()
it is specifically blocking until the child
process has terminated (or changed state). That means, there is
some period of time between when the process actually terminates
and when the parent process becomes unblocked. The child process
still exists during that short window, but is not alive … it is
undead, or a zombie!
A zombie is a process that has terminated but has not been
wait()
'ed on by its parent. For example, consider the following
program:
int main(){ int i; for(i=0; i<10;i++){ if(fork() == 0){ //create a child _exit(0); // and exit } } while(1); //run forever }
The parent process will fork 10 children, who all just _exit()
immediately, but the parent does not call wait()
and instead
spin's in a while loop. The child processes have terminated and are
now in limbo, not running but waiting to be wait()
'ed on by the
parent. They are zombies! And we can see this state by looking at
the ps
output:
#> ps -o pid,comm,state PID COMMAND STATE 2362 bash S 2463 zombies R 2464 zombi <defunct> Z 2465 zombi <defunct> Z 2466 zombi <defunct> Z 2467 zombi <defunct> Z 2468 zombi <defunct> Z 2469 zombi <defunct> Z 2470 zombi <defunct> Z 2471 zombi <defunct> Z 2472 zombi <defunct> Z 2473 zombi <defunct> Z 2475 ps R
"Z" is for Zombie, while "R" is for running. The additional "<defunct>" indicator is what the UNIX system uses to indicate that the process is now a zombie.
The big problem with zombie processes is in long running services. If you have zombie leak, these process can stick around for the entirety of the parent program. This wastes resources, and eventually, a zombie apocalypse will occur, and you're kernel will crash as it slowly runs out of resources.
3.3 Orphan Process, init
, and daemons
At this point you may be wondering, what happens if the parent dies
before the children? Won't the children become zombies forever if
there is no parent to ever call wait()
?
The answer is no. If a parent dies before the children, those
children are now designated as orphans. The special process
init inherits all orphans and will call wait()
on them once
the children complete.
It is sometimes desirable for a process to become an orphan and be
inherited by init
, such as long running process that always
execute in the background. In Unix terms, such process are called
daemons, and if you look at all the processes running on your
unix computer, these are processes with names that end in "d." A
daemon is usually formed by having a process fork a child, which
executes the logic of the daemon, while the parent dies. The daemon
is now an orphan, which is inherited by init
, and the daemon
continues running happily in the background.