Lec 28: POSIX Threads
1 What are Threads?
A thread is a portion of a program that can be scheduled independently of the rest of the program. Threads divide a single process into separately runnable parts, each of which the OS can schedule. This concept should already be somewhat familiar from the world of processes: processes are independent programs that are scheduled independently, and a thread goes one step further by splitting a single process into schedulable units that the OS can choose to run.
A single process can have multiple threads, and each thread runs independently. Resources, however, are shared across threads. For example, threads share memory, unlike separate processes, which each have their own memory layout with no data shared between them. Two threads can access the same variables (globals, heap data, and anything reachable through a pointer) and can alter each other's values. Because threads are also scheduled independently by the OS, a single program may use more than 100% of CPU resources on a multi-processor machine.
2 Hello POSIX Threads?
Threading is traditionally provided through hardware and OS support, but because both varied widely, a unifying interface was needed. In 1995, threads were added to POSIX, the standard interface for many UNIX system calls. These so-called POSIX threads, or pthreads, are the key model for programming with threads in C. Higher-level languages such as Java and Python, whose runtimes on Unix-like systems typically build on pthreads, inherit the same model.
The lifecycle of a thread, much like that of a process, begins with creation. Threads, however, are not forked from a parent to create a child; they are simply created with a starting function as the entry point. And where a finished process is reaped by its parent with wait(), a finished thread is either joined by another thread (often the main thread) or detached to run on its own until completion.
2.1 Creating a Thread
A hello world program makes this new terminology more concrete:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){

  printf("Hello World!\n");

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;  //thread identifier

  //create a new thread have it run the function hello_fun
  pthread_create(&thread, NULL, hello_fun, NULL);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}
To follow this program: the thread is created with pthread_create() and set to run the function hello_fun(). The main thread, i.e., the original program, then tries to join the new thread with pthread_join(), which blocks until the new thread finishes. In the meantime, the new thread prints the all-important "Hello World!" and returns, at which point the join completes and the main thread exits.
The key function to consider:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
This creates a new thread in the calling process. The thread is identified by a value of type pthread_t, which is written through the first argument, and can be given a set of attributes through attr. We will not use the attribute field for our threads, so we pass NULL. Next is a function pointer, start_routine, which takes a pointer argument and returns a pointer. This is the function that gets called when the thread starts. The last argument, arg, is the pointer passed as the argument to start_routine. pthread_create() returns 0 on success and an error number on failure.
2.2 Passing Arguments to a Thread
In the hello world example, we did not pass arguments, but let's look at a slightly more advanced example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_arg(void * args){

  char * str = (char *) args;

  printf("%s", str);

  return NULL;
}

int main(int argc, char * argv[]){

  char hello[] = "Hello World!\n";

  pthread_t thread;  //thread identifier

  //create a new thread that runs hello_arg with argument hello
  pthread_create(&thread, NULL, hello_arg, hello);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}
This time we set up the function hello_arg to take the argument hello, a string containing the phrase "Hello World!". Note that the start routine must take a void * argument, so we have to cast:
char * str = (char *) args;
We know the argument was a string, so casting to char * is safe here. The result of calling pthread_create() is that the new thread executes:
hello_arg(hello);
2.3 Joining Threads
Just like with processes, it is often useful to be able to identify when a thread has completed or exited. The way to do this is to join the thread, which is a lot like the wait() call for processes. Joining is a blocking operation: the calling thread does not continue until the identified thread has terminated.
Typically, only the main thread calls join, but other threads can also join each other. Threads are not joined automatically, though: when the main thread returns from main(), the whole process exits and any threads still running are terminated. For example, the following code will usually produce no output:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){

  printf("Hello World!\n");

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;

  pthread_create(&thread, NULL, hello_fun, NULL);

  //no join: main returns and the process exits right away
  return 0;
}
The program fails to join the new thread before the main thread terminates. As a result, the process exits, most likely before the new thread has had a chance to print "Hello World!".
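The fact that you do not have to join threads can be convenient, because once the main thread returns from main() the entire process dies. There are no zombies and no wasted resources left behind. It all just comes to a halt. Within a long-running process, however, a finished thread that is neither joined nor detached keeps its resources until it is joined.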
2.4 Return values from threads
If we look more closely at the join function, we see that a thread can pass back a return value, much like an exit status for processes, except that it is a void * and so can point to data of any type instead of being just an integer.
int pthread_join(pthread_t thread, void **retval);
The retval argument is a pointer to a pointer, and the pointer it refers to is set to the thread's return value. Let's see how it's used in a hello world program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_return(void * args){

  //allocate a new string on the heap with "Hello World!"
  char * hello = strdup("Hello World!\n");

  return (void *) hello;
}

int main(int argc, char * argv[]){

  char * str;

  pthread_t thread;  //thread identifier

  //create a new thread that runs hello_return without arguments
  pthread_create(&thread, NULL, hello_return, NULL);

  //wait until the thread completes, assign return value to str
  pthread_join(thread, (void **) &str);

  printf("%s", str);

  //the string was allocated by strdup() in the thread; free it here
  free(str);

  return 0;
}
The start routine returns a void *, which is really a pointer to the string containing "Hello World!". After the join, that pointer is stored in str and the string is printed to standard output.
If you look more closely at this program, you start to notice why threads and processes are so different. The string hello is allocated on the heap in the new thread, and the pointer to that string is handed to the main thread. That means the new thread and the main thread are sharing resources, namely memory. While two processes can share some memory, sharing occurs naturally for threads and enables a lot of powerful programming paradigms. It also creates new challenges. What happens when two threads try to write to the same memory at the same time? We'll explore that in a following lesson.
2.5 Compiling POSIX threads
To compile a program with POSIX threads, you must first include the header file:
#include <pthread.h>
This provides access to the underlying data types, like pthread_t, and the function declarations. However, this is not always enough: on many systems, pthreads live in a separate library rather than in the standard C library (glibc only merged the two in version 2.34). To be portable, you should also link the pthreads library at compilation, much the same way we linked the readline library:
#> clang hello_thread.c -o hello_thread -lpthread
Here, -lpthread tells the compiler to link the POSIX thread library. Both clang and gcc also accept -pthread, which sets the needed compile and link flags.
3 Threads and the OS
Now that you have a basic understanding of creating, using, and joining threads, let's dive into the OS mechanisms that underlie threading. While we often describe threads as a user-level service, on Linux they are actually an OS-level service. The POSIX interface just standardizes access to that service so we can be consistent across operating systems.
In Linux, threads are implemented using the clone() system call, which is a lot like fork() but has more options, including sharing memory with the caller and creating kernel threads. This gives threads a close kernel-level relationship, and from the OS perspective they are scheduled and treated much like processes.
3.1 Identifying Threads: pid's vs. tid's vs pthread_t
When identifying a process, we use its process id, or pid. So far, we've seen POSIX threads identified by their pthread_t, which is an opaque identifier used by the POSIX layer on top of the clone() system call. While pthread_t identifiers are necessary for working with the pthread library, they are not very human friendly. Instead, we will often assign each thread a user-level identifier, like a number: thread 1, thread 2, and so on.
Interestingly, a thread also has a kernel-level identifier, much like a process id, called the thread id. The traditional method for retrieving it is the gettid() system call. Older versions of the standard C library do not provide a wrapper for this system call (glibc added one in version 2.30), so we use the syscall() interface to gain access. Let's look at a sample program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <pthread.h>

//call the system call directly; named get_tid so it cannot
//clash with the gettid() wrapper found in newer C libraries
pid_t get_tid(void){
  return (pid_t) syscall(SYS_gettid);
}

void * hello_fun(void * args){

  //retrieve the pthread identifier
  pthread_t thread = pthread_self();

  //print all identifying information
  printf("THREAD: TID:%d PID:%d PthreadID:%lu\n",
         get_tid(), getpid(), (unsigned long) thread);

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;  //thread identifier

  //create a new thread and have it run the function hello_fun
  pthread_create(&thread, NULL, hello_fun, NULL);

  //print all identifying information
  printf("MAIN: TID:%d PID:%d \n", get_tid(), getpid());

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}
The output of this program is:
#> ./hello_id_pthread
MAIN: TID:21301 PID:21301
THREAD: TID:21302 PID:21301 PthreadID:140378868139776
You'll notice that a thread id is a LOT like a process id. In fact, for the main thread, the thread id is the process id. For the new thread, the thread id is different, but it retains the process id of the main program. This similarity between process ids and thread ids reflects why threads are scheduled like normal programs.
3.2 Threads Running Like Processes
Let's look at an example program to start this conversation:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){

  while(1){
    //busy wait
  }

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread[4];  //thread identifier
  int i;

  //create 4 threads
  for(i = 0 ; i < 4; i++){
    pthread_create(&thread[i], NULL, hello_fun, NULL);
  }

  //wait for all threads to finish
  for(i = 0 ; i < 4; i++){
    pthread_join(thread[i], NULL);
  }

  return 0;
}
This program creates 4 threads, all of which busy wait, while the main thread waits for them to complete (which they never do). The question is, how many resources does this use? Let's run the program and inspect its resource usage:
#> ./busy_pthreads &
[1] 21322
#> ps
  PID TTY          TIME CMD
18344 pts/5    00:00:00 bash
21322 pts/5    00:00:06 busy_pthreads
21327 pts/5    00:00:00 ps
If we just run the program in the background and look at the ps output, we see the program listed as a single entry. No information about the threads is provided. But let's look at the top output instead:
Tasks: 204 total,   1 running, 198 sleeping,   5 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490728k used,  3371900k free,   194504k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21322 aviv      20   0 39240  380  296 S  394  0.0   4:06.83 busy_pthreads
    1 root      20   0 24604 2548 1352 S    0  0.0   0:01.81 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.19 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:08.67 ksoftirqd/0
(...)
Look at the column for %CPU. You're reading that right: 394%! That's because each of the threads is scheduled individually and uses resources like a process. The machine running the program is multi-core, so the threads can run at the same time, and thus one program is using 4 cores at nearly 100% each.
This might seem nuts, but we can see this a bit better when we expand the program into its constituent threads.
#> ps -L
  PID   LWP TTY          TIME CMD
18344 18344 pts/5    00:00:00 bash
21322 21322 pts/5    00:00:00 busy_pthreads
21322 21323 pts/5    00:03:50 busy_pthreads
21322 21324 pts/5    00:03:50 busy_pthreads
21322 21325 pts/5    00:03:50 busy_pthreads
21322 21326 pts/5    00:03:50 busy_pthreads
21333 21333 pts/5    00:00:00 ps
The -L option for ps lists each thread with its thread id (LWP) alongside its process id (PID), and now you can see that the program is actually running as 5 schedulable units: the 4 threads plus 1 more for the main thread. We see the same thing in the top output using the H option (hit H while top is running):
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490620k used,  3372008k free,   194552k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21323 aviv      20   0 39240  380  296 R  100  0.0   6:34.12 busy_pthreads
21324 aviv      20   0 39240  380  296 R  100  0.0   6:34.13 busy_pthreads
21325 aviv      20   0 39240  380  296 R  100  0.0   6:34.10 busy_pthreads
21326 aviv      20   0 39240  380  296 R  100  0.0   6:34.10 busy_pthreads
21322 aviv      20   0 39240  380  296 S    0  0.0   0:00.00 busy_pthreads
We can see that each of the threads is using 100% of a single core, while the main thread is blocked (S) waiting to join the other threads. top even goes so far as to show the tid in the PID column for each thread.