IC221: Systems Programming (SP17)


Home Policy Calendar Units Assignments Resources

Unit 9: Threads

Table of Contents

1 What are Threads?

A thread is a portion of the program that can be scheduled independently of the larger program. It is a way to divide a single process into sub-runnable parts that each can be scheduled by the OS. This concept should already be somewhat familiar to you from the world of processes — processes are separate independent programs that can be scheduled independently — a thread is similar in that it provides a further division of schedulable units of a process that the O.S. can choose to run.

A single process can have multiple threads, and each thread runs independently. But resources are shared across threads; for example, threads share memory, unlike multi-processes which have their own memory layout without data being shared across memory layouts. Two threads have access to the same set of variables and can alter each other's variable values. Threads are also independently scheduled by the OS which means that a single program may actually use more than 100% of CPU resources on multi-processor machines.

2 Hello POSIX Threads?

Threading is traditionally provided via hardware and OS support, but since there was great variance across hardware and software a unifying setting was needed. In 1995, POSIX became the standard interface for many system calls in UNIX including the threading enviroemnt. So called POSIX threads, or pthreads is the key model for programming with threads in nearly every language and setup that is built with the C language, such as Java and python and other high level languages.

The lifecycle of a thread, much like a process, begins with creation. But, threads are not forked from a parent to create a child, instead they are simply created with a starting function as the entry point. A thread does not terminated, like a process, instead they are joined with the main thread or they are detached to run on their own until completion.

2.1 Creating a Thread

This new terminology will become more easily accessible via a hello world program:

/* hello_pthread.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <pthread.h>


void * hello_fun(void * args){

  printf("Hello World!\n");

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;  //thread identifier

  //create a new thread have it run the function hello_fun
  pthread_create(&thread, NULL, hello_fun, NULL);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}

The way to follow this program is that the thread is created with pthread_create() and is set to run the function hello_fun(). The main thread, e.g., the original program, then attempts to join the new thread with the main thread with pthread_join(), which blocks until successful. In the meanwhile, the new thread running the function prints the all important "Hello World!" and returns, joining with the main thread and then exiting.

They key function to consider:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                         void *(*start_routine) (void *), void *arg);

This creates a new thread in the calling process. The thread is identified by the type pthread_t, and can have a set of attributes. We will not use the attribute field for our threads. Following is a function pointer start_routine which takes a pointer argument and returns a pointer. This is the function that gets called when the program starts. The next argument arg is a reference to the argument to start_routine.

2.2 Passing Arguments to a Thread

In the hello world example, we did not pass arguments, but let's look at a slightly more advanced example:

/* hello_args_pthread.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <pthread.h>


void * hello_arg(void * args){

  char * str = (char *) args;

  printf("%s", str);

  return NULL;
}

int main(int argc, char * argv[]){

  char hello[] = "Hello World!\n";

  pthread_t thread;  //thread identifier

  //create a new thread that runs hello_arg with argument hello
  pthread_create(&thread, NULL, hello_arg, hello);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}

This time we set up the function hello_arg to take the argument hello, a string containing the phrase "Hello World!". Note that in the startup routine, it must take a void * argument, so we have to cast

char * str = (char *) args;

but we know the argument was a string, so casting to a char * is safe here. The result of calling pthread_create() is that the new thread executes:

hello_arg(hello);

2.3 Joining Threads

Just like with processes, it is often useful to be able to identify when a thread has completed or exited. The method for doing this is to join the thread, which is a lot like the wait() call for processes. Joining is a blocking operation, and the calling thread will not continue until the thread identified has changed states.

Typically, only the main thread calls join, but other threads can also join each other. All threads are automatically joined when the main thread terminates. For example the following code produces no output:

/* hello_pthread_bad.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <pthread.h>


void * hello_fun(){

  printf("Hello World!\n");

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;

  pthread_create(&thread, NULL, hello_fun, NULL);

  return 0;
}

The program fails to join the new thread before the main thread terminated. As a result, the thread was automatically joined and did not have a chance to print "Hello World".

The fact that you do not have to join threads is actually an advantage because once the main thread dies the entire process dies. There are no zombies, no wasted resources. It all just comes to a hault.

2.4 Return values from threads

If we look more closely at the join function we see that a thread can pass a return value, much like an exit status for processes, except it can be of any type instead of just an integer.

int pthread_join(pthread_t thread, void **retval);

The retval is a reference to a reference which will be set to the return value. Let's see how it's used in a hello world program:

/* hello_return_pthread.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <pthread.h>


void * hello_return(void * args){
  //allocate a new string on the heap with "Hello World!"
  char * hello = strdup("Hello World!\n");

  return (void *) hello;
}

int main(int argc, char * argv[]){

  char * str;

  pthread_t thread;  //thread identifier

  //create a new thread that runs hello_return without arugments
  pthread_create(&thread, NULL, hello_return, NULL);

  //wait until the thread completes, assign return value to str
  pthread_join(thread, (void **) &str);

  printf("%s", str);

  return 0;
}

The start up routine returned a void * which is really a reference to the string containing "Hello World!". The reference to the string is stored at str and printed to the string.

If you look more closely at this program, you start to notice that the fact that this is possible is why threads and process are so different. The string hello in the thread is allocated on the heap and the reference to that string is provided in the main program. That means the thread and the main thread are sharing resources, namely memory. While two processes can share some memory, it occurs naturally for threads and enables a lot of powerful programming paradigms — it also creates new challenges. What happens when two threads try and write to the same memory at the same time? We'll explore that in a following lesson.

2.5 Compiling POSIX threads

To compile a program with POSIX thread, first you must include the header file:

#include <pthread.h>

This provides access to the underlying data types, like pthread_t, and function declarations. However, this is not enough because pthreads are not part of the standard C library. Instead, you must also link the pthreads library at compilation, much the same way we lined the readline library:

#> clang -lpthread hello_thread.c -o hello_thread

Where the -lpthread indicates to the compile to link the POSIX thread library.

3 Threads and the OS

Now that you have a basic understanding of creating, using, and joining threads. Let's dive into the OS systems that underly threading. While we like to describe threads as a user level service, it is actually an OS level service. The POSIX environment just standardizes that interface so we can be consistent across operating systems.

In Unix, threads are implemented using the clone() system call. Which is a lot like fork() but has more options, including sharing memory and creating kernel threads. This enables threads to have a close kernel-level relationship and from the OS perspective to be scheduled and treated much like a process.

3.1 Identifying Threads: pid's vs. tid's vs pthread_t

When identifying a process, we use its process id, or pid. So far, we've seen POSIX threads identified by their pthread_t which is a transparent identifier used by the POSIX overlay of the clone() system call. While pthread_t identifiers are necessary for working with the pthread library they are not that human usable. Instead, we will often assign a thread a user level identifier, like a number – thread 1, thread 2, and etc.

Interestingly, a thread has a kernel level identifier much like a process id call the thread id. The traditional method for retrieving this is using the gettid() system call. Usually, this system call is not implemented in the standard C library, and instead we have to use the syscall() interface to gain access. Let's look at a sample program:

/* hello_id_pthread.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

#include <pthread.h>

//have to call syscall directly, no libc wrapper
pid_t gettid(){
  return (pid_t) syscall (SYS_gettid);
}

void * hello_fun(void * args){

  //retrieve the thread_id
  pthread_t thread = pthread_self();

  //print all identifying information
  printf("THREAD: TID:%d PID:%d PthreadID:%lu\n", gettid(), getpid(), thread);

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;  //thread identifier


  //create a new thread have it run the function hello_fun
  pthread_create(&thread, NULL, hello_fun, NULL);

  //print all identifying information
  printf("MAIN: TID:%d PID:%d \n", gettid(), getpid());

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}

The output of this program is:

#> ./hello_id_pthread 
MAIN: TID:21301 PID:21301 
THREAD: TID:21302 PID:21301 PthreadID:140378868139776

You'll notice that a thread id is a LOT like a process id. In fact, for the main program, the thread id is process id. For the new thread, the thread id is different, but it retains the process id of the main program. This idea of process id's being like thread id's is why threads are schedule like normal programs.

3.2 Threads Running Like Processes

Let's a look at an example program to start this conversation:

/* busy_thread.c*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <pthread.h>


void * hello_fun(void * args){

  while(1){
    //busy wait
  }

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread[4];  //thread identifier
  int i;

  //create 4 threads
  for(i = 0 ; i < 4; i++){
    pthread_create(&thread[i], NULL, hello_fun, NULL);
  }

  //wait for all threads to finish
  for(i = 0 ; i < 4; i++){
    pthread_join(thread[i], NULL);
  }

  return 0;
}

This program creates 4 threads, all of which busy wait, and the main thread waits for the rest of the threads to complete. The question is, how much resources does this use? Let's run this program to inspect its resources usage:

#> ./busy_pthreads &
[1] 21322
#> ps
  PID TTY          TIME CMD
18344 pts/5    00:00:00 bash
21322 pts/5    00:00:06 busy_pthreads
21327 pts/5    00:00:00 ps

If we just run the program in the background and look at the ps output, we see that the program is singular. No information about the threads is provided. But, let's look at the top output instead:

Tasks: 204 total,   1 running, 198 sleeping,   5 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490728k used,  3371900k free,   194504k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                     
21322 aviv      20   0 39240  380  296 S  394  0.0   4:06.83 busy_pthreads
    1 root      20   0 24604 2548 1352 S    0  0.0   0:01.81 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.19 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:08.67 ksoftirqd/0       
(...)

Look at the column for %CPU. You're reading that right 394%!!! That's because each of the threads is scheduled individually and is using resources like a process. The machine running the program is multi-core, so each thread can run at the same time, and thus, one program is using all 4 cores at 100%.

This might seem nuts, but we can see this a bit better when we expand the program into its constituent threads.

#> ps -L
  PID   LWP TTY          TIME CMD
18344 18344 pts/5    00:00:00 bash
21322 21322 pts/5    00:00:00 busy_pthreads
21322 21323 pts/5    00:03:50 busy_pthreads
21322 21324 pts/5    00:03:50 busy_pthreads
21322 21325 pts/5    00:03:50 busy_pthreads
21322 21326 pts/5    00:03:50 busy_pthreads
21333 21333 pts/5    00:00:00 ps

The -L option for ps will organize by thread id (LWP) and process id (PID), and now you can start to see that the program is actually running as 5, the 4 threads and the 1 more for the main program. If we look at the top output using the H option (hit H when top is up).

Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490620k used,  3372008k free,   194552k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                     
21323 aviv      20   0 39240  380  296 R  100  0.0   6:34.12 busy_pthreads
21324 aviv      20   0 39240  380  296 R  100  0.0   6:34.13 busy_pthreads
21325 aviv      20   0 39240  380  296 R  100  0.0   6:34.10 busy_pthreads
21326 aviv      20   0 39240  380  296 R  100  0.0   6:34.10 busy_pthreads
21322 aviv      20   0 39240  380  296 S    0  0.0   0:00.00 busy_pthreads

We can see that each of the reads is using 100% of single core and the main thread is blocking (S) waiting to join each of the other threads. top even goes so far as using the tid as the pid for each thread.

4 Threads and shared resources

In the last lesson, we explored threads and their usefulness for concurrent programming. Threads are a way to divide the resources of a process so that individual parts can be scheduled independently. We also described this as user level parallelism, as oppose to O.S. level parallelism which is provide by processing.

There are many benefits to user level parallelism, such as simpler programming structure and design. User level parallelism also means that each thread shares the same resources, including memory. For example, consider the following small program:

#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>
#include <sys/types.h>

#include <pthread.h>

int shared = 10;

void * fun(void * args){

  time_t start = time(NULL);
  time_t end = start+3; //run for 3 seconds

  while(time(NULL) < end){
    shared++;
  }

  return NULL;
}

int main(){
  pthread_t thread_id;

  pthread_create(&thread_id, NULL, fun, NULL);

  pthread_join(thread_id, NULL);

  printf("shared: %d\n", shared);

  return 0;
}

The thread executing fun() will increment the shared variable shared for 3 seconds. The main thread will block on the join and print the result. This is only possible because both threads, the worker thread and the main thread share memory. While a contrived example, it is demonstrative of the capabilities of shared memory and your experience with multi-processing and piping should help you appreciate the benefit.

However, shared resources comes at a cost. The above example has a single thread working on a single variable: what happens if we have multiple threads trying to manipulate a single memory address? You might think that each thread will be able to act without interference, but computers are finicky and the thread scheduling routine is not transparent. A thread may be just in the middle of an operation and then be interupted by another thread that is also operating on the same data. The result is that the data can become inconsistent and neither thread has the true representation of the data.

To solve these problems, we need a new mechanism to ensure that all critical operations are atomic or mutually exclusive, that is, an operation completes full within one thread without the possibility of another thread preempting that process. In today's lessons, we are going to learn how to add mutual exclusion to our programs to avoid inconsistencies and how to avoid using these tools in ways to hamper program progress.

5 Locking

The concept of resource locking is a huge area of study in computer and operating system research. The big idea is that when a program enters a critical section or a set of code that must fully complete without interruption, the program holds a lock that only one program (or thread) can hold a time. In this way only one program can ever be in the critical section at any time.

5.1 Why we need locks?

Before we get into the details of programming with locks. Let's look at an example of why we need locks. Consider the program below which is simulating a bank. The user selects the number of threads from the command line and then each thread will manipulate an account balance by randomly choosing an amount to credit or debit the account while tracking the total number of debits and credits.

/* bank.c*/
int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomnly choose to credit or debit
    if( random()% 2){
      //credit

      balance = balance + v;
      credits = credits + v;

    }else{
      //debit

      balance = balance - v;
      debits = debits + v;

    }

  }

  return 0;
}

int main(int argc, char * argv[]){

  int n_threads,i;
  pthread_t * threads;

  //error check
  if(argc < 2){
    fprintf(stderr, "ERROR: Require number of threads\n");
    exit(1);
  }

  //convert string to int
  n_threads = atol(argv[1]);

  //error check
  if(n_threads <= 0){
    fprintf(stderr, "ERROR: Invalivd value for number of threads\n");
    exit(1);
  }

  //allocate array of thread identifiers
  threads = calloc(n_threads, sizeof(pthread_t));

  //start all threads
  for(i=0;i<n_threads;i++){
    pthread_create(&threads[i], NULL, transactions, NULL);
  }

  //join all threads
  for(i=0;i<n_threads;i++){
    pthread_join(threads[i], NULL);
  }

  printf("\tCredits:\t%d\n", credits);
  printf("\t Debits:\t%d\n\n", debits);
  printf("%d+%d-%d=   \t%d\n", INIT_BALANCE,credits,debits,
         INIT_BALANCE+credits-debits);
  printf("\t Balance:\t%d\n", balance);

  //free array
  free(threads);

  return 0;
}

Once the program completes, it prints out the total amount credited and debited and the current balance. The program also prints out what the balance should be if we consider credits, debits and initial balance. Here is two runs of the program with 2 worker threads:

#> ./bank 2
	Credits:	4707
	 Debits:	4779

1000+4707-4779=   	928
	 Balance:	942

#> ./bank 2
	Credits:	4868
	 Debits:	4465

1000+4868-4465=   	1403
	 Balance:	1587

What happened? In neither case did the balance at the end match the expectation based on the total debits and credits. How could this be?

Each thread is scheduled independently and can be interrupted and preempted arbitrarily by the OS. This can even occur in the middle of an operation. Consider just the simple bit of code to add numbers:

balance = balance + v

To assign the final balance, two things must happen:

  1. balance + v must be computed and stored in a temp variable
  2. The temp variable must be assigned to balance

What happens if the thread did step (1) but was preempted by the other thread before completing step (2). In the meanwhile, the other thread completes (1) and (2), and finally the original thread is scheduled back to do step (2). Consider the diagram below:

      Thread 1                 Thread 2        balance     v     tmp

(1) bal + v -> tmp                               1000      5     1005

                           (1) bal + v -> tmp    1000      3     1003
                           (2) bal <- tmp        1003
(2) bal <- tmp                                   1005            1005

Since the incremental work of Thread 1 was assigned after the work of Thread 2, it is if Thread 2 never did the work in the first place. This is why we get such inconsistent results with the balance, and further, it's more than just the balance value, it not clear that even credit or debit counters are consisten either.

To solve this problem, we have to identify the critical sections of the program, that is, sections of code that only one thread can execute at a time. Once a critical section is identified, we use a shared variables to lock that section. Only one thread can hold the lock, so only one thread executes the critical section at the time.

5.2 Naive Locking

While it may simple at first creating a lock is no small task. The locking procedure must be an atomic operation where the whole process of testing if the lock is available and acquiring the lock occur singularly. To demonstrate why this must be the case, consider the following code where we implement a naive lock, which is just a variable integer, either set to 0 for available or 1 for locked.

char USAGE[] = "naive_lock n_threads\n"
  "USAGE: run n threads with a naive lock\n";

int lock = 0; //0 for unlocked, 1 for locked

int shared = 0; //shared variable

void * incrementer(void * args){
  int i;

  for(i=0;i<100;i++){

    //check lock
    while(lock > 0);  //spin until unlocked

    lock = 1; //set lock

    shared++; //increment

    lock = 0; //unlock
  }

  return NULL;
}


int main(int argc, char * argv[]){
  pthread_t * threads;
  int n,i;

  if(argc < 2){
    fprintf(stderr, "ERROR: Invalid number of threads\n");
    exit(1);
  }

  //convert argv[1] to a long
  if((n = atol(argv[1])) == 0){
    fprintf(stderr, "ERROR: Invalid number of threads\n");
    exit(1);
  }

  //allocate array of pthread_t identifiers
  threads = calloc(n,sizeof(pthread_t));

  //create n threads
  for(i=0;i<n;i++){
    pthread_create(&threads[i], NULL, incrementer, NULL);
  }

  //join all threads
  for(i=0;i<n;i++){
    pthread_join(threads[i], NULL);
  }

  //print shared value and result
  printf("Shared: %d\n",shared);
  printf("Expect: %d\n",n*100);

  return 0;
}

If you follow the code, you see that the variable lock is set to 0 for unlocked and 1 for locked. Then each thread will spin until the variable is set to 0 for unlocked, that is while(lock > 0);

If we run this program a few times with 2 threads, it might seem to work …

#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 172
Expect: 200

… until all of a sudden it does not work. What happened?

The problem is the testing of the lock and the acquisition of the lock are not atomic operations. If we look at that sequence of code:

//check lock
while(lock > 0);  //spin until unlocked
                                         <----------- What if threads swap here?
lock = 1; //set lock                                  Then two (or more) threads could have the lock.

What we need is a specialized operation for testing and setting that all happens at once, and fortunately, this is provided to us by pthread library.

5.3 mutex locks

The term mutex stands for a mutually exclusion, which is a fancy name for lock. A mutex is not a standard variable, instead, it is guaranteed to be atomic in operation. The act of acquiring a lock cannot be interrupted.

In the pthread library the type of a mutex is:

pthread_mutext_t mutex;

You must first initialize a mutex before you can use it:

 pthread_mutex_init(&mutex, NULL);
 //                          ^
//                      will not use second arg of init

You then can acquire and unlock a mutex:

pthread_mutex_lock(&mutex);

/* critical section */

pthread_mutex_unlock(&mutex);

Both of these operations are atomic, and the locking function is blocking. That is, once a thread tries to acquire a lock, it will be suspended execution until the lock is available. Further, locks are ordered in their acquisition; the order in which threads try to acquire the lock determine which thread gets the lock next. How that works is a discussion for your operating systems class, and more generally, the locking functionality implantation details are heavily supported by the operating system and the hardware.

Finally, creating a mutex allocates memory. So we have to deallocate the mutex, or destroy it:

pthread_mutex_destroy(&mutex);

6 Locking Strategies

There are two strategies for locking critical sections. In this part, we will refer back the bank.c program at the top of the lessons where we had three shared variables, balance, credits, debits, and multiple threads depositing and withdrawing random funds from the balance. As we identified, there was inconsistencies in the balance after running the program, and those inconsistencies are present with all the shared variables.

6.1 Coarse Locking

One locking strategy for locking a program is to use a single lock for the critical section to protect the entire critical section. This is called coarse locking because it is a large granularity. In code, this might look like this for the critical section:

/*bank_coarse_lock.c*/
int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

pthread_mutex_t lock;

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    pthread_mutex_lock(&lock); //acquire lock

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomnly choose to credit or debit
    if( random()% 2){
      //credit

      balance = balance + v;
      credits = credits + v;

    }else{
      //debit

      balance = balance - v;
      debits = debits + v;

    }

    pthread_mutex_unlock(&lock); //release lock
  }

  return 0;
}

With this locking strategy, consistency is achieved. All the critical code executes atomically and the output is consistent:

#> ./bank_coarse_lock 3
	Credits:	7247
	 Debits:	7462

1000+7247-7462=   	785
	 Balance:	785

6.2 Fine Locking

While coarse locking is a reasonable choice, it is inefficient. We looks some paralleism because not all parts of the critical section are critical to each other. For example, consider that the variable credits and debits are used exclusively of each other; each thread only performs a credit or debit but not both. Maybe it would be worth while to do more fine grain locking.

Consider the following changes:

int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

pthread_mutex_t b_lock,c_lock,d_lock;

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomnly choose to credit or debit
    if( random()% 2){
      //credit

      pthread_mutex_lock(&b_lock);
      balance = balance + v;
      pthread_mutex_unlock(&b_lock);


      pthread_mutex_lock(&c_lock);
      credits = credits + v;
      pthread_mutex_unlock(&c_lock);

    }else{
      //debit

      pthread_mutex_lock(&b_lock);
      balance = balance - v;
      pthread_mutex_unlock(&b_lock);

      pthread_mutex_lock(&d_lock);
      debits = debits + v;
      pthread_mutex_unlock(&d_lock);

    }
  }

  return 0;
}

In this example, we use fine locking because each of the shared variables are independently locked only when used. This allows for more parallelism at the cost of complexity. If we compare the run times of the two strategies we see that, yes, fine grain locking is marginally faster and much faster on larger load sets.

6.3 Deadlocks

One consequence of fine grain locking is the potential for deadlocks. A deadlock occurs when two threads each hold a resource the other is waiting on. Let's look at a naive example:

#include <stdio.h>
#include <stdlib.h>

#include <pthread.h>

pthread_mutex_t lock_a, lock_b;

void * fun_1(void * arg){
  int i;
  for (i = 0 ; i< 10000 ; i++){

    pthread_mutex_lock(&lock_a); //lock a then b
    pthread_mutex_lock(&lock_b);

    //CRITICAL SECTION

    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);

  }

  return NULL;
}

void * fun_2(void * arg){
  int i;
  for (i = 0 ; i< 10000 ; i++){

    pthread_mutex_lock(&lock_b); //lock b then a
    pthread_mutex_lock(&lock_a);

    //CRITICAL SECTION

    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);

  }

  return NULL;
}

int main(){

  pthread_t thread_1,thread_2;

  pthread_mutex_init(&lock_a, NULL);
  pthread_mutex_init(&lock_b, NULL);

  pthread_create(&thread_1, NULL, fun_1, NULL);
  pthread_create(&thread_2, NULL, fun_2, NULL);

  pthread_join(thread_1, NULL);
  pthread_join(thread_2, NULL);

  return 0;
}

In this code we have two threads and two locks. One thread acquires lock_a and then lock_b while the other thread acquires lock_b and then lock_a. If we run this program we find that nothing happens; the program hangs. This is because of a deadlock.

Consider what happens when one each thread run such that one holds lock_a and the other holds lock_b, then one thread is waiting on the lock the other thread holds. Both threads are now blocking on acquiring the resources, and the main thread is blocking trying to join the worker threads. Nothing is running: deadlock.

7 Avoiding Deadlocks

Deadlocks are very hard to debug and are a big part of concurrent programming. The best way to avoid deadlocks is to ensure that all locking for all threads occur in the same order. That may seam like a simple task, but it is not because the notion of ordering is not always clear. Let's look at a slightly more complicated example of the bank where instead of a single account, we have multiple accounts. A thread randonly chooses two accounts and transfer money between them.

/*multicccount.c*/
   #define INIT_BALANCE 1000
#define NUM_TRANS 100
#define MAX_TRANS 10
#define NUM_ACCOUNTS 10

//create an account
struct account{
  int balance;
  int credits;
  int debits;
};

//array of accounts
struct account accounts[NUM_ACCOUNTS];


void * transactions(void * args){
  int i,v;
  int a1,a2;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //choose two random accounts
    a1 = (int) random() % NUM_ACCOUNTS;
    while((a2 = (int) random() % NUM_ACCOUNTS) == a1);

    //transfer from a1 to a2
    accounts[a1].balance += v;
    accounts[a1].credits  += v;

    accounts[a2].balance -= v;
    accounts[a2].debits += v;
  }

  return 0;
}

If we run this program, since we have no locking, we run into the same situation as before, inconsistent output. Let's look at how we might want to add locks to the program. The simplest means to do this is to use fine locking at the level accounts.

7.1 Lock Acquisition Issues

By adding a lock to each account, we now require a worker thread to acquire the account lock on both accounts before executing a transfer:

//create an account
struct account{
  int balance;
  int credits;
  int debits;
  pthread_mutex_t lock; //lock for the account
};

//array of accounts
struct account accounts[NUM_ACCOUNTS];


void * transactions(void * args){
  int i,v;
  int a1,a2;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //choose two random accounts
    a1 = (int) random() % NUM_ACCOUNTS;
    while((a2 = (int) random() % NUM_ACCOUNTS) == a1);

    //transfer from a1 to a2
    //    printf("Transfer %d->%d : %d \n", a1,a2,v);

    //acquire lock for a1 then a2
    pthread_mutex_lock(&accounts[a1].lock);
    pthread_mutex_lock(&accounts[a2].lock);


    accounts[a1].balance += v;
    accounts[a1].credits  += v;

    accounts[a2].balance -= v;
    accounts[a2].debits += v;

    pthread_mutex_unlock(&accounts[a1].lock);    
    pthread_mutex_unlock(&accounts[a2].lock);

  }

  return 0;
}

This might seem to work, but this locking strategy actually will form a deadlock. Why? The locking strategy is to lock a1 then a2 always in that order, how could a deadlock occur?

Consider that the accounts are chosen randomly from all accounts and that there are multiple threads running. The variables a1 and a2 are standin's for account numbers or indexes into the array of accounts. What if in two threads the following happens?

Thread 1        Thread 2
---------      ----------
a1 = 5;          a1 = 3;
a2 = 3;          a2 = 5;

Then when we look at lock ordering, we find the following situation and deadlock might occur:

Thread 1        Thread 2
---------      ----------
lock( 5 )       
                lock( 3 )
lock( 3 )       
                lock( 5 )


        DEADLOCK

We find ourselves back where we started, each thread is now waiting on a lock held by another thread. Progress cannot be made and a deadlock occurs. This happens because we did not apply a proper order to locking: How do we apply a proper ordering?

7.2 Ordering Resources to Avoid Deadlock

One way to apply a proper ordering is to leverage the fact that the account indexes are already ordered. What if we always lock accounts in account order? Then in above example, both threads would do lock(3) then lock(5). In code, this would look like this:

void * transactions(void * args){
  int i,v;
  int a1,a2;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //choose two random accounts
    a1 = (int) random() % NUM_ACCOUNTS;
    while((a2 = (int) random() % NUM_ACCOUNTS) == a1);

    //transfer from a1 to a2

    if(a1 < a2){
      pthread_mutex_lock(&accounts[a1].lock);
      pthread_mutex_lock(&accounts[a2].lock);
    }else{
      pthread_mutex_lock(&accounts[a2].lock);
      pthread_mutex_lock(&accounts[a1].lock);
    }

    accounts[a1].balance += v;
    accounts[a1].credits  += v;

    accounts[a2].balance -= v;
    accounts[a2].debits += v;

    if(a1 < a2){
      pthread_mutex_unlock(&accounts[a2].lock);
      pthread_mutex_unlock(&accounts[a1].lock);
    }else{
      pthread_mutex_unlock(&accounts[a1].lock);
      pthread_mutex_unlock(&accounts[a2].lock);
    }
  }

  return 0;
}

Now, everything is consistent and deadlock is avoided.

8 The Dining Philosophers

The Dining philosophers is a classic problem of deadlock avoidance. The story goes that a philosopher does only two things in life, eat or philosophize. This particular group of philosophers sits at a large round table with food in front of each of them, and between each philosopher is a single fork. When a philosopher wishes to eat, they must acquire the fork to their right and their left, but those forks are also shared with the adjacent philosopher. If an adjacent philosopher is currently eating, then the hungry philosopher must wait for the other to finish before acquiring the fork. Once a fork is acquired, a philosopher must hold on to it until they are finished eating.

Let's look at a code example of how this might be implemented using threading:

#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>
#include <sys/types.h>

#include <pthread.h>

#define NUM_PHIL 10
#define ROUNDS 1000

struct fork{
  int id;
  pthread_mutex_t * lock; //the lock on the fork
};


struct philosopher{
  int id;               //id of this philosopher
  struct fork * left;   //the left fork
  struct fork * right;  //the right fork
  pthread_t thread_id; //POSIX thread id
};

struct fork forks[NUM_PHIL];
struct philosopher philosophers[NUM_PHIL];

void  * eat(void * args){
  struct philosopher * phil  = (struct philosopher *) args;
  int i;

  for (i = 0;i<ROUNDS;i++){
    //TODO: fix locking to avoid deadlock 

    printf("Philosopher %d ACQUIRING left fork %d\n", phil->id, phil->left->id);
    pthread_mutex_lock(phil->left->lock);

    printf("Philosopher %d ACQUIRING right fork %d\n", phil->id, phil->right->id);
    pthread_mutex_lock(phil->right->lock);

    printf("Philosopher %d EATING with fork %d and %d\n", phil->id, phil->left->id, phil->right->id);
    usleep(250); //sleep for a quarter second

    printf("Philosopher %d RELEASING fork %d\n", phil->id, phil->left->id);
    pthread_mutex_unlock(phil->left->lock);
    printf("Philosopher %d RELEASING fork %d\n", phil->id, phil->right->id);
    pthread_mutex_unlock(phil->right->lock);
  }

  return NULL;
}

int main(){

  int i;

  printf("Initializing forks: %d\n", NUM_PHIL-1);
  //initialize the forks
  for(i=0;i<NUM_PHIL;i++){
    forks[i].id = i;
    forks[i].lock = malloc(sizeof(pthread_mutex_t));
    pthread_mutex_init(forks[i].lock, NULL);
  }

  printf("Initializing philosophers: %d\n", NUM_PHIL);
  //initialize the philosophers
  for(i=0;i<NUM_PHIL;i++){
    philosophers[i].id = i;
    if( i == 0){
      philosophers[i].left = &forks[(NUM_PHIL-1)];
    }else{
      philosophers[i].left = &forks[i-1];
    }

    philosophers[i].right = &forks[(i) % NUM_PHIL];
  }


  printf("Starting Threads\n");
  //start the threads
  for(i=0;i<NUM_PHIL;i++){
    pthread_create(&(philosophers[i].thread_id), NULL,
                   eat, (void *) &(philosophers[i]));
  }

  //join the threads
  for(i=0;i<NUM_PHIL;i++){
    pthread_join(philosophers[i].thread_id, NULL);
  }

  printf("All threads joined\n");
  //clean memory
  for(i=0;i<NUM_PHIL-1;i++){
    pthread_mutex_destroy(forks[i].lock);
    free(forks[i].lock);
  }

  return 0;
}

You'll see we have a structure for philosophers and forks, and that each philosopher has a fork to the right and left. The philosopher with id 0 shares a fork with the philosopher with id NUM_PHIL-1 because they are sitting at a round table.

If we look at the locking procedure more closely, we see that there appears to be reasonable ordering:

printf("Philosopher %d ACQUIRING left fork %d\n", phil->id, phil->left->id);
pthread_mutex_lock(phil->left->lock);

printf("Philosopher %d ACQUIRING right fork %d\n", phil->id, phil->right->id);
pthread_mutex_lock(phil->right->lock);

Every philosopher first reaches left to pick up a fork and then reaches right to pick up a fork. But, it doesn't avoid deadlock.

Can you solve the problem?