IC221: Systems Programming (SP16)



Lec 29: Critical Sections, Locking, and Deadlocks

Table of Contents

1 Threads and shared resources
2 Locking
3 Locking Strategies

In the last lesson, we explored threads and their usefulness for concurrent programming. Threads are a way to divide the resources of a process so that individual parts can be scheduled independently. We also described this as user level parallelism, as opposed to OS level parallelism, which is provided by multiprocessing.

There are many benefits to user level parallelism, such as a simpler programming structure and design. User level parallelism also means that all threads share the same resources, including memory. For example, consider the following small program:

#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>
#include <sys/types.h>
#include <time.h>

#include <pthread.h>

int shared = 10;

void * fun(void * args){

  time_t start = time(NULL);
  time_t end = start+3; //run for 3 seconds

  while(time(NULL) < end){
    shared++;
  }

  return NULL;
}

int main(){
  pthread_t thread_id;

  pthread_create(&thread_id, NULL, fun, NULL);

  pthread_join(thread_id, NULL);

  printf("shared: %d\n", shared);

  return 0;
}

The thread executing fun() will increment the shared variable shared for 3 seconds. The main thread will block on the join and then print the result. This is only possible because the worker thread and the main thread share memory. While a contrived example, it demonstrates the capabilities of shared memory, and your experience with multi-processing and piping should help you appreciate the benefit.

However, sharing resources comes at a cost. The above example has a single worker thread operating on a single variable: what happens if multiple threads try to manipulate the same memory address? You might think that each thread will be able to act without interference, but thread scheduling is not transparent to the program. A thread may be right in the middle of an operation when it is interrupted by another thread that is also operating on the same data. The result is that the data can become inconsistent, and neither thread has the true representation of the data.

To solve these problems, we need a new mechanism to ensure that all critical operations are atomic, or mutually exclusive; that is, an operation completes fully within one thread without the possibility of another thread preempting it. In today's lesson, we are going to learn how to add mutual exclusion to our programs to avoid inconsistencies, and how to avoid using these tools in ways that hamper program progress.

2 Locking

The concept of resource locking is a huge area of study in computer and operating system research. The big idea is that when a program enters a critical section, a section of code that must fully complete without interruption, the program holds a lock that only one program (or thread) can hold at a time. In this way, only one thread can ever be in the critical section at any time.

2.1 Why we need locks

Before we get into the details of programming with locks, let's look at an example of why we need them. Consider the program below, which simulates a bank. The user selects the number of threads from the command line, and then each thread manipulates an account balance by randomly choosing an amount to credit or debit the account while tracking the total number of debits and credits.

/* bank.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define INIT_BALANCE 1000 //initial balance (matches the runs below)
#define NUM_TRANS 100     //transactions per thread (value assumed; not shown in the original)

int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomly choose to credit or debit
    if( random()% 2){
      //credit

      balance = balance + v;
      credits = credits + v;

    }else{
      //debit

      balance = balance - v;
      debits = debits + v;

    }

  }

  return 0;
}

int main(int argc, char * argv[]){

  int n_threads,i;
  pthread_t * threads;

  //error check
  if(argc < 2){
    fprintf(stderr, "ERROR: Require number of threads\n");
    exit(1);
  }

  //convert string to int
  n_threads = atol(argv[1]);

  //error check
  if(n_threads <= 0){
    fprintf(stderr, "ERROR: Invalid value for number of threads\n");
    exit(1);
  }

  //allocate array of thread identifiers
  threads = calloc(n_threads, sizeof(pthread_t));

  //start all threads
  for(i=0;i<n_threads;i++){
    pthread_create(&threads[i], NULL, transactions, NULL);
  }

  //join all threads
  for(i=0;i<n_threads;i++){
    pthread_join(threads[i], NULL);
  }

  printf("\tCredits:\t%d\n", credits);
  printf("\t Debits:\t%d\n\n", debits);
  printf("%d+%d-%d=   \t%d\n", INIT_BALANCE,credits,debits,
         INIT_BALANCE+credits-debits);
  printf("\t Balance:\t%d\n", balance);

  //free array
  free(threads);

  return 0;
}

Once the program completes, it prints out the total amount credited and debited and the current balance. The program also prints out what the balance should be if we consider the credits, debits, and initial balance. Here are two runs of the program with 2 worker threads:

#> ./bank 2
	Credits:	4707
	 Debits:	4779

1000+4707-4779=   	928
	 Balance:	942

#> ./bank 2
	Credits:	4868
	 Debits:	4465

1000+4868-4465=   	1403
	 Balance:	1587

What happened? In neither case did the balance at the end match the expectation based on the total debits and credits. How could this be?

Each thread is scheduled independently and can be interrupted and preempted arbitrarily by the OS. This can even occur in the middle of an operation. Consider just the simple bit of code to add numbers:

balance = balance + v;

To assign the final balance, two things must happen:

  1. balance + v must be computed and stored in a temp variable
  2. The temp variable must be assigned to balance

What happens if the thread did step (1) but was preempted by the other thread before completing step (2)? Meanwhile, the other thread completes steps (1) and (2), and finally the original thread is scheduled back to do step (2). Consider the diagram below:

      Thread 1                 Thread 2          balance   tmp1   tmp2

(1) tmp1 <- bal + 5                                1000    1005

                           (1) tmp2 <- bal + 3     1000    1005   1003
                           (2) bal <- tmp2         1003    1005   1003

(2) bal <- tmp1                                    1005    1005   1003

Since the incremental work of Thread 1 was assigned after the work of Thread 2, it is as if Thread 2 never did its work in the first place. This is why we get such inconsistent results with the balance. Further, it is more than just the balance value: it is not clear that the credit or debit counters are consistent either.

To solve this problem, we have to identify the critical sections of the program, that is, the sections of code that only one thread may execute at a time. Once a critical section is identified, we use a shared variable to lock that section. Only one thread can hold the lock, so only one thread executes the critical section at a time.

2.2 Naive Locking

While it may seem simple at first, creating a lock is no small task. The locking procedure must be an atomic operation: the whole process of testing whether the lock is available and acquiring it must occur as a single, indivisible step. To demonstrate why this must be the case, consider the following code, which implements a naive lock: just an integer variable, set to 0 for available or 1 for locked.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

char USAGE[] = "naive_lock n_threads\n"
  "USAGE: run n threads with a naive lock\n";

int lock = 0; //0 for unlocked, 1 for locked

int shared = 0; //shared variable

void * incrementer(void * args){
  int i;

  for(i=0;i<100;i++){

    //check lock
    while(lock > 0);  //spin until unlocked

    lock = 1; //set lock

    shared++; //increment

    lock = 0; //unlock
  }

  return NULL;
}


int main(int argc, char * argv[]){
  pthread_t * threads;
  int n,i;

  if(argc < 2){
    fprintf(stderr, "ERROR: Invalid number of threads\n");
    exit(1);
  }

  //convert argv[1] to a long
  if((n = atol(argv[1])) == 0){
    fprintf(stderr, "ERROR: Invalid number of threads\n");
    exit(1);
  }

  //allocate array of pthread_t identifiers
  threads = calloc(n,sizeof(pthread_t));

  //create n threads
  for(i=0;i<n;i++){
    pthread_create(&threads[i], NULL, incrementer, NULL);
  }

  //join all threads
  for(i=0;i<n;i++){
    pthread_join(threads[i], NULL);
  }

  //print shared value and result
  printf("Shared: %d\n",shared);
  printf("Expect: %d\n",n*100);

  return 0;
}

If you follow the code, you see that the variable lock is set to 0 for unlocked and 1 for locked. Each thread spins until the variable is 0, that is, while(lock > 0);

If we run this program a few times with 2 threads, it might seem to work …

#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 200
Expect: 200
#> ./naive_lock 2
Shared: 172
Expect: 200

… until all of a sudden it does not work. What happened?

The problem is that testing the lock and acquiring the lock are not a single atomic operation. If we look at that sequence of code:

//check lock
while(lock > 0);  //spin until unlocked
                                         <----------- What if threads swap here?
lock = 1; //set lock                                  Then two (or more) threads could have the lock.

What we need is a specialized operation in which the test and the set happen all at once, and fortunately this is provided to us by the pthread library.

2.3 mutex locks

The term mutex stands for mutual exclusion, which is a fancy name for a lock. A mutex is not a standard variable; its operations are guaranteed to be atomic, so the act of acquiring the lock cannot be interrupted partway through.

In the pthread library the type of a mutex is:

pthread_mutex_t mutex;

You must first initialize a mutex before you can use it:

pthread_mutex_init(&mutex, NULL);
//                         ^
//                we will not use the second argument of init

You then can acquire and unlock a mutex:

pthread_mutex_lock(&mutex);

/* critical section */

pthread_mutex_unlock(&mutex);

Both of these operations are atomic, and the locking function is blocking. That is, once a thread tries to acquire a lock that is already held, its execution is suspended until the lock becomes available. Which waiting thread acquires the lock next is determined by the scheduling policy; how that works is a discussion for your operating systems class, and more generally, the locking implementation details are heavily supported by the operating system and the hardware.

Finally, initializing a mutex may allocate resources, so when we are finished with it we should deallocate, or destroy, it:

pthread_mutex_destroy(&mutex);

3 Locking Strategies

There are two strategies for locking critical sections. In this part, we will refer back to the bank.c program from the top of the lesson, where we had three shared variables, balance, credits, and debits, and multiple threads depositing and withdrawing random amounts. As we identified, there were inconsistencies in the balance after running the program, and those inconsistencies affect all of the shared variables.

3.1 Coarse Locking

One strategy is to use a single lock to protect the entire critical section. This is called coarse locking because it locks at a large granularity. In code, the critical section might look like this:

/*bank_coarse_lock.c*/
int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; //statically initialized mutex

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    pthread_mutex_lock(&lock); //acquire lock

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomly choose to credit or debit
    if( random()% 2){
      //credit

      balance = balance + v;
      credits = credits + v;

    }else{
      //debit

      balance = balance - v;
      debits = debits + v;

    }

    pthread_mutex_unlock(&lock); //release lock
  }

  return 0;
}

With this locking strategy, consistency is achieved. All the critical code executes atomically and the output is consistent:

#> ./bank_coarse_lock 3
	Credits:	7247
	 Debits:	7462

1000+7247-7462=   	785
	 Balance:	785

3.2 Fine Locking

While coarse locking is a reasonable choice, it is inefficient. We lose some parallelism because not all parts of the critical section are critical to each other. For example, the variables credits and debits are used exclusively of each other; in each iteration a thread performs a credit or a debit, but not both. It may be worthwhile to do more fine-grained locking.

Consider the following changes:

int balance = INIT_BALANCE;


int credits = 0;
int debits = 0;

pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t c_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t d_lock = PTHREAD_MUTEX_INITIALIZER;

void * transactions(void * args){
  int i,v;

  for(i=0;i<NUM_TRANS;i++){

    //choose a random value
    v = (int) random() % NUM_TRANS;

    //randomly choose to credit or debit
    if( random()% 2){
      //credit

      pthread_mutex_lock(&b_lock);
      balance = balance + v;
      pthread_mutex_unlock(&b_lock);


      pthread_mutex_lock(&c_lock);
      credits = credits + v;
      pthread_mutex_unlock(&c_lock);

    }else{
      //debit

      pthread_mutex_lock(&b_lock);
      balance = balance - v;
      pthread_mutex_unlock(&b_lock);

      pthread_mutex_lock(&d_lock);
      debits = debits + v;
      pthread_mutex_unlock(&d_lock);

    }
  }

  return 0;
}

In this example, we use fine locking because each of the shared variables is independently locked only when used. This allows for more parallelism at the cost of complexity. If we compare the run times of the two strategies, we see that fine-grained locking is marginally faster, and much faster on larger workloads.

3.3 Deadlocks

One consequence of fine grain locking is the potential for deadlocks. A deadlock occurs when two threads each hold a resource the other is waiting on. Let's look at a naive example:

#include <stdio.h>
#include <stdlib.h>

#include <pthread.h>

pthread_mutex_t lock_a, lock_b;

void * fun_1(void * arg){
  int i;
  for (i = 0 ; i< 10000 ; i++){

    pthread_mutex_lock(&lock_a); //lock a then b
    pthread_mutex_lock(&lock_b);

    //CRITICAL SECTION

    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);

  }

  return NULL;
}

void * fun_2(void * arg){
  int i;
  for (i = 0 ; i< 10000 ; i++){

    pthread_mutex_lock(&lock_b); //lock b then a
    pthread_mutex_lock(&lock_a);

    //CRITICAL SECTION

    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);

  }

  return NULL;
}

int main(){

  pthread_t thread_1,thread_2;

  pthread_mutex_init(&lock_a, NULL);
  pthread_mutex_init(&lock_b, NULL);

  pthread_create(&thread_1, NULL, fun_1, NULL);
  pthread_create(&thread_2, NULL, fun_2, NULL);

  pthread_join(thread_1, NULL);
  pthread_join(thread_2, NULL);

  return 0;
}

In this code we have two threads and two locks. One thread acquires lock_a and then lock_b, while the other acquires lock_b and then lock_a. If we run this program, we will most likely find that nothing happens; the program hangs. This is because of a deadlock.

Consider what happens when the threads run such that one holds lock_a and the other holds lock_b. Each thread is now waiting on a lock the other holds, both are blocked acquiring a resource that will never be released, and the main thread is blocked trying to join the workers. Nothing is running: deadlock.