Friday, June 17, 2011

What is the difference between a mutex and a semaphore?

Okay. This is another common interview question. The one line answer to this question is:

"A mutex is a binary semaphore that can only be unlocked by the thread that locked it. It's primary purpose is to ensure single threaded access to a piece of code called the critical section. A semaphore, in contrast, is often used for cooperative signalling across threads, a feature that is explicitly disallowed by the construction of the mutex as a binary semaphore owned by the thread that locked it."

Here are a few of the points you should keep in mind:

Q1: What is a semaphore?
* A semaphore is a mechanism to convey events between two or more cooperating processes. Now what does this mean exactly?
In a typical multiprocessing machine, many processes are being timeshared on a single CPU. The degree to which computation will proceed in one process depends on how big a time slice it has been given by the OS CPU Scheduler. Thus, different processes will be at different stages of computation at any point of time. So, how do we coordinate across processes? We use semaphores to tell other processes that we've reached certain stages in our processing (via a sem_post) or to wait for other processes to tell us that they've reached certain well defined stages in their computation (via a sem_wait). These two major functions allow us to synchronize computation across process boundaries and this is the main function of a semaphore.

Effectively, you can think of a semaphore as a click lock with many keys. Anybody who holds a key can open the lock. There can be multiple keys and multiple people can open the lock at any time. Anybody can lock the lock.

Q2: What is a mutex?
* A mutex is a mechanism to ensure mutual exclusion. In simple terms, this means that one and only 1 process will be allowed to execute a particular piece of code guarded by a mutex. This particular piece of code is called the critical section. A mutex has 2 major properties:
1. Only one of the cooperating processes gains access to the critical section.
2. Only the process that holds access to the critical section (called the owner of the mutex) can revoke its access to the critical section. (This is called ownership of the critical section or mutex).

Effectively, a mutex is the concept of a lock + owner. There is a lock and only 1 key. Whoever uses a key to open the lock is called the owner of the mutex. Only he can lock the lock or open it. Thus, there can be at max 1 person at any time in the locked room.

By ensuring that no more than 1 process can ever own and release the mutex, mutual exclusion is ensured.

* A semaphore can be used to implement a mutex. This is possible if we associate the concept of an "owner" to a binary semaphore (a click lock with 1 key only). Look at the following pseudocode:

Global Initialization:
sem_init(&sem, 1); // Binary Semaphore (ensures that only 1 process can enter the Critical Section)
owner = 0; // No pid

Mutex Lock Code:
sem_wait(&sem);
owner = getpid();

// Critical Section

Mutex Unlock Code:
if (owner != getpid()) {
   abort(); // non-owner is trying to unlock the mutex. Invalid state - abort. (Ensures Mutual Exclusion)
}
sem_post(&sem);

Note how the unlock code verifies the owner of the lock and thus fulfills the function of a mutex. This is a requirement for ensuring that only the owner of the lock is able to unlock the code.

This is in sharp contrast to some other uses of a semaphore.
Consider the following example:

Global:
sem_init(&sem1, 1);
sem_init(&sem2, 0);

void f() {
  while(true) {
   sem_wait(&sem1);
   // CS1
   sem_post(&sem2);
  }
}

void g() {
  while(true) {
   sem_wait(&sem2);
   // CS2
   sem_post(&sem1);
  }
}

If f() and g() are 2 processes cooperating via the 2 semaphores, the way that the code is written ensures that the critical sections will be executed one after the other (CS1 - CS2 - CS1 - CS2 etc).
This example illustrates the fact that semaphores can be used to convey events between 2 cooperating processes. In this case, the events being conveyed are the completion of the critical sections of f() and g(). Also note that in this example, the process that waits on a semaphore isn't the one that is posting to the semaphore. This is a major difference between a semaphore and a mutex because a mutex is unable to achieve such a behaviour.

Q3: What are counting semaphores?
* Semaphores also come in a variety of flavours. One such flavour is called a counting semaphore. A counting semaphore is one that is able to allow more than one process to successfully wait on it. It lets through some "count" number of processes through sem_waits. This makes a counting semaphore more useful than a mutex - for example, it can be used to implement reader-writer locks or allow a fixed level of concurrency in a piece of code. Note that mutexes only allow for one process/thread in a critical section.

* Please remember that mutexes need to be acquired and released by the same thread. A request release a mutex currently held by another process should be treated as invalid code. This is different from the behaviour of a semaphore. A semaphore may be WAIT-ed upon by one thread/process and be POST-ed by another. This is usually by design. Otherwise mutual exclusion cannot be guaranteed in all cases.

I hope this makes the differences between semaphores and mutexes very clear. If you have any more doubts, just put in your questions in the comments!