I’m wrapping up a nine-part series on Symmetric Multiprocessors (SMP). Following up on #8, the Random Back-off Problem, which focused on isolating locks so that a cache line holds only a single lock, I’ll conclude with another solution, one that solves the Thundering Herd problem completely and optimally: Queued Locks.
I’ve shown you how a 32-bit cpu commonly uses an 8-word cache line, and that we want only a single lock (one word) on a cache line. We can use the rest of that cache line to implement a Queued Lock.
Queued Locks
The Queued Lock algorithm essentially consists of having a cpu add itself to the end of a queue when it is waiting for a lock (this gives us fairness). The cpu then does a tight spin on a private cache line, waiting for the lock to be released (this gives us optimality). When the lock is released, the holding cpu picks the first cpu off the queue (fair) and pokes the private cache line of that cpu to wake it up (optimal).
Single Cache Line
Here is the Queued Lock data structure that resides on a single cache line:
```c
struct queued_lock
{
    int  mutex;        // locks this data structure
    int  value;        // 0 = free, 1 = locked
    char queue[24];    // waiting cpus, in arrival order
};
```
This data structure can thus handle a 25-cpu system: one cpu holds the lock while up to 24 wait. The queue field contains cpu numbers, stored as the cpu number plus one so that a zero value can indicate a free entry.
Private Wait Array
Here is the Private Wait Array:
```c
struct private_wait
{
    int value;         // the word this cpu spins on
    int fill[7];       // pad out to a full cache line
};

struct private_wait private_wait_array[25];
```
This array is indexed by the cpu number. The fill padding rounds each entry up to a full cache line, so every cpu spins on its own private line.
Queued Lock Algorithms
Here is the Queued Lock lock algorithm:
```c
void
queued_lock_lock (struct queued_lock *lock)
{
    int index;

    while (test_and_set(&lock->mutex) == 1)     // lock the data structure
        ;
    if (lock->value == 0)                       // lock is not set, take it
    {
        lock->value = 1;
        lock->mutex = 0;                        // unlock the data structure
    }
    else                    // lock is set, add myself to the queue, and wait
    {
        for (index = 0; ; index++)
        {
            assert(index < 24);                 // the queue must not overflow
            if (lock->queue[index] == 0)
                break;
        }
        lock->queue[index] = my_cpu_number + 1; // cpu number + 1; 0 means free
        private_wait_array[my_cpu_number].value = 0;
        lock->mutex = 0;                        // unlock the data structure
        while (private_wait_array[my_cpu_number].value == 0)    // spin privately
            ;
    }
}
```
And finally here is the Queued Lock unlock algorithm:
```c
void
queued_lock_unlock (struct queued_lock *lock)
{
    int cpu;

    while (test_and_set(&lock->mutex) == 1)     // lock the data structure
        ;
    if (lock->queue[0] == 0)                    // no one is waiting
    {
        lock->value = 0;
    }
    else                                        // somebody is waiting, poke them
    {
        cpu = lock->queue[0] - 1;
        memmove(&lock->queue[0], &lock->queue[1], 23);
        lock->queue[23] = 0;                    // clear the vacated last slot
        private_wait_array[cpu].value = 1;      // wake the first waiter
    }
    lock->mutex = 0;                            // unlock the data structure
}
```
Now that I’ve laid out various scenarios and solutions, have I missed anything? What solutions would you add?