My last four blogs posts have presented:
- Symmetric Multi Processor (SMP) scaling problem.
- Solution as presented by Pyramid Technology.
- User space SMP problem.
- Various lock primitives.
It’s logical to now explore how to debug an SMP system. But first I need to present the classic locking problem as defined by Dijkstra.
Classic Locking Problem
Consider two processes: P1 and P2; and two locks: L1 and L2. Now consider the following locking scenerio: P1 locks L1 (P1.L1) and then L2 (P2.L1); P2 locks L2 (P2.L2) and then L1 (P2.L1).
If the locks are aquired in the order: (1: P1.L1, 2: P2.L2, 3: P1.L2, 4: P2,L1), then steps 3 and 4 will never complete. This scenerio is called a deadlock in that both processes are stuck waiting for locks that are held by each other.
So how do we deal with a SMP deadlock? There are essentually two ways to deal with deadlocks. Method 1: try to detect when a deadlock has occured. Method 2: prevent deadlocks from occuring.
The first method is somewhat problomatic in that you have to keep track of all of the locks in the system and periodically monitor them for a deadlock.
The second method is much easier to implement. Pyramid dealt with this problem by adding code to detect when a deadlock was immenant. To wit, a positive, non-zero level was assigned to each lock and the lock protocol was modified to ensure that a lock could only be aquired if the lock level was increased.
Now let us apply lock levels to the Dijkstra lock problem. L1 will now be defined as L1.5 (lock 1 with level 5) and L2 will now be defined as L2.10 (lock 1 with level 10). If the locks are aquired in the order: (1: P1.L1, 2: P2.L2, 3: P1.L2, 4: P2,L1), then step 4 will generate a Lock Level Violation since P2 is holding L2 (which has a level of 10) and P2 is trying to lock L1 (which has a level of 5) – you can only increase the lock levels.
Handling SMP Interrupts
There is another problem that we have to deal with: interrupts. If a process is holding a lock and interrupts occur, then it will look like the process is holding the lock for a long time.
This problem is dealt with by adding an interrupt level to the definition of a lock. In addition, the lock protocol is expanded by requiring that the cpu interrupt level be set to the lock interrupt level. Setting the cpu interrupt level blocks interrupts at a lower level. System interrupts are defined by levels, for example: 0 = none, 1 = tty, 2 = network, 3 = disk, etc. The higher the interrupt level, the higher the priority. Higher level interrupts block out lower level interrupts. Thus, setting the interrupt level of a cpu to priority X, blocks all interrupts lower than and including level X.
Defining the Locking Protocol
So a lock is now defined as: (Name, Level, Interrupt). And the locking protocol is now defined as:
1) If the new lock level is less than the current lock level, then generate a Lock Level Protocol Violation.
2) If the new lock interrupt priority is less than the current cpu interrupt priority, then generate a Lock Interrupt Protocol Violation.
Next up? SMP’s Thundering Herd Problem.