In my last blog post, I described the SMP problem: adding CPUs to an SMP system does not reliably increase performance.
Pyramid Technology solved this problem by decreasing the granularity of the lock. This means that the amount of real estate controlled by each lock was decreased. Instead of a single lock for the entire kernel, a separate lock was added for each major data structure: Process Table, Scheduler, File Systems, etc.
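As a rough illustration of the idea (not Pyramid's actual kernel code), here is a minimal Python sketch with one lock per major data structure. The class name Kernel, the lock names, and the counters are all hypothetical stand-ins for real kernel structures.

```python
import threading

class Kernel:
    """Toy model: one lock per major data structure instead of one kernel-wide lock."""

    def __init__(self):
        # Hypothetical names; each lock guards only its own structure.
        self.locks = {
            "process_table": threading.Lock(),
            "scheduler": threading.Lock(),
            "file_systems": threading.Lock(),
        }
        # Stand-in for the data each lock protects.
        self.counters = {name: 0 for name in self.locks}

    def update(self, subsystem):
        # Only callers touching the same structure contend for this lock.
        with self.locks[subsystem]:
            self.counters[subsystem] += 1

kernel = Kernel()
with kernel.locks["scheduler"]:
    # The scheduler lock is held, yet the process table can still be updated.
    kernel.update("process_table")
```

With a single kernel-wide lock, the update inside the with block would have had to wait. With per-structure locks, it proceeds immediately.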
Pyramid found that this increase in kernel locks greatly increased system performance. But it was not enough: processes were still stalling for long periods of time (greater than 30 seconds), waiting for kernel locks to be released. The problem was that some of the locks were too coarse.
Process Table Example
For example, consider the Process Table. Initially, a single lock was used to control access to this data structure: the Process Table Lock. This lock was used when entries were added to or deleted from the table. It was also used when entries were modified. Thus, when a process was being created or killed, no other CPU could modify an existing process: change the wait state, post interrupts, etc. The solution was to add a Process Entry Lock. The Process Table Lock then controlled access when entries were added and deleted, and each Process Entry Lock was used to synchronize access to a single Process Table Entry.
This change decreased the granularity of the Process Table Lock. More locks were added to the system, and the Process Table Lock was used to synchronize less code.
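The two-level scheme can be sketched in Python as follows. This is a minimal illustration under my own assumptions, not Pyramid's implementation: the names ProcessTable, ProcessEntry, table_lock, and set_state are hypothetical, and I assume that a lookup takes the table lock briefly so it cannot race with an add or delete.

```python
import threading

class ProcessEntry:
    """One process table entry, guarded by its own lock."""

    def __init__(self, pid):
        self.pid = pid
        self.state = "runnable"
        self.lock = threading.Lock()  # hypothetical Process Entry Lock

class ProcessTable:
    def __init__(self):
        self.table_lock = threading.Lock()  # hypothetical Process Table Lock
        self.entries = {}

    def add(self, pid):
        # The table lock is held only while the table itself changes shape.
        with self.table_lock:
            entry = ProcessEntry(pid)
            self.entries[pid] = entry
            return entry

    def remove(self, pid):
        with self.table_lock:
            entry = self.entries.pop(pid)
        # Wait for any in-flight modification of this one entry to finish.
        with entry.lock:
            entry.state = "dead"

    def lookup(self, pid):
        # Assumption: lookups take the table lock briefly to avoid racing add/remove.
        with self.table_lock:
            return self.entries.get(pid)

    def set_state(self, pid, state):
        entry = self.lookup(pid)
        if entry is None:
            return False
        # Only this entry is locked; other entries and the table stay available.
        with entry.lock:
            if entry.state == "dead":
                return False
            entry.state = state
            return True

table = ProcessTable()
table.add(1)
table.add(2)
with table.lookup(1).lock:
    # Entry 1 is busy, but entry 2 can be modified and a new process created.
    table.set_state(2, "sleeping")
    table.add(3)
```

The design choice to notice is that modifying a process never holds the table lock for the duration of the change, so creating or killing one process no longer blocks changes to every other process.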
This lock-splitting process had to be repeated until CPUs were no longer stalling. The definition of stalling is left up to the designer. A lower acceptable stall time means that the system is more efficient, but it also means that more locks will have to be added.
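One way to find the locks that still need splitting is to measure how long callers wait for each one. The sketch below is a hypothetical Python helper, not something described by Pyramid: TimedLock, stall_threshold, max_wait, and stalls are names I made up, and the threshold is whatever the designer decides counts as a stall.

```python
import threading
import time

class TimedLock:
    """Lock wrapper that records how long callers waited to acquire it."""

    def __init__(self, name, stall_threshold):
        self.name = name
        self.stall_threshold = stall_threshold  # seconds; chosen by the designer
        self._lock = threading.Lock()
        self.max_wait = 0.0
        self.stalls = 0

    def __enter__(self):
        start = time.monotonic()
        self._lock.acquire()
        waited = time.monotonic() - start
        # Statistics are updated while holding the lock, so no extra lock is needed.
        self.max_wait = max(self.max_wait, waited)
        if waited > self.stall_threshold:
            self.stalls += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False

process_table_lock = TimedLock("process_table", stall_threshold=0.05)
with process_table_lock:
    pass  # critical section goes here
```

A lock whose stalls count keeps climbing is a candidate for being split into finer locks. Tightening stall_threshold will flag more locks, which matches the trade-off described above.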
By the way, the brains behind Pyramid Technology was Rich Hammonds. He designed both the software and hardware at Pyramid. He is the finest engineer that I have ever had the pleasure of working with.