Last time I presented a solution to the Thundering Herd problem, the Random Backoff, but noted a problem with that solution. It is quite possible that when a CPU releases a contended lock, the waiting CPUs will not wake up immediately. That is, it could take up to a second before a waiting CPU tries to obtain the now-free lock. This solution is thus not optimal.
As a follow-up, I’d like to present an SMP refinement that I believe solves this problem: isolating and separating all locks so that there is a single lock per cache line. Why do we do this?
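
Before getting to the why, here is a minimal sketch of what "one lock per cache line" might look like in C11. It is only an illustration under stated assumptions: the 64-byte `CACHE_LINE_SIZE` is a guess that fits many current CPUs but not all, and `padded_lock_t` and its helper functions are hypothetical names made up for this example, not part of any particular kernel or library.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Check your target CPU. */
#define CACHE_LINE_SIZE 64

/* Hypothetical lock type. Aligning the only member to the cache line
 * size also rounds sizeof(padded_lock_t) up to a full line, so two
 * adjacent locks in an array can never share a cache line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) atomic_flag locked;
} padded_lock_t;

/* Put the lock into the released state. */
static inline void padded_lock_init(padded_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static inline void padded_lock_acquire(padded_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        /* busy-wait; a real implementation would add a pause/backoff */
    }
}

/* Release the lock so that another CPU can take it. */
static inline void padded_lock_release(padded_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

With this layout, an array such as `padded_lock_t locks[4]` occupies four separate cache lines, so CPUs spinning on `locks[0]` do not touch the line that holds `locks[1]`. The cost is memory: each lock takes a whole line instead of a single byte or word.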