January 26, 2006
Locking, double-checked locking and speed
When you're doing multi-threaded programming it's often necessary to use locking to ensure that resources aren't being used in conflicting ways by different threads. There are three different approaches:
Forget about locking, go for speeeeeed!!!!
This one is potentially dangerous, even disastrous, if multiple threads attempting to perform the operation simultaneously would cause problems.
if (condition)
{
    [self performOperation];
    condition = NO;
}
Do locking, forget about speed
This one is safe but not as fast because acquiring the lock can be time-consuming and the thread may have to block waiting for the resource.
@synchronized (self)
{
    if (condition)
    {
        [self performOperation];
        condition = NO;
    }
}
Double-checked locking: the compromise
Double-checked locking is often the best compromise between speed and safety. If the condition is fairly inexpensive to test (and it is in the case of the example because we're just checking the truth of a BOOL variable) then we can economically check the condition and acquire the lock if and only if we need to. We thus avoid a potentially expensive locking operation. But due to the possibility of a race condition we must then double-check the condition as shown below.
OSMemoryBarrier(); // explained below
if (condition)
{
    @synchronized (self)
    {
        if (condition) // the double-check
        {
            [self performOperation];
            OSMemoryBarrier(); // explained below
            condition = NO;
        }
    }
}
Memory barriers
Jonathan Rentzsch pointed out to me that the @synchronized directive doesn't offer any documented guarantees about memory barriers, and without these the double-checked locking method may not be safe. See this excellent paper (PDF) for more information about possible pitfalls; it goes considerably further than the simplistic account offered in the paper (PDF) that originally brought double-checked locking to my attention.
The use of the OSMemoryBarrier() call is intended to address this risk, but there is one problem. I've done some single-stepping through the function in GDB, and although it appears to work fine on the G5 (ultimately relying on the isync and lwsync assembly instructions to implement the barrier), there are no guarantees that it will work on the Intel architecture (although I suspect it will, given Intel's stricter ordering properties).
I'm currently investigating this further, so in the absence of more concrete confirmation one way or the other, double-checked locking and the OSMemoryBarrier() call should probably be used with caution. Based on the information that I have been able to gather on the Internet so far I believe that the following directives represent the safest way to implement memory barriers on Mac OS X:
- On PowerPC: sync for read and read/write memory barriers, eieio for write memory barriers.
- On PowerPC 64: lwsync for read memory barriers, sync for read/write and write memory barriers.
- On Intel: lock; addl for read and read/write memory barriers, and nothing at all for write memory barriers.
- On Intel with SSE support (applies to any Intel machine that runs Mac OS X): lfence for read memory barriers, mfence for read/write memory barriers, and nothing at all for write memory barriers.
Here's an include file I've whipped up with some macros that implement these various combinations. It works for me, at least in my informal testing here (tested on both PowerPC and Intel CPUs).
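As a rough sketch of what such an include file might look like: the macro names below are my own invention (not Apple API and not necessarily those used in the actual file), the Intel branch follows the SSE recommendations above since SSE is present on every Intel Mac, and the fallback branch uses GCC's __sync_synchronize() builtin, which the post does not mention, purely so the sketch compiles elsewhere.

```c
/* barriers.h -- memory-barrier macros following the per-architecture
 * recommendations above; names are illustrative only. */
#if defined(__ppc64__)
    #define READ_BARRIER()       __asm__ __volatile__ ("lwsync" ::: "memory")
    #define WRITE_BARRIER()      __asm__ __volatile__ ("sync"   ::: "memory")
    #define READ_WRITE_BARRIER() __asm__ __volatile__ ("sync"   ::: "memory")
#elif defined(__ppc__)
    #define READ_BARRIER()       __asm__ __volatile__ ("sync"  ::: "memory")
    #define WRITE_BARRIER()      __asm__ __volatile__ ("eieio" ::: "memory")
    #define READ_WRITE_BARRIER() __asm__ __volatile__ ("sync"  ::: "memory")
#elif defined(__i386__) || defined(__x86_64__)
    /* SSE applies to any Intel machine that runs Mac OS X */
    #define READ_BARRIER()       __asm__ __volatile__ ("lfence" ::: "memory")
    /* "nothing at all" for write barriers; keep a compiler barrier so the
       compiler itself can't reorder stores across this point */
    #define WRITE_BARRIER()      __asm__ __volatile__ (""       ::: "memory")
    #define READ_WRITE_BARRIER() __asm__ __volatile__ ("mfence" ::: "memory")
#else
    /* fallback for architectures not covered by the post */
    #define READ_BARRIER()       __sync_synchronize()
    #define WRITE_BARRIER()      __sync_synchronize()
    #define READ_WRITE_BARRIER() __sync_synchronize()
#endif
```

The preprocessor chain picks the right instruction set at compile time, so the same header can be used in a universal binary.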
If all of this seems a bit difficult, the paper referenced above offers multiple alternatives to the double-checked locking pattern, especially as it is applied to singletons. These notes by the same author also offer some excellent insight into the problem.
Notes
You can often speed things up further by using a more granular lock. In this example I'm locking on self because it's quick and easy but you might be able to use another object for which there'll be less contention (for example, a specially created NSLock object) and which will lead to fewer (or shorter) blocking situations.
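To sketch what "more granular" means, suppose an object guards two unrelated resources. Giving each its own lock (shown here with plain pthread_mutex_t rather than NSLock, purely for illustration; the resource names are invented) means a thread touching one resource never blocks a thread touching the other:

```c
#include <assert.h>
#include <pthread.h>

/* Two unrelated resources, each with its own lock: contention on one
   never delays threads that only need the other. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t log_lock   = PTHREAD_MUTEX_INITIALIZER;
static int cache_entries = 0;
static int log_lines     = 0;

void add_cache_entry(void)
{
    pthread_mutex_lock(&cache_lock);   /* only cache users contend here */
    ++cache_entries;
    pthread_mutex_unlock(&cache_lock);
}

void append_log_line(void)
{
    pthread_mutex_lock(&log_lock);     /* loggers never wait on the cache */
    ++log_lines;
    pthread_mutex_unlock(&log_lock);
}
```

Locking on self, by contrast, is the coarse equivalent of sharing one mutex between both functions: correct, but every caller contends with every other.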
This may be self-evident to some, but I think it's nice to state it explicitly: the compiler is clever enough to handle things like the following example. You can jump out of the scope of a synchronized block in a number of ways (in this case by way of an early return) and the lock will still be released; it's not necessary to "fall off the end":
@synchronized (self)
{
    if (error)
        return;
    if (condition)
        [self performOperation];
}
The other thing to bear in mind is that the choice between normal and double-checked locking will depend on a number of factors:
- Is testing the condition an inexpensive operation compared to that of acquiring a lock? If so then double-checked locking is favourable.
- Is contention for the lock likely to be high? That's another factor which makes double-checked locking desirable, because it avoids unnecessary contention for the lock.
- How often is the condition likely to be true? If the condition tends to be true more often than not then double-checked locking is probably a bad idea, because in most situations you'll end up having to acquire the lock anyway and will have done the double-check for nothing.
So basically you'll need to think about the specific details of your situation and choose the method that best balances speed and safety accordingly.
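To make the barrier pairing concrete, here is the same double-checked pattern rendered in portable C. It uses C11 atomics (which postdate this post) and a simple spinlock standing in for @synchronized; all names are illustrative. The acquire load plays the role of the first OSMemoryBarrier() and the release store the role of the second.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool initialized = false;     /* the "condition" */
static atomic_flag lock = ATOMIC_FLAG_INIT; /* spinlock standing in for @synchronized */
static int times_initialized = 0;           /* for demonstration only */

static void perform_operation(void)         /* hypothetical one-time set-up */
{
    ++times_initialized;
}

void ensure_initialized(void)
{
    /* first check: the acquire load pairs with the release store below,
       doing the job of the explicit OSMemoryBarrier() calls */
    if (!atomic_load_explicit(&initialized, memory_order_acquire))
    {
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                               /* spin until the lock is ours */
        if (!atomic_load_explicit(&initialized, memory_order_relaxed)) /* the double-check */
        {
            perform_operation();
            atomic_store_explicit(&initialized, true, memory_order_release);
        }
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
}
```

However many threads race into ensure_initialized(), perform_operation() runs exactly once, and any thread that sees initialized as true is guaranteed to also see the operation's effects.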
Posted by wincent at January 26, 2006 05:08 PM