Question

I've read a fair amount on thread safety, and have been using GCD to keep math-heavy code off the main thread for a while now (I learned it before NSOperation, and it still seems to be the easier option). However, I wonder if I could improve the part of my code that currently uses a lock.

I have an Objective-C++ class that is a wrapper for a C++ vector. (Reasons: primitive floats are added constantly without knowing a limit beforehand; the container must be contiguous; and the reason for using a vector vs NSMutableData is "just cause" it's what I settled on, since NSMutableData would still suffer from the same "expired" pointer when it resizes itself.)

The class has instance methods to add data points that are processed and added to the vector (vector.push_back). After new data is added I need to analyze it (from a different object). That processing happens on a background thread, and it uses a pointer directly to the vector. Currently the wrapper has a getter method that first locks the instance (it suspends a local serial queue used for the writes) and then returns the pointer. For those who don't know: this is done because when the vector runs out of space, push_back causes the vector to move in memory to make room for the new entries, invalidating the pointer that was passed out. Upon completion, the math-heavy code calls unlock on the wrapper, and the wrapper resumes the queue so the pending writes can finish.

I don't see a way to pass the pointer along (for an unknown length of time) without using some type of lock, or making a local copy, which would be prohibitively expensive.

Basically: is there a better way to pass a raw pointer to a vector (or NSMutableData, for those getting hung up on the vector), such that while the pointer is in use any additions to the vector are queued, and when the consumer of the pointer is done the vector is automatically "unlocked" and the write queue is processed?

Current Implementation

Classes:

  • DataArray: a wrapper for a C++ vector
  • DataProcessor: takes the raw data and cleans it up before sending it to the DataArray
  • DataAnalyzer: takes the DataArray pointer and runs the analysis on the array
  • Worker: owns and initializes all three and coordinates their actions (it does other things as well that are beyond the scope here). It is also the delegate of the processor and analyzer

What happens:

  1. Worker is listening for new data from another class that handles external devices
  2. When it receives an NSNotification with the data packet, it passes it on to DataProcessor via -(void)checkNewData:(NSArray*)data
  3. DataProcessor, working in a background thread cleans up the data (and keeps partial data) and then tells DataArray to -(void)addRawData:(float)data (shown below)
  4. DataArray then stores that data
  5. When DataProcessor is done with the current chunk it tells Worker
  6. When Worker is notified processing is done it tells DataAnalyzer to get started on the new data by -(void)analyzeAvailableData
  7. DataAnalyzer does some prep work, including asking DataArray for the pointer by - (float*)dataPointer (shown below)
  8. DataAnalyzer does a dispatch_async to a global queue and starts the heavy lifting. It needs access to dataPointer the entire time.
  9. When done, it does a dispatch_async to the main thread to tell DataArray to unlock the array.
  10. DataArray is accessed by other objects for read-only purposes as well, but those reads are super quick.

Code snips from DataArray

-(void)addRawData:(float)data {
    //quick sanity check
    dispatch_async(addDataQueue, ^{
        rawVector.push_back(data);
    });
}

- (float*)dataPointer {
    [self lock];
    return &rawVector[0];
}

- (void)lock {
    if (!locked) {
        locked = YES;
        dispatch_suspend(addDataQueue);
    }
}

- (void)unlock {
    if (locked) {
        dispatch_resume(addDataQueue);
        locked = NO;
    }
}

Code snip from DataAnalyzer

-(void)analyzeAvailableData {
    //do some prep work

    const float *rawArray = [self.dataArray dataPointer];
    dispatch_async(global_queue, ^{
        //lots of analysis

        //done
        dispatch_async(main_queue, ^{
            //tell `Worker` analysis is done

            [self.dataArray unlock];
        });
    });
}

Solution

If you have a shared resource (your vector) which will be concurrently accessed through reads and writes from different tasks, you can associate a dedicated dispatch queue with this resource, on which these tasks will exclusively run.

That is, every access to this resource (read or write) will be executed on that dispatch queue exclusively. Let's name this queue "sync_queue".

This "sync_queue" may be a serial queue or a concurrent queue.

If it's a serial queue, it should be immediately obvious that all accesses are thread-safe.
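To make that guarantee concrete outside of GCD, here is an illustrative C++ sketch (the class and method names are invented, not from the post): one mutex serializes every read and write, playing the role a serial sync_queue plays in the dispatch version.

```cpp
#include <mutex>
#include <vector>

// Hypothetical wrapper: every access to the shared vector goes through
// one mutex, mirroring a serial sync_queue -- only one task runs at a time.
class SerialDataStore {
public:
    void addSample(float value) {
        std::lock_guard<std::mutex> guard(mutex_);  // "enqueue" the write
        samples_.push_back(value);
    }
    std::vector<float> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);  // "enqueue" the read
        return samples_;                            // copy out under the lock
    }
private:
    mutable std::mutex mutex_;
    std::vector<float> samples_;
};
```

Copying out under the lock sidesteps the invalidated-pointer problem entirely, at the cost of the copy the asker wanted to avoid; the snippets that follow show cheaper alternatives.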

If it's a concurrent queue, you can allow read accesses to happen simultaneously; that is, you simply call dispatch_async(sync_queue, block):

dispatch_async(sync_queue, ^{
    if (_shared_value == 0) {
        dispatch_async(otherQueue, block);
    }
});

If that read access "moves" the value to a call site executing in a different execution context, you should use the synchronous version:

__block int x;
dispatch_sync(sync_queue, ^{
    x = _shared_value;
});
return x;

Any write access requires exclusive access to the resource. With a concurrent queue, you accomplish this by using a barrier:

dispatch_barrier_async(sync_queue, ^{
    _shared_value = 0;
    dispatch_async(mainQueue, ^{
        NSLog(@"value %d", _shared_value);
    });
});

OTHER TIPS

It really depends on what you're doing; most of the time I drop back to the main queue (or a specifically designated queue) using dispatch_async() or dispatch_sync().

Async is obviously better, if you can do it.

It's going to depend on your specific use case but there are times when dispatch_async/dispatch_sync is multiple orders of magnitude faster than creating a lock.

The entire point of Grand Central Dispatch (and NSOperationQueue) is to take away many of the bottlenecks found in traditional threaded programming, including locks.

Regarding your comment about NSOperation being harder to use: that's true, and I don't use it very often either. But it does have useful features. For example, if you need to be able to terminate a task halfway through execution, or before it has even started executing, NSOperation is the way to go.

There is a simple way to get what you need even without locking. The idea is that you have either shared, immutable data or exclusive, mutable data. The reason you don't need a lock for shared, immutable data is that it is read-only, so no race conditions during writing can occur.

All you need to do is to switch between both depending on what you currently need:

  • When you are adding samples to your storage, you need exclusive access to the data. If you already have a "working copy" of the data, you can just extend it as you need. If you only have a reference to the shared data, you create a working copy which you then keep for later exclusive access.
  • When you want to evaluate your samples, you need read-only access to the shared data. If you already have a shared copy, you just use that. If you only have an exclusive-access working copy, you convert that to a shared one.

Both of these operations are performed on demand. Assuming C++, you could use std::shared_ptr<vector const> for the shared, immutable data and std::unique_ptr<vector> for the exclusive-access, mutable data. For the older C++ standard those would be boost::shared_ptr<..> and std::auto_ptr<..> instead. Note the use of const in the shared version, and that you can easily convert from the exclusive pointer to the shared one but not the other way around: to get a mutable vector from an immutable one, you have to copy.
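As a concrete sketch of that shared/exclusive switch, using the modern C++ types just mentioned (SampleStore and its method names are invented for illustration):

```cpp
#include <memory>
#include <vector>

using Samples = std::vector<float>;

// Illustrative store that flips between an exclusive, mutable working copy
// and a shared, immutable snapshot, copying only when mutability is regained.
class SampleStore {
public:
    // Exclusive access: extend the private working copy.
    void addSample(float value) {
        if (!working_) {
            // Only a shared snapshot exists: copy it to regain mutability.
            working_ = std::make_unique<Samples>(shared_ ? *shared_ : Samples{});
            shared_.reset();
        }
        working_->push_back(value);
    }

    // Shared access: publish the working copy as an immutable snapshot.
    // Consumers can hold this snapshot for any length of time; later
    // writes never invalidate it, so no lock is needed while reading.
    std::shared_ptr<const Samples> snapshot() {
        if (working_) {
            shared_ = std::shared_ptr<const Samples>(std::move(working_));
        }
        return shared_;
    }

private:
    std::unique_ptr<Samples> working_;       // exclusive, mutable
    std::shared_ptr<const Samples> shared_;  // shared, immutable
};
```

Note how the analyzer's long-lived "pointer" becomes a shared_ptr to a frozen snapshot: the writer pays a copy only when it mutates after publishing, and readers never block the writer.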

Note that I'm assuming that copying the sample data is possible and doesn't explode the complexity of your algorithm. If that doesn't work, your approach with the scrap space that is used while the background operations are in progress is probably the best way to go. You can automate a few things using a dedicated structure that works similarly to a smart pointer, though.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow