How can a read memory barrier work in the presence of interrupts?

https://stackoverflow.com/questions/12529400

03-07-2021

Question

There's something I don't understand about memory barrier use and I'm hoping for clarification.

So, imagine we have a Treiber stack, but we're using SMR, so there's no counter associated with each pointer - we have to get the pointers right in our atomic operations (this isn't about ABA - we're using SMR, which deals with ABA, so it's not part of the question).

Now, let's assume we're working on Intel (x86/x64), so every CAS comes with a full memory barrier. I think what happens when CASing is this: the cache line is locked; the read barrier is issued, which clears the invalidate queue; the cache line is therefore loaded with its latest version; the compare occurs; the write barrier is issued, flushing the store buffer; and finally we release the cache line lock.

So, we have the following code for pop:

BARRIER_PROCESSOR_READ;

original_top = stack_state->top;

do
{
  if( original_top == NULL )
    return( 0 );

  copy_of_original_top = original_top;

  original_top = compare_and_swap( &stack_state->top, original_top->next, original_top );
}
while( copy_of_original_top != original_top );

*user_data = original_top->user_data;

So, we first issue a read barrier - this ensures we flush our invalidate queue. But there is then a gap between doing so and reading stack_state->top. Between clearing the invalidate queue and reading stack_state->top, anything can happen. The core can service an interrupt, have bus contention and be really slow, you name it - the invalidated cache lines can be reloaded (and be re-invalidated by another processor). Basically - the invalidate queue can refill. That means we cannot actually trust the value of original_top; we could be reading a local cache line which is actually wrong (we've just not invalidated it yet) and by doing so, falsely think its value is NULL and return 0.

So basically, I don't see how read barriers help, because anything can still happen after the barrier but before the actual read you wish to perform.

What am I missing here?

Solution

I'm not completely sure I understand your question, but, after issuing the read barrier, any subsequent reads will definitely be ordered after reads which occurred before the barrier. Depending on exactly how BARRIER_PROCESSOR_READ is defined, it may also force subsequent reads to pull data from shared memory rather than a processor-specific cache line, with the effect that writes performed on other processors will be visible (assuming those writes are followed by an appropriate write barrier!).

These things remain true in the presence of interrupts. Even if cache lines get filled from within the interrupt handler just after the read barrier, then reading from those cache lines will still give you a value that is valid with respect to the semantics of the read barrier.

I suspect that in the code sample you provided, the purpose of the read barrier is actually to make writes by other processors visible, to ensure that the next line - original_top = stack_state->top; - retrieves a fresh value rather than a value which has been cached locally after a read which occurred prior to the barrier. If an interrupt handler reads the same address, this constraint will still be true. The value read won't be "as fresh", but it will at least not be a value that has been cached for an unbounded time.
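For comparison, here is a rough C11 sketch of the same pop loop. The `struct node` and `struct stack_state` layouts, `stack_push`, and the chosen memory orders are assumptions made for illustration; the question shows none of them, and memory reclamation (the SMR part) is left out entirely. What it is meant to show is that the initial read does not need to be perfectly fresh: a stale non-NULL value makes the CAS fail and the loop retry with the current top.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types; the question does not show its own definitions. */
struct node {
    struct node *next;
    void *user_data;
};

struct stack_state {
    _Atomic(struct node *) top;
};

/* Push, included only so the sketch can be exercised. */
static void stack_push(struct stack_state *s, struct node *n)
{
    assert(s != NULL && n != NULL);
    struct node *old_top = atomic_load_explicit(&s->top, memory_order_relaxed);
    do {
        n->next = old_top;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->top, &old_top, n,
                 memory_order_release, memory_order_relaxed));
}

/* Returns 1 and stores the payload in *user_data, or 0 if the stack
   looked empty. Safe memory reclamation is assumed to exist elsewhere. */
static int stack_pop(struct stack_state *s, void **user_data)
{
    assert(s != NULL && user_data != NULL);

    /* An acquire load stands in for the barrier-plus-read pair in the
       question (an assumption about intent, not an exact equivalent). */
    struct node *original_top =
        atomic_load_explicit(&s->top, memory_order_acquire);

    do {
        if (original_top == NULL)
            return 0;
        /* On failure the CAS writes the current top into original_top,
           so a stale value costs a retry rather than a wrong result. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->top, &original_top, original_top->next,
                 memory_order_acquire, memory_order_acquire));

    *user_data = original_top->user_data;
    return 1;
}
```

The one case the CAS does not re-check is the NULL read. In this sketch that simply reports a stack that was empty when the value read was current, and a caller that cares can call `stack_pop` again.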

Other tips

I don't fully understand your question, but I strongly suspect that you are missing a detail.

Memory fencing is used to guarantee visibility of changes, not to synchronize processes. Fencing alone does not lock access to the data.

On the other hand, both an atomic operation and a lock (like a mutex, critical section, semaphore or any other synchronization primitive) will guarantee that only one thread accesses a given area of memory (assuming all accesses are coded to happen atomically or while you 'own' such a lock). But exclusive access by itself does not guarantee ordered visibility.

You need both exclusive access and fencing if you want both (note: fencing is generally already implemented as part of high-level synchronization primitives like mutexes, so if you use those you do not have to worry about fencing explicitly).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow