Question

I understand that DSB, DMB, and ISB are barriers that prevent the reordering of instructions. I can also find lots of very good explanations for each of them, but it is pretty hard to imagine the cases where I would have to use them.

Also, in open source code, I see those barriers from time to time, but it is quite hard to understand why they are used. Just as an example, in the Linux kernel 3.7 tcp_rcv_synsent_state_process function, there are lines like the following:

    if (unlikely(po->origdev))
            sll->sll_ifindex = orig_dev->ifindex;
    else
            sll->sll_ifindex = dev->ifindex;

    smp_mb();

    if (po->tp_version <= TPACKET_V2)
            __packet_set_status(po, h.raw, status);

where smp_mb() is basically DMB. Could you give me some real-life examples? It would help me understand more about barriers.

Solution

Sorry, not going to give you a straight-out example like you're asking, because as you are already looking through the Linux source code, you have plenty of those to go around, and they don't appear to help. No shame in that - every sane person is at least initially confused by memory access ordering issues :)

If you are mainly an application developer, then there is every chance you won't need to worry too much about it - whatever concurrency frameworks you use will resolve it for you.

If you are mainly a device driver developer, then examples are fairly straightforward to find - whenever there is a dependency in your code on a previous access having had an effect (cleared an interrupt source, written a DMA descriptor) before some other access is performed (re-enabling interrupts, initiating the DMA transaction). A sketch of that pattern follows.

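As a hedged sketch of the interrupt variant of that dependency (all register names, offsets and the dsb_sy() helper below are invented for this example; a real driver would use its kernel's MMIO accessors and barrier macros instead):

    /* A hypothetical device with a write-1-to-clear interrupt status register
     * and an interrupt enable register; offsets, bits and names are made up
     * for this sketch. */
    #define IRQ_STATUS  0x00
    #define IRQ_ENABLE  0x04
    #define IRQ_RX_BIT  (1u << 0)

    /* Data Synchronization Barrier via GCC/Clang inline assembly
     * (assumption: building for ARMv7-A or later). */
    static inline void dsb_sy(void)
    {
        __asm__ volatile("dsb sy" ::: "memory");
    }

    static void handle_rx_irq(volatile unsigned int *regs)
    {
        /* 1. Clear the interrupt source at the device. */
        regs[IRQ_STATUS / 4] = IRQ_RX_BIT;

        /* 2. Wait for that write to complete, so the stale pending state
         *    cannot immediately re-raise the interrupt we are about to
         *    unmask. */
        dsb_sy();

        /* 3. Re-enable the interrupt. */
        regs[IRQ_ENABLE / 4] |= IRQ_RX_BIT;
    }
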
If you are in the process of developing a concurrency framework (or debugging one), you probably need to read up on the topic a bit more - but your question suggests a superficial curiosity rather than an immediate need? If you are developing your own method for passing data between threads, not based on primitives provided by a concurrency framework, that is for all intents and purposes a concurrency framework.

Paul McKenney wrote an excellent paper on the need for memory barriers, and what effects they actually have in the processor: Memory Barriers: a Hardware View for Software Hackers

If that's a bit too hardcore, I wrote a 3-part blog series that's a bit more lightweight, and finishes off with an ARM-specific view. First part is Memory access ordering - an introduction.

But if it is specifically lists of examples you are after, especially for the ARM architecture, you could do a lot worse than Barrier Litmus Tests and Cookbook.

The extra-extra light programmer's view and not entirely architecturally correct version is (a rough code sketch follows the list):

  • DMB - whenever a memory access requires ordering with regards to another memory access.
  • DSB - whenever a memory access needs to have completed before program execution progresses.
  • ISB - whenever instruction fetches need to explicitly take place after a certain point in the program, for example after memory map updates or after writing code to be executed. (In practice, this means "throw away any prefetched instructions at this point".)

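To make that list a little more concrete, here is a rough sketch of how those instructions are typically reached from C on ARM, via inline assembly (the wrapper and variable names are invented; dmb ish, dsb sy and isb are the actual ARM instructions, and this only builds for an ARM target):

    /* Thin wrappers around the ARM barrier instructions, written as GCC/Clang
     * inline assembly (assumption: ARMv7-A or later; names are made up). */
    static inline void my_dmb(void) { __asm__ volatile("dmb ish" ::: "memory"); }
    static inline void my_dsb(void) { __asm__ volatile("dsb sy"  ::: "memory"); }
    static inline void my_isb(void) { __asm__ volatile("isb"     ::: "memory"); }

    /* DMB: order one memory access against another, e.g. publish data before
     * setting a flag that another observer polls. */
    static int shared_data;
    static volatile int shared_flag;

    static void publish(int value)
    {
        shared_data = value;
        my_dmb();               /* data must be observable before the flag */
        shared_flag = 1;
    }

    /* DSB + ISB: e.g. after writing instructions to memory (JIT, code patching;
     * the required cache maintenance is omitted here), wait for the writes to
     * complete, then discard anything the pipeline has already prefetched. */
    static void finish_code_patch(void)
    {
        my_dsb();               /* the new instructions have been written out   */
        my_isb();               /* refetch instructions from the updated memory */
    }
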
OTHER TIPS

Usually you need to use a memory barrier in cases where you have to make SURE that memory accesses occur in a specific order. This might be required for a number of reasons; usually it's required when two or more processes/threads, or a hardware component, access the same memory structure, which has to be kept consistent.

This is used very often in DMA transfers. A simple DMA control structure might look like this:

    struct dma_control {
        u32 owner;      /* who currently owns the structure: CPU or hardware */
        void *data;     /* pointer to the buffer to transfer                 */
        u32 len;        /* length of the buffer                              */
    };

The owner will usually be set to something like OWNER_CPU or OWNER_HARDWARE, to indicate which of the two participants is allowed to work with the structure.

Code which changes this will usually look like this:

    dma->data = data;               /* fill in the descriptor...               */
    dma->len  = length;
    smp_mb();                       /* ...make sure those writes are visible... */
    dma->owner = OWNER_HARDWARE;    /* ...before handing it to the hardware     */

So, data and len are always set before the ownership gets transferred to the DMA hardware. Otherwise the engine might see stale data, like a pointer or length which was not updated, because the CPU reordered the memory accesses.

The same goes for processes or threads running on different cores. They could communicate in a similar manner.

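As a hedged illustration of that, here is a minimal user-space producer/consumer sketch using C11 atomics and pthreads (all names are invented for this example). The release store and acquire load play the role of the barriers above; on ARM a compiler typically implements them with DMB-based sequences or with load-acquire/store-release instructions:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int payload;            /* ordinary shared data          */
    static atomic_int ready;       /* "the data is published" flag  */

    static void *producer(void *arg)
    {
        (void)arg;
        payload = 42;                                            /* write data */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* publish it */
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                  /* spin until published   */
        printf("payload = %d\n", payload);     /* guaranteed to print 42 */
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Built with something like cc -std=c11 -pthread, the consumer is guaranteed to see 42; weaken the release/acquire ordering to memory_order_relaxed and that guarantee disappears.
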
One simple example of a barrier requirement is a spinlock. If you implement a spinlock using compare-and-swap (or LDREX/STREX on ARM) and without a barrier, the processor is allowed to speculatively load values from memory and lazily store computed values to memory, and neither of those is required to happen in the order of the loads/stores in the instruction stream.

The DMB in particular prevents memory access reordering around the DMB. Without a DMB, the processor could reorder a store to memory protected by the spinlock so that it happens after the spinlock is released. Or the processor could read memory protected by the spinlock before the spinlock was actually locked, or while it was locked by a different context.

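For a concrete, hedged sketch of that, here is a minimal test-and-set spinlock written with C11 atomics (names invented for illustration). On ARM, atomic_flag_test_and_set_explicit typically compiles to an LDREX/STREX loop, and the acquire/release orderings supply the barriers that keep accesses to the protected data from leaking out of the critical section:

    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;
    static int protected_counter;       /* data guarded by the lock */

    static void spin_lock(void)
    {
        /* Acquire: later loads/stores cannot move before taking the lock. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                           /* spin until the lock is free */
    }

    static void spin_unlock(void)
    {
        /* Release: earlier loads/stores cannot move after dropping the lock. */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }

    static void increment(void)
    {
        spin_lock();
        protected_counter++;            /* stays inside the critical section */
        spin_unlock();
    }
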
unixsmurf already pointed it out, but I'll also point you toward Barrier Litmus Tests and Cookbook. It has some pretty good examples of where and why you should use barriers.

Licensed under: CC-BY-SA with attribution