how to declare and use "one writer, many readers, one process, simple type" variable?

Question 1

I have simple type variable (like int). I have one process, one writer thread, several "readonly" threads. How should I declare variable?

volatile int std::atomic int

Use std::atomic with memory_order_relaxed for the store and load

It's quick, and from your description of your problem, safe. E.g.

void func_fast()
{
    std::atomic<int> a; 
    a.store(1, std::memory_order_relaxed);
}

Compiles to:

func_fast():
    movl    $1, -24(%rsp)
    ret

This assumes you don't need to guarantee that any other data is seen to be written before the integer is updated, and therefore the slower and more complicated synchronisation is unnecessary.

If you use the atomic naively like this:

void func_slow()
{
    std::atomic<int> b;
    b = 1; 
}

You get an MFENCE instruction with no memory_order* specification which is massive slower (100 cycles more more vs just 1 or 2 for the bare MOV).

func_slow():
    movl    $1, -24(%rsp)
    mfence
    ret

See http://goo.gl/svPpUa

(Interestingly on Intel the use of memory_order_release and _acquire for this code results in the same assembly language. Intel guarantees that writes and reads happen in order when using the standard MOV instruction).

Question 2

Just use std::atomic.

Don't use volatile, and don't use it as it is; that doesn't give the necessary synchronisation. Modifying it in one thread and accessing it from another without synchronisation will give undefined behaviour.

Question 3

If you have unsynchronized access to a variable where you have one or more writers then your program has undefined behavior. Some how you have to guarantee that while a write is happening no other write or read can happen. This is called synchronization. How you achieve this synchronization depends on the application.

For something like this where we have one writer and and several readers and are using a TriviallyCopyable datatype then a std::atomic<> will work. The atomic variable will make sure under the hood that only one thread can access the variable at the same time.

If you do not have a TriviallyCopyable type or you do not want to use a std::atomic You could also use a conventional std::mutex and a std::lock_guard to control access

{ // enter locking scope
    std::lock_guard lock(mutx); // create lock guard which locks the mutex
    some_variable = some_value; // do work
} // end scope lock is destroyed and mutx is released

An important thing to keep in mind with this approach is that you want to keep the // do work section as short as possible as while the mutex is locked no other thread can enter that section.

Another option would be to use a std::shared_timed_mutex(C++14) or std::shared_mutex(C++17) which will allow multiple readers to share the mutex but when you need to write you can still look the mutex and write the data.

You do not want to use volatile to control synchronization as jalf states in this answer:

For thread-safe accesses to shared data, we need a guarantee that:

the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)

that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes. All volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for: manipulating I/O registers or memory-mapped hardware, but it doesn't help us in multithreaded code where the volatile object is often only used to synchronize access to non-volatile data. Those accesses can still be reordered relative to the volatile ones.

As always if you measure the performance and the performance is lacking then you can try a different solution but make sure to remeasure and compare after changing.

Lastly Herb Sutter has an excellent presentation he did at C++ and Beyond 2012 called Atomic Weapons that:

This is a two-part talk that covers the C++ memory model, how locks and atomics and fences interact and map to hardware, and more. Even though we’re talking about C++, much of this is also applicable to Java and .NET which have similar memory models, but not all the features of C++ (such as relaxed atomics).

Question 4

I'll complete a little bit the previous answers.

As exposed previously, just using int or eventually volatile int is not enough for various reason (even with the memory order constraint of Intel processors.)

So, yes, you should use atomic types for that, but you need extra considerations: atomic types guarantee coherent access but if you have visibility concerns you need to specify memory barrier (memory order.)

Barriers will enforce visibility and coherency between threads, on Intel and most modern architectures, it will enforce cache synchronizations so updates are visible for every cores. The problem is that it may be expensive if you're not careful enough.

Possible memory order are:

relaxed: no special barrier, only coherent read/write are enforce;
sequential consistency: strongest possible constraint (the default);
acquire: enforce that no loads after the current one are reordered before and add the required barrier to ensure that released stores are visible;
consume: a simplified version of acquire that mostly only constraint reordering;
release: enforce that all stores before are complete before the current one and that memory writes are done and visible to loads performing an acquire barrier.

So, if you want to be sure that updates to the variable are visible to readers, you need to flag your store with a (at least) a release memory order and, on the reader side you need an acquire memory order (again, at least.) Otherwise, readers may not see the actual version of the integer (it'll see a coherent version at least, that is the old or the new one, but not an ugly mix of the two.)

Of course, the default behavior (full consistency) will also give you the correct behavior, but at the expense of a lot of synchronization. In short, each time you add a barrier it forces cache synchronization which is almost as expensive as several cache misses (and thus reads/writes in main memory.)

So, in short you should declare your int as atomic and use the following code for store and load:

// Your variable
std::atomic<int> v;

// Read
x = v.load(std::memory_order_acquire);

// Write
v.store(x, std::memory_order_release);

And just to complete, sometimes (and more often that you think) you don't really need the sequential consistency (even the partial release/acquire consistency) since visibility of updates are pretty relative. When dealing with concurrent operations, updates take place not when write is performed but when others see the change, reading the old value is probably not a problem !

I strongly recommend reading articles related to relativistic programming and RCU, here are some interesting links:

Relativistic Programming wiki: http://wiki.cs.pdx.edu/rp/
Structured Deferral: Synchronization via Procrastination: https://queue.acm.org/detail.cfm?id=2488549
Introduction to RCU Concepts: http://www.rdrop.com/~paulmck/RCU/RCU.LinuxCon.2013.10.22a.pdf

Question 5

Let's start from int at int. In general, when used on single processor, single core machine this should be sufficient, assuming int size same or smaller than CPU word (like 32bit int on 32bit CPU). In this case, assuming correctly aligned address word addresses (high level language should assure this by default) the write/read operations should be atomic. This is guaranteed by Intel as stated in [1] . However, in C++ specification simultaneous reading and writing from different threads is undefined behaviour.

$1.10

6 Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

Now volatile. This keyword disables almost every optimization. This is the reason why it was used. For example, sometimes when optimizing the compiler can come to idea, that variable you only read in one thread is constant there and simply replace it with it's initial value. This solves such problems. However, it does not make access to variable atomic. Also, in most cases, it is simply unnecessary, because use of proper multithreading tools, like mutex or memory barrier, will achieve same effect as volatile on it's own, as described for instance in [2]

While this may be sufficient for most uses, there are other operations that are not guaranteed to be atomic. Like incrementation is a one. This is when std::atomic comes in. It has those operations defined, like here for mentioned incrementations in [3]. It is also well defined when reading and writing from different threads [4].

In addition, as stated in answers in [5], there exists a lot of other factors that may influence (negatively) atomicity of operations. From loosing cache coherency between multiple cores to some hardware details are the factors that may change how operations are performed.

To summarize, std::atomic is created to support accesses from different threads and it is highly recommended to use it when multithreading.

[1] http://www.intel.com/Assets/PDF/manual/253668.pdf see section 8.1.1.

[2] https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt

[3] http://en.cppreference.com/w/cpp/atomic/atomic/operator_arith

[4] http://en.cppreference.com/w/cpp/atomic/atomic

[5] Are C++ Reads and Writes of an int Atomic?

Question 6

The other answers, which say to use atomic and not volatile, are correct when portability matters. If you’re asking this question, and it’s a good question, that’s the practical answer for you, not, “But, if the standard library doesn’t provide one, you can implement a lock-free, wait-free data structure yourself!” Nevertheless, if the standard library doesn’t provide one, you can implement a lock-free data structure yourself that works on a particular compiler and a particular architecture, provided that there’s only one writer. (Also, somebody has to implement those atomic primitives in the standard library.) If I’m wrong about this, I’m sure someone will kindly inform me.

If you absolutely need an algorithm guaranteed to be lock-free on all platforms, you might be able to build one with atomic_flag. If even that doesn’t suffice, and you need to roll your own data structure, you can do that.

Since there’s only one writer thread, your CPU might guarantee that certain operations on your data will still work atomically even if you just use normal accesses instead of locks or even compare-and-swaps. This is not safe according to the language standard, because C++ has to work on architectures where it isn’t, but it can be safe, for example, on an x86 CPU if you guarantee that the variable you’re updating fits into a single cache line that it doesn’t share with anything else, and you might be able to ensure this with nonstandard extensions such as __attribute__ (( aligned (x) )).

Similarly, your compiler might provide some guarantees: g++ in particular makes guarantees about how the compiler will not assume that the memory referenced by a volatile* hasn’t changed unless the current thread could have changed it. It will actually re-read the variable from memory each time you dereference it. That is in no way sufficient to ensure thread-safety, but it can be handy if another thread is updating the variable.

A real-world example might be: the writer thread maintains some kind of pointer (on its own cache line) which points to a consistent view of the data structure that will remain valid through all future updates. It updates its data with the RCU pattern, making sure to use a release operation (implemented in an architecture-specific way) after updating its copy of the data and before making the pointer to that data globally visible, so that any other thread that sees the updated pointer is guaranteed to see the updated data as well. The reader then makes a local copy (not volatile) of the current value of the pointer, getting a view of the data which will stay valid even after the writer thread updates again, and works with that. You want to use volatile on the single variable that notifies the readers of the updates, so they can see those updates even if the compiler “knows” your thread couldn’t have changed it. In this framework, the shared data just needs to be constant, and readers will use the RCU pattern. That’s one of the two ways I’ve seen volatile be useful in the real world (the other being when you don’t want to optimize out your timing loop).

There also needs to be some way, in this scheme, for the program to know when nobody’s using an old view of the data structure any longer. If that’s a count of readers, that count needs to be atomically modified in a single operation at the same time as the pointer is read (so getting the current view of the data structure involves an atomic CAS). Or this might be a periodic tick when all the threads are guaranteed to be done with the data they’re working with now. It might be a generational data structure where the writer rotates through pre-allocated buffers.

Also observe that a lot of things your program might do could implicitly serialize the threads: those atomic hardware instructions lock the processor bus and force other CPUs to wait, those memory fences could stall your threads, or your threads might be waiting in line to allocate memory from the heap.

Question 7

Unfortunately it depends.

When a variable is read and written in multiple threads, there may be 2 failures.

1) tearing. Where half the data is pre-change and half the data is post change.

2) stale data. Where the data read has some older value.

int, volatile int and std:atomic all don't tear.

Stale data is a different issue. However, all values have existed, can be concieved as correct.

volatile. This tells the compiler neither to cache the data, nor to re-order operations around it. This improves the coherence between threads by ensuring all operations in a thread are either before the variable, at the variable, or after.

This means that

volatile int x;
int y;
y =5;
x = 7;

the instruction for x = 7 will be written after y = 5;

Unfortunately, the CPU is also capable of re-ordering operations. This can mean that another thread sees x ==7 before y =5

std::atomic x; would allow a guarantee that after seeing x==7, another thread would see y ==5. (Assuming other threads are not modifying y)

So all reads of int, volatile int, std::atomic<int> would show previous valid values of x. Using volatile and atomic increase the ordering of values.

See kernel.org barriers

Question 8

Here is my attempt at bounty: - a. General answer already given above says 'use atomics'. This is correct answer. volatile is not enough. -a. If you dislike the answer, and you are on Intel, and you have properly aligned int, and you love unportable solutions, you can do away with simple volatile, using Intel strong memory ordering gurantees.

Question 9

TL;DR: Use std::atomic<int> with a mutex around it if you read multiple times.

Depends on how strong guarantees you want.

First volatile is a compiler hint and you shouldn't count on it doing something helpful.

If you use int you can suffer for memory aliasing. Say you have something like

struct {
  int x;
  bool q;
}

Depending on how this is aligned in memory and the exact implementation of CPU and memory bus it's possible that writing to q will actually overwrite x when the page is copied from the cpu cache back to ram. So unless you know how much to allocate around your int it's not guaranteed that your writer will be able to write without being overwritten by some other thread. Also even if you write you depend on the processor for reloading the data to the cache of other cores so there's no guarantee that your other thread will see a new value.

std::atomic<int> basically guarantees that you will always allocate sufficient memory, properly aligned so that you don't suffer from aliasing. Depending on the memory order requested you will also disable a bunch of optimizations, like caching, so everything will run slightly slower.

This still doesn't grantee that if your read the var multiple times you'll get the value. The only way to do that is to put a mutex around it to block the writer from changing it.

Still better find a library that already solves the problem you have and it has already been tested by others to make sure it works well.