Question

I have come across some C++03 code that takes this form:

struct Foo {
    int a;
    int b;
    CRITICAL_SECTION cs;
};

// DoFoo::Foo foo_;

void DoFoo::Foolish()
{
    if( foo_.a == 4 )
    {
        PerformSomeTask();

        EnterCriticalSection(&foo_.cs);
        foo_.b = 7;
        LeaveCriticalSection(&foo_.cs);
    }
}

Does the read from foo_.a need to be protected? e.g.:

void DoFoo::Foolish()
{
    EnterCriticalSection(&foo_.cs);
    int a = foo_.a;
    LeaveCriticalSection(&foo_.cs);

    if( a == 4 )
    {
        PerformSomeTask();

        EnterCriticalSection(&foo_.cs);
        foo_.b = 7;
        LeaveCriticalSection(&foo_.cs);
    }
}

If so, why?

Please assume the integers are 32-bit aligned. The platform is ARM.

Solution

Technically yes; in practice, no on many platforms. First, let us assume that int is 32 bits (which is pretty common, but not universal).

It is possible for the two halves (16-bit words) of a 32-bit int to be read or written separately. On some systems, they will be accessed separately if the int isn't aligned properly.

Imagine a system where you can only do 32-bit-aligned 32-bit reads and writes (and 16-bit-aligned 16-bit reads and writes), and an int that straddles such a boundary. Initially the int is zero (i.e., 0x00000000).

One thread writes 0xBAADF00D to the int, the other reads it "at the same time".

The writing thread first writes 0xBAAD to the high word of the int. The reader thread then reads the entire int (both high and low words), getting 0xBAAD0000 -- a state the int was never put into on purpose!

The writer thread then writes the low word 0xF00D.
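
To make the sequence concrete, here is a minimal single-threaded sketch that simulates the two half-writes. It only illustrates the bus-level interleaving described above (it is not real threaded code) and assumes a little-endian layout:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    uint32_t shared = 0x00000000;

    // Writer's first bus cycle: store 0xBAAD into the high 16 bits.
    uint16_t high = 0xBAAD;
    std::memcpy(reinterpret_cast<char*>(&shared) + 2, &high, sizeof high);

    // A reader scheduled at this instant sees 0xBAAD0000,
    // a value that was never written on purpose.
    std::printf("torn read: 0x%08X\n", (unsigned)shared);

    // Writer's second bus cycle: store 0xF00D into the low 16 bits.
    uint16_t low = 0xF00D;
    std::memcpy(&shared, &low, sizeof low);

    std::printf("final:     0x%08X\n", (unsigned)shared);
}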

As noted, on some platforms all 32 bit reads/writes are atomic, so this isn't a concern. There are other concerns, however.

Most lock/unlock code includes instructions to the compiler that prevent reordering across the lock. Without that prevention of reordering, the compiler is free to reorder operations as long as the result behaves "as-if" it had run that way in a single-threaded context. So if your code reads a and then b, the compiler may read b before a, as long as it sees no in-thread opportunity for b to be modified in that interval.
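
As an illustration, consider this hypothetical function (the names mirror the question's struct). Because the compiler sees no in-thread writes between the two loads, it may emit them in either order:

int Consume(const Foo& foo)
{
    int a = foo.a;  // source order says read a first...
    int b = foo.b;  // ...but the load of b may be emitted before the load
                    // of a, since no single-threaded observer can tell
    return a + b;
}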

So it is possible that the code you are reading uses these locks to make sure the read of the variable happens in the order written in the code.

Other issues were raised in the comments, but I don't feel competent to address them: caching and visibility.

OTHER TIPS

Looking at this, it seems that ARM has quite a relaxed memory model, so you need some form of memory barrier to ensure that writes made in one thread become visible in another thread when you'd expect them to. So either what you are doing (taking the lock) or using std::atomic seems necessary on your platform. If you don't take this into account, different threads can see updates out of order, which would break your example.

I think you can use C++11 to ensure that integer reads are atomic, using (for example) std::atomic<int>.
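
A minimal sketch of that approach, assuming a C++11 compiler is available (the names mirror the question's code, and initialization of the critical section is elided, as in the question):

#include <windows.h>
#include <atomic>

void PerformSomeTask();  // as in the question

struct Foo {
    std::atomic<int> a;   // loads and stores of a are now atomic
    int b;
    CRITICAL_SECTION cs;  // still guards the non-atomic member b
};

Foo foo_;

void Foolish()
{
    if (foo_.a.load() == 4)  // default seq_cst ordering: no tearing,
    {                        // no reordering, cross-thread visibility
        PerformSomeTask();

        EnterCriticalSection(&foo_.cs);
        foo_.b = 7;
        LeaveCriticalSection(&foo_.cs);
    }
}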

The C++ standard says that there's a data race if one thread writes to a variable at the same time as another thread reads from that variable, or if two threads write to the same variable at the same time. It further says that a data race produces undefined behavior. So, formally, you must synchronize those reads and writes.

There are three separate issues when one thread reads data that was written by another thread.

First, there is tearing: if writing requires more than a single bus cycle, it's possible for a thread switch to occur in the middle of the operation, and another thread could see a half-written value; there's an analogous problem if a read requires more than a single bus cycle.

Second, there's visibility: each processor has its own local copy of the data it has been working on recently, and writing to one processor's cache does not necessarily update another processor's cache.

Third, there are compiler optimizations that reorder reads and writes in ways that would be okay within a single thread, but will break multi-threaded code.

Thread-safe code has to deal with all three problems. That's the job of synchronization primitives: mutexes, condition variables, and atomics.
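
As a sketch of how those primitives map onto the question's pattern, here is the same code using C++11's std::mutex in place of the Windows CRITICAL_SECTION (PerformSomeTask is assumed to be declared elsewhere):

#include <mutex>

void PerformSomeTask();  // assumed declared elsewhere

struct Foo {
    int a;
    int b;
    std::mutex m;  // plays the role of the CRITICAL_SECTION
};

Foo foo_;

void Foolish()
{
    int a;
    {
        std::lock_guard<std::mutex> lock(foo_.m);
        a = foo_.a;  // read under the lock: no tearing, ordered, visible
    }
    if (a == 4)
    {
        PerformSomeTask();

        std::lock_guard<std::mutex> lock(foo_.m);
        foo_.b = 7;  // write under the same lock
    }
}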

Although the integer read/write operation will indeed most likely be atomic, compiler optimizations and the processor cache will still give you problems if you don't handle them properly.

To explain: the compiler will normally assume that the code is single-threaded and make many optimizations that rely on that assumption. For example, it might change the order of instructions. Or, if it sees that a variable is only written and never read, it might optimize it away entirely.
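
A classic sketch of that single-threaded assumption in action (hypothetical flag, for illustration only):

bool done = false;  // set by another thread, but the compiler cannot know that

void WaitForDone()
{
    // Seeing no write to done inside this function, the compiler may load
    // it once and hoist the test out of the loop, turning the wait into
    // an infinite loop.
    while (!done)
    {
        // spin
    }
}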

The CPU will also cache that integer, so if one thread writes it, the other might not see the new value until much later.

There are two things you can do. One is to wrap the access in a critical section, as in your original code. The other is to mark the variable as volatile. That will signal the compiler that the variable is accessed by multiple threads and will disable a range of optimizations, as well as placing special cache-sync instructions (aka "memory barriers") around accesses to the variable (or so I understand). Apparently this is wrong: standard C++ volatile only restricts compiler optimizations; it does not emit memory barriers or guarantee atomicity, so it is not a substitute for real synchronization.

Added: Also, as noted by another answer, Windows has the Interlocked APIs, which can be used to avoid these issues as an alternative to volatile.
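
For example (a sketch; the plain Interlocked functions also act as full memory barriers, and calling InterlockedCompareExchange with equal exchange and comparand values is a common idiom for an atomic read):

#include <windows.h>

volatile LONG flag = 0;

void Writer()
{
    InterlockedExchange(&flag, 4);  // atomic store plus full barrier
}

bool Reader()
{
    // Compares flag with 0 and, if equal, writes 0 back (a no-op);
    // either way it atomically returns the value flag held.
    return InterlockedCompareExchange(&flag, 0, 0) == 4;
}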

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow