Easiest way to implement shared integer counter in C++11 without mutexes:

Question 1

You might want to look into atomic types (std::atomic, available in the <atomic> header since C++11). You can read and modify them from several threads without needing a lock/mutex.
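A minimal sketch of a shared counter built on std::atomic<int>. The helper name count_with_atomic and its parameters are made up for illustration; only std::atomic, std::thread and std::vector come from the standard library.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: spawns n_threads workers that each bump a shared
// counter `increments` times, then returns the final value.
int count_with_atomic(int n_threads, int increments)
{
    std::atomic<int> counter{0};   // shared by all workers, no mutex involved
    std::vector<std::thread> workers;

    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic increment
        });
    }
    for (auto& w : workers)
        w.join();                  // after join, every increment is visible here

    return counter.load();
}
```

Relaxed ordering should be enough here because the counter is only read after all threads have been joined; if other data depends on the counter value, the default (sequentially consistent) ordering is the safer choice.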

Question 2

We solved a similar problem by declaring an array[nThreads] and giving every thread an id in the range 0 to nThreads-1; each thread can then safely write to its own position in the array. Afterwards you just sum the array to get the total. However, this only helps as long as you don't need the sum before all the threads have finished.

To be even more efficient, each thread kept a local counter, which it wrote to its position in the array just before the thread died.

Example (pseudocode):

int counter[nThreads];   // one slot per thread, all starting at 0

void thread(int id)
{
    // Do stuff
    if (something happened)
        counter[id]++;   // only thread `id` ever writes this slot
}

or

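int counter[nThreads];   // one slot per thread, all starting at 0

void thread(int id)
{
    int localcounter = 0;   // private to this thread, no sharing at all
    // Do stuff
    if (something happened)
        localcounter++;

    // Thread is about to die: publish the result to this thread's slot
    counter[id] = localcounter;
}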
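The second variant could look roughly like this in real C++11. This is a sketch under assumptions: the helper name count_with_slots, the events_per_thread parameter and the loop standing in for "something happened" are invented for illustration.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: each worker counts its events locally and publishes
// the result to its own slot; the slots are summed after all joins.
long count_with_slots(int n_threads, int events_per_thread)
{
    std::vector<long> counter(n_threads, 0);   // one slot per thread id
    std::vector<std::thread> workers;

    for (int id = 0; id < n_threads; ++id) {
        workers.emplace_back([&counter, id, events_per_thread] {
            long local = 0;
            for (int i = 0; i < events_per_thread; ++i)
                ++local;                        // stands in for "something happened"
            counter[id] = local;                // single write, just before exit
        });
    }
    for (auto& w : workers)
        w.join();                               // slots are safe to read only after this

    return std::accumulate(counter.begin(), counter.end(), 0L);
}
```

No atomics are needed because no two threads ever touch the same element, and the summing thread reads the array only after join().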

Question 3

On Windows, you can use the InterlockedIncrement function.

Many functions for mutating variables atomically are documented on MSDN under Synchronization Functions; they may be useful to you.
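A rough sketch of how InterlockedIncrement might be used for a shared counter. InterlockedIncrement is a Windows-only API from <windows.h>; the names g_hits, bump, read_hits and count_with_interlocked are hypothetical, and the non-Windows branch (using std::atomic) is only an assumed fallback so that the sketch compiles elsewhere.

```cpp
#include <thread>
#include <vector>

#ifdef _WIN32
#include <windows.h>
static volatile LONG g_hits = 0;                        // shared counter
static void bump() { InterlockedIncrement(&g_hits); }   // atomic ++ on Windows
static long read_hits() { return g_hits; }
#else
#include <atomic>
static std::atomic<long> g_hits{0};                     // assumed portable fallback
static void bump() { g_hits.fetch_add(1); }
static long read_hits() { return g_hits.load(); }
#endif

// Hypothetical helper: returns how many increments the workers performed.
long count_with_interlocked(int n_threads, int increments)
{
    const long before = read_hits();
    std::vector<std::thread> workers;

    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([increments] {
            for (int i = 0; i < increments; ++i)
                bump();
        });
    }
    for (auto& w : workers)
        w.join();                                       // read the total only after all joins

    return read_hits() - before;
}
```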

Question 4

It is not just a race condition: without any synchronization, the actual value of i might never be communicated between threads at all if your compiler decides so (for example, by keeping it in a register).

Obviously, an atomic is the good way. A mutex is also a good way: when there are no collisions, mutexes are about as fast as atomics. They get slower only when the kernel actually has to fiddle with the queues of sleeping and ready threads. What can get costly is waiting for a signal without a condition variable, in which case your ready threads might have to wait for a kernel scheduler tick before they run, which can be very long (30ms).

Atomics will get you the best performance, though, and may even be easier to maintain than condition variables, since you do not have to care about spurious wakeups, notify_one versus notify_all, etc.

If you look at the shared_ptr base classes in a C++11 standard library implementation, they contain a base_count (or base_shared_count, or something like that) that works exactly the way you need. You may also check the new boost::shared_count implementation if you like.
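To give an idea of what such a reference counter looks like, here is a minimal sketch. It is not the code of any actual standard library or Boost implementation; the class RefCount and its member names are invented for illustration, and the memory orderings follow the commonly used pattern for reference counting.

```cpp
#include <atomic>

// Hypothetical illustration, not taken from any real library.
class RefCount {
public:
    RefCount() : count_(1) {}      // the creator holds the first reference

    // Taking another reference needs no ordering with other memory accesses.
    void add_ref() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller just dropped the last reference
    // and is therefore responsible for destroying the shared object.
    bool release()
    {
        return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    long use_count() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<long> count_;
};
```

fetch_sub returns the value held before the decrement, so comparing it with 1 tells exactly one thread that it released the final reference.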