I'll share my point of view on your questions.
Two addresses separated by more bytes than the cache line's size cannot reside on the same cache line. So if one core holds the first address in its cache and another core writes to the second, the first core's line is not invalidated by that write, and no false-sharing miss occurs.
False sharing cannot occur without concurrency: with a single thread there is no other core competing for the cache line, so nothing invalidates it.
Taken from here, a simple OpenMP example that reproduces false sharing would be:
```c
double sum = 0.0, sum_local[NUM_THREADS];
#pragma omp parallel num_threads(NUM_THREADS)
{
    int me = omp_get_thread_num();
    int i;
    sum_local[me] = 0.0;

    /* Adjacent sum_local[] elements sit on the same cache line, so
       each thread's writes invalidate the line for the other threads. */
    #pragma omp for
    for (i = 0; i < N; i++)
        sum_local[me] += x[i] * y[i];

    #pragma omp atomic
    sum += sum_local[me];
}
```
Some general notes that I can think of to avoid false sharing would be:
a. Use private data as much as possible.
b. Sometimes you can use padding to align the data, so that no other variable resides on the same cache line as the shared data.
Any correction or addition is welcome.