What is "false sharing"? How to reproduce / avoid it?

https://stackoverflow.com/questions/22766191

24-06-2023

Question

Today I had a disagreement with my professor in the Parallel Programming class about what "false sharing" is. What my professor said made little sense to me, so I pointed it out immediately. She thought "false sharing" causes errors in a program's results.

I said that "false sharing" happens when different memory addresses are assigned to the same cache line, so writing data to one of them causes the other to be kicked out of the cache. If the processors take turns writing to the two falsely shared addresses, neither can stay in the cache, so every operation results in a DRAM access.

That's my opinion so far. In fact, I'm not entirely sure about what I said either... If I have misunderstood something, please point it out.

So I have some questions. Assume the cache has 64-byte lines and is 4-way set-associative.

Is it possible for two addresses separated by more than 64 bytes to be affected by "false sharing"?
Is it possible for a single-threaded program to encounter a "false sharing" issue?
What's the best code example to reproduce "false sharing"?
In general, what should programmers keep in mind to avoid "false sharing"?

Solution

I'll share my point of view on your questions.

Two addresses that are separated by more bytes than the cache block size cannot reside on the same cache line. Thus, if one core has the first address in its cache and another core requests the second address, that request does not remove the first from the cache, so no false sharing miss occurs.
I can't imagine how false sharing would occur when there is no concurrency at all, since no one but the single thread competes for the cache line.
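To make the first point concrete, here is a minimal sketch of the address arithmetic, assuming the 64-byte line size from the question. The names `CACHE_LINE_SIZE`, `cache_line_index` and `same_cache_line` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size in bytes, taken from the question. */
#define CACHE_LINE_SIZE 64u

static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0u,
              "line size is assumed to be a power of two");

/* Index of the cache line that contains the given address. */
static uintptr_t cache_line_index(uintptr_t addr)
{
    return addr / CACHE_LINE_SIZE;
}

/* True when both addresses fall inside the same cache line. */
static bool same_cache_line(uintptr_t a, uintptr_t b)
{
    return cache_line_index(a) == cache_line_index(b);
}
```

Under this model, addresses 0 and 63 share a line, while 63 and 64 do not. Any two addresses that are 64 or more bytes apart always get different line indexes, so they cannot be falsely shared with each other.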

Taken from here, using OpenMP, a simple example to reproduce false sharing would be:

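#include <omp.h>

/* Fragment: N, NUM_THREADS, x and y are assumed to be defined elsewhere,
   and the parallel region is assumed to sit inside a function. */
double sum = 0.0, sum_local[NUM_THREADS];

#pragma omp parallel num_threads(NUM_THREADS)
{
    int me = omp_get_thread_num();
    sum_local[me] = 0.0;

    /* Each thread writes only its own sum_local[me], but neighboring
       elements share a cache line, so the line bounces between cores. */
    #pragma omp for
    for (int i = 0; i < N; i++)
        sum_local[me] += x[i] * y[i];

    /* Combine the per-thread partial sums into the shared total. */
    #pragma omp atomic
    sum += sum_local[me];
}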
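Why this snippet triggers false sharing can be seen from the layout of `sum_local`. The following is a small sketch under two assumptions: a 64-byte cache line (as in the question) and an array that starts on a line boundary. The helper names `doubles_per_line` and `line_of_element` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64   /* assumed line size, as in the question */

static_assert(CACHE_LINE_SIZE % sizeof(double) == 0,
              "a line is assumed to hold a whole number of doubles");

/* How many consecutive doubles fit in one cache line. */
static size_t doubles_per_line(void)
{
    return CACHE_LINE_SIZE / sizeof(double);
}

/* Cache line, counted from the start of the array, that holds element
   `index`, assuming the array begins on a line boundary. */
static size_t line_of_element(size_t index)
{
    return index * sizeof(double) / CACHE_LINE_SIZE;
}
```

With 8-byte doubles, eight per-thread accumulators land on the same line, so with up to eight threads every `sum_local[me] += ...` can contend for a single line even though no two threads touch the same element.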

Some general notes that I can think of to avoid false sharing would be:

a. Use private data as much as possible.

b. Sometimes you can use padding or alignment to make sure that no other variable resides in the same cache line as the shared data.
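Both notes can be sketched against the dot-product example. This is only an illustration under stated assumptions: a 64-byte line, a hypothetical `NUM_THREADS` of 4, and made-up names `padded_sum`, `dot_private` and `dot_padded`. The `#ifdef _OPENMP` fallback is there only so the sketch also builds (and runs serially) without OpenMP enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define CACHE_LINE_SIZE 64   /* assumed line size, as in the question */
#define NUM_THREADS 4        /* hypothetical thread count */

/* One accumulator per thread, padded so each slot fills a whole line. */
struct padded_sum {
    double value;
    char pad[CACHE_LINE_SIZE - sizeof(double)];
};
static_assert(sizeof(struct padded_sum) == CACHE_LINE_SIZE,
              "each slot must occupy exactly one cache line");

/* Note (a): accumulate in a private local, touch shared data only once. */
static double dot_private(const double *x, const double *y, int n)
{
    double sum = 0.0;
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        double local = 0.0;            /* private to this thread */
        #pragma omp for
        for (int i = 0; i < n; i++)
            local += x[i] * y[i];
        #pragma omp atomic
        sum += local;
    }
    return sum;
}

/* Note (b): keep the shared per-thread array, but pad and align each slot. */
static double dot_padded(const double *x, const double *y, int n)
{
    _Alignas(CACHE_LINE_SIZE) struct padded_sum sum_local[NUM_THREADS];
    double sum = 0.0;
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int me = omp_get_thread_num();
        sum_local[me].value = 0.0;
        #pragma omp for
        for (int i = 0; i < n; i++)
            sum_local[me].value += x[i] * y[i];
        #pragma omp atomic
        sum += sum_local[me].value;
    }
    return sum;
}
```

Both functions return the same result as the original snippet; only the memory layout of the partial sums changes. The private-local variant is usually the simpler choice, and padding is mainly useful when the per-thread slots really have to live in a shared array.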

Any correction or addition is welcome.

Licensed under: CC-BY-SA with attribution
