Do I need a semaphore when reading from a global structure?

https://stackoverflow.com/questions/265708

06-07-2019

Question

A fairly basic question, but I don't see it asked anywhere.

Let's say we have a global struct (in C) like so:

struct foo {
  int written_frequently1;
  int read_only;
  int written_frequently2;
};

It seems clear to me that if we have lots of threads reading and writing, we need a semaphore (or other lock) on the written_frequently members, even for reading, since we can't be 100% sure that assignments to this struct will be atomic.

If we want lots of threads to read the read_only member, and none to write it, do we need a semaphore on the struct access just for reading?

(I'm inclined to say no, because the fact that the locations immediately before and after are constantly changed shouldn't affect the read_only member, and multiple threads reading the value shouldn't interfere with each other. But I'm not sure.)

[Edit: I realize now I should have asked this question much better, in order to clarify very specifically what I meant. Naturally, I didn't really grok all of the issues involved when I first asked the question. Of course, if I comprehensively edit the question now, I will ruin all of these great answers. What I meant is more like:

struct bar {
  char written_frequently1[LONGISH_LEN];
  char read_only[LONGISH_LEN];
  char written_frequently2[LONGISH_LEN];
};

The major issue I asked about is, since this data is part of a struct, is it at all influenced by the other struct members, and might it influence them in return?

The fact that the members were ints, and therefore writes are likely atomic, is really just a red herring in this case.]

Solution 10

Many thanks to all the great answerers (and for all the great answers).

To sum up:

If there is a read-only member of a struct (in our case, if the value is set once, long before any thread might want to read it), then threads reading this member do not need locks, mutexes, semaphores, or any other concurrency protection.

This is true even if the other members are written to frequently. The fact that the different variables are all part of the same struct makes no difference.
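A minimal sketch of that summary, assuming POSIX threads and a C11 compiler. The names g_foo, g_foo_lock, writer, reader and run_read_only_demo are made up for illustration, and the value 42 is arbitrary. The read_only member is set before any thread is created, the neighbouring members are updated under a mutex, and the readers take no lock at all:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_READERS 4
#define N_WRITES  100000

struct foo {
    int written_frequently1;
    int read_only;            /* set once, before any thread starts */
    int written_frequently2;
};

static struct foo g_foo;
static pthread_mutex_t g_foo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: touches only the neighbouring members, always under the mutex. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_WRITES; i++) {
        pthread_mutex_lock(&g_foo_lock);
        g_foo.written_frequently1++;
        g_foo.written_frequently2--;
        pthread_mutex_unlock(&g_foo_lock);
    }
    return NULL;
}

/* Reader: reads read_only with no lock and counts unexpected values. */
static void *reader(void *arg)
{
    long *mismatches = arg;
    for (int i = 0; i < N_WRITES; i++) {
        if (g_foo.read_only != 42)
            (*mismatches)++;
    }
    return NULL;
}

/* Returns the total number of unexpected values seen by all readers. */
long run_read_only_demo(void)
{
    pthread_t w, r[N_READERS];
    long mismatches[N_READERS] = {0};
    long total = 0;
    int rc;

    /* Initialise everything before any thread exists. */
    g_foo = (struct foo){ .read_only = 42 };

    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    for (int i = 0; i < N_READERS; i++) {
        rc = pthread_create(&r[i], NULL, reader, &mismatches[i]);
        assert(rc == 0);
    }

    pthread_join(w, NULL);
    for (int i = 0; i < N_READERS; i++) {
        pthread_join(r[i], NULL);
        total += mismatches[i];
    }
    (void)rc;
    return total;
}
```

This works because each struct member is its own memory location, so writes to the neighbours do not race with reads of read_only, and pthread_create orders the initial write before anything the new threads do.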

OTHER TIPS

You need a mutex to guarantee that an operation is atomic. So in this particular case, you may not need a mutex at all. Specifically, if each thread writes to one element and the write is atomic and the new value is independent of the current value of any element (including itself), there is no problem.

Example: each of several threads updates a "last_updated_by" variable that simply records the last thread that updated it. Clearly, as long as the variable itself is updated atomically, no errors will occur.
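One way to get the "updated atomically" guarantee without a mutex is a C11 atomic variable. The sketch below assumes <stdatomic.h> and POSIX threads are available; worker, run_last_updated_demo and the thread ids 1 to 4 are hypothetical names and values chosen for the example:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_THREADS 4

/* Records the id of whichever thread stored to it most recently. */
static atomic_int last_updated_by = 0;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    /* The new value does not depend on the old one, so a plain
     * atomic store is enough; no read-modify-write is needed. */
    for (int i = 0; i < 1000; i++)
        atomic_store(&last_updated_by, id);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final value. */
int run_last_updated_demo(void)
{
    pthread_t threads[N_THREADS];
    int ids[N_THREADS];
    int rc;

    for (int i = 0; i < N_THREADS; i++) {
        ids[i] = i + 1;
        rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
    (void)rc;
    return atomic_load(&last_updated_by);
}
```

Which id ends up in the variable depends on scheduling, but it is always one of the ids that was actually stored, never a mixture of two writes.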

However, you do need a mutex to guarantee consistency if a thread reads or writes more than one element at a time, particularly because you mention locking an element rather than the entire structure.

Example: a thread updates the "day", "month" and "year" elements of a structure. The whole update must appear atomic; otherwise another thread could read the structure after "month" increments but before "day" wraps to 1 and see a date such as February 31. Note that you must honor the mutex when reading too; otherwise you may read an erroneous, half-updated value.

If the read_only member is actually read only, then there is no danger of the data being changed and therefore no need for synchronization. This could be data that is set up before the threads are started.

You will want synchronization for any data that can be written, regardless of the frequency.

"Read only" is a bit misleading, since the variable is written to at least once when it's initialized. In that case you still need a memory barrier between the initial write and subsequent reads if they're in different threads, or else they could see the uninitialized value.

Readers need mutexes, too!

There seems to be a common misconception that mutexes are for writers only, and that readers don't need them. This is wrong, and this misconception is responsible for bugs that are extremely difficult to diagnose.

Here's why, in the form of an example.

Imagine a clock that updates every second with the code:

if (++seconds > 59) {          // Was the time hh:mm:59?
    seconds = 0;               // Wrap seconds..
    if (++minutes > 59) {      // ..and increment minutes.  Was it hh:59:59?
        minutes = 0;           // Wrap minutes..
        if (++hours > 23)      // ..and increment hours.  Was it 23:59:59?
            hours = 0;         // Wrap hours.
    }
}

If the code is not protected by a mutex, another thread can read the hours, minutes, and seconds variables while an update is in progress. Following the code above:

[Start just before midnight] 23:59:59
[WRITER increments seconds]  23:59:60
[WRITER wraps seconds]       23:59:00
[WRITER increments minutes]  23:60:00
[WRITER wraps minutes]       23:00:00
[WRITER increments hours]    24:00:00
[WRITER wraps hours]         00:00:00

The time is invalid from the first increment until the final operation five steps later. If a reader checks the clock during this period, it will see a value that may be not only incorrect but illegal. And since your code is likely to depend on the clock without displaying the time directly, this is a classic source of "ricochet" errors that are notoriously difficult to track down.

The fix is simple.

Surround the clock-update code with a mutex, and create a reader function that also locks the mutex while it executes. Now the reader will wait until the update is complete, and the writer won't change the values mid-read.
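A sketch of that fix, assuming POSIX threads. The struct clock_time type, g_clock, g_clock_lock, clock_tick and clock_read are illustrative names, not part of the original code:

```c
#include <assert.h>
#include <pthread.h>

struct clock_time {
    int hours;
    int minutes;
    int seconds;
};

static struct clock_time g_clock = { 23, 59, 59 };
static pthread_mutex_t g_clock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: the whole multi-step update happens while the lock is held. */
void clock_tick(void)
{
    pthread_mutex_lock(&g_clock_lock);
    if (++g_clock.seconds > 59) {
        g_clock.seconds = 0;
        if (++g_clock.minutes > 59) {
            g_clock.minutes = 0;
            if (++g_clock.hours > 23)
                g_clock.hours = 0;
        }
    }
    pthread_mutex_unlock(&g_clock_lock);
}

/* Reader: copies all three fields under the same lock, so the caller
 * never sees a half-finished update such as 23:59:60. */
struct clock_time clock_read(void)
{
    pthread_mutex_lock(&g_clock_lock);
    struct clock_time snapshot = g_clock;
    pthread_mutex_unlock(&g_clock_lock);

    /* Invariants that hold only because the read was locked. */
    assert(snapshot.seconds >= 0 && snapshot.seconds <= 59);
    assert(snapshot.minutes >= 0 && snapshot.minutes <= 59);
    assert(snapshot.hours >= 0 && snapshot.hours <= 23);
    return snapshot;
}
```

Returning a copy rather than a pointer matters here: the caller works on a snapshot taken while the lock was held, so it stays consistent after the lock is released.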

No.

In general you need semaphores to prevent concurrent access to resources (an int in this case). However, since the read_only member is read only, it won't change between/during accesses. Note that it doesn't even have to be an atomic read — if nothing changes, you're always safe.

How are you setting read_only initially?

If all the threads are only reading, you don't need a semaphore.
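How the value is set initially matters when the initialising write happens in a different thread from the reads, which is the memory-barrier point raised in an earlier answer. The sketch below shows one possible approach using C11 release/acquire atomics; config_value, config_ready, the value 1234 and the function names are all assumptions made for the example:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int config_value;          /* "read only" once published */
static atomic_bool config_ready;  /* publication flag, initially false */

/* Initialising thread: plain write, then a release store of the flag. */
static void *initializer(void *arg)
{
    (void)arg;
    config_value = 1234;
    atomic_store_explicit(&config_ready, true, memory_order_release);
    return NULL;
}

/* Reading thread: the acquire load pairs with the release store, so
 * once the flag reads true the write to config_value is visible. */
static void *consumer(void *arg)
{
    int *seen = arg;
    while (!atomic_load_explicit(&config_ready, memory_order_acquire))
        sched_yield();            /* wait until the value is published */
    *seen = config_value;
    return NULL;
}

/* Starts the consumer first, then the initializer; returns what was read. */
int run_publish_demo(void)
{
    pthread_t c, i;
    int seen = 0;
    int rc;

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&i, NULL, initializer, NULL);
    assert(rc == 0);

    pthread_join(i, NULL);
    pthread_join(c, NULL);
    (void)rc;
    return seen;
}
```

If the value is instead written before the reading threads are created, none of this is needed, because pthread_create already orders the earlier write before everything the new thread does.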

You might enjoy reading any one of these papers on practical lock free programming, or just dissecting and understanding the provided snippets.

I would hide each field behind a function call. The accessors for the frequently written fields would take a semaphore. The accessor for the read-only field just returns the value.
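A rough sketch of that accessor idea, applied to the struct bar from the question. It uses a pthread mutex rather than a semaphore, which is a substitution on my part; LONGISH_LEN is given an arbitrary value of 64, and g_bar, g_bar_lock and the bar_* function names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define LONGISH_LEN 64  /* arbitrary size for the example */

struct bar {
    char written_frequently1[LONGISH_LEN];
    char read_only[LONGISH_LEN];
    char written_frequently2[LONGISH_LEN];
};

/* read_only is filled in by static initialisation, before any thread runs. */
static struct bar g_bar = { .read_only = "set once at start-up" };
static pthread_mutex_t g_bar_lock = PTHREAD_MUTEX_INITIALIZER;

/* Written field: copy in under the lock, always NUL-terminated. */
void bar_set_written1(const char *value)
{
    assert(value != NULL);
    pthread_mutex_lock(&g_bar_lock);
    strncpy(g_bar.written_frequently1, value, LONGISH_LEN - 1);
    g_bar.written_frequently1[LONGISH_LEN - 1] = '\0';
    pthread_mutex_unlock(&g_bar_lock);
}

/* Written field: copy out under the lock so the caller gets a
 * consistent snapshot instead of a half-written string. */
void bar_get_written1(char out[LONGISH_LEN])
{
    pthread_mutex_lock(&g_bar_lock);
    memcpy(out, g_bar.written_frequently1, LONGISH_LEN);
    pthread_mutex_unlock(&g_bar_lock);
}

/* Read-only field: never modified after start-up, so no lock is taken. */
const char *bar_get_read_only(void)
{
    return g_bar.read_only;
}
```

Keeping the struct and the lock static to one file means callers cannot bypass the accessors, so the locking rule lives in one place.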

Adding to previous answers:

In this case the natural synchronization paradigm is mutual exclusion, not semaphores.
I agree that you don't need any mutex on readonly variables.
If the read-write part of the structure has consistency constraints, in general you will need one mutex for all of them, in order to keep the operations atomic.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow