Why are datatypes marked as thread-safe instead of procedures?

https://softwareengineering.stackexchange.com/questions/390282

23-02-2021

Question

In Rust, the Send and Sync marker traits indicate whether a value of a type (Send) or a reference to it (Sync) can safely be handed to another thread.

However, thread safety is usually treated as an attribute of a function or procedure, as frequently seen in C man pages (e.g. man 3 rand).

So why is Rust designed to apply such attributes to datatypes instead of functions? For example:

struct Foo { ... }

unsafe sync fn thread_safe_fn(foo: &Foo) { ... }

This way, any type can be used anywhere, but only sync functions can operate on shared data; which makes it possible to have for example a single Rc with defined operations of either atomic (sync) or non-atomic (!sync).

Solution

There is nothing intrinsic to a function that implies thread safety other than its awareness of what shared data may be modified, and how it protects that data. It is the data itself that exists in memory, and can be accessed from separate threads of execution, and not typically a function (unless you have a function whose code is modified after loading, but that's far more rare). In the case of C's rand() function, its implementation accesses shared data, and does not protect against concurrent access. From the man page you mention: "The function rand() is not reentrant, since it uses hidden state that is modified on each call."
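To make the rand() point concrete, here is a minimal Rust sketch of the same kind of generator with its state held in an explicit value instead of a hidden global. The Lcg type, its constants, and the per_thread_sequences helper are hypothetical names chosen for illustration, not any library's API. Because each thread owns its own state, nothing is shared and nothing needs protecting.

```rust
use std::thread;

/// A tiny linear congruential generator whose state is an explicit value
/// rather than a hidden global (hypothetical type, illustrative constants).
struct Lcg {
    state: u32,
}

impl Lcg {
    fn new(seed: u32) -> Self {
        Lcg { state: seed }
    }

    /// Advancing the generator needs `&mut self`, so two threads cannot
    /// call this on the same value at the same time.
    fn next_value(&mut self) -> u32 {
        self.state = self.state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
        (self.state >> 16) & 0x7fff
    }
}

/// Runs one generator per thread and collects each thread's first `n` outputs.
fn per_thread_sequences(seeds: &[u32], n: usize) -> Vec<Vec<u32>> {
    let handles: Vec<_> = seeds
        .iter()
        .map(|&seed| {
            thread::spawn(move || {
                // The state lives inside this thread; no other thread can reach it.
                let mut rng = Lcg::new(seed);
                (0..n).map(|_| rng.next_value()).collect::<Vec<u32>>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let sequences = per_thread_sequences(&[1, 1, 2], 3);
    println!("{:?}", sequences);
}
```

The function body is the same arithmetic either way; what changes is where the state lives and who can reach it.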

If the language were designed as you describe, consider a utility function that modifies Foo. If that function were marked as thread-safe, it would have to take locks needlessly to protect data that is not necessarily shared. You also want to block threads only from concurrent access to the same memory location, not from every entry into a function; serializing execution of a generic function is clumsier than having something protect a specific piece of data.
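As a rough sketch of that argument in Rust: the utility function below takes no lock at all, and the caller decides whether the data is shared. Foo's count field and the bump, bump_locally, and bump_shared functions are assumptions made up for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct Foo {
    count: u64, // hypothetical field, for illustration only
}

/// A plain utility function: it takes no lock, because `&mut Foo` already
/// proves the caller has exclusive access.
fn bump(foo: &mut Foo) {
    foo.count += 1;
}

/// Unshared use: no synchronization is involved.
fn bump_locally(times: u64) -> u64 {
    let mut foo = Foo { count: 0 };
    for _ in 0..times {
        bump(&mut foo);
    }
    foo.count
}

/// Shared use: the lock is attached to this particular Foo, not to `bump`.
fn bump_shared(threads: u64, times: u64) -> u64 {
    let shared = Arc::new(Mutex::new(Foo { count: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..times {
                    let mut guard = shared.lock().unwrap();
                    bump(&mut guard); // same function, now reached through the lock
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = shared.lock().unwrap().count;
    total
}

fn main() {
    println!("local: {}", bump_locally(5));
    println!("shared: {}", bump_shared(4, 1000));
}
```

Only threads touching the same Mutex contend with each other; calls to bump on unrelated Foo values never wait.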

OTHER TIPS

struct Foo { ... }

unsafe sync fn thread_safe_fn(foo: &Foo) { ... }
This way, any type can be used anywhere, but only sync functions can operate on shared data; which makes it possible to have for example a single Rc with defined operations of either atomic (sync) or non-atomic (!sync).

In order to protect a data structure from threaded race conditions, all accesses to that data structure need to be guarded, not just some. (Depending on the processor, this includes guarding reads as well as writes, since reads must properly coordinate with the writes.)

One problem with the approach you're proposing is that the system would have difficulty identifying improper usages: it would be up to the programmer to know when and where to use the sync calls vs. the non-sync calls (as it is in C), and the compiler would not be able to determine whether the program is making a mistake.

By having separate types for Rc and Arc, the Rust compiler can detect improper usages of Rc and flag them as errors. Since Rust does this, a program that compiles is guaranteed to have certain correctness properties, such as freedom from data races in safe code.
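A small sketch of how that check looks in practice. The assert_send helper and the two sum functions are hypothetical names for this example; Rc, Arc, and thread::spawn are the standard library types being discussed.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Compiles only when `T` is safe to move to another thread.
fn assert_send<T: Send>() {}

/// Arc uses atomic reference counts, so a clone may cross a thread boundary.
fn sum_on_other_thread(data: Arc<Vec<i32>>) -> i32 {
    let worker_copy = Arc::clone(&data);
    thread::spawn(move || worker_copy.iter().sum::<i32>())
        .join()
        .unwrap()
}

/// Rc uses plain (non-atomic) counts, which is fine while everything stays
/// on one thread.
fn sum_on_this_thread(data: Rc<Vec<i32>>) -> i32 {
    let alias = Rc::clone(&data);
    alias.iter().sum()
}

fn main() {
    assert_send::<Arc<Vec<i32>>>();
    // Uncommenting the next line is a compile error, because Rc is not Send:
    // assert_send::<Rc<Vec<i32>>>();

    println!("{}", sum_on_other_thread(Arc::new(vec![1, 2, 3])));
    println!("{}", sum_on_this_thread(Rc::new(vec![1, 2, 3])));
}
```

Moving an Rc into the thread::spawn closure is rejected in the same way, so the mistake is caught before the program ever runs.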

We could simply use Arc everywhere, but that would impose the cost of atomic reference counting even where it is not needed, hence Rc is also provided. It is the type system that detects these illegal operations on non-thread-safe data; with a single common data type for both thread-safe and non-thread-safe access, Rust would not be able to do its job of guaranteeing certain desirable program properties.

Why can C get away with documenting that one function, rand for example, is not thread safe while other functions are?

They are saying that rand is not thread safe because it uses private global state. Other functions described as thread safe simply have no global state, which makes them thread safe given appropriate usage: it is up to the caller to supply arguments that are themselves used in a thread-safe manner. C's thread safety claims are pretty weak compared with Rust's guarantees: they describe only a potential for thread safety that the programmer must realize.

The question of "Thread Safety" essentially means "if I modify shared mutable data using multiple operations on multiple different threads, will I get the correct results or not?"

So, without even thinking about how thread safety works, let us just very stupidly analyze the above sentence in English. If we want a single place to put a property such as "thread safe", where can we put it? We cannot put it in the thread, since there is more than one thread. We cannot put it on the operation, since there is more than one operation. The only place to put it is on the data!
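For what it's worth, here is that sentence as a Rust sketch: several threads, more than one kind of operation, and a single piece of data whose type carries the guarantee. The run_mixed_operations function is a made-up name for this example; AtomicU64 is the standard library type.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Many threads, two different operations, one piece of data. The only thing
/// declared thread safe here is the data's type, AtomicU64.
fn run_mixed_operations(threads: u64, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    if id % 2 == 0 {
                        counter.fetch_add(2, Ordering::Relaxed); // one operation
                    } else {
                        counter.fetch_add(1, Ordering::Relaxed); // a different one
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Every worker has been joined, so all of their updates are visible here.
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", run_mixed_operations(4, 1000));
}
```

Neither the closures nor the threads are annotated in any way; the compiler accepts the sharing because of what the counter's type is.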

Licensed under: CC-BY-SA with attribution
