Performance vs. C++ memory model

Question 1

Heap managers indeed need to synchronize, and that's a possible performance problem for multi-threaded code. It's up to the program to mitigate that if necessary. Standard libraries are also reacting, trying to get in better multi-threaded allocators.

Edit: Some thoughts about the questions in the second paragraph.

Even C++ needs to be sufficiently safe to be usable. "YDPFWYU" is nice, but if it means that you have to wrap a mutex around every allocation if you want to use the code in a multi-threaded environment, you have a big problem. It's like exceptions, really: even code that doesn't actively use them should be somewhat aware that it might be used in a context where they exist, and both the programmer and the compiler need to be aware of that. The compiler needs to create exception support code/data structures, while the programmer needs to write exception-safe code. Multi-threading is the same, only worse: any piece of code you write might be used in a multi-threaded environment, so you need to write thread-safe code, and the compiler/environment needs to be aware of threading (forgo some very unsafe optimizations, and have a thread-safe allocator).

These are the points in C++ where you pay even for what you don't use, as far as the standard is concerned. Your particular compiler might give you an escape hatch (disable exceptions, use single-threaded runtime library), but that's no longer real C++ then.

That said, even (or especially) if you have a single global allocator lock, the overhead for a single-threaded program is minimal: locks are only expensive when under contention. An uncontested mutex lock/unlock is not very significant compared to the rest of the allocator operation.

Under contention, the story is different, which is where custom allocators possibly come in.

As I briefly mentioned above, one other place in C++ is slowed down very slightly by the mere existence of multi-threading: the prohibition on some particular optimizations. The compiler cannot invent reads and writes (especially writes) to possibly shared variables (like globals or things you handed out a pointer to) in code paths that wouldn't ordinarily have these accesses. This may slow down very specific pieces of code, but overall in a program, it's very unlikely that you'll notice.

Question 2

You are mixing allocation and access of heap memory.

Multi-threaded heap allocation is indeed synchronized, but at a C library level, at least in all modern (con)current OSes' C libraries. There may be specific-purpose C libraries that don't do this. See for example the old single- and multithreaded C runtime library for MSVC (note how new versions of MSVS deprecate and even remove single-threaded variants). I assume glibc has a similar mechanism, and is probably also solely multithreaded, and so always synchronized. I haven't heard anyone complain about multithreaded memory allocation speeds, so if you have a concrete complaint, I'd like to see it properly explained and documented with reproducible code.

Access of heap memory (i.e. after a call to new or malloc has returned) is not protected by any mechanism whatsoever. C++11 gives you mutex and other synchronization possibilities that you, as a user, need to implement in your code if you want to protect from race conditions. If you do not, no purrformance is lost.

Question 3

The compilers really not forced to be NOT optimizing. Always were possible to made a very bad compilers and "standard" libraries. And nowadays it is nothing else but bad quality. Despite it may be advertised as the "only real right C++".

"any piece of code you write might be used in a multi-threaded environment, so you need to write thread-safe code, and the compiler/environment needs to be aware of threading" - it is a clear stupidness.

Good implementation always can provide a normal way to optimize the single threaded code (and necessary libraries...), and the code, which not using exceptions, and allow the other features...

(For example, threading requires some certain functions to coordinate threads and also to create threads, and at link time the use of them are visible and may affect toolchain... Or at first call to thread creation function it may affect the memory allocation method (and have other effects) And there may be other good ways, like special switches to compiler and etc...)

Question 4

Not really. Both physical memory and backing store are system resources on modern operating systems. So allocations of them and accesses to them have to be properly synchronized.

The case of threads sharing virtual memory is just a special case of the many other ways scheduling entities can share virtual memory. Consider two processes that memory map the same library or data file.

The only extra overhead with threads is modifications to the virtual memory map because threads share a virtual memory map. Much of the synchronization overhead is unavoidable. For example, if you're unmapping something, some resource typically has to be returned to a system-level pool, and that requires synchronization anyway.

On many platforms, some special platform-specific thing is needed to let other threads running at the same time know that their view of virtual memory has changed. But this disappears anyway if there are no other threads since there would be nothing to notify.

It is simply a reality that some features have costs even if they're not being used. The existence of swap logic and checks in your kernel has some costs even if you never swap. Engineers are realists and have to balance costs and benefits.

Question 5

Both physical memory and backing store are system resources on modern operating systems. So allocations of them and accesses to them have to be properly synchronized.

The case of threads sharing virtual memory is just a special case of the many other ways scheduling entities can share virtual memory.

As soon it is features of Operating System, there are no need to use additional code in the C/C++ allocation functions in application program (really, of course, multi threading needs special additional synchronization in Standard Lib. and additional "system calls" and see the question at beginning).

Real trouble may be having many types (single- and multi - threading) of the same library (Standard C/C++ library and also others) in the system... But...