Question

JSR-133 FAQ says:

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.

I also remember reading that on modern Sun VMs uncontended synchronizations are cheap. I am a little confused by this claim. Consider code like:

class Foo {
    int x = 1;
    int y = 1;
    private final Object aLock = new Object();
    // ..

    void increment() {
        synchronized (aLock) {
            x = x + 1;
        }
    }
}

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help. Alternatively, can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock, and hence avoid evicting y from the cache when entering the synchronized block?
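By lock striping I mean guarding different parts of a structure with different locks, so that unrelated updates don't contend - roughly like this hypothetical sketch:

class StripedCounters {
    private final int[] counts = new int[16];
    private final Object[] locks = new Object[16];

    StripedCounters() {
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    void increment(int key) {
        int stripe = (key & 0x7fffffff) % locks.length;  // pick a stripe for this key
        synchronized (locks[stripe]) {                   // threads on different stripes don't contend
            counts[stripe]++;
        }
    }
}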


Solution

The short answer is that JSR-133 goes too far in its explanation. This isn't a serious issue, because JSR-133 is a non-normative document which isn't part of the language or JVM standards. Rather, it is only a document which explains one possible strategy that is sufficient for implementing the memory model, but isn't in general necessary. On top of that, the comment about "cache flushing" is basically out of place, since essentially no architecture would implement the Java memory model by doing any kind of "cache flushing" (and many architectures don't even have such instructions).

The Java memory model is formally defined in terms of things like visibility, atomicity, and happens-before relationships, which specify exactly which writes each thread must see, which actions must occur before which others, and other relationships, using a precisely (mathematically) defined model. Behavior which isn't formally defined could be random, or well-defined in practice on some hardware and JVM implementation - but of course you should never rely on this, as it might change in the future, and you could never really be sure that it was well-defined in the first place unless you wrote the JVM and were well aware of the hardware semantics.
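To make that concrete, here is a minimal sketch (the class and member names are mine, not from the spec) of the kind of guarantee the model actually gives: because an unlock of a monitor happens-before every later lock of the same monitor, a reader that synchronizes on the same lock is guaranteed to see the writer's update, however the JVM and hardware achieve it.

class VisibilityExample {
    private final Object lock = new Object();
    private int data = 0;

    void writer() {                   // runs on thread 1
        synchronized (lock) {
            data = 42;                // plain write before the unlock
        }                             // unlock happens-before any later lock of 'lock'
    }

    void reader() {                   // runs on thread 2
        synchronized (lock) {         // if this lock comes after the writer's unlock...
            System.out.println(data); // ...the read is guaranteed to see 42
        }
    }
}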

So the text that you quoted is not formally describing what Java guarantees, but rather is describing how some hypothetical architecture which had very weak memory ordering and visibility guarantees could satisfy the Java memory model requirements using cache flushing. Any actual discussion of cache flushing, main memory and so on is clearly not generally applicable to Java as these concepts don't exist in the abstract language and memory model spec.

In practice, the guarantees offered by the memory model are much weaker than a full flush - having every atomic, concurrency-related or lock operation flush the entire cache would be prohibitively expensive - and this is almost never done in practice. Rather, special atomic CPU operations are used, sometimes in combination with memory barrier instructions, which help ensure memory visibility and ordering. So the apparent inconsistency between cheap uncontended synchronization and "fully flushing the cache" is resolved by noting that the first is true and the second is not - no full flush is required by the Java memory model (and no flush occurs in practice).

If the formal memory model is a bit too heavy to digest (you wouldn't be alone), you can also dive deeper into this topic by taking a look at Doug Lea's cookbook, which is in fact linked in the JSR-133 FAQ, but comes at the issue from a concrete hardware perspective, since it is intended for compiler writers. There, they talk about exactly what barriers are needed for particular operations, including synchronization - and the barriers discussed there can pretty easily be mapped to actual hardware. Much of the actual mapping is discussed right in the cookbook.
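As a very rough illustration of the cookbook's perspective (this is my paraphrase from memory, not a quote), here is where a conservative implementation on a weakly ordered machine might conceptually place barriers around a synchronized block; on strongly ordered hardware such as x86 most of these barriers are no-ops, which is part of why uncontended synchronization is cheap:

class BarrierSketch {
    private final Object lock = new Object();
    private int a, b;

    void update() {
        synchronized (lock) {
            // monitor enter: conceptually followed by LoadLoad + LoadStore barriers,
            // so the accesses below cannot be reordered above the acquire
            a = a + 1;
            b = a;
            // conceptually preceded by LoadStore + StoreStore barriers before the
            // monitor exit, so the accesses above cannot be reordered below the release
        }
    }
}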

OTHER TIPS

BeeOnRope is right, the text you quote delves more into typical implementation details than into what the Java Memory Model actually guarantees. In practice, you may often see that y is actually purged from CPU caches when you synchronize on x (the same applies if x in your example were a volatile variable, in which case explicit synchronization is not necessary to trigger the effect). This is because on most CPUs (note that this is a hardware effect, not something the JMM describes), the cache works on units called cache lines, which are usually wider than a machine word (64 bytes is a common size). Since only complete lines can be loaded or invalidated in the cache, there is a good chance that x and y will fall into the same line, so that flushing one of them will also flush the other.

It is possible to write a benchmark which shows this effect. Make a class with just two volatile int fields and let two threads perform some operation (e.g. incrementing in a long loop), one thread on each field. Time the operation. Then insert 16 int fields between the two original fields and repeat the test (16 * 4 bytes = 64 bytes). Note that an array is just a reference, so an array of 16 elements won't do the trick. You may see a significant improvement in performance because operations on one field no longer influence the other. Whether this works for you will depend on the JVM implementation and processor architecture. I have seen this in practice on a Sun JVM and a typical x64 laptop; the difference in performance was a factor of several.
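A minimal sketch of such a benchmark (loop counts and the padding layout are arbitrary, and the effect itself is machine- and JVM-dependent):

class FalseSharingBenchmark {
    volatile int a;
    // Uncomment the padding to push 'b' onto a different cache line (assuming 64-byte lines):
    // int p01, p02, p03, p04, p05, p06, p07, p08,
    //     p09, p10, p11, p12, p13, p14, p15, p16;
    volatile int b;

    public static void main(String[] args) throws InterruptedException {
        FalseSharingBenchmark f = new FalseSharingBenchmark();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100_000_000; i++) f.a++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100_000_000; i++) f.b++; });
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
    }
}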

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help.

I'm not sure, but I think the answer may be "yes". Consider this:

class Foo {
    int x = 1;
    int y = 1;
    private final Object aLock = new Object();
    // ..

    void bar() {
        synchronized (aLock) {
            x = x + 1;
        }
        y = y + 1;
    }
}

Now this code is unsafe, depending on what happens in the rest of the program. However, I think that the memory model means that the value of y seen by bar should not be older than the "real" value at the time of acquisition of the lock. That would imply the cache must be invalidated for y as well as x.

Also can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock?

If the lock is 'this', this analysis looks like it would be feasible as a global optimization once all classes have been preloaded. (I'm not saying that it would be easy, or worthwhile ...)

In more general cases, the problem of proving that a given lock is only ever used in connection with a given "owning" instance is probably intractable.

we are java developers, we only know virtual machines, not real machines!

let me theorize what is happening - but I must say I don't know what I'm talking about.

say thread A is running on CPU A with cache A, and thread B is running on CPU B with cache B:

  1. thread A reads y; CPU A fetches y from main memory, and saves the value in cache A.

  2. thread B assigns a new value to 'y'. the VM doesn't have to update main memory at this point; as far as thread B is concerned, it can be reading/writing a local image of 'y'; maybe 'y' is nothing but a CPU register.

  3. thread B exits the sync block and releases the monitor. (when and where it entered the block doesn't matter.) thread B has updated quite a few variables up to this point, including 'y'. all those updates must be written to main memory now.

  4. CPU B writes the new value of 'y' to its place in main memory. (I imagine that) almost INSTANTLY, the information 'main-memory y is updated' is wired to cache A, and cache A invalidates its own copy of y. that must happen really FAST on the hardware.

  5. thread A acquires the monitor and enters a sync block - at this point it doesn't have to do anything regarding cache A. 'y' is already gone from cache A. when thread A reads y again, it comes fresh from main memory with the new value assigned by B.

consider another variable z, which was also cached by A in step(1), but it's not updated by thread B in step(2). it can survive in cache A all the way to step(5). access to 'z' is not slowed down because of synchronization.

if the above statements make sense, then indeed the cost isn't very high.


addition to step(5): thread A may have its own even faster storage than cache A - it may be keeping variable 'y' in a register, for example. that will not be invalidated by step(4); therefore in step(5), thread A must discard such locally cached values upon entering the sync block. that's not a huge penalty though.
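here's a rough sketch of the scenario above in code (the names are made up; what matters is which reads and writes sit inside the sync blocks):

class Scenario {
    static final Object monitor = new Object();
    static int y = 0;
    static int z = 0;

    static void threadB() {
        synchronized (monitor) {
            y = 1;                   // steps 2-3: the write is published when B exits the block
        }
    }

    static void threadA() {
        int y1 = y;                  // step 1: may be a stale value
        int z1 = z;
        synchronized (monitor) {     // step 5: if B released the monitor first,
            int y2 = y;              // this read must see B's new value
            int z2 = z;              // z was never written by B, so reading it costs nothing extra
            System.out.println(y1 + " " + z1 + " " + y2 + " " + z2);
        }
    }
}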

you might want to check the JDK 6.0 documentation: http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility

Memory Consistency Properties

Chapter 17 of the Java Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:

  • Each action in a thread happens-before every action in that thread that comes later in the program's order.
  • An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
  • A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
  • A call to start on a thread happens-before any action in the started thread.
  • All actions in a thread happen-before any other thread successfully returns from a join on that thread.

So, as stated in the second point quoted above: all the changes that happen before an unlock of a monitor are visible to all threads (within their own synchronized blocks) that subsequently lock the same monitor. This is in accordance with Java's happens-before semantics. Therefore, all changes made to y would also be flushed to main memory when some other thread acquires the monitor on 'aLock'.

synchronized guarantees that only one thread can enter a block of code. But it doesn't guarantee that modifications to variables made within the synchronized section will be visible to other threads. Only threads that enter the synchronized block are guaranteed to see the changes.

The memory effects of synchronization in Java can be compared with the problem of double-checked locking in C++ and Java. Double-checked locking is widely cited and used as an efficient method for implementing lazy initialization in a multi-threaded environment. Unfortunately, it will not work reliably in a platform-independent way when implemented in Java without additional synchronization. When implemented in other languages, such as C++, it depends on the memory model of the processor, the reorderings performed by the compiler and the interaction between the compiler and the synchronization library. Since none of these are specified in a language such as C++, little can be said about the situations in which it will work. Explicit memory barriers can be used to make it work in C++, but these barriers are not available in Java.
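For illustration, here is a sketch of the pattern being referred to (the class name is mine): the commented-out field is the classic broken idiom, where another thread could observe a non-null but not fully constructed object; since Java 5, declaring the field volatile makes the idiom correct.

class Helper {
    // private static Helper instance;       // broken double-checked locking (pre-Java-5 idiom)
    private static volatile Helper instance; // volatile makes the publication safe under the JMM

    static Helper getInstance() {
        Helper result = instance;            // first check, without locking
        if (result == null) {
            synchronized (Helper.class) {
                result = instance;           // second check, under the lock
                if (result == null) {
                    instance = result = new Helper();
                }
            }
        }
        return result;
    }
}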

Since y is outside the scope of the synchronized block, there are no guarantees that changes to it are visible to other threads. If you want to guarantee that changes to y are seen consistently across all threads, then all threads must use synchronization when reading and writing y.

If some threads change y in a synchronized manner and others don't, you'll get unexpected behavior. For there to be any guarantee that changes are seen across threads, all access to shared mutable state (variables), on all threads, must be synchronized.
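A minimal sketch of what that means in practice (the accessor names are mine): every thread reads and writes y only through methods that synchronize on the same lock.

class SharedY {
    private final Object lock = new Object();
    private int y = 1;

    int getY() {
        synchronized (lock) {   // readers must synchronize too, not just writers
            return y;
        }
    }

    void incrementY() {
        synchronized (lock) {
            y = y + 1;
        }
    }
}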

And yes, the JVM guarantees that while the lock is held no other thread can enter a region of code protected by the same lock.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow