Example for a correctly synchronized program with data races in Java memory model

https://stackoverflow.com/questions/12024070

27-06-2021
|

Question

In JLS, §17.4.5. Happens-before Order, it says that

A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.

According to discussion in Does a correctly synchronized program still allow data race?(Part I),we get following conclusion:

A program can be correctly synchronized and have data races.

The combination of two conclusions means that it must exists such an example:

All sequentially consistent executions of a program are data race free, but the normal executions (executions other than sequentially consistent executions) of such a program contain data race.

After heavy consideration, I still can not find such a code sample. So how about you?

Solution

It is not true that "A program can be correctly synchronized and have data races." The example by assylias in that discussion is not correctly synchronized. It is correct from the higher-level, functional standpoint—the data race it contains does not manifest itself as a bug. It is a so-called "benign" data race, but that is irrelevant when discussing the JLS definitions.

A program whose sequentially consistent execution don't contain data races is guaranteed not to contain data races in any execution, sequentially consistent or not. As the JLS says,

This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.

So please note that the definition of a correctly synchronized program is narrowed down to only sequentially consistent executions as a courtesy to the programmer, giving him the strong guarantee that sequentially consistent executions are the only ones he or she needs to reason about and all other executions will automatically have the same guarantee.

UPDATE

It is easy to get lost in the terminology used by the JMM and subtle misinterpretations result in deep misunderstandings later on. Therefore take these to heart:

an execution is simply a set of inter-thread actions. There is no a priori order to it. In particular, there is no time order to it.

This is a counterintuitive definition, so we must be careful about it: each time we say execution, we must be sure to imagine a bag of actions, never a string of them. Whenever we define a partial order, we should imagine several bags lined up.

a program contains instructions to perform actions. Each such instruction can be executed zero or more times, contributing zero or more distinct actions to the particular execution;
an execution may or may not have an execution order, which is a total order over all actions;
a sequentially consistent execution is what you would get if all your shared variables were volatile. This kind of execution always has a definite execution order;
a sequentially inconsistent execution is your real-world execution of a program: non-volatile variables are involved and the compiler reorders reads and writes, there are caches, thread-local stores, etc.
the synchronization order is the total order over all synchronization actions done by an execution. With respect to the execution itself it is still a partial order because not all actions are synchronization actions; most notably, reads and writes of non-volatile variables. Every execution, be it sequentially consistent or not, has a definite synchronization order;
likewise, the happens-before order is defined for a specific execution of a program and is derived as a transitive closure of the synchronization order with the program order.

It is interesting to note that if all your shared vars were volatile, then the synchronization order would become a total order and as such would meet the definition of the execution order. This way we come from a different angle to the conclusion that all executions of such a program would be sequentially consistent.

I have dug deep to get to the bottom of the JLS bug in the definition of a data race:

"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."

First of all, it is not the program that contains data races, but a program execution. If we refer back to the original paper defining the Java Memory Model, we'll see this corrected:

"Two accesses x and y form a data race in an execution of a program if they are from diﬀerent threads, they conﬂict, and they are not ordered by happens-before."

However, this still leaves us with actions on volatile vars being defined as data races. Consider the following happens-before graph:

Thread W        w1 ----> w2
                |
                 \
Thread R     r0 ----> r1

r1 observerd the write w1. It was preceded by another read, r0, and the write was followed by another one, w2. Now notice there is no path between r0 and either w1 or w2; likewise between r1 and w2. All these are examples of a data race by the definition.

Digging even deeper, however, I have found this post on the memoryModel mailing list. It says "a data race should be defined as conflicting actions on non-volatile variables that are not ordered by happens-before". Only with that addition would the loophole be closed, but this has still not entered the official JLS release.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow