Are Data Races bad?

Question 1

I thank everybody for their answers, although valuable they did not actually answer the question I was hoping I asked. The answers did allow me to reason better about what I was actually asking, and in the end find something of an answer online:

http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

So I guess my question should have been:

The C(++)11 standard defines my first example as a data race (if I don't use the "atomic" keyword), and the second one not. The first one therefore has undefined behaviour (even though there don't seem to be compiler implementations that would result in anything but x==1 && y==0|1, according to the standard any resulting value for x and y is correct compiler behaviour). I was wondering why this is. I think the Intel document answers that question pretty elaborately.

Question 2

Edit. The full quote from Wikipedia is:

C++11 introduced formal support for multithreading, and defined a data race strictly as a race condition between non-atomic variables. While race conditions in general will continue to exist, a "data race" must be avoided by the programmer, who must assure that only one thread at a time may access any variable if the access is for writing.

Now, assuming this is correct (it's wikipedia, which tends to be reasonably good on programming but can often be very wrong indeed), it's defining "data race" in this context purely as one of the clearly bad cases; those which can cause shearing of values. Such cases obviously must be avoided, so clearly data-races—defined as they are here—must be avoided.

And by this definition, neither program in your question has a data race.

I leave my original answer on race conditions generally:

The second example has a data-race too. Indeed, it has the exact same data-race as the first one.

Is this bad? That depends. Note before any of the rest. Not only are many cases bad, as I'll describe more below, but those cases that are bad tend to be particularly hard to find and fix, which in itself should lean one towards assuming the worse.

An obvious case where a data race is bad is where it corrupts data. Let's say we change your example so that x and y were larger than the architecture's word size and we're setting x = -1. We'll also assume two's-complement. Now the possible values for y are not just -1 and 0, but also -4294967296 and 4294967295.

In this case, the locking you suggest wouldn't remove the data-race completely, but would remove that part of it that could cause shearing: The only possible values of y would again be -1 and 0.

Another question is serialisation. It's often necessary to be able to consider a sequence of concurrent events as having been one of a limited set of sequential events.

For example, consider we start with X = 0 and then have:

Thread 0    Thread 1
++x         x = -50

Now, there's still the risk of sheering here that could result in a possible bogus value.

Assuming that x is word-size or smaller, we still might have an issue. There are two possible values if the operations were not concurrent. Either x could be equal to -50 (increment, then assign -50) or x could be equal to -49 (assign -50 then increment). However, concurrently it's possible for us to end up with x having a value of 1 because thread 0 reads 0, thread 1 assigns -50and then thread 0 increments and assigns 1.

Now, it's quite possible that this is perfectly okay. It's very likely though that it isn't.

As programmers we've got four possibilities:

Identify the data-race. Determine that it is harmless (or relatively harmless*), and let it be.
Identify the data-race. Determine that it can cause problems, and fix it.
Identify the data-race. Just fix it because that way we can't make a mistake in determining it is harmless when it actually isn't.
Identify the data-race. Determine that it can cause problems. Change the code so the race doesn't cause problems.

The importance of case number 2 is obvious - we turn code that has a bug into code that isn't.

The importance of case number 3 comes down to time and provability. We might well be making code less efficient (many methods for stopping data-races have at least some overhead), but it often takes less developer time to remove a race than prove it harmless, and the cost of a wrong example is marginally slower code whereas the cost of being wrong in the other direction is a hard to fix bug.

The importance of number 1 is more complicated, it can be important in some very low-level concurrent code to avoid locking, so there are cases where we want to tolerate races. Number 4 is a way to turn something from number 2 into number 1, and comes up when either the data-race is inherent to the problem (we can't remove it) or we're doing the sort of low-level concurrency that number 1 involves.

Here's an interesting example in C#:

public static SomeResource GetTheResource()
{
  get
  {
    if(_theResource == null)
      _theResource = CreateTheResource();
    return _theResource
  }
}

The data-race should be obvious; until theResource is set and all CPU's caches see the update, we might assign to it several times from different threads. Is this a bug? Many people would say it is, but actually it depends. It's possible that it's safe to have a brief period where different versions of theResource are used, and all we really lose is some efficiency in the beginning from the multiple calls to CreateTheResource(). In code with a high requirement for performance we might decide to tolerate this initial lower efficiency for the long-term efficiency gain of no locking. Or it might be vital that we lock. Or we might just lock because we don't have that pressing a need to avoid it, and it's simpler just to assume that the might be a problem.

Important Point 1: If you do decide to tolerate a race like this, you should add a comment to that effect and why. Otherwise every time someone comes across this code they'll have to check again that it's safe, rather than at most check your stated reasoning.

Important Point 2: While the principle here is language-agnostic, the details in each case often are not. In this case tolerating the race depends not just on the temporary multiple copies being safe, but also on garbage collection cleaning those excess copies up. If we were instead assigning a pointer to the heap in C++ the above would at the very best be leaky, even if otherwise safe.

A more complicated case is something like this (again a C# example, but applicable to other languages):

internal sealed class LockFreeQueue<T>
{
  private sealed class Node
  {
    public readonly T Item;
    public Node Next;
    public Node(T item)
    {
      Item = item;
    }
  }
  private volatile Node _head;
  private volatile Node _tail;
  public LockFreeQueue()
  {
    _head = _tail = new Node(default(T));
  }
#pragma warning disable 420 // volatile semantics not lost as only by-ref calls are interlocked
  public void Enqueue(T item)
  {
    Node newNode = new Node(item);
    for(;;)
    {
      Node curTail = _tail;
      if (Interlocked.CompareExchange(ref curTail.Next, newNode, null) == null)   //append to the tail if it is indeed the tail.
      {
        Interlocked.CompareExchange(ref _tail, newNode, curTail);   //CAS in case we were assisted by an obstructed thread.
        return;
      }
      else
      {
        Interlocked.CompareExchange(ref _tail, curTail.Next, curTail);  //assist obstructing thread.
      }
    }
  }    
  public bool TryDequeue(out T item)
  {
    for(;;)
    {
      Node curHead = _head;
      Node curTail = _tail;
      Node curHeadNext = curHead.Next;
      if (curHead == curTail)
      {
        if (curHeadNext == null)
        {
          item = default(T);
          return false;
        }
        else
          Interlocked.CompareExchange(ref _tail, curHeadNext, curTail);   // assist obstructing thread
      }
      else
      {
        item = curHeadNext.Item;
        if (Interlocked.CompareExchange(ref _head, curHeadNext, curHead) == curHead)
        {
          return true;
        }
      }
    }
  }
#pragma warning restore 420
}

This code doesn't prevent data-races, but rather it reacts to them. If an operation is affected by another thread, then rather than error or return an incorrect result, the thread deals with the race and returns something else (and indeed even helps the other thread in some cases).

So in summary, data-races are not in and of themselves bad things. They are though complicating things, and those complications can cause problems. When you have a data-race you have a choice between proving it's not a problem, changing your code to tolerate the race so that it's no longer a problem, or changing your code to remove the race. Of these, just removing the race is often the easiest choice.

*I don't mean "relatively harmless" in a vague way here, but relative to the alternative. E.g. if we decide to leave the race in the C# example given, it's because we've decided that the cost of redundant object creation is less harmful than the relative cost of preventing it.

Question 3

If x and y fit into a machine register then assignment is atomic by default so locks won't change the outcome. It's equally possible to get y = 0 or y = 1 in the second case as well.