Question

  1. I have a list where one thread only ever does push_back, and another thread occasionally loops through the list and prints all the elements. Do I need a lock in this case?
  2. I have pointers to the elements stored in some other object. Is that safe? I know that a vector will move all of its objects when it needs more space, so the pointers would be invalidated.

    mylist.push_back(MyObj(1));
    if (someCond)
    {
        _myLastObj = &mylist.back();
    }

_myLastObj is of type MyObj*

If I had used a vector, the object would have been moved to a different location and the pointer would be pointing to garbage. Is it safe with a list?

Was it helpful?

Solution

  1. Yes, you need a lock (some form of synchronization).
  2. Pointers to elements of a std::list are invalidated only when that same element is removed from the list. Since your code never removes anything from the list, your pointers are safe.

For a concrete reason why you need the lock, consider for example that list::push_back is permitted to do the following, in this order:

  1. allocate a new node,
  2. set the "next" pointer of the tail of the list to the new node,
  3. set the "next" pointer of the new node to null (or some other end-of-list-marker),
  4. set the "previous" pointer of the new node to the previous tail,
  5. set the list's own pointer-to-tail to the new node.

If your reader thread comes in between 2 and 3, then it will go from the previous tail, to the new node, then try to follow an uninitialized pointer. Boom.

In general though, you need synchronization because it's the only way to guarantee that changes made in the writer thread are published to the reader thread in any kind of sensible order (or at all).

Your code should be correct if you imagine that different threads run on different planets, each with its own copy of your program's memory, and that they transmit changes to each other only (a) when you use a synchronization object (or an atomic variable in C++11), plus (b) when you don't use a synchronization object but transmitting some particular partial change would break your code (such as one half of a two-word object, or, in this case, one of two pointer writes that need to occur in a particular order). Sometimes this model is more conservative than strictly necessary and results in slower code. But a less conservative model relies on implementation-specific details of the threading system and memory model.

Other tips

I wanted to find out whether lists were thread safe in general, so I asked and found this thread. The conclusion I've come to is that, in the current implementation of std::list in gcc's libstdc++, you may safely modify a list in one thread and arbitrarily iterate through lists concurrently, but don't dare have two threads modifying the same list without synchronization. Furthermore, this behavior should not be depended upon. I've ripped up the library code to point out the issues in a bit greater detail; I hope that it is helpful.

First let's start with the general question of thread safety for lists. I figured it would be nice to "prove" that lists are unsafe by example, so I threw the following code together.

#include <iostream>
#include <list>
#include <mutex>
#include <thread>

using namespace std;

list<int> unsafe_list;

class : public list<int> {
  mutex m;

public:
  void push_back(int i) {
    lock_guard<mutex> lock{m};
    list<int>::push_back(i);
  }
} safe_list;

template <typename List> void push(List &l) {
  for (auto i = 0; i < 10000; ++i)
    l.push_back(0);
}

void report_size(const list<int> &li, char const *name) {
  size_t count{};
  for (auto && i : li) ++count;
  cout << name << endl;
  cout << "size() == " << li.size() << endl;
  cout << "count  == " << count << endl;
}

int main() {
  auto unsafe = []() { push(unsafe_list); };
  auto safe = []() { push(safe_list); };
  thread pool[]{
      thread{unsafe}, thread{unsafe}, thread{unsafe},
      thread{safe},   thread{safe},   thread{safe},
  };
  for (auto &&t : pool)
    t.join();
  report_size(unsafe_list, "unsafe_list");
  report_size(safe_list, "safe_list");
}

The output I got was:

unsafe_list
size() == 19559
count  == 390
safe_list
size() == 30000
count  == 30000

Yikes. That means that hardly any of the elements I pushed ended up on the list. But it's worse than that! Not only does the list lack the right number of elements; it thinks it has a different number than it actually does, and that number isn't what I want either. While that means there is almost certainly a memory leak, when I ran it under valgrind the operations all completed successfully. I've heard that valgrind and other tools can be less than helpful when trying to deal with concurrency; I guess this is evidence of that.

First I tried pushing around 10 elements at a time, but nothing fishy happened. I figured it was managing to push everything within its time slice, so I upped it to 10000 and got the results I wanted. A note for anyone trying to duplicate the experiment: it may or may not reproduce depending on system configuration, scheduling algorithm, and so on.

Given the nature of linked lists, I was expecting such an experiment to result in a seg-fault or an otherwise corrupted list. A seg-fault would be a mercy if this were the cause of some bug you were looking for.

What the heck is going on

Here I'll explain precisely what happened and why (or at least give an extremely plausible explanation). If you are uninitiated in concurrency issues, consider this an intro. If you are an expert, please tell me where I'm wrong or incomplete.

I was curious, so I just looked at the gcc libstdc++ implementation. In order to explain the observed behavior, a quick explanation of how the list works is in order.

Implementation Details

There is nothing at all interesting or strange about the underlying structure or algorithm, but there are various C++ implementation details that need to be mentioned. First of all, the nodes of the list all derive from a common base class that stores only two pointers; in this way, all of the structural behavior of the list is encapsulated. The actual nodes, besides deriving from the base, are structs with the non-static data member __gnu_cxx::__aligned_membuf<_Tp> _M_storage. These nodes are aware of the value_type of the list, and derive from the allocator_type rebound to _List_node<_Tp>. The purpose of these nodes is to get and release storage for the list, and to use their base to maintain the structure of the data. (I recommend this paper for an explanation of how types are factored out of iterators; it may shed some light on why certain things are implemented the way they are: http://www.stroustrup.com/SCARY.pdf. For the masochistic, watch this wizard explain the beautiful nightmare that is C++ allocators: https://www.youtube.com/watch?v=YkiYOP3d64E). The list then handles construction and destruction, and provides the interface to the library user, blah blah blah.

A major advantage of having a common (type-ignorant) base class for the nodes is that you can have arbitrary nodes linked together. This isn't very helpful if done with reckless abandon, but they use it in a controlled way. The "tail node" is not of type value_type, but of type size_t. The tail node holds the size of the list! (It took me a few minutes to figure out what was going on, but it was fun so no big deal. The major advantage of this is that every list in existence can have the same type of tail node, so there is less code duplication for handling the tail node, and the list only needs one non-static data member to do what it needs to do).

So, when I push a node to the back of the list, the end() iterator is passed to the following function:

template<typename... _Args>
  void
  _M_insert(iterator __position, _Args&&... __args)
  {
    _Node* __tmp = _M_create_node(std::forward<_Args>(__args)...);
    __tmp->_M_hook(__position._M_node);
    this->_M_inc_size(1);
  }

_M_create_node() eventually uses the right allocator to get storage for the node, then attempts to construct an element there with the arguments provided. The "point" of the _M_hook() function is to point the pointers at what they should point to, and it is listed here:

void
_List_node_base::
_M_hook(_List_node_base* const __position) _GLIBCXX_USE_NOEXCEPT
{
  this->_M_next = __position;
  this->_M_prev = __position->_M_prev;
  __position->_M_prev->_M_next = this;
  __position->_M_prev = this;
}

The order in which pointers are manipulated is important. It is the reason why I claim you can iterate through while manipulating the list concurrently. More on this later. Then, the size is adjusted:

void _M_inc_size(size_t __n) { *_M_impl._M_node._M_valptr() += __n; }

As I said before, the list has a tail node of type size_t, so, as you can guess, _M_impl._M_node._M_valptr() retrieves a pointer to this number, and then +='s it the right amount.

The Observed Behavior

So, what is happening? The threads are entering a data race (https://en.cppreference.com/w/cpp/language/memory_model) in the _M_hook() and _M_inc_size() functions. I can't find a nice picture online, so let's say that T is the tail, B is the "back," and we want to push 1 onto the back. That is, we have the list (fragment) B <-> T, and we want B <-> 1 <-> T. Eventually, 1 calls _M_hook on T, and then the following occurs:

  1. 1 points forwards to T
  2. 1 points backwards to B
  3. B points forwards to 1
  4. T points backwards to 1

This way, no location is ever "forgotten." Now say that 1 and 2 are being pushed back in different threads onto the same list. It is completely plausible that steps (1) and (2) complete for node 1, then 2 gets completely pushed back, and only then does node 1 finish steps (3) and (4). What happens in this case? We have the list B <-> 2 <-> T, but 1 is still pointing at B and T, so when its stale pointer writes land, the list looks like B <-> 1 <-> T, and that's a memory leak, son: node 2 is no longer reachable from the list.

As far as iterating goes, it doesn't matter whether you go backwards or forwards; you will always be incremented through the list properly. This behavior does not seem to be guaranteed by the standard, however, so any code that depends on it is fragile.

What about the size???

Okay, so that is like concurrency 101, an old story probably told better many times; I hope it was at least worthwhile to see the library code. The size issue, I think, is a little more interesting, and I certainly learned something from figuring it out.

Basically, because the value being incremented isn't a "local" variable, its value must be read into a register, one is added to that value, then that value is stored back into the variable. Let's look at some assembly (my assembly game is weak, please don't be nice if you have a correction). Consider the following program:

int i{};
int main() {
  ++i;
}

When I run objdump -D on the object, I get:

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # a <main+0xa>
   a:   83 c0 01                add    $0x1,%eax
   d:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 13 <main+0x13>
  13:   b8 00 00 00 00          mov    $0x0,%eax
  18:   5d                      pop    %rbp
  19:   c3                      retq   

The instruction at offset 4 moves the value of i into register eax; then 0x1 is added to eax, and eax is moved back into i. So, what does this have to do with data races? Take another look at the function that updates the size of the list:

void _M_inc_size(size_t __n) { *_M_impl._M_node._M_valptr() += __n; }

It is perfectly plausible that the current size of the list is loaded into a register, and then another thread operating on the same list steps on our operation. So we have the old size of the list stored in a register, but we must save that state and transfer control to someone else. Maybe they successfully add an item to the list and update the size, then give us back control. We restore our state, but our state is no longer valid: we have the old size of the list, which we then increment, and whose value we store back into memory, forgetting the operation the other thread performed.

One last thing

As I mentioned before, the "locality" of i came into play in the above program. The importance of this can be seen in the following:

int main() {
  int i{};
  ++i;
}

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   b:   83 45 fc 01             addl   $0x1,-0x4(%rbp)
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   5d                      pop    %rbp
  15:   c3                      retq   

Here it can be seen that no value is loaded into a register and written back; the increment is done directly on memory. Unfortunately, as far as I can tell, this isn't some nice trick to avoid problems with concurrency: even a single add-to-memory instruction is a read-modify-write that is not atomic across cores unless it carries a lock prefix, and multiple threads operating on the same variable must still go through memory rather than private registers. I'm quickly getting out of my league here, but I'm pretty sure that is the case. The next best thing is to use atomic<int>, but this damn thing is already too long.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow