Why is it so slow iterating over a big std::list?

https://stackoverflow.com/questions/1402483

05-07-2019

Question

As the title suggests, I had problems with a program of mine where I used a std::list as a stack and also iterated over all elements of the list. The program was taking far too long once the lists became very big.

Does anyone have a good explanation for this? Is it some stack/cache behavior?

(Solved the problem by changing the lists to std::vector and std::deque (an amazing data structure by the way) and everything suddenly went so much faster)

EDIT: I'm not a fool and I don't access elements in the middle of the lists. The only thing I did with the lists was to remove/add elements at the end/beginning and to iterate through all elements of the list. And I always used iterators to iterate over the list.

Solution

Lists have terrible (nonexistent) cache locality. Every node is a separate memory allocation and may be anywhere in memory. So every time you follow a pointer from one node to the next, you jump to a new, unrelated place in memory. And yes, that hurts performance quite a bit: a cache miss may be two orders of magnitude slower than a cache hit.

In a vector or deque, pretty much every access will be a cache hit. A vector is one single contiguous block of memory, so iterating over it is as fast as you're going to get. A deque is several smaller contiguous blocks, so it introduces the occasional cache miss when moving from one block to the next, but those misses are rare, and iteration is still very fast because almost all accesses are cache hits.

A list will be almost all cache misses. And performance will suck.

In practice, a linked list is hardly ever the right choice from a performance point of view.
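If you want to see this on your own machine, here is a rough benchmark sketch. The helpers make_inputs and timed_sum are hypothetical names. It relies on the fact that std::list::sort relinks the existing nodes instead of moving values (it invalidates no iterators or references), so after sorting, traversal order no longer follows allocation order, which approximates a list whose nodes are scattered through memory. The vector gets the same values in the same order, so only the memory layout differs.

```cpp
#include <cassert>
#include <chrono>
#include <list>
#include <numeric>
#include <random>
#include <vector>

// Fills a list with pseudo-random values and sorts it. list::sort only
// relinks nodes, so neighbouring elements in traversal order end up at
// unrelated addresses. The vector receives the same sequence in one block.
void make_inputs(std::size_t n, std::list<int>& lst, std::vector<int>& vec) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1000000);
    lst.clear();
    for (std::size_t i = 0; i < n; ++i) lst.push_back(dist(rng));
    lst.sort();                           // nodes stay put, links are reshuffled
    vec.assign(lst.begin(), lst.end());   // contiguous copy, same order
}

// Sums a container in one pass and reports the elapsed microseconds
// through `micros`; returning the sum keeps the loop from being optimized out.
template <class Container>
long long timed_sum(const Container& c, long long& micros) {
    auto start = std::chrono::steady_clock::now();
    long long total = std::accumulate(c.begin(), c.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    micros = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    return total;
}
```

Call make_inputs with a large n (say a few million), then compare the two micros values. Absolute numbers depend on hardware, compiler flags and allocator, so treat a single run as indicative only.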

Edit: As a comment pointed out, another problem with lists is data dependencies. A modern CPU likes to overlap operations, but it can't do that if the next instruction depends on the result of the current one.

If you're iterating over a vector, that's no problem. You can compute the next address to read on the fly, without first having to load anything from memory: if you're reading at address x now, the next element will be located at address x + sizeof(T), where T is the element type. So there are no dependencies there, and the CPU can start loading the next element, or the one after it, immediately, while still processing an earlier element. That way, the data will be ready when we need it, which further helps mask the cost of accessing data in RAM.
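A small sketch that checks the address arithmetic described above. The helper addresses_are_computable is a hypothetical name; it verifies that every element of a std::vector<T> sits exactly i * sizeof(T) bytes after data(), i.e. that each address can be derived from the base address alone. It is not meant for std::vector<bool>, which has no data().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if element i of v is located at data() + i * sizeof(T) bytes,
// which the standard guarantees for std::vector<T> (except vector<bool>).
// Because of this, the next address never has to be read from memory.
template <class T>
bool addresses_are_computable(const std::vector<T>& v) {
    if (v.empty()) return true;
    const auto base = reinterpret_cast<std::uintptr_t>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i) {
        const auto addr = reinterpret_cast<std::uintptr_t>(&v[i]);
        if (addr != base + i * sizeof(T)) return false;
    }
    return true;
}
```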

In a list, we need to follow a pointer from node i to node i+1, and until i+1 has been loaded, we don't even know where to look for i+2. We have a data dependency, so the CPU is forced to read nodes one at a time, and it can't start reading future nodes ahead of time, because it doesn't yet know where they are.
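By contrast, walking a list is pointer chasing. The sketch below uses a hypothetical hand-written singly linked Node instead of std::list's own node type (which is unspecified and doubly linked), but the loop has the same shape: the address of node i+1 is only known after node i has been loaded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node standing in for a std::list node.
struct Node {
    int value;
    Node* next;
};

// Each iteration reads n->next, and the following iteration cannot start
// its load until that read completes: the loads form a serial chain.
long long sum_chain(const Node* n) {
    long long total = 0;
    while (n != nullptr) {
        total += n->value;
        n = n->next;   // next address comes from memory, not from arithmetic
    }
    return total;
}
```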

If a list traversal weren't almost all cache misses, this wouldn't be a big problem, but since we get a lot of cache misses, these delays are costly.

OTHER TIPS

It is due to the large number of cache misses you get when using a list. With a vector, neighbouring elements share cache lines, so once one element has been loaded, the surrounding elements are already in the processor's cache.

Have a look at the following stackoverflow thread.

There is a cache issue: all elements of a vector are stored in one contiguous chunk, whereas each list element is allocated separately and may end up in a fairly random place in memory, which leads to more cache misses. However, I bet that you are running into one of the issues described in the other answers.
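One way to observe "each list element is allocated separately" is to plug in a counting allocator. This is a sketch: CountingAllocator, g_allocations, list_allocations and vector_allocations are hypothetical names, and the counter is a plain global (not thread-safe). Some standard libraries make extra allocations (for example a sentinel node or debug bookkeeping), so the list count is at least n and the reserved vector count is small but not necessarily exactly 1.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

static std::size_t g_allocations = 0;   // bumped on every allocate() call

// Minimal allocator that forwards to std::allocator and counts calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Allocations made while appending n ints to a list: one node each.
std::size_t list_allocations(std::size_t n) {
    g_allocations = 0;
    std::list<int, CountingAllocator<int>> xs;
    for (std::size_t i = 0; i < n; ++i) xs.push_back(static_cast<int>(i));
    return g_allocations;
}

// Same for a vector that reserves its block up front.
std::size_t vector_allocations(std::size_t n) {
    g_allocations = 0;
    std::vector<int, CountingAllocator<int>> xs;
    xs.reserve(n);
    for (std::size_t i = 0; i < n; ++i) xs.push_back(static_cast<int>(i));
    return g_allocations;
}
```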

The simple answer is that iterating over a vector barely counts as iterating at all: it's just starting at the base of an array and reading the elements one after another.
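In code, "starting at the base of an array and reading the elements one after another" looks like the sketch below (sum_by_pointer is a hypothetical helper). A vector's iterators typically compile down to this kind of pointer loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Walks the vector's storage with a raw pointer: start at data(),
// advance one element at a time, stop at data() + size().
long long sum_by_pointer(const std::vector<int>& v) {
    long long total = 0;
    for (const int *p = v.data(), *end = v.data() + v.size(); p != end; ++p) {
        total += *p;
    }
    return total;
}
```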

I see this is tagged C++, not C, but since both do the same thing under the covers, it's worth pointing out that you can add elements to the beginning and end of an array cheaply by over-allocating it, and then realloc()ing and memmove()ing between 2 companion arrays if and when you run out of room. Very fast.

The trick to adding elements at the beginning of an array is to bias the logical start of the array: advance the pointer into the array at the start, then back it up when adding elements at the front (this is also how a stack is implemented).
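A minimal sketch of the biased-start idea, written in C++ with a std::vector as the raw buffer instead of realloc()/memmove(), and with a single buffer that is re-centred on growth rather than two companion arrays. BiasedArray is a hypothetical class, not a standard container; no bounds checks are done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elements live in buf_[head_, tail_). head_ starts in the middle, so
// push_front backs it up one slot and push_back advances tail_. When an
// end runs out of room, contents move to a bigger buffer, re-centred.
class BiasedArray {
public:
    explicit BiasedArray(std::size_t capacity = 8)
        : buf_(capacity < 2 ? 2 : capacity), head_(buf_.size() / 2), tail_(head_) {}

    void push_front(int x) {
        if (head_ == 0) grow();
        buf_[--head_] = x;
    }
    void push_back(int x) {
        if (tail_ == buf_.size()) grow();
        buf_[tail_++] = x;
    }
    void pop_front() { ++head_; }   // precondition: !empty()
    void pop_back() { --tail_; }    // precondition: !empty()

    bool empty() const { return head_ == tail_; }
    std::size_t size() const { return tail_ - head_; }
    int operator[](std::size_t i) const { return buf_[head_ + i]; }

private:
    // New buffer is more than twice as large, so both ends get free slots.
    void grow() {
        const std::size_t n = size();
        std::vector<int> bigger(buf_.size() * 2 + 2);
        const std::size_t new_head = (bigger.size() - n) / 2;
        for (std::size_t i = 0; i < n; ++i) bigger[new_head + i] = buf_[head_ + i];
        buf_.swap(bigger);
        head_ = new_head;
        tail_ = new_head + n;
    }

    std::vector<int> buf_;
    std::size_t head_;
    std::size_t tail_;
};
```

Iteration stays a walk over one contiguous range, which is the point of the trick compared with a linked list.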

In exactly the same way, C can be made to support negative subscripts.
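A tiny illustration (valid in both C and C++): a pointer into the middle of an array can be indexed with negative subscripts, because p[-k] means *(p - k). The names values and centered are just for this example.

```cpp
#include <cassert>

// centered points at values[3], so its valid subscripts are -3 .. 3.
int values[7] = {-3, -2, -1, 0, 1, 2, 3};
int* const centered = values + 3;
```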

C++ does all this for you with its STL containers (std::vector grows at the back, std::deque at both ends), but it's still worth remembering what's going on under the covers.
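For the usage described in the question (a stack plus insertion/removal at both ends), a sketch with the standard containers. stack_demo and deque_demo are hypothetical helpers that only exercise the real std::vector and std::deque member functions.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// std::vector as a stack: only the back is touched, push_back/pop_back are
// amortized O(1) and the storage stays contiguous.
std::vector<int> stack_demo() {
    std::vector<int> st;
    for (int i = 1; i <= 4; ++i) st.push_back(i);
    st.pop_back();                     // removes 4
    return st;                         // 1 2 3
}

// std::deque adds cheap insertion and removal at the front as well.
std::deque<int> deque_demo() {
    std::deque<int> dq;
    for (int i = 1; i <= 3; ++i) dq.push_back(i);   // 1 2 3
    dq.push_front(0);                                // 0 1 2 3
    dq.push_front(-1);                               // -1 0 1 2 3
    dq.pop_back();                                   // -1 0 1 2
    return dq;
}
```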

[Edit: I stand corrected. std::list doesn't have operator[]. Sorry.]

It's hard to tell from your description, but I suspect you were trying to access the items randomly (i.e., by index):

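// std::list has no operator[]; std::next(begin, i) is the index-style equivalent and walks i nodes on every call
for (std::size_t i = 0; i < mylist.size(); ++i) { ... *std::next(mylist.begin(), i) ... }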

Instead of using the iterators:

for (auto i = mylist.begin(); i != mylist.end(); ++i) { ... (*i) ... }

Both vector and deque are good at random access, so either will perform adequately with index-based loops: O(1) per access in both cases. But list is not good at random access: reaching element i takes O(i) steps, so accessing the whole list by index takes O(n^2) time in total, versus O(n) in total (O(1) per step) when using iterators.
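To make the difference concrete, a sketch with two hypothetical helpers that sum a std::list and count the iterator increments they perform. The index-style version re-walks from begin() for every i, as std::next(mylist.begin(), i) does, so for n elements it makes n(n-1)/2 increments; the iterator version makes n.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Index-style access: element i costs i increments from begin().
long long sum_by_index(const std::list<int>& xs, std::size_t& steps) {
    long long total = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        auto it = xs.begin();
        for (std::size_t k = 0; k < i; ++k) { ++it; ++steps; }
        total += *it;
    }
    return total;
}

// Iterator access: one increment per element.
long long sum_by_iterator(const std::list<int>& xs, std::size_t& steps) {
    long long total = 0;
    for (auto it = xs.begin(); it != xs.end(); ++it) {
        total += *it;
        ++steps;
    }
    return total;
}
```

For 100 elements the index version performs 4950 increments and the iterator version 100; the gap grows quadratically with n.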

Licensed under: CC-BY-SA with attribution
