Question

Let's discuss a case where I have a huge std::vector. I need to iterate over all the elements and call a print function on each. There are two cases: either I store the objects themselves in the vector, so they are next to each other in memory, or I allocate the objects on the heap and store pointers to them in the vector. In that case the objects may be scattered all over the RAM.

In the case where copies of the objects are stored in std::vector<A>, when the CPU brings data from RAM into the CPU cache, it brings a whole chunk of memory, which contains multiple elements of the vector. So when you iterate over the elements and call a function on each, several elements get processed before the CPU has to go back to RAM for the next chunk of data. This is good, because the CPU does not have many idle cycles.

What about the case of std::vector<A*>? When the CPU brings in a chunk of pointers, is it easy for it to obtain the objects behind those pointers? Or does it have to request the objects from RAM before it can call any functions on them, so there will be cache misses and idle CPU cycles? Is that worse than the case above in terms of performance?
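For concreteness, the two layouts I mean look roughly like this (the type A and its print function are just placeholders):

```cpp
#include <cstdio>
#include <memory>
#include <vector>

struct A {
    int id = 0;
    void print() const { std::printf("%d\n", id); }
};

int main() {
    // Case 1: objects stored by value -- elements are contiguous in memory,
    // so iterating touches consecutive cache lines.
    std::vector<A> by_value(1000);
    for (const A& a : by_value)
        a.print();

    // Case 2: only the pointers are contiguous -- each dereference may touch
    // a different, far-apart heap allocation.
    std::vector<std::unique_ptr<A>> by_pointer;
    by_pointer.reserve(1000);
    for (int i = 0; i < 1000; ++i)
        by_pointer.push_back(std::make_unique<A>());
    for (const auto& p : by_pointer)
        p->print();
}
```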


Solution

At least in a typical case, when the CPU fetches a pointer (or a number of pointers) from memory, it will not automatically fetch the data to which those pointers refer.

So, in the case of the vector of pointers, when you load the item that each of those pointers refers to, you'll typically get a cache miss, and access will be substantially slower than if they were stored contiguously. This is particularly true when/if each item is relatively small, so a number of them could fit in a single cache line (for some level of cache--keep in mind that a current processor will often have two or three levels of cache, each of which might have a different line size).

It may, however, be possible to mitigate this to some degree. You can overload operator new for a class to control allocations of objects of that class. Using this, you can at least keep objects of that class together in memory. That doesn't guarantee that the items in a particular vector will be contiguous, but could improve locality enough to make a noticeable improvement in speed.
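As a rough illustration of that idea, here is a minimal sketch of a class-specific operator new backed by a toy bump allocator. The pool size is a made-up constant, the class A is a placeholder, and a real pool would also round offsets up for alignment and recycle freed slots:

```cpp
#include <cstddef>
#include <new>
#include <vector>

class A {
public:
    // Class-specific allocation: every A comes from one contiguous arena.
    static void* operator new(std::size_t size) {
        if (offset + size > pool.size())
            throw std::bad_alloc{};
        void* p = pool.data() + offset;
        offset += size;          // a real pool would also round up for alignment
        return p;
    }
    static void operator delete(void*) noexcept {
        // No-op in this sketch; a real pool would recycle freed slots.
    }

    int value = 0;

private:
    static std::vector<std::byte> pool;
    static std::size_t offset;
};

std::vector<std::byte> A::pool(1 << 20);  // 1 MiB arena (arbitrary size)
std::size_t A::offset = 0;

int main() {
    std::vector<A*> v;
    for (int i = 0; i < 100; ++i)
        v.push_back(new A);      // goes through A::operator new
    for (A* p : v)
        p->value += 1;           // neighbouring addresses -> decent locality
    for (A* p : v)
        delete p;
}
```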

Also note that the vector allocates its data via an Allocator object (which defaults to std::allocator<T>, which, in turn, uses new). Although the interface is kind of a mess so it's harder than you'd generally like, you can define an allocator to act differently if you wish. This won't generally have much effect on a single vector, but if (for example) you have a number of vectors (each of fixed size) and want them to use memory next to each other, you could do that via the allocator object.
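A minimal sketch of what such an allocator might look like, assuming a single fixed-size arena shared by all containers that use it (the arena size and names here are invented for illustration, memory is never reused, and alignment is handled only for the start of the arena):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// One shared arena; every container using ArenaAllocator draws from it,
// so their buffers end up close together.
alignas(alignof(std::max_align_t)) static std::byte g_arena[1 << 20];
static std::size_t g_arena_used = 0;

template <class T>
struct ArenaAllocator {
    using value_type = T;

    ArenaAllocator() = default;
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        if (g_arena_used + bytes > sizeof(g_arena))
            throw std::bad_alloc{};
        T* p = reinterpret_cast<T*>(g_arena + g_arena_used);
        g_arena_used += bytes;
        return p;
    }
    void deallocate(T*, std::size_t) noexcept {}   // never reclaimed in this sketch
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) noexcept { return false; }

int main() {
    // Three fixed-size vectors whose buffers land next to each other in the arena.
    std::vector<int, ArenaAllocator<int>> a(256), b(256), c(256);
    a[0] = b[0] = c[0] = 42;
}
```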

Other Tips

If I store my objects in the vector, the objects will be next to each other in memory, or I allocate my objects on the heap

Regardless of whether you use std::vector<A> or std::vector<A *>, the inner buffer of the vector will be allocated on the heap. You could, though, use an efficient memory pool to manage allocations and deletions, but you are still going to be working with data on the heap.
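One simple way to get pool-like behaviour without writing an allocator is to own the objects in a container that never relocates them and hand out pointers into it. A minimal sketch under that assumption (the type A is a placeholder):

```cpp
#include <deque>
#include <vector>

struct A {
    int id = 0;
};

int main() {
    // The deque acts as a simple pool: it stores the A objects themselves in
    // largish contiguous blocks and, unlike std::vector, never moves existing
    // elements, so the raw pointers below stay valid.
    std::deque<A> pool;
    std::vector<A*> handles;

    for (int i = 0; i < 1000; ++i) {
        pool.push_back(A{i});
        handles.push_back(&pool.back());
    }

    int sum = 0;
    for (A* p : handles)
        sum += p->id;   // mostly sequential memory accesses within each block
    (void)sum;
}
```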

Is it bad compared with the case above in the aspect of performance?

In the case of using std::vector<A *> without specialized memory management, you may get lucky with the allocations and end up with the data laid out nicely in memory, but it is generally better to rely on the contiguous allocation performed by std::vector<A>. In the former case, reallocating the whole vector when it grows is cheaper (since pointers are usually smaller than the structs they point to), but element access will suffer from poor locality.

When it brings a chunk of pointers is it easy for CPU to obtain objects by pointer?

No, it isn't. The CPU doesn't know they're pointers (everything the CPU sees is just a bunch of bits, with no semantics involved) until it executes a dereferencing instruction.

Or it should request from RAM the objects on which you call some functions and there will be cache misses and free CPU cycles?

That's right. The CPU will try to load the data a cached pointer refers to, but that data is likely located far away from recently accessed memory, so it will be a cache miss.

Is it bad compared with the case above in the aspect of performance?

If the only thing you care about is accessing elements, then yes, it's bad. Yet in some cases a vector of pointers is preferable. Namely, if your objects don't support moving (C++11 isn't mainstream yet), then copying the vector becomes more expensive. Even if you never copy the vector, you may not know the number of stored elements in advance, so you can't call reserve(n) beforehand. Then all your objects will be copied whenever the vector exhausts its capacity and is forced to resize.
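If the element count is known (or can be estimated) up front, that copy-on-growth cost can be avoided with reserve. A minimal sketch, with a placeholder payload type:

```cpp
#include <vector>

struct A {
    int data[16] = {};   // placeholder payload
};

int main() {
    std::vector<A> v;
    v.reserve(100000);             // one allocation up front...
    for (int i = 0; i < 100000; ++i)
        v.push_back(A{});          // ...so growth never copies existing elements

    std::vector<A> w;              // without reserve, each capacity exhaustion
    for (int i = 0; i < 100000; ++i)
        w.push_back(A{});          // copies/moves everything stored so far
}
```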

But in the end it depends on the concrete type. If your objects are small (tiny structs, ints or floats), then it is obviously better to work with them by value, because the overhead of the pointers would be too big.

License: CC BY-SA with attribution