Question

I'm tring an implementation in C++ of a Pairing Heap, which i took from here: http://home.fnal.gov/~stoughto/build/graphviz-2.22.2/lib/vpsc/pairingheap/PairingHeap.h http://home.fnal.gov/~stoughto/build/graphviz-2.22.2/lib/vpsc/pairingheap/PairingHeap.cpp

I have compared that PairingHeap against a std::priority_queue and these are the results:

gcc 4.7 -O3, core i7 2.4Ghz rdstc instruction to measure cycles

-------------------------------------------------------------------------------

for 100.000 elements:
o std::priority_queue<int>
    - insert:           9,800,415 cycles
    - extract:         29,712,818 cycles
    - total:           39,513,233 cycles       [0.031secs]
o PairingHeap<int>
    - insert:          34,381,467 cycles
    - extract:        259,986,113 cycles
    - total:          294,367,580 cycles       [0.125secs]


-------------------------------------------------------------------------------


for 1.000.000 elements:
o std::priority_queue<int>
    - insert:         95,954,533 cycles
    - extract:       518,546,747 cycles
    - total:         614,501,280 cycles       [0.296secs]
o PairingHeap<int>
    - insert:        344,453,782 cycles
    - extract:     3,856,344,199 cycles
    - total:       4,200,797,981 cycles       [1.593secs]

-------------------------------------------------------------------------------


for 10.000.000 elements:
o std::priority_queue<int>
    - insert:        999,836,450 cycles
    - extract:    10,634,407,049 cycles
    - total:      11,634,243,499 cycles       [4.390secs]
o PairingHeap<int>
    - insert:      3,441,903,781 cycles
    - extract:    61,166,421,272 cycles
    - total:      64,608,325,053 cycles       [24.187secs]

The Pairing heap should be faster than a std::priority_queue because it should have asymptotically faster operations, but instead here the Pairing heap is very slower. I think this because the std::priority_queue uses a vector under the hoods, and this is much more cache friendly than allocating nodes for each integer, as a Pairing heap does.

So, my question is: can asymptotically better data structures be (largely) beaten by worse ones, just because they're much more cache-friendly? Is it really worth spending much time in a more complex data structure such as a Pairing heap, when a default std::priority_queue can largely be faster than it?

I just have not considered that the implementation of the Pairing heap I have used is just crap, but it seems it isn't, because other implementations I have tried are even worse! Thoughts?

Était-ce utile?

La solution

So, my question is: can asymptotically better data structures be (largely) beaten by worse ones, just because they're much more cache-friendly?

Yes, that happens all the time. There are other causes (constant factors) beside cache friendliness too. Like other uses of the same word, asymptotic here refers to something (usually, the problem size) going to infinity. A being asymptotically better than B only says that it'll eventually be better, not that it will be better (or even equal) for some given value. Note that the ratio does improve a bit for larger data sets, just not enough.

Note that even the binary heap isn't too cache friendly for somewhat large data sets (such as yours). A node's children and parents are likely to be on an entirely different page, so you only really get something out of the cache for the last few levels (or if you access elements that happen to have a similar path, but that's a given in almost any data structure). There is a variant called B-heap which improves on this, but I haven't been able to find much detail on it (just two implementations and a rant about how the RAM model of computation is misleading).

You'd have to profile to be sure, but it's possible that the repeated allocation and deallocation takes a significant chunk of time. A pool allocator (boost, or a hand-rolled one atop of a std::vector - which allows replacing pointers with integers, which may save some space) could greatly reduce this cost. The implementation also appears to use a linked lists for the children list, which probably hurts the cache even more. An array requires some additional copies, but could be an improvement in practice.

Is it really worth spending much time in a more complex data structure such as a Pairing heap, when a default std::priority_queue can largely be faster than it?

It's possible that a sufficiently large data set coupled with some optimizations (e.g. specialized allocator and clever node layout) will tip the balance in its favor. In any case, this statement is a bit self-defeating: If a pairing heap was faster than a binary heap for the expected use cases, chances are the standard library would use a pairing heap!

Also, at least in purely functional languages, a pairing heap is quite simple to implement (though it won't be very efficient). That may be of little use to you and C++, but it's something and challenges the "more complex" part.

Autres conseils

A major issue here is memory allocation and yes cache efficiency.

What you could try is to implement a fixed size allocator with a custom operator new + operator delete for the PairNode class to reduce allocation overhead (similar to the one in "More Effective C++", item 10). Additionally this approach may end up being more cache friendly as elements are more likely to have locality of reference.

I have done this with a QuadEdge structure (that suffers from similar problems) for Delaunay triangulation before and the speed increase was in excess of 10-20x IIRC. If you need to make the allocator thread-safe though then you'll pay a high cost for that in terms of performance.

As for actually answering the question of whether or not performance is better in 1 case or the other, it is unlikely to be universal and profiling on a case-by-case basis is the easiest way to know (any other method would be complicated as you can't predict the quality of the implementation without implementing it). Not only that, but different processors will vary and results may depend on the data that you tend to get.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top