Pairing Heap vs std::priority_queue

Question 1

So, my question is: can asymptotically better data structures be (largely) beaten by worse ones, just because they're much more cache-friendly?

Yes, that happens all the time. There are other causes (constant factors) beside cache friendliness too. Like other uses of the same word, asymptotic here refers to something (usually, the problem size) going to infinity. A being asymptotically better than B only says that it'll eventually be better, not that it will be better (or even equal) for some given value. Note that the ratio does improve a bit for larger data sets, just not enough.

Note that even the binary heap isn't too cache friendly for somewhat large data sets (such as yours). A node's children and parents are likely to be on an entirely different page, so you only really get something out of the cache for the last few levels (or if you access elements that happen to have a similar path, but that's a given in almost any data structure). There is a variant called B-heap which improves on this, but I haven't been able to find much detail on it (just two implementations and a rant about how the RAM model of computation is misleading).

You'd have to profile to be sure, but it's possible that the repeated allocation and deallocation takes a significant chunk of time. A pool allocator (boost, or a hand-rolled one atop of a std::vector - which allows replacing pointers with integers, which may save some space) could greatly reduce this cost. The implementation also appears to use a linked lists for the children list, which probably hurts the cache even more. An array requires some additional copies, but could be an improvement in practice.

Is it really worth spending much time in a more complex data structure such as a Pairing heap, when a default std::priority_queue can largely be faster than it?

It's possible that a sufficiently large data set coupled with some optimizations (e.g. specialized allocator and clever node layout) will tip the balance in its favor. In any case, this statement is a bit self-defeating: If a pairing heap was faster than a binary heap for the expected use cases, chances are the standard library would use a pairing heap!

Also, at least in purely functional languages, a pairing heap is quite simple to implement (though it won't be very efficient). That may be of little use to you and C++, but it's something and challenges the "more complex" part.

Question 2

A major issue here is memory allocation and yes cache efficiency.

What you could try is to implement a fixed size allocator with a custom operator new + operator delete for the PairNode class to reduce allocation overhead (similar to the one in "More Effective C++", item 10). Additionally this approach may end up being more cache friendly as elements are more likely to have locality of reference.

I have done this with a QuadEdge structure (that suffers from similar problems) for Delaunay triangulation before and the speed increase was in excess of 10-20x IIRC. If you need to make the allocator thread-safe though then you'll pay a high cost for that in terms of performance.

As for actually answering the question of whether or not performance is better in 1 case or the other, it is unlikely to be universal and profiling on a case-by-case basis is the easiest way to know (any other method would be complicated as you can't predict the quality of the implementation without implementing it). Not only that, but different processors will vary and results may depend on the data that you tend to get.