Question

I am looking into doing some quadratic programming and have seen different libraries. I have seen various Eigen variants of QuadProg++ (on the KDE Forums, from Benjamin Stephens, and in StackOverflow posts). Just as a test, I forked wingsit's Eigen variant, available on GitHub, to implement compile-time-sized problems via templates and measure their performance.

I'm finding that I get worse performance in the templated case than in the dynamically sized (MatrixXd / VectorXd) case from wingsit's code. I know this is not a simple question, but can anyone see a reason why this might be?

Note: I still need to increase the problem size / iteration count; I will post those results once I can.

EDIT: I am using GCC 4.6.3 on Ubuntu 12.04. These are the flags that I am using (modified from wingsit's code):

CFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for obj
LFLAGS  = -O4 -Wall -msse2 -fopenmp      # option for exe (-lefence ...)

Solution

The benefits of static-sized code generally diminish as the sizes grow. The typical benefits of static-sized code mainly include (but are not limited to) the following:

  • Stack-based allocations, which are faster than heap allocations. However, at large sizes, stack-based allocation is no longer feasible (it causes stack overflow), nor is it beneficial from the point of view of pre-fetching and locality of reference.
  • Loop unrolling: when the compiler sees a small static-sized loop, it can simply unroll it, possibly using SSE instructions. This does not apply at larger sizes.

In other words, for small sizes (up to maybe N = 12 or so), static-sized code can be much better and faster than the equivalent dynamically sized code, as long as the compiler is fairly aggressive about inlining and loop unrolling. But when sizes are larger, there is no point.
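
As a rough illustration of the small-size case, here is a minimal sketch comparing Eigen's fixed-size and dynamic-size types; the size N = 4, the iteration count, and the timing method are arbitrary choices, not taken from the question:

    #include <chrono>
    #include <iostream>
    #include <Eigen/Dense>

    int main() {
        using namespace std::chrono;
        constexpr int N = 4;            // small compile-time size; arbitrary choice
        const int iters = 1000000;      // arbitrary iteration count

        // Fixed-size: stack-allocated, loops can be fully unrolled.
        Eigen::Matrix<double, N, N> As = Eigen::Matrix<double, N, N>::Random();
        Eigen::Matrix<double, N, 1> xs = Eigen::Matrix<double, N, 1>::Random();

        // Dynamic-size: heap-allocated, size known only at run time.
        Eigen::MatrixXd Ad = As;        // same values, different storage
        Eigen::VectorXd xd = xs;

        Eigen::Matrix<double, N, 1> ys = Eigen::Matrix<double, N, 1>::Zero();
        auto t0 = high_resolution_clock::now();
        for (int i = 0; i < iters; ++i) ys += As * xs;
        auto t1 = high_resolution_clock::now();

        Eigen::VectorXd yd = Eigen::VectorXd::Zero(N);
        for (int i = 0; i < iters; ++i) yd += Ad * xd;
        auto t2 = high_resolution_clock::now();

        // Printing the sums keeps the compiler from optimizing the loops away.
        std::cout << "static : " << duration_cast<microseconds>(t1 - t0).count()
                  << " us (" << ys.sum() << ")\n"
                  << "dynamic: " << duration_cast<microseconds>(t2 - t1).count()
                  << " us (" << yd.sum() << ")\n";
    }

At a size this small you would typically see the static version win; rerun with a large N (say, several hundred) and the gap closes or reverses, for the reasons above.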

Additionally, there are a number of drawbacks about static-sized code too:

  • No efficient move semantics, swapping, or copy-on-write strategies, since such code is usually implemented with static arrays (in order to get the benefits mentioned above), which cannot simply be swapped (as in, swapping an internal pointer).
  • Larger executables containing functions that are spread out. Say you have one function template to multiply two matrices; a new compiled function (inlined or not) is generated for each size combination. If you then have an algorithm that does a lot of matrix multiplications, each multiplication has to jump to (or execute inline) the specialization for its size combination. In the end, you have a lot more code that needs to be cached, cache misses become far more frequent, and that will completely destroy the performance of your algorithm (see the sketch after this list).
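
To make the code-bloat point concrete, here is a hypothetical sketch; the multiply helper is illustrative, not taken from any of the linked QuadProg++ variants:

    #include <Eigen/Dense>
    #include <iostream>

    // One function template; every distinct (M, K, N) triple instantiates a
    // separate compiled function, so many size combinations put many copies
    // of essentially the same multiplication code into the executable.
    template <int M, int K, int N>
    Eigen::Matrix<double, M, N> multiply(const Eigen::Matrix<double, M, K>& a,
                                         const Eigen::Matrix<double, K, N>& b) {
        return a * b;
    }

    int main() {
        Eigen::Matrix<double, 3, 4> a = Eigen::Matrix<double, 3, 4>::Random();
        Eigen::Matrix<double, 4, 5> b = Eigen::Matrix<double, 4, 5>::Random();
        Eigen::Matrix<double, 5, 6> c = Eigen::Matrix<double, 5, 6>::Random();

        // Two distinct instantiations: multiply<3,4,5> and multiply<4,5,6>.
        // With dynamically sized MatrixXd, both calls would share one function.
        std::cout << multiply(a, b).sum() << " " << multiply(b, c).sum() << "\n";
    }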

So, the lesson to draw from this is to use static-sized vectors and matrices only for small things, like 2- to 6-dimensional vectors or matrices. Beyond that, it is preferable to go with dynamically sized code (or to try static-sized code, but verify that it actually performs better, because that is not guaranteed). So, I would advise you to reconsider your idea of using static-sized code for larger problems.
