Turn on optimisations.
Any library or set of functions that you link with will be compiled WITH optimisation (unless the names of the developer are Kermit, Swedish Chef, Miss Peggy (project manager) and Cookie Monster (tester) - in other words, the development team is a bunch of Muppets).
Since std::complex
uses templates, it is compiled by the compiler settings you give, so the code will be unoptimized. So your question is really "Why is function X faster than function Y that does the same thing, when function X is compiled with optimisation and Y is compiled without optimisation?" - which should really be obvious to answer: "Optimisation works nearly all of the time!" (If optimisation wasn't working most of the time, compiler developers would have a MUCH easier time)
Edit: So my above point has just been proven. Note that since templates can inline the code, it is often more efficient than an external library (because the compiler can just insert the instructions straight into the flow, rather than calling out to another function).
As to float
vs. double
, the only time that float
is slower than double
is if there is ONLY double
hardware available, with two functions added to "shorten" and "lengthen" between float
and double
. I'm not aware of any such hardware. double
has more bits, so it SHOULD take longer.
Edit2:
When it comes to choosing "one solution over another", there are so many factors. Performance is one (and in some cases, the most important, in other cases not). Other aspects are "ease of use", "availability", "fit for the project", etc.
If you look at ONLY performance, you can sometimes run simple benchmarks to determine that one solution is better or worse than another, but for complex libraries [not "real&imaginary" type complex numbers, but rather "complicated"], there are sometimes optimisations to deal with large amounts of data, where if you use a less sophisticated solution, the "large data" will not achieve the same performance, because less effort has been spent on solving the "big data" type problems. So, if you have a "simple" benchmark that does some basic calculations on a small set of data, and you are, in reality, going to run some much bigger datasets, the small benchmark MAY not reflect reality.
And there is no way that I, or anyone else, can tell you which solution will give you the best performance on YOUR system with YOUR datasets, unless we have access to your datasets, know exactly which calculations you are performance (that is, pretty much have your code), and have experience with running that with both "packages".
And going on to the rest of the criteria ("ease of use", etc), those are much more "personal opinion" based, so wouldn't be a good fit for an SO question in the first place.