Performance implications of long double. Why does C choose 64 bits instead of the hardware's 80 bits for its default?

StackOverflow https://stackoverflow.com/questions/10256019

Question

To be specific, I am talking about the x87 PC architecture and the C compiler.

I am writing my own interpreter, and the reasoning behind the double datatype confuses me, especially where efficiency is concerned. Could someone explain WHY C decided on a 64-bit double and not the hardware-native 80-bit one? And why has the hardware settled on an 80-bit format, since that is not aligned? What are the performance implications of each? I would like to use an 80-bit double as my default numeric type, but the choices of the compiler developers make me concerned that this is not the best choice.

  1. double on x86 is only 2 bytes shorter; why doesn't the compiler use the 10-byte long double by default?
  2. Can I get an example of the extra precision gained by using an 80-bit long double vs. double?
  3. Why does Microsoft disable long double by default?
  4. In terms of magnitude, how much worse / slower is long double on typical x86/x64 PC hardware?

Solution

The answer, according to Mysticial, is that Microsoft uses SSE2 for its double data type. The x87 floating-point unit (FPU) is seen as outdated and slow compared to modern CPU extensions. SSE2 does not support an 80-bit format, hence the compiler's choice of 64-bit precision.

On 32-bit x86, since not all CPUs have SSE2, Microsoft still uses the x87 FPU unless the compiler switch /arch:SSE2 is given, which makes the generated code incompatible with those older CPUs.

OTHER TIPS

Wrong question. It has nothing to do with C; as far as I know, all languages use 32-bit single precision and 64-bit double precision as their standard floating-point formats. C, as a language that has to support different hardware, only requires

sizeof(float) <= sizeof(double) <= sizeof(long double)

so it is perfectly acceptable for a specific C compiler to use 32-bit floats for all three types.

Intel decided, on Kahan's advice, to support as much precision as possible and to perform calculations in the less precise formats (32 and 64 bit) internally with 80-bit precision.

The difference in precision and exponent range: the 64-bit format has approximately 16 decimal digits and a maximum decimal exponent of 308, while the 80-bit format has 19 digits and a maximum decimal exponent of 4932.

Being much more precise and having a far greater exponent range, you can calculate intermediate results without overflow or underflow, and your final result suffers fewer rounding errors.
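
To see these numbers on your own toolchain, here is a minimal sketch that just prints the relevant <float.h> limits (assuming a compiler where long double is the x87 80-bit format, such as GCC or Clang on x86; on MSVC both lines would report the same limits):

/* Prints the precision and exponent range of double vs. long double.
   On GCC/Clang for x86, long double is the 80-bit x87 format;
   on MSVC it is the same format as double. */
#include <float.h>
#include <stdio.h>

int main(void) {
    printf("double:      %d significand bits, %d decimal digits, max exponent 10^%d\n",
           DBL_MANT_DIG, DBL_DIG, DBL_MAX_10_EXP);
    printf("long double: %d significand bits, %d decimal digits, max exponent 10^%d\n",
           LDBL_MANT_DIG, LDBL_DIG, LDBL_MAX_10_EXP);
    return 0;
}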

So the question is really why long double is not the 80-bit format. In fact, many compilers did support it, but a lack of use and the race for benchmark performance effectively killed it.

This is actually many questions in one, some of which are even too broad.

Could someone explain WHY C has decided on a 64-bit double and not the hardware native 80-bit double?

It's irrelevant to C, because the C standard only mandates minimum requirements for the built-in types, and it is entirely up to the compiler implementation to choose whatever format it wants to use for a type. Nothing prevents a C compiler from using some custom-made 77-bit floating-point type.


And why has the hardware settled on an 80-bit double, since that is not aligned? What are the performance implications of each?

It's aligned to a multiple of 2 bytes. Remember that x87 dates back to 8086 + 8087.

It's a good trade-off for modern hardware implementers and software writers who need more precision for exact rounding of double operations. Make the type too big and you'll need significantly more transistors: double the number of bits in the significand and the multiplier will need to be roughly 4 times as big.

William Kahan, a primary designer of the x87 arithmetic and of the initial IEEE 754 standard proposal, notes on the development of the x87 floating point: "An Extended format as wide as we dared (80 bits) was included to serve the same support role as the 13-decimal internal format serves in Hewlett-Packard’s 10-decimal calculators." Moreover, Kahan notes that 64 bits was the widest significand across which carry propagation could be done without increasing the cycle time on the 8087, and that the x87 extended precision was designed to be extensible to higher precision in future processors: "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats

As you can see, with the 64-bit significand you can share the components (adder, multiplier...) with the integer ALU.


I would like to use an 80-bit double for my default numeric type. But the choices of the compiler developers make me concerned that this is not the best choice. double on x86 is only 2 bytes shorter, why doesn't the compiler use the 10 byte long double by default?

It's actually intended for use as a temporary (like tmp = (b*c + d)/e) to avoid intermediate overflow or underflow issues without special techniques like Kahan summation. It's not meant to be your default floating-point type. In fact, many people use floating-point literals incorrectly when they use long double or float: they forget to add the correct suffix, which results in a loss of precision, and then they ask why long double behaves exactly the same as double. In summary, double should be used for almost every case, unless you're limited by bandwidth or precision and you really know what you're doing.
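
To illustrate the literal-suffix mistake described above, here is a minimal sketch (the variable names are only for demonstration):

/* The missing 'L' suffix makes 0.1 a double constant: it is rounded to
   64 bits first, so the extra long double precision is lost before the
   assignment ever happens. */
#include <stdio.h>

int main(void) {
    long double wrong = 0.1;   /* double literal, then widened to long double */
    long double right = 0.1L;  /* long double literal, keeps full precision   */

    printf("wrong = %.21Lg\n", wrong);
    printf("right = %.21Lg\n", right);
    /* On compilers with an 80-bit long double the two lines differ;
       where long double is the same as double they print identical values. */
    return 0;
}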


Can I get an example of the extra precision gotten by 80-bit long double vs double?

You can print the full value and see it for yourself. There are also a lot of related questions that are worth reading.
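
For instance, a minimal sketch that prints the same quotient computed in both types (assuming a compiler where long double is the 80-bit format):

/* With an 80-bit long double, the second line shows several more correct
   digits of 1/3 than the first. */
#include <stdio.h>

int main(void) {
    double      d = 1.0  / 3.0;
    long double l = 1.0L / 3.0L;

    printf("double:      %.25f\n",  d);
    printf("long double: %.25Lf\n", l);
    return 0;
}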


Why does Microsoft disable long double by default?

Microsoft doesn't disable long double by default. They just chose to map long double to IEEE-754 double precision, which happens to be the same format as double. The type long double can still be used normally. They did that because math on SSE is faster and more consistent; that way you avoid the kind of x87 excess-precision surprise sketched below.
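
Illustrative sketch only (the helper function product is hypothetical, and whether the comparison actually fails depends on the compiler, the optimization level, and flags such as GCC's -ffloat-store or -fexcess-precision; with SSE2 code generation it always succeeds):

/* Under x87 code generation (e.g. 32-bit gcc -mfpmath=387), 'p' may be kept
   in an 80-bit register while 'q' is forced out to memory and rounded to
   64 bits, so p == q can unexpectedly be false.  Under SSE2 both values are
   rounded identically and the test always passes. */
#include <stdio.h>

double product(double a, double b) { return a * b; }

int main(void) {
    volatile double a = 0.1, b = 3.0;   /* volatile: defeat constant folding  */
    double p          = product(a, b);  /* may retain x87 excess precision    */
    volatile double q = product(a, b);  /* stored to memory as 64-bit double  */

    printf("%s\n", (p == q) ? "equal" : "NOT equal (excess precision)");
    return 0;
}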

Besides, a 64-bit long double doesn't have the odd 10-byte size, which requires the compiler either to pad it with 6 more zero bytes or to deal with a non-power-of-2 type width, both of which waste resources.

That said, it's not as if the 80-bit long double is unavailable on x86. Currently only MSVC has abandoned the extended-precision type; other compilers for x86 (like GCC, Clang, ICC...) still support it and make the 80-bit IEEE-754 extended format the default for long double. For example, GCC has -mlong-double-64/80/128 and -m96/128bit-long-double to control the exact format and storage size of long double.
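
A quick way to see both the chosen format and the padding on your own compiler is to print LDBL_MANT_DIG and sizeof(long double); this is a sketch, and the sizes mentioned in the comments assume typical x86 GCC defaults:

/* Shows which long double format the compiler selected and how much storage
   (including padding) it occupies.  With GCC on x86 the default is a 64-bit
   significand padded to 12 bytes (32-bit) or 16 bytes (64-bit);
   -mlong-double-64 and -mlong-double-128 change the format itself. */
#include <float.h>
#include <stdio.h>

int main(void) {
    printf("long double: %d significand bits in %zu bytes of storage\n",
           LDBL_MANT_DIG, sizeof(long double));
    return 0;
}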

Alternatively, without potentially breaking ABI compatibility by changing long double, you can use GNU C floating-point type names like __float80 on targets that support it. This example on Godbolt compiles to 80-bit FP math whether it targets Windows or Linux.
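
A minimal sketch of the __float80 approach (both __float80 and the 'w' constant suffix are GNU C extensions on x86 targets, so this assumes GCC or a compatible compiler):

/* __float80 gives 80-bit x87 arithmetic without touching the long double ABI.
   GNU C extension; x86 targets only. */
#include <stdio.h>

int main(void) {
    __float80 ext = 1.0w / 3.0w;   /* 'w' suffix marks an __float80 constant */
    double    dbl = 1.0  / 3.0;    /* same quotient rounded to 64 bits       */

    /* Nonzero, because the 80-bit result keeps 11 more significand bits. */
    printf("difference = %g\n", (double)(ext - (__float80)dbl));
    return 0;
}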


In terms of magnitude, how much worse / slower is long double on typical x86/x64 PC hardware?

This cannot be answered in general, because latency and throughput depend on the specific microarchitecture. However, if you do a lot of floating-point operations, double will be significantly faster, because it has fewer bits in the significand and it can be parallelized with SIMD. For example, you can work on a vector of 8 doubles at a time with AVX-512; that can't be done with the extended-precision type.
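
For instance, a minimal sketch of the data parallelism that is possible with double but not with the 80-bit type (assumes a CPU with AVX-512F and a compiler flag such as -mavx512f):

/* Adds 8 doubles at a time with AVX-512 intrinsics.  There is no SIMD
   equivalent for the 80-bit x87 format. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[8];

    __m512d va = _mm512_loadu_pd(a);              /* load 8 doubles      */
    __m512d vb = _mm512_loadu_pd(b);
    _mm512_storeu_pd(c, _mm512_add_pd(va, vb));   /* 8 additions at once */

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}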

Also, the 80-bit x87 FP load and store instructions are significantly slower than the "normal" versions that convert to/from 32- or 64-bit, and only fstp is available, not fst. See Peter Cordes's answer on Retrocomputing about x87 performance on modern CPUs. (In fact, that is a cross-site duplicate of this question, asking why MSVC doesn't expose an 80-bit x87 type as long double.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow