Ensuring C++ doubles are 64 bits

https://stackoverflow.com/questions/752309

09-09-2019

Question

In my C++ program, I need to pull a 64-bit float from an external byte sequence. Is there some way to ensure, at compile time, that doubles are 64 bits? Is there some other type I should use to store the data instead?

Edit: If you're reading this and actually looking for a way to ensure storage in the IEEE 754 format, have a look at Adam Rosenfield's answer below.

Solution

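An improvement on the other answers (which assume a char is 8 bits, something the standard does not guarantee) is to multiply by CHAR_BIT:

char a[(sizeof(double) * CHAR_BIT == 64) ? 1 : -1];

or, with Boost:

#include <boost/static_assert.hpp>
BOOST_STATIC_ASSERT(sizeof(double) * CHAR_BIT == 64);

You can find CHAR_BIT defined in <limits.h> or <climits>. The ? 1 : -1 form gives the array a negative size when the check fails, which every compiler rejects; a plain size of 0 would slip through on GCC and Clang, which accept zero-length arrays as an extension.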

OTHER TIPS

In C99, you can just check whether the preprocessor symbol __STDC_IEC_559__ is defined. If it is, you are guaranteed that a double is a 64-bit value represented in IEEE 754 (also known as IEC 60559) format. See the C99 standard, Annex F. I'm not sure whether this symbol is available in C++, though.

#ifndef __STDC_IEC_559__
#error "Requires IEEE 754 floating point!"
#endif

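Alternatively, you can check the predefined constants __DBL_DIG__ (should be 15), __DBL_MANT_DIG__ (should be 53), __DBL_MAX_10_EXP__ (should be 308), __DBL_MAX_EXP__ (should be 1024), __DBL_MIN_10_EXP__ (should be -307), and __DBL_MIN_EXP__ (should be -1021). Note that these double-underscore names are predefined by GCC and Clang rather than by the standard; the portable equivalents DBL_DIG, DBL_MANT_DIG, DBL_MAX_10_EXP, DBL_MAX_EXP, DBL_MIN_10_EXP and DBL_MIN_EXP are available in all flavors of C and C++ through <float.h> or <cfloat>.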

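Check std::numeric_limits<double>::is_iec559 if you need to know whether your C++ implementation uses standard IEEE 754 doubles. If it is true, you know not only that the total number of bits is 64 but also the size and position of every field (sign, exponent, significand) within the value. It does not tell you the byte order in memory, which still matters when decoding an external byte sequence.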

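I don't think you should focus on the "raw size" of your double, but rather on its precision. (The size is less fixed than it looks: on x86 with the x87 FPU, intermediate results may be held in 80-bit registers even though a stored double occupies 64 bits.)

std::numeric_limits<double>::digits10 makes this easy to check.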

You can use Boost's static assertions (BOOST_STATIC_ASSERT) to do this. See the "Use at namespace scope" example in the Boost.StaticAssert documentation.

The solution without Boost is to define an array like this:

char a[ 8 == sizeof(double) ];

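If double is not 64 bits, this becomes

char a[0];

which is ill-formed in standard C++, so compilation fails. Be aware that GCC and Clang accept zero-length arrays as an extension (they only warn with -pedantic), so a size of (8 == sizeof(double)) ? 1 : -1 is safer; the check also assumes 8-bit chars. Put an explanatory comment next to this line so whoever hits the error knows what it means.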

See this post for a similar problem and a non-Boost compile-time assertion called CCASSERT.

Licensed under: CC-BY-SA with attribution