Question

How are the sizes of the non-IEEE 754 floating-point types float, double, and long double constrained?

I know that each floating-point type must be able to represent all values from a smaller type, which implies sizeof(float) <= sizeof(double) <= sizeof(long double).

From what I can tell, the float.h/cfloat minimums require sizeof(float)*CHAR_BIT>=32, sizeof(double)*CHAR_BIT>=64, and sizeof(long double)*CHAR_BIT>=64.

Are there other constraints? If so, what are they, and do any imply a maximum on these sizes?

Solution

I think the question is about constraints on the representable values. There are only fairly basic constraints, which are not explicitly spelled out in the C++ standard but are spelled out in the C standard in section 5.2.4.2.2 ("Characteristics of floating types <float.h>"), paragraphs 11 to 13 (I'm quoting only the values I consider interesting in this context):

The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign:

  • FLT_DECIMAL_DIG 6
  • DBL_DECIMAL_DIG 10
  • LDBL_DECIMAL_DIG 10
  • FLT_MIN_10_EXP -37
  • DBL_MIN_10_EXP -37
  • LDBL_MIN_10_EXP -37
  • FLT_MAX_10_EXP +37
  • DBL_MAX_10_EXP +37
  • LDBL_MAX_10_EXP +37

Paragraph 12 lists values that the implementation's values must be greater than or equal to:

  • FLT_MAX 1E+37
  • DBL_MAX 1E+37
  • LDBL_MAX 1E+37

Paragraph 13 lists (positive) values that the implementation's values must be less than or equal to:

  • FLT_EPSILON 1E-5
  • DBL_EPSILON 1E-9
  • LDBL_EPSILON 1E-9

This pretty much says that float is likely to be smaller than double, that double and long double can be the same thing, and that all three can differ considerably from what IEEE 754 specifies.
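As a rough illustration, the minimums can be checked at compile time with the <cfloat> macros. This is only a sketch: it uses FLT_DIG, DBL_DIG, and LDBL_DIG (minimums 6, 10, and 10 in the same section) rather than the *_DECIMAL_DIG macros, since the former are also available before C++17, and distinct_formats is a hypothetical helper of mine, not a standard facility.

```cpp
#include <cfloat>

// Minimum requirements from C11 5.2.4.2.2, checked at compile time.
// A conforming implementation is expected to pass every one of these.
static_assert(FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10,
              "decimal digits of precision");
static_assert(FLT_MIN_10_EXP <= -37 && DBL_MIN_10_EXP <= -37 && LDBL_MIN_10_EXP <= -37,
              "minimum decimal exponent");
static_assert(FLT_MAX_10_EXP >= 37 && DBL_MAX_10_EXP >= 37 && LDBL_MAX_10_EXP >= 37,
              "maximum decimal exponent");
static_assert(FLT_MAX >= 1E+37F && DBL_MAX >= 1E+37 && LDBL_MAX >= 1E+37L,
              "largest finite value");
static_assert(FLT_EPSILON <= 1E-5F && DBL_EPSILON <= 1E-9 && LDBL_EPSILON <= 1E-9L,
              "epsilon");

// Hypothetical helper: counts how many of the three types appear to use
// a different format from the previous one. It compares only significand
// digits and maximum exponent, so it is an approximation.
constexpr int distinct_formats() {
    return 1
         + ((DBL_MANT_DIG != FLT_MANT_DIG || DBL_MAX_EXP != FLT_MAX_EXP) ? 1 : 0)
         + ((LDBL_MANT_DIG != DBL_MANT_DIG || LDBL_MAX_EXP != DBL_MAX_EXP) ? 1 : 0);
}
```

The result of distinct_formats() depends on the platform: an implementation where double and long double are the same thing would give a smaller count than one where all three types differ.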

Other suggestions

From N3337:

3.9.1.8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.
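To make the quoted guarantees concrete, here is a minimal sketch using std::numeric_limits. The static_asserts encode what I take to follow from the paragraph above (precision and range never shrink going from float to double to long double). strictly_extends is a hypothetical helper name of mine, and it assumes the types share a radix.

```cpp
#include <limits>

// Precision never decreases along float -> double -> long double.
static_assert(std::numeric_limits<float>::digits <= std::numeric_limits<double>::digits,
              "double has at least the precision of float");
static_assert(std::numeric_limits<double>::digits <= std::numeric_limits<long double>::digits,
              "long double has at least the precision of double");

// Because each value set is a subset of the next, the largest finite
// value cannot decrease either.
static_assert(std::numeric_limits<float>::max() <= std::numeric_limits<double>::max(),
              "double covers the range of float");
static_assert(std::numeric_limits<double>::max() <= std::numeric_limits<long double>::max(),
              "long double covers the range of double");

// Hypothetical helper: true when Wide offers more significand digits or
// a larger maximum exponent than Narrow (assumes a common radix).
template <typename Narrow, typename Wide>
constexpr bool strictly_extends() {
    return std::numeric_limits<Wide>::digits > std::numeric_limits<Narrow>::digits
        || std::numeric_limits<Wide>::max_exponent > std::numeric_limits<Narrow>::max_exponent;
}
```

For example, strictly_extends<double, long double>() tells you whether long double actually adds anything on a given platform, and std::numeric_limits<T>::is_iec559 reports whether the implementation claims IEEE 754 conformance for T.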

The C standard is also relevant here, so here is what it (N1570) has to say about floating-point types:

6.2.5.10
There are three real floating types, designated as float, double, and long double.42) The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

42) See ‘‘future language directions’’ (6.11.1).

6.11.1.1 Floating types
Future standardization may include additional floating-point types, including those with greater range, precision, or both than long double.

So as far as I can tell, floating point is almost entirely implementation-defined, and for good reason: floating point is implemented by the CPU. The standard cannot make any guarantees about how big or small the various floating-point types will be. If it did, it might simply become incompatible with newer processors.

The float.h and cfloat headers use the latitude the standard gives them to describe the implementation. The sizes you gave are not part of the standard.
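One way to see that storage size and value representation are separate things is to compare sizeof(T) * CHAR_BIT with the number of significand digits. This is a sketch only; storage_bits and bits_beyond_significand are hypothetical helpers, and the second one lumps together sign, exponent, and any padding rather than counting padding alone.

```cpp
#include <climits>
#include <limits>

// Bits of storage occupied by T, padding included.
template <typename T>
constexpr int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical helper: storage bits left over after the significand
// digits reported by numeric_limits. The remainder covers the sign, the
// exponent, and any padding bits; it is not a count of padding alone.
template <typename T>
constexpr int bits_beyond_significand() {
    return storage_bits<T>() - std::numeric_limits<T>::digits;
}
```

On common x86-64 Linux toolchains, for instance, long double occupies 16 bytes even though the x87 extended format it uses needs only 80 bits, so the remainder for long double is much larger there than for double. Nothing in the standard caps that padding, which is consistent with there being no implied maximum size.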

So no, there are no other constraints.* And no, there are no implied maximum sizes.

* This isn't strictly true. There is a lot of other information defined in N1570 section 5.2.4.2.2, but nothing that restricts floating-point types in the way you're asking.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow