Question

What information does the Standard library of C++ use when parsing a (float) number?

Here are the possibilities I know of for parsing a (single) float number with standard C++:

It seems obvious that, at the very least, we have to know which character is used as the decimal separator.

iostreams, in particular num_get::get, additionally mention:

  • ios_base I/O format flags - Is there any information here that is used when parsing floating point?
  • the thousands_separator (* see below)
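One way to probe the format-flag question is a small experiment. As far as I can tell, num_get picks the %g conversion for floating-point types regardless of the basefield flags, so a flag such as std::hex should change how an int is read but not how a double is read. The helper names below are hypothetical and exist only for this sketch:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Read a double from `text` with ios_base::hex set on the stream.
double parse_double_with_hex_flag(const std::string& text) {
    std::istringstream in(text);
    in >> std::hex;        // sets ios_base::hex in basefield
    double value = 0.0;
    in >> value;           // floating-point extraction
    return value;
}

// Read an int from `text` with the same flag, for comparison.
int parse_int_with_hex_flag(const std::string& text) {
    std::istringstream in(text);
    in >> std::hex;
    int value = 0;
    in >> value;           // integer extraction honours basefield
    return value;
}
```

With the input "10", the int helper should yield 16 while the double helper should still yield 10. The skipws flag is a separate matter: it is handled by operator>> before num_get is ever called.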

On the other hand, std::strtod seems to be what sscanf is defined in terms of (and sscanf in turn is referenced by num_get). There, the only variable information seems to be what is considered whitespace and what the decimal character is, although it doesn't seem to be specified where those are defined. (At least neither on cppref nor on MSDN.)

So, what information is actually used, and what comprises a valid parseable float representation for the C++ Standard lib?

From what I see, only the decimal separator from the global (C or C++ ???) locale is needed. In addition, if the number contains a thousands separator, I would expect it to be parsed correctly only by num_get, since strtod/sscanf do not support the thousands separator.
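To make that expectation testable without depending on which locales are installed, here is a minimal sketch. It uses a hand-written numpunct facet (grouped_punct is a hypothetical name, not a standard one) and compares stream extraction against strtod on the same text:

```cpp
#include <cstdlib>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: '.' as decimal point, ',' as group
// separator, groups of three digits.
struct grouped_punct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Parse via operator>> (and therefore num_get) on a stream whose locale
// carries the facet above. Returns true if extraction succeeded.
bool parse_with_num_get(const std::string& text, double& out) {
    std::istringstream in(text);
    in.imbue(std::locale(std::locale::classic(), new grouped_punct));
    return static_cast<bool>(in >> out);
}

// Parse via strtod in whatever C locale is active (the "C" locale unless
// setlocale was called). Reports how many characters were consumed.
double parse_with_strtod(const std::string& text, std::size_t& consumed) {
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    consumed = static_cast<std::size_t>(end - text.c_str());
    return value;
}
```

For "1,234.5", the stream version should produce 1234.5, while strtod in the "C" locale stops at the comma and returns 1 after consuming a single character. What strtod does in other locales is implementation-specific.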


(*) The group (thousands) separator is an interesting case to me. As far as I can tell, the "C" functions do not make any reference to it, and last time I checked, the standard C and C++ printf functions will never write it. So is it really processed by the strtod/scanf functions? (I know that there is a POSIX printf extension for the group separator, but that's not really standard, and it is notably missing from Microsoft's implementation.)


Solution

The C11 spec for strtod() seems to have an opening big enough for any size truck to drive through. It appears so open-ended that I see no limitation.

§7.22.1.3 6 In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.


For locales other than the standard "C" locale, isspace(), the decimal (radix) point, the group separator, the digits per group and the sign seem to constitute the typical variants. But apparently there is no limit.


For fun, I experimented with 500+ locales using printf(), sscanf(), strftime() and isspace().

All tested locales had a radix (decimal) point of '.' or ',', the same +/- sign, no digit grouping, and the expected 0-9.
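A locale's numeric conventions can also be read directly with localeconv(). The sketch below is an assumption about how one might do that, not the code used for the experiment; numeric_conventions and query_numeric_conventions are hypothetical names:

```cpp
#include <clocale>
#include <string>

// Plain copy of the numeric fields of lconv that matter here.
struct numeric_conventions {
    std::string decimal_point;
    std::string thousands_sep;
    std::string grouping;   // raw bytes, as defined for lconv::grouping
};

// Switch LC_NUMERIC to `name`, copy the conventions, then restore the
// previous locale. Returns false if the locale is not available.
bool query_numeric_conventions(const char* name, numeric_conventions& out) {
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_NUMERIC, name) == nullptr) {
        return false;
    }
    const std::lconv* lc = std::localeconv();
    out.decimal_point = lc->decimal_point;
    out.thousands_sep = lc->thousands_sep;
    out.grouping = lc->grouping;
    std::setlocale(LC_NUMERIC, saved.c_str());
    return true;
}
```

In the "C" locale this reports "." as the decimal point and empty strings for the separator and grouping. Names such as "de_DE.UTF-8" are platform-specific, so the function may simply return false for them.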

strftime(... "%Y" ...) did not use a digit separator over years 1000-99999.

sscanf("1,234.5", "%lf", ...) and sscanf("1.234,5", "%lf", ...) did not produce 1234.5 in any locale.
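A check of this kind can be sketched roughly as follows. This is a reconstruction under assumptions, not the original test program, and sscanf_in_locale is a hypothetical helper name:

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Switch to locale `name`, run sscanf("%lf") on `text`, then restore the
// previous locale. Returns true only if the locale could be selected and
// one value was converted; `out` then holds whatever sscanf produced.
bool sscanf_in_locale(const char* name, const char* text, double& out) {
    const char* current = std::setlocale(LC_ALL, nullptr);
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_ALL, name) == nullptr) {
        return false;   // locale not installed on this system
    }
    out = 0.0;
    const int converted = std::sscanf(text, "%lf", &out);
    std::setlocale(LC_ALL, saved.c_str());
    return converted == 1;
}
```

In the "C" locale, "1,234.5" converts to 1 because parsing stops at the comma. To repeat the experiment, call the helper in a loop over whatever locale names the platform provides and compare `out` against 1234.5.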

All int values in the range 0 to 255 produced the same isspace() results, with the occasional exception of 154 and 160.

Of course, these tests do not prove a limit to what may occur, but they do represent a sample of possibilities.

Licensed under: CC-BY-SA with attribution