Why does `strtod` ignore digits once the current value already exceeds `DBL_MAX*0.1`?

https://stackoverflow.com/questions/18930304

29-06-2022

Question

The source code is below (I am not sure which version this is; it is just an excerpt from a website). At the very beginning of the for loop, a comment says, in effect, that enough digits have already been read and the rest will simply be ignored.

Why is this true? And why does the comment say "This doesn't necessarily mean the result will overflow"?

/* Convert NPTR to a double.  If ENDPTR is not NULL, a pointer to the
   character after the last one used in the number is put in *ENDPTR.  */
double
strtod (const char *nptr, char **endptr)
{
  register const char *s;
  short int sign;

  /* The number so far.  */
  double num;

  int got_dot;                  /* Found a decimal point.  */
  int got_digit;                /* Seen any digits.  */

  /* The exponent of the number.  */
  long int exponent;

  if (nptr == NULL) 
    {
      errno = EINVAL;
      goto noconv; 
    }

  s = nptr;

  /* Eat whitespace.  */
  while (ISSPACE (*s))
    ++s;

  /* Get the sign.  */
  sign = *s == '-' ? -1 : 1;
  if (*s == '-' || *s == '+')
    ++s;

  num = 0.0;
  got_dot = 0;
  got_digit = 0;
  exponent = 0;
  for (;; ++s)
    {
      if (ISDIGIT (*s))
        {
          got_digit = 1;

          /* Make sure that multiplication by 10 will not overflow.  */
          if (num > DBL_MAX * 0.1)
            /* The value of the digit doesn't matter, since we have already
               gotten as many digits as can be represented in a `double'.
               This doesn't necessarily mean the result will overflow.
               The exponent may reduce it to within range.

               We just need to record that there was another
               digit so that we can multiply by 10 later.  */
            ++exponent;
          else
            num = (num * 10.0) + (*s - '0');

          /* Keep track of the number of digits after the decimal point.
             If we just divided by 10 here, we would lose precision.  */
          if (got_dot)
            --exponent;
        }
      else if (!got_dot && *s == '.')
        /* Record that we have found the decimal point.  */
        got_dot = 1;
      else
        /* Any other character terminates the number.  */
        break;
    }

  if (!got_digit)
    goto noconv;

  if (TOLOWER (*s) == 'e')
    {
      /* Get the exponent specified after the `e' or `E'.  */
      int save = errno;
      char *end;
      long int exp;

      errno = 0;
      ++s;
      exp = strtol (s, &end, 10);
      if (errno == ERANGE)
        {
          /* The exponent overflowed a `long int'.  It is probably a safe
             assumption that an exponent that cannot be represented by
             a `long int' exceeds the limits of a `double'.  */
          if (endptr != NULL)
            *endptr = end;
          if (exp < 0)
            goto underflow;
          else
            goto overflow;
        }
      else if (end == s)
        /* There was no exponent.  Reset END to point to
           the 'e' or 'E', so *ENDPTR will be set there.  */
        end = (char *) s - 1;
      errno = save;
      s = end;
      exponent += exp;
    }

  if (endptr != NULL)
    *endptr = (char *) s;

  if (num == 0.0)
    return 0.0;

  /* Multiply NUM by 10 to the EXPONENT power,
     checking for overflow and underflow.  */

  if (exponent < 0)
    {
      if (num < DBL_MIN * pow (10.0, (double) -exponent))
        goto underflow;
    }
  else if (exponent > 0)
    {
      if (num > DBL_MAX * pow (10.0, (double) -exponent))
        goto overflow;
    }

  num *= pow (10.0, (double) exponent);

  return num * sign;

overflow:
  /* Return an overflow error.  */
  errno = ERANGE;
  return HUGE_VAL * sign;

underflow:
  /* Return an underflow error.  */
  if (endptr != NULL)
    *endptr = (char *) nptr;
  errno = ERANGE;
  return 0.0;

noconv:
  /* There was no number.  */
  if (endptr != NULL)
    *endptr = (char *) nptr;
  return 0.0;
}

Solution

To answer your first question literally, “Why is this true?”: it is because the test `if (num > DBL_MAX * 0.1)` keeps program control from reaching the code that incorporates the current digit into the accumulating value.

The code is written this way because the author likely found it easier to stop processing digits than to design and implement a completely correct conversion routine. This code reads digits and builds a value from them in `num`. E.g., if the input is “1234”, the code will set `num` to 1, then to 12 (1•10+2), then to 123 (12•10+3), and then to 1234 (123•10+4). If the input contains so many digits that the maximum finite value of a `double` is approached, then it is not safe to continue this process, as the arithmetic could overflow the maximum finite value of a `double`. Instead, the program merely counts the remaining digits (by incrementing `exponent`) so that it can adjust for them later.
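The accumulate-or-count behavior can be isolated in a few lines. This is a minimal sketch, not code from the quoted source: `accumulate_digits` and its `skipped` parameter are hypothetical names, and the helper handles only a run of digits (no sign, decimal point, or exponent).

```c
#include <assert.h>
#include <ctype.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper, not part of the quoted source.  It accumulates the
   leading decimal digits of S into a double using the same guard as the
   loop in the question, and stores in *SKIPPED the number of digits that
   were only counted instead of being folded into the value.  */
static double
accumulate_digits (const char *s, long *skipped)
{
  double num = 0.0;

  assert (s != NULL && skipped != NULL);
  *skipped = 0;

  for (; isdigit ((unsigned char) *s); ++s)
    {
      if (num > DBL_MAX * 0.1)
        ++*skipped;             /* Too large to scale by 10: just count it.  */
      else
        num = num * 10.0 + (*s - '0');
    }

  return num;
}
```

For “1234” the helper returns 1234 with nothing skipped. For a string of several hundred digits, the value stops growing once it passes `DBL_MAX * 0.1` (roughly 1.8e307), and every later digit only increments the counter.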

Even if there are so many digits that they would, by themselves, overflow the maximum finite value of a double, the final value might not overflow because there may be a negative exponent. E.g., you could have a thousand decimal digits followed by “e-1000”, and they would, together, represent a number less than one.
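The point about a negative exponent can be checked against the C library's own `strtod` (not the quoted routine). The sketch below is illustrative only: `parse_padded_tenth` is a hypothetical helper that builds a long digit string whose written exponent brings the value back to 0.1.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration.  It builds the text "1", then ZEROS
   '0' characters, then "e-<ZEROS + 1>", and converts it with the C
   library's strtod.  Mathematically that text denotes 0.1.  */
static double
parse_padded_tenth (size_t zeros)
{
  char buf[1100];

  assert (zeros <= 1000);       /* Keeps the text well inside BUF.  */
  buf[0] = '1';
  memset (buf + 1, '0', zeros);
  snprintf (buf + 1 + zeros, sizeof buf - 1 - zeros, "e-%zu", zeros + 1);

  return strtod (buf, NULL);
}
```

Calling `parse_padded_tenth (999)` converts a thousand digits followed by “e-1000”. The digits alone denote 10^999, far beyond `DBL_MAX`, yet the text as a whole denotes a value less than one.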

This code allows rounding in its floating-point operations to affect its results, so it should not be used when correctly rounded conversions from decimal to `double` are desired.
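A small case shows how the rounding creeps in. The sketch below mimics only the final scaling step of the quoted routine, `num *= pow (10.0, exponent)`. The name `scale_by_power_of_ten` is hypothetical, and repeated multiplication stands in for `pow` (an assumption made so the example needs no math library). It also assumes IEEE 754 binary64 arithmetic with round-to-nearest.

```c
#include <assert.h>

/* Hypothetical stand-in for the final step of the quoted routine, which
   computes num * pow (10.0, exponent).  Repeated multiplication replaces
   pow here so the sketch has no math-library dependency.  */
static double
scale_by_power_of_ten (double num, long exponent)
{
  double scale = 1.0;
  long n;

  /* Keeps SCALE finite and the negation below well defined.  */
  assert (exponent >= -300 && exponent <= 300);
  n = exponent < 0 ? -exponent : exponent;

  while (n-- > 0)
    scale *= 10.0;
  if (exponent < 0)
    scale = 1.0 / scale;        /* First rounding: e.g. 0.1 is inexact.  */

  return num * scale;           /* Second rounding happens here.  */
}
```

For the input “0.3”, the digit loop leaves `num` at 3 and `exponent` at -1, so the result is the product of 3 and the `double` nearest 0.1. Under the stated assumptions that product is 0.30000000000000004, one unit in the last place above the `double` nearest 0.3, which is what a correctly rounded conversion returns.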

Licensed under: CC-BY-SA with attribution
