Question

The situation is the following:

  1. a 32-bit integer overflows
  2. malloc, which expects a 64-bit integer, uses this integer as input

Now, on a 64-bit machine, which statement is correct (if any at all)?

Say that the signed binary integer 11111111001101100000101011001000 is negative simply due to an overflow. This is a real, practical problem, since you might want to allocate more bytes than you can describe in a 32-bit integer. But then it gets read in as a 64-bit integer.

  1. Malloc reads this as a 64-bit integer, finding 11111111001101100000101011001000################################ with # being a wildcard bit representing whatever data is stored after the original integer. In other words, it reads a value close to the maximum 2^64 and tries to allocate some quintillion bytes. It fails.
  2. Malloc reads this as a 64-bit integer, casting to 0000000000000000000000000000000011111111001101100000101011001000, possibly because of how it is loaded into a register, leaving the upper bits zero. It does not fail, but allocates the "negative" amount of memory as if reading a positive unsigned value.
  3. Malloc reads this as a 64-bit integer, casting to ################################11111111001101100000101011001000, possibly because of how it is loaded into a register, with # a wildcard representing whatever data was previously in the register. It fails quite unpredictably, depending on whatever value was last in the register.
  4. The integer does not overflow at all, because even though it is 32-bit, it sits in a 64-bit register, and therefore malloc works fine.

I actually tested this, and the malloc failed (which would imply that either 1 or 3 is correct). I assume 1 is the most logical answer. I also know the fix (using size_t as input instead of int).

I'd just really like to know what actually happens. For some reason I can't find any clarification on how 32-bit integers are actually treated on 64-bit machines for such an unexpected 'cast'. I'm not even sure whether being in a register actually matters.
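For reference, a minimal sketch of the kind of test described above (the values are illustrative, not taken from the original test):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 2000000000;   /* fits comfortably in a 32-bit int */
        n = n + n;            /* signed overflow: undefined behavior; on
                                 typical two's complement hardware this
                                 wraps to the negative value -294967296 */

        void *p = malloc(n);  /* n is implicitly converted to size_t here */
        if (p == NULL)
            printf("malloc failed, n = %d\n", n);
        else
            free(p);
        return 0;
    }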


Solution 2

Once a signed integer overflows, the behavior is undefined. A program in which an int overflows is invalid according to the standard -- essentially, all bets about its behavior are off.

With this in mind, let's look at what's going to happen on a computer where negative numbers are stored in two's complement representation. When you add two large 32-bit integers on such a computer, you get a negative result in case of an overflow.

However, according to the C++ standard, the type of malloc's argument, i.e. size_t, is always unsigned. When you convert a negative number to a wider unsigned type, it gets sign-extended (see this answer for a discussion and a reference to the standard), meaning that the sign bit of the original (which is 1 for all negative numbers) is copied into all of the top 32 bits of the 64-bit result.

Therefore, what you get is a modified version of your third case, except that instead of "wildcard bit #" it has ones all the way to the top. The result is a gigantic unsigned number (roughly 16 exbibytes); naturally, malloc fails to allocate that much memory.
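A small sketch makes the conversion visible (assuming a typical LP64 platform where int is 32 bits and size_t is 64 bits; the bit pattern is the one from the question, 0xFF360AC8):

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        /* Converting the out-of-range unsigned constant to int is
           implementation-defined, but wraps on gcc and other common
           compilers.                                                */
        int n = (int)0xFF360AC8u;

        size_t s = (size_t)n;  /* the conversion malloc's parameter performs */

        printf("n as int:    %d\n", n);    /* -13497656                  */
        printf("s as size_t: %zu\n", s);   /* 18446744073696053960       */
        printf("s in hex:    %#zx\n", s);  /* 0xffffffffff360ac8         */
        return 0;
    }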

OTHER TIPS

The problem with your reasoning is that it starts from the assumption that integer overflow results in a deterministic and predictable operation.

This, unfortunately, is not the case: undefined behavior means that anything can happen, and notably that compilers may optimize as if it could never happen.

As a result, it is nigh impossible to predict what kind of program the compiler will produce if there is such a possible overflow.

  • A possible outcome is that the compiler elides the allocation entirely, because it can prove that the overflowing path "cannot happen" (see the sketch after this list)
  • A possible outcome is that the resulting value is zero-extended or sign-extended (depending on whether the compiler knows it to be positive or not) and interpreted as an unsigned integer. You may get anything from 0 to size_t(-1), and thus may allocate either too little or too much memory, or fail to allocate at all, ...
  • ...
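As a concrete illustration of the first bullet (a sketch; the function name is made up for the example): a compiler such as gcc at -O2 may legitimately delete a signed-overflow check like the one below, because the standard says the overflow cannot occur in a valid program.

    #include <stdlib.h>

    /* Hypothetical helper: allocate room for count + 1 ints.
     * The guard is written in terms of signed arithmetic, so the
     * compiler may assume "count + 1 < count" is always false and
     * elide the branch entirely.                                  */
    int *alloc_one_more(int count)
    {
        if (count + 1 < count)   /* "cannot happen" per the standard...   */
            return NULL;         /* ...so this check may be optimized away */
        return malloc((size_t)(count + 1) * sizeof(int));
    }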

Undefined Behavior => All Bets Are Off

So if we have a specific code example and a specific compiler and platform, we can probably determine what the compiler is doing. This is the approach taken in Deep C, but even then it may not be fully predictable, which is a hallmark of undefined behavior; generalizing about undefined behavior is not a good idea.

We only have to take a look at the gcc documentation to see how messy this can get. It offers some good advice on integer overflow, which says:

In practice many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors expected.

and in the sub-section Practical Advice for Signed Overflow Issues it says:

Ideally the safest approach is to avoid signed integer overflow entirely.[...]
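One way to follow that advice is to test for overflow before it can happen. A minimal sketch (the __builtin_add_overflow intrinsic exists in gcc 5+ and clang; the fallback branch is the portable idiom):

    #include <limits.h>
    #include <stdio.h>

    /* Checked addition: returns 0 and stores the sum on success,
     * returns -1 without ever performing the overflowing operation. */
    static int add_checked(int a, int b, int *out)
    {
    #if defined(__GNUC__) || defined(__clang__)
        return __builtin_add_overflow(a, b, out) ? -1 : 0;
    #else
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return -1;
        *out = a + b;
        return 0;
    #endif
    }

    int main(void)
    {
        int sum;
        if (add_checked(2000000000, 2000000000, &sum) != 0)
            printf("overflow detected, not calling malloc\n");
        return 0;
    }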

At the end of the day it is undefined behavior, and therefore unpredictable in the general case. In gcc's case, its section on implementation-defined integer behavior documents that out-of-range integer conversions wrap around:

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

but in its advice about integer overflow it explains how optimization can cause problems for code that relies on wraparound:

Compilers sometimes generate code that is incompatible with wraparound integer arithmetic.

So this quickly gets complicated.
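As a concrete instance of that incompatibility (a sketch): the comparison below looks like an overflow test, but because signed overflow is undefined, gcc at -O2 typically folds the whole function to return 1.

    /* Relies on wraparound: if x is INT_MAX, x + 1 "should" wrap to
     * INT_MIN and the comparison become false. The compiler may instead
     * assume no overflow and optimize this to "return 1".              */
    int next_is_larger(int x)
    {
        return x + 1 > x;
    }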

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow