In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int?

https://stackoverflow.com/questions/721861

05-09-2019
|

Question

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?

I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.

// Create a character array and initialize it with init[] 
// repeatedly. The size of this character array is specified by 
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
    char *buf;
    int i;

    if (w*h > 4096)
        return (NULL);

    buf = (char *)malloc(4096+1);
    if (!buf)
        return (NULL);

    for (i=0; i<h; i++)
        memcpy(&buf[i*w], init, w);  // line 17

    buf[4096] = '\0';

    return buf;
}

Consider both w and h are very large unsigned integers. The multiplication at line 9 have a chance to pass the validation.

Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.

It's hard to write code to justify this: int is too large. Does anyone has ideas on this?

Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.

I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:

unsigned int x = 10;
int y = -10;

printf("%d\n", x * y);  // print x * y in signed integer
printf("%u\n", x * y);  // print x * y in unsigned integer

Therefore, it does not matter what type the multiplication returns. It matters that whether the consumer function takes int or unsigned.

The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.

Solution

To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.

To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:

http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx

It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.

OTHER TIPS

do the w*h calculation in long long, check if bigger than MAX_UINT

EDIT : alternative : if overflown (w*h)/h != w (is this always the case ?! should be, right ?)

Ensure that w * h doesn't overflow by limiting w and h.

The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).

However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.

Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.

I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do

if (w>4096 || h>4096 || w*h > 4096)
     return (NULL);

This leaves out the case when w or h are 0, you might want to check for it if needed.

In general, you could check for overflow like this:

if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)

In C/C++ the p[n] notation is really a shortcut to writting *(p+n), and this pointer arithmetic takes into account the sign. So p[-1] is valid and refers to the value immediately before *p.

So the sign really matters here, the result of arithmetic operator with integer follow a set of rules defined by the standard, and this is called integer promotions.

Check out this page: INT02-C. Understand integer conversion rules

2 changes make it safer:

if (w >= 4096 || h >= 4096 || w*h > 4096)  return NULL;

...

unsigned i;

Note also that it's not less a bad idea to write to or read from past the buffer end. So the question is not whether iw may become negative, but whether 0 <= ih +w <= 4096 holds.

So it's not the type that matters, but the result of h*i. For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.

For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.

In your example the following rules apply:

C99:

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

ANSI C:

Otherwise, if either operand is unsigned int, the other is converted to unsigned int.

Why not just declare i as unsigned int? Then the problem goes away.

In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.

memcpy(&buf[iw > -1 ? iw < 4097? iw : 0 : 0], init, w); I don't think the triple calculation of iw does degrade the perfomance)

w*h could overflow if w and/or h are sufficiently large and the following validation could pass.

9.      if (w*h > 4096)
10.         return (NULL);

On int , unsigned int mixed operations, int is elevated to unsigned int, in which case, a negative value of 'i' would become a large positive value. In that case

&buf[i*w]

would be accessing a out of bound value.

Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).

Therefore, given large w and a suitable value of h, you can indeed get into trouble.

Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)

There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic coversions, N1494,

WG 14: C - Project status and milestones

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.

If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.

If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.

On your second question about the type expected by "indexer", the answer appears "integer type" which allows for any (signed) integer index.

6.5.2.1 Array subscripting

Constraints

1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.

Semantics

2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.

To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow