Question

#include <stdio.h>

int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))

int main()
{
        printf("SIZE = %d\n", SIZE);
        if ((-1) < SIZE)
                printf("less");
        else
                printf("more");
}

The output after compiling with gcc is "more". Why does the if condition fail even though -1 < 8?

Solution

The problem is in your comparison:

    if ((-1) < SIZE)

sizeof yields a value of type size_t, an unsigned type (typically unsigned long on 64-bit Linux), so SIZE is unsigned, whereas -1 is just an int. The usual arithmetic conversions in C and related languages mean that -1 will be converted to size_t before the comparison, so -1 will become a very large positive value (SIZE_MAX, the maximum value of size_t).

One way to fix this is to change the comparison to:

    if (-1 < (long long)SIZE)

although it's actually a pointless comparison, since an unsigned value will always be >= 0 by definition, and the compiler may well warn you about this.

As subsequently noted by @Nobilis, you should always enable compiler warnings and take notice of them: if you had compiled with e.g. gcc -Wall -Wextra ... the compiler would have warned you of your bug (for C code, gcc enables -Wsign-compare with -Wextra rather than -Wall).

OTHER TIPS

TL;DR

Be careful with mixed signed/unsigned operations (enable compiler warnings such as -Wall -Wextra). The Standard has a long section about it. In particular, it is often but not always true that the signed operand is value-converted to unsigned (although it is in your particular example). See the explanation below (taken from another Q&A).

Relevant quote from the C++ Standard:

5 Expressions [expr]

10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

[2 clauses about equal types or types of equal signedness omitted]

— Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.

— Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.

— Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

Your actual example

To see into which of the 3 cases your program falls, modify it slightly to this

#include <stdio.h>

int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))

int main()
{
        printf("SIZE = %zu, sizeof(-1) = %zu,  sizeof(SIZE) = %zu \n", SIZE, sizeof(-1), sizeof(SIZE));
        if ((-1) < SIZE)
                printf("less");
        else
                printf("more");
}

On the Coliru online compiler, this prints 4 and 8 for the sizeof() of -1 and SIZE, respectively, and selects the "more" branch (live example).

The reason is that the unsigned type is of greater rank than the signed type. Hence, clause 1 applies and the signed type is value-converted to the unsigned type (the value is reduced modulo 2^N; on two's complement implementations this preserves the bit representation, so -1 wraps around to a very large unsigned number), and the comparison then proceeds to select the "more" branch.

Variations on a theme

Rewriting the condition to if ((long long)(-1) < (unsigned)SIZE) would take the "less" branch (live example).

The reason is that the signed type is of greater rank than the unsigned type and can also accommodate all the unsigned values. Hence, clause 2 applies and the unsigned type is converted to the signed type, and the comparison then proceeds to select the "less" branch.

Of course, you would never write such a contrived if() statement with explicit casts, but the same effect could happen if you compare variables with types long long and unsigned. So it illustrates the point that mixed signed/unsigned arithmetic is very subtle and depends on the relative sizes ("ranking" in the words of the Standard). In particular, there is no fixed rule saying that signed will always be converted to unsigned.

When you do comparison between signed and unsigned where unsigned has at least an equal rank to that of the signed type (see TemplateRex's answer for the exact rules), the signed is converted to the type of the unsigned.

In your case, with a 32-bit unsigned type, -1 converted to unsigned is 4294967295 (with a 64-bit size_t the value is larger still). So in effect you are testing whether 4294967295 is smaller than 8 (it isn't).

If you had enabled warnings, you would have been warned by the compiler that something fishy is going on:

warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Since the discussion has shifted a bit on how appropriate the use of unsigned is, let me put a quote by James Gosling with regards to the lack of unsigned types in Java (and I will shamelessly link to another post of mine on the subject):

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

This is a historical design bug of C that was also repeated in C++.

It dates back to 16-bit computers, and the error was deciding to use all 16 bits to represent sizes up to 65535, giving up the possibility of representing negative sizes.

This in itself wouldn't have been an error if the meaning of unsigned were "non-negative integer" (a size cannot logically be negative), but it's a problem with the conversion rules of the language.

Given the conversion rules of the language the unsigned type in C doesn't represent a non-negative number, but it's instead more like a bitmask (the mathematical term is actually "a member of the ℤ/n ring"). To see why consider that for the C and C++ language

  • unsigned - unsigned gives an unsigned result
  • signed + unsigned gives an unsigned result

both of them clearly make no sense at all if you read unsigned as "non-negative number".

Of course, saying that the size of an object is a member of the ℤ/n ring doesn't make any sense at all, and this is where the error resides.

Practical implications:

Every time you deal with the size of an object, be careful, because the value is unsigned and that type in C/C++ has a lot of properties that are illogical for a number. Please always remember that unsigned doesn't mean "non-negative integer" but "member of the ℤ/n algebraic ring" and that, most dangerously, in a mixed operation an int is converted to unsigned int and not the opposite.

For example:

void drawPolyline(const std::vector<P2d>& pts) {
    for (int i=0; i<pts.size()-1; i++) {
        drawLine(pts[i], pts[i+1]);
    }
}

is buggy, because if passed an empty vector of points it will do illegal (UB) operations. The reason is that pts.size() returns an unsigned type, so pts.size()-1 wraps around to a huge value when the vector is empty.

The rules of the language will convert 1 (an integer) to 1{mod n}, will perform the subtraction in ℤ/n resulting in (size-1){mod n}, will convert i also to a {mod n} representation and will do the comparison in ℤ/n.

C/C++ actually defines a < operator in ℤ/n (rarely done in math) and you will end up accessing pts[0], pts[1] ... and so on until huge numbers even if the input vector was empty.

A correct loop could be

void drawPolyline(const std::vector<P2d>& pts) {
    for (int i=1; i<pts.size(); i++) {
        drawLine(pts[i-1], pts[i]);
    }
}

but I normally prefer

void drawPolyline(const std::vector<P2d>& pts) {
    for (int i=0,n=pts.size(); i<n-1; i++) {
        drawLine(pts[i], pts[i+1]);
    }
}

in other words getting rid of unsigned as soon as possible, and just working with regular ints.

Never use unsigned to represent size of containers or counters because unsigned means "member of ℤ/n" and the size of a container is not one of those things. Unsigned types are useful, but NOT to represent size of objects.

The standard C/C++ library unfortunately made this wrong choice, and it's too late to fix it. You are not forced to make the same mistake, however.

In the words of Bjarne Stroustrup:

Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules

Well, I'm not going to repeat the strong words Paul R used, but when you compare unsigned and signed integers you are going to run into some bad things.

Use if ((-1) < (int)SIZE)

instead of your if condition.

Convert the unsigned value returned by the sizeof operator to a signed type.

When you compare a signed and an unsigned number of the same rank, the compiler implicitly converts the signed one to unsigned.
The two's complement representation of -1 in a 4-byte int is 11111111 11111111 11111111 11111111; converted to a 32-bit unsigned type, this bit pattern means 2^32-1.
So basically you are comparing 2^32-1 > SIZE, which is true.
You have to override that by explicitly casting the unsigned value to a signed type. Since the sizeof operator yields a size_t (an unsigned type, 64 bits wide on many platforms), you can cast it to signed long long

if((-1)<(signed long long)SIZE)

use this if condition in your code

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow