Array's index and argc signedness

https://stackoverflow.com/questions/2442411

19-09-2019

Question

The C standard (5.1.2.2.1 Program startup) says:

The function called at program startup is named main. [...]
It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }

or with two parameters [...] :
int main(int argc, char *argv[]) { /* ... */ }

And later says:

The value of argc shall be nonnegative.

Why shouldn't argc be defined as an unsigned int, argc supposedly meaning 'argument count'?
Should argc be used as an index for argv?

So I started wondering whether the C standard says anything about the type of an array index. Is it signed?

6.5.2.1 Array subscripting:

One of the expressions shall have type "pointer to object type", the other expression shall have integer type, and the result has type "type".

It doesn't say anything about signedness (or I didn't find it). It is pretty common to see code using negative array indexes (array[-1]), but isn't that undefined behavior?

Should array indexes be unsigned?

Solution

The reason for the int in main() is historical: it has always been that way, since long before the language was standardised. The requirement on an array index is that it designates an element within the bounds of the array (a pointer may additionally point one past the end, provided it is not dereferenced). Anything else is undefined behaviour, so the signedness of the index is immaterial.
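As a minimal sketch of the one-past-the-end rule, assuming a hypothetical helper named sum_ints (it is not part of the answer): the end pointer is formed and compared against, but never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints starting at first. */
long sum_ints(const int *first, size_t n)
{
    assert(first != NULL);

    /* One past the last element: legal to form and compare, not to dereference. */
    const int *end = first + n;
    long total = 0;

    for (const int *p = first; p != end; ++p)
        total += *p;

    return total;
}
```

Called as sum_ints(a, 4) on an array int a[4], the loop visits a[0] through a[3] and stops when p equals a + 4.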

OTHER TIPS

1) About the type of argc in main(): IMHO the standard continues a very old tradition (more than 30 years!), and by now it is simply too late to change things. (NOTE: on most systems neither the compiler, nor the linker, nor the CPU will complain if argc is declared unsigned, but you are then outside the forms of main that the standard guarantees.)

2) argv[argc] is legal and evaluates to NULL: the standard itself (5.1.2.2.1) requires argv[argc] to be a null pointer. An alternative way to find the end of the argument list is therefore to iterate over argv from index 0, stopping when argv[i] is NULL.
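The NULL-terminated walk over argv could be sketched as follows. The helper name count_args is an assumption made for illustration; it does not come from the standard or the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the entries of a NULL-terminated
   argument vector without looking at argc. */
int count_args(char *const argv[])
{
    assert(argv != NULL);

    int n = 0;
    while (argv[n] != NULL)   /* stop at the terminating null pointer */
        ++n;

    return n;
}
```

Inside main, count_args(argv) should equal argc, since argv[argc] is the terminating null pointer.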

3) Array/pointer arithmetic with negative numbers is legal as long as the whole address range from (p-n) to p belongs to the same memory object. For example, you can have

char array[100];
char *p;

p = &array[50];
p += -30; /* Now p points to array[20]. */

This use of pointer arithmetic is legal because the resulting pointer stays inside the original memory object ("array"). On most systems pointer arithmetic can also be used to navigate memory in violation of this rule, but that is undefined behaviour and NOT portable: whether it appears to work is completely system-dependent.
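One way to stay on the portable side of that rule is to check the offset as an integer before forming the pointer. This is only a sketch: offset_within is a hypothetical helper, and it assumes len fits in ptrdiff_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns p + n if the result still lies inside
   [base, base + len] (one past the end allowed), otherwise NULL.
   Assumes p already points into that range and len fits in ptrdiff_t. */
char *offset_within(char *base, size_t len, char *p, ptrdiff_t n)
{
    assert(base != NULL && p >= base && p <= base + len);

    ptrdiff_t pos  = p - base;              /* current index, 0..len */
    ptrdiff_t room = (ptrdiff_t)len - pos;  /* distance to one past the end */

    if (n < -pos || n > room)
        return NULL;                        /* would leave the object */

    return p + n;                           /* safe: checked above */
}
```

With char array[100] and p = &array[50], an offset of -30 yields &array[20], while -51 is rejected because it would point before the start of the array.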

In general in C, the "principle of least surprise" implies that it is preferable to make a variable signed unless there is a good reason for it to be unsigned. This is because the arithmetic conversion rules can lead to unexpected results when you mix signed and unsigned values: for example, if argc were unsigned then this simple comparison would lead to surprising results:

if (argc > -1)

(The -1 is converted to unsigned int, so its value becomes UINT_MAX, which argc can never exceed, so the comparison is always false.)
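The two cases can be put side by side in a short sketch. The function names are hypothetical, and a compiler may warn about the mixed comparison, which is precisely the point being illustrated.

```c
#include <assert.h>
#include <limits.h>

/* Signed count: -1 stays -1, so any nonnegative count compares greater. */
int signed_exceeds_minus_one(int count)
{
    return count > -1;
}

/* Unsigned count: -1 is converted to unsigned int before comparing. */
int unsigned_exceeds_minus_one(unsigned int count)
{
    assert((unsigned int)-1 == UINT_MAX);  /* the conversion that takes place */
    return count > -1;                     /* i.e. count > UINT_MAX: never true */
}
```

For a count of 3, the signed version returns 1 and the unsigned version returns 0.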

1) argc is an argument count, but to be quite honest, how could you prepend an argument before the program name, which is argv[0]? Imagine a program called foo: you cannot simply say args1 foo args2, as that is meaningless. So even though argc has the signed type int, there is no such thing as an argv[-1] that would get you 'args1'.

2) argc is not really an index into the argument vector (hence 'argv'): the run-time stuffs the executable program name into the zeroth slot, argv[0], so argc is off by one from the number of real arguments, and the last argument is argv[argc - 1].

3) Array indexes are really pointer manipulation. Provided you stay within the boundaries of the block of memory the pointer points into, a negative array subscript is legal, because subscripting is shorthand for pointer arithmetic. Not only that, subscripts are commutative (p[55] and 55[p] mean the same thing). For example:

char v[100];
char *p = &v[0];

You can do this:

p[55] = 'a'; 

Which is the same as

*(p + 55) = 'a';

You can even do this:

p = &v[55];

p[-10] = 'b'; /* This stores 'b' at offset 45, i.e. v[45]. */

Which is the same as

*(p - 10) = 'b';

Also, if you manipulate arrays in a way that goes outside their boundaries, that is undefined behaviour: what happens depends on the implementation and the run-time, perhaps a segmentation fault or a program crash, perhaps nothing visible at all.
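The equivalences in this point can be checked in one place. The sketch below uses a hypothetical function name and stays inside v throughout; the assertions spell out the different ways of naming the same element.

```c
#include <assert.h>

/* Hypothetical demo: writes through a negative subscript and checks
   that every equivalent spelling refers to the same element, v[45]. */
char negative_subscript_demo(void)
{
    char v[100] = {0};
    char *p = &v[55];

    p[-10] = 'b';               /* 55 - 10 = 45, still inside v */

    assert(*(p - 10) == 'b');   /* subscript is pointer arithmetic */
    assert((-10)[p] == 'b');    /* operands of [] may be swapped */
    assert(&p[-10] == &v[45]);  /* same address as v[45] */

    return v[45];
}
```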

4) In *nix environments, some implementations supply a third parameter to main, char **envp, through which the run-time passes in the environment variables. This exists for pre-historic reasons and is not one of the standard's two forms of main. Again, it is rarely used in the Microsoft world of DOS/Windows.

Licensed under: CC-BY-SA with attribution
