Question

I have learned recently that size_t was introduced to help future-proof code against increases in native bit count and in available memory. Its intended use seems to be storing the size of something, generally an array.

I now must wonder how far this future-proofing should be taken. Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses, say, an unsigned int as the array index:

void process(double* vector, size_t vectorLength) {
    for (unsigned int i = 0; i < vectorLength; i++) {
        //...
    }
}

In fact, in this case I would expect the language to up-convert the unsigned int to a size_t for the relational operator.

Does this imply the iterator variable i should simply be a size_t?

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values? i.e.

double foo[100];
//...
int a = 4;
int b = -10;
int c = 50;

int index = a + b + c;
double d = foo[(size_t)index];

Surely though since my code logic creates a fixed bound, up-converting to the size_t provides no additional protection.


Solution

You should keep in mind the automatic conversion rules of the language.

Does this imply the iterator variable i should simply be a size_t?

Yes it does, because if size_t is larger than unsigned int and your array is actually larger than can be indexed with an unsigned int, then your variable (i) can never reach the size of the array: it wraps around to 0 before i < vectorLength can become false, so the loop never terminates.
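To make this concrete, here is a minimal sketch that shrinks the problem so it can be run anywhere: std::uint8_t stands in for a too-narrow unsigned int, and a length above 255 stands in for an array that is too large for it. The function name and the safety cap are assumptions added for illustration; they are not part of the question's code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: runs the loop from the question with a deliberately
// narrow index type and reports how many iterations were executed.
// The cap (length + 1 steps) exists only so the demonstration terminates.
std::size_t count_iterations_narrow_index(std::size_t length) {
    std::size_t steps = 0;
    for (std::uint8_t i = 0; i < length; i++) {
        if (++steps > length) {
            break;  // i has wrapped around to 0; without the cap this never ends
        }
    }
    return steps;
}
```

With a length of 100 the loop runs exactly 100 times. With a length of 300 the index wraps at 256 and the loop only stops because of the cap.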

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

You try to make it sound drastic, while it's not. Why do you declare a variable as double and not float? Why would you make one variable unsigned and another not? Why would you make one variable short while another is int? Of course, you always know what your variables are going to be used for, so you decide what types they should get. The choice of size_t is one among many and it's decided in the same way.

In other words, every variable in a program should be functionally identified and given the correct type.

Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values?

Not at all. First, if the variable can never have negative values, then it could have been unsigned int or size_t in the first place. Second, if the variable can have negative values during computation, then you should definitely make sure that in the end it's non-negative, because you shouldn't index an array with a negative number.

That said, if you are sure your index is non-negative, then casting it to size_t doesn't make any difference. C11 at 6.5.2.1 says (emphasis mine):

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2th element of E1 (counting from zero).

This means that any integer type for which some_pointer + index makes sense may be used as the index. In other words, if you know your int has enough range to hold the index you are computing, there is absolutely no need to cast it to a different type.
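As a rough sketch of the "make sure it's non-negative in the end" advice, the check could look like the following. The helper name element_or_null is hypothetical, and the sketch assumes size_t is at least as wide as int, which holds on common platforms.

```cpp
#include <cstddef>

// Hypothetical helper: returns a pointer to data[index] if index is a valid
// position in an array of `length` elements, or nullptr otherwise.
const double* element_or_null(const double* data, std::size_t length, int index) {
    if (index < 0) {
        return nullptr;  // a negative result of signed arithmetic is rejected
    }
    // index is known to be non-negative here, so this conversion keeps its value
    if (static_cast<std::size_t>(index) >= length) {
        return nullptr;  // past the end of the array
    }
    return &data[index];  // subscripting with the int itself; no cast needed
}
```

For the question's example, element_or_null(foo, 100, a + b + c) would point at foo[44], while an index of -1 or 100 would give nullptr.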

OTHER TIPS

Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses, say, an unsigned int as the array index

Yes it is. So don't do it.

In fact, in this case I would expect the language to up-convert the unsigned int to a size_t for the relational operator.

It will only be converted for that particular < operation. The range of your unsigned int variable is not changed, so the ++ operation always works on an unsigned int rather than a size_t.

Does this imply the iterator variable i should simply be a size_t?

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

Yeah well, it is better than int... But there is a smarter way to write programs: use common sense. Whenever you declare an array, you can actually stop and consider in advance how many items the array would possibly need to store. If it will never contain more than 100 items, there is absolutely no reason for you to use int nor to use size_t to index it.

In the 100 items case, simply use uint_fast8_t. Then the program is optimized for size as well as speed, and 100% portable.
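A possible way to write that down, as a sketch only: the constant kItemCount and the function sum_items are made-up names, and the static_assert is an addition that guards the assumption that the bound fits the index type.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t kItemCount = 100;  // assumed fixed upper bound

// Compile-time check: the end value kItemCount itself must fit in the index
// type, otherwise the loop condition below could never become false.
static_assert(kItemCount <= static_cast<std::size_t>(
                  std::numeric_limits<std::uint_fast8_t>::max()),
              "index type too narrow for kItemCount");

// Sums all items, indexing with the smallest "fast" type that covers the range.
double sum_items(const double (&items)[kItemCount]) {
    double total = 0.0;
    for (std::uint_fast8_t i = 0; i < kItemCount; i++) {
        total += items[i];
    }
    return total;
}
```

If the bound ever grows past what uint_fast8_t can hold, the static_assert fails at compile time instead of the loop silently misbehaving.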

Whenever declaring a variable, a good programmer will activate their brain and consider the following:

  • What is the range of the values that I will store inside this variable?
  • Do I actually need to store negative numbers in it?
  • In the case of an array, how many values will I need in the worst-case? (If unknown, do I have to use dynamic memory?)
  • Are there any compatibility issues with this variable if I decide to port this program?

As opposed to a bad programmer, who does not activate their brain but simply types int all over the place.

As discussed by Neil Kirk, iterators are a future proof counterpart of size_t.

An additional point in your question is the computation of a position, and this typically includes an absolute position (e.g. a in your example) and possibly one or more relative quantities (e.g. b or c), potentially signed.

The signed counterpart of size_t is ptrdiff_t, and the analogue for an iterator type I is typename std::iterator_traits<I>::difference_type (which also works when I is a plain pointer).

As you describe in your question, it is best to use the appropriate types everywhere in your code, so that no conversions are needed. For memory efficiency, if you have e.g. an array of one million positions into other arrays and you know these positions are in the range 0-255, then you can use unsigned char; but then a conversion is necessary at some point.

In such cases, it is best to name this type, e.g.

using pos = unsigned char;

and make all conversions explicit. Then the code will be easier to maintain, should the range 0-255 increase in the future.
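One way this could look, as a sketch: to_pos, to_index and sum_selected are hypothetical helper names, and the assert is an added safeguard rather than something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using pos = unsigned char;  // compact stored position, currently 0-255

// Explicit, checked narrowing from a full-size index to the compact type.
pos to_pos(std::size_t index) {
    assert(index <= static_cast<std::size_t>(std::numeric_limits<pos>::max()));
    return static_cast<pos>(index);
}

// Explicit widening back to an index type before subscripting.
std::size_t to_index(pos p) {
    return static_cast<std::size_t>(p);
}

// Sums values[p] for every stored position p.
double sum_selected(const std::vector<double>& values,
                    const std::vector<pos>& positions) {
    double total = 0.0;
    for (pos p : positions) {
        total += values[to_index(p)];
    }
    return total;
}
```

If the range later outgrows 0-255, only the alias needs to change (for example to unsigned short), and the two conversion helpers mark every place where a value crosses between the compact type and a real index.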

Yep, if you use int to index an array, you defeat the point of using size_t in other places. This is why you can use iterators with STL. They are future proof. For C arrays, you can use either size_t, pointers, or algorithms and lambdas or range-based for loops (C++11). If you need to store the size or index in variables, they will need to be size_t or other appropriate types, as will anything else they interact with, unless you know the size will be small. (For example, if you store the distance between two elements which will always be in a small range, you can use int).

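double my_array[100];
const std::size_t my_array_size = sizeof my_array / sizeof my_array[0];

// Pointer loop: also works when all you have is a double* and a size
for (double *it = my_array, *end_it = my_array + my_array_size; it != end_it; ++it)
{
    // use it
}

// std::begin/std::end and range-based for need a real array or a container, not a bare double*
std::for_each(std::begin(my_array), std::end(my_array), [](double& x)
{
    // use x
});

for (auto& x : my_array)
{
    // use x
}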

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

I'll pick that point, and say clearly Yes. Besides, in most cases a variable used as an array index is only used as that (or something related to it).

And this rule applies not only here but in other circumstances too: there are many use cases for which a dedicated type now exists, such as ptrdiff_t, off_t (which may even change depending on the configuration we use!), pid_t and many others.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow