Foreach-loop with break/return vs. while-loop with explicit invariant and post-condition

https://softwareengineering.stackexchange.com/questions/376688

07-02-2021

Question

This is the most popular way (it seems to me) of checking if a value is in an array:

for (int x : array)
{
    if (x == value)
        return true;
}
return false;

However, in a book I read many years ago by, probably, Wirth or Dijkstra, it was said that this style is better (when compared to a while-loop with an exit inside):

int i = 0;
while (i < array.length && array[i] != value)
    i++;
return i < array.length;

This way the additional exit condition becomes an explicit part of the loop invariant, there are no hidden conditions and exits inside the loop, everything is more obvious and more in a structured-programming way. I generally preferred this latter pattern whenever possible and used the for-loop to only iterate from a to b.
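To make "explicit part of the loop invariant" concrete, here is the same loop with the invariant and post-condition written out as comments. This is only a sketch: the class and method names are made up for illustration, and the comments are one possible reading of the invariant, not a quote from the book.

```java
final class InvariantSearch {
    // Linear search where the loop guard carries the whole exit condition.
    static boolean contains(int[] array, int value) {
        int i = 0;
        // Invariant: 0 <= i <= array.length, and value does not occur in array[0..i).
        while (i < array.length && array[i] != value) {
            i++;
        }
        // Post-condition (negated guard): i == array.length || array[i] == value.
        // Combined with the invariant: i < array.length exactly when value occurs.
        return i < array.length;
    }
}
```

The point of the style is that the post-condition can be read directly off the guard, without looking inside the loop body.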

And yet I cannot say that the first version is less clear. Maybe it is even clearer and easier to understand, at least for complete beginners. So I’m still asking myself which one is better.

Maybe someone can give a good rationale in favor of one of the methods?

Update: This is not a question of multiple function return points, lambdas or finding an element in an array per se. It’s about how to write loops with more complex invariants than a single inequality.

Update: OK, I see the point of people who answer and comment: I mixed in the foreach loop here, which itself is already much clearer and more readable than a while-loop. I should not have done that. But this is also an interesting question, so let's leave it as it is: foreach-loop and an extra condition inside, or a while-loop with an explicit loop invariant and a post-condition after. It seems that the foreach-loop with a condition and an exit/break is winning. I will create an additional question without the foreach-loop (for a linked list).

Solution

I think for simple loops, such as these, the standard first syntax is much clearer. Some people consider multiple returns confusing or a code smell, but for a piece of code this small, I do not believe this is a real issue.

It gets a bit more debatable for more complex loops. If the loop's contents cannot fit on your screen and contain several returns, there is an argument to be made that the multiple exit points make the code more difficult to maintain. For example, if you had to ensure some state maintenance method ran before exiting the function, it would be easy to forget to add it before one of the return statements, and that would cause a bug. If all the end conditions can be checked in a while loop, you only have one exit point and can add this code after it.
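A minimal sketch of that single-exit argument, assuming a made-up bookkeeping requirement (the log list here is hypothetical and stands in for whatever state maintenance your function needs):

```java
import java.util.ArrayList;
import java.util.List;

final class SingleExitSearch {
    // Hypothetical stand-in for "state maintenance": every search must be recorded.
    static final List<String> log = new ArrayList<>();

    static boolean contains(int[] array, int value) {
        int i = 0;
        while (i < array.length && array[i] != value) {
            i++;
        }
        boolean found = i < array.length;
        // Single exit point: the bookkeeping is written once and cannot be skipped.
        log.add("searched for " + value + ", found=" + found);
        return found;
    }
}
```

With early returns you would have to repeat the log.add call before each return. In Java, wrapping the loop in try/finally is another way to guarantee such code runs, so the single-exit style is not the only option.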

That said, with loops especially it is good to try to put as much logic as possible into separate methods. This avoids a lot of the cases where the second style would have advantages. Lean loops with clearly separated logic will matter more than which of these styles you use. Also, if most of your application's code base is using one style, you should stick with that style.
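As a rough illustration of keeping the loop lean, the per-element logic below lives in its own method, so the loop itself stays three lines long whichever exit style you pick. The prime check is just an arbitrary example of "more complex logic", and all names are invented for this sketch.

```java
final class LeanLoop {
    // Per-element logic extracted out of the loop body.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        // The long cast avoids overflow of d * d for large n.
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // The loop only iterates and delegates; there is little left to get wrong.
    static boolean containsPrime(int[] array) {
        for (int x : array) {
            if (isPrime(x)) {
                return true;
            }
        }
        return false;
    }
}
```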

OTHER TIPS

This is easy.

Almost nothing matters more than clarity to the reader. The first variant I found incredibly simple and clear.

The second 'improved' version, I had to read several times and make sure all the edge conditions were right.

There is ZERO DOUBT which is better coding style (the first is much better).

Now - what is CLEAR to people may vary from person to person. I'm not sure there are any objective standards for that (though posting to a forum like this and getting a variety of people's input can help).

In this particular case, however, I can tell you why the first algorithm is more clear: I know what the C++ iterate-over-a-container syntax looks like and does. I've internalized it. Someone UNFAMILIAR with that syntax (it's new syntax) might prefer the second variation.

But once you know and understand that new syntax, it's a basic concept you can just use. With the loop iteration (second) approach, you have to carefully check that the author is CORRECTLY handling all the edge conditions needed to loop over the entire array (e.g. less-than instead of less-or-equal, same index used for test and for indexing, etc.).

int i = 0;
while (i < array.length && array[i] != value)
    i++;
return i < array.length;
[…] everything is more obvious and more in a structured-programming way.

Not quite. The variable i exists outside the while loop here and is thus part of the outer scope, while (pun intended) x of the for-loop exists only within the scope of the loop. Scope is one very important way to introduce structure to programming.

The two loops have different semantics:

The first loop simply answers a simple yes/no question: "Does the array contain the object I'm looking for?" It does so in the most brief manner possible.
The second loop answers the question: "If the array contains the object I'm looking for, what is the index of the first match?" Again, it does so in the most brief manner possible.

Since the answer to the second question does provide strictly more information than the answer to the first, you can choose to answer the second question and then derive the answer of the first question. That is what the line return i < array.length; does, anyway.

I believe that it's usually best to just use the tool that fits the purpose unless you can reuse an already existing, more flexible tool. I.e.:

Using the first variant of the loop is fine.
Changing the first variant to just set a bool variable and break is also fine. (Avoids second return statement, answer is available in a variable instead of a function return.)
Using std::find is fine (code reuse!).
However, explicitly coding a find and then reducing the answer to a bool is not.
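The "set a bool variable and break" variant from the list is not shown anywhere in the thread, so here is a sketch of what it might look like (the class and method names are made up):

```java
final class FlagSearch {
    static boolean contains(int[] array, int value) {
        boolean found = false;
        for (int x : array) {
            if (x == value) {
                found = true;
                break; // leave the loop early, but the method still has one return
            }
        }
        return found;
    }
}
```

Note that x is still scoped to the loop here; only the yes/no answer escapes into the surrounding scope, not an index.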

I'll suggest a third option altogether:

return array.find(value);
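The line above is best read as pseudocode. Java arrays have no find method, so if the question's code is Java (which it appears to be), one way to express the same intent is IntStream.anyMatch. A small sketch, with a made-up wrapper class:

```java
import java.util.Arrays;

final class LibrarySearch {
    // "Does any element equal value?" stated directly, with no index or exit logic.
    static boolean contains(int[] array, int value) {
        return Arrays.stream(array).anyMatch(x -> x == value);
    }
}
```

anyMatch is short-circuiting, so it can stop at the first match just like the early return in the first loop.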

There are many different reasons to iterate over an array: Check if a specific value exists, transform the array into another array, calculate an aggregate value, filter some values out of the array... If you use a plain for loop, it's unclear at a glance specifically how the for loop is being used. However, most modern languages have rich APIs on their array data structures that make these different intents very explicit.

Compare transforming one array into another with a for loop:

int[] doubledArray = new int[array.length];
for (int i = 0; i < array.length; i++) {
  doubledArray[i] = array[i] * 2;
}

and using a JavaScript-style map function:

array.map((value) => value * 2);

Or summing an array:

int sum = 0;
for (int i = 0; i < array.length; i++) {
  sum += array[i];
}

versus:

array.reduce(
  (sum, nextValue) => sum + nextValue,
  0
);

How long does it take you to understand what this does?

int[] newArray = new int[array.length];
int numValuesAdded = 0;

for (int i = 0; i < array.length; i++) {
  if (array[i] >= 0) {
    newArray[numValuesAdded] = array[i];
    numValuesAdded++;
  }
}

versus

array.filter((value) => (value >= 0));

In all three cases, while the for loop is certainly readable, you have to spend a few moments figuring out how the for loop is being used and checking that all of the counters and exit conditions are correct. The modern lambda-style functions make the purposes of the loops extremely explicit, and you know for certain that the API functions being called are implemented correctly.

Most modern languages, including JavaScript, Ruby, C#, and Java, use this style of functional interaction with arrays and similar collections.
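Since the loop versions above look like Java while the functional versions are JavaScript-style, here is roughly how the same three operations could be written with Java's stream API. The wrapper class and method names are invented for this sketch; the stream calls themselves (map, sum, filter, toArray) are standard IntStream methods.

```java
import java.util.Arrays;

final class StreamExamples {
    // Transform: every element doubled.
    static int[] doubled(int[] array) {
        return Arrays.stream(array).map(value -> value * 2).toArray();
    }

    // Aggregate: sum of all elements.
    static int sum(int[] array) {
        return Arrays.stream(array).sum();
    }

    // Filter: keep only the non-negative elements.
    static int[] nonNegative(int[] array) {
        return Arrays.stream(array).filter(value -> value >= 0).toArray();
    }
}
```

One difference from the hand-written filter loop: the stream version returns an array that is exactly as long as the number of kept values, whereas the loop leaves unused slots at the end of newArray.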

In general, while I don't think using for loops is necessarily wrong, and it is a matter of personal taste, I find myself strongly favoring this style of working with arrays. This is specifically because of the increased clarity in determining what each loop is doing. If your language has similar features or tools in its standard libraries, I suggest you consider adopting this style as well!

It all boils down to precisely what is meant by 'better'. For practical programmers, it generally means efficient, i.e., in this case, exiting directly from the loop avoids one extra comparison, and returning a Boolean constant avoids a duplicate comparison; this saves cycles. Dijkstra is more concerned with making code that is easier to prove correct. [It has seemed to me that CS education in Europe takes 'proving code correctness' far more seriously than CS education in the US, where economic forces tend to dominate coding practice.]
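To put a number on the "one extra comparison", here is a sketch that counts source-level comparisons in both styles. The counter and the cmp helper are invented purely for this illustration, and an indexed for loop stands in for the foreach loop. It says nothing about what a JIT compiler actually emits.

```java
final class ComparisonCount {
    static int comparisons;

    // Counts one comparison and passes its result through unchanged.
    static boolean cmp(boolean result) {
        comparisons++;
        return result;
    }

    // Early-return style: bound check and equality check per element.
    static boolean earlyReturn(int[] a, int value) {
        for (int i = 0; cmp(i < a.length); i++) {
            if (cmp(a[i] == value)) {
                return true;
            }
        }
        return false;
    }

    // Guarded-while style: same checks per element, plus one final bound check.
    static boolean guardedWhile(int[] a, int value) {
        int i = 0;
        while (cmp(i < a.length) && cmp(a[i] != value)) {
            i++;
        }
        return cmp(i < a.length);
    }
}
```

For a match at index k, the early-return version performs 2k + 2 counted comparisons and the guarded while performs 2k + 3; the difference is the repeated i < a.length in the final return.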

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange