Should I always use iterators when working with strings?

https://softwareengineering.stackexchange.com/questions/389748

23-02-2021

Question

Here is the familiar old way to iterate over a string:

   for (int i = 0; i < str.length(); i++) {
      char c = str[i];
   }

However, I have recently also seen iterators used in multiple places:

   for (auto i = str.begin(); i != str.end(); ++i) {
      char c = *i;
   }

I have no difficulty understanding either construct. However, the version without iterators looks to me (maybe subjectively) somewhat simpler to read. I do not like the excessive ()s, I do not like *i, and the != gives me an unsafe feeling (what if i > str.end()?). The loop line is also noticeably longer.

But all this may be subjective.

Would the first ("classic") version be seen as unprofessional these days? The object we are iterating over is definitely a string and is very unlikely to become anything else before the code is retired.

P.S: I know we can also do

for (char c: str) {
}

That is cool, but let's not consider it here; assume the index or iterator is required for what we do in the loop.

This question is specific to std::string and std::wstring.

Solution

Go for what'll be least surprising to yourself and your colleagues in the future. I've heard this termed the principle of least surprise, and it's a pretty simple idea: if there's a simple way to write some code, use the simple way (at least until a simpler or less error-prone way comes along).

If you don't need the index, then you can use the range-based form:

for(char c : str) {
    // stuff
}

If you do need the index, then we can use the index form:

for(std::size_t i = 0; i < str.size(); i++) { // unsigned index avoids a signed/unsigned comparison with size()
    char c = str[i];
    // stuff
}

Alternatively, if your colleagues are familiar with the standard library, you can also use std::distance:

for(char const& c : str) {
    auto i = std::distance(str.c_str(), &c); // c_str() is always char const*, so both arguments have the same type
    // stuff
}
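If you end up with the explicit iterator loop and still need a position, std::distance works on the iterators too. This is only a minimal sketch: `index_of` is a hypothetical helper name, not something from the standard library.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: returns the index of the first occurrence of `target`,
// or str.size() if `target` does not occur.
std::size_t index_of(const std::string& str, char target) {
    for (auto it = str.begin(); it != str.end(); ++it) {
        if (*it == target) {
            // The distance from begin() to the current iterator is the index.
            return static_cast<std::size_t>(std::distance(str.begin(), it));
        }
    }
    return str.size();
}
```

Whether this reads better than the plain index loop is exactly the judgment call described above.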

Other tips

Well, there are functional differences between the examples:

The index example can cope with the string being arbitrarily modified during the loop.
It pays for that by going back to the string and re-reading it on every iteration. That might not be quite trivial if SSO (small-string optimisation) comes into play and the compiler cannot prove that the string is never re-allocated, or at least always stays dynamically allocated.
The manual iterator example can cope with the string growing, as long as no re-allocation occurs. It is also the least concise, which does actually matter.
And yes, that is paid for with an equivalent pessimisation: the end iterator has to be re-read every time unless the compiler can prove no change occurred.
The last example (the range-based for) cannot cope with the string being re-allocated or changing size at all.
As a bonus, though, the compiler need not prove that this doesn't occur in order to generate optimal code.

Now, you might think that option is disqualified because it gives no access to an index or iterator. Not so!
A trivial change:
```
for (char& c : str) {
    auto iterator = &c;
}
```

Also, the major reason to use iterators is uniformly best performance, even in the face of non-contiguous containers, or arbitrary other sequences not backed by a container. That is especially important when generalising with templates.
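To illustrate the first point above (the index loop tolerating modification), here is a minimal sketch of a loop that grows the string while walking it. `escape_quotes` is a hypothetical helper invented for this example. It relies on the loop re-reading `str.size()` and `str[i]` on every pass, so an `insert` that re-allocates the buffer does no harm.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: puts a backslash in front of every double quote.
// The string grows, and may re-allocate, while the loop is running.
std::string escape_quotes(std::string str) {
    for (std::size_t i = 0; i < str.size(); ++i) {
        if (str[i] == '"') {
            str.insert(i, 1, '\\'); // the index i stays meaningful after the insert
            ++i;                    // skip the quote, which has moved to i + 1
        }
    }
    return str;
}
```

The same body written with a stored iterator would not be safe, because `insert` may invalidate iterators into the string.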

The primary reason for iterators is to allow/support generic algorithms that can work with containers of different sorts--arrays, trees, linked lists, etc. The iterator decouples the algorithmic part (the loop and any processing in it) from the container itself, and things like how you specify an item in that particular type of container.

In your case, however, you're writing algorithmic code that's tightly coupled to the container it's working with. As such, using an iterator is unlikely to provide any real benefit in this case.

That does prompt an obvious question though: specifically, whether you can decouple them, and use a generic algorithm rather than writing an explicit loop at all. You usually can, and when/if you can, it's almost always a clear win.

As such, what you should be thinking about in most cases isn't: "how should I write this loop?", but instead: "how should I eliminate this loop?"
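As a rough sketch of what eliminating the loop can look like, the two functions below use standard algorithms instead of a hand-written loop. The function names are made up for this example; `std::count_if` and `std::transform` are the real library calls.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Counts decimal digits; std::count_if replaces the explicit loop.
std::size_t count_digits(const std::string& str) {
    return static_cast<std::size_t>(std::count_if(
        str.begin(), str.end(),
        [](unsigned char ch) { return std::isdigit(ch) != 0; }));
}

// Upper-cases the string in place; std::transform replaces the explicit loop.
void to_upper_in_place(std::string& str) {
    std::transform(str.begin(), str.end(), str.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
}
```

Neither function mentions an index or dereferences an iterator by hand, so the question of which loop style to prefer does not come up.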

Pragmatically? It's about clear communication.

Fundamentally? It's about coupling.

Express the intentions of the code clearly.

What "clearly" means is a discussion in and of itself, and everyone will arrive at their own definition.

Precision capabilities

When I'm in full control of the data-structure (it's function-local or a private variable of the object/class), any changes I make to the data-structure will not ripple out very far.

As a consequence, this also entices me to keep my objects smaller, as changing the data-structure in a 200-line class is far simpler than in a 200k-line class. So I naturally get high cohesion.

Because the scope is kept small, we can afford higher coupling and use the full capabilities of the data-structure and its exact nature, such as array indexing and memory layout, to express the functions with clarity.

With this clarity I can easily spot bugs, and make modifications.

The first example is a case of the client code treating the string as a data-structure:

for (int i = 0; i < str.length(); i++) {
   char c = str[i];
}

But it does mean that string must fill the role of a data-structure. It must always provide those exact properties and that exact interface:

int length() const; // O(1) and length() <= INT_MAX
T operator[](int); // O(1), and the return type is convertible to char in O(1)

and the built-in operations on int:

++int;
int < int;

Not to mention everything else a string must provide to be considered stringy... But as it isn't used we can ignore it when substituting for another data structure.

But there are not many data-structures that can offer that kind of performance, or capabilities.

Contiguous arrays are pretty much it.
Perhaps a sparse hash table... but only up to a point.
Perhaps a constant time expression for calculating the ith member of the sequence.

And even those that do are still limited by the fact that int has a maximum value.
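As a small sketch of code that leans on O(1) indexing on purpose, consider a palindrome check. `is_palindrome` is a hypothetical example, not taken from the question; it uses std::size_t rather than int for the index, which sidesteps the int limit mentioned above but still assumes constant-time operator[].

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: compares the i-th character from the front with the
// i-th character from the back, which only makes sense with cheap indexing.
bool is_palindrome(const std::string& str) {
    const std::size_t n = str.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        if (str[i] != str[n - 1 - i]) {
            return false;
        }
    }
    return true;
}
```

Here the index arithmetic is the clearest statement of the intent, which is the kind of clarity that tight coupling to the data-structure can buy.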

Just the necessary

When I do not have full control, I prefer to treat it as a service provider so that my code is as general as it can be. This does force me to consider exactly what my function needs, which often leads me to a better understanding of the issue.

The main advantage to this approach is that both the user and provider do not overly restrict each other. The provider is not forced to provide anything more than the exact capabilities, and the user does not demand more than is absolutely necessary.

This makes the function much more reusable, and I believe clearer.

for (auto i = str.begin(); i != str.end(); ++i) {
   char c = *i;
}

The second example treats the string as a service provider. It requests a specific kind of access, in this case sequential read access.

The string is free to provide this access as it sees fit within the contract of the interface:

iterator begin(sequence); //O(1) something that has forward sequential access
iterator end(sequence); //O(1) something to signify when the forward access has accessed everything
bool operator!=(iterator, iterator); //O(1) a test to determine when the forward access has reached this point in the sequence
iterator iterator.operator++(); //O(1) something to advance the forward sequential access to the next location
T iterator.operator*(); //O(1) something to access that next element.

This interface fits a specific use case (forward sequential access) like a glove.

The user does not demand any more functionality
The provider need not provide further functionality

The difference is that a tree could provide forward sequential access that satisfies this second interface, whereas it could never satisfy the O(1) random access of the first interface, even though the only service the user actually requires is forward sequential access.
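A minimal sketch of that point: the function below asks only for begin/end, !=, ++ and *, so it accepts a std::string and a tree-based std::set<char> alike. `count_vowels` is a hypothetical helper written for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: uses nothing beyond forward sequential access,
// so any sequence of char that offers begin()/end() will do.
template <typename Sequence>
std::size_t count_vowels(const Sequence& seq) {
    const std::string vowels = "aeiou";
    std::size_t count = 0;
    for (auto it = seq.begin(); it != seq.end(); ++it) {
        if (vowels.find(*it) != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

Called with `std::string("iterator")` or with a `std::set<char>`, the same loop compiles and runs; the index-based version would not compile for the set, which has no operator[].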

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange