How much an iterator should do

https://softwareengineering.stackexchange.com/questions/388333

21-02-2021
|

Pergunta

I am working on creating iterators, for strings, lists, trees, graphs, and potentially other things.

First a side note.

I have a string data type in my engine. The string is implemented as a bunch of small chunks separated physically in memory, let's say 256-byte chunks. So a single string might look like this in memory.

asdfasdfasd...256
[something else]
[something else]
fasdfasdfasdf

So the final string would be:

asdfasdfasd...256fasdfasdfasdf

However, when in code, I want to access the string like this:

string[10]

for (char in string) {
  // ...
}

Basically, it feels like a simple array.

But when you access a character in string, it has to have some context. It needs to know which chunk we are currently on, the length of chunks, and if another chunk follows this current chunk. There's a lot under the hood of for (char in string).

Given that, I was thinking of having an iterator implement this. But the first part of my confusion is how much should go into the iterator, vs. how much should go into some generic looper/api that takes the iterator as an argument.

For example.

iterator.iterate(function(item, index){

})

The iterator does everything. This would mean, if we were implementing "graph traversal", there would be an entire graph traversal algorithm in the iterator, and it would take a graph as input. This would mean that we could have multiple different iterators like a DFS and a BFS iterator, etc.

But we could instead have it split between the iterator and the outside syntax:

for (item, index in iterator.next()) {
  // ...
}

This seems like what you see more often, but it seems to make implementation harder as the state is now split between the external environment and the iterator. Maybe not though.

The reason for asking is because there are these cases:

while loop
for loop
get item at index like string[10]
get item by name (hashtable)
etc.

It seems that if you want to have a generic API to iterating these things, or accessing these properties, you need an abstraction layer like the iterator on top of it. Wondering if that is true, or if not, what options there are.

Solução

Different iterators provide different contracts. C++ probably has the most fine-grained iterator concepts, distinguishing operators that

can move forward
can move backwards
can jump to another element by random access
can read elements
can write the accessed element

Similarly, Java iterators only move in one direction but can optionally delete the current element from the containing collection.

The most general kind of iterator is an iterator that only moves forward to the next element and only provides read access to that element. This can be implemented via an object with a next() method that both moves forward and returns the next value.

There are different approaches to detect the end of the iterator. There could be an explicit method to check whether the end has been reached, but that makes the implementation of the iterator more complicated (possibly, the end cannot be detected without creating the next element, which could be costly for lazy streams). Instead, the next() method will typically return a sentinel value such as None or throw an exception.

This next()-based iterator is very primitive, but allows you to implement all kinds of semantics on top of it. Compare also how you can implement any for-loop in terms of a while-loop. In particular, an item–index iterator could be implemented either

by having the collection implement an iterator that directly yields an item–index tuple, or
by creating an iterator decorator that counts the past items, like enumerate() in Python.

Some kind of functionality is simply out of scope for an iterator, such as offering indexed access to some collection. An iterator may be a cursor into a collection, but may also not be related to any collection. Instead, iterators are sometimes used for data streams, lazy computations, or simple kinds of reactive programming. Iterators are also closely related to coroutines and can therefore be used to implement interesting control flow abstractions.

If the language already has some context of iterators, you should definitively use that protocol/interface. Do not invent your own iterator system that competes with them. In particular, do not create a callback-based iteration protocol when the language already has an object-based iterator approach.

However, some data structures benefit from letting the user decide how to traverse the data structure.

Graphs may want to offer iterators for common iteration orders (e.g. iterating a tree in-order or depth-first), but will generally want to offer the user full control as to how to traverse the graph. This could mean that you offer no iteration abstraction at all, but just expose a list of reachable nodes.
If an object graph consists of objects of different types, a callback based approach such as the Visitor Pattern could be more appropriate.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange