Pergunta

I have a custom iterator (TokenIterator to be precise, which iterates, well, tokenized php code). Items are simple objects ("property bags" with some normalisation methods added)

I have to implement search functionality, which have to find if 1. one iterator contains another or 2. two (or more) iterators are overlapping (with some parametrisation).

Currently i use naive approach to (1) - O(NxM) double loop search, and (2) is not implemented yet.

Before starting to reimplement really smart string-search algorithms I would like to know if there exists some effective implementation of that? Maybe something buried deep in some framework or generic library to reuse? And which algorithm will be most suitable here?

Foi útil?

Solução

The first thing that comes to mind is that you're talking about set operations, for which iterators are arguably not the best solution.

I don't know if there's any existing solution for your problem, but, as a general solution, I'd use hash tables. For example, construct a hash table using the tokens of the first set (I'll call it set from now on, since I feel Iterator is not the best word) and you can do it in Theta(N), and then try to insert the other set in the same hash table. The first time you get a collision, you'll know there's an overlap. Of course this works well if the hash space is wide and the hash function guarantees a negligible amount of collisions, however it is always possible to code some sort of workaround.

Given PHP has associative arrays (which are a form of hash tables) you can also create an array having the tokens as the keys, which again can be done in Theta(N), and then use array_key_exists. It is absolutely possible that array_key_exists is nothing more than a linear scan of the list of keys, as I am not familiar with PHP's internals, but I'm quite confident that if associative arrays are implemented as hash tables, it should be implemented much more efficiently than a linear scan.

Outras dicas

If your iterators can be casted to arrays, you can use array_diff and array_intersect. Otherwise you have to implement what these functions do under the hood - walk over your structures and compare. Because the data you iterate is not sorted nor do you know anything else about it, you have no other choice.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top