Question

Are there any algorithms/tools to detect an a priori unknown pattern in input sequence of discrete symbols?

For example, for string "01001000100001" it is something like ("0"^i"1"), and for "01001100011100001111" it is like ("0"^i"1"^i)

I've found some approaches but they are applied when a set of patterns to detect in a sequence is a priori known. I've also found the sequitur algorithm for hierarchical structure detection in data but it does not work when the sequence is like "arithmetic progression" like in my examples.

So I'll be very thankful for any information about method/algorithm/tool/scientific paper.

No correct solution

OTHER TIPS

I believe that as someone pointed out, the general case is not solvable. Douglas Hofstadter spent a lot of time studying this problem, and describes some approaches (some automated, some manual), see the first chapter of:

http://www.amazon.com/Fluid-Concepts-And-Creative-Analogies/dp/0465024750

I believe his general approach was to do use an AI search algorithm (depth or breadth first search combined with some good heuristics). Using the algorithm would generate possible sequences using different operators (such as repeat the previous digit i times, or i/2 times) and follow branches in the search tree where the operations specified by the nodes along that branch had correctly predicted the next digit(s), until it can successfully predict the sequence far enough ahead that you are satisfied that it has the correct answer. It would then output the sequence of operations that constitute the pattern (although these operations need to be designed by the user as building blocks for the pattern generation).

Also, genetic programming may be able to solve this problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top