Find long connected similar subsequences in two given sequences

https://cs.stackexchange.com/questions/63704

04-11-2019
Question

I'm looking for pointers to algorithms that find long, connected (contiguous), similar subsequences that two given sequences have in common. For example, given the two strings:

abcaabbaabUVWXYZ
UVWXeYZababababab

I'm interested in:

**********UVWXYZ
UVWXeYZ**********

Not in:

ab*aabba*b******
*******aba*ab*bab

(which would be one possible longest common subsequence for the given strings).

In the example above, e represents a (small) difference between the otherwise identical strings UVWXYZ and UVWXeYZ. This is where the similarity comes in. The difference is not necessarily the insertion of a single character; it may also be a substitution. In longer strings, several characters (even consecutive ones) may differ.

The algorithm should probably be driven by a rating function for the length and the similarity of subsequences.
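
To make the idea of a rating function concrete, here is a minimal sketch of one technique that seems to fit this description: local sequence alignment in the style of Smith-Waterman. This is an assumption about what might apply, not something established by the question. The function name `local_alignment` and the scores `match`, `mismatch` and `gap` are hypothetical choices made for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-2, gap=-2):
    """Smith-Waterman-style local alignment (illustrative sketch).

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    The scoring values are assumptions; they play the role of the
    rating function for length versus similarity.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of an alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned against a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned against a gap
            # Clamping at 0 lets an alignment start anywhere (local, not global).
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        current = score[i][j]
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if current == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif current == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_alignment("abcaabbaabUVWXYZ", "UVWXeYZababababab"))
    # (10, 'UVWX-YZ', 'UVWXeYZ')
```

With these assumed scores the example strings align as UVWX-YZ against UVWXeYZ. The choice of penalties matters: with match=2 and both penalties relaxed to -1, a scattered alignment across the a/b runs scores at least 12, above the 11 that UVWX-YZ would get, so the rating function really does decide which kind of result is preferred.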

I'm aware that this problem is rather vague, so any pointers to possibly related problem domains and corresponding algorithms are appreciated as well.

Update: Removed exclusion criterion "LCS", because it actually seems to be what I'm looking for.

No accepted answer

License: CC-BY-SA with attribution

Not affiliated with cs.stackexchange