سؤال

I'm looking for an Algorithm (Preferably with a java implementation) for merging Strings.

my problem is as following :

suppose I have an Array/List of Strings {"myString1" , "my String1" , "my-String-1" ... } I'd like the algorithm to point out that there is a very high probability that all of these values denote the "myString1".

so I would like to compact my list. maybe this can be done with KMP or maybe there is something more suitable.

Thanks.

هل كانت مفيدة؟

المحلول

I think that Edit distance is good heuristic for merging strings.

EDIT:

You can modify the edit distance algorithm:

You can give different value for d(-,c) for character c.

So in the following example: "String1","String2", you can "punish" the score but letting d(1,2) be high, in contrast to "String 1","String1" that won't be punished because the score will be d(-,' ').

نصائح أخرى

Alternatively, Approximate string matching could be of some use. I dont believe KMP would suit the purpose, because it is designed for precise substring matching

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top