Data structure or algorithm for quickly finding differences between strings
-
05-11-2019 - |
Question
I have an array of 100,000 strings, all of length $k$. I want to compare each string to every other string to see if any two strings differ by 1 character. Right now, as I add each string to the array, I'm checking it against every string already in the array, which has a time complexity of $\frac{n(n-1)}{2} k$.
Is there a data structure or algorithm that can compare strings to each other faster than what I'm already doing?
Some additional information:
Order matters:
abcde
andxbcde
differ by 1 character, whileabcde
andedcba
differ by 4 characters.For each pair of strings that differ by one character, I will be removing one of those strings from the array.
Right now, I'm looking for strings that differ by only 1 character, but it would be nice if that 1 character difference could be increased to, say, 2, 3, or 4 characters. However, in this case, I think efficiency is more important than the ability to increase the character-difference limit.
$k$ is usually in the range of 20-40.
No correct solution