Question

I have an array of 100,000 strings, all of length $k$. I want to compare each string to every other string to see if any two strings differ by 1 character. Right now, as I add each string to the array, I'm checking it against every string already in the array, which has a time complexity of $\frac{n(n-1)}{2} k$.

Is there a data structure or algorithm that can compare strings to each other faster than what I'm already doing?

Some additional information:

  • Order matters: abcde and xbcde differ by 1 character, while abcde and edcba differ by 4 characters.

  • For each pair of strings that differ by one character, I will be removing one of those strings from the array.

  • Right now, I'm looking for strings that differ by only 1 character, but it would be nice if that 1 character difference could be increased to, say, 2, 3, or 4 characters. However, in this case, I think efficiency is more important than the ability to increase the character-difference limit.

  • $k$ is usually in the range of 20-40.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top