Question

I'm trying to implement the longest common substring algorithm in C, and after reading the post below, i'm really confused about the following part:

Now, the greatest value is LCP[2]=3, but it is for SA[1] and SA[2], both of which start in the string A. So, we ignore that. On the other hand, LCP[4]=2 is for SA[3] (corresponds to the suffix bc of B) and SA[4] (corresponding to suffix bcabc#bc of A). So, this is the longest common substring.

My LCP result is also different from the post example.

https://cs.stackexchange.com/questions/9555/computing-the-longest-common-substring-of-two-strings-using-suffix-arrays

string A = abcabc

string B = bc

string separator: '#'

Suffix array

[#bc, abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc]

LCP

[-1, 0, 3, 0, 2, 2, 0, 1, 1]

OR removing the first element

Suffix array

[abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc]

LCP

[0, 3, 0, 2, 2, 0, 1, 1]

I see that SA[3] corresponds to bc, but SA[4] corresponds (i guess) to #bcbc. So, that's what confuses me.

Anyone can clarify on this? Thanks!

Was it helpful?

Solution

I see that SA[3] corresponds to bc, but SA[4] corresponds (i guess) to #bcbc.

There is no #bcbc to be found anywhere above. The quote says "SA[4] (corresponding to suffix bcabc#bc of A)" and that's certainly true:

 0       1          2   3      4         5  6     7        index
[abc#bc, abcabc#bc, bc, bc#bc, bcabc#bc, c, c#bc, cabc#bc] Suffix Array
[0,      3,         0,  2,     2,        0, 1,    1]       LCP

SA[2] is a suffix of B and SA[3] is a suffix of A(#B), so the longest common substring is bc, of length 2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top