Determining what is the subsequence in LCS algorithm

https://stackoverflow.com/questions/20152715

04-08-2022
|

Question

I've been studing for the last week the LCS problem, and I have a question.

Whenever we are also interested in finding the longest subsequence itself (and not just its length), we create an auxiliary (string1.length X string2.length) matrix, to determing what is the subsequence by adding up arrow, left arrow or diagonal arrow, corresponding to where we came from, so we can later retrace our steps and find the subsequence itself.

(see example here: http://www.columbia.edu/~cs2035/courses/csor4231.F13/lcs.pdf at the last page)

my question is this: consider the following output matrix for running these two strings: "abfcytyruc" and "vxczcxabfc" through LCS:

- 0 0 0 0 0 0 0 0 0 0 0
- 0 0 0 0 0 0 0 1 1 1 1
- 0 0 0 0 0 0 0 1 2 2 2
- 0 0 0 0 0 0 0 1 2 3 3
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 1 1 1 2 3 4
- 0 0 0 1 1 2 2 2 2 3 4

is it true that we can find the common subsequence, based on the last column in the matrix alone, without the need for some more space complexity?

Solution

When using the simple dynamic programming approach (like above) you can only determine the length of the LCS with the last column, but not the actual sequence.

Here an example where it fails:

  a a a a a c b b
c 0 0 0 0 0 1 1 1
b 0 0 0 0 0 1 2 2
b 0 0 0 0 0 1 2 3
a 1 1 1 1 1 1 2 3
a 1 2 2 2 2 2 2 3
a 1 2 3 3 3 3 3 3
a 1 2 3 4 4 4 4 4

Have a look at the Hirschberg algorithm. It is a divide and conquer adaptation of the Needleman-Wunsch algorithm and works in linear space.

For further reading: http://www.akira.ruc.dk/~keld/teaching/algoritmedesign_f03/Artikler/05/Hirschberg75.pdf

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow