Question

I have a table with two columns: one is an id, and the other contains long strings, e.g.

Id  strings
1   AGTTAGGACCTTACTCTATATCTGTTCTGTTGGTATGGAG
2   GTACTTGTATTCTGATATCTAGGGTTTTCTAATTACTTCTG
3   GTATTCTCTTTCTAGCTGATCGTAATTAAATCTTATCTAA

When the user performs a search, I want to find the longest common subsequence (LCS) between the search string and each row in the table. E.g., the search sequence is

TCTGTTCTG

1.  It's a 100% match, with the whole search string found.
2.  The LCS is TCTGTTCTG, but with some gaps.
3.  The LCS is TCTGTTCT, with some gaps in between.

Is there a way to store information about the match: exactly where it started, where it stopped, where it started again, and so on? That way I could present the data in a format something like this:

First one   =>

AGTTAGGACCTTACTCTATATCTGTTCTGTTGGTATGGAG
                    |||||||||
                    TCTGTTCTG

Second one =>

GTACTTGTATTCTGATATCTAGGGTTTTCTAATTACTTCTG
 | || |  |||||
 T CT G  TTCTG

Basically, I want to store the start and end position of each matched run, for each sequence, so that when I show this page again in the future I don't have to compute the match again. I could just read the start and end positions from the database and display the match in the format shown. I know the question might be a little hazy, so please let me know what I should elaborate on if anything is unclear.

No correct solution

OTHER TIPS

The first case is easy enough using PATINDEX.

Case 1:

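SELECT Id, PATINDEX('%TCTGTTCTG%', strings) AS match_start FROM table

For every row, that returns the 1-based starting position of the full match, or 0 if the row does not contain the search string. Add WHERE PATINDEX('%TCTGTTCTG%', strings) > 0 to keep only the Ids of the 'Full' matches.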

Case 2:

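SELECT Id, PATINDEX('%T%C%T%G%T%T%C%T%G%', strings) AS match_start FROM table

This one appears to return a nonzero value whenever all nine characters occur in order with arbitrary gaps between them. It does not select the 'best' gapped match, and it does not report where the gaps fall.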
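Since PATINDEX only gives back a single position, the gap positions probably have to be computed outside the query. Here is a minimal Python sketch of that, not a definitive implementation: a standard dynamic-programming LCS that collapses the matched characters into runs. The function name lcs_segments and the (text_start, query_start, length) tuple layout are my own assumptions, and positions are 0-based. It returns one LCS alignment, not necessarily the one with the fewest gaps, apart from a shortcut that prefers an exact substring hit.

```python
def lcs_segments(text, query):
    """Return (lcs, segments) for one LCS alignment of query against text.

    Each segment is a hypothetical (text_start, query_start, length) tuple,
    0-based, describing a run of characters matched without a gap.
    """
    # Shortcut: an exact substring hit is a gap-free alignment.
    pos = text.find(query)
    if pos != -1:
        return query, [(pos, 0, len(query))]

    n, m = len(text), len(query)
    # dp[i][j] = length of the LCS of text[i:] and query[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if text[i] == query[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table, collecting matched index pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if text[i] == query[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1

    # Merge pairs that are consecutive in both strings into runs.
    segments = []
    for ti, qi in pairs:
        if segments:
            ts, qs, length = segments[-1]
            if ts + length == ti and qs + length == qi:
                segments[-1] = (ts, qs, length + 1)
                continue
        segments.append((ti, qi, 1))

    lcs = "".join(query[qi] for _, qi in pairs)
    return lcs, segments


row1 = "AGTTAGGACCTTACTCTATATCTGTTCTGTTGGTATGGAG"
print(lcs_segments(row1, "TCTGTTCTG"))  # ('TCTGTTCTG', [(20, 0, 9)])
```

The table is O(len(text) * len(query)) in time and memory, which is fine for strings of this size but worth keeping in mind for a large table of long sequences.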

I will get back to it when I can; there are lots of edge cases from what I see. For example, if there are multiple full matches, do you need to return the match with the least amount of gap in it, or just any match with gaps? The same goes for partial matches.

That should give you a start, while I think about the rest of it.
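On the storage part of the question, one option is a separate table with one row per matched run, so the display can be rebuilt later without recomputing the match. The following is only a sketch under assumptions: the match_segments table, its column names, and the helper functions are all hypothetical, positions are 0-based, and Python's built-in sqlite3 stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical schema: one row per gap-free run of a match.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE match_segments (
           sequence_id INTEGER,
           query       TEXT,
           text_start  INTEGER,  -- 0-based offset into the stored string
           query_start INTEGER,  -- 0-based offset into the search string
           length      INTEGER
       )"""
)


def save_segments(conn, sequence_id, query, segments):
    """Store the (text_start, query_start, length) runs of one match."""
    conn.executemany(
        "INSERT INTO match_segments VALUES (?, ?, ?, ?, ?)",
        [(sequence_id, query, ts, qs, n) for ts, qs, n in segments],
    )


def load_segments(conn, sequence_id, query):
    """Read the stored runs back, ordered by position in the string."""
    cur = conn.execute(
        "SELECT text_start, query_start, length FROM match_segments "
        "WHERE sequence_id = ? AND query = ? ORDER BY text_start",
        (sequence_id, query),
    )
    return cur.fetchall()


def render(text, query, segments):
    """Rebuild the three-line display from the stored runs."""
    bars = [" "] * len(text)
    hits = [" "] * len(text)
    for text_start, query_start, length in segments:
        for k in range(length):
            bars[text_start + k] = "|"
            hits[text_start + k] = query[query_start + k]
    return "\n".join([text, "".join(bars).rstrip(), "".join(hits).rstrip()])


row1 = "AGTTAGGACCTTACTCTATATCTGTTCTGTTGGTATGGAG"
save_segments(conn, 1, "TCTGTTCTG", [(20, 0, 9)])
print(render(row1, "TCTGTTCTG", load_segments(conn, 1, "TCTGTTCTG")))
```

Storing runs rather than individual character positions keeps the row count small when most of a match is contiguous, and the start and end of each run are just text_start and text_start + length - 1.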

Licensed under: CC-BY-SA with attribution