Question

I'm trying to find ALL common subsequences in an array of strings in ruby not just the longest single subsequence.

This means that if the input is

["aaaF hello", "aaaG hello", "aaaH hello"]

The expected output is

["aaa", " hello"]

I've been messing with the longest single subsequence algorithm, but can't figure out how to get the appropriate output. The problem with most approaches is they have other elements in the final array like "a", "aa", "h", "he", "hel", "hell"

Was it helpful?

Solution

Well, these smaller subsequences like "a" and "aa" are common subsequences, so that wouldn't be incorrect.

If you really only want the longest common subsequences (i.e. those subsequences not contained in any other common subsequence), what you need to do is to check whether these subsequences are part of the larger common subsequence, and if so, discard them.

This can be done by (in pseudocode)

finalsubsequences = copy(subsequences);
for(subseq in subsequences) {
   for(subseq2 in subsequences) {
       if(subseq2.indexOf(subseq) != false)
       // subseq contains subseq2, thus discard subseq2
       finalsubsequences.remove(subseq2);
   }
}

Good luck!

OTHER TIPS

  1. choose a string s from the array.
  2. iteratate over all substrings of s that are shorter by one then s.
  3. for each substring check to see if it exists in the entire array.
  4. if it is, add it to the result and continue., if it is not, go to 1 with the substring as s.

heres some code in python/pseudo:

A = String["aaaf hello","aaag hello"]

def find(s):
   res = []
   for sub in [s[1:],s[:-1]]
     if sub in all items in A:
       res.append(sub)
     else:
       res.append(find(sub))
   return res
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top