Question

Suppose I have a sequence:

    Seq = 'hello my name'

and a string:

    Str = 'hello hello my friend, my awesome name is John, oh my god!'

I then search the string for each word of my sequence and store the word index of every match in a cell array. The first element holds the matches for 'hello', the second the matches for 'my', and the third the matches for 'name':

    Match = {[1 2];      %'hello' matches
             [3 5 11];   %'my' matches
             [7]}        %'name' matches
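The question doesn't show how Match is produced. Here is a minimal sketch that uses strsplit, strcmp and find. It assumes a match means an exact, case-sensitive comparison against the words of Str split on whitespace, with punctuation such as "friend," left attached to its word:

```matlab
Seq = 'hello my name';
Str = 'hello hello my friend, my awesome name is John, oh my god!';

seqWords = strsplit(Seq);   % words of the sequence
strWords = strsplit(Str);   % words of the string, split on whitespace

Match = cell(numel(seqWords), 1);
for k = 1:numel(seqWords)
    % word indices in Str that exactly equal the k-th word of Seq
    Match{k} = find(strcmp(strWords, seqWords{k}));
end
```

With the example strings, this reproduces the Match array shown above.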

I need code that returns all possible sub-sequence matches, which here are:

    Answer = [1 3 7;     %[hello my name]
              1 5 7;     %[hello my name]
              2 3 7;     %[hello my name]
              2 5 7;]    %[hello my name]

"Answer" should contain every possible ordered sequence, i.e. every combination whose word indices strictly increase. That is why 'my' (word 11) never appears in "Answer": it would need a 'name' match after position 11.
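Put another way, a candidate row (one word index per word of Seq) qualifies only if its indices strictly increase. A quick illustration of that test uses a hypothetical helper, isOrdered:

```matlab
% A candidate is a valid sub-sequence match only if its word indices
% strictly increase from left to right.
isOrdered = @(row) all(diff(row) > 0);

okRow  = isOrdered([2 5 7]);    % true:  hello(2) < my(5) < name(7)
badRow = isOrdered([1 11 7]);   % false: name(7) comes before my(11)
```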

NOTE: The length of "Seq" and the number of matches for each word may vary.


Solution

Since the number of elements of Match may vary, you can use comma-separated lists together with ndgrid to generate all combinations (the approach is similar to that used in this other answer). Then filter out the combinations whose indices are not strictly increasing, using diff and logical indexing:

    cc = cell(1, numel(Match));                 %// one cell per ndgrid output
    [cc{end:-1:1}] = ndgrid(Match{end:-1:1});   %// comma-separated lists; reversed so the last word varies fastest
    cc = cellfun(@(v) v(:), cc, 'UniformOutput', false); %// linearize each grid array
    combs = [cc{:}];                            %// one combination per row
    ind = all(diff(combs, 1, 2) > 0, 2);        %// rows whose indices strictly increase
    combs = combs(ind, :);                      %// keep only the ordered combinations
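Consider the three-word example. Here Match{end:-1:1} expands to the argument list Match{3}, Match{2}, Match{1}, and [cc{end:-1:1}] receives the outputs in reverse order. The following sketch shows the equivalent hard-coded call. The names g1, g2, g3 and allCombs are just for illustration. It produces the full Cartesian product before filtering:

```matlab
Match = {[1 2]; [3 5 11]; 7};

% Equivalent to [cc{end:-1:1}] = ndgrid(Match{end:-1:1}) for three words
[g3, g2, g1] = ndgrid(Match{3}, Match{2}, Match{1});

% Linearizing gives every combination; the last word varies fastest
allCombs = [g1(:), g2(:), g3(:)];
disp(allCombs)
```

Because the order is reversed, the rows come out sorted lexicographically when each Match{k} is sorted. Rows 3 and 6 contain 11 followed by 7, and the diff test removes exactly those rows.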

The desired result is in the variable combs. In your example,

    combs =
         1     3     7
         1     5     7
         2     3     7
         2     5     7
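A different technique builds the sequences incrementally and prunes as it goes. It is not the approach used above. Each partial sequence is extended only with indices larger than its last element, so invalid combinations are never generated. This may save memory when the full product from ndgrid would be large. Here is a minimal sketch, assuming each Match{k} is a sorted vector of word indices:

```matlab
Match = {[1 2]; [3 5 11]; 7};

% Start with one partial sequence per match of the first word
combs = Match{1}(:);
for k = 2:numel(Match)
    cand = Match{k}(:).';            % candidate indices for word k, as a row
    newCombs = zeros(0, k);          % partial sequences of length k
    for r = 1:size(combs, 1)
        valid = cand(cand > combs(r, end));   % only indices after the last chosen one
        newCombs = [newCombs; repmat(combs(r, :), numel(valid), 1), valid(:)]; %#ok<AGROW>
    end
    combs = newCombs;                % rows that could not be extended are dropped
end
disp(combs)
```

Partial rows that cannot be extended, such as [1 11], simply drop out at the next step. Growing newCombs inside the loop is fine for small inputs. For large ones you would probably want to preallocate.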
Licensed under: CC-BY-SA with attribution