Question

Deterministic automata to find number of subsequences in string ? How can I construct a DFA to find number of occurence string as a subsequence in another string?

eg. In "ssstttrrriiinnngggg" we have 3 subsequences which form string "string" ?

also both string to be found and to be searched only contain characters from specific character Set . I have some idea about storing characters in stack poping them accordingly till we match , if dont match push again . Please tell DFA solution ?

Was it helpful?

Solution

OVERLAPPING MATCHES

If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.

1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7

and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.

DP[a][b] = number of ways of being in state b after seeing the first a characters
         = DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
         = DP[a-1][b] otherwise

Start with DP[0][b]=0 for b>1 and DP[0][1]=1.

Then the total number of overlapping strings is DP[len(string)][7]

NON-OVERLAPPING MATCHES

If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:

DP[a][b] = number of strings being in state b after seeing the first a characters
         = DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and  DP[a-1][b-1]>0
         = DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
         = DP[a-1][b] otherwise

Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.

Then the total number of non-overlapping strings is DP[len(string)][7]

This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top