Some derivation of Levenshtein distance comes to mind - possibly not the fastest algorithm, but it should be quick to implement.
We can ignore ^
at the start and $
at the end - anywhere else is invalid.
Then we construct a 2D grid where each row represents a unit [1] in the expression and each column represents a character in the test string.
[1]: A "unit" here refers to a single character, with the exception that *
shall be attached to the previous character
So for a*b*c
and aaabccc
, we get something like:
a a a b c c c
a*
b*
c
Each cell can have a boolean value indicating validity.
Now, for each cell, set it to valid if either of these hold:
The value in the left neighbour is valid and the row is
x*
or.*
and the column isx
(x
being any charactera-z
)This corresponds to a
*
matching one additional character.The value in the upper-left neighbour is valid and the row is
x
or.
and the column isx
(x
being any charactera-z
)This corresponds to a single-character match.
The value in the top neighbour is valid and the row is
x*
or.*
.This corresponds to the
*
matching nothing.
Then check if the bottom-right-most cell is valid.
So, for the above example, we get: (V
indicating valid)
a a a b c c c
a* V V V - - - -
b* - - - V - - -
c - - - - V - -
Since the bottom-right cell isn't valid, we return invalid.
Running time: O(stringLength*expressionLength)
.
You should notice that we're mostly exploring a fairly small part of the grid.
This solution can be improved by making it a recursive solution making use of memoization (and just calling the recursive solution for the bottom-right cell).
This will give us a best-case performance of O(1)
, but still a worst-case performance of O(stringLength*expressionLength)
.
My solution assumes the expression must match the entire string, as inferred from the result of the above example being invalid (as per the question).
If it can instead match a substring, we can modify this slightly so, if the cell is in the top row it's valid if:
The row is
x*
or.*
.The row is
x
or.
and the column isx
.