Similarity between a pattern (regex) and the value found

Question

A regex cannot match input that deviates from the pattern unless the pattern itself is modified to allow those deviations. For example, if you would like to allow extra characters to be inserted at arbitrary positions in the string being matched, the regex would need provisions that match these arbitrary characters. This makes the pattern ugly very quickly: for example, while a pattern for matching an int is very simple,

\\d+

the same pattern that allows non-digits in between would look like this:

(\\d\\D*)+

This gets uglier and uglier as the pattern gets larger, and it still only tolerates insertions, not missing or wrong characters, so this approach does not scale well.
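To make the difference between the two patterns concrete, here is a minimal sketch. The class name `DigitPatterns` and the method names `strict` and `tolerant` are made up for illustration; only the two regexes come from the text above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class DigitPatterns {
    // Digits only: the simple pattern for an int.
    static final Pattern STRICT = Pattern.compile("\\d+");
    // A digit followed by any number of non-digits, repeated.
    static final Pattern TOLERANT = Pattern.compile("(\\d\\D*)+");

    // True only if the whole string consists of digits.
    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    // True if the string starts with a digit; non-digits may follow any digit.
    static boolean tolerant(String s) {
        return TOLERANT.matcher(s).matches();
    }
}
```

With these definitions, `strict("1a2b3")` is false while `tolerant("1a2b3")` is true. Note that the tolerant pattern still rejects `"a123"`, because it requires the string to start with a digit; covering that case means complicating the pattern further.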

I would recommend replacing pattern-based matching with an algorithm that implements a variation of Levenshtein distance.

The original Levenshtein distance algorithm takes two strings and returns the minimum number of single-character edits (insertions, deletions, and substitutions) needed to turn one string into the other. Your algorithm should take a string and a pattern. The pattern should use some sort of designator for digits (say, #) and treat all other characters literally, as string characters. You would modify the indicator function used in the algorithm to return zero when the two characters are equal or when a digit is compared against #, and 1 otherwise.

Take a look at the implementation with two matrix rows; it needs far less memory than the full-matrix version. The indicator function is implemented on this line:

var cost = (s[i] == t[j]) ? 0 : 1;

Changing it to

int cost = (s[i] == t[j] || (Character.isDigit(s[i]) && t[j] == '#')) ? 0 : 1;

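would allow you to "match" digits. Here s is the input string and t is the pattern; the s[i] indexing works for char arrays, so for Java String values use s.charAt(i) and t.charAt(j) instead. Your code could also remove all whitespace from the string before doing the match.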

You could decide on the quality of the match by checking the Levenshtein distance. A distance of zero means a perfect match; a distance of one or two is pretty good for short patterns; a distance of five or more is probably unacceptable. For longer patterns it makes sense to judge the distance relative to the pattern length.
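Putting the pieces together, the approach could be sketched as follows. This is only one possible two-row implementation, not the exact code referred to above; the class name `FuzzyPatternMatch` and its method names are hypothetical, and `#` as the digit designator is the convention suggested in this answer.

```java
// Hypothetical sketch: Levenshtein distance where '#' in the pattern matches any digit.
final class FuzzyPatternMatch {

    // Indicator function: 0 for equal characters or a digit against '#', 1 otherwise.
    static int cost(char c, char p) {
        return (c == p || (Character.isDigit(c) && p == '#')) ? 0 : 1;
    }

    // Two-row Levenshtein distance between the input s and the pattern.
    static int distance(String s, String pattern) {
        int n = s.length();
        int m = pattern.length();
        int[] prev = new int[m + 1]; // distances for the previous character of s
        int[] curr = new int[m + 1]; // distances for the current character of s

        // Matching an empty input against the first j pattern characters costs j.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }

        for (int i = 0; i < n; i++) {
            // Matching i + 1 input characters against an empty pattern costs i + 1.
            curr[0] = i + 1;
            for (int j = 0; j < m; j++) {
                int substitute = prev[j] + cost(s.charAt(i), pattern.charAt(j));
                int delete = prev[j + 1] + 1; // drop s[i]
                int insert = curr[j] + 1;     // add pattern[j]
                curr[j + 1] = Math.min(substitute, Math.min(delete, insert));
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Same as distance, but strips all whitespace from the input first.
    static int distanceIgnoringWhitespace(String s, String pattern) {
        return distance(s.replaceAll("\\s+", ""), pattern);
    }
}
```

For instance, `FuzzyPatternMatch.distance("123-4567", "###-####")` is 0, one stray character as in `"123x-4567"` gives 1, and `distanceIgnoringWhitespace("123 - 4567", "###-####")` is 0 again. The caller can then compare the result against whatever threshold suits the pattern length.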