What code would I use to convert a SQL like expression to a regex on the fly?
Question
I'm looking to convert a SQL LIKE expression to the equivalent regex on the fly, e.g.
LIKE '%this%'
LIKE 'Sm_th'
LIKE '[C-P]arsen'
What's the best approach to doing this?
P.S. I'm looking to do this on the .NET Framework (C#).
Solution
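The following code converts a SQL LIKE pattern into a regex pattern using Regex.Replace with a MatchEvaluator
delegate. It passes square-bracket blocks through unchanged and escapes special regex characters in literal text. The resulting pattern is not anchored; because LIKE matches the whole value, wrap it in ^ and $ before matching.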
string regexPattern = Regex.Replace(
    likePattern,
    @"[%_]|\[[^]]*\]|[^%_[]+",
    match =>
    {
        if (match.Value == "%")
        {
            return ".*";
        }
        if (match.Value == "_")
        {
            return ".";
        }
        if (match.Value.StartsWith("[") && match.Value.EndsWith("]"))
        {
            return match.Value;
        }
        return Regex.Escape(match.Value);
    });
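As a usage sketch, the conversion above can be wrapped in a helper and anchored so that it matches the whole input, as LIKE does. The names LikeConverter, ToRegex and IsLike are hypothetical, and RegexOptions.Singleline is an added assumption so that % and _ also match newline characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LikeConverter
{
    // Hypothetical helper: converts a LIKE pattern into an anchored .NET regex pattern.
    public static string ToRegex(string likePattern)
    {
        string body = Regex.Replace(
            likePattern,
            @"[%_]|\[[^]]*\]|[^%_[]+",
            match =>
            {
                if (match.Value == "%") return ".*";
                if (match.Value == "_") return ".";
                // Complete [...] blocks are passed through unchanged.
                if (match.Value.StartsWith("[") && match.Value.EndsWith("]")) return match.Value;
                // Everything else is literal text.
                return Regex.Escape(match.Value);
            });
        return "^" + body + "$";
    }

    // Hypothetical helper: true when the input matches the LIKE pattern.
    public static bool IsLike(string input, string likePattern)
    {
        return Regex.IsMatch(input, ToRegex(likePattern), RegexOptions.Singleline);
    }
}
```

For the patterns in the question, LikeConverter.IsLike("Smith", "Sm_th") and LikeConverter.IsLike("Larsen", "[C-P]arsen") should both be true, while LikeConverter.IsLike("Barsen", "[C-P]arsen") should be false.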
OTHER TIPS
In addition to @Nathan-Baulch's solution, you can use the code below to handle the case where a custom escape character has been defined with the LIKE '!%' ESCAPE '!' syntax.
public Regex ConvertSqlLikeToDotNetRegex(string likePattern, char? likeEscape = null)
{
    // Regex-escape the LIKE escape character so that e.g. '\' or '^' can be embedded safely.
    // (']' is not supported as an escape character by this pattern.)
    var escape = likeEscape.HasValue ? Regex.Escape(likeEscape.Value.ToString()) : string.Empty;
    var pattern = string.Format(@"
        {0}[%_]|
        [%_]|
        \[[^]]*\]|
        [^%_[{0}]+
        ", escape);
    var regexPattern = Regex.Replace(
        likePattern,
        pattern,
        // Pass likeEscape along explicitly; it is not in scope inside the evaluator otherwise.
        match => ConvertWildcardsAndEscapedCharacters(match, likeEscape),
        RegexOptions.IgnorePatternWhitespace);
    regexPattern = "^" + regexPattern + "$";
    // m_CaseSensitive is a bool field of the containing class (not shown).
    return new Regex(regexPattern,
        !m_CaseSensitive ? RegexOptions.IgnoreCase : RegexOptions.None);
}
private static string ConvertWildcardsAndEscapedCharacters(Match match, char? likeEscape)
{
    // Wildcards
    switch (match.Value)
    {
        case "%":
            return ".*";
        case "_":
            return ".";
    }
    // Remove the SQL-defined escape character and keep the escaped wildcard as a literal
    if (StartsWithEscapeCharacter(match.Value, likeEscape))
    {
        return Regex.Escape(match.Value.Remove(0, 1));
    }
    // Pass anything contained in []s straight through
    // (these behave the same in SQL LIKE and in .NET regex)
    if (StartsAndEndsWithSquareBrackets(match.Value))
    {
        return match.Value;
    }
    return Regex.Escape(match.Value);
}
private static bool StartsAndEndsWithSquareBrackets(string text)
{
    return text.StartsWith("[", StringComparison.Ordinal) &&
        text.EndsWith("]", StringComparison.Ordinal);
}
private static bool StartsWithEscapeCharacter(string text, char? likeEscape)
{
    return likeEscape.HasValue &&
        text.StartsWith(likeEscape.Value.ToString(), StringComparison.Ordinal);
}
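The regex-based approach above only treats the escape character specially in front of % and _. As an alternative sketch (a different technique, not the code from this answer), the pattern can be scanned one character at a time so that any character following the escape character becomes a literal, including [ and the escape character itself. The names LikeEscapeSketch and ToRegex are hypothetical, and RegexOptions.Singleline is an assumption.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LikeEscapeSketch
{
    // Hypothetical helper: converts a LIKE pattern with an optional ESCAPE
    // character into an anchored .NET regex by scanning character by character.
    public static Regex ToRegex(string likePattern, char? likeEscape = null)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < likePattern.Length; i++)
        {
            char c = likePattern[i];
            if (likeEscape.HasValue && c == likeEscape.Value && i + 1 < likePattern.Length)
            {
                // The character after the escape character is always a literal.
                sb.Append(Regex.Escape(likePattern[++i].ToString()));
            }
            else if (c == '%')
            {
                sb.Append(".*");
            }
            else if (c == '_')
            {
                sb.Append('.');
            }
            else if (c == '[' && likePattern.IndexOf(']', i + 1) > i)
            {
                // Pass a complete [...] block through unchanged.
                int close = likePattern.IndexOf(']', i + 1);
                sb.Append(likePattern, i, close - i + 1);
                i = close;
            }
            else
            {
                // Literal text, including an unmatched '[' or a trailing escape character.
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(), RegexOptions.Singleline);
    }
}
```

With this sketch, LikeEscapeSketch.ToRegex("50!%", '!') should match only the literal text 50%, and "a!!b" with the same escape character should match a!b.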
From your example above, I would attack it like this (I speak in general terms because I do not know C#):
Split the input on each LIKE '...' and collect the quoted patterns in an array. Replace unescaped % signs with .* and underscores with .; in this case [C-P]arsen translates directly into regex.
Join the array pieces back together with a pipe, wrap the result in parentheses, and add the standard regex anchors.
The result would be:
/^(.*this.*|Sm.th|[C-P]arsen)$/
The most important thing here is to be wary of all the ways you can escape data, and which wildcards translate to which regular expressions.
% becomes .*
_ becomes .
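The steps above can be sketched in C# as follows. This is a minimal illustration that assumes the individual LIKE patterns have already been extracted into strings; the names LikeAlternation, ConvertOne and Combine are hypothetical, and the per-pattern conversion reuses the Regex.Replace approach from the accepted solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LikeAlternation
{
    // Hypothetical helper: converts a single LIKE pattern to an unanchored regex fragment.
    private static string ConvertOne(string likePattern)
    {
        return Regex.Replace(
            likePattern,
            @"[%_]|\[[^]]*\]|[^%_[]+",
            match =>
            {
                if (match.Value == "%") return ".*";
                if (match.Value == "_") return ".";
                if (match.Value.StartsWith("[") && match.Value.EndsWith("]")) return match.Value;
                return Regex.Escape(match.Value);
            });
    }

    // Joins the converted patterns with a pipe, wraps them in parentheses, and anchors the result.
    public static Regex Combine(params string[] likePatterns)
    {
        string joined = string.Join("|", likePatterns.Select(p => ConvertOne(p)));
        return new Regex("^(" + joined + ")$");
    }
}
```

Calling LikeAlternation.Combine("%this%", "Sm_th", "[C-P]arsen") should produce the pattern ^(.*this.*|Sm.th|[C-P]arsen)$ shown above, without the surrounding slashes, which .NET does not use.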
I found a Perl module called Regexp::Wildcards. You can try to port it or try Perl.NET. I have a feeling you can write something up yourself too.