Question

I want to find a "select ... from ... where ..." sequence in a string. For example,

"select * from T WHERE t.c1 =1"

should be matched.

Here is the regular expression pattern I wrote:

"SELECT\s\.*FROM\s\.*WHERE\s\.*";

But it doesn't work. What is wrong with it?


Solution

You shouldn't have backslash-escaped the dots; since you did, the engine is trying to match a literal dot, not "any character" like you're expecting.

Try:

SELECT\s.*FROM\s.*WHERE\s.*

Also, as others have posted, make sure it's in case-insensitive mode. How you do that depends on the language you're using.
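To make the difference concrete, here is a minimal sketch in Python. Python is used only for illustration, since the question does not say which engine is in use, and `re.IGNORECASE` is Python's spelling of case-insensitive mode.

```python
import re

text = "select * from T WHERE t.c1 =1"

# Original pattern: "\.*" means "zero or more literal dots", so nothing
# can consume the "*" or the table name that sit between the keywords.
broken = re.compile(r"SELECT\s\.*FROM\s\.*WHERE\s\.*", re.IGNORECASE)

# Corrected pattern: an unescaped "." matches any character except a newline.
fixed = re.compile(r"SELECT\s.*FROM\s.*WHERE\s.*", re.IGNORECASE)

broken_match = broken.search(text)
fixed_match = fixed.search(text)

# Without the case-insensitive flag even the corrected pattern fails,
# because the sample uses lowercase "select" and "from".
case_sensitive_match = re.search(r"SELECT\s.*FROM\s.*WHERE\s.*", text)

print(broken_match)           # None
print(fixed_match.group(0))   # the whole sample string
print(case_sensitive_match)   # None
```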

OTHER TIPS

I'm not sure what regex engine you're targeting, but you might try this:

# Note the trailing i, which in Perl means case-insensitive.
# The parentheses also capture the interesting bits into groups ($1, $2, $3).
#
# This also changes \s to \s+ to allow for multiple spaces
# between terms.
#
/select\s+(.*)\s+from\s+(.*)\s+where\s+(.*)/i
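As a rough illustration of what the capture groups give you, the same pattern can be tried in Python (an assumption for the sake of the example; the Perl `/i` flag becomes `re.IGNORECASE`). Because `(.*)` is greedy, a group can pick up extra whitespace when terms are separated by several spaces, so this sketch strips each group.

```python
import re

pattern = re.compile(
    r"select\s+(.*)\s+from\s+(.*)\s+where\s+(.*)",
    re.IGNORECASE,
)

# The sample statement from the question.
m = pattern.search("select * from T WHERE t.c1 =1")
columns, table, condition = m.groups()
print(columns, table, condition)

# Several spaces between terms: \s+ still matches, but the greedy
# groups may keep trailing spaces, hence the strip().
m2 = pattern.search("select   a, b   from   T   where   x > 1")
parts = tuple(g.strip() for g in m2.groups())
print(parts)
```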

One problem with the RE is that it's case sensitive. Depending on the form of RE, there is probably a flag to specify case insensitive matching. For example, Perl-compatible REs use a "/i" flag: /SELECT\s.*FROM\s.*WHERE\s.*/i

Assumptions:

  • Your SQL statement will not span multiple lines.
  • You will not have two SQL statements on a single line.

This works for me.

SELECT\s+?[^\s]+?\s+?FROM\s+?[^\s]+?\s+?WHERE.*

Java escaped version:

String regex = "SELECT\\s+?[^\\s]+?\\s+?FROM\\s+?[^\\s]+?\\s+?WHERE.*";

You may append a terminator instead of the .* depending on your case. Of course, you have to run it in case-insensitive mode, OR modify the regex appropriately.
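Under the two assumptions above, the pattern can be applied one line at a time. The following is only a sketch in Python; the function name `find_statements` and the sample input are made up for illustration. Note that `[^\s]+?` matches a single whitespace-free token, so a column list containing spaces (such as `a, b`) is not matched by this pattern.

```python
import re

PATTERN = re.compile(
    r"SELECT\s+?[^\s]+?\s+?FROM\s+?[^\s]+?\s+?WHERE.*",
    re.IGNORECASE,
)

def find_statements(text):
    """Return the matched text for every line that contains a statement."""
    found = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            found.append(m.group(0))
    return found

# Hypothetical input: one statement per line, none spanning lines.
sample = "\n".join([
    "-- nightly report",
    "select * from T WHERE t.c1 =1",
    "select a, b from T where x = 1",   # column list contains a space
    "SELECT name FROM users WHERE id = 7;",
])

results = find_statements(sample)
for r in results:
    print(r)
```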

Thanks for your replies, everyone. I got the answer from mmyers; here is my final solution:

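        string szPattern = @"SELECT\s.*FROM\s.*WHERE\s.*";
        // IgnoreCase lets "select" match "SELECT"; Multiline only changes how ^ and $ behave.
        Regex rRegEX = new Regex(szPattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        Match match = rRegEX.Match(testCase.Statement);
        if (match.Success)
        {
            // The statement contains a SELECT ... FROM ... WHERE sequence.
        }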
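One thing worth knowing about the flags: a multiline option changes how `^` and `$` behave, not whether `.` crosses a line break. The sketch below uses Python's `re.MULTILINE` and `re.DOTALL` as stand-ins (an assumption for illustration; in .NET the corresponding options are `RegexOptions.Multiline` and `RegexOptions.Singleline`) to show what happens when a statement spans several lines.

```python
import re

pattern = r"SELECT\s.*FROM\s.*WHERE\s.*"
statement = "select *\nfrom T\nwhere t.c1 = 1"

# MULTILINE only affects ^ and $; "." still stops at each newline,
# so the keywords on separate lines are never joined into one match.
multiline = re.search(pattern, statement, re.IGNORECASE | re.MULTILINE)

# DOTALL lets "." match newlines as well, so the whole statement matches.
dotall = re.search(pattern, statement, re.IGNORECASE | re.DOTALL)

print(multiline)                       # None
print(dotall.group(0) == statement)    # True
```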

Try this regex. I tried to be a bit more restrictive with table and column names, and it also handles filters that use =, <=, >=, <, > and IN ():

string regex = @"SELECT\W(([a-zA-Z0-9]+)([,]*)\W)+FROM\W([a-zA-Z0-9#]+)(\W[a-zA-Z])*\WWHERE\W([a-zA-Z0-9_]+)\W([=<>IN(]+)\W(.+)";
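A quick way to check a pattern of this shape is to run it against a sample and look at the groups. The sketch below is in Python and rests on two assumptions about the intent: the character classes are written `a-zA-Z` (the range `A-z` also admits characters such as `[`, `^` and `_`), and a single `\W` separates the column list from `FROM`, so that single-spaced statements match. The sample statement is made up.

```python
import re

pattern = re.compile(
    r"SELECT\W(([a-zA-Z0-9]+)([,]*)\W)+FROM\W([a-zA-Z0-9#]+)(\W[a-zA-Z])*"
    r"\WWHERE\W([a-zA-Z0-9_]+)\W([=<>IN(]+)\W(.+)",
    re.IGNORECASE,
)

m = pattern.search("SELECT a, b FROM T WHERE c1 >= 5")

# Groups 4, 6, 7 and 8 hold the table, the filtered column,
# the operator, and the right-hand side of the filter.
table, column, operator, value = m.group(4), m.group(6), m.group(7), m.group(8)
print(table, column, operator, value)

# With a second \W in front of FROM, a single space before FROM is not
# enough, so the same single-spaced statement is not matched.
strict = re.compile(
    r"SELECT\W(([a-zA-Z0-9]+)([,]*)\W)+\WFROM\W([a-zA-Z0-9#]+)",
    re.IGNORECASE,
)
strict_match = strict.search("SELECT a, b FROM T WHERE c1 >= 5")
print(strict_match)  # None
```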
Licensed under: CC-BY-SA with attribution