You can simplify your expression
(?<=\b(from|join)\s+[a-zA-Z0-9_$#-]*\.\s*)[a-zA-Z0-9_]+
It will yield the table name directly, without having to use named groups. I am using the pattern
(?<=prefix)find
This returns only the part "find" as match value. As prefix I use "from" or "join" followed by the database name the dot and possibly spaces.
\b
is the beginning or end of a word.
The last part [a-zA-Z0-9_]+
is the table name.
It makes no difference whether you have an inner join, outer join etc., so I dropped this distinction.
Comments are ugly to detect, beacuse they can contain about anything and can appear about everywhere. Also you have two types of comments -- Until end of line
and \* ... *\
.
You could try to detect comments and spaces like this
(\s|--.*?$|\\\*.*?\*\\)+
Note that the backspace and the star have to be escaped, so \*
becomes \\\*
and *\
becomes \*\\
.
$
denotes the end of line. The ?
after .*
ensures that not several comments are skipped at once, thus skipping meaningful text between them.
This will work in most cases; however, comment like structures could be included within string literals: SELECT 'hello -- world' AS greeting FROM ...
. This is not a comment! Without a full syntax analysis of the SQL text you will not be able to cover all possible cases. But you cannot do this with Regex. Regex has its limits.