flex (python PLY) regex for strings

https://stackoverflow.com/questions/12067592

27-06-2021

Question

I'm using the python module PLY to write a parser, and I am implementing as I go. I have a simple rule to detect strings:

r'("|\').*("|\')'

When lexer errors are thrown I have this:

import sys

def t_error(t):
    # Report the line number and the first 16 characters of the offending input, then abort.
    print('Illegal lexer input line ' + str(t.lineno) + ' ' + t.value[:16])
    sys.exit(-1)

When I feed my parser the following input:

parse("preg_match('%^[\*\%]+$%', $keywords)")

I get back this in return:

Illegal lexer input line 1 %^[\*\%]+$%', $k

My questions are:

1) Why am I not parsing this string? It seems like my regex should properly handle this string.

2) How can I fix this?

edit:

I have narrowed the problem down a bit. The following strings throw illegal lexer input errors by themselves:

'%'
'^'

Solution

Even if this regex were working, it wouldn't quite do what you want: for example, it would accept "this', which isn't really a string, because either quote character may open or close the match. This is also the cause of the "illegal lexer input" error.

Having done its job and found the first string in "preg_match(', the lexer is then upset because each of the next 11 characters, %^[\*\%]+$%, is illegal (and not in t_ignore): none of them is a " or ', so none can start a string.
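The first point can be checked outside PLY. The sketch below uses Python's re module directly (an assumption made to keep it self-contained; the names ORIGINAL, mismatched and merged are illustrative). It shows the original pattern accepting mismatched quotes and, because .* is greedy, merging two adjacent strings into one match.

```python
import re

# The pattern from the question: any quote, anything, any quote.
ORIGINAL = re.compile(r'("|\').*("|\')')

# Mismatched delimiters are accepted as a complete "string".
mismatched = ORIGINAL.fullmatch('"this\'')

# Greedy .* runs to the last quote, so two strings become one match.
merged = ORIGINAL.search("'a' + 'b'").group(0)

print(mismatched is not None)  # True
print(merged)                  # 'a' + 'b'
```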

Try doing this with two cases for " and ': "Starts with quote, some things which aren't quote, ends with quote." That is:

r'("[^"]*")|(\'[^\']*\')'
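As a quick check of this pattern, here is a minimal sketch using Python's re module on its own rather than inside a PLY lexer (the names SIMPLE, source, found, rejected and parts are illustrative, not from the question). It runs the pattern against the input from the question, against the mismatched "this' case, and against two adjacent strings.

```python
import re

# One alternative per quote type; the body may not contain its own delimiter.
SIMPLE = re.compile(r'("[^"]*")|(\'[^\']*\')')

# The input from the question, written as a raw string to keep the backslashes.
source = r"preg_match('%^[\*\%]+$%', $keywords)"
found = SIMPLE.search(source).group(0)

# Mismatched delimiters no longer form a string.
rejected = SIMPLE.fullmatch('"this\'')

# Adjacent strings are matched separately instead of being merged.
parts = [m.group(0) for m in SIMPLE.finditer("'a' + \"b\"")]

print(found)     # '%^[\*\%]+$%'
print(rejected)  # None
print(parts)     # ["'a'", '"b"']
```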

Or, if you want to allow escaped quotes inside the string:

r'("(\\"|[^"])*")|(\'(\\\'|[^\'])*\')'
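In a PLY lexer, a pattern like this would typically live in the docstring of a token function. The sketch below assumes a token named STRING (a hypothetical name; the question does not show its token list) and does not import PLY. Instead it compiles the docstring with re.VERBOSE, the flag PLY uses for its rules, to show how the pattern behaves on a string containing an escaped quote.

```python
import re

def t_STRING(t):
    r'("(\\"|[^"])*")|(\'(\\\'|[^\'])*\')'
    return t

# PLY reads the rule's regex from the docstring; mimic that here with re alone.
STRING_RE = re.compile(t_STRING.__doc__, re.VERBOSE)

# Sample input (illustrative) with an escaped single quote inside the string.
text = r"echo 'it\'s' . $x"
escaped = STRING_RE.search(text).group(0)

# For comparison, the simpler pattern stops at the escaped quote.
simple = re.search(r'("[^"]*")|(\'[^\']*\')', text).group(0)

print(escaped)  # 'it\'s'
print(simple)   # 'it\'
```

With the escape-aware pattern the whole literal is one token; with the simpler one the lexer would stop early and then have to make sense of the leftover s'.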

Licensed under: CC-BY-SA with attribution
