Question

I'm building a lexer using ply in Python. I have 2 tokens called TkConjuncion (which refers to logical and) and TkDisjuncion (which refers to logical or).

The rules for both of them are written as follows (there are other rules as well, but they aren't relevant here):

t_TkDisjuncion = '\\\/'
t_TkConjuncion = '\/\\'

Where \\\/ is meant to match \/ and \/\\ is meant to match /\. But when I test my code it says:

ERROR: Invalid regular expression for rule 't_TkConjuncion'. unbalanced parenthesis

The \\ is read by the lexer as \, so it accepts t_TkDisjuncion, but I don't understand why it doesn't accept the other token. I've been researching on the web but have found nothing.

Any idea why this is happening?


Solution

I don't know for sure, but I bet there's more than one level of backslash interpretation going on. Python certainly does one level when it compiles the string literals: \\ collapses to a single backslash, while \/ isn't a recognized escape and is left alone. So the actual strings you create in your example are

\\/

and

\/\

If ply goes on to embed those in a regular expression without escaping them first (this is the part I don't know about, but I think it's likely), the first one happens to be a valid pattern (a literal backslash followed by a slash), while the trailing backslash in the second one will escape whatever follows it in the combined expression. That is likely to be a right parenthesis, hence the "unbalanced parenthesis" complaint.
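
To see what's going on concretely, here's a small sketch of my own (not from the original post) that reproduces both levels of interpretation using only the standard re module; the master-regex layout below is just an approximation of how ply combines the rules, not its exact internals:

import re

# Level 1: Python's string-literal processing.
# '\\' collapses to a single backslash, while the unrecognized '\/'
# escape is left alone (newer Pythons also warn about it).
print('\\\/')   # prints \\/  -> a valid regex: literal backslash, then slash
print('\/\\')   # prints \/\  -> ends with a lone trailing backslash

# Level 2: the token rules get joined into one big regex of named groups,
# roughly '(?P<rulename>pattern)|...'. The trailing backslash from the
# conjunction rule then escapes the ')' that should close its group:
try:
    re.compile(r'(?P<t_TkConjuncion>\/\)|(?P<t_TkDisjuncion>\\/)')
except re.error as exc:
    # The exact message depends on the Python version: older versions say
    # "unbalanced parenthesis", newer ones "missing ), unterminated subpattern".
    print(exc)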

Anyway, try making these raw strings instead:

t_TkDisjuncion = r'\\\/'
t_TkConjuncion = r'\/\\'

The "r" prefix prevents Python from treating backslashes specially, so that the actual strings those lines create are

\\\/

and

\/\\

If those are then embedded in a regular expression without escaping them first (which is up to ply, not up to you), they'll do what you intended.
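
For completeness, a minimal runnable lexer along those lines might look like the sketch below (assuming ply is installed; the tokens tuple, t_ignore, t_error and the sample input are illustrative additions, not from the original question):

import ply.lex as lex

tokens = ('TkDisjuncion', 'TkConjuncion')

# Raw strings: the regexes reach ply exactly as written here.
t_TkDisjuncion = r'\\\/'   # matches the two characters  \/
t_TkConjuncion = r'\/\\'   # matches the two characters  /\

t_ignore = ' \t'

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input(r'\/ /\ ')
for tok in lexer:
    print(tok.type, tok.value)   # TkDisjuncion \/  then  TkConjuncion /\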

EDIT: I'm pretty sure that's it. Looking at the ply docs, tokens are indeed specified using regexps, and the docs recommend using raw strings for exactly this reason (to avoid the double interpretation of backslashes I talked about above).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow