Python PLY Lex ambiguity

https://stackoverflow.com/questions/14147529

python
lex
ply
ambiguity

12-01-2022

Question

I have a problem with ambiguity at the token level.

My code looks like this; t_UN1 is defined first, so it has higher precedence than t_IDENTIFIER.

t_ignore = ' \t\v\r' # whitespace 

....

def t_UN1(t): #NS_
    r'NS\_'
    return t
def t_IDENTIFIER(t):
    r'[a-zA-Z][a-zA-Z0-9_]*'
    return t

....

I would like a string such as NS_XYZ to be identified as IDENTIFIER, and a single NS_ surrounded by whitespace to be identified as UN1.

How should I handle that? Currently the string NS_XYZ is simply split into two tokens, UN1 and IDENTIFIER.

Solution

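If you want to match only a single NS_ surrounded by whitespace, you can add the whitespace character class to the token's regular expression:

def t_UN1(t): #NS_
    r'\s+NS\_\s+'
    return t

Keep in mind that PLY skips the characters listed in t_ignore before it tries any rule, so with the t_ignore from the question the leading \s+ can only match whitespace that is not ignored, such as a newline. The matched whitespace also becomes part of t.value.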
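
Another way to express the same intent, leaving whitespace handling to t_ignore, is a negative lookahead: NS_ is accepted as UN1 only when the next character cannot continue an identifier. The sketch below is not from the original answer. It emulates PLY's ordered rule matching with the standard re module so that it runs without PLY installed, and the names RULES, MASTER, IGNORE and tokenize are hypothetical helpers introduced only for illustration.

```python
import re

# Rules in priority order, mirroring the definition order of t_UN1 and
# t_IDENTIFIER in the question. RULES/MASTER/IGNORE/tokenize are
# hypothetical names, not part of PLY.
RULES = [
    ("UN1", r"NS_(?![a-zA-Z0-9_])"),           # NS_ not followed by an identifier character
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9_]*"),  # same pattern as in the question
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))
IGNORE = " \t\v\r"  # same characters as t_ignore in the question


def tokenize(text):
    """Return (type, value) pairs; a simplified stand-in for a lexer loop."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORE:  # skip ignored characters before trying any rule
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("NS_XYZ NS_ abc"))
# [('IDENTIFIER', 'NS_XYZ'), ('UN1', 'NS_'), ('IDENTIFIER', 'abc')]
```

In a PLY lexer the same pattern would presumably go in the t_UN1 docstring, with t_UN1 still defined before t_IDENTIFIER. Because the lookahead consumes nothing, the token value stays exactly NS_, and a NS_ at the very end of the input is matched as well.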

Side note: the ply-hack Google group is a good place to ask PLY-related questions.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow