Question

im using the python module ply.lex to write a lexer. I got some of my tokens specified with regular expression but now im stuck. I've a list of Keywords who should be a token. data is a list with about 1000 Keywords which should be all recognised as one sort of Keyword. This can be for example: _Function1 _UDFType2 and so on. All words in the list are separated by whitespaces thats it. I just want that lexer to recognise the words in this list, so that it would return a token of type `KEYWORD.

data = 'Keyword1 Keyword2 Keyword3 Keyword4'
def t_KEYWORD(t):
    # ... r'\$' + data ??
    return t

text = '''
Some test data


even more

$var = 2231




$[]Test this 2.31 + / &
'''

autoit = lex.lex()
autoit.input(text)
while True:
    tok = autoit.token()
    if not tok: break
    print(tok)

So i was trying to add the variable to that regex, but it didnt work. I'm always gettin: No regular expression defined for rule 't_KEYWORD'.

Thank you in advance! John

Was it helpful?

Solution

As @DSM suggests you can use the TOKEN decorator. The regular expression to find tokens like cat or dog is 'cat|dog' (that is, words separated by '|' rather than a space). So try:

from ply.lex import TOKEN
data = data.split() #make data a list of keywords

@TOKEN('|'.join(data))
def t_KEYWORD(t):
    return t

OTHER TIPS

ply.lex uses the docstring for the regular expression. Notice the order which you define tokens defines their precedence, which this is usually important to manage.

.

The docstring at the top cannot be an expression, so you need to do this token definition by token definition.

We can test this in the interpreter:

def f():
    "this is " + "my help"  #not a docstring :(
f.func_doc #is None
f.func_doc = "this is " + "my help" #now it is!

Hence this ought to work:

def t_KEYWORD(token):
    return token
t_KEYWORD.func_doc=r'REGULAR EXPRESSION HERE' #can be an expression

Not sure if this works with ply, but the docstring is the __doc__ attribute of a function so if you write a decorator that takes a string expression and sets that to the __doc__ attribute of the function ply might use that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top