It's much easier to write such things when you think of your token list as an object that is itself responsible for tracking its position. That's why, in general, the tokens should come from a lexer, which usually exposes just one method: next_token. We'll simulate this with an iterator:
```python
def parse(tokens):
    """parse: token iterator or generator -> node
    From a prefix stream of tokens, construct and return the tree,
    as nested tuples, that represents the expression."""
    next_tok = next(tokens)
    if next_tok.isdigit():
        return ('literal', next_tok)
    elif next_tok == "+":
        # the first parse() call builds the left subtree, the second the right
        return ('add', parse(tokens), parse(tokens))
    elif next_tok == "-":
        return ('sub', parse(tokens), parse(tokens))
    elif next_tok == "*":
        return ('mul', parse(tokens), parse(tokens))
    elif next_tok == "//":
        return ('div', parse(tokens), parse(tokens))
    else:
        return ('variable', next_tok)
```
For example:

```python
print(parse(iter(['-', '//', 'y', '2', 'x'])))
```
That will print:
('sub', ('div', ('variable', 'y'), ('literal', '2')), ('variable', 'x'))
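If you'd rather make the lexer explicit instead of using a bare iterator, here is a minimal sketch; the `Lexer` class and its internals are illustrative, not from any particular library:

```python
class Lexer:
    """Minimal lexer stand-in: wraps a pre-split token list and is
    itself responsible for tracking its position in the stream."""
    def __init__(self, tokens):
        self._it = iter(tokens)

    def next_token(self):
        # Hands out the next token; raises StopIteration when exhausted.
        return next(self._it)

    # Supporting the iterator protocol lets parse() accept a Lexer
    # directly, since parse() only ever calls next() on its argument.
    def __iter__(self):
        return self

    def __next__(self):
        return self.next_token()
```

With this, `parse(Lexer(['-', '//', 'y', '2', 'x']))` works unchanged.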
You'll also need to handle the StopIteration exception and turn it into a meaningful ParseError.
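A sketch of that wrapping; ParseError is a name we define ourselves here, and the operator branches are collapsed into a dict to keep it short:

```python
class ParseError(Exception):
    """Raised when the token stream ends mid-expression."""
    pass

def parse(tokens):
    """Prefix token stream -> nested-tuple tree, with truncation detection."""
    try:
        next_tok = next(tokens)
    except StopIteration:
        # An operator was seen but one of its operands is missing.
        raise ParseError("unexpected end of input") from None
    ops = {'+': 'add', '-': 'sub', '*': 'mul', '//': 'div'}
    if next_tok in ops:
        return (ops[next_tok], parse(tokens), parse(tokens))
    elif next_tok.isdigit():
        return ('literal', next_tok)
    else:
        return ('variable', next_tok)
```

The `from None` suppresses the chained StopIteration traceback, so the caller sees only the ParseError.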