Python PLY parser

https://stackoverflow.com/questions/7908898

14-02-2021
|

سؤال

I've tried to search around for the answer to this question, but can't seem to find one. I'm trying to write a parser in Python using PLY for a made up language. A simplified version of my BNF looks like this:

statement-list -> statement ',' statement-list |
                 'print' expr

statement -> ident 'was' 'a' type |
             ident 'became' expr

type -> 'number' | 'letter'

expr -> factor |
       expr '+' factor |
       expr '-' factor

factor -> number | letter | ident

where number and letter are like int and char.

The Yacc documentation (http://www.dabeaz.com/ply/ply.html#ply_nn23) only shows the syntax for simple arithmetic expressions where it's clear what p[0] should be.

def p_expression_plus(p):
   'expression : expression PLUS term'
    p[0] = p[1] + p[3]

My question is what do I do for statement-list in my BNF? I've got:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'

but I'm really not sure what to put next. Any help would be very much appreciated!

المحلول

It really depends on how you structure your code and how you want to evaluate it. If you are evaluating as you go along, provided it evaluates in the right order, you don't want you probably don't want anything after the docstring of p_statement_list_comma i.e. just like how you had it - the statements will be evaluated anyway, and if needed you can keep a global dictionary of variables or something similar to keep track of some state such as identifier values.

If you want to build up a parse tree e.g. for evaluation separately if you don't like ply's order of evaluation, you might do something like this:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = [p[1]] + p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = [p[2]]

This would then give you a list of statements, with the last element in the list being an expression. This uses lists for simplicity; you can use your own classes too if you want - just assign whatever python object you want to p[0] and it will be available to the level above.

If you want the result of the print expression being returned from yacc.parse (the value from the top level of the parse tree will be returned from yacc.parse), you might do it like this:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]

نصائح أخرى

I can't speak for a PLY solution, but here is one using pyparsing. Sometimes, a pyparsing example can be useful even if you eventually want to implement your parser using some other library, as a quick-and-dirty prototype/exercise. Unfortunately, this example makes heavy use of the operatorPrecedence method, which buries a lot of the infix parsing magic, so I don't know how easily you will be able to translate it. A more traditional expr/term/factor parser example can be found at the pyparsing wiki on the Examples page (http://pyparsing.wikispaces.com/Examples), titled fourFn.py.

bnf = """
statement-list -> statement ',' statement-list

statement -> ident 'was' 'a' type | 
             ident 'became' expr |
             'print' expr |
             'if' conditional-expr statement

type -> 'number' | 'letter' 

expr -> factor | 
       expr '+' factor | 
       expr '-' factor 

factor -> number | letter | ident 
"""

from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence, 
    Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)

PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
    CaselessKeyword,
    "print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])

typeSpecifier = NUMBER | LETTER

number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident

expr = operatorPrecedence(operand,
    [
    ('-', 1, opAssoc.RIGHT),
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

comparisonExpr = operatorPrecedence(expr,
    [
    ("!", 1, opAssoc.RIGHT),
    (oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
    ])

booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
    [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ])

statement = Forward()
printStmt  = PRINT + expr
wasaStmt   = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)

statementList = delimitedList(statement)

tests = """\
    x was a number
    y became 2+5
    print y
    print 100*(5+2)
    print 100*5+2
    if 5 > y print 1000
    if y < 10 y became y+1, print y
    """.splitlines()[:-1]

for t in tests:
    print t.strip()
    for s in statementList.parseString(t).asList():
        print(s)
    print

Prints:

x was a number
['x', 'WAS', 'A', 'NUMBER']

y became 2+5
['y', 'BECAME', ['2', '+', '5']]

print y
['PRINT', 'y']

print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]

print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]

if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]

if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]
['PRINT', 'y']

I took the liberty of adding print as a type of statement so it could appear anywhere in your program body. Also, I tried adding an IF-THEN statement, and this does show how adding such a control-flow statement starts to take you down the path of writing a recursive grammar (didn't need recursion just for 'was a', 'became', and 'print').

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow