How do I parse indents and dedents with pyparsing?

https://stackoverflow.com/questions/1547944

20-09-2019

Question

Here is a subset of the Python grammar:

single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

small_stmt: pass_stmt
pass_stmt: 'pass'

compound_stmt: if_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]

suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

(You can read the full grammar in the Python SVN repository: http://svn.python.org/.../Grammar )
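
For reference, CPython's own tokenizer emits the NEWLINE, INDENT and DEDENT tokens that this grammar relies on. The following is a minimal sketch using only the standard-library tokenize module; the sample source string is made up for illustration and is not part of the original question.

```python
import io
import tokenize

# Hypothetical sample input: one 'if' statement with an indented suite.
source = "if x:\n    pass\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)

# Map each token's numeric type to its symbolic name (NAME, NEWLINE, INDENT, ...).
names = [tokenize.tok_name[tok.type] for tok in tokens]
print(names)
```

In this sample the NEWLINE that ends the `if x:` line is immediately followed by an INDENT token, and a DEDENT token appears before ENDMARKER, which mirrors the `NEWLINE INDENT stmt+ DEDENT` alternative of the suite rule.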

I am trying to use this grammar to generate a parser for Python, in Python. What I am having trouble with is how to express the INDENT and DEDENT tokens as pyparsing objects.

Here is how I have implemented the other terminals:

import pyparsing as p

string_start = (p.Literal('"""') | "'''" | '"' | "'")
string_token = ('\\' + p.CharsNotIn("",exact=1) | p.CharsNotIn('\\',exact=1))
string_end = p.matchPreviousExpr(string_start)

terminals = {
    'NEWLINE': p.Literal('\n').setWhitespaceChars(' \t')
        .setName('NEWLINE').setParseAction(terminal_action('NEWLINE')),
    'ENDMARKER': p.stringEnd.copy().setWhitespaceChars(' \t')
        .setName('ENDMARKER').setParseAction(terminal_action('ENDMARKER')),
    'NAME': (p.Word(p.alphas + "_", p.alphanums + "_", asKeyword=True))
        .setName('NAME').setParseAction(terminal_action('NAME')),
    'NUMBER': p.Combine(
            p.Word(p.nums) + p.CaselessLiteral("l") |
            (p.Word(p.nums) + p.Optional("." + p.Optional(p.Word(p.nums))) | "." + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("e") + p.Optional(p.Literal("+") | "-") + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("j"))
        ).setName('NUMBER').setParseAction(terminal_action('NUMBER')),
    'STRING': p.Combine(
            p.Optional(p.CaselessLiteral('u')) +
            p.Optional(p.CaselessLiteral('r')) +
            string_start + p.ZeroOrMore(~string_end + string_token) + string_end
        ).setName('STRING').setParseAction(terminal_action('STRING')),

    # I can't find a good way of parsing indents/dedents.
    # The Grammar just has the tokens NEWLINE, INDENT and DEDENT scattered across the rules.
    # A single NEWLINE would be translated to NEWLINE + PEER (from pyparsing.indentedBlock()), unless followed by INDENT or DEDENT.
    # That NEWLINE and IN/DEDENT could be split across rule boundaries. (see the 'suite' rule)
    'INDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('INDENT'),
    'DEDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('DEDENT')
}

terminal_action is a function that returns the corresponding parse action, depending on its arguments.

I am aware of the pyparsing.indentedBlock helper function, but I can't figure out how to adapt it to a grammar that has no PEER token.

(Look at the pyparsing source code to see what I'm talking about.)

You can see my complete source code here: http://pastebin.ca/1609860

Solution

There are a couple of examples on the pyparsing wiki Examples page that could give you some insights, for instance indentedGrammarExample.py.

To use pyparsing's indentedBlock, I think you would define suite as:

indentstack = [1]
suite = indentedBlock(stmt, indentstack, True)

Note that indentedGrammarExample.py pre-dates the inclusion of indentedBlock in pyparsing, so it does its own implementation of indent parsing.
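
To illustrate the general idea behind tracking indentation with a stack, here is a minimal standard-library sketch. It is not the code from indentedGrammarExample.py or from pyparsing itself; the function name indent_tokens, the token kinds, and the sample input are hypothetical, and the sketch assumes that indentation uses spaces only.

```python
def indent_tokens(source):
    """Yield (kind, text) pairs, deriving INDENT/DEDENT from leading spaces."""
    stack = [0]  # widths of the indentation levels that are currently open
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines do not change the indentation level
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current level: open a new block.
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until a matching level is found.
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise IndentationError("inconsistent dedent: %r" % line)
        yield ("LINE", stripped)
        yield ("NEWLINE", "")
    # Close every block that is still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


# Hypothetical sample: two nested 'if' blocks.
sample = "if a:\n    if b:\n        pass\n    pass\npass\n"
kinds = [kind for kind, _ in indent_tokens(sample)]
print(kinds)
```

The stack plays the same role as the indentstack list passed to indentedBlock above: it records which indentation levels are open, so that a deeper line opens a block, a shallower line closes one or more blocks, and a line at the same level is a peer.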

Licensed under: CC-BY-SA with attribution
