Question

I want to parse possibly nested groups in a LaTeX file, something like this:

import pyparsing as pp
qs = pp.QuotedString(quoteChar='{', endQuoteChar='}')
s = r'''{ This is a \textbf{\texttt{example}} of \textit{some $\mb{y}$ text} to parse.}'''
print(qs.parseString(s))

But that can't be right (it stops on the first closing brace). The output is:

([' This is a \\textbf{\\texttt{example'], {})

How can I get a result I can iterate through? I'm thinking of output like this, if all I wanted was the groups:

{ This is a \textbf{\texttt{example}} of \textit{some $\mb{y}$ text} to parse.}
{\texttt{example}}
{example}
{some $\mb{y}$ text}
{y}

The use-case is to test a LaTeX source file for common tagging errors.

Solution

The key here is that you want each nested bracket to be matched to its own closing bracket. The grammar as you have written it will indeed stop at the first closing bracket rather than the matching one. The solution is to define a grammar in which every new opening bracket starts another, nested section.

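import pyparsing as pp

allSections = []

def rememberSection(m):
    # Called once per completed group; m holds that group's flat token list.
    allSections.append(''.join(m))

# Any run of characters other than braces. leaveWhitespace() stops pyparsing
# from skipping leading whitespace, so the joined text keeps its spaces.
other = pp.Word(pp.printables.replace('{', '').replace('}', '') + ' \t\r\n').leaveWhitespace()

# A section is '{', then one or more runs of text or nested sections, then '}'.
section = pp.Forward()
section <<= ('{' + pp.OneOrMore(other | section) + '}').setParseAction(rememberSection)

s = r'''{ This is a \textbf{\texttt{example}} of \textit{some $\mb{y}$ text} to parse.}'''
print(section.parseString(s))
print(allSections)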

This defines what is allowed to be inside a section as everything except braces, or another section. Each brace is then matched with the corresponding closing brace. If the braces don't match, a pyparsing.ParseException will be raised.
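To make the failure case concrete, here is a minimal sketch (Python 3, assuming pyparsing is installed) that rebuilds the same grammar without the parse action and reports whether a string is a single well-formed group. The helper name is_balanced is made up for this illustration. Passing parseAll=True makes the parser also reject trailing text after the final closing brace.

```python
import pyparsing as pp

# Same grammar as in the answer, minus the parse action.
other = pp.Word(pp.printables.replace('{', '').replace('}', '') + ' \t\r\n')
section = pp.Forward()
section <<= '{' + pp.OneOrMore(other | section) + '}'

def is_balanced(text):
    """Hypothetical helper: True if text is exactly one well-formed brace group."""
    try:
        section.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False

print(is_balanced(r'{a \textbf{b} c}'))  # braces match
print(is_balanced(r'{a \textbf{b c}'))   # inner group closes, outer one does not
```

Because the grammar uses OneOrMore, an empty group such as {} is rejected as well; swapping in pp.ZeroOrMore should accept it if empty groups matter for your files.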

Normally, the tokens would all be returned as one flat list, each token being '{', '}', or a run of other non-brace characters. Since we want every bracketed expression to be remembered, the parse action here appends each one to an external list. I'm not sure of any cleaner way to handle it, but this builds the allSections list, which contains the groups you wanted.
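As a point of comparison, and not the pyparsing approach described above, the same list of groups can be collected with a plain-Python scan that keeps a stack of open-brace positions. This is only a sketch under simple assumptions: the function name brace_groups is invented here, and the scan does not treat escaped braces (\{, \}) or % comments specially. It avoids the module-level list and slices the original string, so each group comes back exactly as written.

```python
def brace_groups(text):
    """Return every balanced {...} group, in the order the closing braces appear."""
    groups = []
    stack = []  # indices of '{' characters that are still open
    for i, ch in enumerate(text):
        if ch == '{':
            stack.append(i)
        elif ch == '}':
            if not stack:
                raise ValueError('unmatched } at index %d' % i)
            start = stack.pop()
            groups.append(text[start:i + 1])
    if stack:
        raise ValueError('unmatched { at index %d' % stack[-1])
    return groups

s = r'''{ This is a \textbf{\texttt{example}} of \textit{some $\mb{y}$ text} to parse.}'''
for group in brace_groups(s):
    print(group)
```

Inner groups are listed before the groups that contain them, so the full string comes last. If you need the outermost-first order from the question, sorting the (start, end) positions instead of the slices would be one way to get it.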

Licensed under: CC-BY-SA with attribution