Handling escapes in pyparsing

https://stackoverflow.com/questions/23163537

06-07-2023

Question

I'm trying to write an SGF parser using pyparsing. The parser is mostly done, but I can't figure out the Text token. Here is my current code:

import pyparsing as pp

# pp.nums is a plain string of digits, so it has to be wrapped in pp.Word
Number = pp.Optional(pp.Literal("+") ^ pp.Literal("-")) \
             + pp.Word(pp.nums)
Real   = Number + pp.Optional(pp.Literal(".") + pp.Word(pp.nums))
Double = pp.Literal("1") ^ pp.Literal("2")
Color  = pp.Literal("B") ^ pp.Literal("W")
Text   = """???"""
Stone  = Move = Point = pp.Word("abcdefghijklm", exact=2)

ValueType = pp.Empty() ^ Number ^ Real ^ Double ^ Color \
                ^ Text ^ Point ^ Move ^ Stone

Compose    = ValueType + pp.Literal(":") + ValueType
CValueType = ValueType ^ Compose

PropIdent = pp.Word(pp.alphas.upper(), min=1)
PropValue = pp.Literal("[") + CValueType + pp.Literal("]")
Property  = PropIdent + pp.OneOrMore(PropValue)

Node = pp.Literal(";") + pp.ZeroOrMore(Property)
Sequence  = pp.ZeroOrMore(Node)

GameTree = pp.Forward()
GameTree << pp.Literal("(") \
               + Sequence \
               + pp.ZeroOrMore(GameTree) \
            + pp.Literal(")")

Collection = pp.OneOrMore(GameTree)

And here is the Text token as defined in the SGF spec:

Text is a formatted text. White spaces other than linebreaks are converted to space (e.g. no tab, vertical tab, ..).

Formatting:
Soft line break: linebreaks preceded by a "\" (soft linebreaks are converted to "", i.e. they are removed).
Hard line breaks: any other linebreaks encountered.

Escaping: "\" is the escape character. Any char following "\" is inserted verbatim (exception: whitespaces still have to be converted to space!). Following chars have to be escaped, when used in Text: "]", "\" and ":" (only if used in compose data type).
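
To make the quoted rules concrete, here is a minimal sketch in plain Python (standard library only) of how the raw characters between "[" and "]" could be decoded once the value has already been extracted. The name decode_sgf_text is hypothetical. Treating CR LF and LF CR pairs as a single linebreak, and keeping a trailing lone backslash as-is, are assumptions that the quoted text does not settle.

```python
import re

# One token per match: either a backslash followed by a linebreak or by any
# single character, or one unescaped character (including a stray backslash).
_TOKEN = re.compile(r"\\(\r\n|\n\r|\n|\r|.)|(.)", re.DOTALL)
_LINEBREAKS = ("\r\n", "\n\r", "\n", "\r")


def decode_sgf_text(raw: str) -> str:
    """Decode the raw content of an SGF Text value (hypothetical helper)."""
    out = []
    for match in _TOKEN.finditer(raw):
        escaped, plain = match.group(1), match.group(2)
        if escaped is not None:
            if escaped in _LINEBREAKS:
                continue  # soft line break: removed entirely
            ch = escaped  # any other escaped character is kept verbatim
        else:
            ch = plain
        # Whitespace other than linebreaks is converted to a space,
        # even when it was escaped.
        if ch.isspace() and ch not in "\r\n":
            ch = " "
        out.append(ch)
    return "".join(out)


print(decode_sgf_text(r"emlroka [11k\] gg"))
```

This only covers decoding. Finding where the value ends (the first unescaped "]") is a separate step.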

The problem is the escaping part: I can't figure out a grammar or regex for this token. It looks like I should define "some text without an unescaped ], \ or :", but I don't see how.

Here is an example:

C[emlroka [11k\] gg]

This is a Property containing a Text. The Text part is emlroka [11k\] gg.

It looks like pyparsing.QuotedString does what I want, but it needs enclosing quote characters, such as double quotes, so it doesn't work for my problem.

Thank you for your time.

Solution

I think I got it.

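# A backslash (dropped from the output) followed by one escapable character
Escape = pp.Suppress(pp.Literal("\\")) + pp.Word("\\)]:", exact=1)
# Text is any run of escapes or of single characters other than "]", "\" and ":"
Text   = pp.Combine(pp.ZeroOrMore(Escape ^ pp.Regex("[^\\]\\\\:]")))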

There could be some edge cases I missed, but this works for me for now.
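
One edge case worth checking: the quoted spec says that any character following "\" is inserted verbatim, while Escape above accepts only four specific characters. Below is a hedged sketch of a more permissive variant, assuming pyparsing is installed. The name Plain and the suppressed brackets in PropValue are illustrative choices, not part of the grammar above. ":" is treated as ordinary text here, so a Compose value would need ":" added to the excluded set. Soft line breaks and whitespace conversion are not handled.

```python
import pyparsing as pp

# Backslash followed by any single character (including a newline);
# the backslash is dropped and the character is kept verbatim.
Escape = pp.Suppress(pp.Literal("\\")) + pp.Regex(r"(?s).")

# A run of characters that are neither "]" nor "\".
Plain = pp.Regex(r"[^\]\\]+")

# Combine joins the pieces into one string and stops pyparsing from
# skipping whitespace between them (leading whitespace is still skipped).
Text = pp.Combine(pp.ZeroOrMore(Escape | Plain))

# Illustrative property value with the brackets suppressed.
PropValue = pp.Suppress("[") + Text + pp.Suppress("]")

result = PropValue.parseString(r"[emlroka [11k\] gg]", parseAll=True)
print(result[0])
```

Matching a whole run of plain characters with one Regex, instead of one character at a time, keeps the token list short. Both forms produce the same string after Combine.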

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow