Question
I'm trying to write an SGF parser using pyparsing. The parser is mostly done, but I can't figure out the Text token. Here is my current code:
import pyparsing as pp
Number = pp.Optional(pp.Literal("+") ^ pp.Literal("-")) \
    + pp.Word(pp.nums)  # pp.nums is a plain string of digits, so wrap it in Word
Real = Number + pp.Optional(pp.Literal(".") + pp.Word(pp.nums))
Double = pp.Literal("1") ^ pp.Literal("2")
Color = pp.Literal("B") ^ pp.Literal("W")
Text = """???"""
Stone = Move = Point = pp.Word("abcdefghijklm", exact=2)
ValueType = pp.Empty() ^ Number ^ Real ^ Double ^ Color \
    ^ Text ^ Point ^ Move ^ Stone
Compose = ValueType + pp.Literal(":") + ValueType
CValueType = ValueType ^ Compose
PropIdent = pp.Word(pp.alphas.upper(), min=1)
PropValue = pp.Literal("[") + CValueType + pp.Literal("]")
Property = PropIdent + pp.OneOrMore(PropValue)
Node = pp.Literal(";") + pp.ZeroOrMore(Property)
Sequence = pp.ZeroOrMore(Node)
GameTree = pp.Forward()
GameTree << pp.Literal("(") \
    + Sequence \
    + pp.ZeroOrMore(GameTree) \
    + pp.Literal(")")
Collection = pp.OneOrMore(GameTree)
And here is the Text token as defined in the SGF spec:
Text is a formatted text. White spaces other than linebreaks are converted to space (e.g. no tab, vertical tab, ..).
Formatting: Soft line break: linebreaks preceded by a "\" (soft linebreaks are converted to "", i.e. they are removed) Hard line breaks: any other linebreaks encountered
Escaping: "\" is the escape character. Any char following "\" is inserted verbatim (exception: whitespaces still have to be converted to space!). Following chars have to be escaped, when used in Text: "]", "\" and ":" (only if used in compose data type).
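To make the quoted rules concrete, here is a minimal plain-Python sketch of the conversion they describe, applied to the raw characters between "[" and "]". The function name normalize_sgf_text is hypothetical, and the sketch assumes line breaks have already been normalized to "\n"; it does not deal with the ":" separator of the compose type.

```python
def normalize_sgf_text(raw):
    """Apply the SGF Text rules quoted above to a raw property value.

    Assumption: line breaks are single "\n" characters.
    """
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            # Take the character after the escape ("" for a trailing backslash).
            ch = next(chars, "")
            if ch == "\n":
                continue  # soft line break: removed entirely
        elif ch == "\n":
            out.append(ch)  # hard line break: kept as is
            continue
        # Any other whitespace becomes a space, escaped or not.
        out.append(" " if ch.isspace() else ch)
    return "".join(out)


print(normalize_sgf_text(r"emlroka [11k\] gg"))  # emlroka [11k] gg
```

Separating this conversion from the grammar keeps the parser's job small: it only has to find where the value ends, that is, at the first "]" that is not preceded by an escape.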
The problem is the escaping part: I can't figure out a grammar or regex to specify this token. It looks like I should define "some text without an unescaped ], \ or :", but I don't see how.
Here is an example:
C[emlroka [11k\] gg]
This is a Property containing a Text. The Text part is emlroka [11k\] gg.
It looks like pyparsing.QuotedString does what I want, but it needs enclosing characters, such as double quotes, so it doesn't work for my problem.
Thank you for your time.
Solution
I think I got it.
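from pyparsing import Combine, Literal, Regex, Suppress, Word, ZeroOrMore
# A backslash followed by one of \ ) ] : yields just that character.
Escape = Suppress(Literal("\\")) + Word("\\)]:", exact=1)
# Text is any run of escapes or characters other than ], \ and :.
Text = Combine(ZeroOrMore(Escape ^ Regex("[^\\]\\\\:]")))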
There could be some edge cases I missed, but this works for me for now.
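As a quick check, the grammar can be tried on the example property from the question. This is only a sketch: it reuses the PropIdent, PropValue and Property definitions from the question, but suppresses the brackets so that only the identifier and the text remain in the result. It uses the camelCase method names (parseString, asList), which should be available in both pyparsing 2.x and 3.x.

```python
import pyparsing as pp

# Text token from the answer, written with the pp. prefix used in the question.
Escape = pp.Suppress(pp.Literal("\\")) + pp.Word("\\)]:", exact=1)
Text = pp.Combine(pp.ZeroOrMore(Escape ^ pp.Regex("[^\\]\\\\:]")))

# Property grammar from the question; brackets are suppressed here for clarity.
PropIdent = pp.Word(pp.alphas.upper(), min=1)
PropValue = pp.Suppress("[") + Text + pp.Suppress("]")
Property = PropIdent + pp.OneOrMore(PropValue)

result = Property.parseString(r"C[emlroka [11k\] gg]", parseAll=True)
print(result.asList())  # ['C', 'emlroka [11k] gg']
```

One edge case worth noting: Escape only accepts a backslash followed by one of \ ) ] :, whereas the spec allows any character after the backslash, including a line break (a soft line break). Input containing such escapes would not parse with this Text definition.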