Question

I'm trying to use pyparsing to match a string that can continue across several lines, the way Python's string literals can:

Test = "This is a long " \
       "string"

I can't find a way to make pyparsing recognize this. Here is what I've tried so far:

import pyparsing as pp

src1 = '''
Test("This is a long string")
'''

src2 = '''
Test("This is a long " \
     "string")
'''

_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"', multiline=True, unquoteResults=False)
func = pp.Word(pp.alphas)

function = func + _lp + _str + _rp
print(src1)
print(function.parseString(src1))
print('-------------------------')
print(src2)
print(function.parseString(src2))

Solution

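The problem is that a multiline quoted string doesn't do what you think. Passing multiline=True only allows a single quoted string to contain newlines; it does not join several adjacent quoted strings. A multiline quoted string is literally that, a string with newlines inside: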

import pyparsing as pp

src0 = '''
"Hello
 World
 Goodbye and go"
'''

pat = pp.QuotedString('"', multiline=True)
print(pat.parseString(src0))

The output of parsing this string would be ['Hello\n World\n Goodbye and go'].
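
To make the difference concrete, here is a minimal sketch (the sample inputs and variable names are made up for illustration). It suggests that multiline=True only lets one quoted string span lines and does not stitch two adjacent quoted strings together:

```python
import pyparsing as pp

# Illustrative inputs, not taken from the question.
spans_lines = '"Hello\n World"'
two_pieces = '"This is a long " \\\n     "string"'

single = pp.QuotedString('"')
multi = pp.QuotedString('"', multiline=True)

# Without multiline=True, a newline inside the quotes is a parse error.
try:
    single.parseString(spans_lines)
    single_matched = True
except pp.ParseException:
    single_matched = False

# With multiline=True, the newline is kept as part of the one string.
one_string = multi.parseString(spans_lines).asList()

# Two adjacent quoted strings are still two strings; only the first is matched.
first_only = multi.parseString(two_pieces).asList()

print(single_matched)
print(one_string)
print(first_only)
```

In the last case the second quoted string is simply left unparsed, which is why a function call wrapped around it then fails on the closing parenthesis.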

As far as I know, if you want a string that's similar to how Python's strings behave, you have to define it yourself:

import pyparsing as pp

src1 = '''
Test("This is a long string")
'''

src2 = '''
Test("This is a long"
    "string")
'''

src3 = '''

Test("This is a long" \\
     "string")
'''

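# Parentheses are matched but dropped from the results.
_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
# One ordinary, single-line quoted string.
_str = pp.QuotedString('"')
# An optional line-continuation backslash, dropped from the results.
_slash = pp.Suppress(pp.Optional("\\"))
# One or more quoted strings merged into a single token;
# adjacent=False allows whitespace between the pieces.
_multiline_str = pp.Combine(pp.OneOrMore(_str + _slash), adjacent=False)

func = pp.Word(pp.alphas)

function = func + _lp + _multiline_str + _rp

print(src1)
print(function.parseString(src1))
print('-------------------------')
print(src2)
print(function.parseString(src2))
print('-------------------------')
print(src3)
print(function.parseString(src3))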

This produces the following output:

Test("This is a long string")

['Test', 'This is a long string']
-------------------------

Test("This is a long"
    "string")

['Test', 'This is a longstring']
-------------------------

Test("This is a long" \
     "string")

['Test', 'This is a longstring']
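
One detail about src3 worth spelling out: the backslash is written as \\ because, inside a regular (non-raw) triple-quoted Python string, a single backslash at the end of a line is itself a line continuation. Python removes it together with the newline before pyparsing ever sees the text. A small sketch in plain Python (the variable names are only for illustration):

```python
# A lone backslash before the newline is consumed by Python itself.
single_backslash = '''"a" \
"b"'''

# A doubled backslash leaves one literal backslash in the text.
double_backslash = '''"a" \\
"b"'''

# A raw string also keeps the backslash (and the newline).
raw_backslash = r'''"a" \
"b"'''

print(repr(single_backslash))
print(repr(double_backslash))
print(repr(raw_backslash))
```

So a test string written with a single backslash, like src2 in the question, actually holds both pieces on one line with no backslash in it.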

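Note: The Combine class merges the individual quoted strings into a single token, so they appear as one string in the output list. The backslash is suppressed so that it isn't combined into the output string. Because the pieces are joined with nothing between them, "This is a long" and "string" come out as "This is a longstring", just as adjacent string literals would in Python.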
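
To see what Combine contributes, here is a rough sketch that parses the same input with and without it. The input string and the names pieces, separate and merged are assumptions made for this illustration; the first literal carries a trailing space, so the merged result reads naturally:

```python
import pyparsing as pp

# Illustrative input: note the trailing space inside the first literal.
src = 'Test("This is a long " \\\n     "string")'

_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"')
_slash = pp.Suppress(pp.Optional("\\"))
func = pp.Word(pp.alphas)

# One or more quoted strings, each optionally followed by a backslash.
pieces = pp.OneOrMore(_str + _slash)

# Without Combine, every quoted string is its own token.
separate = (func + _lp + pieces + _rp).parseString(src).asList()

# With Combine, the pieces are joined into one token.
merged = (func + _lp + pp.Combine(pieces, adjacent=False) + _rp).parseString(src).asList()

print(separate)
print(merged)
```

Without Combine you would have to join the pieces yourself afterwards; with it, the caller gets a single string token per argument.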

Licensed under: CC-BY-SA with attribution