Question

I am trying to write a grammar for a small language utility using the Python library parsimonious, but I am struggling with the part that covers strings, especially strings containing escaped quotes and other special characters.

I have the following:

string         = doubleString / singleString
doubleString   = "\"" escapedString "\""
singleString   = "'" escapedString "'"

escapedString is as yet undefined, but it should accept anything one would reasonably expect a string in a programming language to accept. I don't know where to begin. Does anyone have any suggestions?


Solution

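I don't know parsimonious's syntax, but in a regex-style one I'd do something like:

string         = doubleString / singleString
doubleString   = ~r'"([^"\\]|\\.)*"'
singleString   = ~r"'([^'\\]|\\.)*'"

i.e. you'd need a different escaped string for each kind of string, each made of a possibly empty sequence of either characters that are neither the end quote char nor a backslash, or a backslash followed by any character (which covers escaped end quote chars). The backslash has to be excluded from the character class; otherwise the backslash of an escaped quote is consumed as an ordinary character and the quote after it ends the string early.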
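To see how a pattern of this kind behaves, here is a minimal sketch using Python's standard re module rather than parsimonious itself. It assumes the usual convention that a backslash escapes whatever character follows it; match_string is a hypothetical helper name used only for illustration.

```python
import re

# A quoted string: opening quote, then any run of either
#   - a character that is neither the quote nor a backslash, or
#   - a backslash followed by any single character (an escape),
# and finally the closing quote.
DOUBLE_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
SINGLE_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")


def match_string(text):
    """Return the string literal at the start of text, or None."""
    for pattern in (DOUBLE_STRING, SINGLE_STRING):
        m = pattern.match(text)
        if m:
            return m.group(0)
    return None


print(match_string(r'"say \"hi\"" + rest'))  # the whole quoted literal
print(match_string(r"'it\'s' + rest"))       # escaped apostrophe is kept
print(match_string('"unterminated'))         # None: no closing quote
```

Note that `.` does not match a newline by default, so a backslash followed by a line break is rejected here; whether that is desirable depends on the language being parsed.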

OTHER TIPS

You may want to do two things here. The first is adding the prefix r before your string. The other is to use triple quotes, i.e. """. Without the prefix, escape sequences in strings are "interpreted according to rules similar to those used by Standard C"; the r prefix turns that off, so backslashes are kept literally. The triple quotes take care of any extra quotes/apostrophes that would otherwise make your string end early.

Consider the following example:

string = r"""this %is m%y crazy s"\tri""'""ng\s\n%\d\\r''\'"""
print(string)

This may not work if there are also triple quotes in the string you are feeding through; I'm not sure how one goes about dealing with that.
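One possible way around the triple-quote case, offered as a sketch rather than a complete solution: Python accepts both """ and ''' as delimiters, so text containing one kind can be wrapped in the other. The variable names below are only illustrative.

```python
# Raw prefix: backslashes stay literal instead of starting an escape sequence.
plain = "a\tb"   # a, TAB, b
raw = r"a\tb"    # a, backslash, t, b
print(len(plain), len(raw))

# Text that itself contains """ can be delimited with ''' instead.
tricky = r'''she said """hi""" and left'''
print(tricky)
```

This still fails for text that contains both kinds of triple quotes, and a raw string cannot end in an odd number of backslashes.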

Licensed under: CC-BY-SA with attribution