Question

For a Python program, I am reading input from stdin that looks something like this:

"-------/--------\---------/------\"

When I print it out as a string value, it is printed as is. I am trying to split the string into a list of strings on forward slashes and backslashes while keeping the separators as well. I have used something like this:

re.split('(\\/)',string)

but the result that I get is:

['------' , '/' , '--------\\\\---------' , '/' , '---------\\\']

I was rather expecting it to be something like:

['------' , '/' , '---------' , '\' , '---------', '/' , '---------' , '\']

What am I doing wrong here, and how can I solve this problem?

Solution

To capture a delimiter, it's easier to use findall instead of split:

re.findall(r'[^\\/]+|[\\/]', string)

[^\\/]+ matches one or more consecutive characters that are neither a forward slash nor a backslash. | works as an "or" operator. Finally, [\\/] matches a single forward slash or backslash. The result is a list in which every slash is its own element, alternating with the runs of characters between the slashes.
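As a quick check, here is a minimal sketch that runs the findall pattern on the sample input from the question. The variable names text and tokens are only illustrative.

```python
import re

# Sample input from the question. The backslashes are doubled only because
# this is a normal (non-raw) string literal; the string holds one of each.
text = '-------/--------\\---------/------\\'

# Either a run of non-slash characters, or a single slash of either kind.
tokens = re.findall(r'[^\\/]+|[\\/]', text)
print(tokens)

# Nothing is lost: joining the tokens gives back the original string.
print(''.join(tokens) == text)
```

Unlike split, findall does not produce an empty string at the end when the input ends with a separator.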

As for why your code didn't work: your expression is '(\\/)'. When the Python interpreter parses this literal, it sees an escaped backslash and creates a string of four characters: ( \ / ). This string is then sent to the regex engine, which also does escaping. It sees a backslash followed by a forward slash, and since the forward slash is not special, it "escapes" to itself, so the final expression is just (/). Finally, re applies this expression, splits on the forward slash and captures it, which is exactly what you're observing.
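The two escaping stages can be observed directly. The following is a small sketch using the pattern and sample input from the question; the names pattern, text and parts are illustrative.

```python
import re

# The pattern from the question, written as a normal string literal.
pattern = '(\\/)'

# Stage 1: Python turns the escaped backslash into one backslash,
# so the string handed to re has only four characters.
print(len(pattern))
print(list(pattern))

# Stage 2: the regex engine reads backslash + slash as a literal slash,
# so only forward slashes act as separators.
text = '-------/--------\\---------/------\\'
parts = re.split(pattern, text)
print(parts)
```

The backslashes in the input are left inside the surrounding pieces, which matches the output reported in the question.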

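The correct command for your approach would be re.split('([\\\\/])', string), where the four backslashes are needed because of the double escaping. The shorter form re.split('([\\\/])', string) produces the same pattern, but only because Python leaves the unrecognized escape \/ untouched, and recent Python versions emit a warning about such escapes.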

The moral of the story: always use raw literals r"..." with regexes to avoid double escaping issues.

OTHER TIPS

I think this solution gives exactly what you want:

import re
testStr = '-------/--------\\---------/------\\'
parts = re.split('(\\\\|/)', testStr)
for p in parts:
    print('p=' + p)

Result:

p=-------
p=/
p=--------
p=\
p=---------
p=/
p=------
p=\
p=
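The final empty p= line appears because the input ends with a separator, so split reports an empty string after it. If that is unwanted, one possible approach (a sketch, not part of the original answer) is to drop empty pieces with a list comprehension; test_str and parts are illustrative names.

```python
import re

test_str = '-------/--------\\---------/------\\'

# Same split as above, written with a raw literal, then filtered so that
# empty strings (such as the one after the trailing backslash) are dropped.
parts = [p for p in re.split(r'(\\|/)', test_str) if p]

for p in parts:
    print('p=' + p)
```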
Licensed under: CC-BY-SA with attribution