To capture a delimiter, it's easier to use findall instead of split:
re.findall(r'[^\\/]+|[\\/]', string)
[^\\/]+ matches a run of one or more characters that are neither a forward slash nor a backslash. | works as an OR operator. Finally, [\\/] matches a single forward slash or backslash. The result is a list in which every slash or backslash is its own element, interleaved with the substrings that sit between them.
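As a quick illustration of the findall approach, here is a minimal sketch. The sample string is made up for the demo (it is not from your question) and simply mixes both kinds of slash:

```python
import re

# Hypothetical sample input: a, backslash, b, forward slash, c.
sample = r'a\b/c'

# Either a run of non-slash characters, or a single slash of either kind.
tokens = re.findall(r'[^\\/]+|[\\/]', sample)

print(tokens)  # ['a', '\\', 'b', '/', 'c']

# Nothing is lost: joining the pieces gives back the original string.
print(''.join(tokens) == sample)  # True
```

Note that the backslash shows up as '\\' in the printed list only because that is how Python displays a single backslash inside a string repr.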
As for why your code didn't work, your expression is (\\/). When the Python interpreter parses this, it sees an escaped backslash and creates a string of four characters: ( \ / ). Then this string is sent to the regex engine, which does its own escaping. It sees a backslash followed by a forward slash, and since a forward slash is not special, it "escapes" to itself, so the final expression is just (/). Finally, re applies this expression, splits on the forward slash and captures it, which is exactly what you're observing.
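You can check both escaping steps yourself. The sketch below assumes the pattern was written as a plain (non-raw) string literal, as described above, and reuses a made-up sample string:

```python
import re

# The non-raw literal: Python collapses the doubled backslash into one.
pattern = '(\\/)'
print(len(pattern))  # 4
print(pattern)       # (\/)

# Hypothetical sample input: a, backslash, b, forward slash, c.
sample = r'a\b/c'

# The regex engine reads \/ as a literal forward slash, so only the
# forward slash acts as a (captured) delimiter; the backslash is untouched.
parts = re.split(pattern, sample)
print(parts)  # ['a\\b', '/', 'c']
```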
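The correct command for your approach would be re.split('([\\\\/])', string), or equivalently re.split(r'([\\/])', string), due to double escaping: the regex engine needs to receive two backslashes inside the character class, so a non-raw literal has to contain four.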
The moral of the story: always use raw literals r"..." with regexes to avoid double escaping issues.
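To see why the raw literal is easier to get right, here is a small comparison. The fully escaped non-raw spelling and the sample string are my own illustration, not something taken from your code:

```python
import re

# Raw literal: what you type is what the regex engine receives.
raw_pattern = r'([\\/])'

# Non-raw literal: every backslash meant for the regex must be doubled again.
escaped_pattern = '([\\\\/])'

# Both literals produce the same 7-character pattern string.
print(raw_pattern == escaped_pattern)  # True

# Hypothetical sample input: a, backslash, b, forward slash, c.
sample = r'a\b/c'

# With the capturing group, split keeps both kinds of slash in the result.
parts = re.split(raw_pattern, sample)
print(parts)  # ['a', '\\', 'b', '/', 'c']
```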