Question

What would be the most pythonic way of removing single newlines but keeping multiple newlines from a string?

As in

"foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"

turning into

"foo\n\nbar one two three\n\n\nhello"

I was thinking about using splitlines(), then replacing empty lines by "\n" and then concatenating everything back again, but I suspect there is a better/simpler way. Maybe using regexes?


Solution

>>> import re
>>> s = "foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
>>> re.sub(r'(?<![\r\n])(\r?\n|\n?\r)(?![\r\n])', ' ', s)
'foo\n\nbar one two three\n\n\nhello'

This matches \r?\n or \n?\r and uses negative lookbehind and lookahead assertions to require that no \r or \n sits on either side of the match, so only isolated line breaks are replaced.
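As a rough sketch, the same pattern can be compiled once with re.VERBOSE so that each part carries its own comment. The names SINGLE_NEWLINE and join_single_newlines are hypothetical, chosen here for illustration; they are not part of the original answer.

```python
import re

# Hypothetical names for illustration only.
SINGLE_NEWLINE = re.compile(
    r"""
    (?<![\r\n])        # not preceded by a CR or LF
    (?:\r?\n | \n?\r)  # one line break: LF, CRLF, LF+CR, or a lone CR
    (?![\r\n])         # not followed by a CR or LF
    """,
    re.VERBOSE,
)


def join_single_newlines(text, replacement=" "):
    """Replace isolated line breaks with `replacement`; keep runs of two or more."""
    return SINGLE_NEWLINE.sub(replacement, text)


s = "foo\n\nbar\none\n\rtwo\rthree\n\n\nhello"
result = join_single_newlines(s)
print(repr(result))
```

The group is non-capturing here because the matched text is never reused; that does not change what the pattern matches.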

For what it's worth, there are three types of line endings found in the wild:

  1. \n on Linux, Mac OS X, and other Unices
  2. \r\n on Windows, and in the HTTP protocol
  3. \r on Mac OS 9 and earlier

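The first two are by far the most common. If you want to limit the possibilities to just those three, you could do:

>>> re.sub(r'(?<![\r\n])(\r?\n|\r)(?![\r\n])', ' ', s)
'foo\n\nbar one\n\rtwo three\n\n\nhello'

Note that this narrower pattern no longer treats the \n\r pair in the example as a single line break, so that pair is left in place.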

And of course, drop the |\r alternative if you don't care about classic Mac OS (\r) line endings, which are rare.
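A minimal sketch of that reduced variant, assuming the input only ever uses \n or \r\n line endings. The names LF_OR_CRLF and join_lf_crlf are hypothetical and used only for this example.

```python
import re

# Hypothetical names; this pattern deliberately ignores lone \r line endings.
LF_OR_CRLF = re.compile(r"(?<![\r\n])\r?\n(?![\r\n])")


def join_lf_crlf(text):
    """Replace isolated \\n or \\r\\n line breaks with a space."""
    return LF_OR_CRLF.sub(" ", text)


unix_text = "foo\n\nbar\none\n\n\nhello"
windows_text = "foo\r\n\r\nbar\r\none"

unix_result = join_lf_crlf(unix_text)
windows_result = join_lf_crlf(windows_text)
print(repr(unix_result))
print(repr(windows_result))
```

With this pattern a lone \r is left untouched, which is the trade-off of dropping the third alternative.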

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow