Question

Is there a cross-platform library function that would collapse a multiline string into a single-line string with no repeating spaces?

I've come up with the snippet below, but I wonder if there is a standard function I could just import, perhaps one that is even optimized in C?

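import re

# Patterns compiled once at import time instead of on every call.
CRLF_RUNS = re.compile(r'(\r\n)+')
CR_RUNS = re.compile(r'\r+')
LF_RUNS = re.compile(r'\n+')
SPACE_RUNS = re.compile(r' +')

def collapse(text):
    # Turn each run of line breaks into one space, then squeeze repeated spaces.
    # Tabs and other whitespace characters are left untouched.
    for pattern in (CRLF_RUNS, CR_RUNS, LF_RUNS):
        text = pattern.sub(' ', text)
    return SPACE_RUNS.sub(' ', text)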

P.S. Thanks for the good observations. ' '.join(input.split()) seems to be the winner: in my case it runs about twice as fast as a search-and-replace with a precompiled r'\s+' regex.
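A comparison like the one in the P.S. can be reproduced roughly with the standard timeit module. This is a minimal sketch: the sample string and the helper names collapse_split and collapse_regex are hypothetical, and the timings depend on the input and the Python version, so no particular ratio is claimed here. The regex version calls .strip() so both functions return identical strings.

```python
import re
import timeit

# Hypothetical sample input: a multiline string with tabs and repeated spaces.
SAMPLE = "This\nis        a test\twith a\n  mix of\ttabs,     newlines\n" * 1000

WHITESPACE = re.compile(r'\s+')  # precompiled, as in the P.S.

def collapse_split(text):
    # str.split() with no arguments splits on runs of any whitespace.
    return ' '.join(text.split())

def collapse_regex(text):
    # Replace each run of whitespace with one space, then trim the ends
    # so the result matches the split/join version.
    return WHITESPACE.sub(' ', text).strip()

if __name__ == "__main__":
    assert collapse_split(SAMPLE) == collapse_regex(SAMPLE)
    for func in (collapse_split, collapse_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=100)
        print(f"{func.__name__}: {seconds:.3f} s for 100 runs")
```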


Solution

The built-in str.split() method, called with no arguments, splits on runs of any whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace, so you can use it and then join the resulting list with single spaces, like this:

' '.join(my_string.split())
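The "no arguments" part matters. A short illustration (the sample string is made up) of how split() differs from split(' '), which treats every single space as a separator and leaves tabs and newlines in place:

```python
text = "  leading\tand trailing  \n"

# No argument: any run of whitespace is one separator, and the ends are dropped.
print(text.split())      # ['leading', 'and', 'trailing']

# Explicit ' ': each space is a separator, so runs of spaces give empty
# strings, and the tab and newline stay inside the pieces.
print(text.split(' '))   # ['', '', 'leading\tand', 'trailing', '', '\n']

print(' '.join(text.split()))  # leading and trailing
```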

Here's a complete test script:

TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

print(' '.join(TEST.split()))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace

OTHER TIPS

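You had the right idea; you just needed to read the Python manual a little more closely. The \s escape matches any whitespace character, so a single pattern covers spaces, tabs, \r and \n: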

import re
somewhitespace = re.compile(r'\s+')
TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

print(somewhitespace.sub(' ', TEST))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
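One difference from the split/join approach: sub() also turns leading and trailing whitespace into a single space instead of dropping it. A small sketch with a hypothetical padded string shows the effect and the .strip() call that makes the two approaches agree:

```python
import re

somewhitespace = re.compile(r'\s+')
padded = "\n  indented text\nwith a trailing newline\n"

# sub() turns the leading and trailing runs into single spaces too.
print(repr(somewhitespace.sub(' ', padded)))
# ' indented text with a trailing newline '

# Adding .strip() gives the same result as ' '.join(padded.split()).
print(repr(somewhitespace.sub(' ', padded).strip()))
# 'indented text with a trailing newline'
```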
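
multi_line.replace('\n', ' ')

will remove the line breaks. Note that it does not collapse repeated spaces or tabs, and replacing with '' instead of ' ' would glue the last word of each line to the first word of the next. '\n' works as a universal end-of-line character in Python when the text was read from a file in text mode, because Python then translates '\r\n' and '\r' to '\n' (universal newlines).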
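The universal-newlines behaviour applies to text-mode file reads; a string built some other way (a literal, or bytes decoded with .decode()) can still contain '\r'. A minimal sketch using a temporary file, assuming ASCII content:

```python
import os
import tempfile

# Write raw bytes that mix Windows (\r\n), old Mac (\r) and Unix (\n) endings.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"one\r\ntwo\rthree\n")
    path = tmp.name

try:
    # Text mode with the default newline=None enables universal newlines:
    # every \r\n and \r is translated to \n on read.
    with open(path, encoding="ascii") as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(text))                     # 'one\ntwo\nthree\n'
print(repr(text.replace('\n', ' ')))  # 'one two three '

# A string that never went through a text-mode read keeps its \r.
raw = "one\r\ntwo"
print(repr(raw.replace('\n', ' ')))   # 'one\r two'
```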

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow