Is there a string-collapse library function in Python?
12-09-2019
Question
Is there a cross-platform library function that would collapse a multiline string into a single-line string with no repeating spaces?
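I've come up with the snippet below, but I wonder whether there is a standard function I could just import, perhaps one that is even optimized in C.
def collapse(input):
    import re
    rn = re.compile(r'(\r\n)+')  # runs of Windows line endings
    r = re.compile(r'\r+')       # runs of bare carriage returns
    n = re.compile(r'\n+')       # runs of newlines
    s = re.compile(r'\ +')       # runs of spaces
    # Turn every run into one space; spaces go last to merge any doubles created earlier
    return s.sub(' ', n.sub(' ', r.sub(' ', rn.sub(' ', input))))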
P.S. Thanks for the good observations. ' '.join(input.split()) seems to be the winner: in my case it runs about twice as fast as search-and-replace with a precompiled r'\s+' regex.
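The comparison mentioned in the P.S. could be reproduced with a sketch along these lines. The sample string, the function names, and the repetition counts are assumptions made for illustration, and the measured ratio will vary with the input and the interpreter.

```python
import re
import timeit

# Hypothetical sample input; real timings depend on the data and the interpreter.
SAMPLE = "This\r\nis a  test\twith a\nmix of\ttabs, newlines and   repeating\nwhitespace" * 100

WHITESPACE = re.compile(r"\s+")  # precompiled once, as in the P.S.


def collapse_split(text):
    # split() with no arguments splits on runs of any whitespace
    return " ".join(text.split())


def collapse_regex(text):
    # Replace each run of whitespace with a single space
    return WHITESPACE.sub(" ", text)


if __name__ == "__main__":
    for func in (collapse_split, collapse_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=200)
        print(f"{func.__name__}: {seconds:.4f} s")
```

The two functions agree on this sample because it has no leading or trailing whitespace; on input that does, the regex version leaves a single space at each end while the split/join version strips it.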
Solution
The built-in str.split() method, called with no arguments, splits on runs of whitespace, so you can use that and then join the resulting list with single spaces, like this:
' '.join(my_string.split())
Here's a complete test script:
TEST = """This
is a test\twith a
mix of\ttabs, newlines and repeating
whitespace"""
print(' '.join(TEST.split()))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
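A few edge cases are worth checking before relying on this idiom. The sketch below wraps it in a helper named collapse (a name chosen here for illustration) and shows that leading and trailing whitespace is dropped, that Windows line endings are handled, and that whitespace-only input becomes an empty string.

```python
def collapse(text):
    # split() with no arguments discards leading/trailing whitespace
    # and treats any run of whitespace as one separator
    return " ".join(text.split())


print(repr(collapse("  leading and trailing  \n")))  # 'leading and trailing'
print(repr(collapse("windows\r\nline\r\nendings")))  # 'windows line endings'
print(repr(collapse(" \t\n ")))                      # ''
```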
OTHER TIPS
You had the right idea; you just needed to read the Python manual a little more closely:
import re
somewhitespace = re.compile(r'\s+')
TEST = """This
is a test\twith a
mix of\ttabs, newlines and repeating
whitespace"""
somewhitespace.sub(' ', TEST)
'This is a test with a mix of tabs, newlines and repeating whitespace'
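multi_line.replace('\n', '') removes the newlines, but note that it joins the lines without inserting a space and leaves runs of spaces untouched, so replace with ' ' if the words must stay separated. '\n' is the universal end-of-line character in Python for text read from a file in text mode, where other line endings are translated to '\n'; a string obtained another way may still contain '\r'.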
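To see how the replace approach differs from split and join, here is a small check on a made-up string that mixes '\n', '\r\n', and repeated spaces. The variable names are illustrative only.

```python
text = "first line\nsecond   line\r\nthird line"

# replace() removes only '\n': the words run together,
# and the '\r' and the repeated spaces are left in place
removed = text.replace("\n", "")
print(repr(removed))    # 'first linesecond   line\rthird line'

# split()/join() treats '\r', '\n', tabs and spaces alike
collapsed = " ".join(text.split())
print(repr(collapsed))  # 'first line second line third line'
```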