Question

I have a string that contains some line breaks. I would like to replace each of them with a period, except where the line already ends with a period.

For example:

text = "This is the oldest European-settled town in the continental " \
   "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a " \
   "scenic cruise aboard \r\n"

I am trying to change it to the following using a regex:

text = "This is the oldest European-settled town in the continental " \
   "U.S. Explore the town at your leisure. Upgrade to add" \
   " a scenic cruise aboard."

What I have now is:

new_text = re.sub("(( )?(\\n|\\r\\n)+)", ". ", text).strip()

But it does not handle a sentence that already ends with a period. Should I use lookarounds here, and if so, how?

Thanks in advance!!


Solution 2

Well, I'm not sure whether you mean the \r\n to be literal backslash sequences or actual line-break characters, so here are both.

Literal:

>>> import re
>>> text = r"This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:(?:\\r)?\\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.


Not literal:

>>> import re
>>> text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:\r?\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.


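The leading [ .]* consumes any spaces and periods sitting right before the line break, so an existing period is replaced along with the break rather than doubled. No lookarounds are needed.

I removed some of the unnecessary groups and turned the remaining ones into non-capturing groups.

I also turned (\\n|\\r\\n)+ into the slightly more performant form (?:(?:\\r)?\\n)+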
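As a rough sketch, the "not literal" pattern above could be wrapped in a small helper. The names `LINE_BREAKS` and `breaks_to_periods` are made up for illustration and are not part of the original answer.

```python
import re

# Optional trailing spaces/periods, then one or more line breaks (\n or \r\n).
LINE_BREAKS = re.compile(r'[ .]*(?:\r?\n)+')

def breaks_to_periods(text):
    """Replace each run of line breaks with '. ' without doubling periods."""
    return LINE_BREAKS.sub('. ', text).strip()

text = ("This is the oldest European-settled town in the continental "
        "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a "
        "scenic cruise aboard \r\n")
result = breaks_to_periods(text)
print(result)
```

Note that the last sentence only gets its period because the input ends with a line break. A string whose final line has no trailing newline would keep that line unchanged.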

OTHER TIPS

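You can add an optional "." to the regexp: (( )?\.?(\\n|\\r\\n)+). If there is already a "." before the line break, it becomes part of the match and is replaced together with the break, so you do not end up with two periods.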
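One caveat with this variant, shown as a small sketch: it expects the optional space to come before the optional period, so a line ending in a period followed by a space still ends up with a doubled period. The sample string and variable names below are invented for the comparison.

```python
import re

# The pattern from this tip: optional space, optional period, then line breaks.
tip_pattern = re.compile(r'( )?\.?(\n|\r\n)+')
# A character class accepts spaces and periods in any order and number.
class_pattern = re.compile(r'[ .]*(?:\r?\n)+')

sample = "Done. \r\nNext\r\n"  # note the space after the period

tip_result = tip_pattern.sub('. ', sample).strip()
class_result = class_pattern.sub('. ', sample).strip()

print(tip_result)    # Done.. Next.
print(class_result)  # Done. Next.
```

For the example text in the question, both patterns give the same result. They only differ on inputs like the one above.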

If you're just trying to get rid of the line breaks, use this (note that it neither adds periods nor leaves a space between the joined lines):

text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
text = text.replace('\r\n','')
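If the periods are wanted but a regex is not, one possible alternative is to split the text into lines and join them again. This is only a sketch; the function name `join_lines_as_sentences` is hypothetical.

```python
def join_lines_as_sentences(text):
    """Join lines with a space, adding a period to lines that lack one."""
    sentences = []
    for line in text.splitlines():  # splitlines() handles both \n and \r\n
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if not line.endswith("."):
            line += "."
        sentences.append(line)
    return " ".join(sentences)

text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
joined = join_lines_as_sentences(text)
print(joined)
```

Unlike the regex versions, this also adds a period to a final line that has no trailing line break.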
Licensed under: CC-BY-SA with attribution