Regular expression to match comma or newline but not both

https://stackoverflow.com/questions/10446908

05-06-2021

Question

I've got a problem with the following Python script, which extracts some options from the text of a text area in an internal company web app.

import re

text = 'option one\noption two, option three, option four'
correct = 'option one, option two, option three, option four'

pattern = re.compile(r'(\s*[,]\s*)')
fixed = pattern.sub(', ', text)

print(fixed)
option one
option two, option three, option four

print(fixed.split(', '))
['option one\noption two', 'option three', 'option four']

This obviously fails to split 'option one\noption two' into 'option one' and 'option two'.

So the input could end up as

option one
option two, option three, option four

which would need to be converted to

option one, option two, option three, option four

It works fine if the separator is a comma, or a comma followed by a newline, but not if it's just a newline by itself.

Solution

Extend your character class from [,] to [,\n], maybe? Also, why not split on the regex directly, rather than searching and replacing first and then splitting? re.split could come in handy for this: http://docs.python.org/library/re.html?highlight=re.split#re.split
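A minimal sketch of splitting on the regex directly, using the sample text from the question. The variable names `delimiter` and `options` are illustrative only. Note that the capturing parentheses from the original pattern are dropped here, because re.split includes the text of capturing groups in its result.

```python
import re

text = 'option one\noption two, option three, option four'

# A comma or a newline, plus any surrounding whitespace, counts as one delimiter.
# No capturing group: re.split would otherwise return the delimiters as well.
delimiter = re.compile(r'\s*[,\n]\s*')

options = delimiter.split(text)
print(options)
```

Because \s also matches a newline, a comma followed by a newline is still consumed as a single delimiter rather than producing an empty entry.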

Other suggestions

Can you just try:

(\s*(,|\n)\s*)

Or, probably even better:

(\s*[,\n]\s*)

...I always forget you can put \n in a character class...
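As a quick check, here is a sketch that plugs both patterns into the search-and-replace approach from the question. The names `alternation`, `char_class`, `fixed_a` and `fixed_b` are made up for this example; for this sample input the two patterns should behave the same way.

```python
import re

text = 'option one\noption two, option three, option four'

# The two suggested patterns: alternation versus a character class.
alternation = re.compile(r'(\s*(,|\n)\s*)')
char_class = re.compile(r'(\s*[,\n]\s*)')

# Normalise every delimiter to ', ' and then split, as in the question.
fixed_a = alternation.sub(', ', text)
fixed_b = char_class.sub(', ', text)

print(fixed_b)
print(fixed_b.split(', '))
```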

I got there without a regex:

print([x.strip() for x in text.replace('\n', ', ').split(', ')])

Result:

['option one', 'option two', 'option three', 'option four']

I'm not claiming this is a good answer for your use case. Every extra delimiter you need to support means adding another .replace() call.
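If more delimiters do turn up, one way to keep the no-regex approach manageable is to loop over them. This is only a sketch: `split_options` and its `delimiters` parameter are hypothetical names, not part of the original answer. It also drops empty entries, so a comma followed by a newline does not leave a stray item behind.

```python
def split_options(text, delimiters=(',', '\n')):
    """Split text on any of the given delimiters, without using a regex."""
    primary = delimiters[0]
    # Rewrite every other delimiter as the first one, then split once.
    for extra in delimiters[1:]:
        text = text.replace(extra, primary)
    # Strip whitespace around each piece and drop pieces that end up empty.
    return [part.strip() for part in text.split(primary) if part.strip()]


text = 'option one\noption two, option three, option four'
print(split_options(text))
```

Supporting another separator is then a matter of passing a longer tuple, for example split_options(text, (',', '\n', ';')).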

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow