Python split and re.split not capturing what appear to be either tabs or spaces in a string

https://stackoverflow.com/questions/23231330

07-07-2023

Question

I have a string like:

'Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2'

I am trying to split it on what appear to be tabs. However, when I print it with print repr(str), the only special characters I see are at the end:

'Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2\r\n'

If I try things like print re.split('\t+', str) or print re.split('\s+', str), nothing is split, i.e. the output is still:

['Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2\r\n']

Is there a way to isolate these fixed-width items if regex is not working out?

Update: I am hoping to split exclusively on the larger runs of whitespace, so .split(), which creates a list element for every word, is not what I'm looking for.

Solution 3

Thanks everyone for the input. I didn't realize that Python had a zero-width-space bug (http://bugs.python.org/issue13391).

Anyway, it appears that matching on two or more consecutive whitespace characters did the trick:

>>> re.split(r'\s{2,}', s)
['Agendas / Schedules meetings and speakers', '4 F', '1928-1209', 'Box 2']
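The same idea can be wrapped in a small helper. This is only a sketch, written for Python 3 and assuming the gaps between columns are ordinary spaces; the function name split_columns and its min_gap parameter are made up for illustration and do not come from the original answer.

```python
import re

def split_columns(line, min_gap=2):
    """Split a line on runs of at least `min_gap` whitespace characters.

    Hypothetical helper: single spaces inside a field are left alone.
    """
    pattern = r"\s{%d,}" % min_gap
    # strip() drops the trailing '\r\n' so it does not produce an empty field
    return re.split(pattern, line.strip())

# Sample line from the question, assuming plain spaces between the columns
line = "Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2\r\n"
fields = split_columns(line)
print(fields)
```

Raising min_gap may help if some fields contain double spaces of their own, at the cost of missing narrower column gaps.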

Other suggestions

I've run across this a few times in the past; you may have a case of zero-width space.

>>> s = 'Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2'
>>> re.split(ur'[\u200b\s]+', s, flags=re.UNICODE)
['Agendas', '/', 'Schedules', 'meetings', 'and', 'speakers', '4', 'F', '1928-1209', 'Box', '2']
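To check whether a string really does contain invisible characters such as U+200B, it can help to list every character outside the printable ASCII range. The following is a rough Python 3 sketch; the helper name describe_invisible and the sample string are invented for illustration, since the actual bytes of the original data are not known.

```python
import re
import unicodedata

def describe_invisible(text):
    """Return (index, code point, name) for each control or non-ASCII character."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) < 32 or ord(ch) > 126:
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, "U+%04X" % ord(ch), name))
    return found

# Hypothetical sample: columns separated by zero-width spaces instead of blanks
sample = "speakers\u200b\u200b4 F\u200b\u200b1928-1209"
print(describe_invisible(sample))

# \s alone does not match U+200B, so the character is added to the class explicitly
fields = re.split(r"[\u200b\s]{2,}", sample)
print(fields)
```

If the helper reports nothing beyond the trailing '\r' and '\n', the gaps are most likely plain spaces and a pattern like \s{2,} should be enough.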

The split() method of a string splits on whitespace by default. So:

print str.split()

should do the trick.
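For comparison, here is a short Python 3 sketch of what the default split() returns on the sample line, next to a regex-free variant that only breaks on gaps of two or more spaces. It assumes the gaps are ordinary spaces, and the variable names are illustrative only.

```python
line = "Agendas / Schedules meetings and speakers       4 F     1928-1209       Box 2\r\n"

# Default split(): every whitespace run is a separator, so each word is its own item
words = line.split()
print(len(words), words)

# Regex-free alternative: split on double spaces, then drop the empty pieces
# and trim the leftover single space or line ending from each field
columns = [part.strip() for part in line.split("  ") if part.strip()]
print(columns)
```

The first result has one element per word, which is the behaviour the question's update says is not wanted; the second keeps multi-word fields such as '4 F' together.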

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow