Problem

While parsing data from a web request, I came across the following string:

dateRange = 'September\xa04,\xa01978 – September 1980'

The extracted string seems to be Latin-1 encoded (judging by the \xa0 characters). I got rid of those by replacing them with regular spaces:

dateRange = dateRange.replace(u'\xa0', u' ')
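A side note on `\xa0`: it is a single Unicode character, U+00A0 NO-BREAK SPACE (what the HTML entity `&nbsp;` decodes to), rather than a sign of a particular encoding. The sketch below checks that with the standard `unicodedata` module and shows NFKC normalization as an alternative to the explicit `replace`. NFKC is a swapped-in technique, not what the question used, and it is broader than a targeted replace: it also rewrites other compatibility characters such as full-width letters and ligatures.

```python
import unicodedata

# The string from the question, with the dash written as an escape.
date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# '\xa0' is one code point: U+00A0 NO-BREAK SPACE.
print(unicodedata.name('\xa0'))  # NO-BREAK SPACE

# Option 1: replace it explicitly, as in the question.
by_replace = date_range.replace('\xa0', ' ')

# Option 2: NFKC normalization maps NO-BREAK SPACE to an ordinary space too.
# It leaves the dash between the two dates unchanged.
by_nfkc = unicodedata.normalize('NFKC', date_range)

print(by_replace == by_nfkc)  # True
```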

That aside, I can't split the string on the hyphen (-).

When I call split() as follows:

print(dateRange.split('-'))

The output is as follows:

['September\xa04,\xa01978 – September 1980']

It is as if there were no hyphen in the string. I suspect it has something to do with the encoding, but I can't pin down the exact issue.

So, how can I work around this issue?

EDIT:

I have already tried the following to no avail:

dateRange.split('\-')
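Escaping would not help here even with a real hyphen. `str.split()` takes a literal separator, not a regular expression, so `-` needs no escaping. And `\-` is not a recognized escape sequence, so Python keeps the backslash: the separator becomes the two-character string backslash-hyphen (recent Python versions also warn about the invalid escape). A small check, writing the same value as a raw string to avoid that warning:

```python
# r'\-' is the same two-character value that '\-' produces.
pattern = r'\-'
print(len(pattern))   # 2
print(list(pattern))  # ['\\', '-']

# A backslash-hyphen never occurs in the date string, so nothing is split.
print('September 4, 1978 \u2013 September 1980'.split(pattern))

# str.split() matches its argument literally; a real hyphen needs no escaping.
print('1978-1980'.split('-'))  # ['1978', '1980']
```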

Solution

That's not a hyphen. It's U+2013 EN DASH.
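To confirm this rather than trusting how the characters look, you can list every non-ASCII character in the string with its position, code point and Unicode name. A minimal sketch using the standard `unicodedata` module; `non_ascii_chars` is a hypothetical helper, not a library function:

```python
import unicodedata


def non_ascii_chars(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f'U+{ord(ch):04X}', unicodedata.name(ch, '<unnamed>'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


date_range = 'September\xa04,\xa01978 \u2013 September 1980'
for entry in non_ascii_chars(date_range):
    print(entry)
# (9, 'U+00A0', 'NO-BREAK SPACE')
# (12, 'U+00A0', 'NO-BREAK SPACE')
# (18, 'U+2013', 'EN DASH')

# split('-') looks for HYPHEN-MINUS, which is a different code point.
print(unicodedata.name('-'))  # HYPHEN-MINUS
print('-' == '\u2013')        # False
```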

Just copy & paste it into your split call:

dateRange.split('–')
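If copying and pasting the character feels fragile (in most fonts it is hard to tell apart from `-`), the same split can be written with the `\u2013` escape. The sketch below also strips the spaces that surround the dash; the names `EN_DASH`, `start` and `end` are illustrative:

```python
date_range = 'September 4, 1978 \u2013 September 1980'

# The en dash spelled as an escape, so the source code shows exactly
# which character is meant.
EN_DASH = '\u2013'

# Unpacking assumes exactly one dash, i.e. exactly two parts.
start, end = (part.strip() for part in date_range.split(EN_DASH))
print(start)  # September 4, 1978
print(end)    # September 1980
```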

Alternatively, you can replace it with an actual hyphen. Make sure to copy & paste the en dash into the replace call :)
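A sketch of the replace-then-split variant, using escapes instead of pasted characters. As an assumption beyond the original answer, it also maps EM DASH (U+2014) and MINUS SIGN (U+2212) to a hyphen, since scraped pages may use those in ranges too; adjust the character class to whatever your data actually contains.

```python
import re

date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# Assumption: these three look-alikes are the only dashes worth handling.
# EN DASH (U+2013), EM DASH (U+2014), MINUS SIGN (U+2212).
DASHES = re.compile('[\u2013\u2014\u2212]')

# Normalize the spaces first, then turn any dash variant into '-'.
cleaned = DASHES.sub('-', date_range.replace('\xa0', ' '))
parts = [p.strip() for p in cleaned.split('-')]
print(parts)  # ['September 4, 1978', 'September 1980']
```

Splitting on `-` after the replacement is only safe while the dates themselves contain no hyphens (for example, no ISO dates like 1978-09-04); splitting directly on the en dash avoids that problem.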

License: CC-BY-SA with attribution
Not affiliated with StackOverflow