Cannot split a seemingly encoded string

https://stackoverflow.com/questions/16104374

04-04-2022

Question

While parsing data from a web request, I came across the following string -

dateRange = 'September\xa04,\xa01978 – September 1980'

The encoding of the extracted string seems to be Latin-1 (based on \xa0). I got rid of that by replacing the codes with spaces.

dateRange = dateRange.replace(u'\xa0', u' ')

That aside, I can't split the string on the hyphen (-).

When I call split() as follows:

print(dateRange.split('-'))

The output is as follows:

['September\xa04,\xa01978 – September 1980']

It is as if there was no hyphen in the string. I sense that it has something to do with the encoding, but I can't seem to comprehend the issue exactly.

So, how can I work around this issue?

EDIT:

I have already tried the following to no avail:

dateRange.split('\-')

Solution

That's not a hyphen. That's a U+2013 ᴇɴ ᴅᴀsʜ.

Just copy & paste it into your split call:

dateRange.split('–')
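If copying the character is awkward, or you want to confirm what it actually is, the following is a minimal sketch that uses the standard `unicodedata` module to name each non-ASCII character in the string, then splits on the `'\u2013'` escape instead of a pasted literal. The string is rebuilt here from the question, on the assumption that the dash really is U+2013.

```python
import unicodedata

# The string from the question, with the en dash written as an escape
# so that it cannot be mistaken for an ASCII hyphen.
dateRange = 'September\xa04,\xa01978 \u2013 September 1980'

# Report every non-ASCII character with its code point and Unicode name.
for ch in sorted(set(dateRange)):
    if ord(ch) > 127:
        print(f'U+{ord(ch):04X} {unicodedata.name(ch)}')

# Split on the en dash itself rather than on '-'.
parts = dateRange.split('\u2013')
print(parts)
```

The loop should report U+00A0 as NO-BREAK SPACE and U+2013 as EN DASH, which is why `split('-')` found nothing to split on.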

Alternatively, you can replace it with an actual hyphen. Make sure to copy & paste the en dash into the replace call :)
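The replacement approach could look roughly like this. It is a sketch that assumes the only unusual characters are the no-break spaces and the single en dash from the question, and that neither date contains a hyphen of its own; `cleaned`, `start` and `end` are illustrative names.

```python
# The string from the question, with both problem characters as escapes.
dateRange = 'September\xa04,\xa01978 \u2013 September 1980'

# Normalise the no-break spaces and the en dash to plain ASCII.
cleaned = dateRange.replace('\xa0', ' ').replace('\u2013', '-')

# Now an ordinary split on the hyphen works; strip the padding spaces.
start, end = (part.strip() for part in cleaned.split('-'))
print(start)
print(end)
```

If the dates themselves might contain hyphens, splitting on `'\u2013'` directly is the safer choice.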

License: CC-BY-SA with attribution

Not affiliated with StackOverflow