Question

While parsing data from a web request, I came across the following string:

dateRange = 'September\xa04,\xa01978 – September 1980'

The encoding of the extracted string seems to be Latin-1 (based on \xa0). I got rid of that by replacing the codes with spaces.

dateRange = dateRange.replace(u'\xa0', u' ')

That aside, I can't split the string on the hyphen (-).

When I call split() as follows:

print(dateRange.split('-'))

The output is as follows:

['September\xa04,\xa01978 – September 1980']

It is as if there was no hyphen in the string. I sense that it has something to do with the encoding, but I can't seem to comprehend the issue exactly.

So, how can I work around this issue?

EDIT:

I have already tried the following to no avail:

dateRange.split('\-')

Solution

That's not a hyphen. That's a U+2013 EN DASH.
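One way to confirm this, as a minimal sketch assuming Python 3.6+, is to print the code point and Unicode name of every non-ASCII character with the standard unicodedata module. The variable names here are illustrative, not from the question:

```python
import unicodedata

# The string from the question, with the dash written as an explicit escape.
date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# Collect (code point, Unicode name) for every non-ASCII character.
non_ascii = [
    (f'U+{ord(ch):04X}', unicodedata.name(ch))
    for ch in date_range
    if ord(ch) > 127
]

for code_point, name in non_ascii:
    print(code_point, name)
```

This should list the two no-break spaces (U+00A0) and the en dash (U+2013), which also suggests that \xa0 is a no-break space character rather than a sign of Latin-1 encoding.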

Just copy & paste it into your split call:

dateRange.split('–')

Alternatively, you can replace it with an actual hyphen. Make sure to copy & paste the en dash into the replace call :)
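If copying and pasting the character is error-prone in your editor, both approaches can also be written with the \u2013 escape. The following is only a sketch, assuming Python 3 (where string literals are Unicode) and illustrative variable names:

```python
# The string from the question, with the dash written as an explicit escape.
date_range = 'September\xa04,\xa01978 \u2013 September 1980'

# Replace the no-break spaces (U+00A0) with ordinary spaces first.
cleaned = date_range.replace('\xa0', ' ')

# Option 1: split directly on the en dash and trim the surrounding spaces.
start, end = (part.strip() for part in cleaned.split('\u2013'))
print(start)  # September 4, 1978
print(end)    # September 1980

# Option 2: replace the en dash with an ASCII hyphen, then split on ' - '.
hyphenated = cleaned.replace('\u2013', '-')
parts = hyphenated.split(' - ')
print(parts)  # ['September 4, 1978', 'September 1980']
```

Writing the escape instead of the literal character makes it obvious in the source which dash is meant.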

License: CC-BY-SA with attribution
Not affiliated with StackOverflow