Question

I'm trying to solve an issue that comes up when importing a CSV file. I save a string variable that contains Latin-1 characters, and when I try to print it, the characters are replaced with escaped byte sequences. Is there anything I can do to keep the characters as they are? I simply want to keep each character as it is, nothing else.

Here's the issue (as seen from Django's manage shell):

>>> variable = "{'job_title': 'préventeur'}"
>>> variable
"{'job_title': 'pr\xc3\xa9venteur'}"

Why does Django or Python automatically change the string? Do I have to change the character set or something?

Anything will help. Thanks!


Solution

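Your terminal sends encoded bytes, not characters. It is configured for UTF-8, so Python receives two bytes (\xc3\xa9) when you type é. The string itself has not been changed; the shell simply echoes its repr(), which escapes every non-ASCII byte.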

Decode from UTF-8 in that case:

>>> print 'pr\xc3\xa9venteur'.decode('utf8')
préventeur
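
To make the byte/character distinction concrete, here is a minimal sketch. It is written so that it should run on Python 2.7 and Python 3 alike; the variable names are illustrative only and do not come from the question.

```python
# The 11 bytes a UTF-8 terminal sends for the 10-character word.
raw = b'pr\xc3\xa9venteur'

# Decoding turns the two bytes \xc3\xa9 into the single character é.
text = raw.decode('utf-8')

byte_count = len(raw)    # counts bytes
char_count = len(text)   # counts characters (code points)

# Encoding the text again gives back exactly the original bytes.
round_trip = text.encode('utf-8')
```

The byte string is one element longer than the decoded text, because é occupies two bytes in UTF-8 but is a single character once decoded.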

You really want to read up on Python and Unicode though.

OTHER TIPS

"{'job_title': 'pr\xc3\xa9venteur'}"

The characters have been encoded into UTF-8 for you, which is pretty nice, because you don't want to stick with Latin-1 if you value your sanity. Convert to Unicode for best results:

>>> '\xc3\xa9'.decode('UTF-8')
u'é'
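
Since the data in the question comes from a CSV file, the same idea applies on input: decode once, at the point where the file is read. The sketch below assumes the file really is Latin-1 encoded, as the question suggests. The helper name read_text and the sample file contents are made up for illustration.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file and return it as Unicode text (hypothetical helper)."""
    # io.open decodes while reading, on Python 2.7 and Python 3 alike.
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


# Build a small Latin-1 file to read back; \xe9 is é in Latin-1.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with io.open(path, 'wb') as handle:
    handle.write(b'job_title\npr\xe9venteur\n')

content = read_text(path, 'latin-1')
os.remove(path)
```

Once the text is Unicode, it can be encoded to whatever the output side expects, for example content.encode('utf-8').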

Have you tried using the print statement instead?

>>> variable = "{'job_title': 'préventeur'}"

>>> variable
"{'job_title': 'pr\x82venteur'}"

>>> repr(variable)
'"{\'job_title\': \'pr\\x82venteur\'}"'

>>> print variable
{'job_title': 'préventeur'}
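
The \x82 in this session, where the question shows \xc3\xa9, suggests a terminal with a different encoding: é is the single byte 0x82 in code pages 437 and 850, which is presumably what this console used. As a rough sketch (names are illustrative, and it should run on Python 2.7 and 3), the same text yields different bytes under each encoding, and repr() escapes those bytes while print writes them to the terminal unchanged:

```python
text = u'pr\xe9venteur'  # u'\xe9' is the character é

# Encode the same text with three different encodings.
encoded = {}
for name in ('utf-8', 'latin-1', 'cp437'):
    encoded[name] = text.encode(name)

# repr() is what the interactive prompt echoes; it escapes non-ASCII bytes.
shown = repr(encoded['cp437'])
```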
Licensed under: CC-BY-SA with attribution