String conversion from input file

https://stackoverflow.com/questions/21349727

02-10-2022
Question

I'm new to Python and I need some help getting this code to work.

This version works correctly; it converts the string the way I need:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

reshaped_text = arabic_reshaper.reshape(u' الحركات')
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8') # This is ok

But I get the error shown below when I try to read the string from a file instead:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

with open("/home/nemo/Downloads/mpcabd-python-arabic-reshaper-552f3f4/data.txt", "r") as myfile:
    data = myfile.read().replace('\n', '')
reshaped_text = arabic_reshaper.reshape(data)
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128).

Any help would be appreciated.

Thanks

Solution

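In Python 2, the decode() method converts a byte string into a unicode object using the codec you name. If you omit the codec, it falls back to the default string encoding, which is normally ASCII. Reading a file opened with the built-in open() gives you a byte string, so as soon as that value is used where unicode is expected, Python attempts an implicit ASCII decode and fails on the first non-ASCII byte (0xd8 here).

When you read a UTF-8 encoded file, decode its contents explicitly with decode('utf-8').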

Write:

data = u'my data'  # a unicode object, so encode() does not trigger an implicit ASCII decode first
with open("file.txt" , "w") as f:
    f.write(data.encode('utf-8'))

Read:

with open("file.txt" , "r") as f:
    data = f.read().decode('utf-8')
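As a rough illustration of why the ASCII codec fails and why an explicit UTF-8 decode works, here is a minimal sketch that does not need arabic_reshaper. It assumes the file contains the Arabic word from the question; the temporary file is only a hypothetical stand-in for data.txt. It should behave the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

# The word from the question, written with escapes so the source stays ASCII.
text = u'\u0627\u0644\u062d\u0631\u0643\u0627\u062a'

# Hypothetical stand-in for data.txt: a temporary file holding UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Read the raw bytes back, as Python 2's open(path, 'r') would return them.
with open(path, 'rb') as f:
    raw = f.read()

# Decoding as ASCII fails on the very first byte (0xd8), as in the traceback.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as UTF-8 recovers the original text.
data = raw.decode('utf-8')

os.remove(path)
```

Applied to the code in the question, this would presumably mean calling decode('utf-8') on the result of myfile.read() before passing it to arabic_reshaper.reshape.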

OTHER TIPS

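You can also use the optional encoding parameter of the built-in open function. This works on Python 3; on Python 2 the built-in open does not accept encoding, but io.open takes the same parameter: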

with open("/home/nemo/Downloads/mpcabd-python-arabic-reshaper-552f3f4/data.txt",
          'rt',
          encoding='utf8') as f:
    data = f.read()  # already decoded text, no .decode() call needed
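For code that has to run on both Python 2.6+ and Python 3, io.open is one option, since it accepts encoding on both. The sketch below is only an illustration: read_text is a hypothetical helper name, and the temporary file again stands in for data.txt.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Return the file's contents as text, with newlines removed."""
    # io.open decodes while reading, so f.read() already returns text.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read().replace(u'\n', u'')


# Hypothetical stand-in for data.txt, with the word split over two lines.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\u0627\u0644\u062d\u0631\n\u0643\u0627\u062a\n')

data = read_text(path)
os.remove(path)
```

The value in data is then text rather than bytes, so it can be handed to functions that expect unicode without a separate decode step.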

Licensed under: CC-BY-SA with attribution
