Question

here is my code:

# -*- coding: utf-8 -*-
array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ","ÿ"]
array1=["א","ב","ג","ד","ה","ו","ז","ח","ט","י","ך","כ","ל","ם","מ","ן","נ","ס","ע","ף","פ","ץ","צ","ק","ר","ש","ת"]
str="áï éäåãä"
message=""
for i in range(0,len(str)):
    s=str[i]
    index=-1
    for j in range(0,len(array)):
        if(array[j]==s):
            index=j
            break
    if(index!=-1):
        message+=array1[index]
        print array1[index]
print message

the error is:

SyntaxError: EOL while scanning string literal

in line 2

I have a text file in Hebrew, but it always displays as gibberish, no matter what the encoding is. This is a Python program to convert it to Hebrew. The original file is in ISO-8859-1.

Was it helpful?

Solution

As @Martijn suggests, decoding your original file correctly would be a better solution. If your file is Hebrew but displays the Latin characters in your array, it is probably being decoded as latin1 or cp1252; cp1255 looks like a close match (and perhaps your array1 isn't quite right). Also note that strings are iterable, so you can simplify your arrays:

# coding: utf8
array  = u'àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ'
array1 = u'אבגדהוזחטיךכלםמןנסעףפץצקרשת'
print(array)
print(array1)
print(array.encode('cp1252').decode('cp1255',errors='replace'))

The last line above reverses the "incorrect" encoding and decodes it with cp1255 (a Hebrew encoding) instead. Output:

àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ
אבגדהוזחטיךכלםמןנסעףפץצקרשת
אבגדהוזחטיךכלםמןנסףפץצרשת��‎‏�

It's not a perfect match, but close enough that I think your original file was encoded with cp1255.
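
If that diagnosis is right, the whole lookup loop in the question collapses to a single round trip. Here is a minimal sketch of that, assuming Python 3 (the garbled sample string is taken from the question):

# coding: utf8
# Undo the mojibake: re-encode the "gibberish" back to the cp1252 bytes
# it came from, then decode those bytes with the Hebrew codec (cp1255).
garbled = "áï éäåãä"
fixed = garbled.encode('cp1252').decode('cp1255')
print(fixed)  # בן יהודה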

Other tips

You used a ' where you should have used a ":

'ÿ"

for the last entry in:

array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ",'ÿ"]

Make that single quote a double.

As for your translation program: it sounds as if your file's encoding is incorrect, or it is being decoded incorrectly. Perhaps you should figure out the correct encoding instead, rather than blindly replacing Latin-1 bytes with UTF-8 sequences for Hebrew codepoints?

If you were to use the codecs module to open the file with the correct codec and decode to Unicode, you would most probably find the data is correctly encoded anyway.
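
A minimal sketch of that approach, assuming the file really is cp1255-encoded; the filename hebrew.txt is hypothetical:

import codecs

# Open the file with the actual codec; read() then returns a Unicode
# string of real Hebrew letters, so no manual character mapping is needed.
with codecs.open('hebrew.txt', 'r', encoding='cp1255') as f:
    text = f.read()
print(text)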

I strongly urge you to study up on Unicode, codecs and Python before you continue.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow