How to convert Unicode text back to readable UTF-8 text?

https://stackoverflow.com/questions/19621433

01-07-2022

Question

I have a serious problem with Unicode and UTF-8. I saved a paragraph of Arabic/Persian text to a file in Notepad, and now my text looks like this:

Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå

How can I get my data back? It is important for me to recover this text. Thanks in advance.

Solution

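The paragraph was scrambled in three steps: it was saved as code page 1256 (Arabic/Persian), the resulting bytes were then interpreted as code page 1252 (Western European), and the misread text was finally saved as Unicode. You can reverse this in C#: encoding the scrambled string with code page 1252 recovers the original bytes, and decoding those bytes with code page 1256 restores the Persian text: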

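using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // needed on .NET Core/.NET 5+ for code pages 1252/1256
string scrambled = "Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ " +
                   "Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå";
// Re-encode as 1252 to recover the original bytes, then decode them as 1256.
byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(scrambled);
string plainText = Encoding.GetEncoding("windows-1256").GetString(bytes);
Console.WriteLine(plainText);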

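The plain text output is: "تو اين سورس برنامه عدد دلخواهي رو از ورودي ميگيره و به طول همون عدد مثلثي رو رسم مي کنه"

Note that the گ in ميگيره is byte 0x90 in code page 1256, which has no printable character in code page 1252. It is invisible in the scrambled text shown above, and if it was lost while copying the text, it cannot be recovered.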
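Applied to the saved file itself, the same idea might look like the sketch below. It assumes the file was saved from Notepad as "Unicode" (UTF-16 with a byte order mark) and that the 1256/1252 diagnosis above is right; `Strict`, `Unscramble` and `RepairFile` are hypothetical names, not part of the original answer. Unlike the answer's code, it uses exception fallbacks, so a character with no code page 1252 byte raises an `EncoderFallbackException` instead of silently becoming `?`, which is a quick hint that the guessed code pages are wrong.

```csharp
using System;
using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+ so that code pages 1252 and 1256 are available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Show the fix on the first words of the question's text.
Console.WriteLine(Unscramble("Êæ Çíä ÓæÑÓ"));

// Code page with exception fallbacks: unmappable characters (when encoding)
// or undefined bytes (when decoding) throw instead of becoming '?'.
static Encoding Strict(int codePage) =>
    Encoding.GetEncoding(codePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Reverse "saved as 1256, read as 1252": recover the bytes, then decode them correctly.
static string Unscramble(string scrambled) =>
    Strict(1256).GetString(Strict(1252).GetBytes(scrambled));

// Hypothetical file repair: File.ReadAllText detects the UTF-16 byte order mark
// that Notepad writes for "Unicode"; the repaired text is written as UTF-8 without a BOM.
static void RepairFile(string inputPath, string outputPath)
{
    string scrambled = File.ReadAllText(inputPath);
    File.WriteAllText(outputPath, Unscramble(scrambled), new UTF8Encoding(false));
}
```

The strict fallbacks are a design choice: the original answer's default encodings would also run, but any wrong guess would only show up as question marks in the output.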

OTHER TIPS

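On Linux, you can use gedit to open the file as windows-1256 encoded text (this helps when the file still contains the original code page 1256 bytes rather than the scrambled Unicode text):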

gedit shahnameh.txt --encoding WINDOWS-1256

You can do the same through the GUI: just select the correct encoding in the Open dialog when opening the file. The encoding selector should be at the bottom of the dialog.
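Opening the file with the right encoding fixes the display. To get a copy that any editor opens correctly, a small C# sketch in the style of the accepted answer can convert the raw code page 1256 bytes to UTF-8. `ConvertFile1256ToUtf8` and the output name `shahnameh-utf8.txt` are hypothetical, and this assumes the file really holds 1256 bytes; it will not help with a file that was already re-saved as scrambled Unicode text.

```csharp
using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+ so that code page 1256 is available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: read a file whose bytes are windows-1256 (the same
// encoding gedit is told to use above) and save the text as UTF-8 without a BOM.
static void ConvertFile1256ToUtf8(string inputPath, string outputPath)
{
    string text = File.ReadAllText(inputPath, Encoding.GetEncoding("windows-1256"));
    File.WriteAllText(outputPath, text, new UTF8Encoding(false));
}

// Example with the file name from the gedit command (output name is made up).
if (File.Exists("shahnameh.txt"))
{
    ConvertFile1256ToUtf8("shahnameh.txt", "shahnameh-utf8.txt");
}
```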

Licensed under: CC-BY-SA with attribution
